Lecture Materials for Week 13

November 17, 2025

Exam Day

Study Guide for Final Exam

R Functions for Model Building & Summarizing

Be able to:

  • Define what each function does (a sketch tying them together follows this list):
    • lm() - Fits linear models
    • tidy() (from broom) - Extracts parameter estimates (coefficients, standard errors, p-values)
    • glance() (from broom) - Provides model-level statistics (R², sigma, F-statistic, etc.)
    • augment() (from broom) - Adds fitted values and residuals to the original data
    • set_variable_labels() (from labelled) - Assigns descriptive labels to variables
    • tbl_summary() (from gtsummary) - Creates publication-ready descriptive tables
    • Functions from the marginaleffects package: predictions(), plot_slopes()
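
A minimal sketch tying these functions together; the mtcars data set and variable choices here are illustrative, not from the course:

  library(broom)   # tidy(), glance(), augment()

  fit <- lm(mpg ~ wt + hp, data = mtcars)   # fit a linear model

  tidy(fit)      # one row per coefficient: estimate, std.error, statistic, p.value
  glance(fit)    # one row of model-level stats: r.squared, sigma, statistic (F), ...
  augment(fit)   # original data plus .fitted values and .resid residuals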

Study Design Types

Understand the difference between:

  • Descriptive studies: Summarizing patterns, prevalence, or characteristics in a population
  • Predictive studies: Forecasting future outcomes using existing data
  • Causal inference/explanation: Establishing cause-and-effect relationships (typically RCTs or well-designed observational studies)

Core Regression Concepts

Review:

  • Confidence Interval (CI) vs. Prediction Interval (PI) (see the predict() sketch after this list):

    • CI: estimates the mean of Y|X (narrower)
    • PI: estimates an individual Y|X (wider, accounts for both mean uncertainty and individual variability)
    • Both are narrowest at the mean of X
  • sigma: The standard deviation of residuals (from glance())

    • sigma is typically smaller than SD(Y) because the model explains some variance
    • Lower sigma indicates better model fit
  • Train/test splits: Provide unbiased estimates of out-of-sample predictive accuracy

  • Standard error: The standard deviation of the sampling distribution of a parameter estimate

  • Adding confidence bands in ggplot: geom_smooth(method="lm", se=TRUE, level=.95)
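
A short sketch contrasting the two intervals and adding the ggplot confidence band; the model and values are illustrative:

  fit <- lm(mpg ~ wt, data = mtcars)
  new <- data.frame(wt = 3)

  predict(fit, new, interval = "confidence")   # CI for the mean of Y at wt = 3 (narrower)
  predict(fit, new, interval = "prediction")   # PI for an individual Y at wt = 3 (wider)

  # Confidence band around the fitted line:
  library(ggplot2)
  ggplot(mtcars, aes(wt, mpg)) +
    geom_point() +
    geom_smooth(method = "lm", se = TRUE, level = .95)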

Null Hypothesis Significance Testing (NHST)

Know:

  • Null hypothesis (H₀): Baseline assumption of no effect/difference

  • Alternative hypothesis (Hₐ): Statement of an effect or difference

  • Alpha (α): Pre-set Type I error rate (e.g., .05)

  • p-value: P(data or more extreme | H₀)

    • If p < α, reject H₀
    • p-value is NOT the probability that H₀ is true
  • Type I error: Rejecting a true H₀ (false positive)

  • Type II error: Failing to reject a false H₀ (false negative, missing a real effect)

  • Two-tailed tests: Used when no directional hypothesis is specified

  • Statistical vs. practical significance: A result can be statistically significant but not meaningful in practice

  • Rejection regions: If test statistic falls in rejection region, p < α

  • Permutation tests: Approximate the null distribution by shuffling/permuting labels (see the sketch after this list)

  • Overall F-test: Tests whether the model as a whole explains a significant portion of variance in Y (H₀: all slope coefficients equal 0)
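
A minimal permutation-test sketch for a two-group mean difference; the data are simulated purely for illustration:

  set.seed(1)
  y     <- c(rnorm(20, mean = 10), rnorm(20, mean = 11))   # outcome
  group <- rep(c("A", "B"), each = 20)                     # group labels

  obs <- diff(tapply(y, group, mean))   # observed mean difference (B - A)

  perm <- replicate(5000, {
    shuffled <- sample(group)           # shuffling breaks any real association
    diff(tapply(y, shuffled, mean))
  })

  mean(abs(perm) >= abs(obs))           # two-sided p-value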

Confidence Intervals from Model Output

Be able to:

  • Calculate a 95% CI using: estimate ± (critical t-value × SE)
  • Use qt() to find critical values for the appropriate degrees of freedom
  • Interpret CIs: If CI excludes the null value (typically 0), reject H₀
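
A sketch of the by-hand calculation, assuming a slope estimate and SE read off tidy() output (the numbers are made up):

  est <- 1.8    # example coefficient estimate
  se  <- 0.45   # its standard error
  df  <- 98     # residual degrees of freedom (n - k - 1)

  t_crit <- qt(.975, df)          # critical t for a 95% CI
  est + c(-1, 1) * t_crit * se    # lower and upper bounds
  # If this interval excludes 0, reject H0 at alpha = .05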

Variance Decomposition & R²

Understand:

  • SST (Total Sum of Squares): Total variance in Y
  • SSR (Regression Sum of Squares): Variance explained by the model
  • SSE (Error Sum of Squares): Unexplained variance (residual variance)
  • R² = SSR/SST: Proportion of variance explained
  • Venn diagrams: Visual representation of shared and unique variance
    • In multiple regression, predictors can share variance (overlap) and have unique contributions

Review:

  • R² increases (or stays the same) when adding predictors
  • sigma decreases when model fit improves
  • OLS chooses coefficients to minimize SSE; SSE would be zero only if every fitted value equaled its observed value (a perfect fit)
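
A sketch verifying the decomposition by hand (the model is illustrative):

  fit <- lm(mpg ~ wt, data = mtcars)

  sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)   # total sum of squares
  sse <- sum(resid(fit)^2)                        # error (residual) sum of squares
  ssr <- sst - sse                                # regression sum of squares

  ssr / sst   # R-squared; matches summary(fit)$r.squared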

Bootstrap Hypothesis Testing

Be able to:

  • Compute a two-sided p-value from bootstrap results
  • Interpret the standard deviation of the bootstrap distribution as the standard error
  • Calculate how many SEs the observed statistic is from the null
  • Make decisions based on p-value and α
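
A minimal bootstrap sketch for a regression slope, using a normal approximation to get the two-sided p-value (the model and null value of 0 are illustrative):

  set.seed(1)
  boot_slopes <- replicate(2000, {
    d <- mtcars[sample(nrow(mtcars), replace = TRUE), ]   # resample rows with replacement
    coef(lm(mpg ~ wt, data = d))["wt"]
  })

  obs     <- coef(lm(mpg ~ wt, data = mtcars))["wt"]
  se_boot <- sd(boot_slopes)       # bootstrap SE = SD of the bootstrap distribution
  z       <- (obs - 0) / se_boot   # how many SEs the observed slope is from the null
  2 * pnorm(-abs(z))               # approximate two-sided p-value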

Interaction Models

Understand:

  • Interaction term: Tests whether the effect of one predictor depends on another predictor
  • Simple slopes: The effect of X₁ on Y at specific values of X₂
  • Centering predictors: Makes interpretation easier (intercept = predicted Y when all predictors are at their mean)
  • Interpreting coefficients:
    • Main effects in the presence of an interaction are “conditional” effects: each is the slope when the other predictor equals 0 (or its mean, if centered)
    • The interaction coefficient shows how one predictor’s slope changes per 1-unit increase in the other

Be able to:

  • Compute simple slopes from model output
  • Determine if an interaction is statistically significant
  • Interpret what an interaction means substantively
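
A sketch of fitting an interaction and probing simple slopes with marginaleffects; the variable choices are illustrative:

  library(marginaleffects)

  fit <- lm(mpg ~ wt * hp, data = mtcars)   # main effects of wt and hp plus their interaction
  summary(fit)                              # the wt:hp row tests whether wt's slope depends on hp

  # Simple slopes of wt at chosen values of hp:
  slopes(fit, variables = "wt", newdata = datagrid(hp = c(100, 200)))
  plot_slopes(fit, variables = "wt", condition = "hp")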

Transformed Outcomes

Log-transformed Y:

  • Slope interpretation: A 1-unit increase in X is associated with approximately a (slope × 100)% change in Y (the approximation is good for small slopes)
  • Use 100 * (exp(slope) - 1) to convert to the exact percentage change
  • Back-transform with exp(): exp(log(Y)) = Y
  • Intercept: predicted log(Y) when X = 0
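
A sketch of the slope conversion (the model is illustrative):

  fit <- lm(log(mpg) ~ wt, data = mtcars)
  b1  <- coef(fit)["wt"]

  100 * (exp(b1) - 1)   # exact percentage change in Y per 1-unit increase in X
  100 * b1              # quick approximation, close when the slope is small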

Quadratic models (Y ~ X + X²):

  • Model curvilinear relationships
  • Vertex: The x-value where the curve reaches its maximum or minimum
    • Vertex x-coordinate = -b₁/(2 × b₂) where b₁ is the linear term and b₂ is the quadratic term
  • U-shaped: Positive quadratic term
  • Inverted U-shaped: Negative quadratic term
  • Interpret: The effect of X on Y changes across the range of X
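
A sketch of locating the vertex from model output (the model is illustrative):

  fit <- lm(mpg ~ hp + I(hp^2), data = mtcars)
  b   <- coef(fit)

  -b["hp"] / (2 * b["I(hp^2)"])   # vertex x-coordinate = -b1 / (2 * b2)
  # A negative quadratic coefficient gives an inverted U (a maximum at the vertex)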

Multiple Regression & Confounding

Key concepts:

  • Confounders: Variables that affect both the predictor and outcome, creating spurious associations
  • Adjusted effects: The effect of X on Y, holding other variables constant
  • Change in R²: Additional variance explained by adding predictors
    • ΔR² = R²(full model) - R²(reduced model)
  • Parallel slopes model: Model with multiple predictors but no interaction
    • Lines for different groups are parallel (same slope, different intercepts)
  • Residualized gain: Using baseline as a covariate when analyzing change

In RCTs:

  • Randomization ensures confounders are balanced across groups (in expectation)
  • Treatment effect can be interpreted causally
  • Baseline covariates improve precision but aren’t confounders

Be able to:

  • Compare unadjusted vs. adjusted effects
  • Identify whether a variable is a confounder
  • Interpret whether controlling for a variable changes conclusions
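
A sketch comparing unadjusted and adjusted effects; am stands in for a potential confounder, and the variables are illustrative:

  unadj <- lm(mpg ~ wt, data = mtcars)        # unadjusted effect of wt
  adj   <- lm(mpg ~ wt + am, data = mtcars)   # effect of wt, holding am constant

  coef(unadj)["wt"]
  coef(adj)["wt"]   # a marked change suggests am was distorting the unadjusted estimate

  summary(adj)$r.squared - summary(unadj)$r.squared   # change in R² from adding am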

Using qt() and Degrees of Freedom

Know:

  • For simple regression: df = n - 2
  • For multiple regression: df = n - k - 1 (where k = number of predictors)
  • Use qt(c(.025, .975), df) for two-sided 95% CI critical values
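
For example, with n = 100 observations and k = 1 predictor:

  qt(c(.025, .975), df = 100 - 1 - 1)   # two-sided 95% critical values with df = 98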

Computing Predictions from Models

Be able to:

  • Use regression equation to predict Y for given X values
  • Predicted Y = intercept + (slope₁ × X₁) + (slope₂ × X₂) + …
  • For categorical predictors coded 0/1, the coefficient is the difference between groups
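
A sketch of plugging values into a fitted equation; the coefficients are made up for illustration:

  # Suppose the model output gives: intercept = 50, slope1 = -2, slope2 = 0.5
  x1 <- 3
  x2 <- 10
  50 + (-2 * x1) + (0.5 * x2)   # predicted Y = 50 - 6 + 5 = 49

  # With a fitted model object, predict() does the same arithmetic:
  # predict(fit, newdata = data.frame(x1 = 3, x2 = 10))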

Practice Tips

  • Work through regression outputs: practice interpreting coefficients, SEs, R², and sigma
  • Calculate confidence intervals by hand using estimate ± (critical t × SE)
  • Practice identifying study types (descriptive, predictive, causal)
  • Draw Venn diagrams to understand variance decomposition
  • Interpret interaction models: compute simple slopes and understand what the interaction means
  • Practice transformations: know how to interpret log-transformed outcomes and quadratic terms
  • Work through examples of confounding: compare models with and without potential confounders
  • Use the bootstrap examples to practice computing p-values and making decisions
  • Understand the relationship between CIs, hypothesis tests, and p-values

What Won’t be on the Exam?

  • Bayesian statistics
  • Writing R code from scratch
  • Complex derivations

The exam focuses on interpreting output and applying concepts, not programming.

Test Structure

  • 100 questions total
  • Mix of True/False, Multiple Choice, and Matching
  • Bring a pencil and basic calculator
  • You can use four 3×5 index cards with handwritten notes (You’ll receive these in lecture)
  • Focus on understanding concepts and being able to interpret statistical output