Lecture Materials for Week 13
November 17, 2025
Exam Day

Study Guide for Final Exam
R Functions for Model Building & Summarizing
Be able to:
- Define what each function does:
  - lm(): Fits linear models
  - tidy(): Extracts parameter estimates (coefficients, standard errors, p-values)
  - glance(): Provides model-level statistics (R², sigma, F-statistic, etc.)
  - augment(): Adds fitted values and residuals to the original data
  - set_variable_labels(): Assigns descriptive labels to variables
  - tbl_summary(): Creates publication-ready descriptive tables
- Functions from the marginaleffects package: predictions() (predicted values with confidence intervals), plot_slopes() (plots how the slope of one predictor changes across values of another)
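The broom functions tidy(), glance(), and augment() repackage information that base R already stores in a fitted lm object. As a rough sketch (using the built-in mtcars data purely as an illustrative example, and base R instead of broom so it runs without extra packages), here is where each piece lives:

```r
# Illustrative model on R's built-in mtcars data: mpg as a function of weight
fit <- lm(mpg ~ wt, data = mtcars)

# Roughly what tidy() returns: estimates, SEs, t statistics, p-values
coef_table <- coef(summary(fit))
print(coef_table)

# Roughly what glance() returns: model-level statistics
fit_summary <- summary(fit)
model_stats <- c(
  r.squared = fit_summary$r.squared,
  sigma     = fit_summary$sigma,
  statistic = unname(fit_summary$fstatistic["value"])
)
print(model_stats)

# Roughly what augment() returns: the data plus fitted values and residuals
augmented <- data.frame(mtcars[, c("mpg", "wt")],
                        .fitted = fitted(fit),
                        .resid  = residuals(fit))
print(head(augmented))
```

The broom versions return tidy data frames with slightly different column names, but the numbers are the same.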
Study Design Types
Understand the difference between:
- Descriptive studies: Summarizing patterns, prevalence, or characteristics in a population
- Predictive studies: Forecasting future outcomes using existing data
- Causal inference/explanation: Establishing cause-and-effect relationships (typically RCTs or well-designed observational studies)
Core Regression Concepts
Review:
Confidence Interval (CI) vs. Prediction Interval (PI):
- CI: estimates the mean of Y|X (narrower)
- PI: estimates an individual Y|X (wider, accounts for both mean uncertainty and individual variability)
- Both are narrowest at the mean of X
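A minimal sketch of the CI/PI contrast, assuming the built-in mtcars data as a stand-in example: base R's predict() returns both interval types, and comparing widths at the mean of wt versus a value far from it illustrates that the PI is wider and both intervals are narrowest at the mean.

```r
# Illustrative model: mpg as a function of weight (1,000 lb) in mtcars
fit <- lm(mpg ~ wt, data = mtcars)

# Evaluate at the mean of wt and at a value far from the mean
new_x <- data.frame(wt = c(mean(mtcars$wt), 5.4))

ci_int <- predict(fit, newdata = new_x, interval = "confidence", level = 0.95)
pi_int <- predict(fit, newdata = new_x, interval = "prediction", level = 0.95)

# Interval widths (upper bound minus lower bound)
ci_width <- ci_int[, "upr"] - ci_int[, "lwr"]
pi_width <- pi_int[, "upr"] - pi_int[, "lwr"]
print(rbind(ci_width, pi_width))
```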
sigma: The standard deviation of the residuals (reported by glance()), computed as √(SSE / df)
- sigma is typically smaller than SD(Y) because the model explains some of the variance in Y
- Lower sigma indicates better model fit
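One way to see what sigma measures, sketched with mtcars as an illustrative dataset: compute it by hand from the residuals and compare it with the value lm() reports and with the raw SD of the outcome.

```r
fit <- lm(mpg ~ wt, data = mtcars)

sse <- sum(residuals(fit)^2)        # error sum of squares
df_resid <- df.residual(fit)        # n - k - 1 = 32 - 1 - 1 = 30
sigma_hand <- sqrt(sse / df_resid)  # residual standard deviation

print(c(sigma_hand = sigma_hand,
        sigma_lm   = summary(fit)$sigma,
        sd_y       = sd(mtcars$mpg)))
```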
Train/test splits: Provide unbiased estimates of out-of-sample predictive accuracy
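A simple train/test split might look like the sketch below. The mtcars data, the 70/30 ratio, the seed, and the use of RMSE as the accuracy measure are illustrative assumptions, not course requirements.

```r
set.seed(2025)  # arbitrary seed so the split is reproducible

n <- nrow(mtcars)
train_rows <- sample(seq_len(n), size = round(0.7 * n))  # assumed 70/30 split
train <- mtcars[train_rows, ]
test  <- mtcars[-train_rows, ]

# Fit on the training data only
fit_train <- lm(mpg ~ wt, data = train)

# Root mean squared error on training vs. held-out test data
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
train_rmse <- rmse(train$mpg, predict(fit_train, newdata = train))
test_rmse  <- rmse(test$mpg,  predict(fit_train, newdata = test))
print(c(train = train_rmse, test = test_rmse))
```

Only the test RMSE reflects performance on data the model never saw.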
Standard error: The standard deviation of the sampling distribution of a parameter estimate
Adding confidence bands in ggplot:
geom_smooth(method="lm", se=TRUE, level=.95)
Null Hypothesis Significance Testing (NHST)
Know:
Null hypothesis (H₀): Baseline assumption of no effect/difference
Alternative hypothesis (Hₐ): Statement of an effect or difference
Alpha (α): Pre-set Type I error rate (e.g., .05)
p-value: P(data as extreme as, or more extreme than, what was observed | H₀ is true)
- If p < α, reject H₀
- p-value is NOT the probability that H₀ is true
Type I error: Rejecting a true H₀ (false positive)
Type II error: Failing to reject a false H₀ (false negative, missing a real effect)
Two-tailed tests: Used when no directional hypothesis is specified
Statistical vs. practical significance: A result can be statistically significant but not meaningful in practice
Rejection regions: If the test statistic falls in the rejection region, p < α
Permutation tests: Approximate the null distribution by shuffling/permuting labels
Overall F-test: Tests whether the model explains a significant portion of variance in Y
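The permutation idea can be sketched in base R. This example (an illustration using mtcars, not a course dataset) tests whether mean mpg differs between manual and automatic cars by shuffling the am labels to build the null distribution.

```r
set.seed(13)  # arbitrary seed for reproducibility

# Observed difference in mean mpg: manual (am = 1) minus automatic (am = 0)
obs_diff <- with(mtcars, mean(mpg[am == 1]) - mean(mpg[am == 0]))

# Build the null distribution by shuffling the group labels
n_perm <- 5000
perm_diffs <- replicate(n_perm, {
  shuffled <- sample(mtcars$am)
  mean(mtcars$mpg[shuffled == 1]) - mean(mtcars$mpg[shuffled == 0])
})

# Two-sided p-value: share of permuted differences at least as extreme as observed
p_value <- mean(abs(perm_diffs) >= abs(obs_diff))
reject_h0 <- p_value < 0.05
print(c(obs_diff = obs_diff, p_value = p_value))
```

If no permuted difference is as extreme as the observed one, this estimate is 0; some texts add 1 to the numerator and denominator so the p-value is never exactly zero.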
Confidence Intervals from Model Output
Be able to:
- Calculate a 95% CI using: estimate ± (critical t-value × SE)
- Use qt() to find critical values for the appropriate degrees of freedom
- Interpret CIs: If a 95% CI excludes the null value (typically 0), reject H₀ at α = .05
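Computing a CI by hand, sketched with an illustrative mtcars model: take the estimate and SE from the coefficient table, get the critical value from qt() with the residual df, and check the result against confint().

```r
fit <- lm(mpg ~ wt, data = mtcars)
est <- coef(summary(fit))["wt", "Estimate"]
se  <- coef(summary(fit))["wt", "Std. Error"]

t_crit <- qt(0.975, df = df.residual(fit))  # df = n - 2 = 30
ci_hand <- est + c(-1, 1) * t_crit * se     # estimate ± (critical t × SE)

print(ci_hand)
print(confint(fit, "wt", level = 0.95))     # should match the hand calculation

# Decision rule: reject H0 (slope = 0) at alpha = .05 if the CI excludes 0
excludes_zero <- ci_hand[1] > 0 || ci_hand[2] < 0
print(excludes_zero)
```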
Variance Decomposition & R²
Understand:
- SST (Total Sum of Squares): Total variance in Y
- SSR (Regression Sum of Squares): Variance explained by the model
- SSE (Error Sum of Squares): Unexplained variance (residual variance)
- R² = SSR/SST: Proportion of variance explained
- Venn diagrams: Visual representation of shared and unique variance
- In multiple regression, predictors can share variance (overlap) and have unique contributions
Review:
- R² increases (or stays the same) when adding predictors
- sigma decreases when model fit improves
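- SSE equals 0 only when every fitted value equals its observed value (a perfect fit); least squares chooses the coefficients that make SSE as small as possible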
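The decomposition can be verified numerically. This sketch (an illustrative mtcars model) computes each sum of squares directly, checks that R² = SSR/SST matches lm(), and shows that adding a predictor does not lower R².

```r
fit <- lm(mpg ~ wt, data = mtcars)
y <- mtcars$mpg
y_hat <- fitted(fit)

sst <- sum((y - mean(y))^2)       # total sum of squares
ssr <- sum((y_hat - mean(y))^2)   # regression (explained) sum of squares
sse <- sum((y - y_hat)^2)         # error (unexplained) sum of squares

r2 <- ssr / sst
print(c(SST = sst, SSR = ssr, SSE = sse, R2 = r2, R2_lm = summary(fit)$r.squared))

# Adding a predictor to a nested model never lowers R-squared
fit2 <- lm(mpg ~ wt + hp, data = mtcars)
r2_bigger <- summary(fit2)$r.squared
print(r2_bigger)
```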
Bootstrap Hypothesis Testing
Be able to:
- Compute a two-sided p-value from bootstrap results
- Interpret the standard deviation of the bootstrap distribution as the standard error
- Calculate how many SEs the observed statistic is from the null
- Make decisions based on p-value and α
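A bootstrap test of a regression slope, sketched under some assumptions: the data (mtcars), the number of resamples, and the use of a normal approximation to turn the SE-based distance from the null into a two-sided p-value are illustrative choices. The course materials may compute the p-value another way, such as from a bootstrap distribution shifted to the null value.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Observed slope of mpg on wt
obs_slope <- coef(lm(mpg ~ wt, data = mtcars))[["wt"]]

# Resample rows with replacement and refit the model each time
n_boot <- 2000
boot_slopes <- replicate(n_boot, {
  rows <- sample(nrow(mtcars), replace = TRUE)
  coef(lm(mpg ~ wt, data = mtcars[rows, ]))[["wt"]]
})

boot_se <- sd(boot_slopes)                # SD of the bootstrap distribution = SE
null_value <- 0
z <- (obs_slope - null_value) / boot_se   # how many SEs the estimate is from the null
p_value <- 2 * pnorm(-abs(z))             # two-sided, normal approximation

alpha <- 0.05
reject_h0 <- p_value < alpha
print(c(slope = obs_slope, se = boot_se, z = z, p = p_value))
```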
Interaction Models
Understand:
- Interaction term: Tests whether the effect of one predictor depends on another predictor
- Simple slopes: The effect of X₁ on Y at specific values of X₂
- Centering predictors: Makes interpretation easier (intercept = predicted Y when all predictors are at their mean)
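- Interpreting coefficients:
  - Main effects, when an interaction is present, represent “conditional” effects: the slope of one predictor when the other predictor equals 0 (or its mean, if centered)
  - The interaction coefficient shows how the slope of one predictor changes for each 1-unit increase in the other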
Be able to:
- Compute simple slopes from model output
- Determine if an interaction is statistically significant
- Interpret what an interaction means substantively
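Simple slopes follow directly from the coefficients: in Y ~ X₁ * X₂, the slope of X₁ at a given X₂ is b(X₁) + b(X₁:X₂) × X₂. A sketch with mtcars (an illustrative choice), centering hp so the wt coefficient is the wt slope at average horsepower; the hp values chosen are arbitrary:

```r
# Center hp so the wt coefficient is the wt slope at mean hp
cars <- transform(mtcars, hp_c = hp - mean(hp))
fit_int <- lm(mpg ~ wt * hp_c, data = cars)
b <- coef(fit_int)

# Simple slope of wt at chosen hp_c values: b_wt + b_interaction * hp_c
hp_values <- c(-50, 0, 50)  # 50 hp below the mean, at the mean, 50 hp above
simple_slopes <- b[["wt"]] + b[["wt:hp_c"]] * hp_values
names(simple_slopes) <- c("mean - 50", "mean", "mean + 50")
print(simple_slopes)

# Is the interaction statistically significant? Check its p-value
int_p <- coef(summary(fit_int))["wt:hp_c", "Pr(>|t|)"]
print(int_p)
```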
Transformed Outcomes
Log-transformed Y:
- Slope interpretation: A 1-unit increase in X is associated with approximately a (slope × 100)% change in Y (the approximation works well only for small slopes)
- Use 100 * (exp(slope) - 1) to convert to the exact percentage change
- Back-transform: exp(log Y) = Y
- Intercept: predicted log(Y) when X = 0
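The gap between the quick (slope × 100)% reading and the exact 100 * (exp(slope) - 1) conversion is easy to check. The slopes 0.05 and 0.5 below are hypothetical, and the mtcars model is only an illustration; this assumes Y was transformed with the natural log, as R's log() does.

```r
slope <- 0.05  # hypothetical slope from a model of log(Y) on X

approx_pct <- slope * 100             # quick approximation: 5%
exact_pct  <- 100 * (exp(slope) - 1)  # exact: about 5.13%

# The approximation degrades as the slope grows
big_slope <- 0.5
print(c(approx = big_slope * 100, exact = 100 * (exp(big_slope) - 1)))  # 50 vs about 64.87

# Applied to a fitted model: % change in mpg per 1,000 lb of weight
fit_log <- lm(log(mpg) ~ wt, data = mtcars)
b_wt <- coef(fit_log)[["wt"]]
pct_change <- 100 * (exp(b_wt) - 1)
print(pct_change)
```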
Quadratic models (Y ~ X + X²):
- Model curvilinear relationships
- Vertex: The x-value where the curve reaches its maximum or minimum
- Vertex x-coordinate = -b₁/(2 × b₂) where b₁ is the linear term and b₂ is the quadratic term
- U-shaped: Positive quadratic term
- Inverted U-shaped: Negative quadratic term
- Interpret: The effect of X on Y changes across the range of X
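A quick check of the vertex formula, using hypothetical noise-free data from y = 3 + 2x - 0.5x² (which peaks at x = 2). I() is needed so that ^ is treated as arithmetic inside an R formula.

```r
# Hypothetical curve with a known vertex at x = 2
x <- seq(-2, 6, by = 0.5)
y <- 3 + 2 * x - 0.5 * x^2

fit_quad <- lm(y ~ x + I(x^2))
b1 <- coef(fit_quad)[["x"]]       # linear term
b2 <- coef(fit_quad)[["I(x^2)"]]  # quadratic term

vertex_x <- -b1 / (2 * b2)        # -2 / (2 * -0.5) = 2
shape <- if (b2 < 0) "inverted U (maximum)" else "U (minimum)"
print(c(vertex_x = vertex_x))
print(shape)
```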
Multiple Regression & Confounding
Key concepts:
- Confounders: Variables that affect both the predictor and the outcome, which can create spurious or distorted associations between them
- Adjusted effects: The effect of X on Y, holding other variables constant
- Change in R²: Additional variance explained by adding predictors
  - ΔR² = R²(full model) - R²(reduced model)
- Parallel slopes model: Model with multiple predictors but no interaction
  - Lines for different groups are parallel (same slope, different intercepts)
- Residualized gain: Using baseline as a covariate when analyzing change
In RCTs:
- Randomization ensures confounders are balanced across groups (in expectation)
- Treatment effect can be interpreted causally
- Baseline covariates improve precision but aren’t confounders
Be able to:
- Compare unadjusted vs. adjusted effects
- Identify whether a variable is a confounder
- Interpret whether controlling for a variable changes conclusions
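Confounding is easiest to see in simulated data where the truth is known. In this hypothetical setup, z drives both x and y, and x has no effect on y at all; the unadjusted slope picks up a spurious association that largely disappears once z is held constant. The sample size, seed, and coefficients are arbitrary.

```r
set.seed(7)
n <- 1000

# Hypothetical data: z causes both x and y; x has NO effect on y
z <- rnorm(n)
x <- z + rnorm(n)
y <- 2 * z + rnorm(n)

fit_unadj <- lm(y ~ x)      # ignores the confounder
fit_adj   <- lm(y ~ x + z)  # holds z constant

slope_unadj <- coef(fit_unadj)[["x"]]  # near 1: spurious association
slope_adj   <- coef(fit_adj)[["x"]]    # near 0: association disappears

# Change in R-squared from adding z
delta_r2 <- summary(fit_adj)$r.squared - summary(fit_unadj)$r.squared
print(c(unadjusted = slope_unadj, adjusted = slope_adj, delta_r2 = delta_r2))
```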
Using qt() and Degrees of Freedom
Know:
- For simple regression: df = n - 2
- For multiple regression: df = n - k - 1 (where k = number of predictors)
- Use qt(c(.025, .975), df) for two-sided 95% CI critical values
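The df rules map directly onto qt() calls. Sample sizes below are hypothetical; the last lines use mtcars (32 rows) only to show that lm() reports the same residual df.

```r
# Simple regression with a hypothetical n = 30: df = n - 2 = 28
crit_simple <- qt(c(0.025, 0.975), df = 30 - 2)
print(crit_simple)  # about -2.048 and 2.048

# Multiple regression, hypothetical n = 30 and k = 3 predictors: df = n - k - 1 = 26
crit_multi <- qt(c(0.025, 0.975), df = 30 - 3 - 1)
print(crit_multi)   # about -2.056 and 2.056

# lm() reports the same residual df: mtcars has n = 32, and this model has k = 2
fit <- lm(mpg ~ wt + hp, data = mtcars)
print(df.residual(fit))  # 32 - 2 - 1 = 29
```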
Computing Predictions from Models
Be able to:
- Use regression equation to predict Y for given X values
- Predicted Y = intercept + (slope₁ × X₁) + (slope₂ × X₂) + …
- For categorical predictors coded 0/1, the coefficient is the difference in predicted Y between the group coded 1 and the group coded 0 (holding other predictors constant)
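Plugging values into the regression equation by hand should reproduce predict(). This sketch uses mtcars, where am is already coded 0 = automatic, 1 = manual; the car being predicted (wt = 3, i.e. 3,000 lb, manual) is hypothetical.

```r
fit <- lm(mpg ~ wt + am, data = mtcars)
b <- coef(fit)

# Predicted Y = intercept + (slope1 × X1) + (slope2 × X2)
pred_hand <- b[["(Intercept)"]] + b[["wt"]] * 3 + b[["am"]] * 1
pred_lm   <- predict(fit, newdata = data.frame(wt = 3, am = 1))
print(c(by_hand = pred_hand, predict = unname(pred_lm)))

# With a single 0/1 predictor, the coefficient equals the difference in group means
fit_dummy <- lm(mpg ~ am, data = mtcars)
group_diff <- with(mtcars, mean(mpg[am == 1]) - mean(mpg[am == 0]))
print(c(coefficient = coef(fit_dummy)[["am"]], group_diff = group_diff))
```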
Practice Tips
- Work through regression outputs: practice interpreting coefficients, SEs, R², and sigma
- Calculate confidence intervals by hand using estimate ± (critical t × SE)
- Practice identifying study types (descriptive, predictive, causal)
- Draw Venn diagrams to understand variance decomposition
- Interpret interaction models: compute simple slopes and understand what the interaction means
- Practice transformations: know how to interpret log-transformed outcomes and quadratic terms
- Work through examples of confounding: compare models with and without potential confounders
- Use the bootstrap examples to practice computing p-values and making decisions
- Understand the relationship between CIs, hypothesis tests, and p-values
What Won’t Be on the Exam?
- Bayesian statistics
- Writing R code from scratch
- Performing complex derivations
- The exam focuses on interpreting output and applying concepts, not programming
Test Structure
- 100 questions total
- Mix of True/False, Multiple Choice, and Matching
- Bring a pencil and basic calculator
- You can use four 3×5 index cards with handwritten notes (You’ll receive these in lecture)
- Focus on understanding concepts and being able to interpret statistical output