Apply and Practice Activity
Checking Assumptions of the Model
In Module 14: Moderation, we explored the concordance in Work Interference with Family (WIF) between twin pairs in the MIDUS Study. The key example involved fitting the following moderation model to assess whether zygosity moderates the relationship between the WIF scores of twin 1 and twin 2:
lm_int <- lm(wif_twin1 ~ wif_twin2.c*zyg, data = df)
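In R's formula syntax, `a * b` is shorthand for `a + b + a:b`, so the model above estimates both main effects and their interaction term. The base-R check below shows how the formula expands. It needs no data, because only the formula is inspected.

```r
# The moderation formula from the example; terms() works on the formula alone.
f <- wif_twin1 ~ wif_twin2.c * zyg

# Extract the expanded term labels: main effects first, then the interaction.
term_labels <- attr(terms(f), "term.labels")
print(term_labels)
# Expected terms: wif_twin2.c, zyg, wif_twin2.c:zyg
```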
In this Apply and Practice activity, you will evaluate whether the key assumptions of the linear regression model hold for this fitted model. Specifically, you will check linearity and additivity, equal variance of errors (homoscedasticity), and normality of the residuals, and you will screen for influential outliers.
In your Posit Cloud foundations project, go to programs > module_programs and open the m14-moderation.qmd analysis notebook. At the bottom of the notebook, examine the assumptions of the moderation model using the Module 17 handout and the guidelines listed below. Submit a one-page summary of your assessment to receive credit for this activity.
Guidelines for assessment:
Linearity and Additivity:
Objective: Evaluate whether the relationship between the predictors (wif_twin2.c and zyg) and the outcome (wif_twin1) is linear and additive.
Action: Create a Fitted Values vs. Residuals plot by plotting the fitted values (.fitted) on the x-axis and the residuals (.resid) on the y-axis.
Interpretation: Look for any clear patterns or curvature in the plot. The residuals should scatter randomly around the zero line, and the spread of residuals should remain roughly constant across the fitted values.
Questions to consider:
Is there a clear non-linear trend or systematic pattern?
Are the residuals evenly spread across the range of fitted values, or do they show increasing/decreasing variance?
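A minimal base-R sketch of the Fitted Values vs. Residuals plot follows. The column names .fitted and .resid match those returned by broom::augment(); here they are built directly with fitted() and resid(). The simulated data frame is a hypothetical stand-in that only borrows the column names from the MIDUS example (the zyg levels "MZ"/"DZ" and every value are made up), so the sketch runs on its own. In the notebook, skip the simulation lines and use the existing df and lm_int.

```r
# Hypothetical stand-in data (NOT the MIDUS data) so the sketch is self-contained.
# In m14-moderation.qmd, skip these lines and use the existing df and lm_int.
set.seed(17)
n <- 200
df <- data.frame(
  wif_twin2.c = rnorm(n, mean = 0, sd = 0.6),
  zyg = factor(sample(c("MZ", "DZ"), n, replace = TRUE))
)
df$wif_twin1 <- 2.5 + 0.3 * df$wif_twin2.c +
  0.2 * df$wif_twin2.c * (df$zyg == "MZ") + rnorm(n, sd = 0.5)
lm_int <- lm(wif_twin1 ~ wif_twin2.c * zyg, data = df)

# Collect fitted values and residuals under the names broom::augment() uses.
diag_df <- data.frame(.fitted = fitted(lm_int), .resid = resid(lm_int))

# Fitted values on the x-axis, residuals on the y-axis.
plot(diag_df$.fitted, diag_df$.resid,
     xlab = "Fitted values (.fitted)", ylab = "Residuals (.resid)",
     main = "Fitted values vs. residuals")
abline(h = 0, lty = 2)                                        # zero reference line
lines(lowess(diag_df$.fitted, diag_df$.resid), col = "red")   # smoother to reveal curvature
```

The lowess smoother is a visual aid: if it bends clearly away from the dashed zero line, the model may be missing a non-linear or non-additive pattern.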
Equal Variance of Errors (Homoscedasticity):
Objective: Assess if the variance of the residuals is constant across all levels of the predictor variables.
Action: Use the Fitted Values vs. Residuals plot created earlier.
Interpretation: Check if the residuals show constant variability across the fitted values. If the residuals form a funnel shape (narrow at one end, wide at the other), it indicates heteroscedasticity.
Questions to consider:
Do the residuals appear randomly scattered?
Is there evidence of increasing or decreasing variability in the residuals as fitted values increase?
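To supplement the visual check, one informal approach (not part of the handout) compares the residual standard deviation across quartiles of the fitted values. A steady rise or fall across the bins echoes a funnel shape. plot(lm_int, which = 3) draws R's built-in Scale-Location plot, which shows the same idea graphically. The data are again a hypothetical simulated stand-in.

```r
# Hypothetical stand-in data (NOT the MIDUS data); skip in the notebook.
set.seed(17)
n <- 200
df <- data.frame(
  wif_twin2.c = rnorm(n, mean = 0, sd = 0.6),
  zyg = factor(sample(c("MZ", "DZ"), n, replace = TRUE))
)
df$wif_twin1 <- 2.5 + 0.3 * df$wif_twin2.c + rnorm(n, sd = 0.5)
lm_int <- lm(wif_twin1 ~ wif_twin2.c * zyg, data = df)

fit <- fitted(lm_int)
res <- resid(lm_int)

# Split the fitted values into quartile bins and compute the residual SD in each.
bins <- cut(fit, breaks = quantile(fit, probs = seq(0, 1, 0.25)),
            include.lowest = TRUE)
spread_by_bin <- tapply(res, bins, sd)
print(round(spread_by_bin, 3))

# Ratio of largest to smallest bin SD; values near 1 are consistent with equal variance.
ratio <- max(spread_by_bin) / min(spread_by_bin)
print(round(ratio, 2))

# Built-in Scale-Location plot: sqrt(|standardized residuals|) vs. fitted values.
plot(lm_int, which = 3)
```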
Normality of Errors:
Objective: Determine if the residuals follow a normal distribution.
Action: Create a histogram of the residuals and check for a bell-shaped curve.
Interpretation: The residuals should roughly follow a normal distribution. Slight deviations from normality are generally acceptable, but marked skewness or heavy tails (excess kurtosis) may suggest the need for a transformation.
Questions to consider:
Does the histogram of residuals resemble a normal distribution?
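The sketch below draws a histogram of the residuals with a normal curve (matching the residuals' mean and SD) overlaid for comparison, and computes a moment-based skewness estimate as a rough numeric companion. The simulated data are a hypothetical stand-in, and the bin count (breaks = 20) is an arbitrary choice.

```r
# Hypothetical stand-in data (NOT the MIDUS data); skip in the notebook.
set.seed(17)
n <- 200
df <- data.frame(
  wif_twin2.c = rnorm(n, mean = 0, sd = 0.6),
  zyg = factor(sample(c("MZ", "DZ"), n, replace = TRUE))
)
df$wif_twin1 <- 2.5 + 0.3 * df$wif_twin2.c + rnorm(n, sd = 0.5)
lm_int <- lm(wif_twin1 ~ wif_twin2.c * zyg, data = df)

res <- resid(lm_int)

# Density-scaled histogram so the normal curve can be overlaid on the same axis.
hist(res, breaks = 20, freq = FALSE,
     main = "Histogram of residuals", xlab = "Residuals")
curve(dnorm(x, mean = mean(res), sd = sd(res)), add = TRUE, lty = 2, col = "blue")

# Moment-based sample skewness (m3 / m2^1.5); values near 0 suggest rough symmetry.
m2 <- mean((res - mean(res))^2)
m3 <- mean((res - mean(res))^3)
skew <- m3 / m2^1.5
print(round(skew, 3))
```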
Outliers:
Objective: Identify any influential outliers that may have an undue impact on the regression model’s estimates.
Action: Create a Residuals vs. Leverage plot and examine Cook’s distance to detect points with both high leverage and large residuals. Observations with a high Cook’s distance are potentially influential cases that could distort the model’s coefficient estimates.
Interpretation: In a well-fitted model, most observations should have low leverage and small residuals. Points with high leverage or a large Cook’s distance should be examined for potential data entry errors or unusual characteristics that might warrant special consideration.
Questions to consider:
Are there any points in the plot with both high leverage and large residuals?
Does any observation have a Cook’s distance larger than 1, indicating it might unduly influence the model?
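In base R, plot(lm_int, which = 5) draws the Residuals vs. Leverage plot with Cook’s distance contours, while cooks.distance() and hatvalues() return the underlying numbers. The sketch below flags observations above the Cook’s distance > 1 cutoff from the guidelines and lists the largest values for a closer look. The data are a hypothetical simulated stand-in.

```r
# Hypothetical stand-in data (NOT the MIDUS data); skip in the notebook.
set.seed(17)
n <- 200
df <- data.frame(
  wif_twin2.c = rnorm(n, mean = 0, sd = 0.6),
  zyg = factor(sample(c("MZ", "DZ"), n, replace = TRUE))
)
df$wif_twin1 <- 2.5 + 0.3 * df$wif_twin2.c + rnorm(n, sd = 0.5)
lm_int <- lm(wif_twin1 ~ wif_twin2.c * zyg, data = df)

# Residuals vs. Leverage plot with Cook's distance contour lines.
plot(lm_int, which = 5)

cd  <- cooks.distance(lm_int)   # influence of each observation on the fit
lev <- hatvalues(lm_int)        # leverage of each observation

# Observations exceeding the Cook's distance > 1 rule of thumb (may be empty).
influential <- which(cd > 1)
print(influential)

# The five largest Cook's distances, worth inspecting even below the cutoff.
print(head(sort(cd, decreasing = TRUE), 5))
```

Leverage values always sum to the number of estimated coefficients (four in this model), so their average is p/n. That average gives a reference point for judging which leverage values are unusually high.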