Effect Modification
Part 1: Background
Part 2: Does Treatment Cause Better Benevolence at Follow-up?
Part 3: Does Effectiveness Vary by Baseline Benevolence?
Key measures:
Control variables:
Question 1: Is the treatment effective?
Does the REACH forgiveness intervention increase benevolence toward the person who caused hurt, after controlling for baseline benevolence, age, and study site?
Translation: Does treatment work, on average, across all participants?
Question 2: Does effectiveness vary by baseline?
Does the treatment effect depend on participants’ baseline benevolence levels?
Translation: Does treatment work better (or worse) for people who start with low vs. high baseline benevolence?
Question 1 (Treatment effectiveness):
Question 2 (Effect modification):
Note that 3,965 cases remain after removing cases with missing data.
First, let’s simply look at the overall relationship between baseline and follow-up benevolence.
Key insight: People who start with higher benevolence tend to have higher benevolence at follow-up. This line represents our expectation — where we’d predict someone to be at T2 based solely on their T1 score.
This raises an important question: Does treatment cause people to score higher than expected given their baseline?
In other words:
If someone starts with T1 = 3.0, we’d naturally expect them to be around a certain value at T2
Does treatment push them above that expected value?
This is the concept of residualized gain — gains beyond what we’d expect from baseline alone.
Let’s fit a simple model predicting T2 benevolence from T1 benevolence (ignoring treatment) and examine the residuals:
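As a minimal sketch of this step, the code below fits the baseline-only model and stores its residuals. The data are simulated stand-ins, and the names `dat`, `t1`, `t2`, and `fit_baseline` are hypothetical, not the study's own variables.

```r
# Simulated stand-in data (hypothetical; NOT the study data)
set.seed(42)
n <- 400
treat <- factor(rep(c("Control", "Treatment"), each = n / 2))
t1 <- rnorm(n, mean = 3, sd = 1)                       # baseline benevolence
t2 <- 1 + 0.7 * t1 + 0.5 * (treat == "Treatment") +    # follow-up benevolence
  rnorm(n, sd = 0.5)
dat <- data.frame(t1, t2, treat)

# Step 1: predict T2 from T1 only, ignoring treatment
fit_baseline <- lm(t2 ~ t1, data = dat)

# Step 2: residual = observed T2 minus the T2 expected from T1 alone
dat$resid <- residuals(fit_baseline)

# Step 3: average residual within each group
mean_resid <- tapply(dat$resid, dat$treat, mean)
print(mean_resid)
```

A positive mean residual for a group means its members tend to score above what their baseline alone would predict.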
What are these residuals?
Key question: Do people in the Treatment group have systematically more positive residuals than Control?
What we just did manually:
What a regression model does automatically:
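The treatment coefficient in this model closely matches the difference in mean residuals between groups. The match is exact when treatment assignment is uncorrelated with baseline in the sample, which randomization approximately guarantees.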
Controlling for baseline in regression = residualized gain analysis
Advantages of the regression approach:
But the logic is identical: Both ask “Does treatment produce gains beyond what baseline alone predicts?”
Notice: The treatment effect is the same! These are mathematically equivalent approaches.
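One way to check how the two routes compare is to compute both on the same data. The sketch below uses simulated stand-in data with hypothetical names (`dat`, `t1`, `t2`). It also shows the Frisch-Waugh-Lovell version, in which the treatment indicator is residualized on baseline as well; that version reproduces the regression coefficient to numerical precision.

```r
# Simulated stand-in data (hypothetical; NOT the study data)
set.seed(1)
n <- 500
treat <- factor(rep(c("Control", "Treatment"), each = n / 2))
t1 <- rnorm(n, mean = 3, sd = 1)
t2 <- 1 + 0.7 * t1 + 0.5 * (treat == "Treatment") + rnorm(n, sd = 0.5)
dat <- data.frame(t1, t2, treat)

# Manual route: residualize T2 on T1, then compare group means
e_y <- residuals(lm(t2 ~ t1, data = dat))
diff_resid <- unname(diff(tapply(e_y, dat$treat, mean)))  # Treatment - Control

# Regression route: treatment coefficient controlling for baseline
ancova <- coef(lm(t2 ~ t1 + treat, data = dat))[["treatTreatment"]]

# Frisch-Waugh-Lovell: residualize the treatment indicator on T1 as well
treat_num <- as.numeric(dat$treat == "Treatment")
e_treat <- residuals(lm(treat_num ~ t1, data = dat))
fwl <- coef(lm(e_y ~ e_treat))[["e_treat"]]

print(c(diff_resid = diff_resid, ancova = ancova, fwl = fwl))
```

Any gap between `diff_resid` and `ancova` reflects chance correlation between treatment and baseline in the sample, which is small under random assignment.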
Benefits of Centering:
Makes the intercept meaningful:
The intercept now represents the predicted outcome when all centered continuous variables are at their means and categorical variables are at their reference levels (e.g., Control group, Colombia).
Improves interpretability of main effects:
In a model with an interaction (e.g., y ~ x * z):
The coefficient for x represents the effect of x when z is at its mean
The coefficient for z represents the effect of z when x is at its mean
Without centering, a value of 0 may not be meaningful for either variable, making these interpretations unclear or misleading.
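To make this concrete, the sketch below fits the same interaction model with and without centering. The data are simulated and the names (`dat`, `t1`, `t1.c`, `t2`) are hypothetical. The interaction coefficient does not change; only the meaning of the treatment main effect does.

```r
# Simulated stand-in data (hypothetical; NOT the study data)
set.seed(7)
n <- 400
treat <- factor(rep(c("Control", "Treatment"), each = n / 2))
is_trt <- as.numeric(treat == "Treatment")
t1 <- rnorm(n, mean = 3, sd = 1)
t2 <- 1 + 0.7 * t1 + 1.3 * is_trt - 0.27 * t1 * is_trt + rnorm(n, sd = 0.5)
dat <- data.frame(t1, t2, treat)
dat$t1.c <- dat$t1 - mean(dat$t1)   # center baseline at its sample mean

fit_raw <- lm(t2 ~ t1 * treat, data = dat)     # treat effect evaluated at t1 = 0
fit_ctr <- lm(t2 ~ t1.c * treat, data = dat)   # treat effect evaluated at mean t1

b_raw <- coef(fit_raw)
b_ctr <- coef(fit_ctr)

# Centered main effect = raw main effect + interaction * mean(t1)
implied <- b_raw[["treatTreatment"]] + b_raw[["t1:treatTreatment"]] * mean(dat$t1)
print(c(raw = b_raw[["treatTreatment"]],
        centered = b_ctr[["treatTreatment"]],
        implied = implied))
```

The raw coefficient describes the treatment effect at a baseline score of 0, which may lie outside the observed range; the centered coefficient describes it at the average baseline.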
Purpose: Establish baseline prediction of T2 benevolence before considering treatment.
Why start here?
Now we answer Question 1: Is the treatment effective?
df.residual = n - 1 - p = 3965 - 1 - 8 = 3956
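The same bookkeeping can be checked in R on any fitted model. This is a small sketch on simulated data with hypothetical variable names and three made-up sites, so its predictor count (5) differs from the 8 in the model above.

```r
# Simulated stand-in data (hypothetical; NOT the study data)
set.seed(3)
n <- 300
dat <- data.frame(
  t1.c  = rnorm(n),
  age.c = rnorm(n),
  treat = factor(sample(c("Control", "Treatment"), n, replace = TRUE)),
  site  = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
dat$t2 <- 3 + 0.7 * dat$t1.c + 0.5 * (dat$treat == "Treatment") +
  rnorm(n, sd = 0.5)

fit <- lm(t2 ~ t1.c + age.c + treat + site, data = dat)

# p counts estimated slopes, excluding the intercept:
# t1.c (1) + age.c (1) + treat (1 dummy) + site (2 dummies) = 5
p <- length(coef(fit)) - 1

print(c(df_residual = df.residual(fit), by_formula = n - 1 - p))
```

Each factor contributes one coefficient per non-reference level, which is why a multi-site variable adds several predictors at once.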
When all control variables are held at their mean (centered continuous variables = 0) or their reference level (categorical variables), the model simplifies to:
\[\hat{y}_{\text{T2}} = b_0 + b_{\text{Treatment}} \times \text{Treatment}\]
Plugging in the coefficients:
\[\hat{y}_{\text{T2}} = 3.34 + 0.50 \times \text{Treatment}\]
Where Treatment = 0 for Control group and Treatment = 1 for Treatment group.
For Control group (Treatment = 0): \[\hat{y}_{\text{T2}} = 3.34 + 0.50(0) = 3.34\]
For Treatment group (Treatment = 1): \[\hat{y}_{\text{T2}} = 3.34 + 0.50(1) = 3.84\]
Interpretation: When baseline benevolence, age, and site are held at their typical values, the Treatment group scores 0.50 points higher on T2 benevolence compared to Control (3.84 vs. 3.34).
95% CI for treatment effect: [0.45, 0.55]
Adjusted means — predicted T2 benevolence for each group, holding control variables at the mean/mode.
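Adjusted means of this kind can be obtained with `predict()` on a small grid of covariate values. The sketch below assumes simulated data and hypothetical names (`dat`, `age.c`, three made-up sites), and holds the categorical control at its reference level.

```r
# Simulated stand-in data (hypothetical; NOT the study data)
set.seed(11)
n <- 400
dat <- data.frame(
  t1    = rnorm(n, mean = 3, sd = 1),
  age   = rnorm(n, mean = 40, sd = 10),
  treat = factor(rep(c("Control", "Treatment"), each = n / 2)),
  site  = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
dat$t2 <- 1.2 + 0.7 * dat$t1 + 0.5 * (dat$treat == "Treatment") +
  rnorm(n, sd = 0.5)
dat$t1.c  <- dat$t1 - mean(dat$t1)
dat$age.c <- dat$age - mean(dat$age)

fit <- lm(t2 ~ t1.c + age.c + site + treat, data = dat)

# One row per group; controls fixed at mean (0) or reference level ("A")
grid <- data.frame(
  t1.c  = 0,
  age.c = 0,
  site  = factor("A", levels = levels(dat$site)),
  treat = factor(c("Control", "Treatment"), levels = levels(dat$treat))
)

# Predicted T2 with 95% confidence limits for each group
adj <- cbind(grid["treat"], predict(fit, newdata = grid, interval = "confidence"))
print(adj)
```

The Control row reproduces the intercept, and the gap between the two rows reproduces the treatment coefficient.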
Is the REACH intervention effective?
Yes!
The treatment group scored 0.50 points higher on T2 benevolence than the control group (95% CI [0.45, 0.55]), after adjusting for baseline levels, age, and site.
This model is additive: it assumes the treatment effect is the same for everyone, regardless of their baseline benevolence.
But is this assumption reasonable?
Model 2 assumes the treatment effect is identical for everyone:
But maybe:
We need to test this empirically.
An interaction (or effect modification) occurs when the effect of one variable depends on the level of another variable.
Graphically:
Statistically:
\[\text{T2} = b_0 + b_1(\text{T1.c}) + b_2(\text{Treatment}) + b_3(\text{T1.c} \times \text{Treatment})\]
The model with interaction:
Note the syntax: t1_trim_benevolence.c * treat
This automatically includes:
Main effect of T1 benevolence
Main effect of treatment
Interaction between T1 benevolence and treatment
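The expansion can be confirmed by inspecting the terms of the formula directly. In this sketch the outcome name `t2_benevolence` is a hypothetical placeholder; no data are needed because `terms()` only parses the formula.

```r
# Shorthand with `*` versus the fully written-out formula
f_star <- t2_benevolence ~ t1_trim_benevolence.c * treat
f_long <- t2_benevolence ~ t1_trim_benevolence.c + treat +
  t1_trim_benevolence.c:treat

# Term labels R will build the model from
labels_star <- attr(terms(f_star), "term.labels")
labels_long <- attr(terms(f_long), "term.labels")

print(labels_star)
```

Using `:` alone would include only the interaction term, which is rarely what is wanted.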
Focus on these three terms:
t1_trim_benevolence.c (0.70): Effect of baseline benevolence for the Control group only
treatTreatment (0.50): Treatment effect when T1 benevolence is at the mean (because T1 is centered at the mean)
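t1_trim_benevolence.c:treatTreatment (-0.27): The interaction, i.e., how much the treatment effect changes for each 1-point increase in baseline benevolence (equivalently, how much the baseline slope differs between Treatment and Control)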
When T1.c = 0
Treatment effect = \(0.50 + (-0.27 \times 0) = 0.50\)
At the mean level of baseline benevolence, treatment participants score 0.50 points higher on T2 benevolence than controls.
When T1.c = –1
Treatment effect = \(0.50 + (-0.27 \times (-1)) = 0.50 + 0.27 = 0.77\)
Among people 1 point below the mean in baseline benevolence, the treatment group scores 0.77 points higher on T2 benevolence than the control group.
When T1.c = +1
Treatment effect = \(0.50 + (-0.27 \times +1) = 0.50 - 0.27 = 0.23\)
Among people 1 point above the mean in baseline benevolence, the treatment group scores 0.23 points higher on T2 benevolence than the control group.
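These three cases are instances of one general expression. As a short derivation from the interaction model, subtract the Control prediction (Treatment = 0) from the Treatment prediction (Treatment = 1) at the same value of T1.c:

\[\big[b_0 + b_1(\text{T1.c}) + b_2 + b_3(\text{T1.c})\big] - \big[b_0 + b_1(\text{T1.c})\big] = b_2 + b_3(\text{T1.c})\]

With the rounded coefficients reported above, this becomes:

\[\text{Treatment effect}(\text{T1.c}) = 0.50 - 0.27 \times \text{T1.c}\]

The intercept and the baseline slope cancel, so the treatment effect at any baseline level depends only on the treatment coefficient and the interaction coefficient.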
Before formally probing the interaction, let’s see where people fall on baseline benevolence:
Strategy: Examine the treatment effect at three meaningful levels of baseline benevolence:
Question: Does the treatment work equally well at all three levels?
What this shows:
Are the lines parallel or not?
The comparisons() function (from marginaleffects) works like slopes(), but reports differences in predicted outcomes between levels of a variable, which makes it a natural fit for a factor such as treat.
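A call along these lines would produce the treatment contrast at chosen baseline levels. This is a sketch on simulated data with hypothetical names (`dat`, `t1.c`, `t2`); the levels -1, 0, and +1 are illustrative, and the marginaleffects package is assumed to be installed.

```r
library(marginaleffects)

# Simulated stand-in data (hypothetical; NOT the study data)
set.seed(21)
n <- 400
treat <- factor(rep(c("Control", "Treatment"), each = n / 2))
is_trt <- as.numeric(treat == "Treatment")
t1.c <- rnorm(n, mean = 0, sd = 1)
t1.c <- t1.c - mean(t1.c)                 # centered baseline
t2 <- 3.3 + 0.7 * t1.c + 0.5 * is_trt - 0.27 * t1.c * is_trt +
  rnorm(n, sd = 0.5)
dat <- data.frame(t1.c, t2, treat)

fit <- lm(t2 ~ t1.c * treat, data = dat)

# Treatment - Control difference in predicted T2 at three baseline levels
cmp <- comparisons(
  fit,
  variables = "treat",
  newdata = datagrid(t1.c = c(-1, 0, 1))
)
print(cmp)
```

Each row's estimate should equal the treatment coefficient plus the interaction coefficient times that row's `t1.c` value.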
Pattern:
New approach: Rather than picking arbitrary levels to test the slopes (-1.5 SD, mean, +1.5 SD), the Johnson–Neyman technique identifies the range of the moderator for which the treatment effect is reliably distinguishable from zero.
Question it answers: At which baseline benevolence levels is the treatment effect reliably different from zero?
Benefits:
The plot_slopes() function (from marginaleffects) is used here.
How to read it:
Key insight: Where does the confidence interval exclude zero?
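The quantity behind such a plot can also be computed by hand from the coefficients and their covariance matrix. The sketch below is a base-R approximation of the idea, not the marginaleffects implementation; it uses simulated data, and `effect_at()` is a hypothetical helper.

```r
# Simulated stand-in data (hypothetical; NOT the study data)
set.seed(5)
n <- 400
treat <- factor(rep(c("Control", "Treatment"), each = n / 2))
is_trt <- as.numeric(treat == "Treatment")
t1.c <- rnorm(n)
t1.c <- t1.c - mean(t1.c)
t2 <- 3.3 + 0.7 * t1.c + 0.5 * is_trt - 0.27 * t1.c * is_trt +
  rnorm(n, sd = 0.5)
dat <- data.frame(t1.c, t2, treat)
fit <- lm(t2 ~ t1.c * treat, data = dat)

b <- coef(fit)
V <- vcov(fit)
main <- "treatTreatment"
int  <- "t1.c:treatTreatment"

# Treatment effect and 95% CI at baseline value(s) z:
#   estimate = b_main + b_int * z
#   variance = Var(b_main) + 2 z Cov(b_main, b_int) + z^2 Var(b_int)
effect_at <- function(z) {
  est <- b[[main]] + b[[int]] * z
  se <- sqrt(V[main, main] + 2 * z * V[main, int] + z^2 * V[int, int])
  tcrit <- qt(0.975, df.residual(fit))
  data.frame(t1.c = z, estimate = est,
             conf.low = est - tcrit * se, conf.high = est + tcrit * se)
}

# Evaluate across the observed range and flag where the CI excludes zero
grid <- effect_at(seq(min(dat$t1.c), max(dat$t1.c), length.out = 200))
grid$excludes_zero <- grid$conf.low > 0 | grid$conf.high < 0
print(range(grid$t1.c[grid$excludes_zero]))
```

The printed range is only a summary; if the flagged region is not contiguous, inspect `grid` directly to see where the interval crosses zero.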
Does treatment effectiveness vary by baseline benevolence?
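Based on the interaction term: The interaction coefficient is -0.27 (CI [-0.31, -0.22]). Because the interval excludes zero, the treatment effect reliably shrinks as baseline benevolence rises.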
Based on probing at specific levels: Treatment is most effective for people with low baseline benevolence.
Clinical implication: People most in need of the intervention are likely to benefit the most from it.
What we learned
You should consider testing interactions when:
You should be cautious about: