PSY 652: Research Methods in Psychology I

Effect Modification

Kimberly L. Henry: kim.henry@colostate.edu

REACH Forgiveness Study: Is there a treatment effect?

Part 1: Background

  • Research questions
  • Data preparation

Part 2: Does Treatment Cause Better Benevolence at Follow-up?

  • Model 1: Control variables
  • Model 2: Add treatment effect
  • Interpret treatment effectiveness

Part 3: Does Effectiveness Vary by Baseline Benevolence?

  • Model 3: Test for interaction
  • Probe the interaction
  • Johnson-Neyman technique
  • Clinical implications

Measures

Key measures:

  • Treatment Indicator: Treatment vs. Control (Random assignment within site)
  • T1 (Baseline): Benevolence toward the person who hurt them before intervention
  • T2 (Post-intervention): Benevolence after intervention/control period

Control variables:

  • site: Must be adjusted for given the nesting of people in sites
  • age: Additional demographic control (we could include others — but we’ll keep things simple)

Our two research questions

Question 1: Is the treatment effective?

Does the REACH forgiveness intervention increase benevolence toward the person who caused hurt, after controlling for baseline benevolence, age, and study site?

Translation: Does treatment work, on average, across all participants?

Question 2: Does effectiveness vary by baseline?

Does the treatment effect depend on participants’ baseline benevolence levels?

Translation: Does treatment work better (or worse) for people who start with low vs. high baseline benevolence?

Why these questions matter

Question 1 (Treatment effectiveness):

  • Fundamental question for any intervention study
  • Determines if the intervention has merit
  • Informs whether to invest resources in dissemination

Question 2 (Effect modification):

  • Even effective interventions don’t work equally for everyone
  • Understanding for whom interventions work helps target resources
  • Guides personalized intervention strategies

Import and prepare the data

Note that 3,965 cases remain once observations with missing data are removed.

Explore the concept of residualized gain

First, let’s simply look at the overall relationship between baseline and follow-up benevolence.

Understanding the baseline

Key insight: People who start with higher benevolence tend to have higher benevolence at follow-up. This line represents our expectation — where we’d predict someone to be at T2 based solely on their T1 score.

This raises an important question: Does treatment cause people to score higher than expected given their baseline?

In other words:

  • If someone starts with T1 = 3.0, we’d naturally expect them to be around a certain value at T2

  • Does treatment push them above that expected value?

This is the concept of residualized gain — gains beyond what we’d expect from baseline alone.

Making residualized gain concrete

Let’s fit a simple model predicting T2 benevolence from T1 benevolence (ignoring treatment) and examine the residuals:
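That step might look like the following sketch, run here on simulated stand-in data since the course dataset isn't reproduced in these notes (the variable names mirror the slides; the 0.5-point treatment effect and the other generating values are assumptions; `broom::augment()` would produce the same `.fitted`/`.resid` columns):

```r
# Simulated stand-in for the REACH data (values are assumptions)
set.seed(652)
n <- 400
d <- data.frame(
  t1_trim_benevolence = runif(n, 1, 5),
  treat = factor(rep(c("Control", "Treatment"), length.out = n))
)
d$t2_trim_benevolence <- 1 + 0.7 * d$t1_trim_benevolence +
  0.5 * (d$treat == "Treatment") + rnorm(n, sd = 0.5)

# Step 1: predict T2 from T1 only (treatment deliberately ignored)
m_baseline <- lm(t2_trim_benevolence ~ t1_trim_benevolence, data = d)

# Step 2: store the expectation (.fitted) and the deviation from it (.resid)
d$.fitted <- fitted(m_baseline)
d$.resid  <- residuals(m_baseline)

# Step 3: do Treatment cases sit above expectation on average?
aggregate(.resid ~ treat, data = d, FUN = mean)
```

With data generated this way, the Treatment group's mean residual comes out positive and the Control group's negative, which is exactly the pattern the next slides look for.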

Interpreting residuals

What are these residuals?

  • .fitted: Where we’d expect someone to be at T2 based on their T1 score
  • .resid: How much higher (positive) or lower (negative) they actually scored compared to expectation
  • Positive residual = Scored better than expected
  • Negative residual = Scored worse than expected

Key question: Do people in the Treatment group have systematically more positive residuals than Control?

Compare residuals by treatment condition

Visualize residuals by treatment

The connection to regression

What we just did manually:

  1. Predicted T2 from T1
  2. Calculated residuals (deviations from prediction)
  3. Compared residuals between treatment groups

What a regression model does automatically:

lm(t2_trim_benevolence ~ t1_trim_benevolence + treat)

The treatment coefficient in this model is essentially the difference in mean residuals between groups — exactly so when the two arms have identical mean baselines, which randomization delivers in expectation.

Controlling for baseline in regression = residualized gain analysis

Why use regression instead?

Advantages of the regression approach:

  • One model instead of two steps
  • Easier to add multiple covariates (age, site)
  • Standard errors and confidence intervals are automatically correct
  • Can test interactions (does treatment effect depend on baseline?)

But the logic is identical: Both ask “Does treatment produce gains beyond what baseline alone predicts?”

Let’s verify they’re equivalent
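A quick check on simulated stand-in data (variable names follow the slides; the generating values, including the 0.5 treatment effect, are assumptions) shows the two approaches agree:

```r
set.seed(652)
n <- 400
d <- data.frame(
  t1_trim_benevolence = runif(n, 1, 5),
  treat = factor(rep(c("Control", "Treatment"), length.out = n))
)
d$t2_trim_benevolence <- 1 + 0.7 * d$t1_trim_benevolence +
  0.5 * (d$treat == "Treatment") + rnorm(n, sd = 0.5)

# Two-step residualized-gain approach
m_step1 <- lm(t2_trim_benevolence ~ t1_trim_benevolence, data = d)
resid_means <- tapply(residuals(m_step1), d$treat, mean)
unname(resid_means["Treatment"] - resid_means["Control"])

# One-step regression approach
m_reg <- lm(t2_trim_benevolence ~ t1_trim_benevolence + treat, data = d)
unname(coef(m_reg)["treatTreatment"])
```

The two numbers coincide to rounding; any daylight between them reflects chance imbalance in baseline benevolence across arms.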

Notice: The treatment effect estimates match! The two approaches are equivalent — exactly so when the groups are balanced on baseline.

Fit the Formal Models

Center variables

Why center numeric variables?

Benefits of Centering:

  • Makes the intercept meaningful:
    The intercept now represents the predicted outcome when all centered continuous variables are at their means and categorical variables are at their reference levels (e.g., Control group, Colombia).

  • Improves interpretability of main effects:
    In a model with an interaction (e.g., y ~ x * z):

    • The coefficient for x represents the effect of x when z is at its mean

    • The coefficient for z represents the effect of z when x is at its mean

    • Without centering, a value of 0 may not be meaningful for either variable, making these interpretations unclear or misleading.
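In R, mean-centering is one line per variable. A minimal sketch, using a small hypothetical data frame with the variable names from the slides (`scale(x, scale = FALSE)` does the same thing):

```r
# Hypothetical values; in the course data these come from the full sample
d <- data.frame(t1_trim_benevolence = c(2.1, 3.4, 4.0, 2.8),
                age = c(19, 23, 31, 25))

# Subtract each variable's mean so that 0 = "person at the sample average"
d$t1_trim_benevolence.c <- d$t1_trim_benevolence - mean(d$t1_trim_benevolence)
d$age.c <- d$age - mean(d$age)

mean(d$t1_trim_benevolence.c)  # centered variables have mean 0
```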

Question 1: Is the Treatment Effective?

Model 1: Control variables only

Purpose: Establish baseline prediction of T2 benevolence before considering treatment.

Why start here?

  • Shows what naturally predicts the outcome (baseline benevolence, age, site)
  • Provides a comparison point to see if treatment adds explanatory power
  • Common practice in randomized trials to include:
    • Baseline measure (T1 benevolence): Increases precision and statistical power by accounting for pre-existing individual differences
    • Site: Accounts for nesting/clustering (participants within sites are not independent observations)
    • Age: A potentially important covariate that may relate to forgiveness capacity (we could control for other demographic variables as well, but we’ll keep it simple)

Fit Model 1
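A sketch of the Model 1 call on simulated stand-in data (site labels here are hypothetical, and the generating values are assumptions; only the formula structure mirrors the slides):

```r
# Simulated stand-in data with centered predictors and hypothetical sites
set.seed(652)
n <- 400
d <- data.frame(
  t1_trim_benevolence.c = rnorm(n),          # already mean-centered
  age.c = rnorm(n, 0, 10),
  site  = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
d$t2_trim_benevolence <- 3.3 + 0.7 * d$t1_trim_benevolence.c + rnorm(n, sd = 0.6)

# Model 1: control variables only -- no treatment term yet
model1 <- lm(t2_trim_benevolence ~ t1_trim_benevolence.c + age.c + site,
             data = d)
summary(model1)
```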

Model 2: Add treatment

Now we answer Question 1: Is the treatment effective?

df.residual = n - 1 - p = 3965 - 1 - 8 = 3956 (p = 8 predictors: T1 benevolence, age, treatment, and five site dummy codes)
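A sketch of Model 2 on simulated stand-in data (hypothetical site labels; with six sites the model carries p = 8 predictors, matching the degrees-of-freedom formula above, though the simulated n differs from the course data's 3,965):

```r
set.seed(652)
n <- 400
d <- data.frame(
  t1_trim_benevolence.c = rnorm(n),
  age.c = rnorm(n, 0, 10),
  site  = factor(sample(LETTERS[1:6], n, replace = TRUE)),  # 6 hypothetical sites
  treat = factor(rep(c("Control", "Treatment"), length.out = n))
)
d$t2_trim_benevolence <- 3.34 + 0.7 * d$t1_trim_benevolence.c +
  0.5 * (d$treat == "Treatment") + rnorm(n, sd = 0.6)

# Model 2: add the treatment indicator to the control-only model
model2 <- lm(t2_trim_benevolence ~ t1_trim_benevolence.c + age.c + site + treat,
             data = d)
coef(summary(model2))["treatTreatment", ]  # the treatment effect row
df.residual(model2)                        # n - 1 - 8 (here 400 - 1 - 8 = 391)
```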

Model 2: Interpreting the treatment effect

When all control variables are held at their typical values (centered continuous variables = 0, categorical variables at their reference levels), the model simplifies to:

\[\hat{y}_{\text{T2}} = b_0 + b_{\text{Treatment}} \times \text{Treatment}\]

Plugging in the coefficients:

\[\hat{y}_{\text{T2}} = 3.34 + 0.50 \times \text{Treatment}\]

Where Treatment = 0 for Control group and Treatment = 1 for Treatment group.

Compute predicted values for each condition

For Control group (Treatment = 0): \[\hat{y}_{\text{T2}} = 3.34 + 0.50(0) = 3.34\]

For Treatment group (Treatment = 1): \[\hat{y}_{\text{T2}} = 3.34 + 0.50(1) = 3.84\]

Interpretation: When baseline benevolence, age, and site are held at their typical values, the Treatment group scores 0.50 points higher on T2 benevolence compared to Control (3.84 vs. 3.34).

Confidence interval

95% CI for treatment effect: [0.45, 0.55]

  • The 95% CI doesn’t include 0
  • Controlling for the selected covariates, people in the treatment condition reported reliably greater benevolence than people in the control group at the 2-week follow-up

Visualize the treatment effect

Adjusted means — predicted T2 benevolence for each group, holding control variables at the mean/mode.

Answer to Question 1

Is the REACH intervention effective?

Yes!

The treatment group scored 0.5 points higher on T2 benevolence compared to control, after adjusting for baseline levels, age, and site.

This is an additive model assumption: We assume the treatment effect is the same for everyone, regardless of their baseline benevolence.

But is this assumption reasonable?

Question 2: Does Effectiveness Vary by Baseline Benevolence?

Why test for effect modification?

Model 2 assumes the treatment effect is identical for everyone:

  • Someone starting with low benevolence: + 0.5 point gain
  • Someone starting with high benevolence: + 0.5 point gain

But maybe:

  • People with low baseline benevolence have more “room to grow” → larger gains
  • People with high baseline benevolence are already forgiving → smaller gains (ceiling effect)
  • OR the opposite pattern could occur

We need to test this empirically.

What is an interaction?

An interaction (or effect modification) occurs when the effect of one variable depends on the level of another variable.

Graphically:

  • No interaction: Parallel lines (same treatment effect across T1 levels)
  • Interaction: Non-parallel lines (treatment effect changes across T1 levels)

Statistically:

\[\text{T2} = b_0 + b_1(\text{T1.c}) + b_2(\text{Treatment}) + b_3(\text{T1.c} \times \text{Treatment})\]

Interpretation of effects in the presence of an interaction

\[\text{T2} = b_0 + b_1(\text{T1.c}) + b_2(\text{Treatment}) + b_3(\text{T1.c} \times \text{Treatment})\]

  • \(b_0\) (Intercept): The predicted T2 benevolence for the Control group at the mean level of T1 benevolence (\(\text{Treatment} = 0\) and \(\text{T1.c} = 0\)).
  • \(b_1\) (Effect of baseline benevolence): The slope of T1 benevolence for the Control group (\(\text{Treatment} = 0\)).
  • \(b_2\) (Effect of Treatment): The difference in predicted T2 benevolence between Treatment and Control groups among individuals with average T1 benevolence, that is, \(\text{T1.c} = 0\).
  • \(b_3\) (Interaction term): The change in the treatment effect for a one-unit increase in baseline benevolence — in other words, whether the treatment effect depends on T1 benevolence.

Model 3: Test for interaction

The model with interaction:

Note the syntax: t1_trim_benevolence.c * treat

This automatically includes:

  1. Main effect of T1 benevolence

  2. Main effect of treatment

  3. Interaction between T1 benevolence and treatment
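On simulated stand-in data (hypothetical site labels; the generating coefficients 0.70, 0.50, and -0.27 are borrowed from the slides for illustration), the call and the resulting coefficient names look like:

```r
set.seed(652)
n <- 400
d <- data.frame(
  t1_trim_benevolence.c = rnorm(n),
  age.c = rnorm(n, 0, 10),
  site  = factor(sample(LETTERS[1:6], n, replace = TRUE)),
  treat = factor(rep(c("Control", "Treatment"), length.out = n))
)
d$t2_trim_benevolence <- 3.34 + 0.70 * d$t1_trim_benevolence.c +
  0.50 * (d$treat == "Treatment") -
  0.27 * d$t1_trim_benevolence.c * (d$treat == "Treatment") +
  rnorm(n, sd = 0.6)

# The * operator expands to both main effects plus their interaction
model3 <- lm(t2_trim_benevolence ~ t1_trim_benevolence.c * treat + age.c + site,
             data = d)
names(coef(model3))  # note the t1_trim_benevolence.c:treatTreatment term
```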

Model 3: Results

Interpreting Model 3 coefficients

Focus on these three terms:

  1. t1_trim_benevolence.c (0.70): Effect of baseline benevolence for the Control group only

  2. treatTreatment (0.50): Treatment effect when T1 benevolence is at the mean (because T1 is centered at the mean)

  3. t1_trim_benevolence.c:treatTreatment (-0.27): The interaction

    • How much the treatment effect changes for each 1-point increase in baseline benevolence. The treatment effect becomes smaller as baseline benevolence increases.
    • Key question: Does the CI include zero?

Changing slope

When T1.c = 0

Treatment effect = \(0.50 + (-0.27 \times 0) = 0.50\)

At the mean level of baseline benevolence, treatment participants score 0.50 points higher on T2 benevolence than controls.

When T1.c = –1

Treatment effect = \(0.50 + (-0.27 \times -1) = 0.50 + 0.27 = 0.77\)

Among people 1 point below the mean in baseline benevolence, the treatment group scores 0.77 points higher on T2 benevolence than the control group.

When T1.c = +1

Treatment effect = \(0.50 + (-0.27 \times +1) = 0.50 - 0.27 = 0.23\)

Among people 1 point above the mean in baseline benevolence, the treatment group scores 0.23 points higher on T2 benevolence than the control group.
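The three calculations above are one line of arithmetic with the slide coefficients:

```r
# Treatment effect at a given baseline: b2 + b3 * T1.c
b2 <- 0.50   # treatment effect at the T1 mean
b3 <- -0.27  # change in treatment effect per 1-point increase in T1
t1.c <- c(-1, 0, 1)
b2 + b3 * t1.c  # 0.77 0.50 0.23
```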

Understanding the baseline distribution

Before formally probing the interaction, let’s see where people fall on baseline benevolence:

Probing the interaction at these three levels

Strategy: Examine the treatment effect at three meaningful levels of baseline benevolence:

  • -1.5 SD: People with relatively low baseline benevolence
  • Mean (0): People with average baseline benevolence
  • +1.5 SD: People with relatively high baseline benevolence

Question: Does the treatment work equally well at all three levels?

Get predicted values
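A base-R sketch of this step on simulated stand-in data (hypothetical site labels and generating values; `marginaleffects::predictions()` with `datagrid()` would produce the same grid of predictions along with standard errors):

```r
set.seed(652)
n <- 400
d <- data.frame(
  t1_trim_benevolence.c = rnorm(n),
  age.c = rnorm(n, 0, 10),
  site  = factor(sample(LETTERS[1:6], n, replace = TRUE)),
  treat = factor(rep(c("Control", "Treatment"), length.out = n))
)
d$t2_trim_benevolence <- 3.34 + 0.70 * d$t1_trim_benevolence.c +
  0.50 * (d$treat == "Treatment") -
  0.27 * d$t1_trim_benevolence.c * (d$treat == "Treatment") +
  rnorm(n, sd = 0.6)
model3 <- lm(t2_trim_benevolence ~ t1_trim_benevolence.c * treat + age.c + site,
             data = d)

# Prediction grid: -1.5 SD, mean, +1.5 SD of baseline, in each condition,
# holding age at its mean and site at one (hypothetical) reference level
grid <- expand.grid(
  t1_trim_benevolence.c = c(-1.5, 0, 1.5) * sd(d$t1_trim_benevolence.c),
  treat = c("Control", "Treatment"),
  age.c = 0,
  site  = "A"
)
grid$pred <- predict(model3, newdata = grid)
grid
```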

Visualize the interaction

What this shows:

  • Each colored line represents people at different baseline benevolence levels
  • The slope of each line shows the treatment effect for that group
  • If lines are parallel → no interaction (same treatment effect for all)
  • If lines have different slopes → interaction (treatment effect varies by baseline)

Are the lines parallel or not?

Calculate treatment effects at each level

The comparisons() function (from marginaleffects) works like slopes(), but is designed for factor predictors (here, treat).
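comparisons() reports these effects and their CIs directly. To show what it is computing, here is a hand-rolled delta-method sketch in base R on simulated stand-in data (hypothetical sites and generating values): the effect at baseline x is b2 + b3·x, with variance Var(b2) + x²Var(b3) + 2x·Cov(b2, b3).

```r
set.seed(652)
n <- 400
d <- data.frame(
  t1_trim_benevolence.c = rnorm(n),
  age.c = rnorm(n, 0, 10),
  site  = factor(sample(LETTERS[1:6], n, replace = TRUE)),
  treat = factor(rep(c("Control", "Treatment"), length.out = n))
)
d$t2_trim_benevolence <- 3.34 + 0.70 * d$t1_trim_benevolence.c +
  0.50 * (d$treat == "Treatment") -
  0.27 * d$t1_trim_benevolence.c * (d$treat == "Treatment") +
  rnorm(n, sd = 0.6)
model3 <- lm(t2_trim_benevolence ~ t1_trim_benevolence.c * treat + age.c + site,
             data = d)

b <- coef(model3); V <- vcov(model3)
tr  <- "treatTreatment"
int <- "t1_trim_benevolence.c:treatTreatment"
x   <- c(-1.5, 0, 1.5) * sd(d$t1_trim_benevolence.c)  # low, mean, high baseline

eff <- b[tr] + b[int] * x                              # conditional effect
se  <- sqrt(V[tr, tr] + x^2 * V[int, int] + 2 * x * V[tr, int])
cbind(t1.c = x, effect = eff, lo = eff - 1.96 * se, hi = eff + 1.96 * se)
```

The package call would look roughly like `comparisons(model3, variables = "treat", newdata = datagrid(t1_trim_benevolence.c = c(-1.5, 0, 1.5)))`.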

Interpreting the treatment effects

  • At -1.5 SD (low baseline): Treatment effect = 0.91 points [CI: 0.82, 1.00]
  • At Mean (average baseline): Treatment effect = 0.50 points [CI: 0.45, 0.55]
  • At +1.5 SD (high baseline): Treatment effect = 0.09 points [CI: 0.00, 0.18]

Pattern:

  • Does the treatment effect get larger or smaller as baseline increases?
  • For which groups is the CI furthest from zero?
  • Are there any baseline levels where treatment doesn’t appear effective?

An alternative way to view the interaction

Visualize across the full continuum

Johnson-Neyman Technique

New approach: Rather than picking arbitrary levels to test the slopes (-1.5 SD, mean, +1.5 SD), the Johnson–Neyman technique identifies the range of the moderator for which the treatment effect is reliably distinguishable from zero.

Question it answers: At which baseline benevolence levels is the treatment effect reliably different from zero?

Benefits:

  • Data-driven (not arbitrary)
  • Shows the full continuum
  • Identifies transition points

Johnson-Neyman Plot

The plot_slopes() function (from marginaleffects) is used here.
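Under the hood, the plot evaluates the conditional treatment effect and its CI over a fine grid of the moderator. A base-R sketch on simulated stand-in data (hypothetical sites and generating values) that recovers the Johnson-Neyman region numerically:

```r
set.seed(652)
n <- 400
d <- data.frame(
  t1_trim_benevolence.c = rnorm(n),
  age.c = rnorm(n, 0, 10),
  site  = factor(sample(LETTERS[1:6], n, replace = TRUE)),
  treat = factor(rep(c("Control", "Treatment"), length.out = n))
)
d$t2_trim_benevolence <- 3.34 + 0.70 * d$t1_trim_benevolence.c +
  0.50 * (d$treat == "Treatment") -
  0.27 * d$t1_trim_benevolence.c * (d$treat == "Treatment") +
  rnorm(n, sd = 0.6)
model3 <- lm(t2_trim_benevolence ~ t1_trim_benevolence.c * treat + age.c + site,
             data = d)

b <- coef(model3); V <- vcov(model3)
tr  <- "treatTreatment"
int <- "t1_trim_benevolence.c:treatTreatment"

xs  <- seq(-3, 3, by = 0.01)            # grid over centered baseline
eff <- b[tr] + b[int] * xs              # treatment effect at each grid point
se  <- sqrt(V[tr, tr] + xs^2 * V[int, int] + 2 * xs * V[tr, int])
lo  <- eff - 1.96 * se
hi  <- eff + 1.96 * se

# Johnson-Neyman region: baseline values where the 95% CI sits entirely above 0
range(xs[lo > 0])
```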

Interpreting the Johnson-Neyman Plot

How to read it:

  • x-axis: Full range of baseline benevolence (centered)
  • y-axis: Treatment effect (Treatment - Control difference)
  • Red horizontal line at 0: No treatment effect
  • Black line: Estimated treatment effect at each baseline level
  • Shaded region: 95% confidence interval

Key insight: Where does the confidence interval exclude zero?

  • When CI is entirely above 0 → Treatment helps
  • When CI is entirely below 0 → Treatment harms (unlikely here)
  • When CI includes 0 → Treatment effect uncertain

Answer to Question 2

Does treatment effectiveness vary by baseline benevolence?

  • Based on the interaction term: The interaction coefficient is -0.27 [CI: -0.31, -0.22]; the CI excludes zero.

  • Based on probing at specific levels: Treatment is most effective for people with low baseline benevolence.

  • Clinical implication: People most in need of the intervention are likely to benefit the most from it.

Key Takeaways

What we learned

  1. Build models sequentially: Controls → Main effects → Interactions
  2. Center continuous predictors for interpretability
  3. Interactions test effect modification - whether treatment works differently for different people
  4. Probe interactions at meaningful values
  5. Visualize extensively - coefficients alone don’t tell the full story
  6. Johnson-Neyman identifies regions of significance in a data-driven way
  7. Clinical implications guide intervention targeting and personalization

When to test for interactions

You should consider testing interactions when:

  • Theory suggests treatment effects might vary
  • You want to identify subgroups who benefit most
  • Baseline characteristics might modify intervention effectiveness
  • You have adequate statistical power (interactions require larger samples)

You should be cautious about:

  • Fishing for interactions without theory
  • Over-interpreting small interaction effects
  • Testing many interactions (multiple comparison issues)