Moderator Models

Module 14

Learning objectives

Learn to formulate research questions that involve effect modification
Gain skills in setting up and conducting moderation analyses using regression techniques
Fit moderation models in R
Develop the ability to interpret and report the results of moderation analyses in a meaningful way
Plot the results of a moderation model to facilitate interpretation

Overview

In all of the linear regression models we’ve considered so far, the relationship between each predictor and the outcome has been additive. This means we have assumed that each predictor contributes a fixed amount to the prediction of the outcome, independent of other variables in the model. For example, in the Bread and Peace multiple linear regression model, we assumed that the effect of growth in income on the percentage of votes procured by the incumbent party was the same regardless of the rate of fatalities. While this assumption simplifies the modeling process, it often fails to capture the more complex dynamics that occur in real-world data. In many psychological and behavioral sciences contexts, the influence of one variable on an outcome can significantly depend on the level or presence of another variable. This dependency results in a non-additive, or multiplicative, relationship between predictors and outcomes.

In these cases, the impact of a predictor on an outcome can vary depending on the presence or level of another variable. This variability is explored through moderation analysis, which helps us understand how third variables, known as moderators, influence the strength or direction of the relationship between a predictor and an outcome.

What is Moderation?
Moderation occurs when the relationship between two variables is contingent upon a third variable. This third variable, the moderator, can enhance, reduce, or even reverse the direction of the relationship between a predictor and an outcome. Moderation analysis is pivotal in psychological research because it allows us to uncover the conditions under which certain effects hold, providing a nuanced understanding of dynamics that are often missed in simpler analyses.

For example, consider a study examining the effectiveness of a new therapy for anxiety. The impact of the therapy might depend on the level of social support a patient has. In this case, social support may act as a moderator. If social support moderates the relationship between therapy and anxiety reduction, the therapy might be more effective for individuals with high social support than for those with low social support.

Why is Moderation Important?
Understanding moderation is crucial for developing more effective psychological interventions and for testing theories that predict differential effects based on certain conditions or characteristics. By identifying moderators, psychologists can:

Tailor interventions to be more effective for different groups.
Clarify the boundaries under which psychological theories are applicable.
Provide more personalized recommendations based on specific moderating factors.

Introduction to the data

In this Module, we will use data from a study published by Allen et al. (2023) in the Journal of Applied Psychology. The data is sourced from the MIDUS (Midlife in the United States) study, a nationally representative, longitudinal study funded by the National Institutes of Health, designed to investigate the physical, psychological, and social factors that influence health and well-being during midlife. Launched in the mid-1990s, MIDUS provides a valuable dataset for researchers interested in various aspects of aging, including changes in health and quality of life during the middle years of life.

Allen and colleagues collated a data frame of twins participating in MIDUS in order to isolate a causal effect of role demands on work-family conflict after accounting for the confounding effects of genetics. The authors pulled MIDUS data for 998 screened pairs of monozygotic (mz) or dizygotic (dz) twins. Given the focus on work-family conflict, Allen and colleagues only considered twin pairs in which both twins worked at least 20 hours a week in paid employment and were either married/partnered or had a child under 18 living at home. The sample included diverse dyad types, with a majority being White and aged between 25 and 65. Although we won’t replicate the findings of the Allen et al. study here, we will use their compiled data frame to study the similarity in work-family conflict between siblings in the twin pairs, and the extent to which the degree of similarity differs as a function of zygosity (i.e., a proposed moderator).

Let’s load the necessary libraries for this module.

library(marginaleffects)
library(gtsummary)
library(here)
library(broom)
library(tidyverse)

And, import the data frame, called midus_twins_workfamily.Rds.

orig_data <- read_rds(here("data", "midus_twins_workfamily.Rds"))
orig_data |> head(n = 24)

Here’s a listing of all the variables in the data frame (though we’ll only use a handful of them in this Module):

Variable	Description
id	Individual ID
fam_id	Family ID
time	Study wave: w1, w2, w3 for waves 1 through 3
zyg	Zygosity of twin pair: dz - same sex = dizygotic male-male or female-female; dz - different sex = dizygotic male-female, mz = monozygotic
wif	Work Interference with Family (wif): Measures how often job demands affect home life in the past year. Responses range from all the time (1) to never (5). Higher scores indicate greater conflict.
fiw	Family Interference with Work (fiw): Assesses how often family responsibilities impact work in the past year. Responses range from all the time (1) to never (5). Higher scores indicate greater conflict.
jd	Job Demands: Evaluates the intensity of job requirements, including too many demands, insufficient time, and frequent interruptions. Responses range from all the time (1) to never (5), reverse-scored for appropriateness. Higher scores indicate greater job demands.
fd	Family Demands: Looks at the pressure from family obligations, including excessive demands and frequent interruptions. Responses range from all the time (1) to never (5). Higher scores indicate greater family demands.
ext	Extraversion: Personality trait indicating sociability. Responses range from not at all (1) to a lot (4). Higher scores reflect greater extraversion.
agr	Agreeableness: Trait showing propensity for kindness and cooperation. Responses range from not at all (1) to a lot (4). Higher scores reflect greater agreeableness.
neu	Neuroticism: Personality trait indicating emotional instability. Responses range from not at all (1) to a lot (4). Higher scores reflect greater neuroticism.
opn	Openness: Trait related to creativity and willingness to experience. Responses range from not at all (1) to a lot (4). Higher scores reflect greater openness.
con	Conscientiousness: Trait indicating reliability and diligence. Responses range from not at all (1) to a lot (4). Higher scores reflect greater conscientiousness.
lifesat	Life Satisfaction. Responses include 1 = not at all, 2 = a little, 3 = somewhat, 4 = a lot.
mh	Mental Health Rating. Responses include 1 = poor, 2 = fair, 3 = good, 4 = very good, 5 = excellent.
ph	Physical Health Rating. Responses include 1 = poor, 2 = fair, 3 = good, 4 = very good, 5 = excellent.
comph	Comparison of Overall Health to Others Your Age. Responses include 1 = much worse, 2 = somewhat worse, 3 = about the same, 4 = somewhat better, 5 = much better.
sex	Sex of twin: 1 = male, 2 = female
age	Age of twins in years

In this module, our primary focus will be on exploring the similarity in Work Interference with Family (WIF) between siblings within a twin pair, and whether that similarity depends on zygosity. Zygosity refers to whether the twins are monozygotic (identical) or dizygotic (fraternal). Monozygotic twins share nearly 100% of their genes, while dizygotic twins share about 50%, similar to regular siblings. We’ll focus in this module on twins who are the same sex (i.e., either both siblings are male, or both siblings are female).

Let’s perform a bit of data wrangling to prepare the data. Because we’re interested in comparing the scores within twin pairs, we will begin by pivoting the data from long to wide, so that each twin-specific variable has two versions in the wide data frame. For example, wif_twin1 will denote the first twin’s score on WIF, and wif_twin2 will denote the second twin’s score on WIF. We will also limit the data to just wave 1 (since that is what we will work with in this Module) and subset the data to include just the variables we need for the exploration here. Twin pairs with missing data on any of the retained variables will be dropped.

df <-
  orig_data |> 
  filter(time == "w1") |> 
  group_by(fam_id) |> 
  mutate(twin_identifier = case_when(row_number() == 1 ~ "_twin1",
                                     row_number() == 2 ~ "_twin2")) |> 
  ungroup() |> 
  pivot_wider(
    id_cols = c(fam_id, zyg, time, age),
    names_from = twin_identifier,
    names_sep = "",
    values_from = c(sex, wif, fiw, jd, fd, agr, opn, con, ext, neu, lifesat, mh, ph, comph),
  ) |> 
  mutate(sex_pair = case_when(sex_twin1 == 1 & sex_twin2 == 1 ~ "male twins", 
                             sex_twin1 == 2 & sex_twin2 == 2 ~ "female twins",
                             (sex_twin1 == 1 & sex_twin2 == 2) |
                               (sex_twin1 == 2 & sex_twin2 == 1) ~ "male-female twins")) |>
  select(fam_id, zyg, age, sex_pair, wif_twin1, wif_twin2) |> 
  filter(zyg %in% c("mz", "dz - same sex")) |> 
  drop_na() 

df |> head()
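To make the long-to-wide step concrete, here is a minimal sketch on a tiny made-up data frame. The object names toy_long and toy_wide and all of the values are hypothetical (they are not MIDUS data); only the pivoting pattern mirrors the code above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in long format: two families, one row per twin
toy_long <- tibble(
  fam_id = c(1, 1, 2, 2),
  zyg    = c("mz", "mz", "dz - same sex", "dz - same sex"),
  wif    = c(2.5, 3.0, 1.5, 4.0),
  fiw    = c(2.0, 2.5, 3.0, 1.0)
)

toy_wide <- toy_long |>
  group_by(fam_id) |>
  # label the first and second row within each family
  mutate(twin_identifier = paste0("_twin", row_number())) |>
  ungroup() |>
  pivot_wider(
    id_cols     = c(fam_id, zyg),
    names_from  = twin_identifier,
    names_sep   = "",
    values_from = c(wif, fiw)
  )

# One row per family, with a _twin1 and _twin2 version of each variable
toy_wide
```

Because more than one column is supplied to values_from, each new column name is built from the original variable name followed by the twin identifier (for example, wif_twin1).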

Concordance of WIF across twin types

Visualize the relationship

To start our exploration, we will first construct a visual representation that highlights the association between WIF for these two distinct types of twins. This initial step will provide us with a clearer understanding of the dynamics at play and set the stage for further analysis.

Let’s examine the relationship between WIF within twin pairs using a plot that is facetted by zygosity.

# set color palette for zyg
levels <- c("mz", "dz - same sex")
colors <- c("#2F9599", "#A7226E") 
my_colors_zyg <- setNames(colors, levels)

df |> 
  # twin 2 on the x-axis and twin 1 on the y-axis, matching the axis labels and the models fit later
  ggplot(mapping = aes(x = wif_twin2, y = wif_twin1, fill = zyg, color = zyg)) + 
  geom_jitter() +
  geom_smooth(method = "lm", formula = y ~ x) +
  scale_color_manual(values = my_colors_zyg) +
  scale_fill_manual(values = my_colors_zyg) +
  facet_wrap(~zyg) +
  theme_minimal() +
  theme(legend.position = "none") +
  labs(title = "Association of WIF between siblings in a twin pair",
       x = "WIF for Twin #2",
       y = "WIF for Twin #1")

Alternatively, instead of a plot facetted by zygosity, we can color the dots and lines by zygosity and put all of the data on a single plot.

# twin 2 on the x-axis and twin 1 on the y-axis, matching the axis labels and the models fit later
df |> ggplot(mapping = aes(x = wif_twin2, y = wif_twin1, fill = zyg, color = zyg)) + 
  geom_jitter(alpha = .75) +
  geom_smooth(method = "lm", formula = y ~ x) +
  theme_minimal() +
  scale_color_manual(values = my_colors_zyg) +
  scale_fill_manual(values = my_colors_zyg) +
  labs(title = "Association of WIF between siblings in a twin pair",
       x = "WIF for Twin #2",
       y = "WIF for Twin #1",
       fill = "Zygosity",
       color = "Zygosity")

What do these graphs suggest?
For dizygotic twins (labeled dz - same sex), the relationship between their reported WIF levels is weak but positive (note the relatively flat slope relating WIF for twin 1 to WIF for twin 2). This means that if one dizygotic twin reports a high level of WIF, the other twin is only slightly more likely to report a similar level. This weak relationship suggests that factors influencing WIF might differ significantly between dizygotic twins, despite their shared environment and some shared genes.

In contrast, for monozygotic twins (labeled mz), there is a moderate positive relationship (note the positive slope relating WIF for twin 1 to WIF for twin 2). This means that if one monozygotic twin reports a high level of WIF, the other twin is moderately likely to report a similar level. This stronger relationship provides some evidence that monozygotic twins tend to experience WIF in a more similar way, which could be attributed to their identical genetic makeup.

Fit independent SLRs

Moving along, let’s fit a linear regression model to relate WIF across siblings in each twin pair. In the remainder of this Module, we will regress WIF for twin 1 on WIF for twin 2 and zygosity. Recall that it is important to define a meaningful intercept for regression models, which depends on having a meaningful 0 score for all predictors. Zygosity will be treated as a dummy-coded indicator in the regression models, so a meaningful zero for this variable already exists (monozygotic twins will serve as the reference group). The other predictor, wif_twin2, doesn’t have a meaningful zero (it ranges naturally from 1 to 5). Therefore, we will create a version of this variable, wif_twin2.c, in which wif_twin2 is centered at a score of 1 (the lowest possible score for WIF). We will also create two subsetted data frames: one that includes just the dizygotic twins (called dz) and one that includes just the monozygotic twins (called mz).

df <-
  df |> 
  mutate(zyg = factor(zyg),
         zyg = relevel(zyg, ref = "mz")) |> 
  mutate(wif_twin2.c = wif_twin2 - 1)

# create subsetted dataframes
dz <- df |> filter(zyg == "dz - same sex")
mz <- df |> filter(zyg == "mz")
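To see what centering does and does not change, here is a minimal sketch using simulated data. The variables x and y and every number in the block are hypothetical stand-ins, not values from the twin data.

```r
# Simulated (hypothetical) data: a predictor on a 1-5 scale, like WIF
set.seed(14)
x <- sample(1:5, size = 200, replace = TRUE)
y <- 2 + 0.3 * x + rnorm(200, sd = 0.5)

# Fit the same regression with the raw and the centered predictor
x_c <- x - 1                       # center at the lowest possible score
fit_raw      <- lm(y ~ x)
fit_centered <- lm(y ~ x_c)

# The slopes are identical; only the intercept moves
coef(fit_raw)
coef(fit_centered)

# The centered intercept equals the raw model's prediction at x = 1
predict(fit_raw, newdata = data.frame(x = 1))
```

Centering therefore leaves the estimated relationship untouched and simply relocates the intercept to a score that actually occurs in the data.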

Now we’re ready to fit the two SLR models to relate WIF within the twin pairs:

Model for dz twins

lm(wif_twin1 ~ wif_twin2.c, data = dz) |> 
  tidy(conf.int = TRUE, conf.level = .95) |>  
  select(term, estimate, std.error, conf.low, conf.high)

Model for mz twins

lm(wif_twin1 ~ wif_twin2.c, data = mz) |> 
  tidy(conf.int = TRUE, conf.level = .95) |> 
  select(term, estimate, std.error, conf.low, conf.high)

How should these models be interpreted?
In each of these models, the intercept represents the predicted WIF for twin 1 when wif_twin2.c equals 0 (recall that, because of our centering, a score of 0 on wif_twin2.c corresponds to a WIF score of 1 for twin 2). The slope for wif_twin2.c represents the predicted change in wif_twin1 for a one-unit increase in wif_twin2.c. We find greater concordance among mz twins (\(b_1\) = .34, 95% CI: .18, .49) than among dz twins (\(b_1\) = .08, 95% CI: -.15, .30). Notice that a slope of 0, which would represent no concordance between twins, is a plausible value for dizygotic twins based on the 95% CI. However, 0 is not a plausible value for concordance between monozygotic twins (i.e., 0 is outside the boundaries of the 95% CI).

A combined model

Rather than fitting separate models for each twin group — let’s explore combined models.

Parallel slopes model

To begin, let’s fit a multiple linear regression model in which we regress WIF for twin 1 on WIF for twin 2 and the zygosity indicator.

lm_ps <- 
  lm(wif_twin1 ~ wif_twin2.c + zyg, data = df) 

lm_ps |> 
  tidy(conf.int = TRUE, conf.level = .95) |> 
  select(term, estimate, std.error, conf.low, conf.high)

Plot the model results

Before interpreting the output, let’s create a plot of the fitted model. Here, I feed the model object (lm_ps) into the plot_predictions() function from the marginaleffects package. This handy function works alongside ggplot2 to generate plots of predicted values from a specified model. The argument condition = list(wif_twin2.c = seq(0, 3), zyg = unique) specifies the conditions for which predictions should be plotted. wif_twin2.c = seq(0, 3) means that predictions will be made across this sequence of values for the variable wif_twin2.c, while zyg = unique means that predictions will be made for each unique value of the variable zyg (zygosity). This is essentially creating a grid of predictors (which we performed manually via the datagrid() function from marginaleffects in Module 13) and then producing the predicted scores for the outcome (i.e., y-hat) based on our fitted model. Once the data frame is created, then plot_predictions() automatically feeds the data frame into ggplot(), assigns the proper geometry, and then the usual ggplot() options can be used to make the plot look as desired.

lm_ps |> 
  plot_predictions(condition = list(wif_twin2.c = seq(0, 3), 
                                    zyg = unique)) +
  scale_color_manual(values = my_colors_zyg) +
  scale_fill_manual(values = my_colors_zyg) +
  labs(title = "Fitted concordance in WIF between twin pairs",
       subtitle = "Parallel slopes model",
       x = "Work interfering with family for twin 2",
       y = "Predicted work interfering with family for twin 1",
       fill = "Zygosity",
       color = "Zygosity") +
  theme_minimal()
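The grid-then-predict idea behind this plot can be sketched by hand in base R. This is only a conceptual illustration on simulated data, not the internal implementation of plot_predictions(); the objects toy, toy_ps, and grid, and all simulated values, are hypothetical.

```r
# Simulated (hypothetical) stand-in for the twin data
set.seed(14)
n <- 200
toy <- data.frame(
  wif_twin2.c = runif(n, min = 0, max = 4),
  zyg = factor(sample(c("mz", "dz - same sex"), n, replace = TRUE),
               levels = c("mz", "dz - same sex"))
)
toy$wif_twin1 <- 2.2 + 0.25 * toy$wif_twin2.c + rnorm(n, sd = 0.6)

# Parallel slopes model on the toy data
toy_ps <- lm(wif_twin1 ~ wif_twin2.c + zyg, data = toy)

# Every combination of the chosen predictor values
grid <- expand.grid(
  wif_twin2.c = seq(0, 3),
  zyg = levels(toy$zyg)
)

# Predicted outcome (y-hat) for each row of the grid
grid$estimate <- predict(toy_ps, newdata = grid)
grid
```

Plotting estimate against wif_twin2.c, with one line per level of zyg, would reproduce the two parallel lines; the gap between the lines is the same in every row pair because the model contains no interaction.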

This model is called a parallel slopes model. In a parallel slopes model, the regression lines relating a predictor to the outcome are assumed to be parallel across levels of the other predictors, which means that the effect of each predictor on the outcome is additive, holding all others constant. In other words, the parallel slopes assumption implies that the impact of one predictor on the outcome does not depend on the values of the other predictors. This assumption allows for a straightforward interpretation of the effect of each predictor on the outcome, as the effect remains constant across different levels of the other variable (or variables in the case of many predictors).

In our example, this means that the concordance in WIF between twin pairs (as captured by the slope relating wif_twin1 to wif_twin2.c) is consistent across all levels of zyg (i.e., the slopes are parallel). In our fitted model, you can see that the slope relating WIF for twin 1 to WIF for twin 2 is assumed to be the same for the zygosity groups (the slope = .24, holding constant zygosity). Thus, the two lines on the graph are parallel to one another. We also see that in this specified model the difference in WIF for twin 1 between dizygotic and monozygotic twins is the same regardless of the level of WIF for twin 2. That is, the vertical distance between the two lines is the same at every point along the x-axis.

We can use the slopes() function from the marginaleffects package to verify that the model-fitted slope relating WIF across twin pairs is indeed the same (i.e., the slopes are equal) for each type of twin pair. This function is used to calculate the estimated slopes (or marginal effects) of predictors in a linear model. The code below calculates how the effect of the key predictor variable wif_twin2.c on the outcome varies across different levels of a third variable (zyg in this case). The resulting output contains the estimated slopes and their standard errors for each level of zyg, giving us insights into how the relationship between wif_twin2.c and the outcome changes depending on the value of zyg.

lm_ps |> 
  slopes(variables = "wif_twin2.c", by = "zyg") |> 
  as_tibble() |> 
  select(term, zyg, estimate, std.error)

How should we interpret these effects?
Using the tidy() output of the parallel slopes model, the equation can be written as follows (where \(y_i\) represents wif_twin1, \(x_i\) represents wif_twin2.c, and \(z_i\) represents zyg):

\[ \hat{y_i} = 2.26+(0.24\times{x_i})+(0.01\times{z_i}) \]

Interpretation of the intercept: The intercept coefficient (2.26) represents the predicted mean outcome (WIF for twin 1) when both wif_twin2.c and zyg are zero. Therefore the intercept captures the predicted mean WIF for twin 1 among monozygotic twins in which twin 2 has a WIF score of 1 on the original scale (wif_twin2.c = 0).

Slope for wif_twin2.c: The slope coefficient for wif_twin2.c (0.24) represents the slope for WIF concordance, holding zyg constant (for example, when comparing two mz twins or when comparing two dz twins).

Slope for zyg: The slope coefficient for zyg (0.01) represents the estimated difference in the outcome (i.e., predicted WIF score for twin 1) for dz twins (coded as 1 for zyg) compared to mz twins (coded as 0 for zyg), holding wif_twin2.c constant (for example, when comparing a dizygotic twin pair to a monozygotic twin pair that both have a score of 3 for wif_twin2.c).
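Substituting the two possible values of \(z_i\) into the fitted equation makes the parallel lines explicit. The numbers below use the rounded coefficients reported above, so they are approximate:

\[
\begin{aligned}
\text{mz } (z_i = 0):\quad \hat{y_i} &= 2.26+(0.24\times{x_i})+(0.01\times 0) = 2.26+(0.24\times{x_i}) \\
\text{dz } (z_i = 1):\quad \hat{y_i} &= 2.26+(0.24\times{x_i})+(0.01\times 1) = 2.27+(0.24\times{x_i})
\end{aligned}
\]

Both lines share the slope 0.24 and differ only in their intercepts, by 0.01.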

We can see that the estimate of the concordance in WIF is the same for the two types of twins. This is by design because we fit a parallel slopes model. In many models, this assumption of parallel slopes will be reasonable. But it is always an assumption that we should examine. For example, in the study we are considering now, we saw earlier when fitting each model separately that the slopes relating WIF from twin to twin did differ by group, with the concordance for mz twins appearing larger. In cases where the parallel slopes assumption is not viable, we can relax the assumption by including an interaction between the two predictors. In this way, we can allow for non-parallel slopes. Let’s see how this works in the next section.

Non-parallel slopes model

In this instance, we will treat zyg as a moderator of the concordance of WIF between twins. Rather than listing the two variables separately, e.g., lm(wif_twin1 ~ wif_twin2.c + zyg), we include an asterisk between them, e.g., lm(wif_twin1 ~ wif_twin2.c*zyg). This tells R to include the interaction of the two variables.

lm_int <- 
  lm(wif_twin1 ~ wif_twin2.c*zyg, data = df) 

lm_int |> 
  tidy(conf.int = TRUE, conf.level = .95) |> 
  select(term, estimate, std.error, conf.low, conf.high)
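In R's formula syntax, x*z is shorthand for x + z + x:z, where x:z is the interaction term on its own. The sketch below checks this equivalence on simulated data; the data frame toy, its variables x, y, and z, and all generating values are hypothetical.

```r
# Simulated (hypothetical) data with a built-in interaction
set.seed(14)
n <- 200
toy <- data.frame(
  x = runif(n, min = 0, max = 4),
  z = factor(sample(c("mz", "dz"), n, replace = TRUE), levels = c("mz", "dz"))
)
toy$y <- 2 + 0.35 * toy$x + 0.4 * (toy$z == "dz") -
  0.25 * toy$x * (toy$z == "dz") + rnorm(n, sd = 0.6)

# The asterisk expands to both main effects plus the interaction
fit_star  <- lm(y ~ x * z, data = toy)
fit_colon <- lm(y ~ x + z + x:z, data = toy)

coef(fit_star)
all.equal(coef(fit_star), coef(fit_colon))
```

Both specifications estimate the same four coefficients: an intercept, a slope for x, a difference for z, and the x-by-z interaction.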

Plot the model results
Again, we’ll use the plot_predictions() function to create a graph of the fitted model with the interaction.

lm_int |> 
  plot_predictions(condition = list(wif_twin2.c = seq(0, 3), 
                                    zyg = unique)) +
  scale_color_manual(values = my_colors_zyg) +
  scale_fill_manual(values = my_colors_zyg) +
  labs(title = "Fitted concordance in WIF between twin pairs",
       subtitle = "Non-parallel slopes model",
       x = "Work interfering with family for twin 2",
       y = "Predicted work interfering with family for twin 1",
       fill = "Zygosity",
       color = "Zygosity") +
  theme_minimal()

How should we interpret these effects?
First, recognize that, in terms of the point estimates (listed under estimate in the table produced using tidy()) this model is simply reproducing the same information that we garnered from fitting the two models separately at the start of our exploration (in the section entitled “Fit independent SLRs”).

For reference, the equations from those two separate models are as follows, where \({y_i}\) represents the variable wif_twin1 and \({x_i}\) represents the variable wif_twin2.c.

Equation 1 (For dz twins, \({z_i}\) == “dz - same sex”): \(\hat{y_i} = 2.52+(0.08\times{x_i})\)
Equation 2 (For mz twins, \({z_i}\) == “mz”): \(\hat{y_i} = 2.10+(0.34\times{x_i})\)

Now, let’s match this up to the output from our fitted model in which zygosity moderates the concordance of WIF, which can be written in equation form as follows (where \(x_i\) represents wif_twin2.c, \(z_i\) represents zyg, and \({x_i}\times{z_i}\) represents the interaction between wif_twin2.c and zyg):

\[ \hat{y_i} = 2.10+(0.34\times{x_i})+(0.42\times{z_i})+(-0.26\times{x_i}\times{z_i}) \]

Interpretation of the intercept: The intercept coefficient (2.10) represents the predicted mean outcome (WIF for twin 1) when both wif_twin2.c and zyg are zero. Therefore the intercept captures the predicted mean WIF for twin 1 among monozygotic twins in which twin 2 has a WIF score of 1 on the original scale (due to our centering of WIF for twin 2 at a score of 1). Notice that the intercept in the moderation model is equivalent to the intercept in Equation 2. The 95% CI for this estimate is 1.82 to 2.38, giving us a range of plausible values for the predicted outcome when all predictors are 0.

Slope for wif_twin2.c: The slope coefficient for wif_twin2.c (0.34) represents the slope for WIF concordance when zyg equals 0 (i.e., the reference group, which is mz twins in our example). Notice that this estimate matches the slope estimate in Equation 2. The 95% CI for this estimate is 0.17 to 0.50, giving us a range of plausible values for the effect of WIF for twin 2 on WIF for twin 1 among monozygotic twins. Note that the interval doesn’t contain 0 — providing evidence for concordance in WIF among monozygotic twins.

Slope for zyg: The slope coefficient for zyg (0.42) represents the estimated difference in the outcome (i.e., predicted WIF score for twin 1) for dizygotic twins (coded as 1 for zyg) compared to monozygotic twins (coded as 0 for zyg), when wif_twin2.c equals 0. For dizygotic twins, the predicted mean WIF score for twin 1 is higher by approximately 0.42 units compared to monozygotic twins, when wif_twin2.c is zero (the lowest possible WIF score). It is important to note that the 95% CI for this slope contains 0, indicating that we should be cautious in interpreting this difference as meaningful as it may not reflect a consistent pattern in the data. Notice that this value is equal to the difference between the intercepts across Equations 1 and 2: 2.52 - 2.10 = 0.42.

Slope for interaction: The interaction (i.e., moderation) effect (-0.26) represents the predicted difference in the concordance of WIF across twins (i.e., the effect of WIF for twin 2 on WIF for twin 1) for dizygotic twins as compared to monozygotic twins. Note that the slope for monozygotic twins is displayed in the output (i.e., the slope for wif_twin2.c which we already interpreted). To recover the slope for dizygotic twins, we take the slope for monozygotic twins (0.34) and add the interaction term (-0.26), so: 0.34 + (-0.26) = 0.08. The 95% CI for the interaction effect ranges from -0.52 to 0.01, which just barely includes 0. This indicates that the evidence for a differential effect of WIF between monozygotic and dizygotic twins is not strong, and we cannot confidently assert that zygosity moderates the relationship.
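Putting these pieces together, substituting each value of \(z_i\) into the moderation equation recovers the two separate regression lines (using the rounded coefficients reported above):

\[
\begin{aligned}
\text{mz } (z_i = 0):\quad \hat{y_i} &= 2.10+(0.34\times{x_i}) \\
\text{dz } (z_i = 1):\quad \hat{y_i} &= (2.10+0.42)+\big((0.34-0.26)\times{x_i}\big) = 2.52+(0.08\times{x_i})
\end{aligned}
\]

The first line matches Equation 2 and the second matches Equation 1.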

We can use the slopes() function from the marginaleffects package to compute the slope relating WIF across twin pairs for dizygotic and monozygotic twins based on our fitted moderation model. These slopes are commonly referred to as simple slopes. Simple slopes refer to the slopes (or effects) of the predictor variable at specific levels of the moderator variable. In this context, simple slopes help us understand how the relationship between WIF for twin 2 and WIF for twin 1 differs across levels of zygosity (dz vs. mz). Using the slopes() function, we obtain these simple slopes along with their standard errors and confidence intervals, providing a clear picture of how the effect of WIF for twin 2 on WIF for twin 1 varies by zygosity.

lm_int |> 
  slopes(variables = "wif_twin2.c", by = "zyg") |> 
  as_tibble() |> 
  select(term, zyg, estimate, std.error, conf.low, conf.high)

For monozygotic twins, the simple slope for wif_twin2.c is 0.34, with a 95% CI ranging from 0.17 to 0.50, indicating a substantial and positive concordance between WIF scores of twin 1 and twin 2. This interval does not include 0, providing strong evidence for a positive relationship.
For dizygotic twins, the simple slope for wif_twin2.c is 0.08, calculated by summing the slope for monozygotic twins (0.34) and the interaction effect (-0.26). The flatter slope indicates a much weaker relationship between the WIF scores of twin 1 and twin 2 for dizygotic twins.
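The simple slopes can also be assembled by hand from the model coefficients, which is a useful check on what slopes() reports. The sketch below does this on simulated data and compares the results with separate per-group regressions; the data frame toy, its variables, and all generating values are hypothetical.

```r
# Simulated (hypothetical) data with a built-in interaction
set.seed(14)
n <- 200
toy <- data.frame(
  x = runif(n, min = 0, max = 4),
  z = factor(sample(c("mz", "dz"), n, replace = TRUE), levels = c("mz", "dz"))
)
toy$y <- 2 + 0.35 * toy$x + 0.4 * (toy$z == "dz") -
  0.25 * toy$x * (toy$z == "dz") + rnorm(n, sd = 0.6)

fit_int <- lm(y ~ x * z, data = toy)
b <- coef(fit_int)

# Simple slopes from the interaction model
slope_mz <- unname(b["x"])               # reference group: slope for x
slope_dz <- unname(b["x"] + b["x:zdz"])  # slope for x plus the interaction

# Slopes from separate regressions fit within each group
slope_mz_sep <- unname(coef(lm(y ~ x, data = subset(toy, z == "mz")))["x"])
slope_dz_sep <- unname(coef(lm(y ~ x, data = subset(toy, z == "dz")))["x"])

c(mz = slope_mz, mz_separate = slope_mz_sep)
c(dz = slope_dz, dz_separate = slope_dz_sep)
```

The point estimates agree exactly. The standard errors can differ, however, because the combined model pools the residual variance across both groups.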

The contrast between these simple slopes illustrates how WIF concordance differs across twin types. While the interaction effect’s 95% CI includes 0, suggesting caution in interpreting this moderation effect as meaningful, the distinct slopes for monozygotic versus dizygotic twins hint at a potential difference that may warrant further study. Thus, even though the interaction term itself is not definitive, the individual slopes provide insight into the varying strengths of concordance in WIF based on zygosity. A follow-up study with a larger sample size would be helpful in this case.

The annotated figure below maps all of this information onto our non-parallel slopes graph.

Summarizing models with interactions

In summary, when two variables are specified to interact in a regression model, that is, when \(y_i\) is regressed on \(x_i\), \(z_i\), and their interaction (\(x_i\)*\(z_i\)), then:

The regression coefficient for \(x_i\) is the effect of \(x_i\) on \(y_i\) when \(z_i\) equals 0.
The regression coefficient for \(z_i\) is the effect of \(z_i\) on \(y_i\) when \(x_i\) equals 0.
The regression coefficient for the interaction (\(x_i\)*\(z_i\)) is the difference in the effect of \(x_i\) on \(y_i\) for a one-unit increase in \(z_i\). When the moderator (\(z_i\)) is binary, this simply contrasts one group to the other because one group is coded 0 and the other group is coded 1.
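These three statements follow from regrouping the terms of the model equation. Writing the coefficients as \(b_0\), \(b_1\), \(b_2\), and \(b_3\) (notation assumed here for illustration):

\[
\begin{aligned}
\hat{y_i} &= b_0+(b_1\times{x_i})+(b_2\times{z_i})+(b_3\times{x_i}\times{z_i}) \\
&= (b_0+b_2 z_i)+\big((b_1+b_3 z_i)\times{x_i}\big) \\
&= (b_0+b_1 x_i)+\big((b_2+b_3 x_i)\times{z_i}\big)
\end{aligned}
\]

The second line shows that the slope for \(x_i\) is \(b_1+b_3 z_i\), which reduces to \(b_1\) when \(z_i\) equals 0 and changes by \(b_3\) for each one-unit increase in \(z_i\). The third line shows the same logic for \(z_i\): its effect is \(b_2+b_3 x_i\), which reduces to \(b_2\) when \(x_i\) equals 0.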

In this way, we can fit a non-parallel slopes model, and allow for the possibility that the effects of each variable involved in the interaction are not strictly additive, but rather are multiplicative. That is, the effect of each variable depends on the other. For wif_twin2.c this is evidenced by the non-parallel lines: the slope is steeper for monozygotic twins and flatter for dizygotic twins. For zyg, this is evidenced by the changing vertical distance between the two lines as we move from left to right on the x-axis. For instance, when wif_twin2.c equals 0, dizygotic twins have a higher predicted score for wif_twin1, but when wif_twin2.c equals 3, monozygotic twins have a higher predicted score for wif_twin1.
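As a rough illustration based on the rounded coefficients from the moderation model, the predicted difference between dizygotic and monozygotic twins at a given value of \(x_i\) (wif_twin2.c) is \(0.42-(0.26\times{x_i})\):

\[
\begin{aligned}
x_i = 0:&\quad 0.42-(0.26\times 0) = 0.42 \\
x_i = 3:&\quad 0.42-(0.26\times 3) = -0.36 \\
\text{lines cross when}&\quad 0.42-(0.26\times{x_i}) = 0 \;\Rightarrow\; x_i = 0.42/0.26 \approx 1.62
\end{aligned}
\]

So the two fitted lines cross at a wif_twin2.c score of about 1.62, which corresponds to about 2.62 on the original WIF scale. Because the interaction estimate is imprecise, this crossing point is itself quite uncertain.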

Wrap up

In this Module, we delved into the concept of moderation analysis within the context of linear regression models. The primary focus was on formulating research questions that incorporate moderation, which allows researchers to examine the conditions under which certain effects occur. This is particularly valuable in behavioral and psychological research, where the influence of one variable on an outcome can depend on the presence or level of another variable, known as the moderator. Understanding these interactions is crucial as they reveal how the relationship between predictors and outcomes can change, offering deeper insights into the data.

Visualization techniques covered in this Module are instrumental for demonstrating the nature of moderation models and facilitating model interpretation. By visualizing the interaction effects, researchers can more easily understand and explain how different variables interact to influence the outcome.

Overall, this Module highlighted the significance of considering interactions in regression models to capture the complexity of real-world phenomena. By doing so, we can draw more accurate and actionable conclusions from our data, leading to better-informed decisions and interventions.