A webR tutorial

Does baseline benevolence differ by marital status?

An image of a broken heart that's been stiched back together.

Background

In the REACH Forgiveness Study, researchers examined whether a brief self-directed forgiveness workbook intervention could improve forgiveness and mental health outcomes. The study was conducted across multiple countries with 4,598 participants.

In this WebR activity, we’ll use the REACH data to determine if relationship status is related to people’s capacity for benevolence toward those who have hurt them.

Research Question: Does baseline benevolence differ by marital status?

The benevolence items

After describing a past event in which they were hurt by another person, participants responded to each item by reflecting on the person who had hurt them. Items were rated using a five-point response format (1 = Strongly disagree; 5 = Strongly agree). The six benevolence items are:

Even though his/her actions hurt me, I still have goodwill for him/her.
I want us to bury the hatchet and move forward with our relationship.
Despite what he/she did, I want us to have a positive relationship again.
I have given up my hurt and resentment.
Although he/she hurt me, I put the hurt aside so we could resume our relationship.
I have released my anger so I could work on restoring our relationship to health.

Import the data

We’ll focus on the following key variables:

maritalStat: Marital status (factor with levels: “In relationship”, “Single”, “Married”, “Divorced”, “Separated”, “Widowed”)
t1_trim13 through t1_trim18: The six benevolence items listed above

Press Run Code on the code chunk below to import the data, create the benevolence scale, and drop cases with missing data¹.

Explore the data

Before fitting any models, let’s explore the distribution of benevolence scores by marital status.

Check the variable structure

First, let’s verify how marital status is coded and how many people are in each group.

Note

What do you notice?

Which group is largest?
Which level will serve as the reference group in a regression model?
How many dummy-coded indicators will we need to represent this categorical variable?

Descriptive statistics

Calculate descriptive statistics for baseline benevolence by marital status. Fill in the blanks in the code below, then press Run Code.

You need to select the grouping variable (marital status) and the outcome variable (baseline benevolence), then group by marital status.

Visualize the distribution

Create a box plot to visualize how baseline benevolence differs by marital status. Box plots show the median (middle line), the middle 50% of scores (the box), and the range of most scores (whiskers). We’ll also add a purple circle to mark the mean for each group. Fill in the missing items, then press Run Code.

x-axis: marital status
y-axis: benevolence score
geometry: use geom_boxplot() to create box plots

🤔 Question
🦉 Interpretation

With a partner, discuss what patterns you notice in the plot:

Which groups have the highest and lowest average benevolence?
How much overlap is there between the groups?
Do you predict reliable differences across groups?

Looking at the plot, divorced individuals appear to have the lowest baseline benevolence, while those who are married or in a relationship appear to have the highest. However, there is substantial overlap between the distributions. The formal regression analysis will help us determine which differences are reliably different.

Fit the regression model

Now let’s formally test whether benevolence differs by marital status using linear regression.

Understanding dummy coding with multiple categories

Key Concept

With 6 marital status categories, we need 5 dummy variables (i.e, dummy-coded indiators (k - 1 rule)).

The reference group (first alphabetically: “Divorced”) will be coded as 0 on all dummy variables. Each of the other 5 categories gets its own dummy-coded indicator — set as 1 when a person is in that category, 0 otherwise. We’ll add in our factor variable representing marital status to the lm() and let R do the work to dummy code.

The regression will compare each of the 5 other groups to the reference group (Divorced).

Fit the model

Fit a simple linear regression predicting baseline benevolence from marital status. Fill in the blanks, then press Run Code.

The outcome goes on the left side of ~, the predictor on the right.

With a partner, interpret the regression output:

What does the intercept (2.85) represent?
What does each marital status coefficient represent?
Which confidence intervals include zero? What does this tell you?
Which groups differ reliably from the reference group (Divorced)?

Intercept (2.85): The predicted baseline benevolence for the reference group (Divorced individuals). This is the lowest mean among all groups.
Each coefficient represents the difference from Divorced individuals:
- In relationship (0.37): People in relationships have benevolence scores 0.37 points higher than divorced individuals. CI (0.21 to 0.52) does not include zero → we can be confident this difference exists.
- Married (0.38): Married people have benevolence scores 0.38 points higher than divorced individuals. CI (0.22 to 0.53) does not include zero → we can be confident this difference exists.
- Separated (0.07): Separated individuals have benevolence scores 0.07 points higher than divorced individuals. CI (-0.30 to 0.43) includes zero → the difference is uncertain; zero is a plausible value.
- Single (0.20): Single people have benevolence scores 0.20 points higher than divorced individuals. CI (0.05 to 0.35) does not include zero → we can be confident this difference exists.
- Widowed (0.16): Widowed individuals have benevolence scores 0.16 points higher than divorced individuals. CI (-0.08 to 0.40) includes zero → the difference is uncertain; zero is a plausible value.

Key findings:
Divorced individuals show the lowest baseline benevolence. Those in relationships, married, and single individuals all show higher benevolence (and the 95% CI for the differences don’t include zero). Baseline benevolence for separated and widowed individuals are not reliably different from divorced individuals (the 95% CI includes 0).

Equation: \[\begin{aligned} \hat{y}_i = 2.85 + (0.37 \times \text{InRel}_i) + (0.38 \times \text{Married}_i) +\\ (0.07 \times \text{Sep}_i) + (0.20 \times \text{Single}_i) + (0.16 \times \text{Widow}_i) \end{aligned}\]

where each dummy variable equals 1 for that marital status and 0 otherwise.

To recover group means:

Divorced: \(\hat{y} = 2.85\) (all dummies = 0)
In relationship: \(\hat{y} = 2.85 + 0.37 = 3.22\)
Married: \(\hat{y} = 2.85 + 0.38 = 3.23\)
Separated: \(\hat{y} = 2.85 + 0.07 = 2.92\)
Single: \(\hat{y} = 2.85 + 0.20 = 3.05\)
Widowed: \(\hat{y} = 2.85 + 0.16 = 3.01\)

Want to explore more?

Try these additional analyses on your own:

Change the reference group: Refit the model with “Married” as the reference group. How does the interpretation change? Are single individuals reliably different from married individuals?

Compare specific groups: What if you want to compare “Single” vs. “In relationship” directly? How would you set this up?

Variance decomposition: Calculate the between-group and within-group sum of squares. What proportion of total variance is explained by marital status?

Footnotes

In this activity, we use drop_na() to remove any participants with missing data on our key variables (marital status and the baseline benevolence scale). This is called listwise deletion or complete case analysis.

Important limitation: This approach can:
- Reduce sample size and statistical power
- Introduce bias if data are not missing completely at random (MCAR)
- Waste valuable information from partially complete cases
Better approach: Multiple imputation is a more sophisticated technique that:
- Uses patterns in the observed data to estimate plausible values for missing data
- Accounts for uncertainty in these estimates
- Generally produces less biased results and preserves more of your sample
You’ll learn about missing data techniques, including multiple imputation, in PSY653. For now, we’re using listwise deletion for simplicity, but be aware this is not the gold standard for handling missing data in real research. For those interested in learning more now, see van Buuren & Groothuis-Oudshoorn (2011) on the mice package for multiple imputation in R.↩︎