A webR tutorial

MLR with Uncorrelated Predictors

Background information

Sleep is a vital component of overall health and well-being. Sleep efficiency (SE), the percentage of time spent asleep while in bed, is a key indicator of sleep quality. Alcohol consumption is known to affect our sleep, potentially reducing sleep efficiency by disrupting normal sleep patterns. Conversely, the Sleep Hygiene Index (SHI) measures an individual’s engagement in behaviors that promote healthy sleep, such as maintaining a regular sleep schedule, creating a restful environment, and limiting exposure to screens before bedtime.

Study Overview

To determine if alcohol consumption and sleep hygiene are predictive of sleep efficiency, you designed a study involving a diverse group of adults from Colorado. The study aims to provide insights into how these factors individually and collectively are associated with sleep quality.

Study Design

  • Type: Experimental study with a stratified random assignment.

  • Population: Adults aged 21-65 years, selected from a state-level public health database focusing on sleep patterns among Colorado residents.

  • Sample Size: 225 participants.

Stratification and Systematic Assignment

  1. Sleep Hygiene Stratification:

    • Participants were categorized into five groups based on their Sleep Hygiene Index (SHI) score: 0 (poor hygiene), 10, 20, 30, and 40 (excellent hygiene).

  2. Alcohol Consumption Assignment:

    • Within each SHI group, participants were systematically assigned to one of nine alcohol consumption levels: 0, 5, 10, 15, 20, 25, 30, 35, or 40 g of alcohol. The result is a full-factorial design in which every combination of SHI and alcohol consumption level is represented.
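The crossing described above can be sketched in a few lines of base R. This is only an illustration of the design, not the tutorial's own code: the object name `design` and the `replicate` column are assumptions, and five participants per cell is taken from the stated sample size of 225.

```r
# Cross every SHI level with every alcohol level, with five participants per cell.
# `design` and `replicate` are illustrative names, not part of the tutorial.
design <- expand.grid(
  shi       = c(0, 10, 20, 30, 40),   # five sleep hygiene groups
  alcohol   = seq(0, 40, by = 5),     # nine alcohol levels, in grams
  replicate = 1:5                     # participants per SHI-by-alcohol cell
)

nrow(design)                          # total number of participants
table(design$shi, design$alcohol)     # participants in each cell of the design
```

Every cell of the printed table holds the same count, which is what makes the design balanced.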

Protocol

One evening study session was selected for each participant. On their selected day, the participant was provided with the designated grams of alcohol. The alcohol was presented in a standardized beverage, ensuring consistency across participants. Participants were instructed to consume the alcohol between the hours of 6 and 9 pm. That night, sleep efficiency was measured using a wearable sleep tracker that recorded sleep patterns throughout the night. Devices were validated for accuracy and reliability in sleep studies.

Simulate data

Press Run Code on the code chunk below to create the simulated data frame for this activity. The produced data frame is called data_uncorrelated.

  • The variable called se represents sleep efficiency. This is the outcome variable that we want to predict.

  • The variable called alcohol represents the grams of alcohol consumed.

  • The variable called shi represents the individual’s Sleep Hygiene Index score (a higher value denotes better sleep hygiene).

Importantly, in this version of the study, because all combinations of sleep hygiene and alcohol use are systematically paired and replicated, the two predictors (sleep hygiene and alcohol) are completely uncorrelated by design.

You don’t need to understand how this simulation works, but here’s some documentation in case you are interested. The code defines five levels of the Sleep Hygiene Index (SHI), namely 0, 10, 20, 30, and 40, to represent different levels of sleep hygiene practices among participants. It also specifies nine levels of alcohol consumption, ranging from 0 to 40 grams in increments of 5 grams. The code then creates all possible combinations of SHI and alcohol consumption levels (using the expand_grid() function), and each combination is replicated five times to simulate multiple participants with the same SHI and alcohol consumption levels. This yields 5 × 9 × 5 = 225 simulated participants, and the full factorial design ensures that the predictors (SHI and alcohol consumption) are uncorrelated.

Finally, the code calculates sleep efficiency for each simulated participant as a baseline value of 60, plus 0.6 times the SHI score (so higher SHI scores increase sleep efficiency), minus 0.4 times the alcohol consumption level (so higher alcohol consumption decreases sleep efficiency), plus a random error term drawn from a normal distribution with a mean of 0 and a standard deviation of 5.75 to mimic natural variability in sleep efficiency.
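For readers who want to see the description as code, here is a minimal sketch reconstructed from the paragraph above. It is not necessarily the tutorial's original chunk: the seed value, the helper column `replicate`, and the use of `rnorm()` inside `mutate()` are assumptions, so the simulated values will differ from those produced by the tutorial's Run Code chunk.

```r
library(tidyr)   # expand_grid()
library(dplyr)   # mutate(), select()

set.seed(2024)   # assumption: the tutorial's actual seed is not given

data_uncorrelated <- expand_grid(
  shi       = c(0, 10, 20, 30, 40),   # five SHI levels
  alcohol   = seq(0, 40, by = 5),     # nine alcohol levels, in grams
  replicate = 1:5                     # five participants per combination
) |>
  mutate(
    # baseline 60, +0.6 per SHI point, -0.4 per gram of alcohol, plus normal noise
    se = 60 + 0.6 * shi - 0.4 * alcohol + rnorm(n(), mean = 0, sd = 5.75)
  ) |>
  select(se, alcohol, shi)

nrow(data_uncorrelated)                                  # number of simulated participants
cor(data_uncorrelated$alcohol, data_uncorrelated$shi)    # correlation between the predictors
```

The last line checks the design property directly: because every alcohol level appears equally often at every SHI level, the correlation between the two predictors is zero.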

Descriptive statistics

In the code chunk below, use the skim() function from the skimr package to request descriptive statistics.

Additionally, use the correlate() function from the corrr package to request the correlation matrix.

Jot down a few sentences to describe the descriptive statistics as well as the correlation matrix.
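One possible way to request both outputs is sketched below. The first few lines rebuild a stand-in for data_uncorrelated so the example runs on its own. The seed and the `replicate` column are assumptions, and inside the tutorial the data frame already exists, so only the last three lines are needed.

```r
library(tidyr)
library(dplyr)
library(skimr)   # skim()
library(corrr)   # correlate()

# Stand-in for the tutorial's data frame (assumed seed; values will differ).
set.seed(2024)
data_uncorrelated <- expand_grid(
  shi = c(0, 10, 20, 30, 40), alcohol = seq(0, 40, by = 5), replicate = 1:5
) |>
  mutate(se = 60 + 0.6 * shi - 0.4 * alcohol + rnorm(n(), 0, 5.75)) |>
  select(se, alcohol, shi)

skim(data_uncorrelated)                       # means, SDs, quantiles, missingness
cor_matrix <- correlate(data_uncorrelated)    # pairwise Pearson correlations
cor_matrix
```

When reading the correlation matrix, the entry for alcohol and shi is the one that reflects the study design, while the entries involving se describe how each predictor relates to the outcome.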

Fit a simple linear regression

Please fit a simple linear regression model to quantify the relationship between alcohol and sleep efficiency. Call your model object mod_1. Request the tidy() and glance() output from the broom package.

Add sleep hygiene as a covariate

Now add sleep hygiene (shi) to your SLR model as an additional predictor. Call the model object mod_2. Request the tidy() and glance() output.
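A sketch of how the two models might be fit and compared follows. The data-building lines are a stand-in with an assumed seed, so the numbers will not match the tutorial's. The object `alcohol_rows` is a hypothetical helper that places the alcohol coefficient from each model side by side.

```r
library(tidyr)
library(dplyr)
library(broom)   # tidy(), glance()

# Stand-in for the tutorial's data frame (assumed seed; values will differ).
set.seed(2024)
data_uncorrelated <- expand_grid(
  shi = c(0, 10, 20, 30, 40), alcohol = seq(0, 40, by = 5), replicate = 1:5
) |>
  mutate(se = 60 + 0.6 * shi - 0.4 * alcohol + rnorm(n(), 0, 5.75)) |>
  select(se, alcohol, shi)

mod_1 <- lm(se ~ alcohol, data = data_uncorrelated)         # simple linear regression
mod_2 <- lm(se ~ alcohol + shi, data = data_uncorrelated)   # adds shi as a covariate

tidy(mod_1)
glance(mod_1)
tidy(mod_2)
glance(mod_2)

# Hypothetical helper: the alcohol row from each model, stacked for comparison.
alcohol_rows <- bind_rows(
  tidy(mod_1) |> mutate(model = "mod_1"),
  tidy(mod_2) |> mutate(model = "mod_2")
) |>
  filter(term == "alcohol")
alcohol_rows
```

Comparing the estimate and std.error columns across the two rows of alcohol_rows is a compact way to approach the question that follows.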

Important

Study the two outputs. What do you notice about how the effect of alcohol on sleep efficiency has changed once sleep hygiene is added to the model? How does this differ from the changing effect of alcohol in the observational study that we examined earlier where alcohol and sleep hygiene were correlated?

The model as a Venn diagram

Now, to help further solidify your intuition, please study the Venn diagram below, which depicts the uncorrelated predictors example from this activity.

Important

With your neighbor, use the Venn diagram above to define \(R^2\) using the lettered areas. How does this differ from the case where the predictors were correlated? Examine the \(R^2\) from the glance() output to quantify the variability in sleep efficiency that can be explained by alcohol and sleep hygiene.
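As a numeric companion to the Venn diagram, the sketch below compares the \(R^2\) from the two-predictor model with the \(R^2\) values from two separate simple regressions. The data frame is a stand-in built with an assumed seed, and the object names (`r2_alcohol`, `r2_shi`, `r2_both`) are illustrative only.

```r
library(tidyr)
library(dplyr)
library(broom)

# Stand-in for the tutorial's data frame (assumed seed; values will differ).
set.seed(2024)
data_uncorrelated <- expand_grid(
  shi = c(0, 10, 20, 30, 40), alcohol = seq(0, 40, by = 5), replicate = 1:5
) |>
  mutate(se = 60 + 0.6 * shi - 0.4 * alcohol + rnorm(n(), 0, 5.75)) |>
  select(se, alcohol, shi)

# R-squared from each predictor alone and from both together.
r2_alcohol <- glance(lm(se ~ alcohol, data = data_uncorrelated))$r.squared
r2_shi     <- glance(lm(se ~ shi, data = data_uncorrelated))$r.squared
r2_both    <- glance(lm(se ~ alcohol + shi, data = data_uncorrelated))$r.squared

c(alcohol = r2_alcohol, shi = r2_shi,
  sum = r2_alcohol + r2_shi, both = r2_both)
```

When the sample correlation between the predictors is exactly zero, as it is in this balanced design, the sum and both entries agree. With correlated predictors they generally would not.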