A webR tutorial
Estimating Mindfulness Practice Among College Students

Introduction
You and your colleague aim to estimate the prevalence of mindfulness practice among college students. Mindfulness has been linked to numerous benefits, including reduced stress and improved mental well-being. Knowing how many students engage in mindfulness practices can help the university develop programs to support student health. The population of interest is all college students at your university.
Simulate the population
For this example, we’ll simulate a data frame that represents the population of all college students at your university. Press Run Code on the code chunk below to create the data frame.
This script simulates a population of 5,000 students. First, a random seed is set to ensure the results are reproducible (i.e., so you get the same results each time you run the code). Next, a population of 5,000 students is created, each assigned a unique ID. The key variable of interest, “Mindfulness,” is a binary variable: each student is assigned either a 1 (indicating they practice mindfulness) or a 0 (indicating they do not). The probability of practicing mindfulness is set at 0.20, meaning roughly 20% of the students are expected to practice mindfulness. The rbinom() function generates random numbers from a binomial distribution. Its arguments are n (the number of observations to generate, here one per student), size (the number of trials per observation; with one trial per observation, each student either “succeeds” or “fails” in a single trial, i.e., practices mindfulness or does not), and prob (the probability of success in each trial; here, each student has a 20% chance of practicing mindfulness).
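As a rough illustration of the steps described above, the simulation might look like the sketch below. The seed value (12345) and the object name population are assumptions chosen for this example; the tutorial's own chunk may use different ones.

```r
# Hypothetical seed; any fixed value makes the simulation reproducible
set.seed(12345)

n_students <- 5000

# One row per student: a unique ID and a 0/1 mindfulness indicator
population <- data.frame(
  ID = 1:n_students,
  # size = 1 gives a single trial per student, so each draw is 0 or 1
  Mindfulness = rbinom(n = n_students, size = 1, prob = 0.20)
)

# Peek at the first few rows
head(population)
```

Because the values are drawn at random, the share of 1s will be close to, but usually not exactly, 0.20.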
From this population, we can compute the proportion of college students who practice mindfulness. Since we’re imagining that we just simulated data for the whole population, this is the population parameter.
This script calculates the true prevalence of mindfulness by computing the mean of the “Mindfulness” variable. Because the variable is coded 0/1, its mean is the proportion of students who practice mindfulness in the simulated population, that is, the overall rate of mindfulness practice among the 5,000 students. The cat() function in R is used to concatenate and print objects. It converts its arguments to character strings and outputs them; here it is simply a convenient way to label the output for this activity.
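A minimal sketch of this calculation follows. It rebuilds the population so that it runs on its own; the seed and the object names (population, true_prevalence) are assumptions for illustration.

```r
# Rebuild the simulated population (hypothetical seed and names)
set.seed(12345)
population <- data.frame(
  ID = 1:5000,
  Mindfulness = rbinom(n = 5000, size = 1, prob = 0.20)
)

# The mean of a 0/1 variable is the proportion of 1s
true_prevalence <- mean(population$Mindfulness)

# cat() prints the label and the rounded value on one line
cat("True prevalence of mindfulness:", round(true_prevalence, 3), "\n")
```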
Two approaches to sampling
Now, let’s imagine that you and your colleague differ in your approach to producing a sample of students to study. One of you chooses to draw a random sample of students from the population (Scenario 1), and the other chooses to recruit a sample by placing advertisements around campus that invite students to participate in a study of college student well-being (Scenario 2).
Scenario 1
You choose to draw a simple random sample of students from a roster of all students at your university.
Purpose of Random Sampling
The goal of simple random sampling is to create a subset of the population that is representative of the entire population. Since each student has an equal chance of being selected, the sample should (on average) reflect the true distribution of characteristics (like mindfulness practice) in the population. Any single sample may over-represent or under-represent a group by chance, but the method does not systematically favor any group, so the estimates it produces are unbiased.
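To see what "on average" means here, one option is to draw many random samples and compare the average of their estimates with the population value. The sketch below uses base R's sample() for brevity; the seed, the number of repetitions (1,000), and the object names are assumptions for illustration.

```r
# Rebuild the simulated population (hypothetical seed and names)
set.seed(12345)
population <- data.frame(
  ID = 1:5000,
  Mindfulness = rbinom(n = 5000, size = 1, prob = 0.20)
)
true_prevalence <- mean(population$Mindfulness)

# Draw 1,000 simple random samples of 500 students (without replacement)
# and record the estimated prevalence from each one
estimates <- replicate(
  1000,
  mean(sample(population$Mindfulness, size = 500))
)

cat("True prevalence:      ", round(true_prevalence, 3), "\n")
cat("Average of estimates: ", round(mean(estimates), 3), "\n")
cat("Range of estimates:   ", round(range(estimates), 3), "\n")
```

Individual estimates scatter above and below the population value, but their average should sit very close to it.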
Drawing a True Random Sample
The slice_sample() function is used to draw a random sample of 500 students from the population. In a true random sample, every student in the population has an equal chance of being selected, regardless of whether or not they practice mindfulness or have any other characteristics.
The code below simulates this random data generating process.
Press Run Code on the code chunk below to simulate your sample.
After drawing the random sample, you estimate the prevalence of mindfulness in the population. The code below computes that proportion.
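The two steps in this scenario, drawing the sample and estimating the prevalence, could be sketched as follows. This assumes the dplyr package is available (it provides slice_sample()); the seed and the object names (random_sample, sample_prevalence) are hypothetical.

```r
library(dplyr)

# Rebuild the simulated population (hypothetical seed and names)
set.seed(12345)
population <- data.frame(
  ID = 1:5000,
  Mindfulness = rbinom(n = 5000, size = 1, prob = 0.20)
)

# Draw 500 students at random; by default slice_sample() samples
# without replacement and gives every row the same chance
random_sample <- slice_sample(population, n = 500)

# Estimate the prevalence from the sample
sample_prevalence <- mean(random_sample$Mindfulness)
cat("Estimated prevalence (random sample):", round(sample_prevalence, 3), "\n")
```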
Scenario 2
Your colleague chooses to recruit students by posting flyers around campus asking students to volunteer for a “study about college student well-being.”
Here’s what happens: Students who already practice mindfulness are much more interested in participating in a wellness study than students who don’t. So when flyers go up, mindfulness practitioners are more likely to sign up.
The code below simulates this self-selection bias (i.e., a biased data generating process). It assigns mindfulness practitioners a higher chance (0.8) of volunteering compared to non-practitioners (0.2), then creates a sample where mindfulness students are overrepresented.
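Before simulating, it can help to work out what share of volunteers we should expect to be mindfulness practitioners. The short calculation below assumes each student decides independently whether to volunteer, with the probabilities given above; the variable names are made up for this example.

```r
p_practice <- 0.20            # share of practitioners in the population
p_vol_given_practice <- 0.8   # chance a practitioner volunteers
p_vol_given_not <- 0.2        # chance a non-practitioner volunteers

# Overall chance that a randomly chosen student volunteers
p_vol <- p_practice * p_vol_given_practice +
  (1 - p_practice) * p_vol_given_not

# Expected share of practitioners among volunteers (Bayes' rule)
expected_share <- p_practice * p_vol_given_practice / p_vol

cat("Expected share of practitioners among volunteers:", expected_share, "\n")
```

Under these assumptions the expected share is 0.5, far above the population value of about 0.20.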
Press Run Code on the code chunk below to simulate your sample.
After collating the sample, your colleague estimates the prevalence of mindfulness. The code below computes that proportion.
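One possible way to simulate this self-selection and compute the resulting estimate is sketched below. It assumes each student independently decides whether to volunteer, so the sample size is not fixed in advance; the tutorial's own chunk may build the sample differently (for example, by weighted sampling of a fixed number of students). The seed and the names p_volunteer, volunteered, biased_sample, and biased_prevalence are hypothetical.

```r
# Rebuild the simulated population (hypothetical seed and names)
set.seed(12345)
population <- data.frame(
  ID = 1:5000,
  Mindfulness = rbinom(n = 5000, size = 1, prob = 0.20)
)

# Practitioners volunteer with probability 0.8, everyone else with 0.2
population$p_volunteer <- ifelse(population$Mindfulness == 1, 0.8, 0.2)

# Each student makes one yes/no decision using their own probability
population$volunteered <- rbinom(
  n = nrow(population), size = 1, prob = population$p_volunteer
)

# The sample is whoever volunteered
biased_sample <- population[population$volunteered == 1, ]

# Estimate the prevalence from the self-selected sample
biased_prevalence <- mean(biased_sample$Mindfulness)
cat("Number of volunteers:", nrow(biased_sample), "\n")
cat("Estimated prevalence (volunteer sample):", round(biased_prevalence, 3), "\n")
```

Comparing this estimate with the population prevalence shows how strongly self-selection can inflate the result.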
Discuss your results
Discussion Guide: Comparing Random and Non-Random Sampling
Potential Bias in the Selected Sample:
- Discuss how the sampling process affected the composition of each sample:
  - How does this affect the conclusions you can draw from the study?
- Explore the consequences of using the non-random sampling process:
  - Would the findings from the study apply to the general population?
Real-World Applications:
- Reflect on when it might be acceptable to use non-random sampling (e.g., when convenience or targeting specific groups is necessary).
- Highlight the importance of acknowledging and adjusting for bias when using non-random samples in research.
- How might this issue affect your own research?