A webR tutorial
Construct a Confidence Interval for a Mean using Frequentist Approaches
Introduction
You and a colleague were hired by a large corporation to assess the average skill level of employees for a critical skill. A score of 70 on a designated skill test indicates the minimum sufficient skill level. The company needs assurance that the average skill level of its employees exceeds this threshold.
You randomly sample 250 employees and administer the skill test (possible scores range from 0 to 100). Your task is to compute the 99% confidence interval for the mean skill level using both a bootstrap approach and a parametric (theory-based) approach. Based on the confidence intervals, determine whether there is evidence that the average skill level exceeds the sufficient level of 70. That is, determine if the corporation can feel confident that the average skill level exceeds the threshold.
Simulate the data
Press Run Code on the code chunk below to simulate the data frame for this example.
This code creates a data frame called skill_data for 250 employees. The skill of the employees (called skill_score) is normally distributed: the rnorm() function generates a normally distributed variable for n cases (250 employees in this case) with a specified mean and standard deviation.
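A minimal sketch of such a simulation might look like the following. The seed, the mean (72), and the standard deviation (10) are illustrative assumptions, not necessarily the values used in this tutorial's code chunk.

```r
# Hypothetical seed so the simulated data are reproducible
set.seed(2024)

# Simulate skill scores for 250 employees.
# mean = 72 and sd = 10 are assumed values chosen for illustration only.
skill_data <- data.frame(
  skill_score = rnorm(n = 250, mean = 72, sd = 10)
)

# Peek at the first few rows
head(skill_data)
```

Because rnorm() is unbounded, a few simulated scores could fall outside the 0 to 100 range of the test; pmin() and pmax() could clip them if that matters for your purposes.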
Plot the data
In the code chunk below, please create a histogram of skill_score.
Study the graph for a few moments and jot down some observations.
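One way to draw the histogram uses base R's hist(), sketched below. The data are re-simulated here with an assumed seed, mean, and standard deviation so the example runs on its own; in the tutorial you would use the skill_data you already created. The dashed reference line at 70 is an optional addition.

```r
# Stand-in data (hypothetical seed, mean, and sd) so the sketch is self-contained
set.seed(2024)
skill_data <- data.frame(skill_score = rnorm(n = 250, mean = 72, sd = 10))

# Histogram of the skill scores; hist() also returns the bin counts invisibly
h <- hist(
  skill_data$skill_score,
  breaks = 20,
  main = "Distribution of skill scores",
  xlab = "Skill score"
)

# Dashed vertical line marking the sufficiency threshold of 70
abline(v = 70, lty = 2)
```

When studying the plot, it may help to note the center, the spread, the overall shape, and how much of the distribution lies above the threshold.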
Compute confidence intervals
Working in pairs, one member of your team should compute the 99% CI for the mean skill score using a bootstrap approach (utilize 1000 bootstrap resamples) and the other should compute the 99% CI for the mean skill score using a theory-based/parametric approach. Please do the work in the code chunks below. When done, teach one another how you accomplished the task and compare the two 99% CIs.
Bootstrap CI
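One possible approach is a percentile bootstrap, sketched below in base R. Choosing the percentile method is an assumption (other bootstrap intervals exist), and the data are re-simulated with an assumed seed, mean, and standard deviation so the example runs on its own.

```r
# Stand-in data (hypothetical seed, mean, and sd) so the sketch is self-contained
set.seed(2024)
skill_data <- data.frame(skill_score = rnorm(n = 250, mean = 72, sd = 10))

n_boot <- 1000  # number of bootstrap resamples requested in the task

# Each resample draws 250 scores with replacement and records the mean
boot_means <- replicate(
  n_boot,
  mean(sample(skill_data$skill_score, replace = TRUE))
)

# Percentile 99% CI: the middle 99% of the bootstrap means
boot_ci <- quantile(boot_means, probs = c(0.005, 0.995))
boot_ci
```

If the lower limit of the interval lies above 70, the data provide evidence that the average skill level exceeds the threshold.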
Parametric/Theory-based CI
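A theory-based interval for a mean is commonly computed as the sample mean plus or minus a t critical value times the standard error, s / sqrt(n). The sketch below does this by hand and then compares the result against t.test(). The data are re-simulated with an assumed seed, mean, and standard deviation so the example runs on its own.

```r
# Stand-in data (hypothetical seed, mean, and sd) so the sketch is self-contained
set.seed(2024)
skill_data <- data.frame(skill_score = rnorm(n = 250, mean = 72, sd = 10))

x <- skill_data$skill_score
n <- length(x)

x_bar  <- mean(x)                 # sample mean
se     <- sd(x) / sqrt(n)         # standard error of the mean
t_crit <- qt(0.995, df = n - 1)   # critical value leaving 0.5% in each tail

# 99% CI computed by hand
theory_ci <- c(lower = x_bar - t_crit * se, upper = x_bar + t_crit * se)
theory_ci

# The same interval from the built-in one-sample t procedure
t.test(x, conf.level = 0.99)$conf.int
```

With 250 observations, the t-based interval and a percentile bootstrap interval will typically be close to one another, which makes for a useful comparison when you teach each other your approach.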
Further explorations
If you'd like to further build your intuition about confidence intervals, you could explore the following modifications (now if there's time, or later):
Varying Confidence Levels
Task: Compute confidence intervals at different levels (e.g., 90%, 95%, and 99%) and compare the widths of the intervals.
Goal: This will help you understand that higher confidence levels lead to wider intervals and why that might be important in practice.
Discussion Point: What are the trade-offs between narrower confidence intervals and the level of certainty in the conclusions?
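As a starting point, the sketch below computes t-based intervals at the three confidence levels and reports their widths. It uses stand-in data with an assumed seed, mean, and standard deviation; the same loop could be adapted to bootstrap intervals by changing the quantile probabilities.

```r
# Stand-in data (hypothetical seed, mean, and sd) so the sketch is self-contained
set.seed(2024)
x <- rnorm(n = 250, mean = 72, sd = 10)

conf_levels <- c(0.90, 0.95, 0.99)

# Width (upper limit minus lower limit) of the t-based CI at each level
widths <- sapply(conf_levels, function(cl) {
  ci <- t.test(x, conf.level = cl)$conf.int
  ci[2] - ci[1]
})

data.frame(conf_level = conf_levels, width = widths)
```

The data are identical in all three rows, so any difference in width comes only from the critical value, which grows with the confidence level.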
Smaller Sample Sizes
Task: Compute confidence intervals using smaller sample sizes (e.g., 50 or 100 employees) to see how sample size affects the confidence interval's width and precision.
Goal: This will help you grasp how larger samples lead to more precise estimates and narrower intervals.
Discussion Point: How much data is needed to ensure a reliable estimate?
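One way to explore this is to draw samples of different sizes from the same population and compare the 99% interval widths, as sketched below. The seed, population mean, and standard deviation are assumptions for illustration, and ci_width() is a hypothetical helper defined only for this example.

```r
# Hypothetical seed for reproducibility
set.seed(2024)

sample_sizes <- c(50, 100, 250)

# Hypothetical helper: simulate n scores (assumed mean 72, sd 10)
# and return the width of the 99% t-based CI
ci_width <- function(n) {
  x  <- rnorm(n = n, mean = 72, sd = 10)
  ci <- t.test(x, conf.level = 0.99)$conf.int
  ci[2] - ci[1]
}

widths <- sapply(sample_sizes, ci_width)

data.frame(n = sample_sizes, width = widths)
```

Because the standard error shrinks in proportion to 1 / sqrt(n), quadrupling the sample size roughly halves the interval width; individual runs vary somewhat because each sample has its own standard deviation.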