A webR tutorial

Construct a Confidence Interval for a Mean using Frequentist Approaches

Introduction

You and a colleague were hired by a large corporation to assess the average skill level of employees for a critical skill. A score of 70 on a designated skill test indicates the minimum sufficient skill level. The company needs assurance that the average skill level of its employees exceeds this threshold.

You randomly sample 250 employees and administer the skill test (possible scores range from 0 to 100). Your task is to compute a 99% confidence interval for the mean skill level using both a bootstrap approach and a parametric (theory-based) approach. Based on the confidence intervals, determine whether there is evidence that the average skill level exceeds the sufficient level of 70. That is, determine whether the corporation can feel confident that the average skill level exceeds the threshold, which would be supported if the entire interval lies above 70.

Simulate the data

Press Run Code on the code chunk below to simulate the data frame for this example.

generate data
code explanation

This code creates a data frame called skill_data for 250 employees. The employees' skill scores (skill_score) are drawn from a normal distribution: the rnorm() function generates n normally distributed values (here, n = 250, one per employee) with a specified mean and standard deviation.
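The tutorial's own code chunk is not reproduced in this text, so the following is only a minimal sketch of what such a simulation might look like. The seed, the mean of 75, and the standard deviation of 10 are assumptions chosen for illustration; only the names skill_data and skill_score, the sample size of 250, and the use of rnorm() come from the description above.

```r
# Fix the random seed so the simulated data are reproducible.
# ASSUMPTION: the seed value is arbitrary and not taken from the tutorial.
set.seed(1234)

n_employees  <- 250  # sample size stated in the tutorial
assumed_mean <- 75   # ASSUMPTION: illustrative population mean
assumed_sd   <- 10   # ASSUMPTION: illustrative population standard deviation

# One row per sampled employee, with a normally distributed skill score.
skill_data <- data.frame(
  skill_score = rnorm(n = n_employees, mean = assumed_mean, sd = assumed_sd)
)

# Inspect the first few rows and a numeric summary.
head(skill_data)
summary(skill_data$skill_score)
```

Because rnorm() is unbounded, a few simulated scores could fall outside the 0 to 100 range of the test; this sketch does not truncate them.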

Plot the data

In the code chunk below, please create a histogram of skill_score.

code
get help
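One possible way to draw the histogram is sketched below using base R's hist(). The data are re-simulated with assumed parameters (seed, mean of 75, standard deviation of 10) so that the example runs on its own; the number of bins, the colors, and the reference line at 70 are illustrative choices rather than anything the tutorial prescribes.

```r
# Re-create the example data so this block is self-contained.
# ASSUMPTION: seed, mean, and sd are illustrative, not the tutorial's values.
set.seed(1234)
skill_data <- data.frame(skill_score = rnorm(250, mean = 75, sd = 10))

# Histogram of the skill scores.
hist(
  skill_data$skill_score,
  breaks = 20,                          # approximate number of bins
  main   = "Distribution of skill scores",
  xlab   = "Skill score",
  col    = "lightblue",
  border = "white"
)

# Dashed vertical line at the sufficiency threshold of 70.
abline(v = 70, lty = 2, lwd = 2)
```

When studying the plot, it may help to note where the center of the distribution sits relative to the dashed line and how spread out the scores are.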

Study the graph for a few moments and jot down some observations.

Compute confidence intervals

Working in pairs, one member of your team should compute the 99% CI for the mean skill score using a bootstrap approach (use 1,000 bootstrap resamples), and the other should compute the 99% CI for the mean skill score using a theory-based (parametric) approach. Please do the work in the code chunks below. When done, teach one another how you accomplished the task and compare the two 99% CIs.
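For the bootstrap half of the task, a percentile bootstrap is one common option, sketched here in base R. It is not necessarily the method the tutorial intends (other bootstrap intervals exist, and packages such as infer or boot could be used instead). The simulated data again rely on an assumed seed, mean, and standard deviation.

```r
# Re-create the example data so this block is self-contained.
# ASSUMPTION: seed, mean, and sd are illustrative, not the tutorial's values.
set.seed(1234)
skill_data <- data.frame(skill_score = rnorm(250, mean = 75, sd = 10))

n_resamples <- 1000  # number of bootstrap resamples requested in the task

# Each resample draws 250 scores with replacement and records their mean.
boot_means <- replicate(
  n_resamples,
  mean(sample(skill_data$skill_score, replace = TRUE))
)

# Percentile 99% CI: the middle 99% of the bootstrap means.
boot_ci <- quantile(boot_means, probs = c(0.005, 0.995))
boot_ci
```

The interval cuts 0.5% from each tail of the bootstrap distribution, which leaves the central 99%.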

Parametric/Theory-based CI

code
get help
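A theory-based interval for a mean is usually the t-interval, sample mean plus or minus a critical t value times the standard error. The sketch below computes it by hand and then checks the result against t.test(). The simulated data use an assumed seed, mean, and standard deviation, so the final comparison with 70 only illustrates the decision rule and says nothing about the tutorial's actual data.

```r
# Re-create the example data so this block is self-contained.
# ASSUMPTION: seed, mean, and sd are illustrative, not the tutorial's values.
set.seed(1234)
skill_data <- data.frame(skill_score = rnorm(250, mean = 75, sd = 10))

n      <- nrow(skill_data)
x_bar  <- mean(skill_data$skill_score)          # sample mean
se     <- sd(skill_data$skill_score) / sqrt(n)  # standard error of the mean
t_crit <- qt(0.995, df = n - 1)                 # critical value for 99% confidence

# Manual t-interval: mean +/- critical value * standard error.
manual_ci <- c(lower = x_bar - t_crit * se, upper = x_bar + t_crit * se)
manual_ci

# The same interval from the built-in one-sample t procedure.
builtin_ci <- t.test(skill_data$skill_score, conf.level = 0.99)$conf.int
builtin_ci

# Decision rule: is the whole interval above the threshold of 70?
manual_ci[["lower"]] > 70
```

With a sample of 250, the bootstrap and t-based intervals will typically be close, which is a useful point to raise when comparing results with your partner.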

Further explorations

If you’d like to further build your intuition about confidence intervals, you could explore the following modifications (now if there’s time, or later):

Varying Confidence Levels

Task: Compute confidence intervals at different levels (e.g., 90%, 95%, and 99%) and compare the widths of the intervals.
Goal: This will help you understand that higher confidence levels lead to wider intervals and why that might be important in practice.
Discussion Point: What are the trade-offs between narrower confidence intervals and the level of certainty in the conclusions?
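One way to approach this task is to loop over several confidence levels and tabulate the interval widths, as in the sketch below. It uses the t-interval from t.test() and simulated data with an assumed seed, mean, and standard deviation; the helper names conf_levels and ci_table are hypothetical.

```r
# Re-create the example data so this block is self-contained.
# ASSUMPTION: seed, mean, and sd are illustrative, not the tutorial's values.
set.seed(1234)
skill_data <- data.frame(skill_score = rnorm(250, mean = 75, sd = 10))

conf_levels <- c(0.90, 0.95, 0.99)

# Compute one t-interval per confidence level and collect the results.
ci_table <- do.call(rbind, lapply(conf_levels, function(level) {
  ci <- t.test(skill_data$skill_score, conf.level = level)$conf.int
  data.frame(
    conf_level = level,
    lower      = ci[1],
    upper      = ci[2],
    width      = ci[2] - ci[1]
  )
}))

ci_table
```

The width column should grow as the confidence level rises, because a larger critical value multiplies the same standard error.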

Smaller Sample Sizes

Task: Compute confidence intervals using smaller sample sizes (e.g., 50 or 100 employees) to see how sample size affects the confidence interval’s width and precision.
Goal: This will help you grasp how larger samples lead to more precise estimates and narrower intervals.
Discussion Point: How much data is needed to ensure a reliable estimate?
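To explore sample size, one option is to draw random subsamples of different sizes from the simulated employees and compare the resulting 99% t-intervals, as sketched below. The seed, mean, and standard deviation are assumptions, and the names sample_sizes and size_table are hypothetical. Subsampling from the existing data is one design choice; simulating fresh data at each size would also work.

```r
# Re-create the example data so this block is self-contained.
# ASSUMPTION: seed, mean, and sd are illustrative, not the tutorial's values.
set.seed(1234)
skill_data <- data.frame(skill_score = rnorm(250, mean = 75, sd = 10))

sample_sizes <- c(50, 100, 250)

# For each size, take a random subsample (without replacement)
# and compute a 99% t-interval for the mean.
size_table <- do.call(rbind, lapply(sample_sizes, function(n) {
  sub_scores <- sample(skill_data$skill_score, size = n)
  ci <- t.test(sub_scores, conf.level = 0.99)$conf.int
  data.frame(
    n     = n,
    lower = ci[1],
    upper = ci[2],
    width = ci[2] - ci[1]
  )
}))

size_table
```

Since the standard error shrinks in proportion to one over the square root of n, quadrupling the sample size roughly halves the interval width.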