Lecture Materials for Week 6
September 29, 2025
Exam Day

Study Guide for Midterm
dplyr Verbs
Be able to:
- Define what each verb does: select(), filter(), arrange(), mutate(), summarize(), group_by(), left_join().
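As a quick refresher, here is a minimal sketch that uses each verb once. The `students` and `advisors` tables, their columns, and their values are all hypothetical and invented purely for illustration.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
students <- tibble(
  id    = c(1, 2, 3, 4),
  major = c("Bio", "Bio", "Stat", "Stat"),
  score = c(82, 91, 75, 88)
)
advisors <- tibble(
  major   = c("Bio", "Stat"),
  advisor = c("Lee", "Patel")
)

result <- students |>
  select(id, major, score) |>           # keep only these columns
  filter(score >= 80) |>                # keep rows that meet a condition
  mutate(score_prop = score / 100) |>   # add a new column
  arrange(desc(score)) |>               # sort rows, highest score first
  left_join(advisors, by = "major")     # add matching columns from advisors

by_major <- students |>
  group_by(major) |>                    # define the groups
  summarize(mean_score = mean(score))   # collapse to one row per group
```

Try describing in words what `result` and `by_major` contain before running the code.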
Common R Operators
Review the meaning of these:
- Logical operators: ==, !=, |, &, %in%
- Assignment: <-
- Pipe: |>
- Package function access: ::
- Comparison: <=
Make sure you can match operator to description.
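The operators above can be tried out on a small vector. This is only a sketch; the vector `x` and its values are made up for illustration.

```r
x <- c(3, 7, 10)       # <- assigns a value to a name

x == 7                 # equal to:                 FALSE  TRUE FALSE
x != 7                 # not equal to:              TRUE FALSE  TRUE
x <= 7                 # less than or equal to:     TRUE  TRUE FALSE
x < 5 | x > 8          # OR:                        TRUE FALSE  TRUE
x > 5 & x < 9          # AND:                      FALSE  TRUE FALSE
x %in% c(3, 10)        # is each element in a set?  TRUE FALSE  TRUE

x |> mean()            # pipe: the same as mean(x)
stats::median(x)       # :: calls median() from the stats package
```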
Variable Types Basics
Understand:
- Variable types: categorical (factors), continuous (doubles/integers).
- What “glimpse” output tells you (rows, columns, data types).
- Identify if a variable is nominal, continuous, or has missing values.
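For practice reading glimpse output, here is a small sketch. The `patients` table and its columns are hypothetical, chosen so that each variable type and one missing value appear.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
patients <- tibble(
  age    = c(34.5, 51.2, NA, 47.0),             # continuous (double), one missing value
  visits = c(2L, 5L, 1L, 3L),                   # integer
  smoker = factor(c("no", "yes", "no", "no"))   # categorical (factor)
)

glimpse(patients)           # prints the number of rows and columns, plus each column's type
colSums(is.na(patients))    # counts missing values in each column
```

In the glimpse output, `<dbl>` marks doubles, `<int>` marks integers, and `<fct>` marks factors.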
Reading R Code for Basic Wrangling Tasks
Be comfortable interpreting:
- filter() with logical conditions.
- group_by() + summarise() for grouped means.
- mutate() + case_when() for new variables.
- arrange(desc(...)) for sorting.
Check out this practice activity if you wish: Recreating a Pew Graph
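Here is one pipeline of the kind you might be asked to interpret. It is a sketch using the built-in `mtcars` data; the `size` labels and the cutoffs are arbitrary choices for illustration, not something from lecture.

```r
library(dplyr)

summary_tbl <- mtcars |>
  filter(hp > 100 & am == 1) |>          # keep manual cars with more than 100 horsepower
  mutate(size = case_when(               # build a new categorical variable from cyl
    cyl == 4 ~ "small",
    cyl == 6 ~ "medium",
    TRUE     ~ "large"                   # everything else
  )) |>
  group_by(size) |>                      # one group per size label
  summarise(mean_mpg = mean(mpg),        # grouped mean of mpg
            n = n()) |>                  # number of cars in each group
  arrange(desc(mean_mpg))                # sort groups from highest to lowest mean

summary_tbl
```

A good self-check is to say what each line does, and what one row of the final table represents, before you run it.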
Interpreting Plots
Know:
- The common geometries
- The role of faceting (breaking plots into subplots by category).
- Interpret relationships between two quantitative variables.
- Identify which aesthetic maps to color/shape.
- Distinguish between variables on x vs. y axes.
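To connect these ideas to code, here is a minimal sketch using the `mpg` data that ships with ggplot2. The choice of variables is only an example.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ,      # variable on the x axis
                     y = hwy,        # variable on the y axis
                     color = drv)) + # aesthetic mapped to color
  geom_point() +                     # geometry: scatterplot of two quantitative variables
  facet_wrap(~ class) +              # faceting: one subplot per vehicle class
  labs(x = "Engine displacement (L)",
       y = "Highway mpg",
       color = "Drive type")

p
```

When reading a plot like this, identify the geometry, the x and y variables, what color encodes, and what each facet represents.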
Normal Distribution Basics
Review:
- Empirical Rule (68–95–99.7).
- Mean, median, and mode in symmetric distributions.
- Interpreting density plots.
- What “density” means.
Applying the Normal Distribution
Be able to:
- Apply the Empirical Rule with mean and SD.
- Estimate ranges for 68%, 95%, 99.7% of data.
- Decide whether an observed value falls within those ranges.
Check out this practice activity from lecture.
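A worked sketch of the Empirical Rule follows. The mean of 100, the SD of 15, and the observed value of 135 are hypothetical numbers chosen for easy arithmetic.

```r
mu    <- 100   # hypothetical mean
sigma <- 15    # hypothetical standard deviation

range_68  <- mu + c(-1, 1) * sigma   # about 68% of values:    85 to 115
range_95  <- mu + c(-2, 2) * sigma   # about 95% of values:    70 to 130
range_997 <- mu + c(-3, 3) * sigma   # about 99.7% of values:  55 to 145

observed <- 135
in_95  <- observed >= range_95[1]  & observed <= range_95[2]    # FALSE: 135 is above 130
in_997 <- observed >= range_997[1] & observed <= range_997[2]   # TRUE: 135 is below 145
```

So a value of 135 would be unusual (outside the middle 95%) but still within three SDs of the mean.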
Interpret Cross Tabulations for Screening Tests & Disease
Know definitions and calculations:
- True Positives, True Negatives, False Positives, False Negatives.
- Prevalence = proportion of sample with disease.
- Sensitivity
- Specificity
- Positive Predictive Value
Check out these practice activities if you wish: Bayes theorem with cross tabs and interpreting cross tabulations using your own inputs.
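The calculations can be sketched from the four cells of a 2×2 table. The counts below are hypothetical and chosen only so the arithmetic is easy to check by hand.

```r
# Hypothetical counts, invented for illustration only
TP <- 90    # has disease, tests positive
FN <- 10    # has disease, tests negative
FP <- 45    # no disease, tests positive
TN <- 855   # no disease, tests negative

n_total <- TP + FN + FP + TN             # 1000

prevalence  <- (TP + FN) / n_total       # 100 / 1000 = 0.10
sensitivity <- TP / (TP + FN)            # 90 / 100   = 0.90
specificity <- TN / (TN + FP)            # 855 / 900  = 0.95
ppv         <- TP / (TP + FP)            # 90 / 135, about 0.667
```

Notice that the PPV is well below the sensitivity here because the disease is uncommon, so false positives make up a large share of all positive tests.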
Confidence Intervals & t-distribution
Review:
- When to use critical t values.
- How confidence intervals change with sample size or confidence level.
- Standard error for a mean and a proportion
- Interpretation of confidence intervals.
- Random vs. convenience sampling impact.
Check out these practice activities if you wish: using pnorm() and qnorm(), calculating CIs for means, calculating CIs for proportions, dangers of non-random sampling, and dangers of very small samples.
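Here is a sketch of both confidence interval calculations from summary statistics. All of the summary numbers are hypothetical, and the example assumes a 95% confidence level, with a critical t value for the mean and a critical z value for the proportion.

```r
# 95% CI for a mean (hypothetical summary statistics)
xbar <- 50                             # sample mean
s    <- 10                             # sample standard deviation
n    <- 25                             # sample size

se_mean <- s / sqrt(n)                 # standard error of the mean: 10 / 5 = 2
t_crit  <- qt(0.975, df = n - 1)       # critical t value with n - 1 degrees of freedom
ci_mean <- xbar + c(-1, 1) * t_crit * se_mean

# 95% CI for a proportion (hypothetical summary statistics)
p_hat <- 0.40                          # sample proportion
n_p   <- 200                           # sample size

se_prop <- sqrt(p_hat * (1 - p_hat) / n_p)   # standard error of a proportion
z_crit  <- qnorm(0.975)                      # critical z value, about 1.96
ci_prop <- p_hat + c(-1, 1) * z_crit * se_prop

ci_mean
ci_prop
```

Try changing `n` or swapping 0.975 for 0.995 (a 99% interval) to see how the width of the interval responds.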
Conceptual Statistical Ideas from Modules 5-9
Key topics:
- t-distribution vs. normal distribution.
- Bootstrapping basics.
- Precision vs. accuracy.
- R functions: pnorm(), qnorm()
- Central Limit Theorem.
- Importance of random sampling.
- Purpose of confidence intervals.
- Probability Mass Function (PMF) and Cumulative Distribution Function (CDF).
- Standard error vs. standard deviation.
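Several of these ideas can be seen in a few lines of R. This is a sketch only; the sample `x` is hypothetical, and the 2000 bootstrap resamples is an arbitrary choice.

```r
# pnorm() is the normal CDF; qnorm() is its inverse
pnorm(1.96)             # P(Z <= 1.96), about 0.975
qnorm(0.975)            # the z value with 97.5% of the area below it, about 1.96

# t vs. normal: critical t values are larger for small samples
qt(0.975, df = 5)       # about 2.57
qt(0.975, df = 1000)    # very close to 1.96

# Standard deviation vs. standard error (hypothetical sample)
x <- c(12, 15, 9, 20, 14, 17, 11, 16)
sd_x <- sd(x)                       # spread of the individual values
se_x <- sd_x / sqrt(length(x))      # uncertainty in the sample mean

# Bootstrapping basics: resample with replacement, recompute the mean each time
set.seed(1)
boot_means <- replicate(2000, mean(sample(x, replace = TRUE)))
sd(boot_means)                      # bootstrap estimate of the standard error
```

The bootstrap estimate should land near `se_x`, which shows that the standard error describes how much the sample mean varies from sample to sample.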
Practice Tips
- Write your own dplyr pipelines.
- Sketch histograms, bar charts, and scatterplots and label key features.
- Practice confidence interval calculations with small datasets.
- Work through sensitivity/specificity examples with a 2×2 table.
- Use R functions (pnorm, qnorm) to answer probability questions.
- Practice hand calculations of standard errors and confidence intervals when given summary statistics (sample mean, standard deviation, sample size) and the appropriate critical value (z-score or t-score).
What Won’t be on the Exam?
- Bayesian uncertainty estimation (nothing about Bayesian inference from Module 9)
- I won’t ask you about obscure or rarely used functions/arguments — just the basics that we’ve used many times in lecture and lab
- No complex formulas — just know how to compute a standard error for a mean and a proportion, and the associated CIs.
- Nothing on the binomial distribution (just understand the difference between a PMF, PDF, and CDF).
- You won’t actually write any R functions, but I will print out the function call and output for you to interpret (e.g., print out the results of a wrangling pipe and ask you what it’s doing, or print the results of qnorm() and ask you to use the result to compute a CI).
Test structure
There are 100 questions, including True/False, Multiple Choice, and Matching. My best guess is that it will take the average student about 1.5 hours to complete, but you have the whole class period. Please bring a pencil and a basic calculator (not your phone). In class on Monday, I will give each student four 3x5 index cards; you can handwrite whatever you like on these cards and use them during the exam.