Lecture Materials for Week 6

September 29, 2025

Exam Day

Study Guide for Midterm

dplyr Verbs

Be able to:

  • Define what each verb does: select(), filter(), arrange(), mutate(), summarize(), group_by(), left_join().
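As a quick refresher, here is a minimal sketch that uses each verb once. The `patients` and `clinics` tables are invented for illustration and are not from lecture or lab.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
patients <- tibble(
  id     = c(1, 2, 3, 4),
  clinic = c("A", "A", "B", "B"),
  age    = c(34, 51, 29, 62)
)
clinics <- tibble(
  clinic = c("A", "B"),
  region = c("north", "south")
)

result <- patients |>
  select(id, clinic, age) |>            # keep only these columns
  filter(age >= 30) |>                  # keep rows that meet a condition
  mutate(age_months = age * 12) |>      # add a new column
  arrange(age) |>                       # sort rows, smallest age first
  left_join(clinics, by = "clinic")     # add matching columns from clinics

by_clinic <- result |>
  group_by(clinic) |>                   # define the groups
  summarize(mean_age = mean(age))       # collapse each group to one row

result
by_clinic
```

After the filter, three patients remain (ages 34, 51, 62), so the grouped means are 42.5 for clinic A and 62 for clinic B.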

Common R Operators

Review the meaning of these:

  • Logical operators: ==, !=, |, &, %in%
  • Assignment: <-
  • Pipe: |>
  • Package function access: ::
  • Comparison: <=

Make sure you can match each operator to its description.
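The short sketch below shows each operator on a small made-up vector `x`. It assumes R 4.1 or later, which is when the native pipe `|>` became available.

```r
x <- c(2, 5, 8)          # <- assigns a value to a name

x == 5                   # equal to:          FALSE  TRUE FALSE
x != 5                   # not equal to:       TRUE FALSE  TRUE
x <= 5                   # less than or equal: TRUE  TRUE FALSE
x < 3 | x > 6            # OR:                 TRUE FALSE  TRUE
x > 3 & x < 6            # AND:               FALSE  TRUE FALSE
x %in% c(2, 8)           # membership:         TRUE FALSE  TRUE

x |> mean()              # pipe: same as mean(x), which is 5
stats::sd(x)             # :: calls sd() from the stats package, which is 3
```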

Variable Types Basics

Understand:

  • Variable types: categorical (factors), continuous (doubles/integers).
  • What “glimpse” output tells you (rows, columns, data types).
  • Identify if a variable is nominal, continuous, or has missing values.
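To see how these ideas show up in practice, here is a small sketch with an invented `survey` table (hypothetical data, not from class). In the glimpse() output, look for the row and column counts at the top and the type tag next to each variable: <fct> for factor, <dbl> for double, <int> for integer.

```r
library(dplyr)

# Hypothetical data, invented for illustration
survey <- tibble(
  region = factor(c("West", "South", "West", NA)),  # nominal (factor)
  income = c(52.5, 61.0, NA, 48.2),                  # continuous (double)
  visits = c(1L, 4L, 2L, 3L)                         # whole numbers (integer)
)

glimpse(survey)                        # rows, columns, and a type tag per column
missing_counts <- colSums(is.na(survey))   # number of missing values per column
missing_counts
```

Here `region` and `income` each have one missing value and `visits` has none.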

Reading R Code for Basic Wrangling Tasks

Be comfortable interpreting:

  • filter() with logical conditions.
  • group_by() + summarise() for grouped means.
  • mutate() + case_when() for new variables.
  • arrange(desc(...)) for sorting.
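Here is one pipeline that combines all four patterns, as a sketch of the kind of code you might be asked to read. It uses the built-in mtcars data; the cutoffs (hp over 100, weight under 3) are arbitrary choices made for this example.

```r
library(dplyr)

# mtcars ships with R; the cutoffs below are arbitrary, chosen for illustration
summary_tbl <- mtcars |>
  filter(hp > 100 & am == 1) |>          # keep manual cars with hp over 100
  mutate(size = case_when(               # create a new categorical variable
    wt < 3 ~ "light",                    # wt is in 1000s of pounds
    TRUE   ~ "heavy"                     # everything else
  )) |>
  group_by(size) |>                      # one group per size category
  summarise(mean_mpg = mean(mpg), n = n()) |>   # grouped mean and group count
  arrange(desc(mean_mpg))                # highest mean first

summary_tbl
```

Reading it top to bottom: rows are dropped first, then a new column is created, then each group collapses to a single row, and finally the rows are sorted.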

Check out this practice activity if you wish: Recreating a Pew Graph

Interpreting Plots

Know:

  • The common geometries
  • The role of faceting (breaking plots into subplots by category).
  • Interpret relationships between two quantitative variables.
  • Identify which aesthetic maps to color/shape.
  • Distinguish between variables on x vs. y axes.
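The sketch below ties each of these ideas to a line of ggplot2 code. It uses the mpg data that ships with ggplot2; the variable choices are only for illustration.

```r
library(ggplot2)

# mpg ships with ggplot2; the variables chosen here are only an example
p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +  # x, y, and color aesthetics
  geom_point() +                                          # scatterplot geometry
  facet_wrap(~ class)                                     # one subplot per vehicle class

print(p)
```

In this plot, engine displacement is on the x axis, highway mileage is on the y axis, drive type maps to color, and each facet shows one vehicle class.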

Normal Distribution Basics

Review:

  • Empirical Rule (68–95–99.7).
  • Mean, median, and mode in symmetric distributions.
  • Interpreting density plots.
  • What “density” means.

Applying the Normal Distribution

Be able to:

  • Apply the Empirical Rule with mean and SD.
  • Estimate ranges for 68%, 95%, 99.7% of data.
  • Decide whether an observed value falls within those ranges.
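As a worked example, suppose a variable is roughly normal with mean 100 and SD 15 (hypothetical numbers chosen for easy arithmetic). The sketch below builds the three Empirical Rule ranges and checks an observed value against the 95% range.

```r
# Hypothetical values: mean 100, SD 15
mu    <- 100
sigma <- 15

k <- c(1, 2, 3)                       # number of SDs for 68%, 95%, 99.7%
ranges <- data.frame(
  coverage = c("68%", "95%", "99.7%"),
  lower    = mu - k * sigma,          # 85, 70, 55
  upper    = mu + k * sigma           # 115, 130, 145
)
ranges

observed  <- 135
within_95 <- observed >= ranges$lower[2] & observed <= ranges$upper[2]
within_95                             # FALSE: 135 is above 130
```

A value of 135 falls outside the 95% range (70 to 130) but inside the 99.7% range (55 to 145), so it is unusual but not extreme.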

Check out this practice activity from lecture.

Interpret Cross Tabulations for Screening Tests & Disease

Know definitions and calculations:

  • True Positives, True Negatives, False Positives, False Negatives.
  • Prevalence = proportion of sample with disease.
  • Sensitivity
  • Specificity
  • Positive Predictive Value
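Here is a sketch of the calculations using the standard definitions. The counts are hypothetical, chosen so the arithmetic is easy to check by hand.

```r
# Hypothetical 2x2 counts, invented for illustration
TP <- 90    # disease present, test positive
FN <- 10    # disease present, test negative
FP <- 45    # disease absent,  test positive
TN <- 855   # disease absent,  test negative

total <- TP + FN + FP + TN               # 1000

prevalence  <- (TP + FN) / total         # 100 / 1000 = 0.10
sensitivity <- TP / (TP + FN)            # 90 / 100   = 0.90
specificity <- TN / (TN + FP)            # 855 / 900  = 0.95
ppv         <- TP / (TP + FP)            # 90 / 135, about 0.667

round(c(prevalence = prevalence, sensitivity = sensitivity,
        specificity = specificity, ppv = ppv), 3)
```

Notice that the test is quite accurate (90% sensitivity, 95% specificity), yet only about two thirds of positive results are true positives, because the disease is present in just 10% of this sample.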

Check out these practice activities if you wish: Bayes' theorem with cross tabs and interpreting cross tabulations using your own inputs.

Confidence Intervals & t-distribution

Review:

  • When to use critical t values.
  • How confidence intervals change with sample size or confidence level.
  • Standard error for a mean and a proportion
  • Interpretation of confidence intervals.
  • Random vs. convenience sampling impact.
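The sketch below shows how a confidence interval for a mean responds to sample size and confidence level. The summary statistics are hypothetical, and `ci_mean()` is a helper written for this example, not a built-in R function.

```r
# Hypothetical summary statistics: sample mean 50, sample SD 10
xbar <- 50
s    <- 10

# Helper written for this example (not a built-in function)
ci_mean <- function(xbar, s, n, level = 0.95) {
  se     <- s / sqrt(n)                            # standard error of the mean
  t_crit <- qt(1 - (1 - level) / 2, df = n - 1)    # critical t value
  c(lower = xbar - t_crit * se, upper = xbar + t_crit * se)
}

ci_25    <- ci_mean(xbar, s, n = 25)                 # baseline 95% CI
ci_100   <- ci_mean(xbar, s, n = 100)                # larger n: narrower
ci_25_99 <- ci_mean(xbar, s, n = 25, level = 0.99)   # higher confidence: wider

rbind(ci_25, ci_100, ci_25_99)
```

With n = 25 the standard error is 10 / 5 = 2 and the critical t value (24 degrees of freedom) is about 2.06, so the 95% interval runs from roughly 45.9 to 54.1.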

Check out these practice activities if you wish: using pnorm() and qnorm(), calculating CIs for means, calculating CIs for proportions, dangers of non-random sampling, and dangers of very small samples.

Conceptual Statistical Ideas from Modules 5-9

Key topics:

  • t-distribution vs. normal distribution.
  • Bootstrapping basics.
  • Precision vs. accuracy.
  • R functions: pnorm(), qnorm()
  • Central Limit Theorem.
  • Importance of random sampling.
  • Purpose of confidence intervals.
  • Probability Mass Function (PMF) and Cumulative Distribution Function (CDF).
  • Standard error vs. standard deviation.
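A few of these ideas can be illustrated in a handful of lines. The sketch below uses a tiny made-up sample `x`; the bootstrap portion is a bare-bones version with an arbitrary seed and number of resamples.

```r
# pnorm() goes from a z value to a probability; qnorm() goes the other way
pnorm(1.96)            # P(Z <= 1.96), about 0.975
qnorm(0.975)           # z with 97.5% of the area below it, about 1.96
pnorm(1) - pnorm(-1)   # area within 1 SD of the mean, about 0.683

# Standard deviation vs. standard error, with hypothetical data
x    <- c(4, 8, 6, 5, 7)
sd_x <- sd(x)                      # spread of individual values, about 1.58
se_x <- sd(x) / sqrt(length(x))    # uncertainty in the sample mean, about 0.71

# Bootstrapping basics: resample the data with replacement many times
set.seed(1)                                         # arbitrary seed
boot_means <- replicate(2000, mean(sample(x, replace = TRUE)))
sd(boot_means)                                      # bootstrap estimate of the SE
quantile(boot_means, c(0.025, 0.975))               # percentile interval for the mean
```

The standard deviation describes how much individual observations vary, while the standard error describes how much the sample mean would vary from sample to sample.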

Practice Tips

  • Write your own dplyr pipelines.
  • Sketch histograms, bar charts, and scatterplots and label key features.
  • Practice confidence interval calculations with small datasets.
  • Work through sensitivity/specificity examples with a 2×2 table.
  • Use R functions (pnorm, qnorm) to answer probability questions.
  • Practice hand calculations of standard errors and confidence intervals when given summary statistics (sample mean, standard deviation, sample size) and the appropriate critical value (z-score or t-score).
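For a proportion, a hand calculation might look like the sketch below. The numbers are hypothetical (120 "yes" responses out of 400), and R is used here only as a calculator to check each step.

```r
# Hypothetical summary: 120 "yes" responses out of 400
n     <- 400
p_hat <- 120 / n                          # sample proportion = 0.30

se     <- sqrt(p_hat * (1 - p_hat) / n)   # sqrt(0.3 * 0.7 / 400), about 0.0229
z_crit <- qnorm(0.975)                    # about 1.96 for a 95% CI
margin <- z_crit * se                     # about 0.045

ci_prop <- c(lower = p_hat - margin, upper = p_hat + margin)
round(ci_prop, 3)                         # about 0.255 to 0.345
```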

What Won’t be on the Exam?

  • Bayesian uncertainty estimation (nothing about Bayesian inference from Module 9)
  • I won’t ask you about obscure or rarely used functions/arguments — just the basics that we’ve used many times in lecture and lab
  • No complex formulas — just know how to compute a standard error for a mean and a proportion, and the associated CIs.
  • Nothing on the binomial distribution (just understand the difference between a PMF, PDF and CDF).
  • You won’t actually write any R functions, but I will print out the function call and output for you to interpret (e.g., print out the results of a wrangling pipe and ask you what it’s doing, or print the results of qnorm() and ask you to use the result to compute a CI.)

Test structure

There are 100 questions, including True/False, Multiple Choice, and Matching. My best guess is that it will take the average student about 1.5 hours to complete, but you have the whole class period. Please bring a pencil and a basic calculator (not your phone). In class on Monday, I will give each student four 3x5 index cards; you can handwrite whatever you like on these cards and use them during the exam.