PSY 652: Research Methods in Psychology I

Classification of Data Science Tasks

Kimberly L. Henry: kim.henry@colostate.edu

Three primary tasks

Description

  • Summarizes features of the world (e.g., diabetes prevalence, social network visualization).
  • Ranges from simple (e.g., mean) to advanced (e.g., cluster analysis).
  • A sample that is representative of the population is especially important here.

Prediction

  • Uses variables (features) to predict an outcome.
  • Ranges from simple (e.g., correlation) to advanced (e.g., complex regression, machine learning).

Causal Inference

  • Identifies the causal effect of an intervention or condition on outcomes (e.g., determining how much vaccination reduces infection rates).
  • Involves randomized experiments or applying methods like confounder adjustment, matching, and instrumental variables to observational data.
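Confounder adjustment can be illustrated with a small simulation. The sketch below is not from the slides. It assumes a hypothetical observational study of vaccination and infection in which older adults are both more likely to be vaccinated and more likely to be infected. All probabilities, the variable names, and the built-in effect of -0.05 are invented for illustration. The naive comparison mixes the effect of vaccination with the effect of age, while stratifying on age and averaging the stratum-specific differences (standardization) should land near the built-in effect.

```python
import random

random.seed(42)

# Assumed (hypothetical) causal effect of vaccination on infection risk.
TRUE_EFFECT = -0.05


def simulate_person():
    """Return (older, vaccinated, infected) for one simulated person."""
    older = random.random() < 0.5
    # The confounder: older people are more likely to be vaccinated...
    vaccinated = random.random() < (0.8 if older else 0.2)
    # ...and also have a higher baseline infection risk.
    risk = 0.10 + (0.20 if older else 0.0) + (TRUE_EFFECT if vaccinated else 0.0)
    infected = random.random() < risk
    return older, vaccinated, infected


people = [simulate_person() for _ in range(40000)]


def infection_rate(rows):
    return sum(infected for _, _, infected in rows) / len(rows)


def risk_difference(rows):
    """Infection rate among the vaccinated minus the rate among the unvaccinated."""
    vaccinated = [r for r in rows if r[1]]
    unvaccinated = [r for r in rows if not r[1]]
    return infection_rate(vaccinated) - infection_rate(unvaccinated)


# Naive estimate: ignores age, so it is confounded.
naive = risk_difference(people)

# Adjusted estimate: risk difference within each age stratum,
# weighted by the share of the population in that stratum.
adjusted = 0.0
for stratum in (True, False):
    rows = [r for r in people if r[0] == stratum]
    adjusted += (len(rows) / len(people)) * risk_difference(rows)

print(f"naive: {naive:+.3f}  adjusted: {adjusted:+.3f}  truth: {TRUE_EFFECT:+.3f}")
```

In this setup the naive estimate is expected to have the wrong sign (vaccination appears harmful), which is the kind of error confounder adjustment is meant to prevent. The adjustment only works here because the single confounder is measured; real observational data rarely offer that guarantee.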

Important Applications

Key applications of Description: COVID-19

Article header: Comparisons between countries are essential for the control of COVID-19

A line graph of COVID-19 deaths for various areas.

Key applications of Description: Biodiversity Loss

Article header: Using Red List Indices to monitor extinction risk at national scales

A map of the US that shows where biodiversity is most at risk.

Key applications of Prediction: Hurricane Forecasting

The front page of a report on hurricane forecasting published by NOAA.

Key applications of Prediction: Medical Decision Making

Key applications of Causal Inference: RCT

Article header: Effect of Aspirin on All-Cause Mortality in the Healthy Elderly

Line graphs depicting mortality due to various diseases as a function of aspirin usage

Key applications of Causal Inference: Air Pollution

Article header: Advances in Causal Inference at the Intersection of Air Pollution and Health Outcomes

Leveraging the Three Data Science Tasks

An Example: Physical Activity and Cardiovascular Health

Description

  • What are you trying to learn?

    • What is the average daily physical activity level (in minutes) among adults aged 40-65 in the U.S.?
  • What are the ideal data to answer this question?

    • A large national health survey (e.g., NHANES) that includes self-reported or accelerometer-measured physical activity data.

    • Demographic information (age, sex, education, etc.) to describe the population.

  • What’s the purpose of this?

    • To summarize and understand the baseline levels of physical activity in this population, which could inform public health recommendations and awareness campaigns.
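As a rough illustration of the descriptive task, the sketch below computes an unweighted and a survey-weighted mean of daily activity minutes. The four records and their weights are made up. National surveys such as NHANES supply sampling weights so that estimates reflect the population rather than just the sample, but the actual NHANES variable names and weighting procedures are not shown here.

```python
from statistics import mean

# Hypothetical survey records: (daily activity in minutes, sampling weight).
# A weight of 2.0 means the respondent stands in for twice as many people
# in the population as a respondent with weight 1.0.
records = [
    (30.0, 1.0),
    (45.0, 2.0),
    (10.0, 1.0),
    (60.0, 1.0),
]


def weighted_mean(pairs):
    """Mean of the values, with each value counted in proportion to its weight."""
    total_weight = sum(weight for _, weight in pairs)
    return sum(value * weight for value, weight in pairs) / total_weight


unweighted = mean(value for value, _ in records)
weighted = weighted_mean(records)

print(f"unweighted mean: {unweighted:.2f} minutes")  # 36.25
print(f"weighted mean:   {weighted:.2f} minutes")    # 38.00
```

The two numbers differ because the weighted version gives the second respondent more influence, which is one reason representativeness matters so much for description.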

Prediction

  • What are you trying to learn?

    • Can wearable device data (e.g., heart rate, step count, and sleep quality) predict the likelihood of developing cardiovascular disease in the next five years among adults aged 40-65?
  • What are the ideal data to answer this question?

    • Longitudinal data from wearable fitness trackers that capture continuous measures like heart rate variability, step count, sleep patterns, and stress levels.

    • Follow-up clinical data on cardiovascular health outcomes (e.g., diagnosis of hypertension, heart disease).

  • What’s the purpose of this?

    • To explore how real-time, personalized data from wearables can be used to predict cardiovascular disease risk, potentially offering more timely and individualized health interventions and a method for early detection of disease.
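A prediction task is usually judged by how well the model performs on people it has not seen. The sketch below is one minimal way to show that, under stated assumptions: the data are simulated, the single feature `steps_z` (a standardized daily step count) and the risk model that generates the outcomes are hypothetical, and the logistic regression is fit with plain gradient descent rather than a statistics library. Held-out performance is summarized with the Brier score (mean squared error of the predicted probabilities) and compared with a baseline that predicts the training prevalence for everyone.

```python
import math
import random

random.seed(0)


def simulate_person():
    """Return (steps_z, cvd) for one simulated adult. Purely hypothetical data."""
    steps_z = random.gauss(0, 1)
    # Assumed data-generating model: fewer steps, higher five-year risk.
    p = 1 / (1 + math.exp(-(-1.0 - 1.5 * steps_z)))
    cvd = 1 if random.random() < p else 0
    return steps_z, cvd


data = [simulate_person() for _ in range(2000)]
train, holdout = data[:1500], data[1500:]


def predict(b0, b1, x):
    """Predicted probability of the outcome from a one-feature logistic model."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))


def fit_logistic(rows, lr=0.5, epochs=500):
    """Fit intercept and slope by gradient descent on the average log loss."""
    b0, b1 = 0.0, 0.0
    n = len(rows)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in rows:
            error = predict(b0, b1, x) - y
            g0 += error
            g1 += error * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def brier(rows, predictor):
    """Mean squared difference between predicted probability and outcome."""
    return sum((predictor(x) - y) ** 2 for x, y in rows) / len(rows)


b0, b1 = fit_logistic(train)
prevalence = sum(y for _, y in train) / len(train)

model_brier = brier(holdout, lambda x: predict(b0, b1, x))
baseline_brier = brier(holdout, lambda x: prevalence)

print(f"slope for steps_z: {b1:.2f}")
print(f"holdout Brier score, model:    {model_brier:.3f}")
print(f"holdout Brier score, baseline: {baseline_brier:.3f}")
```

A lower Brier score than the baseline on the held-out group suggests the feature carries predictive information. It says nothing about whether increasing step counts would change anyone's risk, which is the question the causal inference task addresses.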

Causal Inference

  • What are you trying to learn?

    • Does using an Apple Watch with personalized activity tracking and health notifications lead to a reduction in cardiovascular disease incidence compared to standard care in adults aged 40-65?
  • What are the ideal data to answer this question?

    • A randomized controlled trial (RCT) involving adults aged 40-65 at risk for cardiovascular disease but without a prior diagnosis.
    • Participants are randomly assigned to either receive an Apple Watch, which tracks their physical activity, heart rate, and sleep while providing personalized goals and health insights, or to a control group receiving standard care with general physical activity advice.
    • The primary outcome is the incidence of cardiovascular disease five years post-intervention, with secondary outcomes including changes in physical activity, heart rate variability, and other health indicators.
  • What’s the purpose of this?

    • To determine whether the use of an Apple Watch, which offers personalized activity monitoring and health alerts, has a causal effect on reducing the risk of cardiovascular disease compared to receiving standard health advice.
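The logic of the trial can be sketched with a simulation. Everything numeric here is an assumption made for illustration: the five-year risks of 0.10 under standard care and 0.07 with the watch, the sample size, and the 50/50 allocation. No real trial results are implied. Because assignment is decided by a coin flip, the two arms are comparable on average, so a simple difference in incidence estimates the causal effect.

```python
import random

random.seed(7)

# Hypothetical five-year cardiovascular disease risks, chosen for illustration.
RISK_CONTROL = 0.10
RISK_WATCH = 0.07


def run_trial(n_participants):
    """Randomize participants to two arms and count events in each arm."""
    counts = {"watch": [0, 0], "control": [0, 0]}  # [events, participants]
    for _ in range(n_participants):
        # Random assignment: each participant has a 50% chance of each arm.
        arm = "watch" if random.random() < 0.5 else "control"
        risk = RISK_WATCH if arm == "watch" else RISK_CONTROL
        counts[arm][0] += random.random() < risk
        counts[arm][1] += 1
    return counts


counts = run_trial(20000)
risk_watch = counts["watch"][0] / counts["watch"][1]
risk_control = counts["control"][0] / counts["control"][1]

# Two common effect measures for a binary primary outcome.
risk_difference = risk_watch - risk_control
risk_ratio = risk_watch / risk_control

print(f"incidence, watch arm:   {risk_watch:.3f}")
print(f"incidence, control arm: {risk_control:.3f}")
print(f"risk difference: {risk_difference:+.3f}")
print(f"risk ratio:      {risk_ratio:.2f}")
```

With these assumed risks, the estimated risk difference should fall close to -0.03. A real analysis would also report uncertainty (for example, a confidence interval) and handle dropout and non-adherence, which this sketch leaves out.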

Your turn

A box that delineates Description, Prediction, and Causal Inference tasks -- each with the questions: What are you trying to learn? What are the ideal data to answer this question? What's the purpose of this?