PSY 652: Research Methods in Psychology I

Classification of Data Science Tasks

Kimberly L. Henry: kim.henry@colostate.edu

Three primary tasks

Description

  • Summarizes features of the world (e.g., diabetes prevalence, social network visualization).
  • Ranges from simple (e.g., mean) to advanced (e.g., cluster analysis).

Prediction

  • Uses variables (features) to predict an outcome.
  • Ranges from simple (e.g., correlation) to advanced (e.g., complex regression, machine learning).

Causal Inference

  • Identifies the causal effect of an intervention or condition on outcomes (e.g., determining how much vaccination reduces infection rates).
  • Relies on randomized experiments or, when only observational data are available, on methods such as matching and instrumental variables.

Important Applications

Key applications of Description: COVID-19

Key applications of Description: Biodiversity Loss

Key applications of Prediction: Hurricane Forecasting

Key applications of Prediction: Medical Decision Making

Key applications of Causal Inference: Randomized Controlled Trial (RCT)

Key applications of Causal Inference: Air Pollution

Leveraging the Three Data Science Tasks

An Example: Physical Activity and Cardiovascular Health

Description

  • What are you trying to learn?

    • What is the average daily physical activity level (in minutes) among adults aged 40-65 in the U.S.?
  • What are the ideal data to answer this question?

    • A large national health survey (e.g., NHANES) that includes self-reported or accelerometer-measured physical activity data.

    • Demographic information (age, sex, education, etc.) to describe the population.

  • What’s the purpose of this?

    • To summarize and understand the baseline levels of physical activity in this population, which could inform public health recommendations and awareness campaigns.
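A minimal sketch of the descriptive task, assuming we already have one value per respondent for daily activity minutes. The data here are simulated and hypothetical (not real NHANES values), and the function name `describe_activity` is invented for illustration.

```python
import random
import statistics


def describe_activity(minutes):
    """Summarize daily activity minutes: n, mean, SD, and an approximate 95% CI for the mean."""
    n = len(minutes)
    mean = statistics.mean(minutes)
    sd = statistics.stdev(minutes)      # sample standard deviation
    se = sd / n ** 0.5                  # standard error of the mean
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "ci_low": mean - 1.96 * se,     # normal-approximation interval
        "ci_high": mean + 1.96 * se,
    }


# Simulated, hypothetical sample of adults aged 40-65 (assumption: not real survey data).
rng = random.Random(652)
sample = [max(0.0, rng.gauss(30, 15)) for _ in range(1000)]

summary = describe_activity(sample)
print(f"n = {summary['n']}, mean = {summary['mean']:.1f} min/day, "
      f"95% CI [{summary['ci_low']:.1f}, {summary['ci_high']:.1f}]")
```

This treats the sample as a simple random sample. An analysis of a national survey such as NHANES would also need to account for the survey weights and sampling design, which this sketch leaves out.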

Prediction

  • What are you trying to learn?

    • Can wearable device data (e.g., heart rate, step count, and sleep quality) predict the likelihood of developing cardiovascular disease in the next five years among adults aged 40-65?
  • What are the ideal data to answer this question?

    • Longitudinal data from wearable fitness trackers that capture continuous measures like heart rate variability, step count, sleep patterns, and stress levels.

    • Follow-up clinical data on cardiovascular health outcomes (e.g., diagnosis of hypertension, heart disease).

  • What’s the purpose of this?

    • To explore how real-time, personalized data from wearables can be used to predict cardiovascular disease risk, potentially enabling earlier detection of disease and more timely, individualized health interventions.
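A minimal sketch of the prediction task, assuming a single wearable feature (average daily steps, in thousands) and a binary five-year outcome. The data are simulated, the coefficients used to generate them are arbitrary, and the helper names (`fit_logistic`, `predict_risk`) are hypothetical. It fits a logistic regression by gradient descent and checks predictions on held-out people.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=1000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent on the log loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y   # gradient of the log loss w.r.t. the linear predictor
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def predict_risk(b0, b1, x):
    """Predicted probability of the outcome for feature value x."""
    return sigmoid(b0 + b1 * x)


# Simulated, hypothetical cohort (assumption: not real wearable or clinical data).
rng = random.Random(652)
MEAN_STEPS = 7.0
people = []
for _ in range(1000):
    steps = rng.gauss(MEAN_STEPS, 2.5)        # thousands of steps per day
    x = steps - MEAN_STEPS                    # center the feature
    true_risk = sigmoid(-1.8 - 0.4 * x)       # arbitrary data-generating model
    y = 1 if rng.random() < true_risk else 0  # 1 = developed disease within five years
    people.append((x, y))

# Fit on 800 people, evaluate on the 200 the model has not seen.
train, test = people[:800], people[800:]
b0, b1 = fit_logistic([x for x, _ in train], [y for _, y in train])

# Brier score: mean squared difference between predicted risk and observed outcome.
brier = sum((predict_risk(b0, b1, x) - y) ** 2 for x, y in test) / len(test)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, held-out Brier score = {brier:.3f}")
```

The point of the held-out split is that a prediction model is judged by how well it forecasts outcomes for new people. The fitted coefficient describes an association that is useful for prediction; it is not, by itself, an estimate of a causal effect.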

Causal Inference

  • What are you trying to learn?

    • Does using an Apple Watch with personalized activity tracking and health notifications lead to a reduction in cardiovascular disease incidence compared to standard care in adults aged 40-65?
  • What are the ideal data to answer this question?

    • A randomized controlled trial (RCT) involving adults aged 40-65 at risk for cardiovascular disease but without a prior diagnosis.
    • Participants are randomly assigned to one of two groups: an intervention group that receives an Apple Watch, which tracks physical activity, heart rate, and sleep and provides personalized goals and health insights, or a control group that receives standard care with general physical activity advice.
    • The primary outcome is incidence of cardiovascular disease five years post-intervention, with secondary outcomes including changes in physical activity, heart rate variability, and other health indicators.
  • What’s the purpose of this?

    • To determine whether using an Apple Watch with personalized activity monitoring and health alerts causally reduces the risk of cardiovascular disease relative to receiving standard health advice.
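A minimal sketch of how the primary outcome of such a trial might be analyzed, assuming a two-arm design with a binary five-year outcome. The trial is simulated, the risks (10% under standard care, 7% with the watch) are hypothetical values chosen only for illustration, and `risk_difference` is an invented helper name.

```python
import math
import random


def risk_difference(events_t, n_t, events_c, n_c):
    """Return (risk difference, 95% CI low, 95% CI high) for treatment minus control."""
    p_t = events_t / n_t
    p_c = events_c / n_c
    rd = p_t - p_c
    # Wald standard error for a difference between two independent proportions
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return rd, rd - 1.96 * se, rd + 1.96 * se


# Simulated, hypothetical trial (assumption: these risks are not real estimates).
rng = random.Random(2024)
n = 4000
RISK_CONTROL, RISK_WATCH = 0.10, 0.07

counts = {"watch": [0, 0], "control": [0, 0]}   # [events, participants] per arm
for _ in range(n):
    arm = "watch" if rng.random() < 0.5 else "control"   # random assignment
    risk = RISK_WATCH if arm == "watch" else RISK_CONTROL
    counts[arm][1] += 1
    counts[arm][0] += int(rng.random() < risk)           # 1 = cardiovascular disease by year five

rd, lo, hi = risk_difference(counts["watch"][0], counts["watch"][1],
                             counts["control"][0], counts["control"][1])
print(f"Risk difference (watch - control): {rd:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Because assignment is random, the two arms are comparable on average, so the difference in risk between them can be read as an estimate of the causal effect of the intervention. The same comparison in observational data would not carry that interpretation without further adjustment.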

Your turn