Causal Inference
Causal inference is about more than finding patterns in our data; it's about understanding what causes what. It asks: "If we change one thing, what happens to another?"
To do this, we build scientific models that explain the relationships between variables, using logic and theory to guide us.
Why start with the scientific model? Because if our understanding of how things are connected is wrong, no statistical method can fix it.
An Example: Countries with faster internet speeds (X) have stronger economies (Y). But this correlation is likely spurious: wealthier countries are more likely to invest in advanced infrastructure, including high-speed internet. Economic strength and internet speed are both outcomes of higher national wealth.
An Example: When cruise control is set on a car, changes in road incline (X) causally affect the engine's power output (Z) to maintain the set speed (Y). Despite the causal effect of road incline on engine power, the vehicle speed (Y) remains constant due to the cruise control. Data collected on vehicle speed (Y) and road incline (X) will show no correlation, despite X causing Y. The cruise control is an example of a feedback control mechanism.
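To make the cruise-control example concrete, here is a small simulation sketch (the variable names and coefficients are invented for illustration): the incline drives engine power almost perfectly, yet the realized speed is essentially uncorrelated with the incline.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

incline = rng.normal(0, 1, n)              # X: road incline
noise = rng.normal(0, 0.01, n)             # small disturbances the controller cannot cancel
power = 5.0 + 2.0 * incline                # Z: controller raises power to offset the incline
speed = 100.0 - 2.0 * incline + (power - 5.0) + noise  # Y: incline slows the car; power speeds it up

print(np.corrcoef(incline, speed)[0, 1])   # ~0: X causes Y, yet they are uncorrelated
print(np.corrcoef(incline, power)[0, 1])   # ~1: X clearly drives Z
```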
Last week, you made a decision to either:
Read the Module 18 material (\(X = 1\))
Not read the Module 18 material (\(X = 0\))
Which of these actions will be better for your comprehension of today's lecture?
We have two potential outcomes in this scenario:
One represents your comprehension if you read the material: \(Y^{X = 1}\)
The other represents your comprehension if you didn't read the material: \(Y^{X = 0}\)
If you read the Module 18 material prior to class, then we will observe \(Y^{X = 1}\). The other potential outcome, \(Y^{X = 0}\), won't be observed and will remain the counterfactual (i.e., since X = 0 didn't occur, this is the outcome that can't be observed).
If you didn't read the Module 18 material prior to class, then we will observe \(Y^{X = 0}\). The other potential outcome, \(Y^{X = 1}\), won't be observed and will remain the counterfactual (i.e., since X = 1 didn't occur, this is the outcome that can't be observed).
The individual-level causal effect is \(Y^{X = 1} - Y^{X = 0}\).
That is, the individual-level causal effect is the difference in your comprehension of the lecture if you read the Module 18 material versus if you didn't read the material.
This difference is unknowable!
This is the fundamental problem of causal inference (Holland, 1986).
Let's imagine that I randomly assign half of you to read the Module 18 material and the other half to not read the material.
If you're assigned to read the material, and you read it, then we'll observe \(Y^{X = 1}\).
If you're not assigned to read the material, and you don't read it, then we'll observe \(Y^{X = 0}\).
Therefore, we can compute the Average Causal Effect as:
\[ E\left(Y_i^{X=1}\right) - E\left(Y_i^{X=0}\right) \]
where \(E\) represents expectation (i.e., the average or mean value of a random variable if the process were repeated infinitely many times).
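A minimal potential-outcomes simulation can make this concrete (the comprehension scores and the 8-point effect are invented for illustration). Because both potential outcomes are generated explicitly, we can verify that the simple difference in group means under randomization recovers the average causal effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

y0 = rng.normal(60, 10, n)        # comprehension if the material is not read
y1 = y0 + 8                       # comprehension if it is read (true ACE = 8)

x = rng.integers(0, 2, n)         # random assignment to read (1) or not (0)
y_obs = np.where(x == 1, y1, y0)  # only one potential outcome is ever observed

ace_hat = y_obs[x == 1].mean() - y_obs[x == 0].mean()
print(ace_hat)                    # close to the true average causal effect of 8
```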
Randomization is a tremendously powerful procedure for examining causal effects. However, it's not always perfect in practice:
Non-compliance: Sometimes participants donât follow instructions, which can affect the treatment assignment.
Drop-out: Participants may fail to provide outcome data, leading to missing information.
Unmanipulable causes: In some cases, the cause of interest cannot be directly manipulated (e.g., the causal effect of emotional responses).
Ethical and practical constraints: Randomization might be unethical (e.g., studying the causal effect of drug use) or unfeasible (e.g., studying the effects of socioeconomic status).
In these scenarios, it's helpful to think about the entire causal network underlying the data. Graphical notation can make it easier to understand and analyze these complex relationships.
Directed acyclic graphs (DAGs) are a powerful and intuitive tool for representing causal relationships.
Heuristic Models: DAGs are simplified representations of a causal model, focusing on the structure of relationships between variables rather than the exact mechanisms or mathematical details.
Directed: Each arrow shows the direction of influence between two variables. For example, an arrow from X â Y means X is assumed to cause Y (but not the other way around).
Acyclic: DAGs do not allow loops or cycles; no variable can directly or indirectly cause itself. This ensures clarity in identifying causal pathways.
Anonymous Influence: DAGs donât specify how variables influence each other (e.g., linear or nonlinear effects); they only indicate that one variable influences another.
Start by drawing the exposure (e.g., reading the material) and the outcome (e.g., comprehension of the lecture).
Draw all common causes of X and Y, including measured and unmeasured variables.
As you add new variables, include any common causes of any pair of variables in your DAG. This ensures that the DAG accounts for potential bias introduced by these shared causes.
If there are variables through which X affects Y (i.e., mediators), draw arrows from X to the mediator(s), and from the mediator(s) to Y.
Include all selection variables: these are variables that represent selection processes that occur as part of the study (e.g., drop-out, death, non-compliance).
Drawing a DAG requires expert knowledge, and the DAG must be complete: NOT including an arrow implies an ASSUMPTION that there is NO CAUSAL EFFECT.
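As a sketch, the steps above can be followed in code with the networkx library. The edge list below is one plausible assembly of the variables used in this module (exposure, outcome, a common cause, a compliance mediator), not the lecture's exact figure.

```python
import networkx as nx

dag = nx.DiGraph()
dag.add_edge("READ", "COMPREHEND")           # exposure -> outcome
dag.add_edge("MOTIVATION", "READ")           # a common cause of X ...
dag.add_edge("MOTIVATION", "COMPREHEND")     # ... and of Y
dag.add_edge("READ", "ACTUAL_READ")          # a mediator on the X -> Y path
dag.add_edge("ACTUAL_READ", "COMPREHEND")
dag.add_edge("MOTIVATION", "ACTUAL_READ")    # compliance as a selection-related variable

print(nx.is_directed_acyclic_graph(dag))     # True: no variable causes itself
print(list(nx.topological_sort(dag)))        # an ordering consistent with the arrows
```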
If, instead of randomly assigning students to read or not read, I simply assessed whether students chose to read the material, will the difference in comprehension between those who chose to read vs. not read still represent an average causal effect?
Path: A path is a sequence of arrows between two variables in a Directed Acyclic Graph (DAG). The path must not pass through any variable more than once. The direction of the arrows is irrelevant when defining a path.
Causal Paths: A causal path is a path in which all arrows point in the same direction, starting from the cause (X) and ending at the effect (Y). This path represents a direct or indirect causal relationship from X to Y.
Non-Causal Paths: A non-causal path includes at least one arrow that points toward X (or away from Y when tracing the path). These paths may introduce confounding, colliding, or other non-causal associations between X and Y.
READ -> COMPREHEND (causal path)
READ <- MOTIVATION -> COMPREHEND (non-causal path/back-door)
A back-door path connects the exposure (X) to the outcome (Y) through a common cause, also called a confounder.
Conditioning on the confounder (e.g., adjusting for MOTIVATION) blocks the back-door path.
This breaks the spurious connection and isolates the causal effect, as the simulation sketched below illustrates.
READ <- MOTIVATION -> COMPREHEND
READ -> COMPREHEND (causal path)
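Here is a sketch of back-door adjustment in an assumed linear model (the coefficients are invented): MOTIVATION raises both the chance of reading and comprehension, so the naive comparison overstates the effect of reading, while adjusting for MOTIVATION recovers it.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

motivation = rng.normal(0, 1, n)
read = (motivation + rng.normal(0, 1, n) > 0).astype(float)   # motivated students read more often
comprehend = 8 * read + 5 * motivation + rng.normal(0, 5, n)  # true effect of reading = 8

# Naive comparison is confounded (estimate is too large)
print(comprehend[read == 1].mean() - comprehend[read == 0].mean())

# Adjusting for MOTIVATION blocks the back-door path
X = np.column_stack([np.ones(n), read, motivation])
beta = np.linalg.lstsq(X, comprehend, rcond=None)[0]
print(beta[1])    # close to the true causal effect of 8
```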
READ -> COMPREHEND (causal path)
READ -> ACTUAL_READ -> COMPREHEND (causal path)
READ -> ACTUAL_READ <- MOTIVATION -> COMPREHEND (non-causal path)
READ -> ACTUAL_READ -> COMPREHEND
READ -> ACTUAL_READ <- MOTIVATION -> COMPREHEND
If You Want the Total Effect:
Do not adjust for the mediator (i.e., ACTUAL_READ)
Simply estimate the overall effect of READ -> COMPREHEND
Otherwise, you will "explain away" part of the effect of interest.
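A small simulated example (coefficients invented) of the READ -> ACTUAL_READ -> COMPREHEND chain shows the point: the unadjusted difference in means gives the total effect of assignment, while adding the mediator to the model shrinks the estimate toward the direct effect only.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

read = rng.integers(0, 2, n).astype(float)                               # randomized assignment
actual_read = (0.8 * read + rng.normal(0, 0.5, n) > 0.4).astype(float)   # imperfect compliance
comprehend = 2 * read + 6 * actual_read + rng.normal(0, 5, n)

# Total effect of assignment: the simple difference in means
print(comprehend[read == 1].mean() - comprehend[read == 0].mean())

# Adjusting for the mediator "explains away" the indirect part of the effect
X = np.column_stack([np.ones(n), read, actual_read])
print(np.linalg.lstsq(X, comprehend, rcond=None)[0][1])                  # noticeably smaller
```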
By default, collider paths are blocked, meaning they do not create a spurious association between the exposure and the outcome.
But beware: if you condition on a collider, you will open a non-causal path and create a spurious association.
By conditioning on ACTUAL_READ (e.g., analyzing only those who complied or controlling for compliance in a model), you open a non-causal path between READ and MOTIVATION that would otherwise remain blocked.
This creates a spurious association between READ and COMPREHEND, undermining the causal interpretation.
Revel in the beauty of your experiment, and don't include any post-treatment variables.
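To see the collider problem numerically, here is a toy simulation (coefficients invented) matching the DAG above: READ is randomized, ACTUAL_READ depends on both READ and MOTIVATION, and MOTIVATION also drives COMPREHEND. The unrestricted comparison recovers the total effect; restricting the analysis to students who actually read does not.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

read = rng.integers(0, 2, n).astype(float)                                  # randomized assignment
motivation = rng.normal(0, 1, n)
actual_read = (read + motivation + rng.normal(0, 1, n) > 1).astype(float)   # collider
comprehend = 2 * read + 6 * actual_read + 5 * motivation + rng.normal(0, 5, n)

# Unrestricted randomized comparison: an unbiased estimate of the total effect
print(comprehend[read == 1].mean() - comprehend[read == 0].mean())

# Conditioning on the collider (keeping only students who actually read):
# within this group, assignment and motivation are negatively related,
# so motivation leaks into the comparison and the estimate is distorted.
m = actual_read == 1
print(comprehend[(read == 1) & m].mean() - comprehend[(read == 0) & m].mean())
```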
What are all paths that link Smoking and Cancer in this DAG?
SMK -> CAN
SMK <- SES -> CAN
SMK <- SES -> DTH <- CAN
SMK <- SES <- CI -> DTH <- CAN
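The same enumeration can be reproduced with networkx, which is a convenient way to check that no path has been missed. The edge list below is inferred from the four paths listed above, since the original figure is not reproduced here.

```python
import networkx as nx

dag = nx.DiGraph()
dag.add_edges_from([
    ("SES", "SMK"), ("SMK", "CAN"), ("SES", "CAN"),
    ("SES", "DTH"), ("CAN", "DTH"),
    ("CI", "SES"), ("CI", "DTH"),
])
assert nx.is_directed_acyclic_graph(dag)

# Enumerate every path between SMK and CAN, ignoring arrow direction,
# then classify each as causal (all arrows point away from SMK) or non-causal.
undirected = dag.to_undirected()
for path in nx.all_simple_paths(undirected, "SMK", "CAN"):
    arrows = ["->" if dag.has_edge(a, b) else "<-" for a, b in zip(path, path[1:])]
    rendered = path[0] + "".join(f" {arrow} {node}" for arrow, node in zip(arrows, path[1:]))
    label = "(causal path)" if all(a == "->" for a in arrows) else "(non-causal path)"
    print(rendered, label)
```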