A webR tutorial
Exploring cross tabulations

Recap: Our Fake News Investigation
In class, we analyzed two variables from a study of online news articles:
Variable A: Does the article have an exclamation point in the title? (No, Yes)
Variable B: Is the article fake? (No, Yes)
We learned that:
P(Exclamation Point) = 0.12 (12% of articles have exclamation points)
P(Fake) = 0.40 (40% of articles are fake)
P(Fake | Exclamation Point) = 0.889 (89% of articles with exclamation points are fake)
The custom function, called analyze_binary_association(), constructs a complete joint probability distribution table from these three estimates. It then calculates the key marginal, joint, and conditional probabilities discussed in the lecture. Press Run Code on the code chunk below to see the results.
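To make the idea of "constructing a joint table from three estimates" concrete, here is a minimal sketch in R. It is not the tutorial's analyze_binary_association() function, whose code is not shown here; the helper name joint_table_sketch() and its argument names are assumptions made for illustration only.

```r
# Hypothetical helper (not the tutorial's analyze_binary_association()):
# build the 2x2 joint probability table from P(A), P(B), and P(B | A).
joint_table_sketch <- function(p_a, p_b, p_b_given_a) {
  p_ab <- p_a * p_b_given_a  # P(A = Yes and B = Yes)
  matrix(
    c(p_ab,       p_a - p_ab,             # row: A = Yes
      p_b - p_ab, 1 - p_a - p_b + p_ab),  # row: A = No
    nrow = 2, byrow = TRUE,
    dimnames = list(Exclamation = c("Yes", "No"), Fake = c("Yes", "No"))
  )
}

joint <- joint_table_sketch(p_a = 0.12, p_b = 0.40, p_b_given_a = 0.889)
print(round(joint, 3))
print(rowSums(joint))  # marginal probabilities for Exclamation
print(colSums(joint))  # marginal probabilities for Fake
```

The four cells sum to 1, and the row and column sums recover the two marginal probabilities that went in.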
Summarizing the results
The analysis revealed a strong association between exclamation points and fake news. Articles with exclamation points in their titles are much more likely to be fake (89%) compared to articles without exclamation points (33%). This 56-percentage-point difference is the risk difference (RD), reflecting the absolute gap in probabilities.
The risk ratio (RR) provides a relative comparison: the probability of an article being fake is about 2.7 times higher when it contains an exclamation point than when it does not.
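The 33% figure is not one of the three inputs; it follows from them. As a worked version of the arithmetic, writing \(E\) for "exclamation point" and \(F\) for "fake" (shorthand introduced here for brevity):
\[
P(F \mid \text{No } E) = \frac{P(F) - P(E)\,P(F \mid E)}{1 - P(E)} = \frac{0.40 - 0.12 \times 0.889}{0.88} \approx 0.333
\]
\[
\text{RD} = 0.889 - 0.333 \approx 0.556, \qquad \text{RR} = \frac{0.889}{0.333} \approx 2.67
\]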
Moving forward
Now let’s explore an alternative situation: a world in which the two variables have no relationship, that is, a world in which they are independent.
Key Concept
Independence: Two variables are independent if knowing the value of one tells you nothing about the other.
Mathematical definition:
\[
P(B \mid A) = P(B)
\]
That is, the probability of \(B\) happening is the same whether \(A\) happens or not.
Your Task
Step 1: Form a hypothesis
If “Exclamation Point” and “Fake Article” were independent, what value would we expect for \(P(\text{Fake} \mid \text{Exclamation})\)?
\[
P(\text{Fake} \mid \text{Exclamation}) = P(\text{Fake}) = 0.40
\]
Step 2: Test independence
Run the function again using your hypothesized value for \(P(\text{Fake} \mid \text{Exclamation})\).
Step 3: Verify independence properties
If the two variables are truly independent, these relationships should hold:
Association Strength:
- The risk difference (RD) should equal 0, because the probabilities of Fake are the same for both Exclamation=Yes and Exclamation=No.
- The risk ratio (RR) should equal 1, since the ratio of equal probabilities is 1.
Conditional Probabilities:
\[
P(\text{Fake} \mid \text{Exclamation}) = P(\text{Fake} \mid \text{No Exclamation}) = 0.40
\]
Joint Probability Rule:
\[
P(A \text{ and } B) = P(A) \times P(B)
\]
Check:
If independent, then:
\[
P(\text{Exclamation=Yes AND Fake=Yes}) = 0.12 \times 0.40 = 0.048
\]
Compare this expected joint probability (0.048) with the observed joint probability from the table.
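The three independence checks can also be done by hand in a few lines of R. This is a sketch that works directly from the tutorial's numbers rather than calling analyze_binary_association(); the variable names are assumptions chosen for readability.

```r
# Inputs from the tutorial, using the Step 1 hypothesis for P(Fake | Exclamation).
p_excl <- 0.12
p_fake <- 0.40
p_fake_given_excl <- 0.40

# Joint probability and the other conditional probability implied by these values.
p_excl_and_fake <- p_excl * p_fake_given_excl
p_fake_given_no_excl <- (p_fake - p_excl_and_fake) / (1 - p_excl)

rd <- p_fake_given_excl - p_fake_given_no_excl  # risk difference
rr <- p_fake_given_excl / p_fake_given_no_excl  # risk ratio

# all.equal() compares within a small tolerance, which sidesteps
# floating-point rounding in the subtraction and division above.
print(isTRUE(all.equal(0, rd)))                             # is RD equal to 0?
print(isTRUE(all.equal(1, rr)))                             # is RR equal to 1?
print(isTRUE(all.equal(p_excl_and_fake, p_excl * p_fake)))  # joint = product?
```

Changing p_fake_given_excl back to 0.889 makes the checks fail, which is what an association looks like in these terms.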
Challenge: Design your own cross tabulation
The best way to learn is to apply these concepts to a question you find interesting.
Your Challenge: Think of two binary variables related to your own research interests.
Need inspiration? Here are some ideas to spark your creativity:
Social Psychology: Does scrolling through social media (A) affect your self-reported feeling of loneliness (B) later that day?
Cognitive Psychology: After using practice testing (e.g., flashcards, past exams) for a topic (A), were you able to get an A on the test (B)?
Developmental Psychology: For a parent, did reading a physical book (not a screen) before bed (A) correlate with the child taking less time to fall asleep (B)?
Form Your Hypothesis:
What are your estimated probabilities for each variable, P(A) and P(B)? (e.g., What’s the probability of practice testing? What’s the probability of getting an A?)
What is your intuitive guess for the conditional probability, P(B|A)? (e.g., Given you used practice testing to study, what is the probability you got an A on the test?)
There is no right or wrong hypothesis. The goal is to practice turning a hunch about human behavior into a testable question and then use the cross-tabulation to see what the relationship looks like.
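One practical caveat when choosing numbers: the three guesses have to be compatible with one another, otherwise the implied table contains a negative probability. Below is a minimal sketch of such a check in R. The values are made up for the practice-testing example and are not data, and the variable names are assumptions for illustration.

```r
# Hypothetical guesses for the practice-testing example (made up, not data).
p_a <- 0.50          # P(A): used practice testing
p_b <- 0.30          # P(B): got an A on the test
p_b_given_a <- 0.45  # P(B | A): got an A, given practice testing

# The four cells of the joint table implied by the three guesses.
p_ab <- p_a * p_b_given_a
cells <- c(
  a_yes_b_yes = p_ab,
  a_yes_b_no  = p_a - p_ab,
  a_no_b_yes  = p_b - p_ab,
  a_no_b_no   = 1 - p_a - p_b + p_ab
)

# Compatible guesses give four non-negative cells (they always sum to 1).
coherent <- all(cells >= 0)
print(round(cells, 3))
print(coherent)
```

If coherent comes out FALSE, lowering P(B | A) or raising P(B) until every cell is non-negative gives a set of guesses that can be cross-tabulated.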