| Variable | Description |
|---|---|
| country | The name of the country |
| iee | Intergenerational earnings elasticity |
| gini | The Gini coefficient |
Apply and Practice Activity
The Gatsby Curve as a SLR
Introduction
For this activity you will fit a simple linear regression (SLR) to the Gatsby Curve data that we considered in Module 3.
Please follow the steps below to complete this activity.
Step by step directions
Step 1
Navigate to the apply_and_practice_programs folder in the programs folder of the foundations project. Open up the file called gatsby.qmd.
To ensure you are working in a fresh session, close any other open tabs (save them if needed). Click the down arrow beside the Run button toward the top of your screen, then click Restart R and Clear Output.
Once the .qmd file is open, add your name to the author section of the YAML metadata.
Step 2
To begin, we need to load the packages that are needed for this activity. Find the first level header labeled
# Load packages
Load the following packages: broom, skimr, here, and tidyverse.
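A minimal sketch of what this code chunk might contain is shown here. It assumes all four packages are already installed. The ggrepel package used in Step 5 is called with the ggrepel:: prefix, so it needs to be installed but does not need to be loaded.

```r
# Load the packages used in this activity
library(broom)     # tidy(), glance(), augment() for model output
library(skimr)     # skim() for descriptive statistics
library(here)      # here() for project-relative file paths
library(tidyverse) # ggplot2, dplyr, readr, and related packages
```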
Step 3
Next, we need to import the data. The file is called gatsby.Rds and is in the data folder of the foundations project.
The gatsby.Rds data frame has three variables, country, iee, and gini, which are described in the table at the top of this page.
Find the section in the analysis notebook labeled
# Import data
Inside the code chunk, import the gatsby.Rds data frame. Take a look at the contents of the imported data frame in the Environment tab to inspect it and familiarize yourself with the variables.
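One possible way to write the import is sketched below. It assumes the notebook is opened from within the foundations project, so that here() resolves paths from the project root. The use of read_rds() from readr is also an assumption, since the activity does not name an import function. The object name gatsby matches the code in Step 5, and gatsby_path is a helper name introduced only for this sketch.

```r
library(here)
library(tidyverse)

# Build the path relative to the project root: <project>/data/gatsby.Rds
gatsby_path <- here("data", "gatsby.Rds")

# Guard so this sketch also runs outside the course project, where the file is absent
if (file.exists(gatsby_path)) {
  gatsby <- read_rds(gatsby_path)
  glimpse(gatsby)  # the activity describes three variables: country, iee, gini
} else {
  message("File not found: ", gatsby_path)
}
```

In your own notebook the guard is not needed; the single read_rds() line is enough.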
Step 4
Next we will obtain some descriptive statistics.
Find the section in the analysis notebook labeled
# Get descriptive statistics
Inside the code chunk, run skim() on the data frame.
Look at the output, and note the minimum, mean, and maximum scores for gini and iee. Write a few sentences to describe these.
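The sketch below shows the general pattern on a small stand-in data frame. The data frame toy and all of its values are made up for illustration and are not the real Gatsby data; in your notebook you would call skim() on the imported gatsby data frame instead. The summarize() call is an optional extra that pulls out just the three statistics the step asks about.

```r
library(skimr)
library(tidyverse)

# Hypothetical stand-in for the imported data frame (values are made up)
toy <- tibble(
  country = c("A", "B", "C", "D", "E"),
  iee     = c(0.20, 0.25, 0.35, 0.40, 0.50),
  gini    = c(25, 28, 31, 34, 37)
)

# Full descriptive summary of every variable
toy |> skim()

# The minimum, mean, and maximum of gini and iee, computed directly
desc <- toy |>
  summarize(across(c(gini, iee), list(min = min, mean = mean, max = max)))
desc
```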
Step 5
Find the section in the analysis notebook labeled
# Create a scatterplot
Graph the relationship between gini (on the x-axis) and iee (on the y-axis), and request the best fit line.
Label the data points using the geom_label_repel() function from the ggrepel package. Your graph should look like the graph below. Please try to write the code yourself. If you’re stuck, click on the Code tabset below.
After you create the graph, study it and then write a few sentences to describe the graph.
gatsby |>
  ggplot(mapping = aes(x = gini, y = iee)) +
  geom_point() +
  # Label each point with its country name, nudged to avoid overlap
  ggrepel::geom_label_repel(mapping = aes(label = country),
                            color = "grey35", fill = "white", size = 2, box.padding = 0.4,
                            label.padding = 0.1) +
  # Add the least-squares best fit line without a confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "darkorange") +
  theme_minimal() +
  labs(title = "The Great Gatsby Curve: high inequality tends to mean intergenerational \neconomic immobility",
       x = "Gini Coefficient",
       y = "Intergenerational Earnings Elasticity")
Step 6
Find the section in the analysis notebook labeled
# Fit the SLR
Fit a simple linear regression model — regress iee on gini. Use the tidy() function to request the estimates of the intercept and slope. Use the glance() function to request the \(R^2\). Use augment() to request the .fitted and .resid scores.
Next, use the output to perform the following tasks:
Use the resulting regression equation to calculate the predicted value of iee and the residual for the United States. Do this by hand, then check the .fitted and .resid output from augment() to check your answer.
Use the resulting regression equation to calculate the predicted value of iee when gini is at the minimum score in the sample. Make note of this value for later.
Write a few sentences to interpret the intercept and slope from the model output. Also describe and interpret the \(R^2\).
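The tasks above can be sketched as follows, again on a made-up stand-in data frame rather than the real Gatsby data. The names toy, slr, slr_aug, and one are hypothetical, and country "C" stands in for the United States. The by-hand calculation uses predicted value = intercept + slope × gini, and residual = observed iee − predicted iee.

```r
library(broom)
library(tidyverse)

# Hypothetical stand-in data (made-up values, not the real Gatsby data)
toy <- tibble(
  country = c("A", "B", "C", "D", "E"),
  iee     = c(0.20, 0.25, 0.35, 0.40, 0.50),
  gini    = c(25, 28, 31, 34, 37)
)

# Regress iee on gini
slr <- lm(iee ~ gini, data = toy)

slr |> tidy()    # intercept and slope estimates
slr |> glance()  # model-level summary, including r.squared

# Add .fitted and .resid to the original rows (data = toy keeps country)
slr_aug <- slr |> augment(data = toy)

# "By hand" check for one country
b0  <- coef(slr)[["(Intercept)"]]
b1  <- coef(slr)[["gini"]]
one <- slr_aug |> filter(country == "C")

pred_by_hand  <- b0 + b1 * one$gini       # predicted iee
resid_by_hand <- one$iee - pred_by_hand   # observed minus predicted

# Compare with the augment() output for the same row
tibble(pred_by_hand, fitted = one$.fitted, resid_by_hand, resid = one$.resid)
```

The same pattern gives the predicted value at the sample minimum: replace one$gini with min(toy$gini).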
Step 7
Notice from Step 4 that the minimum score for gini is far above 0 — meaning that 0 is not a meaningful value for this predictor, and thus the intercept of the regression model is not as useful as it could be.
Find the section in the analysis notebook labeled
# Center gini and refit the model
Create a new version of gini that is centered at the lowest score for gini — call this variable gini_c.
Then, refit the SLR using gini_c as the predictor rather than gini. Request the tidy() output and the glance() output.
Note that the intercept of this centered model is equal to the predicted value of iee when gini is at the minimum score in the sample that you calculated in Step 6.
Write a few sentences to describe how the model estimates changed — including which ones changed, and how the new estimate(s) are interpreted. Describe why there is correspondence between the intercept in this centered model, and the predicted value of iee when gini is at the minimum score in the sample (which you calculated in Step 6).
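A sketch of the centering and refit is given below, using the same made-up stand-in data as a placeholder for the real Gatsby data. The names toy, slr, slr_c, and pred_at_min are hypothetical. The last lines check numerically that the centered intercept matches the uncentered model's prediction at the minimum gini.

```r
library(broom)
library(tidyverse)

# Hypothetical stand-in data (made-up values, not the real Gatsby data)
toy <- tibble(
  country = c("A", "B", "C", "D", "E"),
  iee     = c(0.20, 0.25, 0.35, 0.40, 0.50),
  gini    = c(25, 28, 31, 34, 37)
)

# Center gini at its sample minimum, so gini_c = 0 for the lowest-gini country
toy <- toy |> mutate(gini_c = gini - min(gini))

slr   <- lm(iee ~ gini,   data = toy)  # original model
slr_c <- lm(iee ~ gini_c, data = toy)  # centered model

slr_c |> tidy()
slr_c |> glance()

# Prediction from the original model at the minimum gini
pred_at_min <- predict(slr, newdata = tibble(gini = min(toy$gini)))

# Compare it with the intercept of the centered model
all.equal(unname(pred_at_min), coef(slr_c)[["(Intercept)"]])
```

The algebra behind the check: writing the original model as \(\hat{y} = b_0 + b_1 x\) and substituting \(x = x_c + x_{min}\) gives \(\hat{y} = (b_0 + b_1 x_{min}) + b_1 x_c\), so the slope is unchanged and the new intercept is the original prediction at \(x_{min}\).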
Step 8
Finalize and submit.
Now that you’ve completed all tasks, to help ensure reproducibility, click the down arrow beside the Run button toward the top of your screen then click Restart R and Clear Output. Scroll through your notebook and see that all of the output is now gone. Now, click the down arrow beside the Run button again, then click Restart R and Run All Chunks. Scroll through the file and make sure that everything ran as you would expect. You will find a red bar on the side of a code chunk if an error has occurred. Taking this step ensures that all code chunks are running from top to bottom, in the intended sequence, and producing output that will be reproduced the next time you work on this project.
Now that all code chunks are working as you’d like, click Render. This will create an .html output of your report. Scroll through to make sure everything is correct. The .html output file will be saved alongside the corresponding .qmd notebook file.
Follow the directions on Canvas for the Apply and Practice Assignment entitled “Gatsby SLR Apply and Practice Activity” to get credit for completing this assignment.