Creating charts with ggplot2
To access the interactive graphic, press and hold control, click the link below, and choose “Open Link in New Tab”.
What variable is on the x-axis?
What variable is on the y-axis?
What does each bubble represent?
What do the colors of the bubbles represent?
What do the sizes of the bubbles represent?
The World Bank DataBank is an online platform that allows users to access and explore a vast collection of economic, financial, and social datasets. The DataBank offers free access to data from various sources within the World Bank, including the World Development Indicators (WDI), which we’ll explore today.
This package for R is a tool that provides an interface to the World Bank’s World Development Indicators (WDI) database directly from R using the World Bank’s Application Programming Interface (API).
Please press Run Code on the code chunk in order to create our desired data frame.
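For readers without access to the interactive chunk, here is a rough sketch of how a data frame like this could be assembled. It assumes the package in question is the WDI package from CRAN. The helper name build_wdi_2022 is hypothetical, and the three indicator codes are assumptions about which series match the variables described in this tutorial. The tutorial's own chunk may differ.

```r
# Hypothetical helper: one possible way to assemble a data frame like wdi_2022.
# Assumes the WDI package is installed and an internet connection is available.
build_wdi_2022 <- function() {
  raw <- WDI::WDI(
    country = "all",
    indicator = c(
      life_expectancy = "SP.DYN.LE00.IN",  # assumed: life expectancy at birth (years)
      gdp_per_capita  = "NY.GDP.PCAP.CD",  # assumed: GDP per capita (current US$)
      population      = "SP.POP.TOTL"      # assumed: total population
    ),
    start = 2022,
    end = 2022,
    extra = TRUE  # adds country metadata such as region
  )

  # Drop regional and income-group aggregates so only countries remain
  is_country <- !is.na(raw$region) & raw$region != "Aggregates"
  cols <- c("country", "region", "life_expectancy", "gdp_per_capita", "population")
  countries <- raw[is_country, cols]

  # Keep only rows with all three indicators available
  countries[complete.cases(countries), ]
}

# Usage (needs network access):
# wdi_2022 <- build_wdi_2022()
```

The function is only defined here, not called, so the block runs without a network connection.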
Press Run Code on the code chunk to display the data using the glimpse() function.
If you want to explore, you can add the width argument to the code below. Put your cursor inside the parentheses after glimpse, then type width = 100. Remember that R is case sensitive. Press Run Code again to see the result. If you want to reset the code chunk to its original state, click the Start Over button on the right.
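As a minimal sketch of the glimpse() call with the width argument, the block below uses a tiny stand-in for wdi_2022. The stand-in values are made up for illustration and are not real WDI figures. It assumes glimpse() comes from the dplyr package.

```r
library(dplyr)

# Stand-in for wdi_2022 with made-up placeholder values (NOT real WDI data)
wdi_2022 <- data.frame(
  country         = c("A", "B", "C", "D", "E", "F"),
  region          = rep(c("Region 1", "Region 2"), each = 3),
  life_expectancy = c(58, 64, 69, 73, 78, 83),
  gdp_per_capita  = c(500, 2000, 6000, 12000, 30000, 70000),
  population      = c(5e6, 4e7, 1e7, 8e7, 2e7, 9e6)
)

# One row per variable; width controls how many characters each row may use
glimpse(wdi_2022, width = 100)
```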
country: The names of the countries included in the data frame.
region: The regional grouping of each country.
life_expectancy: The average life expectancy at birth, measured in years, for each country in 2022. It reflects the average number of years a newborn is expected to live under current mortality rates.
gdp_per_capita: The Gross Domestic Product (GDP) per capita for each country, measured in current US dollars. It represents the average economic output per person and is an indicator of the country’s economic health.
population: The total number of people living in each country in 2022.
Layer | Description |
---|---|
Data | The data to be plotted. |
Aesthetics | The mapping of variables to elements of the plot. |
Geometries | The visual elements to display the data -- e.g., bar graph, scatterplot. |
Facets | Plotting of small groupings within the plot area -- i.e., small multiples. |
Statistics and Transformations | Additional statistics imposed onto plot or numerical transformations of variables to aid interpretation. |
Coordinates | The space on which the data are plotted -- e.g., Cartesian coordinate system (x- and y-axis), polar coordinates (pie chart). |
Themes | All non-data elements -- e.g., background color, labels. |
Let’s focus on the first 3 layers to begin! These are ALWAYS required.
Using the wdi_2022 data frame, aesthetically map gdp_per_capita to the x-axis and life_expectancy to the y-axis. Draw a scatter plot. Replace the three underscores with the correct code to accomplish this task.
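One possible completed version of this task is sketched below. It shows the three required layers together: data, aesthetics, and a geometry. The stand-in data frame uses made-up values so the block runs on its own; in the tutorial you would use the real wdi_2022.

```r
library(ggplot2)

# Stand-in for wdi_2022 with made-up placeholder values (NOT real WDI data)
wdi_2022 <- data.frame(
  country         = c("A", "B", "C", "D", "E", "F"),
  region          = rep(c("Region 1", "Region 2"), each = 3),
  life_expectancy = c(58, 64, 69, 73, 78, 83),
  gdp_per_capita  = c(500, 2000, 6000, 12000, 30000, 70000),
  population      = c(5e6, 4e7, 1e7, 8e7, 2e7, 9e6)
)

# Data layer + aesthetic mapping + point geometry = scatter plot
p <- ggplot(data = wdi_2022,
            mapping = aes(x = gdp_per_capita, y = life_expectancy)) +
  geom_point()
p
```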
You might notice that we’re missing two aesthetics from the gapminder chart. Let’s now map population to the size of the points, and region to the color of the points.
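A sketch of the two extra aesthetics might look like this. Because size and color should vary with the data, they go inside aes(). The stand-in data frame again uses made-up values.

```r
library(ggplot2)

# Stand-in for wdi_2022 with made-up placeholder values (NOT real WDI data)
wdi_2022 <- data.frame(
  country         = c("A", "B", "C", "D", "E", "F"),
  region          = rep(c("Region 1", "Region 2"), each = 3),
  life_expectancy = c(58, 64, 69, 73, 78, 83),
  gdp_per_capita  = c(500, 2000, 6000, 12000, 30000, 70000),
  population      = c(5e6, 4e7, 1e7, 8e7, 2e7, 9e6)
)

# population drives point size, region drives point color
p <- ggplot(data = wdi_2022,
            mapping = aes(x = gdp_per_capita,
                          y = life_expectancy,
                          size = population,
                          color = region)) +
  geom_point()
p
```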
Perhaps rather than putting all countries on one plot, we’d prefer to have a separate plot for each region. We can use the facet_wrap() function to accomplish this. This is an example of a Facet layer. To use the function, be sure to add a + sign at the end of the last line of code, return to a new line, and then list the function name. Inside the parentheses, list a tilde (~) and then the variable you want to facet by (region). Additionally, use the ncol and nrow arguments within the facet_wrap() function to request 2 columns and 4 rows.
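The faceting step could be sketched as follows. The stand-in data frame has only two made-up regions, so most of the 2-by-4 grid stays empty here; with the real wdi_2022 each region would fill its own panel.

```r
library(ggplot2)

# Stand-in for wdi_2022 with made-up placeholder values (NOT real WDI data)
wdi_2022 <- data.frame(
  country         = c("A", "B", "C", "D", "E", "F"),
  region          = rep(c("Region 1", "Region 2"), each = 3),
  life_expectancy = c(58, 64, 69, 73, 78, 83),
  gdp_per_capita  = c(500, 2000, 6000, 12000, 30000, 70000),
  population      = c(5e6, 4e7, 1e7, 8e7, 2e7, 9e6)
)

p <- ggplot(data = wdi_2022,
            mapping = aes(x = gdp_per_capita,
                          y = life_expectancy,
                          size = population,
                          color = region)) +
  geom_point() +
  # One panel per region, arranged in 2 columns and 4 rows
  facet_wrap(~ region, ncol = 2, nrow = 4)
p
```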
GDP per capita has a vast range ($259 for Burundi to $187,267 for Liechtenstein), with most countries stacked toward the lower end. As you’ll learn in Module 15, it’s often useful to apply a non-linear transformation to heavily skewed variables like this one. This is an example of a Statistics and Transformations layer.
The scale_x_log10() function in ggplot2 is used to transform the x-axis of a plot to a logarithmic scale with base 10. Add a layer (remember to put the + sign at the end of the last line of existing code), then apply the transformation function.
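A minimal sketch of the log transformation, using a stand-in data frame with made-up values, might look like this. On a log10 axis, equal distances correspond to equal ratios (for example 100 to 1,000 and 1,000 to 10,000), which spreads out the crowded low end.

```r
library(ggplot2)

# Stand-in for wdi_2022 with made-up placeholder values (NOT real WDI data)
wdi_2022 <- data.frame(
  country         = c("A", "B", "C", "D", "E", "F"),
  region          = rep(c("Region 1", "Region 2"), each = 3),
  life_expectancy = c(58, 64, 69, 73, 78, 83),
  gdp_per_capita  = c(500, 2000, 6000, 12000, 30000, 70000),
  population      = c(5e6, 4e7, 1e7, 8e7, 2e7, 9e6)
)

p <- ggplot(data = wdi_2022,
            mapping = aes(x = gdp_per_capita,
                          y = life_expectancy,
                          size = population,
                          color = region)) +
  geom_point() +
  # Plot the x-axis on a base-10 logarithmic scale
  scale_x_log10()
p
```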
If we wish, we can zoom in on a certain part of the graph, for example to focus on countries with a GDP per capita between 1 and 25,000 USD. We can use the coord_cartesian() function for this; it is an example of a Coordinates layer.
The argument needed here is xlim, which sets the x-axis limits. To restrict the graph to x values between 1 and 25,000 we use:
coord_cartesian(xlim = c(1, 25000))
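In context, the zoom could be added to a plot as sketched below, again with a stand-in data frame of made-up values. Note that coord_cartesian() only changes the visible window; points outside the limits are hidden from view but are not removed from the data.

```r
library(ggplot2)

# Stand-in for wdi_2022 with made-up placeholder values (NOT real WDI data)
wdi_2022 <- data.frame(
  country         = c("A", "B", "C", "D", "E", "F"),
  region          = rep(c("Region 1", "Region 2"), each = 3),
  life_expectancy = c(58, 64, 69, 73, 78, 83),
  gdp_per_capita  = c(500, 2000, 6000, 12000, 30000, 70000),
  population      = c(5e6, 4e7, 1e7, 8e7, 2e7, 9e6)
)

p <- ggplot(data = wdi_2022,
            mapping = aes(x = gdp_per_capita, y = life_expectancy)) +
  geom_point() +
  # Show only the part of the x-axis between 1 and 25,000
  coord_cartesian(xlim = c(1, 25000))
p
```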
Let’s style our graph a bit. Here we’ll add labels: a title, plus x- and y-axis labels. The labs() function is used to add this theme layer.
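A sketch of labs() in use is shown below. The label text is only an illustration and the stand-in data frame uses made-up values; choose wording that suits your own chart.

```r
library(ggplot2)

# Stand-in for wdi_2022 with made-up placeholder values (NOT real WDI data)
wdi_2022 <- data.frame(
  country         = c("A", "B", "C", "D", "E", "F"),
  region          = rep(c("Region 1", "Region 2"), each = 3),
  life_expectancy = c(58, 64, 69, 73, 78, 83),
  gdp_per_capita  = c(500, 2000, 6000, 12000, 30000, 70000),
  population      = c(5e6, 4e7, 1e7, 8e7, 2e7, 9e6)
)

p <- ggplot(data = wdi_2022,
            mapping = aes(x = gdp_per_capita, y = life_expectancy)) +
  geom_point() +
  # Illustrative label text; replace with your own
  labs(
    title = "Life expectancy and GDP per capita, 2022",
    x = "GDP per capita (current US$)",
    y = "Life expectancy at birth (years)"
  )
p
```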
We can also change the overall theme of the graph. There are several built-in themes to choose from. Let’s look at a few.
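As a sketch, the same base plot can be paired with three of ggplot2's built-in themes. Which themes the tutorial itself demonstrates is not shown here, so these three are just a sample. The stand-in data frame uses made-up values.

```r
library(ggplot2)

# Stand-in for wdi_2022 with made-up placeholder values (NOT real WDI data)
wdi_2022 <- data.frame(
  country         = c("A", "B", "C", "D", "E", "F"),
  region          = rep(c("Region 1", "Region 2"), each = 3),
  life_expectancy = c(58, 64, 69, 73, 78, 83),
  gdp_per_capita  = c(500, 2000, 6000, 12000, 30000, 70000),
  population      = c(5e6, 4e7, 1e7, 8e7, 2e7, 9e6)
)

base_plot <- ggplot(data = wdi_2022,
                    mapping = aes(x = gdp_per_capita, y = life_expectancy)) +
  geom_point()

# Each theme changes only non-data elements (background, grid lines, fonts)
p_minimal <- base_plot + theme_minimal()
p_bw      <- base_plot + theme_bw()
p_classic <- base_plot + theme_classic()

p_minimal
```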
Create a density graph of life expectancy. Enhance and style it as you like.
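If you need a starting point, a density graph could be sketched like this. The fill color, transparency, labels, and theme are arbitrary choices, and the stand-in data frame uses made-up values.

```r
library(ggplot2)

# Stand-in for wdi_2022 with made-up placeholder values (NOT real WDI data)
wdi_2022 <- data.frame(
  country         = c("A", "B", "C", "D", "E", "F"),
  region          = rep(c("Region 1", "Region 2"), each = 3),
  life_expectancy = c(58, 64, 69, 73, 78, 83),
  gdp_per_capita  = c(500, 2000, 6000, 12000, 30000, 70000),
  population      = c(5e6, 4e7, 1e7, 8e7, 2e7, 9e6)
)

# A density graph needs only an x aesthetic; the y-axis is computed
p <- ggplot(data = wdi_2022, mapping = aes(x = life_expectancy)) +
  geom_density(fill = "steelblue", alpha = 0.5) +
  labs(x = "Life expectancy at birth (years)", y = "Density") +
  theme_minimal()
p
```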
Create a box plot of life expectancy by region. Enhance and style it as you like.
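Similarly, a starting point for the box plot might look like the sketch below, with region on the x-axis and life expectancy on the y-axis. The styling is arbitrary and the stand-in data frame uses made-up values.

```r
library(ggplot2)

# Stand-in for wdi_2022 with made-up placeholder values (NOT real WDI data)
wdi_2022 <- data.frame(
  country         = c("A", "B", "C", "D", "E", "F"),
  region          = rep(c("Region 1", "Region 2"), each = 3),
  life_expectancy = c(58, 64, 69, 73, 78, 83),
  gdp_per_capita  = c(500, 2000, 6000, 12000, 30000, 70000),
  population      = c(5e6, 4e7, 1e7, 8e7, 2e7, 9e6)
)

# One box per region, summarizing the distribution of life expectancy
p <- ggplot(data = wdi_2022,
            mapping = aes(x = region, y = life_expectancy)) +
  geom_boxplot() +
  labs(x = "Region", y = "Life expectancy at birth (years)") +
  theme_minimal()
p
```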