PSY 652: Research Methods in Psychology I

Non-Linear Transformations

Kimberly L. Henry: kim.henry@colostate.edu

What are Non-Linear Transformations?

  • Non-linear transformations are mathematical modifications applied to variables in a regression model to better capture the underlying relationship between predictors and outcomes.

  • By changing the scale of a variable, they let a linear regression model capture non-linear patterns that it cannot fit on the original scale.

Why Use Transformations?

  • Reveal hidden patterns: Non-linear relationships can become more interpretable after transformation.

  • Meet assumptions: Improve model fit by addressing skewness, stabilizing variance, or making relationships more linear.

  • Handle wide ranges: Compress variables that span several orders of magnitude, e.g., by converting income or population size to a logarithmic scale.

Types of Non-Linear Transformations

  1. Upward Transformations: Increase the magnitude of changes, often using powers

    • Quadratic: \(x^2\)

    • Exponential: \(e^x\)

  2. Downward Transformations: Compress changes, making large values less extreme

    • Logarithmic: \(\log(x)\)

    • Square Root: \(\sqrt{x}\)

Example of an upward transformation

Example of a downward transformation

Natural logarithm – A downward transformation

Logarithmic transformations, in particular the natural log transformation, are very common in statistics. Denoted ln, the natural logarithm uses the mathematical constant e (approximately 2.71828) as its base.

The natural logarithm, \(\ln(x) = \log_e(x)\), is the power (or exponent) to which the base \(e\) must be raised to produce \(x\): \(\ln(x) = c \quad \text{means that} \quad e^c = x\).
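A quick numerical check of this definition, as a minimal Python sketch (the course itself works in R; Python's one-argument `math.log` uses base e, and the value 20 is arbitrary):

```python
import math

x = 20.0                    # an arbitrary positive number
c = math.log(x)             # natural log: one-argument math.log uses base e
print(round(c, 4))          # 2.9957
print(math.isclose(math.exp(c), x))  # True: e raised to the power c recovers x

# ln(1) = 0, because e**0 = 1
print(math.log(1.0))        # 0.0
print(round(math.e, 5))     # 2.71828
```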

An example

What can nonlinear transformations buy us in terms of linear regression modeling?

We’ll use data compiled by Our World in Data on gross domestic product (GDP) per capita and life expectancy of residents in 166 entities/countries.

Press Run Code on the code chunk below to import the data and wrangle it for analysis, including selecting the year 2021, creating shorter names, and dropping countries with missing data.

Look at the distribution of each variable

Press Run Code on the code chunks below to create density plots of life expectancy and GDP.

Create a scatter plot of GDP and Life Expectancy

Press Run Code on the code chunk below to create a scatter plot of the raw variables.

Rule of the Bulge

Graphic by Andrew Zieffler

An example

Visualization from https://ds100.org

Explore transformations

Density plot for GDP and ln(GDP)

Press Run Code on the code chunks below to create density plots of GDP and ln(GDP).

Fit a SLR

In the code chunk below, create a new variable that is the natural log of GDP (call it gdp_ln). Then, regress life expectancy (le) on ln(GDP) (gdp_ln). Request the tidy() output with 95% CIs.

Interpretations

The intercept (21.10) is the expected life expectancy when ln(GDP per capita) is zero. Because ln(1) = 0, this corresponds to a GDP per capita of $1. Therefore, according to our model, a country with a GDP per capita of $1 would have an expected life expectancy of 21.1 years.
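Plugging a GDP per capita of 1 into the fitted equation, with the rounded coefficients:

\[
\widehat{\text{le}} = 21.10 + 5.44 \times \ln(1) = 21.10 + 5.44 \times 0 = 21.10
\]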

Interpretation of slope

The slope (5.44) indicates that each one-unit increase in the natural logarithm of GDP per capita is associated with an increase of approximately 5.44 years in life expectancy at birth. This indicates a sizable positive relationship between the natural log of GDP per capita and life expectancy.
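One way to see what a one-unit increase in ln(GDP) means on the raw scale: adding 1 to ln(GDP) is the same as multiplying GDP by e (about 2.718). A minimal Python sketch using the rounded coefficients reported above (the starting value of $1,000 and the helper name `predicted_le` are arbitrary choices for illustration):

```python
import math

B0, B1 = 21.10, 5.44  # rounded intercept and slope from the fitted model

def predicted_le(gdp_per_capita):
    """Predicted life expectancy from le-hat = B0 + B1 * ln(GDP per capita)."""
    return B0 + B1 * math.log(gdp_per_capita)

gdp = 1_000.0
# ln(gdp * e) = ln(gdp) + 1, so this is a one-unit increase in ln(GDP)
diff = predicted_le(gdp * math.e) - predicted_le(gdp)
print(round(diff, 2))  # 5.44
```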

Interpretation in terms of change in actual GDP

A one-percent increase in GDP per capita is associated with an increase of about 0.05 years in life expectancy, because a 1% increase raises ln(GDP) by ln(1.01) ≈ 0.01, and 5.44 × 0.01 ≈ 0.05.
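More generally, a p% increase in GDP per capita changes ln(GDP) by ln(1 + p/100), so the expected change in life expectancy is the slope times that amount; for small p this is close to slope × p/100. A minimal Python sketch (the helper name `le_change_for_pct` is hypothetical):

```python
import math

SLOPE = 5.44  # rounded slope on ln(GDP per capita)

def le_change_for_pct(pct, slope=SLOPE):
    """Expected change in life expectancy (years) for a pct% increase in GDP per capita."""
    return slope * math.log(1 + pct / 100)

exact = le_change_for_pct(1)   # 5.44 * ln(1.01)
approx = SLOPE / 100           # small-change shortcut: slope / 100
print(round(exact, 3), round(approx, 3))  # 0.054 0.054
```

The shortcut breaks down for large changes: `le_change_for_pct(100)` gives about 3.77, not 5.44.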

A larger change

Of course, a 1 percent increase in GDP per capita is very small, so it is probably more informative to consider a larger increase, for example a 100 percent increase in GDP per capita. In this instance, we’d ask: “How much would we expect life expectancy to differ between two countries where one has a GDP per capita that is twice the size of the other?”

If one country has a GDP per capita that is twice that of the other country (a 100% increase), then we expect the country with the higher GDP per capita to have a life expectancy that is about 3.8 years longer.
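The 3.8 follows from the model because the starting GDP cancels out: for any GDP per capita \(x\),

\[
5.44 \times [\ln(2x) - \ln(x)] = 5.44 \times \ln(2) \approx 5.44 \times 0.693 \approx 3.77 \text{ years}.
\]

So the predicted difference depends only on the ratio of the two GDPs, not on their level.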

Your turn

How much would we expect life expectancy to differ between two countries where one has a GDP per capita that is 50% larger than the other’s?

Answer

A 50% increase in GDP per capita is associated with about a 2.2-year increase in life expectancy.
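Worked out from the slope, a 50% larger GDP per capita is a ratio of 1.5, so:

\[
5.44 \times \ln(1.5) \approx 5.44 \times 0.405 \approx 2.21 \text{ years}.
\]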

Compare two countries

Let’s compare two countries with different GDP levels: the Democratic Republic of Congo (where GDP per capita = $817 in 2021) and the United States (where GDP per capita = $57,523 in 2021).

To make it easier to use the marginaleffects tools, let’s fit the same model, but with slightly different syntax:

Effect of a $500 increase in GDP

We can use the average_comparisons() function from marginaleffects to examine the expected change in life expectancy if the Democratic Republic of Congo experienced a $500 increase in GDP per capita, and if the USA experienced a $500 increase in GDP per capita.
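Because the model is linear in ln(GDP), the predicted change from a fixed dollar increase depends on the starting GDP. A rough Python check of what these comparisons should look like. It uses the rounded slope (5.44) rather than the full-precision estimate that marginaleffects works from, so the package's output may differ in the later decimals. The helper `le_change` is hypothetical.

```python
import math

SLOPE = 5.44  # rounded slope on ln(GDP per capita); the intercept cancels out

def le_change(gdp_from, gdp_to, slope=SLOPE):
    """Predicted change in life expectancy when GDP per capita moves from gdp_from to gdp_to."""
    return slope * (math.log(gdp_to) - math.log(gdp_from))

drc, usa = 817.0, 57_523.0  # 2021 GDP per capita, as given above
print(round(le_change(drc, drc + 500), 2))  # 2.6
print(round(le_change(usa, usa + 500), 3))  # 0.047
```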

Effect of a doubling of GDP (100% increase)

Now, let’s see what happens if we double GDP per capita for each country.
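Under this model, doubling should produce the same predicted change for both countries, since ln(2x) − ln(x) = ln(2) regardless of x. A short Python check with the rounded slope:

```python
import math

SLOPE = 5.44  # rounded slope on ln(GDP per capita)

# Predicted change in life expectancy when each country's GDP per capita doubles
changes = {
    country: SLOPE * (math.log(2 * gdp) - math.log(gdp))
    for country, gdp in [("DRC", 817.0), ("USA", 57_523.0)]
}
print({k: round(v, 2) for k, v in changes.items()})  # {'DRC': 3.77, 'USA': 3.77}
```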

A graph

The red lines represent the GDP per capita of the Democratic Republic of Congo and the blue lines represent the GDP per capita of the USA. Solid lines mark GDP per capita in 2021, dashed lines represent a $500 increase, and dotted lines represent a 100% increase.

Another quick example

Rather than consider GDP and life expectancy, let’s consider GDP and CO2 emissions. These data also come from Our World in Data. Let’s consider the total GDP (not GDP per capita) and the annual total emissions of carbon dioxide (CO₂) measured in tonnes for the year 2021.

Click Run Code on the code chunk below.

Look at the distribution of each variable

Press Run Code on the code chunks below to create density plots of CO2 emissions and GDP.

Explore transformations

Click Run Code on the code chunk below. After creating the initial plot, explore the following changes:

  1. Request the natural log of gdp.

  2. Request the natural log of co2.

  3. Request the natural log of gdp and co2.

Fit a SLR with the transformed variables

Create the natural log transformed versions of GDP and CO2 emissions, then fit a SLR to regress ln(CO2 emissions) on ln(GDP). Request the tidy() output with 95% CIs.

Each one-unit increase in ln(GDP) is associated with a 0.9881734-unit increase in ln(CO2).

Interpretations in raw metric

The function below can be used to interpret a regression slope in which both x and y have been natural log transformed. Enter the regression slope from the fitted model and the desired percent change in the x variable.

A 100% increase in GDP is associated with a 98% increase in CO2 emissions.
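When both x and y are natural log transformed, a p% increase in x multiplies predicted y by \((1 + p/100)^{b}\), where \(b\) is the slope. A minimal Python sketch of that calculation (the helper `pct_change_y` is hypothetical and not necessarily the function used in the course):

```python
def pct_change_y(slope, pct_x):
    """Percent change in y for a pct_x% change in x, when both x and y are log-transformed."""
    return ((1 + pct_x / 100) ** slope - 1) * 100

print(round(pct_change_y(0.9881734, 100), 1))  # 98.4
```

A slope of exactly 1 would mean y changes by the same percentage as x; the estimate of 0.988 is just below that.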

Predicted scores

What are the predicted CO2 emissions for the USA?

The GDP for the USA is 21,131,600,000,000. What is the natural log of this value?

Solve for the y-hat (predicted ln(CO2)):

Back-transform y-hat so it’s in its original metric (CO2 in tonnes):
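These three steps (log the predictor, solve for y-hat, back-transform with exp()) can be sketched in Python. The slope is the estimate reported above. The intercept of the log-log model is not shown in this section, so the hypothetical helper `predict_co2` takes it as an argument rather than assuming a value.

```python
import math

SLOPE = 0.9881734  # slope on ln(GDP) from the fitted log-log model

def predict_co2(gdp, intercept, slope=SLOPE):
    """Predicted CO2 (tonnes) from ln(CO2)-hat = intercept + slope * ln(GDP)."""
    gdp_ln = math.log(gdp)                   # step 1: natural log of GDP
    co2_ln_hat = intercept + slope * gdp_ln  # step 2: y-hat on the log scale
    return math.exp(co2_ln_hat)              # step 3: back-transform to tonnes

usa_gdp = 21_131_600_000_000
print(round(math.log(usa_gdp), 2))  # 30.68
# Usage: predict_co2(usa_gdp, intercept=<intercept from the tidy() output>)
```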

Alternatively, we can save ourselves some work by using augment().