Polynomial Regression
The basic regression model assumes that the relationship between x and y is linear. However, in some cases the effect of a given predictor may differ by levels of that very predictor.
That is, the “effect of x” differs as x increases.
Consider the relationship between age and health care expenditures.
Polynomial regression can model the relationship between x and y if it’s not merely linear by incorporating higher order terms of x.
This accounts for the shifting influence of x on y at various x values.
\(y\) regressed on \(x\) and \(x^2\) accommodates a single curve bend (quadratic model).
\(y\) regressed on \(x\), \(x^2\), and \(x^3\) accommodates two bends (cubic model).
Each subsequent higher order term allows for an additional bend.
Forty participants are randomly assigned to varying levels of minutes spent practicing for a visual discrimination test (0, 2, 4, 6, 8, 10, 12, or 14 minutes). Subsequently, a test on visual discrimination is conducted, and each participant’s score on the test is recorded.
The experiment revolves around two variables:
practice: the experimentally assigned duration of practice
score: the score on the test.
Press Run Code to import the data and take a look at the scores:
Press Run Code to examine the linear relationship between the two variables.
Press Run Code to consider if a curvilinear relationship fits the data better.
What is the approach?
Test a series of models, each progressively incorporating an additional polynomial term (e.g., \(x\), \(x^2\), \(x^3\), etc.).
We strive to adopt the most parsimonious model, so we begin with the simplest model and continue testing higher-order models until we find that the latest added term no longer noticeably enhances the model’s fit to the data.
At this point, we opt for the previous model, where the highest-order term significantly influenced the model’s ability to capture the curvature in the relationship.
Crucially, all lower-order terms are retained in the model if the highest-order term is deemed necessary, regardless of whether the estimates of the lower terms have values close to zero.
First, we need to create the polynomial terms. We’ll consider up to two bends (a cubic model).
Press Run Code on the code chunk below to produce a squared term and cubic term for minutes spent practicing.
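As a rough illustration of what creating the polynomial terms involves, here is a minimal Python sketch (not the tutorial's own code chunk). It uses only the eight practice levels listed in the experiment description. The column name practice3 is an assumption, chosen by analogy with practice2.

```python
# Practice levels (in minutes) taken from the experiment description.
practice_levels = [0, 2, 4, 6, 8, 10, 12, 14]

# Build one row per level holding the linear, squared, and cubed terms.
# "practice3" is an assumed name, chosen by analogy with "practice2".
poly_terms = [
    {"practice": x, "practice2": x ** 2, "practice3": x ** 3}
    for x in practice_levels
]

for row in poly_terms:
    print(row)
```

In a real data set the same transformation would be applied to each participant's practice value rather than to the list of distinct levels.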
Press Run Code on the code chunk to fit a linear model.
Press Run Code on the code chunk to assess the \(R^2\) and sigma for the linear model.
Press Run Code on the code chunk below to examine whether a quadratic model better fits the data. That is: Would a curvilinear model with a single bend better fit the data?
Press Run Code on the code chunk below to examine whether a cubic model better fits the data. That is: Do we need a second bend to describe the relationship?
\[ \hat{y_i} = {b_0} + ({b_1}\times{x_i}) + ({b_2}\times{x^2_i}) \]
\[ \hat{y_i} = 1.703 + (3.517\times{x_i}) + (-.142\times{x^2_i}) \]
Where \({x_i}\) refers to practice and \({x^2_i}\) refers to practice2.
The estimate for the intercept is the predicted test score for people who practice 0 minutes (i.e., practice = 0). That is, if someone doesn’t practice, we predict they will score a 1.7 on the test.
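To make the fitted equation concrete, the following Python sketch plugs practice values into it. It assumes the rounded coefficients reported above (1.703, 3.517, and -.142), so results may differ slightly from predictions computed with unrounded estimates. The function name predicted_score is hypothetical.

```python
# Rounded coefficients from the fitted quadratic model reported above.
B0 = 1.703   # intercept
B1 = 3.517   # coefficient for practice
B2 = -0.142  # coefficient for practice squared

def predicted_score(practice):
    """Return the predicted test score for a given number of practice minutes."""
    return B0 + B1 * practice + B2 * practice ** 2

# Predicted scores at the start, middle, and end of the observed range.
for minutes in (0, 8, 14):
    print(minutes, round(predicted_score(minutes), 2))
```

At practice = 0 both the linear and squared terms vanish, which is why the prediction equals the intercept.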
There is not one slope that relates practice to score – rather there are many slopes depending on the level of x. The slope of a line drawn tangent to the parabola at a certain x is estimated by:
\[ {b_1} + (2\times{b_2}\times{x}) \]
\[ 3.517 + (2\times-.142\times{x}) \]
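The tangent-slope expression is the first derivative of the fitted quadratic with respect to x. A short derivation, using the same symbols as the model equation:

\[ \frac{d\hat{y}}{dx} = \frac{d}{dx}\left({b_0} + ({b_1}\times{x}) + ({b_2}\times{x^2})\right) = {b_1} + (2\times{b_2}\times{x}) \]

The intercept drops out because it does not depend on x, which is why the slope involves only \({b_1}\) and \({b_2}\).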
When practice = 0 the slope is: \(3.517 + (2\times-.142\times0) = 3.52\)
When practice = 8 the slope is: \(3.517 + (2\times-.142\times8) = 1.25\)
When practice = 14 the slope is: \(3.517 + (2\times-.142\times14) = -.46\)
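The three slopes above can be reproduced with a few lines of Python. This is a sketch that assumes the rounded coefficients from the fitted model. The function name tangent_slope is hypothetical.

```python
# Rounded coefficients from the fitted quadratic model.
B1 = 3.517   # coefficient for practice
B2 = -0.142  # coefficient for practice squared

def tangent_slope(practice):
    """Slope of the fitted curve at a given practice value: b1 + 2 * b2 * x."""
    return B1 + 2 * B2 * practice

# Evaluate the slope at the three practice values discussed in the text.
slopes = {minutes: tangent_slope(minutes) for minutes in (0, 8, 14)}

for minutes, slope in slopes.items():
    print(f"practice = {minutes}: slope = {slope:.3f}")
```

The slope shrinks as practice increases and turns negative before 14 minutes, which is the pattern a mound-shaped parabola produces.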
There is a value of x along the curve at which the slope of the tangent line is 0.
In other words, this is the point at which y-hat takes a maximum value if the parabola is a mound (\({b_2} < 0\)) or a minimum value if the parabola is a bowl (\({b_2} > 0\)).
This point is called the vertex and the x-coordinate of the vertex can be estimated with the following formula:
\[ {-b_1} \div (2\times{b_2}) \]
For our example, the vertex is: \(-3.517 \div (2\times{-.142}) \approx 12.38\)
In our case, the practice time at which the predicted score is maximized is about 12.4 minutes. This is the point where the effect of practicing goes from positive to negative.
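As a check on the vertex formula, the Python sketch below computes the x-coordinate of the vertex and confirms that the tangent slope there is zero. It assumes the rounded coefficients from the fitted model, so the result may differ in the second decimal place from a value based on unrounded estimates. The names vertex_x and predicted_score are hypothetical.

```python
# Rounded coefficients from the fitted quadratic model.
B0 = 1.703   # intercept
B1 = 3.517   # coefficient for practice
B2 = -0.142  # coefficient for practice squared

def vertex_x(b1, b2):
    """x-coordinate of the parabola's vertex: -b1 / (2 * b2)."""
    return -b1 / (2 * b2)

def predicted_score(practice):
    """Predicted test score from the quadratic model."""
    return B0 + B1 * practice + B2 * practice ** 2

peak_practice = vertex_x(B1, B2)

# The tangent slope at the vertex should be zero (up to floating-point error).
slope_at_peak = B1 + 2 * B2 * peak_practice

print(f"vertex at practice = {peak_practice:.2f} minutes")
print(f"slope at vertex = {slope_at_peak:.6f}")
print(f"predicted score at vertex = {predicted_score(peak_practice):.2f}")
```

Because the coefficient on the squared term is negative, the vertex is a maximum: predictions on either side of it are lower.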
Press Run Code on the code chunk below to see how the marginaleffects function can calculate the changing slopes for you.