Polynomial Regression: Background
00:00 Linearity is mathematically the nicest case that you can have. However, sometimes you may want to use higher order terms to see whether incorporating them might give you a better model for your phenomenon.
00:13 In this lesson, we’ll take a look at polynomial regression.
00:17 In polynomial regression with only one independent variable, what we’re seeking is a regression model that contains not only the linear term, but also possibly a quadratic term, a cubic term, and so on, up to a term of some higher order, say x to the power of k.
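In symbols, that degree-k model in a single variable is usually written as follows (standard notation; the exact formula shown on screen isn’t reproduced in the transcript):

```latex
y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k + \varepsilon
```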
00:35 One of the reasons why you may want to use a polynomial regression model is that your data may not be linear: it may not be modeled well using a linear model, and so you want to take into account some possible non-linear effects. Now, when you have multiple variables, say an input with two components, a quadratic model takes the form shown below.
01:04 So you’ve got the two linear terms for the two components of the input, x₁ and x₂, but then you’re going to have a quadratic term for x₁, a quadratic term for x₂, and then a term that involves the product of the components x₁ and x₂.
01:20 And this type of term is called a mixed term. So x₁x₂, we’re mixing the two components of the input.
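In the same notation, the two-variable quadratic model being described here is (with an assumed labeling of the coefficients):

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \varepsilon
```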
01:31 When you go up to higher-order regression models with many variables, the notation gets a little bit more complicated, but you can write down the model in a similar way using multivariate notation.
01:43 For the purpose of this course, though, we’re going to stick with quadratic and cubic models.
01:49 They may seem a bit more complicated, but in actuality, polynomial regression problems can be solved using the same ideas from linear regression, which is kind of cool. So, for example, say you wanted to solve the cubic model below.
02:04 That is, you want a regression model that involves only a single input variable, so we’ve got a scalar problem, but we don’t want just linear terms. We want quadratic and cubic terms as well.
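Written out, the cubic model in question is (again in standard notation):

```latex
y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \varepsilon
```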
02:17 So how are we going to solve this regression model? Remember, in the background we’ve got observation data: observations for the input x and the corresponding observations for the response y. We don’t have measurements for x² and x³, though.
02:33 Well, the idea is going to be the following: we want to convert this cubic regression problem, involving the terms x² and x³, into a linear regression problem in a multiple-variable scenario.
02:49 So what we’ll do is think of x² and x³ as new independent variables. Although they depend explicitly on the input x, we want to treat them as independent variables.
03:05 And then, through the introduction of the dummy variables z₁, z₂, and z₃, we can view this single-variable cubic model in the form of a linear model involving multiple variables, those variables being z₁, z₂, and z₃.
03:24 And so now this regression problem looks just like the multiple linear regression problems that we did in the previous two lessons.
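Spelled out, the substitution is:

```latex
z_1 = x, \quad z_2 = x^2, \quad z_3 = x^3
\qquad \Longrightarrow \qquad
y = \beta_0 + \beta_1 z_1 + \beta_2 z_2 + \beta_3 z_3 + \varepsilon
```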
03:34 This approach, which converts a higher-order polynomial regression problem into one that’s linear in new variables, does introduce one extra step. You see, in the actual data that we have, we’re only observing x and the corresponding response y, and so what we’ll need to do is generate the values of x² and x³ that correspond to the inputs.

So for example, suppose that we’ve measured, or we have observed, an input and the corresponding response, and we’ve got n observations, just like before. Then for each observed input we’ll need to compute its square and its cube, because these will be acting as our other two independent variables in the multiple linear regression model.

Now, in the background, scikit-learn is going to be doing all of this for you, and you’ll see some of the options that we use in a new class that we’ll have to introduce in order to do this transformation of computing the new features. scikit-learn has all that for us.
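As a preview, here’s a minimal sketch of what that transformation might look like in scikit-learn. The class that computes the new features is PolynomialFeatures; the data values below are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical observations: inputs x and responses y
x = np.array([5, 15, 25, 35, 45, 55]).reshape(-1, 1)
y = np.array([5, 20, 14, 32, 22, 38])

# Generate the x² and x³ columns from the observed x values
poly = PolynomialFeatures(degree=3, include_bias=False)
X = poly.fit_transform(x)  # columns: x, x², x³

# From here it's an ordinary multiple linear regression
model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)
```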
04:47 So let’s take a look through some figures of what we’re trying to achieve by using polynomial regression on some test data.
04:56 So here we’ve got four figures that represent four different degrees for a polynomial regression model on some hypothetical data that contains six observations, and these are again represented by these green dots.
05:11 So in all four figures, the green dots are at the same place because these all correspond to the same observation, and the only thing that we’re changing is the degree of the regression model.
05:23 In the top left corner, we’ve got a linear model, so this is degree one. In the top right, degree two, so a quadratic or parabolic model. Then at the bottom left, we’ve got a cubic model, and down here in the bottom right, we’ve got a fifth-order polynomial regression model.
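Figures like these could be produced in a few lines of Python. This is just a sketch with made-up data standing in for the six green dots, and it uses np.polyfit for brevity rather than the scikit-learn classes from the lesson:

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical observations (the six green dots)
x = np.array([5, 15, 25, 35, 45, 55])
y = np.array([5, 20, 14, 32, 22, 38])

x_grid = np.linspace(x.min(), x.max(), 200)
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

for ax, degree in zip(axes.ravel(), [1, 2, 3, 5]):
    coeffs = np.polyfit(x, y, degree)            # least-squares fit
    ax.plot(x_grid, np.polyval(coeffs, x_grid))  # fitted curve
    ax.scatter(x, y, color="green")              # observations
    ax.set_title(f"degree {degree}")

plt.tight_layout()
plt.show()
```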
05:40 There’s one very important question that arises when you’re implementing polynomial regression, and it has to do with the choice of the optimal degree of the polynomial regression model.
05:51 There’s no straightforward rule for choosing the degree of the polynomial in a polynomial regression model. There are, however, a couple of important things that you should keep in mind when implementing polynomial regression.
06:03 And those are the issues of underfitting and overfitting the data. Underfitting means that the phenomenon you’re trying to model really does have some nonlinear effects; in other words, the input and the output can’t be modeled well with a linear model.
06:23 And so the linear model can’t really detect the dependency between the input and the output response, which will show up as low R² values.
06:35 At the other extreme is overfitting: you choose a sufficiently high-degree polynomial regression model that the model you get can exactly reproduce the responses at the input values of the observed data.
06:50 The basic idea is that you’ve chosen enough degrees of freedom to fit the observed data exactly. Now, though this may seem like the best choice, in other words, choosing a degree high enough to match the observed data exactly, it may not actually be the best choice, and in most cases it isn’t.

For example, in this particular case, the model that we get by using a degree-five polynomial predicts that for input values greater than fifty-five, the response drops suddenly; at, say, sixty, the predicted response is about zero. Depending on what it is that you’re trying to model, it may not make sense for the response at sixty to be at or near zero.

All right, well, let’s head on over to Python and implement polynomial regression.
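Before we do, here’s a quick numeric check of that overfitting point, using the same made-up six observations as in the sketches above. With six observations, a degree-five polynomial has six coefficients, so it can pass through every point exactly (R² = 1), yet its prediction just outside the observed range can swing wildly:

```python
import numpy as np

x = np.array([5, 15, 25, 35, 45, 55], dtype=float)
y = np.array([5, 20, 14, 32, 22, 38], dtype=float)

for degree in (1, 2, 3, 5):
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    # R² = 1 - SS_res / SS_tot
    r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    print(f"degree {degree}: R² = {r2:.3f}")

# The degree-five fit matches the observations exactly, but look at x = 60:
coeffs5 = np.polyfit(x, y, 5)
print("degree-5 prediction at x = 60:", np.polyval(coeffs5, 60.0))
```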