Simple Linear Regression: Code
First thing, let’s import the two modules that we’re going to need. We’re going to need NumPy, and we’re going to need a class from the
sklearn module that’s going to implement linear regression.
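The import statements themselves are not captured in the transcript. A minimal sketch, assuming the class in question is `LinearRegression` from `sklearn.linear_model`, could look like this:

```python
# NumPy for the arrays, scikit-learn for the regression model.
# Assumption: the class referred to in the narration is LinearRegression.
import numpy as np
from sklearn.linear_model import LinearRegression
```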
So let’s make this input array a two-dimensional array containing six rows and one column. To do that, we use the reshape() method. We pass it a tuple: the number of rows and the number of columns.
And so what reshape() is going to do is this: because the array contains six data points, and we’re asking reshape() to return a two-dimensional array containing one column, the number of rows is going to be computed automatically as six.
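The reshaping step can be sketched as follows. The input values are illustrative only, since the transcript does not list them, and the sketch assumes the automatic row count is requested by passing `-1` as the first entry of the tuple, which is NumPy's convention for an inferred dimension.

```python
import numpy as np

# Illustrative values only; the transcript does not list the actual inputs.
x = np.array([5, 15, 25, 35, 45, 55])
print(x.shape)  # (6,)

# -1 asks NumPy to infer the number of rows; 1 fixes a single column.
x_2d = x.reshape((-1, 1))
print(x_2d.shape)  # (6, 1)
```

scikit-learn estimators expect the inputs as a two-dimensional array with one row per observation, which is why the one-dimensional array is reshaped before fitting.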
And then to actually compute the model (in other words, in this case, compute the coefficients), we need to call the .fit() method on the model object that we created.
The attribute that stores the coefficients is a NumPy array in this case, which has only one value, the 𝑏₁ value. In regression, there’s a value that can be used to determine how well a linear model fits the data, and this is the 𝑅² value. To get the 𝑅² value, you use the .score() method on the model, and you input whatever x and y values you want. In this case, let’s use the actual observations to see what the 𝑅² value is on the data that we have.
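Fitting and scoring might look like the sketch below. The data values are illustrative and not taken from the transcript. The attribute names `coef_` and `intercept_` are scikit-learn's names for the fitted 𝑏₁ and 𝑏₀ values, and the sketch assumes those are the attributes the narration refers to.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative values only; the transcript does not list the actual data.
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

# Create the model object, then compute the coefficients from the data.
model = LinearRegression()
model.fit(x, y)

print(model.intercept_)  # b0, a single number
print(model.coef_)       # array with one entry, b1

# R^2 evaluated on the same observations used for fitting.
r_sq = model.score(x, y)
print(r_sq)
```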
04:16 An 𝑅² value that is close to one, or exactly equal to one, means that a linear model is a good fit for the data. We talked about how regression can be used for two main purposes: for prediction and for inference.
All right. So really, a way to think about the prediction abstractly is that it is f(x) evaluated at each of the individual x values, and in this case, because x is a NumPy array containing six data points, we’re going to get six responses.
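To make the "f(x) evaluated at each x value" idea concrete, the sketch below compares the model's predictions with the same values computed by hand from the fitted coefficients. It assumes the method used in the lesson is scikit-learn's `.predict()`, and the data values are again illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative values only; the transcript does not list the actual data.
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

model = LinearRegression().fit(x, y)

# One predicted response per input row.
y_pred = model.predict(x)
print(y_pred.shape)  # (6,)

# The same numbers computed by hand as f(x) = b0 + b1 * x.
manual = model.intercept_ + model.coef_[0] * x.ravel()
print(np.allclose(y_pred, manual))  # True
```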
The real power of prediction using a regression model is to evaluate the model at new inputs x to determine the corresponding responses. So let’s create a new input array, and we’ll use the arange() function in NumPy, which is similar to Python’s range() function and creates an array (in this case, a NumPy array) that starts at zero and runs up to, but not including, five.
06:02 And we’re going to need this to be a two-dimensional array. And let’s run that actually now, so that we print the output. And now let’s use the model to predict what the responses are for those new inputs.
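Predicting at new inputs could be sketched like this. The stop value of 5 passed to `arange()` is an assumption based on the narration, the name `x_new` is hypothetical, and the training data is illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative training data; the transcript does not list the actual values.
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])
model = LinearRegression().fit(x, y)

# New inputs 0, 1, 2, 3, 4 as a single column (stop value 5 is assumed).
x_new = np.arange(5).reshape((-1, 1))
print(x_new.ravel())  # [0 1 2 3 4]

# One predicted response for each new input.
y_new = model.predict(x_new)
print(y_new)
```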
06:33 And there you go. You just created your first linear regression model and used the model to predict responses for desired inputs. Now that you know how to implement simple linear regression using scikit-learn, let’s now talk about multiple linear regression, and then we’ll come back to Jupyter to implement that using scikit-learn.