Simple Linear Regression: Code
00:00 All right. If you haven't already, fire up a Jupyter notebook, an editor, or any other environment you're comfortable with for writing Python code.
00:11
First, let's import what we're going to need: NumPy, and the LinearRegression class from scikit-learn's sklearn.linear_model module, which implements linear regression.
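A minimal sketch of those imports, assuming NumPy and scikit-learn are installed. The `np` alias is a common convention and an assumption here, since the transcript doesn't show the exact import lines:

```python
# NumPy provides the arrays; LinearRegression implements ordinary least squares
import numpy as np
from sklearn.linear_model import LinearRegression
```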
00:39
Let's create some dummy data to try out the LinearRegression class, starting with the input array. We're going to use an array containing six data points.
00:55 The LinearRegression object expects the input array to be two-dimensional. As it stands, this is a one-dimensional array containing six data points.
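The transcript doesn't show the actual values used in the video, so the six numbers below are hypothetical placeholders. This sketch only confirms that the array starts out one-dimensional:

```python
import numpy as np

# Hypothetical placeholder inputs: six data points (not the video's values)
x = np.array([5, 15, 25, 35, 45, 55])

print(x.ndim)   # 1, a one-dimensional array
print(x.shape)  # (6,)
```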
01:07
So let's make the input array two-dimensional, with six rows and one column. To do that, we use the reshape() method and pass it a tuple containing the number of rows and the number of columns.
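Passing the shape explicitly might look like this, again using hypothetical placeholder values for the six data points:

```python
import numpy as np

x = np.array([5, 15, 25, 35, 45, 55])  # hypothetical placeholder inputs

# Explicit (rows, columns) tuple: 6 rows, 1 column
x = x.reshape((6, 1))
print(x.shape)  # (6, 1)
```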
01:22
There's a shortcut we can use with reshape(). When we want reshape() to infer the size of one of the dimensions, we can pass -1 for that dimension.
01:35
Because this array contains six data points and we're asking reshape() to return a two-dimensional array with one column, reshape() computes the number of rows automatically as six.
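The `-1` shortcut described above can be sketched as follows, with the same hypothetical placeholder inputs:

```python
import numpy as np

x = np.array([5, 15, 25, 35, 45, 55])  # hypothetical placeholder inputs

# -1 asks NumPy to infer the row count from the array's size (6 elements / 1 column = 6 rows)
x = x.reshape((-1, 1))
print(x.shape)  # (6, 1)
```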
01:54 Now let’s create the output array.
02:06
So let's verify that x has a shape of six by one.
02:16
And now the shape of y. Here, y is a one-dimensional NumPy array containing six data points. Now let's build our regression model.
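The output array and the two shape checks described above might look like this. Both arrays hold hypothetical placeholder values, since the video's data isn't in the transcript:

```python
import numpy as np

# Hypothetical placeholder data (the video's values aren't in the transcript)
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))  # inputs: 2-D, one column
y = np.array([5, 20, 14, 32, 22, 38])                   # responses: 1-D

print(x.shape)  # (6, 1)
print(y.shape)  # (6,)
```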
02:36
To actually compute the model (in other words, in this case, to compute the coefficients), we call the .fit() method on the model object that we created from the LinearRegression class.
02:50
.fit() takes two required positional arguments: the first, x, is the input variable, and the second, y, is the response.
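A minimal sketch of creating and fitting the model with default settings, using hypothetical placeholder data. Note that `.fit()` returns the fitted estimator itself, so `LinearRegression().fit(x, y)` is an equivalent one-liner:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))  # hypothetical placeholder inputs
y = np.array([5, 20, 14, 32, 22, 38])                   # hypothetical placeholder responses

model = LinearRegression()  # create the model object with default settings
model.fit(x, y)             # estimate the coefficients: inputs first, then responses
```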
02:59
Now that we've called .fit() on the model object, the model object has attributes that hold the estimated coefficients. The intercept 𝑏₀ is stored in the intercept_ attribute,
03:15
and the other coefficients in the model, the ones that multiply the input variables (in this case, there's only one input variable), are stored in the coefficient attribute, coef_.
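Reading the two attributes might look like this. The approximate values in the comments hold only for the hypothetical placeholder data used here:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))  # hypothetical placeholder inputs
y = np.array([5, 20, 14, 32, 22, 38])                   # hypothetical placeholder responses
model = LinearRegression().fit(x, y)

print(model.intercept_)  # b0, a scalar: about 5.633 for this data
print(model.coef_)       # 1-D array holding b1: about [0.54] for this data
```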
03:31
In this case, coef_ is a NumPy array containing a single value, 𝑏₁. In regression, a common measure of how well a linear model fits the data is the 𝑅² value, also called the coefficient of determination. To get the 𝑅² value, call the .score() method on the model and pass in whatever x and y values you want. Here, let's use the actual observations to see the 𝑅² value on the data we have.
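Computing 𝑅² on the observed data with `.score()` might look like this, again with hypothetical placeholder data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))  # hypothetical placeholder inputs
y = np.array([5, 20, 14, 32, 22, 38])                   # hypothetical placeholder responses
model = LinearRegression().fit(x, y)

r_sq = model.score(x, y)  # R² of the fitted model on the observed data
print(r_sq)               # about 0.716 for this data
```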
04:16 An 𝑅² value close to one means that the linear model explains most of the variation in the responses, and a value of exactly one means a perfect fit. We talked about how regression can be used for two main purposes: prediction and inference.
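To make the 𝑅² interpretation concrete, here is a sketch that computes it from its definition, 𝑅² = 1 − SS_res / SS_tot, and checks it against `.score()`. The data values are hypothetical placeholders:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))  # hypothetical placeholder inputs
y = np.array([5, 20, 14, 32, 22, 38])                   # hypothetical placeholder responses
model = LinearRegression().fit(x, y)

y_pred = model.predict(x)
ss_res = np.sum((y - y_pred) ** 2)    # residual sum of squares: variation left unexplained
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares around the mean response
r_sq_manual = 1 - ss_res / ss_tot

print(np.isclose(r_sq_manual, model.score(x, y)))  # True
```

A value near one means the residuals are small compared with the spread of y around its mean.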
04:32 Let's use our model to predict the responses for the observed inputs x.
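Predicting the responses for the observed inputs might look like this, with hypothetical placeholder data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))  # hypothetical placeholder inputs
y = np.array([5, 20, 14, 32, 22, 38])                   # hypothetical placeholder responses
model = LinearRegression().fit(x, y)

y_pred = model.predict(x)  # one predicted response per row of x
print(y_pred.shape)        # (6,)
```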
04:50 Another way to get these values is to evaluate the model on the inputs x manually. Recall that the model is 𝑏₀ + 𝑏₁𝑥, evaluated at the inputs x.
05:15
Oh, I forgot my t there. All right. One way to think about this abstractly is that this is f(x) evaluated at each of the individual x values, and because x is a NumPy array containing six data points, we get six responses.
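A sketch of the manual evaluation, using hypothetical placeholder data. Because `coef_` has shape (1,) and `x` has shape (6, 1), broadcasting gives a (6, 1) result, so it is flattened before comparing with `.predict()`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))  # hypothetical placeholder inputs
y = np.array([5, 20, 14, 32, 22, 38])                   # hypothetical placeholder responses
model = LinearRegression().fit(x, y)

# b0 + b1 * x evaluated at every input; broadcasting yields shape (6, 1)
y_manual = model.intercept_ + model.coef_ * x
y_manual = y_manual.ravel()  # flatten to shape (6,) to match predict()

print(np.allclose(y_manual, model.predict(x)))  # True
```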
05:36
The real power of prediction with a regression model is evaluating it at new inputs x to determine the corresponding responses. So let's create a new input array using NumPy's arange() function, which is similar to Python's built-in range() function: np.arange(5) creates a NumPy array of the integers from zero up to, but not including, five.
06:02 We need this to be a two-dimensional array as well, so we reshape it. Let's run that now and print the output. Then let's use the model to predict the responses for those new inputs.
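Predicting for new inputs might look like this. The `np.arange(5).reshape((-1, 1))` line matches the code quoted in the comment below; the training data is a hypothetical placeholder, so the predicted values are only illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))  # hypothetical placeholder inputs
y = np.array([5, 20, 14, 32, 22, 38])                   # hypothetical placeholder responses
model = LinearRegression().fit(x, y)

x_new = np.arange(5).reshape((-1, 1))  # integers 0 through 4 as a single column
print(x_new)

y_new = model.predict(x_new)
print(y_new)  # about [5.633 6.173 6.713 7.253 7.793] for this data
```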
06:33 And there you go. You just created your first linear regression model and used it to predict responses for new inputs. Now that you know how to implement simple linear regression using scikit-learn, let's talk about multiple linear regression, and then we'll come back to Jupyter to implement it using scikit-learn.
timothyrobertgray on March 2, 2022
Hi Cesar. Started this class. I’m using Anaconda/Jupyter notebook. This section was going along well when I received a couple errors near the end. Any thoughts?
TypeError                                 Traceback (most recent call last)
/var/folders/8t/hl56kksj2zv71g36pyhd10wm0000gn/T/ipykernel_14436/1176011380.py in <module>
      1 x_new = np.arange(5).reshape((-1, 1))
----> 2 print(x_new)

TypeError: 'numpy.float64' object is not callable

/var/folders/8t/hl56kksj2zv71g36pyhd10wm0000gn/T/ipykernel_14436/2147735938.py in <module>
      1 y_new = model.predict(x_new)
----> 2 print(y_new)

TypeError: 'numpy.float64' object is not callable
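This message means the name `print` no longer refers to the built-in function but to a NumPy float. One way this can happen in a notebook is an earlier cell assigning to `print`, for example storing a value from `.score()` under that name. That is an assumption about the commenter's session, not something the thread confirms. The sketch below reproduces the same message and then removes the shadowing name:

```python
import numpy as np

# Hypothetical mistake: binding a NumPy float to the name print shadows the built-in
print = np.float64(0.716)

try:
    print("hello")
except TypeError as exc:
    message = str(exc)  # "'numpy.float64' object is not callable"

# Deleting the global name (or restarting the kernel) makes print the built-in again
del print
print(message)
```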