Multiple Linear Regression: Code
Here’s the data that you’ll need to implement Multiple Linear Regression in Python:
x = [[0, 1], [5, 1], [15, 2], [25, 5], [35, 11], [45, 15], [55, 34], [60, 35]]
y = [4, 5, 20, 14, 32, 22, 38, 43]
00:00 Let’s start again by first importing NumPy and then the LinearRegression class from the sklearn.linear_model module. As before, we’re going to be using some dummy data.
00:14 I’m going to copy and paste some values for the inputs x and the outputs y, and this data will be provided to you underneath the notes that accompany the video lesson.
00:28 So from the data, you see that what we have is eight observations. So eight observations for the responses, and we’ve got eight input observations. Each input observation consists of two data points.
00:44 So we’re using a two-dimensional linear regression model. Let’s take these Python lists and create NumPy arrays out of them. So go ahead and run that and then redefine x to be a NumPy array … and the same thing for y.
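Here’s a minimal sketch of that step, assuming NumPy is installed. It reuses the x and y values given at the top of the notes and rebinds both names to NumPy arrays, which is one reasonable way to do what the video describes.

```python
import numpy as np

# Dummy data from the lesson notes
x = [[0, 1], [5, 1], [15, 2], [25, 5], [35, 11], [45, 15], [55, 34], [60, 35]]
y = [4, 5, 20, 14, 32, 22, 38, 43]

# Rebind x and y so they refer to NumPy arrays instead of Python lists
x, y = np.array(x), np.array(y)

print(x)
print(y)
```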
01:05 Let’s take a look at the shape of x and the shape of y.
01:14 And so we see this is consistent with some of the notation that we were using in the previous lesson. The input array is a two-dimensional array containing eight rows and two columns. Eight corresponds to the number of observations, and y is a one-dimensional array containing eight components.
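A short sketch of the shape check, under the same assumption that x and y were built from the data in the notes:

```python
import numpy as np

x = np.array([[0, 1], [5, 1], [15, 2], [25, 5], [35, 11], [45, 15], [55, 34], [60, 35]])
y = np.array([4, 5, 20, 14, 32, 22, 38, 43])

# x is two-dimensional: eight observations (rows), two inputs each (columns)
print(x.shape)  # (8, 2)

# y is one-dimensional: one response per observation
print(y.shape)  # (8,)
```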
01:35 Now let’s use the LinearRegression class and the .fit() method to build the model and compute the coefficients all at once.
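As a sketch of this step, assuming scikit-learn is installed, you can create the model and fit it in a single expression. The variable name model is an assumption made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[0, 1], [5, 1], [15, 2], [25, 5], [35, 11], [45, 15], [55, 34], [60, 35]])
y = np.array([4, 5, 20, 14, 32, 22, 38, 43])

# .fit() returns the fitted estimator itself, so the calls can be chained
model = LinearRegression().fit(x, y)
```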
01:52 Let’s compute the R² value.
02:08 This is a good value for R² (roughly 0.86 for this data), and it tells us that the linear model is a pretty good fit to the actual data. Now, let’s go ahead and print out the coefficients: first the intercept …
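One way to get R² is the .score() method, which for LinearRegression returns the coefficient of determination. This is a sketch that assumes the same data and a model fitted as described above. The name r_sq is just an illustrative choice.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[0, 1], [5, 1], [15, 2], [25, 5], [35, 11], [45, 15], [55, 34], [60, 35]])
y = np.array([4, 5, 20, 14, 32, 22, 38, 43])
model = LinearRegression().fit(x, y)

# R² of the fitted model on the same data it was trained on
r_sq = model.score(x, y)
print(f"coefficient of determination: {r_sq:.4f}")  # about 0.8616
```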
02:26 and then the coefficients multiplying the input components.
02:35 So as before, the intercept is a single value, and there are two coefficients multiplying the input components because this is a two-dimensional linear model.
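The fitted values live in the .intercept_ and .coef_ attributes. Here’s a sketch, again assuming the data from the notes and a model named model:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[0, 1], [5, 1], [15, 2], [25, 5], [35, 11], [45, 15], [55, 34], [60, 35]])
y = np.array([4, 5, 20, 14, 32, 22, 38, 43])
model = LinearRegression().fit(x, y)

# The intercept is a single scalar
print(f"intercept: {model.intercept_:.4f}")  # about 5.5226

# One coefficient per input column, so two values here
print(f"coefficients: {model.coef_}")  # about [0.4471, 0.2550]
```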
02:50 Now let’s compute the estimates for the responses that correspond to the input observations. This time, let’s just call this y_est for y estimate.
03:02 And so these are going to be the values of the model evaluated at the observations. Oh, and this time it wasn’t a t that I forgot. It was a c.
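A sketch of computing y_est with .predict(), assuming the same data and fitted model. The second half evaluates the model by hand (intercept plus the inputs times the coefficients) to illustrate what "the model evaluated at the observations" means. The name y_manual is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[0, 1], [5, 1], [15, 2], [25, 5], [35, 11], [45, 15], [55, 34], [60, 35]])
y = np.array([4, 5, 20, 14, 32, 22, 38, 43])
model = LinearRegression().fit(x, y)

# Estimated responses at the original input observations
y_est = model.predict(x)

# The same values computed by hand: b0 + b1*x1 + b2*x2 for each row
y_manual = model.intercept_ + x @ model.coef_

# Print observed and estimated responses side by side
for actual, estimate in zip(y, y_est):
    print(f"actual: {actual:2d}  estimate: {estimate:.2f}")
```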
03:16 So let’s compare the estimates and the actual observed values for the responses. So for the first response of four, we estimated a value of about 5.8. For the actual response of five, we estimate with the model a value of about 8.0, and so on.
03:37 Now let’s use the model to predict responses at new inputs. So go ahead and create an input array.
03:51 Let’s create a five-by-two input array. So you’ll have to use, say, the arange() function just to create some dummy data. So we’ll pass 10 into arange() so that we get the values from zero to nine, and then we want to reshape that so that we get a five-by-two NumPy array. And let’s just pass in the -1 shortcut.
04:16 So just to clarify, when we print this, we’re going to be generating a five-by-two NumPy array or a five-by-two matrix. And then we can predict what the responses are going to be at those inputs.
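Here’s a sketch of predicting at new inputs, assuming the same data and fitted model. The names x_new and y_new are illustrative. Passing -1 as the first dimension lets NumPy work out the number of rows from the array length and the two columns.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[0, 1], [5, 1], [15, 2], [25, 5], [35, 11], [45, 15], [55, 34], [60, 35]])
y = np.array([4, 5, 20, 14, 32, 22, 38, 43])
model = LinearRegression().fit(x, y)

# Values 0 through 9, reshaped to two columns; -1 means "infer the row count"
x_new = np.arange(10).reshape((-1, 2))
print(x_new.shape)  # (5, 2)

# One predicted response per new input row
y_new = model.predict(x_new)
print(y_new)
```

The new inputs need the same number of columns as the training inputs (two here), otherwise .predict() raises an error.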
04:37 And that’s all there is to it. So implementing multiple linear regression is very similar to just simple linear regression. The only difference is making sure that all your dimensions add up.
04:51 In the next couple of lessons, we’ll take a look at how we can implement polynomial regression in scikit-learn.