# Multiple Linear Regression: Background

**00:00**
Now that we’ve covered simple linear regression, let’s see what multiple linear regression is all about.

**00:08**
In multiple linear regression, the input variable is at least two-dimensional. To distinguish between scalar and array or vector values, we’re going to be using bold symbols.

**00:21**
So here we’re using the bold symbol **x** to say that **x** is going to be an array or a vector that contains more than one component. So in this case, it’s containing *r* components.

**00:36**
In this case, the model takes the following form. We’ve got the intercept, *b₀*, and then we’re multiplying each component of the input vector by an unknown coefficient. So in this case, the number of unknowns or coefficients is going to be *r* plus 1 plus that intercept. As in simple linear regression, in order to build this model, we’re given *n* observations.
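Written out, the model being described here (with *ŷ* denoting the predicted response) is:

```latex
\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_r x_r
```

so there are *r* + 1 unknown coefficients in total: the intercept *b₀* plus the *r* slope coefficients *b₁* through *bᵣ*.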

**01:02**
So we’ve got one input observation and its corresponding response, and then a second input observation, and then its corresponding response, and so on, and there are *n* of these.

**01:14**
And just to make sure you remember: each of these input observations is an r-dimensional array or r-dimensional vector. And so, for example, if we’re looking at the *ith* input observation, that *ith* input observation has *r* components.

**01:32**
And so we’re going to need two indices. The first index corresponds to *i*, so *i* would be referring to the *ith* observation. And then the second index tells us the component of that *ith* observation.

**01:45**
So there are one, two, all the way up to *r* components for this *ith* observation.
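As a quick illustration (the array and its values below are made up purely for demonstration), the two indices correspond directly to row/column indexing in NumPy:

```python
import numpy as np

# Hypothetical data: n = 3 observations, each with r = 2 components.
X = np.array([
    [5.0, 1.0],   # observation 1
    [6.0, 2.0],   # observation 2
    [7.0, 3.0],   # observation 3
])

# First index: which observation; second index: which component of it.
# (NumPy is 0-based, so observation i, component j lives at X[i - 1, j - 1].)
second_obs = X[1]      # the 2nd observation, an r-dimensional vector
component = X[1, 0]    # its 1st component
print(second_obs)      # [6. 2.]
print(component)       # 6.0
```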

**01:54**
The problem is the same as in simple linear regression. We want to find that model, and to find that model, we need to find the coefficients. And one way to do that is to minimize the residual sum of squares function.

**02:07**
The definition of RSS is the same as in simple linear regression. We compute the residual for each input observation and its corresponding response.

**02:18**
We square it, and we want to add those up, *i* going from one to *n*. In the lesson on simple linear regression, I showed you the expressions for *b₀* and *b₁*.
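In symbols, using the double-index notation from above, the function being minimized is:

```latex
\mathrm{RSS}(b_0, b_1, \dots, b_r)
  = \sum_{i=1}^{n} \left( y_i - b_0 - \sum_{j=1}^{r} b_j x_{ij} \right)^{2}
```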

**02:31**
In that case, there were only two coefficients. I want to do the same thing in the case of multiple linear regression, but to do so, I have to introduce a little bit of matrix notation. Now, if you’ve never taken a course, say, on linear algebra, or maybe it’s been a long time since you’ve worked with matrices, then just come along for the ride, take a look at the nice formulas that we’ll display and just let scikit-learn do all the computations for you. All right.

**02:59**
So this is what we’ll do. The coefficients we’re going to stack them up in a vector. There are *r* plus 1 coefficients that are unknown. And so we’re going to denote that vector by the bold letter **b**.

**03:12**
We’re going to do the same thing for the responses. We’ve got *n* responses. We’re going to denote by the bold letter **y** the vector of responses.

**03:22**
And then we’re going to gather the *n* input observations. Remember, these are arrays or vectors that contain *r* components, and then we’re going to do something else just for convenience.

**03:33**
We’re going to tack on to this associated matrix this column of ones. Now, in case you haven’t fallen asleep yet, let me just open up these components here. Yikes. Okay.

**03:48**
We opened up a can of worms. So these input vectors, they are really r-dimensional. And so we stack those inputs as rows in a matrix, but then we also add this column of ones, just so that the mathematics is easy to do, and we get nice formulas at the end.
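Here’s a small sketch of that construction in NumPy (the numbers are made up): stack the inputs as rows, then prepend the column of ones:

```python
import numpy as np

# Hypothetical data: n = 4 observations, each with r = 2 components.
inputs = np.array([
    [1.0, 4.0],
    [2.0, 5.0],
    [3.0, 6.0],
    [4.0, 7.0],
])

n = inputs.shape[0]

# Prepend a column of ones so the intercept b0 is handled
# just like the other coefficients in the matrix formulas.
X = np.column_stack([np.ones(n), inputs])
print(X.shape)   # (4, 3): n rows, r + 1 columns
```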

**04:07**
So let me show you what the formula is for the unknown coefficients *b₀* to *bᵣ*.

**04:15**
So using some calculus and a little bit of matrix notation, there’s actually a nice closed form expression for those coefficients. It involves some transposes and some inverses, but this is sort of a theoretical formula.
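For reference, the closed-form expression alluded to here is the well-known least-squares solution, with **X** the matrix that includes the column of ones:

```latex
\mathbf{b} = \left( \mathbf{X}^{\mathsf{T}} \mathbf{X} \right)^{-1} \mathbf{X}^{\mathsf{T}} \mathbf{y}
```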

**04:31**
It isn’t all that useful because numerically, some of these computations are unstable. So computing inverses can be a very tricky thing.

**04:41**
Instead, in the background, scikit-learn is computing the solution to **b** using some well-established numerical linear algebra routines, like the singular value decomposition. All right.
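A minimal sketch of what that looks like in practice (the data here are made up, generated from known coefficients so we can see them recovered):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: n = 5 observations, r = 2 components each,
# generated from y = 1 + 2*x1 + 3*x2 so the true coefficients are known.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0], [0.0, 2.0]])
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

model = LinearRegression()   # fits the intercept b0 by default
model.fit(X, y)

print(model.intercept_)      # ≈ 1.0       (b0)
print(model.coef_)           # ≈ [2.0, 3.0] (b1, b2)
```

Note that we never build the column of ones ourselves here; scikit-learn handles the intercept internally and solves the least-squares problem with stable numerical routines rather than forming a matrix inverse.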

**04:54**
I think that’s enough. Why don’t we go see how we implement multiple linear regression in Python?
