
Starting With Linear Regression in Python (Overview)

We’re living in the era of large amounts of data, powerful computers, and artificial intelligence. This is just the beginning. Data science and machine learning are driving image recognition, autonomous vehicle development, decisions in the financial and energy sectors, advances in medicine, the rise of social networks, and more. Linear regression is an important part of this.

Linear regression is one of the fundamental statistical and machine learning techniques. Whether you want to do statistics, machine learning, or scientific computing, there’s a good chance that you’ll need it. It’s advisable to learn it first and then proceed toward more complex methods.

In this video course, you’ll learn:

  • What linear regression is
  • What linear regression is used for
  • How linear regression works
  • How to implement linear regression in Python, step by step

For more information on concepts covered in this lesson, you can check out Using Jupyter Notebooks.



00:00 Hey there. Welcome to this Real Python course on implementing linear regression in Python.

00:07 What is regression? Regression analysis is a statistical method for estimating the relationship between a dependent variable and one or more independent variables.

00:19 Regression techniques are used in all branches of the sciences and in finance. The two main uses of regression are prediction and inference. In prediction, the goal is to forecast the outcome of some event, state, or object from previous knowledge. In inference, the goal is to determine whether an event, a state, or an object affects the production of another event, state, or object.

00:45 When used for prediction, regression analysis has substantial overlap with the field of machine learning.

00:54 In regression, the goal is to build a mathematical model describing the effect of a set of input variables on another variable y. The input variables are sometimes called predictors, independent variables, or features. The variable y is called the response, the output, or the dependent variable. As an example, y might be the sale price of a home, and the independent variables might be the square footage of the home, the proximity of the home to schools, the proximity of the home to hospitals, or maybe the sale price of other homes in the same neighborhood.

01:33 In regression, you assume some mathematical model relating y and x: y = f(x) + ε. Here, f is the model. It’s a function of x, and the response is the value of the model plus some random error term (ε).

01:48 So the main goal of regression is to build a good model for f.
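As a minimal sketch of this setup, the snippet below simulates observations from y = f(x) + ε using only the standard library. The particular f (a straight line with intercept 2 and slope 0.5), the noise level, and the names `f` and `simulate_observations` are assumptions made for illustration; they are not from the course.

```python
import random


def f(x):
    """Hypothetical 'true' model: a straight line chosen for illustration."""
    return 2.0 + 0.5 * x


def simulate_observations(xs, noise_sd=0.1, seed=0):
    """Return y = f(x) + error for each x, with normally distributed error."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [f(x) + rng.gauss(0.0, noise_sd) for x in xs]


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = simulate_observations(xs)

# Each observed y differs from f(x) only by the random error term.
errors = [y - f(x) for x, y in zip(xs, ys)]
for x, y, e in zip(xs, ys, errors):
    print(f"x={x:.1f}  f(x)={f(x):.2f}  y={y:.3f}  error={e:+.3f}")
```

In practice f is unknown, and regression works in the other direction: it estimates f from the observed pairs of x and y.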

01:54 To build the model, you need data. In other words, you need observations or actual measurements. So in the example of the sale price of a home, you’ve got the data for the sale price of one home, another home, a third home, and so on.

02:11 The gist of a regression technique, that is, what differentiates it from the others, is how it uses the n observations to build the model f. There are many regression techniques: linear regression, polynomial regression, nonlinear regression, decision trees, support vector machines, neural networks, and many others.

02:32 In this course, you’re going to learn how to use the Python library scikit-learn to implement linear regression and the related polynomial regression. Why linear regression? Well, linear regression is one of the most widely used regression methods. In linear regression, the model f is assumed to take the following form: f(x) = 𝑏₀ + 𝑏₁x₁ + ⋯ + 𝑏ᵣxᵣ, where x₁, …, xᵣ are the input variables.

02:54 The input variables appear linearly in the model, and you have a constant term (𝑏₀), which is sometimes called the bias or the intercept. scikit-learn is a machine learning library built on top of the popular NumPy library.
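To make the linear form concrete, here is a small sketch, assuming scikit-learn and NumPy are installed. The data is hypothetical and generated exactly from y = 1 + 2·x₁ + 3·x₂ with no error term, so the fitted intercept and coefficients are easy to check. This is an illustration only, not the code used in the course.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2 (no error term),
# so the fitted coefficients can be checked by eye.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]], dtype=float)
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

model = LinearRegression()  # fits the intercept (constant term) by default
model.fit(X, y)

b0 = model.intercept_  # estimate of the constant term (bias / intercept)
b = model.coef_        # one estimated coefficient per input variable

print("b0:", round(float(b0), 6))
print("b1, b2:", np.round(b, 6))

# Prediction for a new input; the generating formula gives 1 + 2*3 + 3*4 = 19.
prediction = model.predict(np.array([[3.0, 4.0]]))
print("prediction:", np.round(prediction, 6))
```

With real, noisy data the estimates would only approximate the underlying coefficients rather than match them.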

03:11 Let’s talk a little bit about the environment that I’ll be using for the course. You’re going to need scikit-learn. You can use pip to install it, or if you want other modules for data science, I recommend Anaconda Python.

03:25 I’m going to be using Jupyter in this course. So if you’re not familiar with Jupyter, you can go ahead and check out this Real Python course on how to use Jupyter.

03:34 But if you’re more comfortable with using your own editor, it’s perfectly fine. You’ll have no problems following along. And one more note, the code in this course has been tested in Python 3.9, but even if you have slightly older versions of Python, you should be all right.
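If you want to confirm your setup before starting, a quick check along these lines may help. It is only a sketch using the standard library; the helper name `installed_version` is made up for this example.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("Python:", ".".join(str(part) for part in sys.version_info[:3]))

# Look up the two libraries mentioned in this lesson.
report = {name: installed_version(name) for name in ("scikit-learn", "numpy")}
for name, version in report.items():
    if version is None:
        print(f"{name}: not installed (try: python -m pip install {name})")
    else:
        print(f"{name}: {version}")
```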

03:51 Here’s the table of contents of the course. We’re going to begin by taking a look at simple linear regression. This is linear regression where we only have one input variable.

04:02 Then we’ll move on to multiple linear regression and then tackle polynomial regression, both for the simple case and for the multiple input case. And then we’ll wrap things up in a summary. All right.
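As a rough preview of how these topics relate, the sketch below treats polynomial regression as ordinary linear regression on expanded inputs. It assumes scikit-learn and NumPy are installed, and the data is hypothetical, generated exactly from y = 1 + 2x + 3x². The course itself may approach this differently.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Simple case: one input variable, so the input array has shape (n, 1).
x = np.arange(6, dtype=float).reshape(-1, 1)

# Hypothetical response generated exactly from y = 1 + 2*x + 3*x**2.
y = 1.0 + 2.0 * x[:, 0] + 3.0 * x[:, 0] ** 2

# Expand x into the columns [x, x**2]; the intercept is left to LinearRegression.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(x)  # shape (6, 2)

# An ordinary linear regression on the expanded columns is a polynomial fit in x.
model = LinearRegression().fit(X_poly, y)
print("intercept:", round(float(model.intercept_), 6))
print("coefficients:", np.round(model.coef_, 6))
```

The model is still linear in its coefficients, which is why the same `LinearRegression` class can be reused.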

04:14 I hope you’re looking forward to the course. Let’s get going then.

toigopaul on Dec. 12, 2023

This course is incredible! I am now officially thrilled to be a paid member of Real Python. At a fraction of the cost, this course got me through a poorly worded edX HarvardX CS109x exercise and I predict it’ll help with future exercises too. Furthermore, the Real Python “lecture” is also superior to CS109x. For reference, CS109x cost me $299.
