Creating Scatter Plots
00:00 Creating scatter plots. A scatter plot is a visual representation of how two variables relate to each other. You can use scatter plots to explore the relationship between two variables, for example, by looking for any correlation between them.
00:17 In this section of the course, you’ll become familiar with creating basic scatter plots using Matplotlib. In later sections, you’ll learn how to further customize your plots to represent more complex data using more than two dimensions.
Before you can start working with plt.scatter(), you’ll need to install Matplotlib. It’s generally a good idea to experiment with new Python code in a virtual environment, and if you’re not sure how to set one up, check out this Real Python course.
Matplotlib can be installed using Python’s standard package manager, pip, by running python -m pip install matplotlib in the console.
01:04 Now that you have Matplotlib installed, consider the following use case. A café sells six different types of bottled orange drinks. The owner wants to understand the relationship between the price of the drinks and how many of each one he sells, so he keeps track of how many of each drink he sells every day.
01:22 You can visualize this relationship with the code that follows.
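The script itself is not preserved in this transcript. A minimal sketch of what it might look like follows. Apart from the $4.02 price mentioned later in the lesson, the prices and sales figures are illustrative assumptions, not the café’s actual data.

```python
import matplotlib.pyplot as plt

# Illustrative values (assumed): one entry per drink, six drinks in total.
price = [2.50, 1.23, 4.02, 3.25, 5.00, 4.40]
sales_per_day = [34, 62, 49, 22, 13, 19]

# Price goes on the x-axis, average daily sales on the y-axis.
plt.scatter(price, sales_per_day)

# In a script, the figure only appears once plt.show() is called.
plt.show()
```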
In this Python script, you import the pyplot submodule from matplotlib using the alias plt. This alias is generally used by convention to shorten the module and submodule names.
01:41 You then create lists with the price and average sales per day for each of the six orange drinks sold.
Finally, you create the scatter plot using plt.scatter() with the two variables you wish to compare as input arguments.
As you’re using a Python script, you also need to explicitly display the figure using plt.show().
If you’re using an interactive environment, such as a console or a Jupyter Notebook, you won’t need the call to plt.show(). But in this course, all the examples will be in the form of scripts and will include that call. Here, you can see the output from the code you’ve just seen.
02:44 This plot shows that, in general, the more expensive a drink is, the fewer items are sold. However, the drink that costs $4.02 is an outlier, which may show that it’s a particularly popular product. When using scatter plots in this way, close inspection can help you explore the relationship between variables. You can then carry out further analysis, whether it’s using linear regression or other techniques.
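As a rough illustration of that follow-up analysis, the sketch below fits a straight line to the data with NumPy’s polyfit(), which is one option among several and not necessarily what the course uses. The prices and sales figures are illustrative assumptions, with only the $4.02 price taken from the lesson.

```python
import numpy as np

# Illustrative values (assumed): one entry per drink, six drinks in total.
price = np.array([2.50, 1.23, 4.02, 3.25, 5.00, 4.40])
sales_per_day = np.array([34, 62, 49, 22, 13, 19])

# Least-squares fit of a degree-1 polynomial:
# sales_per_day is approximated by slope * price + intercept.
slope, intercept = np.polyfit(price, sales_per_day, 1)

print(f"slope: {slope:.2f}, intercept: {intercept:.2f}")
```

A negative slope here would agree with the visual impression that pricier drinks sell less.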
03:11 For more on linear regression, check out this Real Python course.
You can also produce the scatter plot shown above using another function within Matplotlib’s pyplot module. plt.plot() is a general-purpose plotting function that lets you create various line or marker plots. You can achieve the same scatter plot as the one you obtained in the section above with the following code.
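The listing is missing from this transcript, so here is a sketch of the plt.plot() version. The prices and sales figures are illustrative assumptions, with only the $4.02 price taken from the lesson.

```python
import matplotlib.pyplot as plt

# Illustrative values (assumed): one entry per drink, six drinks in total.
price = [2.50, 1.23, 4.02, 3.25, 5.00, 4.40]
sales_per_day = [34, 62, 49, 22, 13, 19]

# The "o" format string asks for circular markers with no connecting line.
# plt.plot() returns a list of Line2D objects, one per plotted series.
lines = plt.plot(price, sales_per_day, "o")

plt.show()
```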
In this case, you had to include the marker "o" as a third argument. Otherwise, plt.plot() would plot a line graph. The plot you created with this code is identical to the plot you created earlier with plt.scatter().
In some instances, for the basic scatter plot you’re plotting in this example, using plt.plot() may be preferable. You can compare the efficiency of the two functions using the timeit module.
timeit is imported, then a print statement is included for each type of call, which includes a call to timeit.timeit(). Each timeit() call includes the command to be run, the number of times that the command should be run, and the namespace in which it should be executed.
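The timing script is not reproduced in this transcript. The sketch below follows the description above under a few assumptions: the data values are illustrative, the repeat count of 200 is arbitrary, and the namespace is passed as an explicit dictionary. In a script, globals=globals() would serve the same purpose.

```python
import timeit

import matplotlib.pyplot as plt

# Illustrative values (assumed): one entry per drink, six drinks in total.
price = [2.50, 1.23, 4.02, 3.25, 5.00, 4.40]
sales_per_day = [34, 62, 49, 22, 13, 19]

# Names that the timed statements are allowed to see.
namespace = {"plt": plt, "price": price, "sales_per_day": sales_per_day}

scatter_time = timeit.timeit(
    "plt.scatter(price, sales_per_day)",  # command to run
    number=200,  # assumed repeat count; raise it for steadier numbers
    globals=namespace,  # namespace in which the command is executed
)
plot_time = timeit.timeit(
    'plt.plot(price, sales_per_day, "o")',
    number=200,
    globals=namespace,
)

# Total seconds taken by all repetitions of each call.
print("plt.scatter()", scatter_time)
print("plt.plot()", plot_time)
```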
The performance will vary on different computers, but when you run this code, you’ll find that plt.plot() is significantly more efficient than plt.scatter(). When running the example on my system, plt.plot() was significantly faster, as seen in the figures onscreen.
If you can create scatter plots using plt.plot() and it’s so much faster, then why would you ever use plt.scatter()? You’ll find the answer in the rest of this course.
Most of the customizations and advanced uses you’ll learn about are only possible when using plt.scatter(). Here’s a rule of thumb you can use. If you need a basic scatter plot, use plt.plot(), especially if you want to prioritize performance.
If you want to customize your scatter plot by using more advanced plotting features, then use plt.scatter(). In the next section of the course, you’ll take a look at the customizations that you can only do using plt.scatter().