Creating Scatter Plots
00:00 Creating scatter plots. A scatter plot is a visual representation of how two variables relate to each other. You can use scatter plots to explore the relationship between two variables, for example, by looking for any correlation between them.
00:17 In this section of the course, you’ll become familiar with creating basic scatter plots using Matplotlib. In later sections, you’ll learn how to further customize your plots to represent more complex data using more than two dimensions.
Before you can start working with plt.scatter(), you’ll need to install Matplotlib. It’s generally a good idea when experimenting with new Python code to do so in a virtual environment, and if you’re not sure how to do so, check out this Real Python course.
01:04 Now that you have Matplotlib installed, consider the following use case. A café sells six different types of bottled orange drinks. The owner wants to understand the relationship between the price of the drinks and how many of each one he sells, so he keeps track of how many of each drink he sells every day.
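A minimal sketch of a script for this use case might look like the following. It assumes NumPy is installed alongside Matplotlib. The variable names `price` and `sales_per_day` and all of the data values are illustrative assumptions, chosen only so that the drink priced at $4.02 stands apart from the overall trend.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed, illustrative data: one entry per drink (six drinks in total).
price = np.asarray([2.50, 1.23, 4.02, 3.25, 5.00, 4.40])
sales_per_day = np.asarray([34, 62, 49, 22, 13, 19])

# Draw one marker per drink: price on the x-axis, daily sales on the y-axis.
points = plt.scatter(x=price, y=sales_per_day)

# Display the figure when running as a script.
plt.show()
```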
If you’re using an interactive environment, such as a console or a Jupyter Notebook, you won’t need the call to plt.show(). But in this course, all the examples will be in the form of scripts, and will include that call. Here, you can see the output from the code you’ve just seen.
02:44 This plot shows that, in general, the more expensive a drink is, the fewer items are sold. However, the drink that costs $4.02 is an outlier, which may show that it’s a particularly popular product. When using scatter plots in this way, close inspection can help you explore the relationship between variables. You can then carry out further analysis, whether it’s using linear regression or other techniques.
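As one possible follow-up, a straight-line fit can put a number on the downward trend. The sketch below uses NumPy's `np.polyfit()` with degree 1, which is just one way to fit a line and not necessarily the approach used in the course. The data values are the same kind of illustrative assumptions as before.

```python
import numpy as np

# Assumed, illustrative data: one entry per drink.
price = np.asarray([2.50, 1.23, 4.02, 3.25, 5.00, 4.40])
sales_per_day = np.asarray([34, 62, 49, 22, 13, 19])

# Least-squares fit of a degree-1 polynomial: sales = slope * price + intercept.
slope, intercept = np.polyfit(price, sales_per_day, deg=1)

# A negative slope means that sales tend to fall as the price rises.
print(f"slope: {slope:.2f} items per dollar, intercept: {intercept:.2f} items")
```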
plt.plot() is a general-purpose plotting function that allows you to create a variety of line and marker plots. You can achieve the same scatter plot as the one you obtained earlier with the following code.
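A sketch of the plt.plot() version might look like this. As before, the variable names and data values are illustrative assumptions, and the only change from a plt.scatter() call is the function name plus the "o" format string.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed, illustrative data: one entry per drink.
price = np.asarray([2.50, 1.23, 4.02, 3.25, 5.00, 4.40])
sales_per_day = np.asarray([34, 62, 49, 22, 13, 19])

# The third argument is a format string: "o" asks for circle markers
# with no connecting line between the points.
lines = plt.plot(price, sales_per_day, "o")

# Display the figure when running as a script.
plt.show()
```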
In this case, you had to include the marker "o" as a third argument. Otherwise, plt.plot() would plot a line graph. The plot you created with this code is identical to the plot you created earlier with plt.scatter().
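To compare the speed of the two functions, a timing script along these lines could be used. This is a sketch built on the standard library's `timeit` module. The data values, the `runs` count, and the result variable names are assumptions for illustration, and the measured times depend on the machine and the Matplotlib version.

```python
import timeit

import matplotlib.pyplot as plt

# Assumed, illustrative data: one entry per drink.
price = [2.50, 1.23, 4.02, 3.25, 5.00, 4.40]
sales_per_day = [34, 62, 49, 22, 13, 19]

# Number of repeated calls per function (an arbitrary choice).
runs = 200

# Time repeated calls to each function on the same data.
scatter_time = timeit.timeit(
    lambda: plt.scatter(price, sales_per_day), number=runs
)
plot_time = timeit.timeit(
    lambda: plt.plot(price, sales_per_day, "o"), number=runs
)

print(f"plt.scatter(): {scatter_time:.3f} s for {runs} calls")
print(f"plt.plot():    {plot_time:.3f} s for {runs} calls")

# Discard the figure that accumulated all the timed artists.
plt.close("all")
```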
The performance will vary on different computers, but when you run this code, you’ll find that plt.plot() is significantly more efficient than plt.scatter(). When running the example on my system, plt.plot() was significantly faster, as seen in the figures onscreen.
Most of the customizations and advanced uses you’ll learn about are only possible when using plt.scatter(). Here’s a rule of thumb you can use. If you need a basic scatter plot, use plt.plot(), especially if you want to prioritize performance. If you want to customize your scatter plot by using more advanced plotting features, then use plt.scatter(). In the next section of the course, you’ll take a look at the customizations that you can only do using plt.scatter().