
Creating Scatter Plots


00:00 Creating scatter plots. A scatter plot is a visual representation of how two variables relate to each other. You can use scatter plots to explore the relationship between two variables, for example, by looking for any correlation between them.

00:17 In this section of the course, you’ll become familiar with creating basic scatter plots using Matplotlib. In later sections, you’ll learn how to further customize your plots to represent more complex data using more than two dimensions.

00:32 Before you can start working with plt.scatter(), you’ll need to install Matplotlib. When experimenting with new Python code, it’s generally a good idea to work in a virtual environment. If you’re not sure how to set one up, check out this Real Python course.

00:48 You can install Matplotlib with pip, Python’s standard package manager, by running python -m pip install matplotlib in the console.
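To confirm that the installation worked, you can import Matplotlib and print its version. This is only a quick sanity check. The version number you see depends on what pip installed.

```python
# Quick sanity check that Matplotlib is importable after installation
import matplotlib

# __version__ is a version string; its exact value depends on your install
print(matplotlib.__version__)
```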

01:04 Now that you have Matplotlib installed, consider the following use case. A café sells six different types of bottled orange drinks. The owner wants to understand the relationship between the price of the drinks and how many of each one he sells, so he records the daily sales of each drink.

01:22 You can visualize this relationship with the code that follows.

01:30 In this Python script, you import the pyplot submodule from matplotlib using the alias plt. This alias is generally used by convention to shorten the module and submodule names.

01:41 You then create lists with the price and average sales per day for each of the six orange drinks sold.

02:01 Finally, you create the scatter plot using plt.scatter() with the two variables you wish to compare as input arguments.

02:14 As you’re using a Python script, you also need to explicitly display the figure using plt.show().
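The script described above isn’t reproduced in the transcript, so here is a minimal sketch of what it might look like. The variable names price and sales_per_day and all the values are illustrative assumptions. They were chosen only to match the description (six drinks, one priced at $4.02) and are not the café’s actual data.

```python
import matplotlib.pyplot as plt

# Illustrative data (assumed): price in dollars and average sales per day
# for each of the six orange drinks
price = [2.50, 1.23, 4.02, 3.25, 5.00, 4.40]
sales_per_day = [34, 62, 49, 22, 13, 19]

# Draw one marker per drink: price on the x-axis, sales on the y-axis
plt.scatter(price, sales_per_day)

# A script has to display the figure explicitly
plt.show()
```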

02:22 If you’re using an interactive environment, such as a console or a Jupyter Notebook, you won’t need the call to plt.show(). In this course, however, all the examples are scripts and include that call. Here, you can see the output from the code you’ve just seen.

02:44 This plot shows that, in general, the more expensive a drink is, the fewer items are sold. However, the drink that costs $4.02 is an outlier, which may show that it’s a particularly popular product. When using scatter plots in this way, close inspection can help you explore the relationship between variables. You can then carry out further analysis using linear regression or other techniques.

03:11 For more on linear regression, check out this Real Python course. You can also produce the scatter plot shown above using another function within Matplotlib’s pyplot module.

03:24 plt.plot() is a general-purpose plotting function that lets you create a variety of line and marker plots. You can reproduce the scatter plot you created above with the following code.

03:46 In this case, you had to include the marker "o" as a third argument. Otherwise, plt.plot() would plot a line graph. The plot you created with this code is identical to the plot you created earlier with plt.scatter().
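Here is a sketch of the plt.plot() version, again using illustrative values (assumed, not taken from the lesson) for price and sales_per_day. The format string "o" asks for circle markers and no connecting line.

```python
import matplotlib.pyplot as plt

# Illustrative data (assumed)
price = [2.50, 1.23, 4.02, 3.25, 5.00, 4.40]
sales_per_day = [34, 62, 49, 22, 13, 19]

# plt.plot() returns a list of Line2D objects; the "o" format string
# draws circle markers only, so no line connects the points
lines = plt.plot(price, sales_per_day, "o")
plt.show()
```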

04:01 For a basic scatter plot like the one in this example, using plt.plot() may be preferable. You can compare the efficiency of the two functions using the timeit module.

04:15 First, you import timeit. Then you add a print() call for each of the two plotting functions, and each of those print() calls wraps a call to timeit.timeit().

04:26 Each timeit() call takes the statement to run (as a string), the number of times to run it (the number argument), and the namespace in which to execute it (the globals argument).
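A minimal sketch of such a comparison follows. The data values and the repetition count RUNS are arbitrary assumptions for illustration, and the timings are stored in variables before printing rather than computed inside print(). No plt.show() call is needed, because only the plotting calls themselves are being timed.

```python
import timeit

import matplotlib.pyplot as plt

# Illustrative data (assumed)
price = [2.50, 1.23, 4.02, 3.25, 5.00, 4.40]
sales_per_day = [34, 62, 49, 22, 13, 19]

# Number of repetitions per statement: an arbitrary choice for this sketch
RUNS = 100

# Each statement is passed as a string; globals=globals() lets it see
# plt, price, and sales_per_day from this module's namespace
scatter_time = timeit.timeit(
    "plt.scatter(price, sales_per_day)",
    number=RUNS,
    globals=globals(),
)
plot_time = timeit.timeit(
    "plt.plot(price, sales_per_day, 'o')",
    number=RUNS,
    globals=globals(),
)

print(f"plt.scatter(): {scatter_time:.4f} s for {RUNS} calls")
print(f"plt.plot():    {plot_time:.4f} s for {RUNS} calls")
```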

05:08 Timings will vary from computer to computer, but when you run this code, you should find that plt.plot() is significantly faster than plt.scatter(). On my system, the difference was substantial, as you can see from the figures onscreen.

05:27 If you can create scatter plots using plt.plot() and it’s so much faster, then why would you ever use plt.scatter()? You’ll find the answer in the rest of this course.

05:38 Most of the customizations and advanced uses you’ll learn about are only possible with plt.scatter(). Here’s a rule of thumb: if you need a basic scatter plot, use plt.plot(), especially if performance is a priority.

05:54 If you want to customize your scatter plot by using more advanced plotting features, then use plt.scatter(). In the next section of the course, you’ll take a look at the customizations that you can only do using plt.scatter().
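As a small preview of why plt.scatter() is worth knowing, here is one thing plt.plot() can’t do in a single call: give each point its own marker size and color. The data, sizes, and colors below are hypothetical values chosen purely for illustration.

```python
import matplotlib.pyplot as plt

# Illustrative data (assumed)
price = [2.50, 1.23, 4.02, 3.25, 5.00, 4.40]
sales_per_day = [34, 62, 49, 22, 13, 19]

# Hypothetical per-drink values: marker area (in points squared) and color
sizes = [40, 80, 120, 60, 200, 100]
colors = ["orange", "gold", "tomato", "orange", "gold", "tomato"]

# s and c accept one value per point, whereas plt.plot() applies a single
# marker size and color to every point in the call
points = plt.scatter(price, sales_per_day, s=sizes, c=colors)
plt.show()
```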
