Graph Your Data With Python and ggplot (Overview)

In this course, you’ll learn how to use ggplot in Python to create data visualizations using a grammar of graphics. A grammar of graphics is a high-level tool that allows you to create data plots in an efficient and consistent way. It abstracts most low-level details, letting you focus on creating meaningful and beautiful visualizations for your data.

There are several Python packages that provide a grammar of graphics. This course focuses on plotnine since it’s one of the most mature ones. plotnine is based on ggplot2 from the R programming language, so if you have a background in R, then you can consider plotnine as the equivalent of ggplot2 in Python.

In this course, you’ll learn how to:

  • Install plotnine and Jupyter Notebook
  • Combine the different elements of the grammar of graphics
  • Use plotnine to create visualizations in an efficient and consistent way
  • Export your data visualizations to files

This course assumes that you already have some experience in Python and at least some knowledge of Jupyter Notebook and pandas. To get up to speed on these topics, check out Jupyter Notebook: An Introduction and Using Pandas and Python to Explore Your Dataset.

00:00 Hello, and welcome to this course on using ggplot in Python, where you’ll learn how to visualize your data using the plotnine library, which is a port of the famous R programming library called ggplot2.

00:13 Now, first, you might wonder, “Why would you even want to build data visualizations? What’s the point in doing this?” You already have statistics, you might say.

00:21 So, let’s look at four datasets. This is a famous example that shows four different datasets, labeled I, II, III, and IV, and you can see that their summary statistics are very similar to each other, if not the same. You can pause this and compare the different datasets, and you’ll see, for example, here in the first line, that the count for the first and the third dataset is exactly the same, and there are a lot of other similarities like this.

00:45 So you might assume that these datasets represent similar data. But if you go ahead and start visualizing this (and let me just put some color on here so that you know which plot refers to which dataset), then you can see right away that these datasets are very different from each other. This is the first one, the second one, the third one, and the fourth one, and the distribution of the data points is completely different for each of them, even though they have very similar or identical descriptive statistics.
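To make the point about matching summary statistics concrete, here is a minimal sketch that computes a few of them for the four datasets, commonly known as Anscombe's quartet. It uses only Python's standard library. The data values are the widely published ones for the quartet and are an assumption here, since the course materials themselves aren't shown; the names `quartet`, `summarize`, and `summaries` are made up for this illustration.

```python
from statistics import mean, variance

# Widely published values for Anscombe's quartet (assumed, not taken
# from the course files). Datasets I, II, and III share the same x values.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_shared,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_shared,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_shared,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return a few descriptive statistics, rounded to two decimals."""
    return {
        "count": len(xs),
        "mean_x": round(mean(xs), 2),
        "mean_y": round(mean(ys), 2),
        "var_x": round(variance(xs), 2),  # sample variance
    }


summaries = {label: summarize(xs, ys) for label, (xs, ys) in quartet.items()}

for label, stats in summaries.items():
    print(label, stats)
```

After rounding, all four datasets report 11 points, a mean x of 9, a mean y of 7.5, and an x variance of 11, even though the raw points are arranged very differently. That difference only becomes obvious once the data is plotted.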

01:16 Now, this example is called Anscombe’s quartet, and it shows you that you can’t rely solely on the descriptive statistics of your datasets because they might hide quite a lot of variability. Something that’s very hard to see if you just look at the data like this becomes very easy to see once you build some descriptive visualizations. plotnine applies a grammar of graphics approach to building these visualizations that makes it very powerful and easy to use, so I hope you’re going to enjoy working with it when you learn about it in this course. Now, the course is going to walk you through, first of all, what a grammar of graphics is (a term that I just mentioned, which is a way of building graphics) and how to do this in a layered approach.

02:06 Then, you’re going to get set up using either Anaconda or a virtual environment. Then, you’re going to start learning about the layers that this grammar of graphics uses, with the first one being the data layer.

02:17 Then you’re going to learn about aesthetics as well as geometric objects. These three are the most important layers that you need when you work with plotnine, or ggplot, for that matter.

02:28 And then you’re also going to learn about a couple of other layers that you can use to make your visualizations even more descriptive and more meaningful.

02:37 These are statistical transformations, scales, and coordinate systems. And then, finally, you’re also going to learn how you can apply different themes that are built into plotnine to change the look and feel of your plots, and how to export these graphics to separate files so you can use them elsewhere. And this covers the extent of the course.

02:59 I hope you’re going to enjoy it and learn something from it. In the next lesson, you’re going to get started by hearing about what a grammar of graphics is and how it’s applied in ggplot and plotnine.

03:09 See you there!
