
Anscombe's Quartet Revisited

To follow along at this point in the lesson, you can use the following code:

Python
import pandas as pd

# Anscombe's Quartet
x  = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

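# One two-column DataFrame per dataset: stack the two lists as rows,
# then transpose (.T) so that each list becomes a column
I   = pd.DataFrame([x, y1], index=["x", "y1"]).T
II  = pd.DataFrame([x, y2], index=["x", "y2"]).T
III = pd.DataFrame([x, y3], index=["x", "y3"]).T
IV  = pd.DataFrame([x4, y4], index=["x4", "y4"]).T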

00:00 Before we dig into this layer by layer, I want to give a quick throwback to Anscombe's quartet, which you learned about in the first lesson of this course.

00:10 Now, here's the data behind the quartet: four datasets that have nearly the same summary statistics but produce very different plots. I want to show you how quickly you can plot them using plotnine.

00:24 You can get this data from the description of this lesson if you want to run it yourself, but you can also just watch. First, I have to import pandas explicitly: it's installed as a dependency of plotnine, but it isn't imported automatically.

00:39 Now I have these datasets, and if you call .describe() on them, you'll see the same statistics we saw before.

00:50 I could say…

00:55 If you compare these values, you'll see that the summary statistics are very similar, if not identical. But if you take a different approach and actually visualize these datasets, using plotnine in this case, you can see the difference right away.
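To compare the statistics side by side instead of reading four separate .describe() tables, here is a minimal pandas-only sketch. It renames each dataset's columns to a common "x" and "y" (a choice made here for convenience; the setup code uses "y1", "x4", and so on), and the names `quartet` and `summary` are hypothetical.

```python
import pandas as pd

# Anscombe's Quartet, same values as the setup code
x  = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

# Common column names make the four datasets easy to loop over
quartet = {
    "I": pd.DataFrame({"x": x, "y": y1}),
    "II": pd.DataFrame({"x": x, "y": y2}),
    "III": pd.DataFrame({"x": x, "y": y3}),
    "IV": pd.DataFrame({"x": x4, "y": y4}),
}

# One column per dataset, one row per statistic
summary = pd.DataFrame(
    {
        name: {
            "mean_x": df["x"].mean(),
            "mean_y": df["y"].mean(),
            "var_y": df["y"].var(),  # sample variance (ddof=1)
            "corr_xy": df["x"].corr(df["y"]),  # Pearson correlation
        }
        for name, df in quartet.items()
    }
).round(2)

print(summary)
```

Every column should show a mean x of 9 and almost the same y mean, y variance, and x/y correlation, which is exactly why the summary statistics alone can't tell these datasets apart.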

01:13 So I need to import ggplot, aes (the aesthetics), and the geometric object from plotnine. With the first one, ggplot, I can add the data layer, so to speak.

01:26 This is the syntax you can use: call ggplot() and pass in the data. Here, I'm passing in the pandas DataFrame as the data layer.

01:37 Then you add the aesthetics layer, where you define the mappings: x maps to the "x" column, and y maps to "y1" in this case.

01:49 So, you want to plot this first dataset.

01:54 And now, if I execute this,

01:57 you can see the plot pop up here, and it has a certain shape. One plot alone doesn't tell you much yet, but now if you make the second one…

02:07 In the same way, I’m just going to say ggplot(), but pass in the second dataset. I’m going to say + aes() (plus aesthetics), where I’m going to map x to "x" and y to "y2", in this case.

02:24 And finally, you need to define the geometric object, which is just going to be a point plot. If I run this, you can see right away that this dataset has a completely different distribution of values.

02:38 So something that was basically impossible to see from the statistical descriptions alone, you can easily spot with a quick plot that takes just one import line and three lines of code per plot.

02:54 You can play around with this a bit more. Plot datasets III and IV as well and compare them, and if you want, look into how you can change the color and size of the points.

03:08 So, see you in the next lesson, where you’re going to start looking at the data layer in a bit more detail.
