An Introduction to a Grammar of Graphics
00:00 You’ll get started by learning what a grammar of graphics even means. The term comes from an influential book that was written a while ago. Hadley Wickham later built on that idea with what he called a layered grammar of graphics, and he created a library for the R language based on it, called ggplot. ggplot2 is the version in use today. And plotnine, the library that you’ll be working with now, is essentially a port of ggplot2 to Python.
00:30 So, it applies the same structure and the same approach of using a layered grammar of graphics to represent graphics. But, yes, it is written for Python, so it will be easy for you to use and integrate in your Python apps.
00:44 Now, what is a layered grammar of graphics? Essentially, you can think of it as putting more and more layers on top of each other to create a graphic. That’s also why it’s called a layered grammar of graphics.
00:56 And there are three important layers. There are a couple more, but you’re going to learn primarily about these three: data as the first base layer, then aesthetics on top of that, which are mappings from the data to specific visible elements in the graph, and then geometric objects, which define how to represent the data points.
01:17 Let’s look at these in a bit more detail. When you think of data, you might just think of a table like this, which is a common way to represent the data, which consists of rows and of columns and has these different data items in there. Now, if you would just apply the data layer to your plotnine graph, then all you would end up with is a gray square.
01:40 So, this gray square already has the data in there, but there’s no information on how to display the data, so plotnine can’t do anything more with it than just telling you, “Okay, there’s some data.” But you do need to establish this first layer of data so that you can move on to the second layer, which is aesthetics. And here, as I mentioned before, this is about mapping values that exist in your data to things that you can perceive on the graph. Most importantly, those will be the x position and the y position.
02:11 And I’ll show you in the next slide but I just want to mention that there’s a couple of other layers as well in the layered grammar of graphics—for example, scales and coordinate system.
02:21 There’s different ones that you can choose for these as well. But plotnine just applies some good defaults for these for you, so you usually don’t have to worry about those unless you really want to fine-tune your graph.
02:34 So, the most important things are mappings to x and y position.
02:39 And if you think about the dataset that you looked at before, you could pick out a lot of things from this dataset, but let’s say you’re going to map this column class to the x-axis and you’re going to map this column here, the highway miles that the car can drive per gallon, as the y-axis.
03:01 And then, if you apply this layer, your plotnine graphic would look like this. You see, it already makes more sense, and you see the highlights here of the mapping.
03:10 Here’s the y column that got mapped over here and it put in the name of the column and also the scales, and then on the x-axis, you can see that it mapped the class correctly.
03:23 So the x- and the y-axis got mapped through this concept of aesthetics. Again, you have something in the dataset and you want to tell plotnine how to display it in the graph.
03:37 So, this already looks more like a graph, but it’s still missing the data points. And this is the third important piece of the puzzle, which is the geometric objects. These geometric objects tell plotnine how you want the data points to be represented. So, in this case, you’re telling it that you want it to be little black circles for each data point that’s in the dataset. And with this, you have a complete graph that makes sense. You have a y-axis with a mapping, you have an x-axis mapping, and then you have the data underlying all of it, and then the top layer of the geometric objects telling you how to display that data. And here, you can read about how SUVs, for example, can’t drive a ton of miles on the highway per gallon that they use, so their fuel consumption is much worse than, for example, this subcompact car that sits all the way up there.
04:34 Okay! As a quick recap, what you need to build a plot with plotnine are three layers of the layered grammar of graphics: first, the data, then aesthetics, and third, geometric objects.
04:50 That was a quick introduction to what a grammar of graphics is, and next up, you’ll get set up on your system with Anaconda or a virtual environment.