Box Plots With Pandas and Matplotlib
00:00
In the previous lesson, you used factor() to get this nice box plot displayed, and now you’re going to take a look under the hood to see how you could do the same thing in pandas directly. First of all, let’s take a look at huron and just prove that this is actually a DataFrame object.
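A minimal sketch of that check, assuming the dataset is imported from plotnine.data as in the lesson. The fallback DataFrame is a hypothetical stand-in with made-up values, included only so the snippet still runs when plotnine isn't installed:

```python
import pandas as pd

try:
    # The dataset used in the lesson ships with plotnine.
    from plotnine.data import huron
except ImportError:
    # Synthetic placeholder values (NOT the real lake levels), used only so
    # this sketch runs without plotnine installed.
    years = list(range(1875, 1973))
    huron = pd.DataFrame(
        {
            "year": years,
            "level": [579.0 + (year % 7) * 0.3 for year in years],
            "decade": [year // 10 * 10 for year in years],
        }
    )

# Confirm that the object is a regular pandas DataFrame.
print(type(huron))
print(isinstance(huron, pd.DataFrame))
print(huron.head())
```

Because it's a plain DataFrame, every pandas method you already know is available on it.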
00:19
So you see that the data that plotnine uses to create these plots is based on pandas. So you might wonder, “Okay, so I used factor() on that. What happens if I run huron.factor() and pass in the column name, for example "decade"? Do I get anything here?” No. The DataFrame object has no attribute 'factor', and this is because factor() is a function that comes from the R programming language. So, when you’re working with the plotnine API, you will find some things like this: even if you know pandas well, you might not know the functions that are used there, because the API mirrors the one that ggplot2 uses, which comes from a different programming language. Okay, but you can do the same thing in pandas as well, of course. You would use .groupby()
01:09
and then pass in "decade",
01:11
and then you could even plot this if you wanted to. You can select "decade" and "level", and then call .plot() on the result. You would get something that’s somewhat similar to what you saw before. You can see it follows a similar trend.
01:25 You have little line graphs here instead of the box plots, so this is not exactly what you’re looking for, but you can see that there’s a similar trend already in there because you’re displaying the same data and also grouping it by decade. Also, you see that the axes aren’t very meaningful here.
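The pandas side of this can be sketched as follows. This is an illustration rather than the exact code typed in the lesson: the DataFrame is a hypothetical stand-in with made-up values that only copies the "level" and "decade" column names, and the describe() summary is an addition that shows the per-decade quartiles a box plot would draw.

```python
import pandas as pd

# Synthetic placeholder values (NOT the real lake levels); only the column
# names follow the huron dataset from the lesson.
years = list(range(1875, 1973))
huron = pd.DataFrame(
    {
        "year": years,
        "level": [579.0 + (year % 7) * 0.3 for year in years],
        "decade": [year // 10 * 10 for year in years],
    }
)

# A DataFrame has no factor() method, so huron.factor("decade") would raise
# an AttributeError.
print(hasattr(huron, "factor"))

# The pandas way to split the rows by decade is .groupby().
grouped = huron.groupby("decade")

# Summarize the level per decade: minimum, quartiles, and maximum.
summary = grouped["level"].describe()
print(summary[["min", "25%", "50%", "75%", "max"]])
```

Calling .plot() on the grouped object instead of .describe() draws line graphs per group, which is why the result in the lesson shows the same trend without being a box plot.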
01:41
So what you could do instead is drop down another level and use Matplotlib, which plotnine also relies on to build its plots. I could say from matplotlib.pyplot import boxplot.
02:00
And then you can say on this dataset, the huron dataset, make a box plot where you put "level" against "decade",
02:12 and now you can see that you built a similar graph to the one that you got with plotnine as well. I just wanted to show you that these are the libraries that plotnine actually uses to build the plots and to do all the math that’s involved here.
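Here is a rough sketch of both lower layers side by side, not necessarily the exact call used in the lesson. It assumes the pandas method DataFrame.boxplot() with its column and by parameters for the left panel, and Matplotlib's Axes.boxplot() for the right panel. The data is again a hypothetical stand-in with made-up values, and the Agg backend is chosen only so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic placeholder values (NOT the real lake levels); only the column
# names follow the huron dataset from the lesson.
years = list(range(1875, 1973))
huron = pd.DataFrame(
    {
        "year": years,
        "level": [579.0 + (year % 7) * 0.3 for year in years],
        "decade": [year // 10 * 10 for year in years],
    }
)

fig, (ax_pandas, ax_mpl) = plt.subplots(1, 2, figsize=(12, 4))

# pandas layer: one box per decade, drawn by pandas through Matplotlib.
huron.boxplot(column="level", by="decade", ax=ax_pandas)
ax_pandas.set_title("DataFrame.boxplot")
ax_pandas.set_ylabel("level")

# Matplotlib layer: do the grouping by hand and pass one array per decade.
decades = sorted(huron["decade"].unique())
samples = [huron.loc[huron["decade"] == d, "level"].to_numpy() for d in decades]
ax_mpl.boxplot(samples)
ax_mpl.set_xticklabels([str(d) for d in decades])
ax_mpl.set_title("Axes.boxplot")
ax_mpl.set_xlabel("decade")

fig.suptitle("")  # remove the automatic "Boxplot grouped by decade" title
```

In the raw Matplotlib version, the grouping and the axis labels are your responsibility, which is the kind of work plotnine does for you when you write factor(decade).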
02:25 It’s built on top of pandas and it’s built on top of Matplotlib, and sometimes you will have to fall back to the underlying libraries if you want to do some of the transformations that can’t happen directly, maybe because there’s no R equivalent or just because it wasn’t built into the API.
02:42 So, keep in mind that there are these pandas and Matplotlib layers underneath plotnine, and you can use them, but you might have to get used to the syntax being a little different from what you see in plotnine, because plotnine uses the ggplot2 API, which comes from the R programming language. All right, this is all I wanted to show you here in this little exploration. In the next lesson, you’re going to look at scales: another layer in your layered grammar of graphics that you can change and work with.