Box Plots With Pandas and Matplotlib

Graph Your Data With Python and ggplot Martin Breuss 03:15

00:00 In the previous lesson, you used factor() to get this nice box plot displayed. In this lesson, you’ll take a quick look under the hood to see how you could do the same thing in pandas directly. Now, you might want to… First of all, let’s take a look at huron and confirm that it’s actually a DataFrame object.
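A minimal sketch of that check might look like the following. To stay self-contained, it builds a small stand-in for huron instead of loading the real dataset: the values are invented for illustration, and the "year" column name is an assumption (this lesson only names "level" and "decade").

```python
import pandas as pd

# Stand-in for the huron dataset. The values are made up for illustration,
# and the "year" column name is an assumption.
huron = pd.DataFrame(
    {
        "year": [1875, 1876, 1880, 1881, 1890, 1891],
        "level": [580.4, 581.9, 580.9, 580.0, 579.3, 579.6],
        "decade": [1870, 1870, 1880, 1880, 1890, 1890],
    }
)

# Inspect the type, then confirm it programmatically
print(type(huron))
is_dataframe = isinstance(huron, pd.DataFrame)
print(is_dataframe)
```

Because huron is a regular DataFrame, every pandas method, such as .groupby(), is available on it.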

01:09 and then pass in "decade",

01:11 and then you could even plot this if you wanted to. You can say "decade", "level", and then .plot() this. You would get something that’s somewhat similar to what you saw before. You can see it follows a similar trend.

01:25 You have little line graphs here instead of the box plots, so this is not exactly what you’re looking for, but you can see that there’s a similar trend already in there because you’re displaying the same data and also grouped it by decade. Also, you see that the axes aren’t very meaningful here.
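The grouped line plot described above could be sketched roughly like this. The exact call from the video isn't in the transcript, so grouping by "decade", selecting "level", and calling .plot() is an assumption about what was typed. The data is again a small made-up stand-in for huron.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the huron dataset with invented values
huron = pd.DataFrame(
    {
        "year": [1875, 1876, 1880, 1881, 1890, 1891],
        "level": [580.4, 581.9, 580.9, 580.0, 579.3, 579.6],
        "decade": [1870, 1870, 1880, 1880, 1890, 1890],
    }
)

fig, ax = plt.subplots()

# Draw one line per decade on the same axes. The x-axis is the row index,
# which is why the axes aren't very meaningful.
huron.groupby("decade")["level"].plot(ax=ax)
```

In an interactive session, you'd call plt.show() to display the figure. Each decade shows up as its own short line segment rather than as a box.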

01:41 So what you could do instead is drop down another level and use Matplotlib, which plotnine also relies on to build its plots. You could say from matplotlib.pyplot import boxplot.

02:00 And then you can say on this dataset, the huron dataset, make a box plot where you put "level" against "decade",

02:12 and now you can see that you built a similar graph to the one that you got with plotnine as well. I just wanted to show you that these are the libraries plotnine actually uses to build the plots and to do all the math involved.
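The transcript doesn't show the exact box plot call, so here is one possible sketch that uses Matplotlib's Axes.boxplot() directly, with one box per decade. The stand-in data and the variable names levels, decades, and result are illustrative assumptions, not the code from the video.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the huron dataset with invented values
huron = pd.DataFrame(
    {
        "year": [1875, 1876, 1880, 1881, 1890, 1891],
        "level": [580.4, 581.9, 580.9, 580.0, 579.3, 579.6],
        "decade": [1870, 1870, 1880, 1880, 1890, 1890],
    }
)

# Collect the "level" values for each decade
decades = []
levels = []
for decade, group in huron.groupby("decade"):
    decades.append(str(decade))
    levels.append(group["level"].to_numpy())

fig, ax = plt.subplots()
result = ax.boxplot(levels)  # one box per decade
ax.set_xticklabels(decades)  # label each box with its decade
ax.set_xlabel("decade")
ax.set_ylabel("level")
```

pandas also offers a shortcut that wraps Matplotlib for this case: huron.boxplot(column="level", by="decade") groups the data and draws the boxes in a single call.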

02:25 plotnine is built on top of pandas and Matplotlib, and sometimes you’ll have to fall back to the underlying libraries for transformations that you can’t do in plotnine directly, either because there’s no R equivalent or because it wasn’t built into the API.
