Statistical Transformations: factor()
geom_boxplot. Now, if you think of this data again, huron.head(), you remember that it has three columns. It has the year, it has the water level, and then it has the decade that this year is in.
01:17 Okay, so this is kind of a disappointing result that doesn’t really tell you anything. And as you can see, plotnine didn’t really know what to do with the decades that you passed in, because the axis just keeps going across the whole range. It covers about 100 years, if I remember correctly.
01:33 The decade values run from 1870 up to 1970, but the data isn’t grouped by them. So, what plotnine does by default here is lump all the data together and give you one big box plot that spans all of the 100 years, which doesn’t give you much information and isn’t really what you’re looking for.
So this is how you can also apply a statistical transformation: it’s going to treat all of these decade values as discrete categories and group all of the water levels together by decade. So just by adding this factor(), you can see that plotnine now recognizes these as specific individual bins, so to say, and displays the data for each of those decades.
02:28 And now you can see an interesting variability, where you can actually see the trend of how the water levels changed over those 100 years. So, this is a useful thing to know, that you can apply statistical transformations like this.
And you might be wondering, “What’s actually happening underneath this shiny API that plotnine provides?” And you might go looking for the factor() function. If you know that pandas underlies this whole structure that is built up here, you still won’t find this factor() function in there, because the whole API of plotnine is based on ggplot2, which, again, comes from the R programming language.
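In R, factor() turns a vector into a categorical variable. The closest pandas counterpart is the category dtype, so the idea can be sketched in pandas alone. This is only an analogy, not a description of how plotnine works internally, and the sample values below are made up rather than taken from the real Lake Huron data:

```python
import pandas as pd

# Hypothetical sample values, not the real Lake Huron measurements.
sample = pd.DataFrame(
    {
        "decade": [1870, 1870, 1880, 1880, 1890],
        "level": [580.4, 581.9, 580.0, 579.8, 579.3],
    }
)

# Converting the numeric column to the category dtype marks each distinct
# decade as its own discrete level, similar in spirit to R's factor().
as_category = sample["decade"].astype("category")
print(as_category.cat.categories.tolist())

# Once the values are categories, grouping by them yields one bucket per decade.
rows_per_decade = sample.groupby(as_category, observed=True)["level"].size()
print(rows_per_decade)
```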