The Scales Layer

Graph Your Data With Python and ggplot Martin Breuss 05:44

00:00 In this lesson, you’re learning about another layer of the layered grammar of graphics: scales. You’ll see how you can edit them to make your graph more user-friendly without actually changing the data itself. You can do that by improving the labeling, and you’re also going to look at an example of reshaping the x values for one of the datasets.

00:21 So, the dataset you’re going to use for this is the economics dataset that also comes included with plotnine, so you can say from plotnine.data import economics.

00:31 And you should always take a look at the data

00:35 to understand what this dataset is about. Here, some of the column names (pce, for example) aren’t very meaningful, so you need to understand a little bit about your dataset before you start plotting, so that you know which columns to choose. But of course, you can also learn things about a dataset by plotting it.

00:55 That’s a big point of this exploration that you’re doing using plotnine. But if you want to know more about these datasets, especially the included ones, you can head over to the plotnine documentation, and they have descriptions on each of the included datasets.

01:10 So here, you would see plotnine.data.economics. That’s the one that you just loaded, and it holds US economic time series, so how the US economy developed over a certain period. And pce, for example, means personal consumption expenditures, in billions of dollars. Okay.

01:29 But what you’re going to plot here is just date and population, so you can see how the population changed over time.

01:37 Let’s start off with just making this line plot first before actually applying any changes to the scales layer, just so that you see what this builds by default.

01:49 Start with economics plus aesthetics. You just said you want to plot "date", which has to be passed as a string, and the population column, "pop".

02:03 And then, geom… Let’s make a line plot. You still need to import these names, so from plotnine import ggplot… That covers data, aesthetics, and geometric objects.

02:19 Then you can run this line plot and see what plotnine gives you by default. As you can see, it maps the aesthetics onto the scales here. It gives each axis a name, and it plots the development of the US population over time from this dataset. But these scales here aren’t that meaningful.

02:39 The title here isn’t that great. And it might be more interesting to see how the population changed not against dates, but against a number of years. You could put decades in here instead, and that might be more meaningful.

02:53 So, there are two things that you can do to make this graph more understandable. And for this, you’ll need to import a couple of things. So you’re going to import from plotnine,

03:06 let’s start off with the labs function, which helps you give better titles to both the plot as a whole and the axes on the side.

03:16 So, you can add this layer in here and give the plot a title. I’m going to call this "US Population over time", for example. And then let’s also give a better title to the y-axis here. I’m going to say y should be "Population" instead. Let’s look at this.

03:39 And this already looks a little better, you know? You give context for what this plot is about, and you give the y-axis a better name. You also said that it could be nice to see how the population changed in decades, counted from 1970. For that, you can change the labels that you have down here for the tick marks.

03:58 For this specific situation, you can use a function called scale_x_timedelta().

04:08 Import that one too, and then just add another layer here, scale_x_timedelta().

04:16 And then give this one a name. That’s really all you need to do. So I’m going to say "Years since"… When did the data start? 1967. Let’s go with "Years since 1970",

04:32 and then run this again. And then you can see that, again, the labeling down here on your x-axis changed, and it didn’t just add the title as it did with the function that you used before, but it also changed the ticks here.

04:47 So now the tick at 1970 shows a 0. The data starts somewhere in 1967, 1970 is considered year zero, and from there the ticks move 10 years, 20 years, 30 years, 40 years forward.

05:01 So, with just a nice little scales manipulation, you can make your graph much more understandable and easier to digest without actually needing to change the underlying data.
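One plausible reading of why 1970 ends up as year zero is that timestamps are commonly stored as an offset from January 1, 1970, so treating a date as a time difference counts from that day. The sketch below reproduces that arithmetic with the standard library only. The helper years_since_epoch() and the sample dates are made up for illustration and aren’t part of plotnine:

```python
from datetime import date

UNIX_EPOCH = date(1970, 1, 1)  # assumed reference point for "year zero"


def years_since_epoch(day: date) -> float:
    """Approximate number of years between January 1, 1970 and `day`."""
    return (day - UNIX_EPOCH).days / 365.25


# Illustrative dates: one before 1970, the epoch itself, and one decade later.
for day in (date(1967, 7, 1), date(1970, 1, 1), date(1980, 1, 1)):
    print(day.isoformat(), round(years_since_epoch(day), 1))
```

A date in 1967 comes out as a small negative number, which matches the line starting a little to the left of the 0 tick.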

05:13 And this is what you can do with the scales layer. As you can see, a lot of this is taken care of automatically with reasonable defaults, which is something I’ve mentioned a lot throughout this course, but let’s compare these two again. Without defining anything for the scales layer, you still get a readable graph from that.

05:30 So you often don’t need to do anything about it, but if you want to, you can, and you can make your graphs even better with that. That’s all about scales, and in the next lesson, you’re going to learn about coordinate systems.
