Visualizing DataFrames With Matplotlib
00:00 pandas lets you visualize data and create plots directly from DataFrames. It uses Matplotlib in the background, so using pandas’ plotting capabilities is very similar to working with Matplotlib itself.
00:14 As an example, let’s take this temperature DataFrame. If we call its .plot() method,
00:22 then we’ll get a line graph where the horizontal axis is the datetime index.
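A minimal sketch of this first step, assuming a small hypothetical temperature DataFrame (the column name temp_c and the values are made up, not the lesson’s data). It selects Matplotlib’s non-interactive Agg backend so it runs without a display. Note that .plot() returns the Matplotlib Axes it drew on.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render without opening a window

import pandas as pd
from matplotlib.axes import Axes

# Hypothetical daily temperatures in degrees Celsius, indexed by date
temp = pd.DataFrame(
    {"temp_c": [8.0, 9.5, 12.1, 11.4, 10.2, 13.8, 15.0]},
    index=pd.date_range(start="2019-10-27", periods=7, freq="D"),
)

# Default .plot(): a line graph with the DatetimeIndex on the horizontal axis
ax = temp.plot()

# One Line2D per column, labeled with the column name
line = ax.get_lines()[0]
print(isinstance(ax, Axes), line.get_label())
```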
00:29 You can pass many optional parameters to this .plot() method. The default kind of plot is a line graph, so if I pass kind='line' explicitly and run this, we’ll get exactly the same thing.
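To check that kind='line' is the default, this sketch (same hypothetical temp DataFrame as before, values made up) plots once without kind and once with kind='line', then compares the y-data of the two resulting lines.

```python
import matplotlib

matplotlib.use("Agg")  # render without a display

import numpy as np
import pandas as pd

# Hypothetical daily temperatures in degrees Celsius
temp = pd.DataFrame(
    {"temp_c": [8.0, 9.5, 12.1, 11.4, 10.2, 13.8, 15.0]},
    index=pd.date_range(start="2019-10-27", periods=7, freq="D"),
)

ax_default = temp.plot()              # kind defaults to "line"
ax_explicit = temp.plot(kind="line")  # same plot, kind spelled out

y_default = ax_default.get_lines()[0].get_ydata()
y_explicit = ax_explicit.get_lines()[0].get_ydata()
print(np.array_equal(y_default, y_explicit))
```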
00:40 We can also pass in a line color, for example green. And maybe we’d like a line style with markers connected by straight lines.
00:55 And maybe we want the markers to be black. The Matplotlib keyword for this is markerfacecolor, and its alias is mfc (for marker face color), so we can pass 'k' for black.
01:09 And then maybe the marker size could be, say, 10.
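Here is a sketch of the styled plot, again using the hypothetical temp DataFrame. The transcript does not show the exact style string, so "-o" (Matplotlib’s format string for a solid line with circle markers) is an assumption, as is using the ms alias for the marker size. pandas forwards mfc and ms unchanged to Matplotlib.

```python
import matplotlib

matplotlib.use("Agg")  # render without a display

import pandas as pd
from matplotlib.colors import to_rgb

# Hypothetical daily temperatures in degrees Celsius
temp = pd.DataFrame(
    {"temp_c": [8.0, 9.5, 12.1, 11.4, 10.2, 13.8, 15.0]},
    index=pd.date_range(start="2019-10-27", periods=7, freq="D"),
)

ax = temp.plot(
    color="green",  # line color
    style="-o",     # assumed style: solid line with a circle marker at each point
    mfc="k",        # markerfacecolor alias; "k" is Matplotlib's code for black
    ms=10,          # markersize alias
)

line = ax.get_lines()[0]
print(line.get_linestyle(), line.get_marker(), line.get_markersize())
```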
01:14 When you run that, you get a plot with a bit more color that clearly shows the data points on the graph. Instead of calling .plot() and passing a kind value, you can also call the .plot.line() method directly, and that does exactly the same thing. Now, if you want to save this figure, you first need to get the figure, for example with the .get_figure() method of the plot’s axes, and then pass a filename to the figure’s .savefig() method.
01:47 Let’s name the file 'temperatures' and save it as a PNG. When you run that, you still see the figure, and you should also find a temperatures.png file in your current working directory.
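A sketch of the .plot.line() shortcut followed by saving the figure, using the same hypothetical temp DataFrame. Unlike the lesson, it writes temperatures.png into a temporary directory (created with tempfile) rather than the current working directory, so it leaves no file behind.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # render without a display

import pandas as pd

# Hypothetical daily temperatures in degrees Celsius
temp = pd.DataFrame(
    {"temp_c": [8.0, 9.5, 12.1, 11.4, 10.2, 13.8, 15.0]},
    index=pd.date_range(start="2019-10-27", periods=7, freq="D"),
)

# .plot.line() is equivalent to .plot(kind="line")
ax = temp.plot.line(color="green", style="-o", mfc="k", ms=10)

fig = ax.get_figure()  # the Matplotlib Figure that contains these Axes

# Temporary directory instead of the current working directory
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "temperatures.png")
fig.savefig(path)  # output format inferred from the .png extension

print(os.path.exists(path))
```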
02:04 Now, another type of plot is a histogram. Let’s go back to our job candidates DataFrame for this.
02:14 We’re going to take the py-score column and create a histogram from it. After .plot, we can either call the .hist() method and pass in optional parameters, or we can pass 'hist' to the kind keyword argument. In this case, I think it’s a little more readable to call .plot.hist() directly.
02:42 For this histogram, we’ll use 5 bins and give a value to the alpha keyword argument so that the bars have a little transparency, say 0.8. I should also specify which column to plot, and we said we’d try this with 'py-score'.
03:01 Otherwise, this would generate histograms for all of the numeric columns. If we run that, we get a fairly dull-looking histogram, because the data in this example doesn’t vary in a very interesting way.
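A sketch of the py-score histogram, assuming a small hypothetical job-candidates DataFrame (the values are made up, and the lesson’s DataFrame has more columns). The transcript does not show how the column is specified; selecting it first with df[["py-score"]] is one way to do it. The sketch also draws the same histogram through kind="hist" to show that the two spellings agree.

```python
import matplotlib

matplotlib.use("Agg")  # render without a display

import pandas as pd

# Hypothetical candidate scores, not the lesson's data
df = pd.DataFrame(
    {
        "py-score": [90.0, 72.0, 85.0, 78.0, 66.0, 59.0, 83.0],
        "js-score": [70.0, 93.0, 87.0, 75.0, 90.0, 88.0, 81.0],
        "total": [81.0, 84.0, 86.0, 77.0, 79.0, 74.0, 82.0],
    }
)

# Restrict the plot to one column by selecting it first
ax = df[["py-score"]].plot.hist(bins=5, alpha=0.8)

# The same histogram through the kind keyword argument
ax_kind = df[["py-score"]].plot(kind="hist", bins=5, alpha=0.8)

# Each bar is a Rectangle patch whose height is the count in that bin
heights = [bar.get_height() for bar in ax.patches]
print(heights)
```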
03:18 If we use fewer bins instead and run it again, the result is still not very interesting. But you get the idea: you can call the different plotting methods directly on a DataFrame, and many of the optional parameters of the corresponding Matplotlib functions can be passed to these methods as well. Again, that’s because pandas uses Matplotlib in the background.
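To illustrate both points, this sketch uses fewer bins (3, an assumed value) and passes edgecolor, which is not a pandas-specific option: pandas forwards it to Matplotlib’s Axes.hist. The edgecolor argument and the hypothetical DataFrame are illustrative additions. The bar heights are compared with NumPy’s histogram counts for the same number of equal-width bins.

```python
import matplotlib

matplotlib.use("Agg")  # render without a display

import numpy as np
import pandas as pd

# Hypothetical candidate scores, not the lesson's data
df = pd.DataFrame(
    {
        "py-score": [90.0, 72.0, 85.0, 78.0, 66.0, 59.0, 83.0],
        "js-score": [70.0, 93.0, 87.0, 75.0, 90.0, 88.0, 81.0],
        "total": [81.0, 84.0, 86.0, 77.0, 79.0, 74.0, 82.0],
    }
)

# Fewer bins; edgecolor is passed through to Matplotlib's Axes.hist
ax = df[["py-score"]].plot.hist(bins=3, alpha=0.8, edgecolor="black")

heights = [bar.get_height() for bar in ax.patches]

# NumPy's counts for 3 equal-width bins spanning the data range
counts, edges = np.histogram(df["py-score"], bins=3)
print(heights, counts.tolist())
```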
03:47 Maybe we’ll get more interesting plots if we make the histogram for, say, the 'js-score' and also the 'total' columns.
04:02 Here, we can lower the alpha value so that the bars are more transparent and we can see both histograms. These graphs are pretty basic, but if you’ve got more interesting data, you’ll definitely want to read up on all the optional parameters you can pass to the different plotting functions in Matplotlib.
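A sketch of the two overlaid histograms, using the same hypothetical DataFrame. The alpha value of 0.4 is an assumption, since the transcript only says it is lowered. When several columns are plotted, pandas computes one shared set of bin edges, so the two sets of bars line up.

```python
import matplotlib

matplotlib.use("Agg")  # render without a display

import pandas as pd

# Hypothetical candidate scores, not the lesson's data
df = pd.DataFrame(
    {
        "py-score": [90.0, 72.0, 85.0, 78.0, 66.0, 59.0, 83.0],
        "js-score": [70.0, 93.0, 87.0, 75.0, 90.0, 88.0, 81.0],
        "total": [81.0, 84.0, 86.0, 77.0, 79.0, 74.0, 82.0],
    }
)

# Two columns on the same axes; a lower alpha keeps both visible
ax = df[["js-score", "total"]].plot.hist(bins=5, alpha=0.4)

# The legend lists one entry per column
labels = [text.get_text() for text in ax.get_legend().get_texts()]

# First 5 patches belong to js-score, the next 5 to total
left_js = [bar.get_x() for bar in ax.patches[:5]]
left_total = [bar.get_x() for bar in ax.patches[5:]]
print(labels, len(ax.patches))
```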
04:24 This ends the lesson on using the basic plotting functions in pandas to graph some of the data in a DataFrame. We’re going to wrap things up in the next lesson with a summary.