Create Your First pandas Plot
00:00 Create Your First Pandas Plot.
00:04 Your dataset contains some columns related to the earnings of graduates in each major. "Median" is the median earnings of full-time, year-round workers, "P25th" is the 25th percentile of earnings, "P75th" is the 75th percentile of earnings, and "Rank" is the major’s rank by median earnings. Let’s start with a plot displaying these columns.
00:29 First, you need to set up your Jupyter Notebook to display plots with the %matplotlib magic command.
00:38 The %matplotlib magic command sets up your Jupyter Notebook for displaying plots with Matplotlib. The standard Matplotlib graphics backend is used by default, and your plots will be displayed in a separate window. Note that you can change the Matplotlib backend by passing an argument to the %matplotlib magic command.
00:56 For example, the inline backend is popular for Jupyter Notebooks because it displays the plot in the Notebook itself, immediately below the cell that creates the plot.
01:05 There are a number of other backends available. For more information, check out the Rich Outputs tutorial in the IPython documentation.
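As a rough illustration of backend selection, here is a minimal sketch. The magic commands appear only as comments because they are IPython syntax rather than plain Python. The calls matplotlib.use() and matplotlib.get_backend() are the plain-Python counterparts, and the "Agg" backend is chosen purely as an example, not something the course uses.

```python
import matplotlib

# In a Jupyter Notebook or IPython session you would type a magic command
# in a cell instead, for example:
#   %matplotlib            (default backend)
#   %matplotlib inline     (plots appear below the cell)

# Outside IPython, matplotlib.use() selects a backend by name.
# "Agg" is a non-interactive backend that renders to image files;
# it is used here only as an example.
matplotlib.use("Agg")

# get_backend() reports the name of the backend currently in use.
backend = matplotlib.get_backend()
print(backend)
```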
01:14 Now you’re ready to make your first plot! You can do so with the .plot() method.
01:29 .plot() creates a line graph containing data from every row in the DataFrame. The x-axis values represent the rank of each major, and the "P25th", "Median", and "P75th" values are plotted on the y-axis.
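The call described here can be sketched as follows. The column names come from the dataset described earlier, but the five rows below are hypothetical stand-in values so that the snippet runs on its own. In the course, df would be the full college majors DataFrame.

```python
import pandas as pd

# Hypothetical stand-in rows; the real dataset has one row per major.
df = pd.DataFrame(
    {
        "Rank": [1, 2, 3, 4, 5],
        "P25th": [95000, 55000, 50000, 43000, 50000],
        "Median": [110000, 75000, 73000, 70000, 65000],
        "P75th": [125000, 90000, 105000, 80000, 75000],
    }
)

# "Rank" goes on the x-axis; each earnings column becomes its own line.
ax = df.plot(x="Rank", y=["P25th", "Median", "P75th"])
```

The returned object is a Matplotlib Axes, which you can use later to adjust labels or titles.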
01:44 If you’re not following along in a Jupyter Notebook or an IPython shell, then you’ll need to use the pyplot interface from matplotlib to display the plot. Here’s how to show the figure in a standard Python shell.
02:09 Notice that you must first import the pyplot module from Matplotlib before calling plt.show() to display the plot.
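A minimal sketch of that sequence, assuming the same column names and using hypothetical stand-in rows in place of the real dataset:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in rows; the real dataset has one row per major.
df = pd.DataFrame(
    {
        "Rank": [1, 2, 3, 4, 5],
        "P25th": [95000, 55000, 50000, 43000, 50000],
        "Median": [110000, 75000, 73000, 70000, 65000],
        "P75th": [125000, 90000, 105000, 80000, 75000],
    }
)

# Create the plot first; nothing is displayed yet in a plain Python shell.
ax = df.plot(x="Rank", y=["P25th", "Median", "P75th"])

# plt.show() asks Matplotlib to display every open figure.
plt.show()
```

With an interactive backend, plt.show() typically blocks until you close the plot window.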
02:19 Regardless of the method used to create the plot, it looks like this. Looking at it, you can make the following observations. The median income decreases as you move down the ranking, from rank 1 toward the lowest-ranked major.
02:32 This is expected because the rank is determined by the median income. Some majors have large gaps between the 25th and 75th percentiles. People with these degrees may earn significantly less or significantly more than the median income.
02:48 Other majors have very small gaps between the 25th and 75th percentiles. People with these degrees earn salaries very close to the median income. This first plot already hints that there’s a lot more to discover in the data.
03:02 Some majors have a wide range of earnings, and others have a rather narrow range. To discover these differences, you’ll use several other types of plots.
03:12 For an introduction to medians, percentiles, and other statistics, check out Real Python’s Python Statistics Fundamentals: How to Describe Your Data course.
03:23 .plot() has several optional parameters. Most notably, the kind parameter accepts eleven different string values and determines which kind of plot you’ll create. "area" is for area plots, "bar" is for vertical bar charts, "barh" is for horizontal bar charts, "box" is for box plots, "hexbin" is for hexbin plots, "hist" is for histograms, "kde" is for kernel density estimate charts, "density" is an alias for "kde", "line" is for line graphs, "pie" is for pie charts, and "scatter" is for scatter plots.
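To see the kind parameter in action, here is a small sketch that draws two of the listed plot types from the same data. The rows are hypothetical stand-in values, and the choice of "bar" and "hist" is only for illustration.

```python
import pandas as pd

# Hypothetical stand-in rows with two of the dataset's columns.
df = pd.DataFrame(
    {
        "Rank": [1, 2, 3, 4, 5],
        "Median": [110000, 75000, 73000, 70000, 65000],
    }
)

# Vertical bar chart: one bar per row, labeled by rank.
bar_ax = df.plot(x="Rank", y="Median", kind="bar")

# Histogram: shows how the median earnings are distributed.
hist_ax = df.plot(y="Median", kind="hist")
```

Each call on a DataFrame without an ax argument draws into a new figure, so the two plots do not overwrite each other.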
04:05 The default value is "line". Line graphs, like the one you’ve created above, provide a good overview of your data, and you can use them to detect general trends.
04:14 They rarely provide sophisticated insight, but they can give you clues as to where to zoom in. If you don’t provide a parameter to .plot(), then it creates a line plot with the index on the x-axis and all the numeric columns on the y-axis.
04:28 While this is a useful default for datasets with only a few columns, for the college majors dataset and its several numeric columns, it looks like quite a mess.
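The default behavior can be checked with a short sketch. The frame below is a hypothetical stand-in with one text column and four numeric columns. The "Major" values are placeholders, not real entries from the dataset.

```python
import pandas as pd

# Hypothetical stand-in rows; "Major" holds placeholder text values.
df = pd.DataFrame(
    {
        "Major": ["Major A", "Major B", "Major C"],
        "Rank": [1, 2, 3],
        "P25th": [95000, 55000, 50000],
        "Median": [110000, 75000, 73000],
        "P75th": [125000, 90000, 105000],
    }
)

# With no arguments, .plot() uses the index for the x-axis and draws
# one line per numeric column; the text column is left out.
ax = df.plot()

plotted = [line.get_label() for line in ax.get_lines()]
print(plotted)
```

Even with only four numeric columns, the lines share one y-axis with very different scales, which hints at why the full dataset looks cluttered.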
04:38 As an alternative to passing strings to the kind parameter of .plot(), DataFrame objects have several methods that you can use to create the various kinds of plots you’ve just seen: .area(), .bar(), .barh(), .box(), .hexbin(), .hist(), .kde(), .density(), .line(), .pie(), and .scatter(). You call these on the .plot accessor, so for a DataFrame named df, the area plot method is df.plot.area(). In this video course, you’ll use the .plot() interface and pass strings to the kind parameter, but you’re encouraged to try out the other methods mentioned here as well.
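The two spellings can be compared side by side. This is a sketch with hypothetical stand-in rows, using the bar chart as an example.

```python
import pandas as pd

# Hypothetical stand-in rows with two of the dataset's columns.
df = pd.DataFrame(
    {
        "Rank": [1, 2, 3, 4, 5],
        "Median": [110000, 75000, 73000, 70000, 65000],
    }
)

# Passing a string to the kind parameter...
ax_kind = df.plot(x="Rank", y="Median", kind="bar")

# ...and calling the matching method on the .plot accessor
# produce the same type of chart.
ax_method = df.plot.bar(x="Rank", y="Median")
```

Which form to use is mostly a matter of taste. The kind string is convenient when the plot type is chosen at runtime.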
05:12 Now that you’ve created your first pandas plot, let’s take a closer look at how .plot() works.