Create Your First Pandas Plot
Your dataset contains some columns related to the earnings of graduates in each major.
"Median" is the median earnings of full-time, year-round workers,
"P25th" is the 25th percentile of earnings,
"P75th" is the 75th percentile of earnings, and
"Rank" is the major’s rank by median earnings. Let’s start with a plot displaying these columns.
The %matplotlib magic command sets up your Jupyter Notebook for displaying plots with Matplotlib. The standard Matplotlib graphics backend is used by default, and your plots will be displayed in a separate window. Note that you can change the Matplotlib backend by passing an argument to the %matplotlib magic command. For example, %matplotlib inline renders plots inside the notebook itself.
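A minimal sketch of a first plot, assuming a DataFrame named df with the four columns described earlier. The numbers and the "Major A" style names are made-up placeholders, not figures from the real dataset.

```python
import pandas as pd

# Made-up placeholder values, NOT taken from the real dataset.
df = pd.DataFrame(
    {
        "Rank": [1, 2, 3, 4],
        "Major": ["Major A", "Major B", "Major C", "Major D"],
        "P25th": [60000, 50000, 45000, 30000],
        "Median": [80000, 65000, 50000, 40000],
        "P75th": [110000, 70000, 58000, 52000],
    }
)

# "Rank" goes on the x-axis; each column listed in y becomes one line.
ax = df.plot(x="Rank", y=["P25th", "Median", "P75th"])
```

The call hands back a Matplotlib Axes object, which is handy if you want to adjust labels or titles afterward.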
.plot() draws a line graph containing data from every row in the DataFrame. The x-axis values represent the rank of each major, and the "P25th", "Median", and "P75th" values are plotted on the y-axis.
If you’re not following along in a Jupyter Notebook or an IPython shell, then you’ll need to use the pyplot interface from matplotlib to display the plot. In a standard Python shell or script, import matplotlib.pyplot and call its show() function after creating the plot.
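As a rough sketch, a standalone script could look like the following. The helper name make_plot() is hypothetical, and the data is a made-up placeholder rather than the real dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd


def make_plot():
    """Build the line plot and return its Axes (nothing is displayed yet)."""
    # Made-up placeholder values, NOT taken from the real dataset.
    df = pd.DataFrame(
        {
            "Rank": [1, 2, 3, 4],
            "P25th": [60000, 50000, 45000, 30000],
            "Median": [80000, 65000, 50000, 40000],
            "P75th": [110000, 70000, 58000, 52000],
        }
    )
    return df.plot(x="Rank", y=["P25th", "Median", "P75th"])


if __name__ == "__main__":
    make_plot()
    # Opens the figure window; with a GUI backend this blocks until you close it.
    plt.show()
```

In an interactive Python shell you would type the same statements directly and finish with plt.show().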
Looking at the plot, the median income decreases as the rank number increases. This is expected because the rank is determined by the median income. Some majors have large gaps between the 25th and 75th percentiles. People with these degrees may earn significantly less or significantly more than the median income.
Other majors have very small gaps between the 25th and 75th percentiles. People with these degrees earn salaries very close to the median income. This first plot already hints that there’s a lot more to discover in the data.
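One way to put numbers on those gaps is to subtract the 25th percentile from the 75th. The sketch below assumes the same column names and uses made-up placeholder values, so the results only illustrate the arithmetic.

```python
import pandas as pd

# Made-up placeholder values, NOT taken from the real dataset.
df = pd.DataFrame(
    {
        "Major": ["Major A", "Major B", "Major C", "Major D"],
        "P25th": [60000, 50000, 45000, 30000],
        "P75th": [110000, 70000, 58000, 52000],
    }
).set_index("Major")

# Gap between the 75th and 25th percentile of earnings for each major.
spread = df["P75th"] - df["P25th"]

widest = spread.idxmax()     # major with the largest gap
narrowest = spread.idxmin()  # major with the smallest gap

print(spread.sort_values(ascending=False))
```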
.plot() has several optional parameters. Most notably, the kind parameter accepts eleven different string values and determines which kind of plot you’ll create.
"area" is for area plots,
"bar" is for vertical bar charts,
"barh" is for horizontal bar charts,
"box" is for box plots,
"hexbin" is for hexbin plots,
"hist" is for histograms,
"kde" is for kernel density estimate charts,
"density" is an alias for "kde",
"line" is for line graphs,
"pie" is for pie charts, and
"scatter" is for scatter plots.
Line graphs rarely provide sophisticated insight, but they can give you clues as to where to zoom in. If you don’t provide a parameter to .plot(), then it creates a line plot with the index on the x-axis and all the numeric columns on the y-axis.
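To compare the default behavior with an explicit kind, here is a small sketch. The DataFrame name df and its values are made-up placeholders, not the real dataset.

```python
import pandas as pd

# Made-up placeholder values, NOT taken from the real dataset.
df = pd.DataFrame(
    {
        "Rank": [1, 2, 3, 4],
        "Major": ["Major A", "Major B", "Major C", "Major D"],
        "P25th": [60000, 50000, 45000, 30000],
        "Median": [80000, 65000, 50000, 40000],
        "P75th": [110000, 70000, 58000, 52000],
    }
)

# No arguments: a line plot with the index (0..3) on the x-axis and one
# line per numeric column. The text column "Major" is left out.
ax_default = df.plot()

# Same data source, but kind="bar" switches to a vertical bar chart.
ax_bar = df.plot(x="Rank", y="Median", kind="bar")
```

Note that the default plot also draws "Rank" as a line, because it is a numeric column like the others.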
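As an alternative to passing strings to the kind parameter of .plot(), DataFrame objects have several methods that you can use to create the various kinds of plots you’ve just seen: .area(), .bar(), .barh(), .box(), .hexbin(), .hist(), .kde(), .density(), .line(), .pie(), and .scatter(). These methods are reached through the .plot accessor, for example df.plot.scatter() for a DataFrame named df. In this video course, you’ll use the .plot() interface and pass strings to the kind parameter, but you’re encouraged to try out the other methods mentioned here as well.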
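The two spellings can be compared side by side. This is a sketch with made-up placeholder data, assuming a DataFrame named df:

```python
import pandas as pd

# Made-up placeholder values, NOT taken from the real dataset.
df = pd.DataFrame(
    {
        "Rank": [1, 2, 3, 4],
        "Median": [80000, 65000, 50000, 40000],
    }
)

# Passing a string to the kind parameter ...
ax_kind = df.plot(kind="scatter", x="Rank", y="Median")

# ... and calling the matching method on the .plot accessor.
ax_method = df.plot.scatter(x="Rank", y="Median")
```

Both calls should produce the same kind of scatter plot, each on its own figure.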