Visualizing Your pandas DataFrame
If you don’t want to run the code on your local machine, you can find the course demos on Google Colab.
Here are additional resources about data visualization in Python:
- Interactive Data Visualization in Python With Bokeh | Real Python Article
- Interactive Data Visualization in Python With Bokeh | Real Python Video Course
- Plot With pandas: Python Data Visualization Basics | Real Python Video Course
- Develop Data Visualization Interfaces in Python With Dash | Real Python Article
00:00 All data has a story to tell. So far, the language you’ve used to tell that story has been the values of the DataFrame, but there is another way. Visualizations summarize the values in the data and provide information to the clients.
00:16 Charts and graphs can draw attention to points of interest and make the story obvious. Traditionally, visualizations in Python have been done with the Matplotlib package, and while it is very powerful and capable, the API has a learning curve. The pandas API affords you the advantage of generating Matplotlib visualizations without the ceremony or noise.
00:41 While everything in the course up until now has been possible using the command line, such as IPython, visualizations are something you really need Jupyter Notebook for.
00:50 So if you haven’t started it up yet, go ahead and do that. Again, if you’re using Anaconda, it’s installed for you. Just run jupyter notebook in an Anaconda prompt and the server will start.
01:04 You’ll need to import the pandas package and load the NBA data into a DataFrame.
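Here is a minimal sketch of that step. The transcript doesn't name the data file, so this sketch reads a tiny stand-in CSV from memory instead. The column names (fran_id, year_id, pts, opp_pts, game_result) and the made-up rows are assumptions for illustration only.

```python
import io

import pandas as pd

# Stand-in for the NBA CSV file: a few made-up rows so the sketch runs
# on its own. The column names are assumed, not taken from the transcript.
csv_text = """fran_id,year_id,pts,opp_pts,game_result
Knicks,2014,98,101,L
Knicks,2015,104,99,W
Heat,2013,110,95,W
"""

# With the real data, you would pass the path of the CSV file here instead.
nba = pd.read_csv(io.StringIO(csv_text))

print(type(nba))
print(nba.shape)
```

In a notebook, you can also just type nba in a cell to see the DataFrame rendered as a table.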
01:10 Earlier in the course, you learned how to summarize data with grouping and aggregation methods. For example, here’s how to summarize the number of points scored each year by the New York Knicks.
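A summary like this could be written roughly as follows. The rows are made up and the column names (fran_id, year_id, pts) are assumed, so treat it as a sketch rather than the exact code from the video. Note the parentheses after sum: without them you get the method itself, not the totals.

```python
import pandas as pd

# Made-up rows with assumed column names, standing in for the NBA data.
nba = pd.DataFrame(
    {
        "fran_id": ["Knicks", "Knicks", "Knicks", "Heat"],
        "year_id": [2014, 2014, 2015, 2014],
        "pts": [98, 104, 110, 90],
    }
)

# Keep only the Knicks games, then total the points for each year.
knicks = nba[nba["fran_id"] == "Knicks"]
points_by_year = knicks.groupby("year_id")["pts"].sum()

print(points_by_year)
```

The result is a Series indexed by year, which is what the plotting calls later in the lesson work on.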
01:25 There were more than 50 years of data in the DataFrame. Can you see the highest or lowest scoring years? How about trends? It’s hard to do just by staring at the numbers and picking out the ones of interest.
01:40 You can generate a plot of the data just by calling the .plot() method on the Series. The default plot is a line plot. This chart clearly shows a peak around 1970 and a low point around 2000.
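As a small sketch of that call, here is .plot() on a hand-built Series. The years and totals are invented for illustration and are not the real Knicks numbers. Matplotlib needs to be installed for pandas plotting to work.

```python
import pandas as pd

# Invented yearly totals standing in for the grouped Knicks data.
points_by_year = pd.Series(
    [8100, 8600, 7900, 8300],
    index=pd.Index([1968, 1970, 1999, 2010], name="year_id"),
    name="pts",
)

# With no arguments, .plot() draws a line plot and returns a Matplotlib Axes.
ax = points_by_year.plot()
```

In Jupyter Notebook the chart appears under the cell. In a plain script you would typically also import matplotlib.pyplot and call its show() function to display the figure.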
01:55 You can also create a bar chart simply by adding the keyword argument kind and setting it to 'bar'. That’s a little cramped, so let’s just look at the past 10 years.
02:07 And it’s obvious that 2012 was the lowest scoring year.
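The bar chart step might look like the sketch below. The totals are invented, and using .tail(10) to keep the last 10 years is one assumed way to narrow the data; the video may do it differently.

```python
import pandas as pd

# Invented yearly totals for 2000 through 2015, standing in for the real data.
years = list(range(2000, 2016))
points = [8000 + 50 * i for i in range(len(years))]
points_by_year = pd.Series(
    points, index=pd.Index(years, name="year_id"), name="pts"
)

# kind="bar" switches from the default line plot to a bar chart.
# .tail(10) keeps only the last 10 entries so the bars are not cramped.
ax = points_by_year.tail(10).plot(kind="bar")
```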
02:12 Another interesting visualization is the scatter plot that shows two values on different axes. This scatter plot shows the points versus opposition points for the Knicks after 2000.
02:25 As you might expect, in games where the home team scored more points, the opposition scored more points as well. But there’s something hidden in this data. Set the size of the points using the s keyword argument.
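A scatter plot along these lines could be sketched as follows. The rows are made up, the column names (fran_id, year_id, pts, opp_pts) are assumed, and s=5 is just an example marker size.

```python
import pandas as pd

# Made-up games with assumed column names.
nba = pd.DataFrame(
    {
        "fran_id": ["Knicks", "Knicks", "Knicks", "Knicks", "Heat"],
        "year_id": [1999, 2005, 2010, 2014, 2010],
        "pts": [90, 101, 95, 110, 99],
        "opp_pts": [88, 97, 100, 104, 93],
    }
)

# Knicks games after 2000 only.
recent_knicks = nba[(nba["fran_id"] == "Knicks") & (nba["year_id"] > 2000)]

# x and y name the columns for each axis; s sets the marker size.
ax = recent_knicks.plot(kind="scatter", x="pts", y="opp_pts", s=5)
```

A smaller s keeps the markers from overlapping, which is what makes the gap along the diagonal visible on the full data.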
02:39 See the empty area along the diagonal? That represents games where the home and away teams scored the same number of points, or a tie. From the scatter plot, it seems that there were no tied games, but you can write a simple query just to make sure.
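One way to write that check is a boolean filter that keeps only the rows where both scores match. The data and the column names (pts, opp_pts) below are assumptions for illustration.

```python
import pandas as pd

# Made-up games with assumed column names; none of them is a tie.
nba = pd.DataFrame(
    {
        "pts": [98, 104, 110],
        "opp_pts": [101, 99, 105],
    }
)

# Keep only the rows where both teams scored the same number of points.
ties = nba[nba["pts"] == nba["opp_pts"]]

print(len(ties))
```

If the filtered DataFrame is empty, there were no tied games.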
02:56 You can also create pie charts with pandas. For example, how many games did the Miami Heat win versus lose in the year 2013? Here’s the query to get the data.
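A query like that might be sketched as below. The rows are invented, and the column names (fran_id, year_id, game_result) and the "W"/"L" values are assumptions about how the data is laid out.

```python
import pandas as pd

# Made-up games with assumed column names and result codes.
nba = pd.DataFrame(
    {
        "fran_id": ["Heat", "Heat", "Heat", "Heat", "Heat", "Knicks"],
        "year_id": [2013, 2013, 2013, 2013, 2012, 2013],
        "game_result": ["W", "W", "L", "W", "L", "W"],
    }
)

# Heat games from 2013 only.
heat_2013 = nba[(nba["fran_id"] == "Heat") & (nba["year_id"] == 2013)]

# Count how many times each result occurs.
results = heat_2013["game_result"].value_counts()

print(results)
```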
03:11 This was a good year for the Heat, but how good? Use the pie chart to visualize the data. They won more than 75% of the games. This is just the beginning of what you can do with data visualization. For more, check out this post on Real Python.
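The pie chart step could look something like this sketch. The win and loss counts are invented and are not the real 2013 numbers.

```python
import pandas as pd

# Invented win/loss counts standing in for the Heat's 2013 results.
results = pd.Series([3, 1], index=["W", "L"], name="game_result")

# kind="pie" draws one wedge per entry, sized by its share of the total.
ax = results.plot(kind="pie")
```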
03:31 In the last lesson of the course, you’ll review what you’ve learned.
pnmcdos on April 7, 2022
Answered my own question. My formula was missing () after the sum.
Vaibhav Patil on Oct. 1, 2023
It’s a very intuitive and nice explanation.
pnmcdos on April 7, 2022
Minute 1:26, how did you get that information to print?