Plotting With Pandas
The pandas library has become popular not just for enabling powerful data analysis but also for its handy pre-canned plotting methods. Interestingly, those plotting methods are really just convenient wrappers around existing matplotlib calls. You can use matplotlib and pandas to produce even more sophisticated visualizations.
00:00 The Pandas library has become popular not just for enabling powerful data analysis, but also for its handy, pre-canned plotting methods. Interestingly, though, Pandas’ plotting methods are really just convenient wrappers around existing Matplotlib calls. We can use Matplotlib and Pandas to produce even more sophisticated visualizations.
00:28 Before we start scripting, let’s learn a bit about how Pandas works by running it in the interactive shell.
00:36 I’m going to start by importing all three of the required libraries: matplotlib.pyplot, numpy, and finally pandas. Let’s create a pandas.Series, which is a one-dimensional labeled array.
00:54 I’ll call this s and I’ll get it with pd.Series(), passing in np.arange(5) to generate an ndarray from 0 to 4 inclusive.
01:11 And for the index, I’ll convert the string 'abcde' into a list.
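The shell session described above can be sketched roughly as follows. This is a minimal reconstruction from the narration, not a copy of what is typed on screen.

```python
import numpy as np
import pandas as pd

# Values 0 through 4, labeled with the letters a through e
s = pd.Series(np.arange(5), index=list("abcde"))

print(s)       # inspect the whole Series
print(s["c"])  # look up a single value by its label
```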
01:18 If I inspect this Series, you can see what it looks like. To plot this, I’ll type ax = s.plot(). A pandas.Series has a .plot() method, which calls the matplotlib plotting method internally.
01:37 If you recall, that method implicitly tracks the current Axes, and so we can obtain that Axes object by storing it in a variable.
01:48 And just to show that this ax variable is an Axes, I can use the built-in type() function, passing in ax. And look at that: AxesSubplot, just like before.
02:02 Remember, we’re tracking the current Figure with pyplot under the hood, and so I can compare the ID of the object returned by the gca() (get current axes) function with the ID of our Axes object.
02:20 And they are the same. This shows that we can use Pandas in a similar way to stateful Matplotlib, but with the additional functionality of Pandas. This also means that we can take a stateless approach, obtaining our Axes object and modifying it manually before plotting the whole Figure.
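As a rough sketch of the identity check and the stateless follow-up described above: the Agg backend and the title and label text are assumptions added so the snippet runs without a display, and the exact class name printed by type() varies between Matplotlib versions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, not used in the lesson
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5), index=list("abcde"))

# pandas draws through Matplotlib and hands back the Axes it drew on
ax = s.plot()
print(type(ax))

# pyplot tracks the same object as its current Axes
same_axes = id(plt.gca()) == id(ax)
print(same_axes)

# Stateless style: keep modifying the Axes object directly
ax.set_title("Series plotted via pandas")  # hypothetical title
ax.set_ylabel("value")                     # hypothetical label
```

Because ax and plt.gca() are the same object, either handle can be used for further changes. Holding on to ax simply makes the target explicit.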
02:43 I’ll show you how to do that next. I’m here in Visual Studio Code in a new file called plot5.py. We’ll be plotting the moving average of a widely watched financial time series, the CBOE market volatility index, or CBOE VIX. In other words, financial data.
03:08 The first thing we’ll do is grab all of our libraries. The only new one here is matplotlib.transforms, which I will import as mtransforms.
03:22 And of course, we need pandas too. Next, I’ll declare a variable for the URL and initialize it with this string. This links to a CSV file containing dates and their associated volatility.
03:38 We need to turn this into a pandas.Series, so I’ll say vix = pd.read_csv(), passing in the URL to read from as well as some other arguments that will help with interpreting dates and removing not-available values, which are marked in the file with a dot (.).
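Here is a hedged sketch of that loading step. The transcript does not include the URL, so an in-memory CSV with made-up rows and assumed column names stands in for the download. The arguments follow the updated call posted in the comments for recent pandas versions.

```python
import io

import pandas as pd

# Synthetic stand-in for the downloaded file; the column names and
# values are assumptions, and "." marks a missing observation
csv_text = """DATE,VIXCLS
2020-01-02,12.47
2020-01-03,14.02
2020-01-06,.
2020-01-07,13.79
"""

vix = pd.read_csv(
    io.StringIO(csv_text),  # the lesson passes a URL here instead
    index_col=0,            # use the date column as the index
    parse_dates=True,       # convert the index to datetimes
    na_values=".",          # treat a lone dot as missing
).dropna()

# Collapse the one-column DataFrame into a Series
vix = vix.squeeze("columns")
print(vix)
```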
04:04 We also need to generate a Series of the 90-day rolling averages, which can be done with the .rolling() and .mean() methods, just like this.
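One way to sketch the rolling average, using synthetic daily data in place of the VIX series. Whether the lesson uses a time-based window ("90D") or a fixed count of rows is not stated in the transcript, so the time-based form here is an assumption.

```python
import numpy as np
import pandas as pd

# Synthetic daily series standing in for the VIX data
dates = pd.date_range("2020-01-01", periods=200, freq="D")
values = pd.Series(np.linspace(10.0, 30.0, 200), index=dates)

# Time-based window: each point averages the preceding 90 days
ma = values.rolling("90D").mean()
print(ma.tail())
```

With a time-based window, early dates are averaged over whatever data is available, so the result has no leading missing values.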
04:17 In order to split this data into bins, I’ll use the pandas.cut() function. Each date will be assigned to a bin corresponding to a severity level obtained from the rolling average associated with that date.
04:34 The bins are labeled 0, 1, 2, and 3, and which date goes into which bin is determined by these cut-offs: 14, 18, and 24.
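A small sketch of the binning with those cut-offs, run on a handful of made-up averages. Closing the outer bins with infinities is an assumption about how the edges are handled.

```python
import numpy as np
import pandas as pd

# Made-up rolling averages, including values that sit exactly on a cut-off
ma = pd.Series([12.0, 14.0, 16.5, 18.0, 21.0, 24.0, 30.0])

# Four bins separated by 14, 18, and 24; intervals are closed on the right
state = pd.cut(
    ma,
    bins=[-np.inf, 14, 18, 24, np.inf],
    labels=[0, 1, 2, 3],
)
print(state.tolist())
```

Because the intervals are closed on the right by default, a value that equals a cut-off lands in the lower bin.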
04:49 Now I need to decide on a color map and store it in a variable. I’ll call this cmap and I’ll get it with plt.get_cmap(), passing in Red-Yellow-Green reversed, just like I did before. To actually create the plot, we can call the .plot() method on our ma object.
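As a quick illustration of the color map step: "RdYlGn_r" is Matplotlib’s name for the reversed Red-Yellow-Green map. Sampling one color per severity level at evenly spaced points is an assumption made for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, not used in the lesson
import matplotlib.pyplot as plt

cmap = plt.get_cmap("RdYlGn_r")

# Calling the map with a float in [0, 1] returns an RGBA tuple;
# low values come out green and high values come out red
colors = [cmap(level / 3) for level in range(4)]
for level, rgba in enumerate(colors):
    print(level, rgba)
```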
05:15 In this case, we want to plot this rolling average as a black line in an 8 by 4 figure. Remember, the Pandas .plot() method calls the pyplot.plot() function under the hood, and so now pyplot is tracking a current Axes.
05:34 Let’s grab it and store it as a variable so that we can further modify it.
05:41 ax = plt.gca(). Now I’ll quickly set some Axes properties with methods that we’ve used before.
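The plotting call and the Axes tweaks might look something like this. The data is synthetic, and the specific properties being set are hypothetical, since the transcript does not name them.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, not used in the lesson
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the rolling average (not real VIX data)
dates = pd.date_range("2020-01-01", periods=120, freq="D")
ma = pd.Series(np.linspace(10.0, 30.0, 120), index=dates)

# Black line in an 8 by 4 inch figure, drawn through pandas
ma.plot(color="black", linewidth=1.5, figsize=(8, 4))

# pyplot is now tracking the Axes that pandas drew on
ax = plt.gca()

# Hypothetical property settings, for illustration only
ax.set_xlabel("")
ax.set_ylabel("90-day moving average")
ax.set_title("Volatility regime state")
```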
05:52 This is nothing new. This code here will use a Matplotlib transform to draw colored bars in the visualization based on those state bins we created earlier.
06:06 This is another way to visualize the level of fear in the marketplace.
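One possible way to draw such bars is a blended transform, where x is read in data coordinates and y in Axes coordinates, so each band spans the full height of the plot. This is a sketch on synthetic data. The sample points on the color map and the alpha value are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, not used in the lesson
import matplotlib.pyplot as plt
import matplotlib.transforms as mtransforms
import numpy as np
import pandas as pd

# Synthetic rolling average and its severity bins
dates = pd.date_range("2020-01-01", periods=120, freq="D")
ma = pd.Series(np.linspace(10.0, 30.0, 120), index=dates)
state = pd.cut(ma, bins=[-np.inf, 14, 18, 24, np.inf], labels=[0, 1, 2, 3])

cmap = plt.get_cmap("RdYlGn_r")
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(ma.index, ma.values, color="black", linewidth=1.5)

# x follows the data, y runs from 0 (bottom of Axes) to 1 (top of Axes)
trans = mtransforms.blended_transform_factory(ax.transData, ax.transAxes)

# One full-height band per severity level, filled only where that level applies
for level, color in enumerate(cmap([0.2, 0.4, 0.6, 0.8])):
    ax.fill_between(
        ma.index, 0, 1,
        where=(state == level).to_numpy(),
        facecolor=color,
        alpha=0.5,
        transform=trans,
    )
```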
06:12 In other words, we’re mapping bins—or severity levels—to a specific color and then drawing those on the screen. And finally, I want to draw a horizontal dashed line at the mean of our VIX data.
06:31 We can do that with Matplotlib’s .axhline() method, passing in vix.mean() and some styling properties.
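A sketch of the mean line on synthetic data. As the comments on this lesson point out, the XKCD color name must be written without a space after the colon. The styling values here mirror the call quoted in those comments.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, not used in the lesson
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.colors import is_color_like

# Made-up values standing in for the VIX series
vix = pd.Series([12.0, 18.0, 24.0, 30.0])

fig, ax = plt.subplots()
ax.plot(vix.index, vix.values, color="black")

# Dashed horizontal line at the full-period mean
line = ax.axhline(
    vix.mean(),
    linestyle="dashed",
    color="xkcd:dark grey",  # no space after the colon
    alpha=0.6,
    label="Full-period mean",
)
ax.legend()

print(is_color_like("xkcd:dark grey"))
print(is_color_like("xkcd: dark grey"))
```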
06:43 And now I will show this with pyplot. If we observe this plot, you can see that the colored bars actually correspond to the black line. A higher average results in more red colors, and a lower average corresponds to more green. It all depends on the bin each date was placed in.
07:05 Now, this course is by no means a dedicated Pandas tutorial, as you can probably tell. We’ve got dedicated tutorials at realpython.com if you’re interested in learning more about how this Pandas code works in-depth.
07:22 The takeaway here is that we can use Pandas to aid in our data analysis and plotting alongside Matplotlib. It just opens the door for new opportunities.
Xavier on Nov. 2, 2019
Re the invalid “dark grey”, permissible XKCD colours can be found here: https://matplotlib.org/3.1.0/tutorials/colors/colors.html#xkcd-colors
Looks like you must also remove the space after the colon, e.g. color='xkcd:darkblue'
Xavier on Nov. 2, 2019
Typo in setting of vix. read_csv argument should be parse_dates=True
Ranit Pradhan on April 6, 2020
TypeError: parser_f() got an unexpected keyword argument ‘pase_dates’
Ranit Pradhan on April 6, 2020
Ok,got it ....Thank You Mr.Xavier, it should be parse_dates=True.
patientwriter on Feb. 24, 2022
The very idea that we should use pandas, of all things, in order to simplify working with matplotlib, tells you all you need to know about how crazy complex and unpythonic matplotlib is to begin with. The idea of bringing matlab functionality into python is fine, but matplotlib clearly was far too slavish in following matlab’s original implementation rather than transforming it to python equivalents. For the core devs to abandon Pylab was no small concession to this reality. Thus, it is no wonder there are now so many alternative plotting libraries in the python ecosystem.
Bartosz Wilk on Aug. 18, 2022
Traceback (most recent call last):
  File "/Volumes/Work/realpython/plotting_with_matplotlib/plotting_with_panda.py", line 31, in <module>
    ax.axhline(vix.mean(), linestyle='dashed', color='xkcd: dark grey', alpha=0.6, label='Full-period mean', marker='')
  File "/Volumes/Work/realpython/venv/lib/python3.10/site-packages/matplotlib/axes/_axes.py", line 737, in axhline
    l = mlines.Line2D([xmin, xmax], [y, y], transform=trans, **kwargs)
  File "/Volumes/Work/realpython/venv/lib/python3.10/site-packages/matplotlib/lines.py", line 370, in __init__
    self.set_color(color)
  File "/Volumes/Work/realpython/venv/lib/python3.10/site-packages/matplotlib/lines.py", line 1030, in set_color
    mcolors._check_color_like(color=color)
  File "/Volumes/Work/realpython/venv/lib/python3.10/site-packages/matplotlib/colors.py", line 130, in _check_color_like
    raise ValueError(f"{v!r} is not a valid value for {k}")
ValueError: 'xkcd: dark grey' is not a valid value for color
mindconnect dot cc on April 1, 2023
Replace color='xkcd: dark grey' with color='#444'.
walterrieppi on Feb. 20, 2024
I think the URL has changed or something else is wrong, because I get a urlopen error [WinError 10060].
Bartosz Zaczyński RP Team on Feb. 20, 2024
@walterrieppi The URL is still very much alive. It might be some kind of a network problem. Try downloading the file manually and passing the path to a local file instead of the URL to pandas.
billzabel837 on Sept. 8, 2024
Time keeps on ticking, ticking, into the future Python 3.12.0 pandas 2.2.2
The parameter squeeze is DEPRECATED since pandas version 1.4.0.
'infer_datetime_format' is deprecated,
A strict version of it is now the default.
You can safely remove this argument.
vix = pd.read_csv(
    url,
    index_col=0,
    parse_dates=True,
    na_values=".",
    infer_datetime_format=True,
    squeeze=True,
).dropna()
# (fails pd.cut - vix is still a data frame)
# replaced with
vix = pd.read_csv(url, index_col=0, parse_dates=True, na_values=".").dropna()
vix = vix.squeeze("columns")
worked for me…
Martin Breuss RP Team on Sept. 9, 2024
@billzabel837 ah yes it really does, and sometimes it ticks very quickly in the tech world… ⏳
Thanks for noting that this course needs an update and for posting your solution 🙏
Marco Belo on Oct. 30, 2019
ValueError: Invalid RGBA argument: ‘xkcd: dark grey’