Join us and get access to hundreds of tutorials and a community of expert Pythonistas.

Unlock This Lesson

This lesson is for members only. Join us and get access to hundreds of tutorials and a community of expert Pythonistas.

Unlock This Lesson

Hint: You can adjust the default video playback speed in your account settings.
Hint: You can set the default subtitles language in your account settings.
Sorry! Looks like there’s an issue with video playback 🙁 This might be due to a temporary outage or because of a configuration issue with your browser. Please see our video player troubleshooting guide to resolve the issue.

Matplotlib and Pandas

Give Feedback

Now that you’ve seen how to build a histogram in Python from the ground up, let’s see how other Python packages can do the job for you. Matplotlib provides the functionality to visualize Python histograms out of the box with a versatile wrapper around NumPy’s histogram():

import matplotlib.pyplot as plt

# An "interface" to matplotlib.axes.Axes.hist() method
n, bins, patches = plt.hist(x=d, bins='auto', color='#0504aa',
                            alpha=0.7, rwidth=0.85)
plt.grid(axis='y', alpha=0.75)
plt.title('My Very Own Histogram')
plt.text(23, 45, r'$\mu=15, b=3$')
maxfreq = n.max()
# Set a clean upper y-axis limit.
plt.ylim(ymax=np.ceil(maxfreq / 10) * 10 if maxfreq % 10 else maxfreq + 10)


As defined earlier, a plot of a histogram uses its bin edges on the x-axis and the corresponding frequencies on the y-axis. In the chart above, passing bins='auto' chooses between two algorithms to estimate the ideal number of bins. At a high level, the goal of the algorithm is to choose a bin width that generates the most faithful representation of the data. For more on this subject, which can get pretty technical, check out Choosing Histogram Bins from the Astropy docs.

Staying in Python’s scientific stack, Pandas’ Series.histogram() uses matplotlib.pyplot.hist() to draw a Matplotlib histogram of the input Series:

import pandas as pd

# Generate data on commute times.
size, scale = 1000, 10
commutes = pd.Series(np.random.gamma(scale, size=size) ** 1.5)

commutes.plot.hist(grid=True, bins=20, rwidth=0.9,
plt.title('Commute Times for 1,000 Commuters')
plt.ylabel('Commute Time')
plt.grid(axis='y', alpha=0.75)

Histogram of commute times for 1000 commuters

pandas.DataFrame.histogram() is similar but produces a histogram for each column of data in the DataFrame.

williamjarrold on April 24, 2020


The first script did not work the first time because it does not define the variable d. One simply needs to add…

np.random.seed(444) np.set_printoptions(precision=3)

d = np.random.laplace(loc=15, scale=3, size=500)

…to the top to make it work (it’s from the prior section of the course)

Also, after I made the addition and ran it from Terminal on my Mac, it did not display. Thanks to I fixed this problem by adding....

… to the last line of the script. If there are better / alternative ways of getting the display to work, I’m interested. (-:

Become a Member to join the conversation.