Join us and get access to hundreds of tutorials and a community of expert Pythonistas.

Unlock This Lesson

This lesson is for members only. Join us and get access to hundreds of tutorials and a community of expert Pythonistas.

Unlock This Lesson

Hint: You can adjust the default video playback speed in your account settings.
Hint: You can set the default subtitles language in your account settings.
Sorry! Looks like there’s an issue with video playback 🙁 This might be due to a temporary outage or because of a configuration issue with your browser. Please see our video player troubleshooting guide to resolve the issue.

Math and Statistics Functions

In this lesson, you’ll learn about new and improved math and statistics functions in Python 3.8. Python 3.8 brings many improvements to existing standard library packages and modules. math in the standard library has a few new functions. math.prod() works similarly to the built-in sum(), but for multiplicative products:

>>>
>>> import math
>>> math.prod((2, 8, 7, 7))
784

>>> 2 * 8 * 7 * 7
784

The two statements are equivalent. prod() will be easier to use when you already have the factors stored in an iterable.

Another new function is math.isqrt(). You can use isqrt() to find the integer part of square roots:

>>>
>>> import math
>>> math.isqrt(9)
3

>>> math.sqrt(9)
3.0

>>> math.isqrt(15)
3

>>> math.sqrt(15)
3.872983346207417

The square root of 9 is 3. You can see that isqrt() returns an integer result, while math.sqrt() always returns a float. The square root of 15 is almost 3.9. Note that isqrt() truncates the answer down to the next integer, in this case 3.

Finally, you can now more easily work with n-dimensional points and vectors in the standard library. You can find the distance between two points with math.dist(), and the length of a vector with math.hypot():

>>>
>>> import math
>>> point_1 = (16, 25, 20)
>>> point_2 = (8, 15, 14)

>>> math.dist(point_1, point_2)
14.142135623730951

>>> math.hypot(*point_1)
35.79106033634656

>>> math.hypot(*point_2)
22.02271554554524

This makes it easier to work with points and vectors using the standard library. However, if you will be doing many calculations on points or vectors, you should check out NumPy.

The statistics module also has several new functions:

The following example shows the functions in use:

>>>
>>> import statistics
>>> data = [9, 3, 2, 1, 1, 2, 7, 9]
>>> statistics.fmean(data)
4.25

>>> statistics.geometric_mean(data)
3.013668912157617

>>> statistics.multimode(data)
[9, 2, 1]

>>> statistics.quantiles(data, n=4)
[1.25, 2.5, 8.5]

In Python 3.8, there is a new statistics.NormalDist class that makes it more convenient to work with the Gaussian normal distribution. To see an example of using NormalDist, you can try to compare the speed of the new statistics.fmean() and the traditional statistics.mean():

>>>
>>> import random
>>> import statistics
>>> from timeit import timeit

>>> # Create 10,000 random numbers
>>> data = [random.random() for _ in range(10_000)]

>>> # Measure the time it takes to run mean() and fmean()
>>> t_mean = [timeit("statistics.mean(data)", number=100, globals=globals())
...           for _ in range(30)]
>>> t_fmean = [timeit("statistics.fmean(data)", number=100, globals=globals())
...            for _ in range(30)]

>>> # Create NormalDist objects based on the sampled timings
>>> n_mean = statistics.NormalDist.from_samples(t_mean)
>>> n_fmean = statistics.NormalDist.from_samples(t_fmean)

>>> # Look at sample mean and standard deviation
>>> n_mean.mean, n_mean.stdev
(0.825690647733245, 0.07788573997674526)

>>> n_fmean.mean, n_fmean.stdev
(0.010488564966666065, 0.0008572332785645231)

>>> # Calculate the lower 1 percentile of mean
>>> n_mean.quantiles(n=100)[0]
0.6445013221202459

In this example, you use timeit to measure the execution time of mean() and fmean(). To get reliable results, you let timeit execute each function 100 times, and collect 30 such time samples for each function. Based on these samples, you create two NormalDist objects. Note that if you run the code yourself, it might take up to a minute to collect the different time samples.

NormalDist has many convenient attributes and methods. See the documentation for a complete list. Inspecting .mean and .stdev, you see that the old statistics.mean() runs in 0.826 ± 0.078 seconds, while the new statistics.fmean() spends 0.0105 ± 0.0009 seconds. In other words, fmean() is about 80 times faster for these data.

If you need more advanced statistics in Python than the standard library offers, check out statsmodels and scipy.stats.

Become a Member to join the conversation.