The NumPy library supports expressive, efficient numerical programming in Python. Finding extreme values is a very common requirement in data analysis. The NumPy max() and maximum() functions are two examples of how NumPy lets you combine the coding comfort offered by Python with the runtime efficiency you’d expect from C.
In this tutorial, you’ll learn how to:
 Use the NumPy max() function
 Use the NumPy maximum() function and understand why it’s different from max()
 Solve practical problems with these functions
 Handle missing values in your data
 Apply the same concepts to finding minimum values
This tutorial includes a very short introduction to NumPy, so even if you’ve never used NumPy before, you should be able to jump right in. With the background provided here, you’ll be ready to continue exploring the wealth of functionality to be found in the NumPy library.
NumPy: Numerical Python
NumPy is short for Numerical Python. It’s an open source Python library that enables a wide range of applications in the fields of science, statistics, and data analytics through its support of fast, parallelized computations on multidimensional arrays of numbers. Many of the most popular numerical packages use NumPy as their base library.
Introducing NumPy
The NumPy library is built around a class named np.ndarray
and a set of methods and functions that leverage Python syntax for defining and manipulating arrays of any shape or size.
NumPy’s core code for array manipulation is written in C. You can use functions and methods directly on an ndarray as NumPy’s C-based code efficiently loops over all the array elements in the background. NumPy’s high-level syntax means that you can simply and elegantly express complex programs and execute them at high speeds.
You can use a regular Python list
to represent an array. However, NumPy arrays are far more efficient than lists, and they’re supported by a huge library of methods and functions. These include mathematical and logical operations, sorting, Fourier transforms, linear algebra, array reshaping, and much more.
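To make the contrast with lists concrete, here is a minimal sketch that doubles every element, first with a list comprehension and then with a single array expression. The variable names are illustrative and not part of the tutorial’s running examples.

```python
import numpy as np

values = [3, 7, 2, 4, 5]

# With a plain list, element-wise work needs an explicit loop or comprehension.
doubled_list = [2 * v for v in values]

# With an ndarray, the same operation is one expression; the loop runs in C.
doubled_array = 2 * np.array(values)

print(doubled_list)             # [6, 14, 4, 8, 10]
print(doubled_array.tolist())   # [6, 14, 4, 8, 10]
```

Note that 2 * values on the plain list would repeat the list rather than double its elements, which is one reason array arithmetic is usually clearer with NumPy.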
Today, NumPy is in widespread use in fields as diverse as astronomy, quantum computing, bioinformatics, and all kinds of engineering.
NumPy is used under the hood as the numerical engine for many other libraries, such as pandas and SciPy. It also integrates easily with visualization libraries like Matplotlib and seaborn.
NumPy is easy to install with your package manager, for example pip or conda. For detailed instructions plus a more extensive introduction to NumPy and its capabilities, take a look at NumPy Tutorial: Your First Steps Into Data Science in Python or the NumPy Absolute Beginner’s Guide.
In this tutorial, you’ll learn how to take your very first steps in using NumPy. You’ll then explore NumPy’s max()
and maximum()
commands.
Creating and Using NumPy Arrays
You’ll start your investigation with a quick overview of NumPy arrays, the flexible data structure that gives NumPy its versatility and power.
The fundamental building block for any NumPy program is the ndarray
. An ndarray
is a Python object wrapping an array of numbers. It may, in principle, have any number of dimensions of any size. You can declare an array in several ways. The most straightforward method starts from a regular Python list or tuple:
>>> import numpy as np
>>> A = np.array([3, 7, 2, 4, 5])
>>> A
array([3, 7, 2, 4, 5])
>>> B = np.array(((1, 4), (1, 5), (9, 2)))
>>> B
array([[1, 4],
       [1, 5],
       [9, 2]])
You’ve imported numpy
under the alias np
. This is a standard, widespread convention, so you’ll see it in most tutorials and programs.
In this example, A is a one-dimensional array of numbers, while B is two-dimensional.
Notice that the np.array()
factory function expects a Python list or tuple as its first parameter, so the list or tuple must therefore be wrapped in its own set of brackets or parentheses, respectively. Just throwing in an unwrapped bunch of numbers won’t work:
>>> np.array(3, 7, 2, 4, 5)
Traceback (most recent call last):
...
TypeError: array() takes from 1 to 2 positional arguments but 5 were given
With this syntax, np.array() receives five separate positional arguments instead of a single sequence, so it raises a TypeError.
In your constructor for array B
, the nested tuple argument needs an extra pair of parentheses to identify it, in its entirety, as the first parameter of np.array()
.
Addressing the array elements is straightforward. NumPy’s indices start at zero, like all Python sequences. By convention, a two-dimensional array is displayed so that the first index refers to the row, and the second index refers to the column. So A[0] is the first element of the one-dimensional array A, and B[2, 1] is the second element in the third row of the two-dimensional array B:
>>> A[0] # First element of A
3
>>> A[4] # Fifth and last element of A
5
>>> A[-1] # Last element of A, same as above
5
>>> A[5] # This won't work because A doesn't have a sixth element
Traceback (most recent call last):
...
IndexError: index 5 is out of bounds for axis 0 with size 5
>>> B[2, 1] # Second element in third row of B
2
So far, it seems that you’ve simply done a little extra typing to create arrays that look very similar to Python lists. But looks can be deceptive! Each ndarray object has approximately a hundred built-in properties and methods, and you can pass it to hundreds more functions in the NumPy library.
Almost anything that you can imagine doing to an array can be achieved in a few lines of code. In this tutorial, you’ll only be using a few functions, but you can explore the full power of arrays in the NumPy API documentation.
Creating Arrays in Other Ways
You’ve already created some NumPy arrays from Python sequences. But arrays can be created in many other ways. One of the simplest is np.arange(), which behaves rather like a souped-up version of Python’s built-in range() function:
>>> np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.arange(2, 3, 0.1)
array([ 2., 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9])
In the first example above, you only specified the upper limit of 10. NumPy follows the standard Python convention for ranges and returns an ndarray containing the integers 0 to 9. The second example specifies a starting value of 2, an upper limit of 3, and an increment of 0.1. Unlike Python’s standard range() function, np.arange() can handle non-integer increments, and it automatically generates an array with np.float64 elements in this case.
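With a floating-point step, rounding can make the exact number of elements hard to predict, so np.linspace(), which takes the number of points rather than a step size, is often suggested as an alternative. The following sketch compares the two for the range used above.

```python
import numpy as np

stepped = np.arange(2, 3, 0.1)
print(stepped.dtype)    # float64
print(len(stepped))     # 10

# np.linspace() asks for the number of points; endpoint=False excludes 3,
# mirroring the half-open range used by np.arange().
spaced = np.linspace(2, 3, num=10, endpoint=False)

print(np.allclose(stepped, spaced))   # True
```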
NumPy’s arrays may also be read from disk, synthesized from data returned by APIs, or constructed from buffers or other arrays.
NumPy arrays can contain various types of integers, floating-point numbers, and complex numbers, but all the elements in an array must be of the same type.
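A quick way to see the single-type rule in action is to inspect the .dtype attribute. In this small sketch, one floating-point value is enough to make NumPy store the whole array as floats.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])   # one float promotes every element to float

print(ints.dtype)       # an integer type; the exact width depends on the platform
print(mixed.dtype)      # float64
print(mixed.tolist())   # [1.0, 2.5, 3.0]
```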
You’ll start by using built-in ndarray properties to understand the arrays A and B:
>>> A.size
5
>>> A.shape
(5,)
>>> B.size
6
>>> B.shape
(3, 2)
The .size attribute counts the elements in the array, and the .shape attribute contains an ordered tuple of dimensions, which NumPy calls axes. A is a one-dimensional array containing five elements. Because A has only one axis, A.shape returns a one-element tuple.
By convention, in a two-dimensional matrix, axis 0 corresponds to the rows, and axis 1 corresponds to the columns, so the output of B.shape tells you that B has three rows and two columns.
Python strings and lists have a very handy feature known as slicing, which allows you to select sections of a string or list by specifying indices or ranges of indices. This idea generalizes very naturally to NumPy arrays.
For example, you can extract just the parts you need from B
, without affecting the original array:
>>> B[2, 0]
9
>>> B[1, :]
array([1, 5])
In the first example above, you picked out the single element in row 2
and column 0
using B[2, 0]
. The second example uses a slice to pick out a subarray. Here, the index 1
in B[1, :]
selects row 1
of B
. The :
in the second index position selects all the elements in that row. As a result, the expression B[1, :]
returns a one-dimensional array of two elements, containing all the elements from row 1 of B.
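The same notation selects columns. The sketch below also illustrates a point worth keeping in mind: a basic slice is a view that shares memory with the original array, so you would call .copy() when you want an independent array to modify.

```python
import numpy as np

B = np.array(((1, 4), (1, 5), (9, 2)))

first_column = B[:, 0]          # every row, column 0
print(first_column.tolist())    # [1, 1, 9]
print(first_column.shape)       # (3,)

# The slice shares memory with B, so take a copy before modifying it
# if B itself should stay unchanged.
independent = B[:, 0].copy()
independent[0] = 100
print(B[0, 0])                  # 1
```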
If you need to work with matrices having three or more dimensions, then NumPy has you covered. The syntax is flexible enough to cover any case. In this tutorial, though, you’ll only deal with one- and two-dimensional arrays.
If you have any questions as you play with NumPy, the official NumPy docs are thorough and well-written. You’ll find them indispensable if you do serious development using NumPy.
NumPy’s max(): The Maximum Element in an Array
In this section, you’ll become familiar with np.max()
, a versatile tool for finding maximum values in various circumstances.
Note: NumPy has both a package-level function and an ndarray method named max(). They work in the same way, though the package function np.max() requires the target array name as its first parameter. In what follows, you’ll be using the function and the method interchangeably.
Python also has a built-in max() function that can calculate maximum values of iterables. You can use this built-in max() to find the maximum element in a one-dimensional NumPy array, but it has no support for arrays with more dimensions. When dealing with NumPy arrays, you should stick to NumPy’s own maximum functions and methods.
For the rest of this tutorial, max()
will always refer to the NumPy version.
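The limitation of the built-in function is easy to reproduce. This sketch reuses the small two-dimensional array B from the earlier section and contrasts Python’s max() with np.max().

```python
import numpy as np

B = np.array(((1, 4), (1, 5), (9, 2)))

# Built-in max() iterates over the rows of B and compares whole rows with ">".
# Each comparison yields an array of booleans rather than a single True/False,
# so Python cannot decide which row is "larger".
try:
    max(B)
except ValueError as error:
    print("built-in max() failed:", error)

print(np.max(B))   # 9
```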
np.max()
is the tool that you need for finding the maximum value or values in a single array. Ready to give it a go?
Using max()
To illustrate the max()
function, you’re going to create an array named n_scores
containing the test scores obtained by the students in Professor Newton’s linear algebra class.
Each row represents one student, and each column contains the scores on a particular test. So column 0
contains all the student scores for the first test, column 1
contains the scores for the second test, and so on. Here’s the n_scores
array:
>>> import numpy as np
>>> n_scores = np.array([
... [63, 72, 75, 51, 83],
... [44, 53, 57, 56, 48],
... [71, 77, 82, 91, 76],
... [67, 56, 82, 33, 74],
... [64, 76, 72, 63, 76],
... [47, 56, 49, 53, 42],
... [91, 93, 90, 88, 96],
... [61, 56, 77, 74, 74],
... ])
You can copy and paste this code into your Python console if you want to follow along.
Once you’ve done that, the n_scores
array is in memory. You can ask the interpreter for some of its attributes:
>>> n_scores.size
40
>>> n_scores.shape
(8, 5)
The .shape
and .size
attributes, as above, confirm that you have 8
rows representing students and 5
columns representing tests, for a total of 40
test scores.
Suppose now that you want to find the top score achieved by any student on any test. For Professor Newton’s little linear algebra class, you could find the top score fairly quickly just by examining the data. But there’s a quicker method that’ll show its worth when you’re dealing with much larger datasets, containing perhaps thousands of rows and columns.
Try using the array’s .max()
method:
>>> n_scores.max()
96
The .max()
method has scanned the whole array and returned the largest element. Using this method is exactly equivalent to calling np.max(n_scores)
.
But perhaps you want some more detailed information. What was the top score for each test?
Here you can use the axis
parameter:
>>> n_scores.max(axis=0)
array([91, 93, 90, 91, 96])
The new parameter axis=0
tells NumPy to find the largest value out of all the rows. Since n_scores
has five columns, NumPy does this for each column independently. This produces five numbers, each of which is the maximum value in that column.
The axis
parameter uses the standard convention for indexing dimensions. So axis=0
refers to the rows of an array, and axis=1
refers to the columns.
The top score for each student is just as easy to find:
>>> n_scores.max(axis=1)
array([83, 57, 91, 82, 76, 56, 96, 77])
This time, NumPy has returned an array with eight elements, one per student. The n_scores
array contains one row per student. The parameter axis=1
told NumPy to find the maximum value for each student, across the columns. Therefore, each element of the output contains the highest score attained by the corresponding student.
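If the axis convention feels backwards at first, it can help to spell out the same computation with explicit loops. The sketch below uses a small hypothetical array of three students and three tests rather than the full n_scores data.

```python
import numpy as np

# Hypothetical scores: 3 students (rows) x 3 tests (columns)
scores = np.array([
    [63, 72, 75],
    [44, 53, 57],
    [71, 77, 82],
])

# axis=0 collapses the rows: one result per column (per test)
per_test = scores.max(axis=0)
# axis=1 collapses the columns: one result per row (per student)
per_student = scores.max(axis=1)

# The same answers, spelled out with explicit Python loops
loop_per_test = [int(max(scores[:, j])) for j in range(scores.shape[1])]
loop_per_student = [int(max(row)) for row in scores]

print(per_test.tolist(), per_student.tolist())   # [71, 77, 82] [75, 57, 82]
```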
Perhaps you want the top scores per student, but you’ve decided to exclude the first and last tests. Slicing does the trick:
>>> filtered_scores = n_scores[:, 1:-1]
>>> filtered_scores.shape
(8, 3)
>>> filtered_scores
array([[72, 75, 51],
       [53, 57, 56],
       [77, 82, 91],
       [56, 82, 33],
       [76, 72, 63],
       [56, 49, 53],
       [93, 90, 88],
       [56, 77, 74]])
>>> filtered_scores.max(axis=1)
array([75, 57, 91, 82, 76, 56, 93, 77])
You can understand the slice notation n_scores[:, 1:-1] as follows. The first index range, represented by the lone :, selects all the rows in the slice.
The second index range after the comma, 1:-1, tells NumPy to take the columns starting at column 1 and stopping just before the last column. The result of the slice is stored in a new array named filtered_scores.
With a bit of practice, you’ll learn to do array slicing on the fly, so you won’t need to create the intermediate array filtered_scores
explicitly:
>>> n_scores[:, 1:-1].max(axis=1)
array([75, 57, 91, 82, 76, 56, 93, 77])
Here you’ve performed the slice and the method call in a single line, but the result is the same. NumPy returns the per-student set of maximum scores for the restricted set of tests.
Handling Missing Values in np.max()
So now you know how to find maximum values in any completely filled array. But what happens when a few array values are missing? This is pretty common with real-world data.
To illustrate, you’ll create a small array containing a week’s worth of daily temperature readings, in Celsius, from a digital thermometer, starting on Monday:
>>> temperatures_week_1 = np.array([7.1, 7.7, 8.1, 8.0, 9.2, np.nan, 8.4])
>>> temperatures_week_1.size
7
It seems the thermometer had a malfunction on Saturday, and the corresponding temperature value is missing, a situation indicated by the np.nan value. This is the special value Not a Number, which is commonly used to mark missing values in real-world data applications.
So far, so good. But a problem arises if you innocently try to apply .max()
to this array:
>>> temperatures_week_1.max()
nan
Since np.nan
reports a missing value, NumPy’s default behavior is to flag this by reporting that the maximum, too, is unknown. For some applications, this makes perfect sense. But for your application, perhaps you’d find it more useful to ignore the Saturday problem and get a maximum value from the remaining, valid readings. NumPy has provided the np.nanmax()
function to take care of such situations:
>>> np.nanmax(temperatures_week_1)
9.2
This function ignores any nan
values and returns the largest numerical value, as expected.
Notice that np.nanmax()
is a function in the NumPy library, not a method of the ndarray
object.
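One way to picture what np.nanmax() does is to filter out the missing readings yourself with a boolean mask from np.isnan(). The following sketch, which copies the week’s readings into a local variable, shows that both routes give the same answer.

```python
import numpy as np

temperatures = np.array([7.1, 7.7, 8.1, 8.0, 9.2, np.nan, 8.4])

missing = np.isnan(temperatures)       # True where a reading is missing
print(int(missing.sum()))              # 1

# Manual route: drop the missing readings, then take the ordinary maximum
print(temperatures[~missing].max())    # 9.2

# Convenience route: let NumPy skip the nan values
print(np.nanmax(temperatures))         # 9.2
```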
Exploring Related Maximum Functions
You’ve now seen the most common examples of NumPy’s maximum-finding capabilities for single arrays. But there are a few more NumPy functions related to maximum values that are worth knowing about.
For example, instead of the maximum values in an array, you might want the indices of the maximum values. Let’s say you want to use your n_scores array to identify the student who did best on each test. The .argmax() method is your friend here:
>>> n_scores.argmax(axis=0)
array([6, 6, 6, 2, 6])
It appears that student 6
obtained the top score on every test but one. Student 2
did best on the fourth test.
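When you call .argmax() without an axis, NumPy returns a single index into the flattened array. As a sketch of how you might turn that back into a row and column, the example below pairs it with np.unravel_index() to locate the overall top score.

```python
import numpy as np

n_scores = np.array([
    [63, 72, 75, 51, 83],
    [44, 53, 57, 56, 48],
    [71, 77, 82, 91, 76],
    [67, 56, 82, 33, 74],
    [64, 76, 72, 63, 76],
    [47, 56, 49, 53, 42],
    [91, 93, 90, 88, 96],
    [61, 56, 77, 74, 74],
])

# Without axis, argmax() returns an index into the flattened array.
flat_index = n_scores.argmax()

# np.unravel_index() converts it back into a (row, column) pair.
student, test = np.unravel_index(flat_index, n_scores.shape)

print(int(flat_index))            # 34
print(int(student), int(test))    # 6 4
print(n_scores[student, test])    # 96
```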
You’ll recall that you can also apply np.max()
as a function of the NumPy package, rather than as a method of a NumPy array. In this case, the array must be supplied as the first argument of the function.
For historical reasons, the package-level function np.max() has an alias, np.amax(), which is identical in every respect apart from the name:
>>> n_scores.max(axis=1)
array([83, 57, 91, 82, 76, 56, 96, 77])
>>> np.max(n_scores, axis=1)
array([83, 57, 91, 82, 76, 56, 96, 77])
>>> np.amax(n_scores, axis=1)
array([83, 57, 91, 82, 76, 56, 96, 77])
In the code above, you’ve called .max()
as a method of the n_scores
object, and as a standalone library function with n_scores
as its first parameter. You’ve also called the alias np.amax()
in the same way. All three calls produce exactly the same results.
Now you’ve seen how to use np.max()
, np.amax()
, or .max()
to find maximum values for an array along various axes. You’ve also used np.nanmax()
to find the maximum values while ignoring nan
values, as well as np.argmax()
or .argmax()
to find the indices of the maximum values.
You won’t be surprised to learn that NumPy has an equivalent set of minimum functions: np.min()
, np.amin()
, .min()
, np.nanmin()
, np.argmin()
, and .argmin()
. You won’t deal with those here, but they behave exactly like their maximum cousins.
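For completeness, here is a brief sketch of the minimum functions applied to the same data used earlier in this section.

```python
import numpy as np

n_scores = np.array([
    [63, 72, 75, 51, 83],
    [44, 53, 57, 56, 48],
    [71, 77, 82, 91, 76],
    [67, 56, 82, 33, 74],
    [64, 76, 72, 63, 76],
    [47, 56, 49, 53, 42],
    [91, 93, 90, 88, 96],
    [61, 56, 77, 74, 74],
])
temperatures_week_1 = np.array([7.1, 7.7, 8.1, 8.0, 9.2, np.nan, 8.4])

print(n_scores.min())                     # 33, the lowest score overall
print(n_scores.min(axis=0).tolist())      # [44, 53, 49, 33, 42], lowest per test
print(n_scores.argmin(axis=0).tolist())   # [1, 1, 5, 3, 5], who scored lowest
print(np.nanmin(temperatures_week_1))     # 7.1, ignoring the missing reading
```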
NumPy’s maximum(): Maximum Elements Across Arrays
Another common task in data science involves comparing two similar arrays. NumPy’s maximum()
function is the tool of choice for finding maximum values across arrays. Since maximum()
always involves two input arrays, there’s no corresponding method. The np.maximum()
function expects the input arrays as its first two parameters.
Using np.maximum()
Continuing with the previous example involving class scores, suppose that Professor Newton’s colleague—and archrival—Professor Leibniz is also running a linear algebra class with eight students. Construct a new array with the values for Leibniz’s class:
>>> l_scores = np.array([
... [87, 73, 71, 59, 67],
... [60, 53, 82, 80, 58],
... [92, 85, 60, 79, 77],
... [67, 79, 71, 69, 87],
... [86, 91, 92, 73, 61],
... [70, 66, 60, 79, 57],
... [83, 51, 64, 63, 58],
... [89, 51, 72, 56, 49],
... ])
>>> l_scores.shape
(8, 5)
The new array, l_scores
, has the same shape as n_scores
.
You’d like to compare the two classes, student by student and test by test, to find the higher score in each case.
NumPy has a function, np.maximum(), specifically designed for comparing two arrays in an element-by-element manner.
Check it out in action:
>>> np.maximum(n_scores, l_scores)
array([[87, 73, 75, 59, 83],
[60, 53, 82, 80, 58],
[92, 85, 82, 91, 77],
[67, 79, 82, 69, 87],
[86, 91, 92, 73, 76],
[70, 66, 60, 79, 57],
[91, 93, 90, 88, 96],
[89, 56, 77, 74, 74]])
If you visually check the arrays n_scores
and l_scores
, then you’ll see that np.maximum()
has indeed picked out the higher of the two scores for each [row, column] pair of indices.
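To see what the element-by-element comparison amounts to, you can reproduce it by hand with np.where(). This sketch uses two small hypothetical score arrays so that the result is easy to check by eye.

```python
import numpy as np

a = np.array([[63, 72], [44, 53]])   # hypothetical scores, first class
b = np.array([[87, 73], [60, 51]])   # hypothetical scores, second class

pairwise = np.maximum(a, b)

# The same selection written out: take a where it wins, otherwise b
by_hand = np.where(a >= b, a, b)

print(pairwise.tolist())                   # [[87, 73], [60, 53]]
print(np.array_equal(pairwise, by_hand))   # True
```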
What if you only want to compare the best test results in each class? You can combine np.max()
and np.maximum()
to get that effect:
>>> best_n = n_scores.max(axis=0)
>>> best_n
array([91, 93, 90, 91, 96])
>>> best_l = l_scores.max(axis=0)
>>> best_l
array([92, 91, 92, 80, 87])
>>> np.maximum(best_n, best_l)
array([92, 93, 92, 91, 96])
As before, each call to .max()
returns an array of maximum scores for all the students in the relevant class, one element for each test. But this time, you’re feeding those returned arrays into the maximum()
function, which compares the two arrays and returns the higher score for each test across the arrays.
You can combine those operations into one by dispensing with the intermediate arrays, best_n
and best_l
:
>>> np.maximum(n_scores.max(axis=0), l_scores.max(axis=0))
array([92, 93, 92, 91, 96])
This gives the same result as before, but with less typing. You can choose whichever method you prefer.
Handling Missing Values in np.maximum()
Remember the temperatures_week_1
array from an earlier example? If you use a second week’s temperature records with the maximum()
function, you may spot a familiar problem.
First, you’ll create a new array to hold the new temperatures:
>>> temperatures_week_2 = np.array(
... [7.3, 7.9, np.nan, 8.1, np.nan, np.nan, 10.2]
... )
There are missing values in the temperatures_week_2
data, too. Now see what happens if you apply the np.maximum
function to these two temperature arrays:
>>> np.maximum(temperatures_week_1, temperatures_week_2)
array([ 7.3, 7.9, nan, 8.1, nan, nan, 10.2])
All the nan
values in both arrays have popped up as missing values in the output. There’s a good reason for NumPy’s approach to propagating nan
. Often it’s important for the integrity of your results that you keep track of the missing values, rather than brushing them under the rug. But here, you just want to get the best view of the weekly maximum values. The solution, in this case, is another NumPy package function, np.fmax()
:
>>> np.fmax(temperatures_week_1, temperatures_week_2)
array([ 7.3, 7.9, 8.1, 8.1, 9.2, nan, 10.2])
Now, two of the missing values have simply been ignored, and the remaining floating-point value at that index has been taken as the maximum. But the Saturday temperature can’t be fixed in that way, because both source values are missing. Since there’s no reasonable value to insert here, np.fmax() just leaves it as a nan.
Just as np.max()
and np.nanmax()
have the parallel minimum functions np.min()
and np.nanmin()
, so too do np.maximum()
and np.fmax()
have corresponding functions, np.minimum()
and np.fmin()
, that mirror their functionality for minimum values.
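As a brief sketch of the minimum counterparts, the example below applies np.minimum() and np.fmin() to the same two weeks of temperature readings.

```python
import numpy as np

temperatures_week_1 = np.array([7.1, 7.7, 8.1, 8.0, 9.2, np.nan, 8.4])
temperatures_week_2 = np.array([7.3, 7.9, np.nan, 8.1, np.nan, np.nan, 10.2])

# np.minimum() propagates nan, just as np.maximum() does
coldest_strict = np.minimum(temperatures_week_1, temperatures_week_2)
print(coldest_strict.tolist())    # [7.1, 7.7, nan, 8.0, nan, nan, 8.4]

# np.fmin() ignores a nan when the other value is a valid number
coldest_lenient = np.fmin(temperatures_week_1, temperatures_week_2)
print(coldest_lenient.tolist())   # [7.1, 7.7, 8.1, 8.0, 9.2, nan, 8.4]
```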
Advanced Usage
You’ve now seen examples of all the basic use cases for NumPy’s max()
and maximum()
, plus a few related functions.
Now you’ll investigate some of the more obscure optional parameters to these functions and find out when they can be useful.
Reusing Memory
When you call a function in Python, a value or object is returned. You can use that result immediately by printing it or writing it to disk, or by feeding it directly into another function as an input parameter. You can also save it to a new variable for future reference.
If you call the function in the Python REPL but don’t use it in one of those ways, then the REPL prints out the return value on the console so that you’re aware that something has been returned. All of this is standard Python stuff, and not specific to NumPy.
NumPy’s array functions are designed to handle huge inputs, and they often produce huge outputs. If you call such a function many hundreds or thousands of times, then you’ll be allocating very large amounts of memory. This can slow your program down and, in an extreme case, might even exhaust the available memory.
This problem can be avoided by using the out
parameter, which is available for both np.max()
and np.maximum()
, as well as for many other NumPy functions. The idea is to preallocate a suitable array to hold the function result, and keep reusing that same chunk of memory in subsequent calls.
You can revisit the temperature problem to create an example of using the out parameter with the np.maximum() function. You’ll also give the buffer an explicit dtype to control the type of the returned array:
>>> temperature_buffer = np.empty(7, dtype=np.float32)
>>> temperature_buffer.shape
(7,)
>>> np.maximum(temperatures_week_1, temperatures_week_2, out=temperature_buffer)
array([ 7.3, 7.9, nan, 8.1, nan, nan, 10.2], dtype=float32)
The initial values in temperature_buffer
don’t matter, since they’ll be overwritten. But the array’s shape is important in that it must match the output shape. The displayed result looks like the output that you received from the original np.maximum()
example. So what’s changed? The difference is that you now have the same data stored in temperature_buffer
:
>>> temperature_buffer
array([ 7.3, 7.9, nan, 8.1, nan, nan, 10.2], dtype=float32)
The np.maximum()
return value has been stored in the temperature_buffer
variable, which you previously created with the right shape to accept that return value. Since you also specified dtype=np.float32
when you declared this buffer, NumPy will do its best to convert the output data to that type.
Remember to use the buffer contents before they’re overwritten by the next call to this function.
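The benefit of out shows up when the same buffer is reused across calls. The sketch below keeps a running maximum over three weeks of hypothetical readings (shortened to four days each) without allocating a new result array for each comparison.

```python
import numpy as np

# Hypothetical readings, four days per week
week_1 = np.array([7.1, 7.7, 8.1, 8.0])
week_2 = np.array([7.3, 7.5, 8.4, 7.9])
week_3 = np.array([6.9, 8.0, 8.2, 8.3])

buffer = np.empty(4)    # allocated once, float64 by default

result = np.maximum(week_1, week_2, out=buffer)
print(result is buffer)    # True: the return value is the buffer itself

first_two_weeks = buffer.copy()    # keep a copy before the buffer is reused

np.maximum(buffer, week_3, out=buffer)    # running maximum, written in place
print(buffer.tolist())             # [7.3, 8.0, 8.4, 8.3]
print(first_two_weeks.tolist())    # [7.3, 7.7, 8.4, 8.0]
```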
Filtering Arrays
Another parameter that’s occasionally useful is where
. This applies a filter to the input array or arrays, so that only those values for which the where
condition is True
will be included in the comparison. The other values will be ignored, and the corresponding elements of the output array will be left unaltered. In most cases, this will leave them holding arbitrary values.
For the sake of the example, suppose you’ve decided, for whatever reason, to ignore all scores less than 60 for calculating the per-student maximum values in Professor Newton’s class. Your first attempt might go like this:
>>> n_scores
array([[63, 72, 75, 51, 83],
[44, 53, 57, 56, 48],
[71, 77, 82, 91, 76],
[67, 56, 82, 33, 74],
[64, 76, 72, 63, 76],
[47, 56, 49, 53, 42],
[91, 93, 90, 88, 96],
[61, 56, 77, 74, 74]])
>>> n_scores.max(axis=1, where=(n_scores >= 60))
Traceback (most recent call last):
...
ValueError: reduction operation 'maximum' does not have an identity,
so to use a where mask one has to specify 'initial'
The problem here is that NumPy doesn’t know what to do with the students in rows 1
and 5
, who didn’t achieve a single test score of 60
or better. The solution is to provide an initial
parameter:
>>> n_scores.max(axis=1, where=(n_scores >= 60), initial=60)
array([83, 60, 91, 82, 76, 60, 96, 77])
With the two new parameters, where
and initial
, n_scores.max()
considers only the elements greater than or equal to 60
. For the rows where there is no such element, it returns the initial
value of 60
instead. So the lucky students at indices 1
and 5
got their best score boosted to 60
by this operation!
The original n_scores
array is untouched.
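One way to convince yourself of what where and initial are doing is to rebuild the result with np.where(). In this sketch, threshold is an illustrative variable name, and the last lines identify the rows that had no qualifying score at all.

```python
import numpy as np

n_scores = np.array([
    [63, 72, 75, 51, 83],
    [44, 53, 57, 56, 48],
    [71, 77, 82, 91, 76],
    [67, 56, 82, 33, 74],
    [64, 76, 72, 63, 76],
    [47, 56, 49, 53, 42],
    [91, 93, 90, 88, 96],
    [61, 56, 77, 74, 74],
])
threshold = 60
qualifies = n_scores >= threshold

masked_max = n_scores.max(axis=1, where=qualifies, initial=threshold)

# Equivalent here: replace every score below the threshold by the threshold
# itself, then take the ordinary row-wise maximum.
replaced_max = np.where(qualifies, n_scores, threshold).max(axis=1)

print(masked_max.tolist())   # [83, 60, 91, 82, 76, 60, 96, 77]

# Rows where no score qualified, so the result is just the initial value
no_qualifying_score = ~qualifies.any(axis=1)
print(np.flatnonzero(no_qualifying_score).tolist())   # [1, 5]
```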
Comparing Differently Shaped Arrays With Broadcasting
You’ve learned how to use np.maximum()
to compare arrays with identical shapes. But it turns out that this function, along with many others in the NumPy library, is much more versatile than that. NumPy has a concept called broadcasting that provides a very useful extension to the behavior of most functions involving two arrays, including np.maximum()
.
Whenever you call a NumPy function that operates on two arrays, A
and B
, it checks their .shape
properties to see if they’re compatible. If they have exactly the same .shape
, then NumPy just matches the arrays element by element, pairing up the element at A[i, j]
with the element at B[i, j]
. np.maximum()
works like this too.
Broadcasting enables NumPy to operate on two arrays with different shapes, provided there’s still a sensible way to match up pairs of elements. The simplest example of this is to broadcast a single element over an entire array. You’ll explore broadcasting by continuing the example of Professor Newton and his linear algebra class. Suppose he asks you to ensure that none of his students receives a score below 75
. Here’s how you might do it:
>>> np.maximum(n_scores, 75)
array([[75, 75, 75, 75, 83],
[75, 75, 75, 75, 75],
[75, 77, 82, 91, 76],
[75, 75, 82, 75, 75],
[75, 76, 75, 75, 76],
[75, 75, 75, 75, 75],
[91, 93, 90, 88, 96],
[75, 75, 77, 75, 75]])
You’ve applied the np.maximum()
function to two arguments: n_scores
, whose .shape
is (8, 5), and the single scalar parameter 75
. You can think of this second parameter as a 1 × 1 array that’ll be stretched inside the function to cover eight rows and five columns. The stretched array can then be compared element by element with n_scores
, and the pairwise maximum can be returned for each element of the result.
The result is the same as if you had compared n_scores
with an array of its own shape, (8, 5), but with the value 75
in each element.
This stretching is just conceptual—NumPy is smart enough to do all this without actually creating the stretched array. So you get the notational convenience of this example without compromising efficiency.
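You can inspect the conceptual stretching with np.broadcast_to(), which returns a read-only view rather than a copy. The sketch below uses a small hypothetical scores array and checks that the scalar call matches a comparison against an explicitly filled array.

```python
import numpy as np

scores = np.array([[63, 72, 83], [44, 91, 57]])   # hypothetical scores

floor_scalar = np.maximum(scores, 75)

# The explicit version: actually build an array filled with 75
floor_full = np.maximum(scores, np.full(scores.shape, 75))

# The conceptual stretch: zero strides mean every position reads the
# same single value in memory, so nothing is copied.
stretched = np.broadcast_to(75, scores.shape)
print(stretched.shape, stretched.strides)   # (2, 3) (0, 0)

print(floor_scalar.tolist())                      # [[75, 75, 83], [75, 91, 75]]
print(np.array_equal(floor_scalar, floor_full))   # True
```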
You can do much more with broadcasting. Professor Leibniz has noticed Newton’s skulduggery with his n_scores array, and decides to engage in a little data manipulation of her own.
Leibniz’s plan is to artificially boost all her students’ scores to be at least equal to the average score for a particular test. This will have the effect of increasing all the below-average scores, and thus produce some quite misleading results! How can you help the professor achieve her somewhat nefarious ends?
Your first step is to use the array’s .mean() method to create a one-dimensional array of means per test. Then you can use np.maximum() and broadcast this array over the entire l_scores matrix:
>>> mean_l_scores = l_scores.mean(axis=0, dtype=int)
>>> mean_l_scores
array([79, 68, 71, 69, 64])
>>> np.maximum(mean_l_scores, l_scores)
array([[87, 73, 71, 69, 67],
[79, 68, 82, 80, 64],
[92, 85, 71, 79, 77],
[79, 79, 71, 69, 87],
[86, 91, 92, 73, 64],
[79, 68, 71, 79, 64],
[83, 68, 71, 69, 64],
[89, 68, 72, 69, 64]])
The broadcasting happens in the np.maximum() function call. The one-dimensional mean_l_scores array has been conceptually stretched to match the two-dimensional l_scores array. The output array has the same .shape as the larger of the two input arrays, l_scores.
Following Broadcasting Rules
So, what are the rules for broadcasting? A great many NumPy functions accept two array arguments. np.maximum()
is just one of these.
Arrays that can be used together in such functions are termed compatible, and their compatibility depends on the number and size of their dimensions—that is, on their .shape
.
The simplest case occurs if the two arrays, say A
and B
, have identical shapes. Each element in A
is matched, for the function’s purposes, to the element at the same index address in B
.
Broadcasting rules get more interesting when A and B have different shapes. The elements of compatible arrays must somehow be unambiguously paired together so that each element of the larger array can interact with an element of the smaller array. Along each dimension, the output array takes the larger of the two sizes, so in every example in this tutorial the output has the .shape of the larger input array. So compatible arrays must follow these rules:

 If one array has fewer dimensions than the other, only the trailing dimensions are matched for compatibility. The trailing dimensions are those that are present in the .shape of both arrays, counting from the right. So if A.shape is (99, 99, 2, 3) and B.shape is (2, 3), then A and B are compatible because (2, 3) are the trailing dimensions of each. You can completely ignore the two leftmost dimensions of A.
 Even if the trailing dimensions aren’t equal, the arrays are still compatible if one of those dimensions is equal to 1 in either array. So if A.shape is (99, 99, 2, 3) as before and B.shape is (1, 99, 1, 3) or (1, 3) or (1, 2, 1) or (1, 1), then B is still compatible with A in each case.
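You can check these rules without building any arrays by using np.broadcast_shapes(), which is available in NumPy 1.20 and later. The sketch below tests the shapes mentioned in the rules, plus one case where neither input already has the final shape and one incompatible pair.

```python
import numpy as np

a_shape = (99, 99, 2, 3)
candidates = [(2, 3), (1, 99, 1, 3), (1, 3), (1, 2, 1), (1, 1)]

# Every candidate from the rules above is compatible with a_shape
results = {b: np.broadcast_shapes(a_shape, b) for b in candidates}
for b_shape, out_shape in results.items():
    print(b_shape, "->", out_shape)    # each one gives (99, 99, 2, 3)

# The result takes the larger size along each dimension, so it can
# differ from both input shapes.
print(np.broadcast_shapes((3, 1), (1, 4)))    # (3, 4)

# Sizes that differ, with neither equal to 1, are incompatible.
try:
    np.broadcast_shapes((2, 3, 4), (2, 2, 4))
except ValueError as error:
    print("incompatible:", error)
```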
You can get a feel for the broadcasting rules by playing around in the Python REPL. You’ll be creating some toy arrays to illustrate how broadcasting works and how the output array is generated:
>>> A = np.arange(24).reshape(2, 3, 4)
>>> A
array([[[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]],
[[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]])
>>> A.shape
(2, 3, 4)
>>> B = np.array(
... [
... [[-7, 11, 10, 2], [-6, 7, 2, 14], [ 7, 4, 4, 1]],
... [[18, 5, 22, 7], [25, 8, 15, 24], [31, 15, 19, 24]],
... ]
... )
>>> B.shape
(2, 3, 4)
>>> np.maximum(A, B)
array([[[ 0, 11, 10, 3], [ 4, 7, 6, 14], [ 8, 9, 10, 11]],
[[18, 13, 22, 15], [25, 17, 18, 24], [31, 21, 22, 24]]])
There’s nothing really new to see here yet. You’ve created two arrays of identical .shape and applied the np.maximum() operation to them. Notice that the handy .reshape() method lets you build arrays of any shape. You can verify that the result is the element-by-element maximum of the two inputs.
The fun starts when you experiment with comparing two arrays of different shapes. Try slicing B
to make a new array, C
:
>>> C = B[:, :1, :]
>>> C
array([[[-7, 11, 10, 2]],
       [[18, 5, 22, 7]]])
>>> C.shape
(2, 1, 4)
>>> np.maximum(A, C)
array([[[ 0, 11, 10, 3], [ 4, 11, 10, 7], [ 8, 11, 10, 11]],
[[18, 13, 22, 15], [18, 17, 22, 19], [20, 21, 22, 23]]])
The two arrays, A
and C
, are compatible because the new array’s second dimension is 1
, and the other dimensions match.
Notice that the .shape of the result of the maximum() operation is the same as A.shape. That’s because C, the smaller array, is being broadcast over A. When one array is stretched to fit the other like this, the result always has the .shape of the larger array.
Now you can try an even more radical slicing of B
:
>>> D = B[:, :1, :1]
>>> D
array([[[-7]], [[18]]])
>>> D.shape
(2, 1, 1)
>>> np.maximum(A, D)
array([[[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]],
[[18, 18, 18, 18], [18, 18, 18, 19], [20, 21, 22, 23]]])
Once again, the trailing dimensions of A
and D
are all either equal or 1
, so the arrays are compatible and the broadcast works. The result has the same .shape
as A
.
Perhaps the most extreme type of broadcasting occurs when one of the array parameters is passed as a scalar:
>>> np.maximum(A, 10)
array([[[10, 10, 10, 10], [10, 10, 10, 10], [10, 10, 10, 11]],
[[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]])
NumPy automatically treats the second parameter, 10, as a zero-dimensional array with an empty .shape of (), determines that this is compatible with the first parameter, and duly broadcasts it over the entire 2 × 3 × 4 array A.
Finally, here’s a case where broadcasting fails:
>>> E = B[:, 1:, :]
>>> E
array([[[-6, 7, 2, 14], [ 7, 4, 4, 1]],
[[25, 8, 15, 24], [31, 15, 19, 24]]])
>>> E.shape
(2, 2, 4)
>>> np.maximum(A, E)
Traceback (most recent call last):
...
ValueError: operands could not be broadcast together with shapes (2,3,4) (2,2,4)
If you refer back to the broadcasting rules above, you’ll see the problem: the second dimensions of A
and E
don’t match, and neither is equal to 1
, so the two arrays are incompatible.
You can read more about broadcasting in Look Ma, No for Loops: Array Programming With NumPy.
There’s also a good description of the rules in the NumPy docs.
The broadcasting rules can be confusing, so it’s a good idea to play around with some toy arrays until you get a feel for how it works!
Conclusion
In this tutorial, you’ve explored the NumPy library’s max()
and maximum()
operations to find the maximum values within or across arrays.
Here’s what you’ve learned:
 Why NumPy has its own max() function, and how you can use it
 How the maximum() function differs from max(), and when it’s needed
 Which practical applications exist for each function
 How you can handle missing data so your results make sense
 How you can apply your knowledge to the complementary task of finding minimum values
Along the way, you’ve learned or refreshed your knowledge of the basics of NumPy syntax. NumPy is a hugely popular library because of its powerful support for array operations.
Now that you’ve mastered the details of NumPy’s max()
and maximum()
, you’re ready to use them in your applications, or continue learning about more of the hundreds of array functions supported by NumPy.
If you’re interested in using NumPy for data science, then you’ll also want to investigate pandas, a very popular data-science library built on top of NumPy. You can learn about it in The Pandas DataFrame: Make Working With Data Delightful. And if you want to produce compelling images from data, take a look at Python Plotting With Matplotlib (Guide).
The applications of NumPy are limitless. Wherever your NumPy adventure takes you next, go forth and matrix-multiply!