NumPy's max() and maximum(): Find Extreme Values in Arrays

by Charles de Villiers

The NumPy library supports expressive, efficient numerical programming in Python. Finding extreme values is a very common requirement in data analysis. The NumPy max() and maximum() functions are two examples of how NumPy lets you combine the coding comfort offered by Python with the runtime efficiency you’d expect from C.

In this tutorial, you’ll learn how to:

  • Use the NumPy max() function
  • Use the NumPy maximum() function and understand why it’s different from max()
  • Solve practical problems with these functions
  • Handle missing values in your data
  • Apply the same concepts to finding minimum values

This tutorial includes a very short introduction to NumPy, so even if you’ve never used NumPy before, you should be able to jump right in. With the background provided here, you’ll be ready to continue exploring the wealth of functionality to be found in the NumPy library.

NumPy: Numerical Python

NumPy is short for Numerical Python. It’s an open source Python library that enables a wide range of applications in the fields of science, statistics, and data analytics through its support of fast, vectorized computations on multidimensional arrays of numbers. Many of the most popular numerical packages use NumPy as their base library.

Introducing NumPy

The NumPy library is built around a class named np.ndarray and a set of methods and functions that leverage Python syntax for defining and manipulating arrays of any shape or size.

NumPy’s core code for array manipulation is written in C. You can use functions and methods directly on an ndarray as NumPy’s C-based code efficiently loops over all the array elements in the background. NumPy’s high-level syntax means that you can simply and elegantly express complex programs and execute them at high speeds.

You can use a regular Python list to represent an array. However, NumPy arrays are far more efficient than lists, and they’re supported by a huge library of methods and functions. These include mathematical and logical operations, sorting, Fourier transforms, linear algebra, array reshaping, and much more.
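To make the contrast with lists concrete, here’s a minimal sketch that doubles a handful of numbers both ways. The variable names are made up for illustration, and the example only shows the difference in style, not a benchmark:

```python
import numpy as np

values = [3, 7, 2, 4, 5]

# Plain Python: a list comprehension visits each element in Python code.
doubled_list = [2 * v for v in values]

# NumPy: one expression, with the loop over elements done in compiled code.
doubled_array = 2 * np.array(values)

print(doubled_list)            # [6, 14, 4, 8, 10]
print(doubled_array.tolist())  # [6, 14, 4, 8, 10]
```

Both approaches give the same numbers. The NumPy version tends to pay off as the arrays get larger.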

Today, NumPy is in widespread use in fields as diverse as astronomy, quantum computing, bioinformatics, and all kinds of engineering.

NumPy is used under the hood as the numerical engine for many other libraries, such as pandas and SciPy. It also integrates easily with visualization libraries like Matplotlib and seaborn.

NumPy is easy to install with your package manager, for example pip or conda. For detailed instructions plus a more extensive introduction to NumPy and its capabilities, take a look at NumPy Tutorial: Your First Steps Into Data Science in Python or the NumPy Absolute Beginner’s Guide.

In this tutorial, you’ll learn how to take your very first steps in using NumPy. You’ll then explore NumPy’s max() and maximum() functions.

Creating and Using NumPy Arrays

You’ll start your investigation with a quick overview of NumPy arrays, the flexible data structure that gives NumPy its versatility and power.

The fundamental building block for any NumPy program is the ndarray. An ndarray is a Python object wrapping an array of numbers. It may, in principle, have any number of dimensions of any size. You can declare an array in several ways. The most straightforward method starts from a regular Python list or tuple:

Python
>>> import numpy as np
>>> A = np.array([3, 7, 2, 4, 5])
>>> A
array([3, 7, 2, 4, 5])

>>> B = np.array(((1, 4), (1, 5), (9, 2)))
>>> B
array([[1, 4],
       [1, 5],
       [9, 2]])

You’ve imported numpy under the alias np. This is a standard, widespread convention, so you’ll see it in most tutorials and programs. In this example, A is a one-dimensional array of numbers, while B is two-dimensional.

Notice that the np.array() factory function expects a Python list or tuple as its first parameter, so the list or tuple must therefore be wrapped in its own set of brackets or parentheses, respectively. Just throwing in an unwrapped bunch of numbers won’t work:

Python
>>> np.array(3, 7, 2, 4, 5)
Traceback (most recent call last):
...
TypeError: array() takes from 1 to 2 positional arguments but 5 were given

With this syntax, the interpreter sees five separate positional arguments, which is more than np.array() accepts, so it raises a TypeError.

In your constructor for array B, the nested tuple argument needs an extra pair of parentheses to identify it, in its entirety, as the first parameter of np.array().

Addressing the array elements is straightforward. NumPy’s indices start at zero, like all Python sequences. By convention, a two-dimensional array is displayed so that the first index refers to the row, and the second index refers to the column. So A[0] is the first element of the one-dimensional array A, and B[2, 1] is the second element in the third row of the two-dimensional array B:

Python
>>> A[0]  # First element of A
3
>>> A[4]  # Fifth and last element of A
5
>>> A[-1]  # Last element of A, same as above
5
>>> A[5]  # This won't work because A doesn't have a sixth element
Traceback (most recent call last):
 ...
IndexError: index 5 is out of bounds for axis 0 with size 5
>>> B[2, 1]  # Second element in third row of B
2

So far, it seems that you’ve simply done a little extra typing to create arrays that look very similar to Python lists. But looks can be deceptive! Each ndarray object has approximately a hundred built-in properties and methods, and you can pass it to hundreds more functions in the NumPy library.

Almost anything that you can imagine doing to an array can be achieved in a few lines of code. In this tutorial, you’ll only be using a few functions, but you can explore the full power of arrays in the NumPy API documentation.

Creating Arrays in Other Ways

You’ve already created some NumPy arrays from Python sequences. But arrays can be created in many other ways. One of the simplest is np.arange(), which behaves rather like a souped-up version of Python’s built-in range() function:

Python
>>> np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

>>> np.arange(2, 3, 0.1)
array([2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9])

In the first example above, you only specified the upper limit of 10. NumPy follows the standard Python convention for ranges and returns an ndarray containing the integers 0 to 9. The second example specifies a starting value of 2, an upper limit of 3, and an increment of 0.1. Unlike Python’s standard range() function, np.arange() can handle non-integer increments, and it automatically generates an array with np.float64 elements in this case.

NumPy’s arrays may also be read from disk, synthesized from data returned by APIs, or constructed from buffers or other arrays.

NumPy arrays can contain various types of integers, floating-point numbers, and complex numbers, but all the elements in an array must be of the same type.
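You can check the element type of any array through its .dtype attribute. The following sketch, with illustrative variable names, shows how the type is chosen for the np.arange() calls above and what happens when you mix integers and floats in one array:

```python
import numpy as np

ints = np.arange(10)           # integer arguments give an integer dtype
floats = np.arange(2, 3, 0.1)  # a float step gives float64
mixed = np.array([1, 2.5, 3])  # mixed input is upcast to a single type

print(ints.dtype)    # platform-dependent integer type, commonly int64
print(floats.dtype)  # float64
print(mixed.dtype)   # float64
```

With non-integer steps, floating-point rounding can make the exact number of elements returned by np.arange() hard to predict. The NumPy documentation suggests np.linspace() when that matters.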

You’ll start by using built-in ndarray properties to understand the arrays A and B:

Python
>>> A.size
5
>>> A.shape
(5,)

>>> B.size
6
>>> B.shape
(3, 2)

The .size attribute counts the elements in the array, and the .shape attribute contains an ordered tuple of dimensions, which NumPy calls axes. A is a one-dimensional array with one row containing five elements. Because A has only one axis, A.shape returns a one-element tuple.

By convention, in a two-dimensional matrix, axis 0 corresponds to the rows, and axis 1 corresponds to the columns, so the output of B.shape tells you that B has three rows and two columns.

Python strings and lists have a very handy feature known as slicing, which allows you to select sections of a string or list by specifying indices or ranges of indices. This idea generalizes very naturally to NumPy arrays. For example, you can extract just the parts you need from B, without affecting the original array:

Python
>>> B[2, 0]
9
>>> B[1, :]
array([1, 5])

In the first example above, you picked out the single element in row 2 and column 0 using B[2, 0]. The second example uses a slice to pick out a sub-array. Here, the index 1 in B[1, :] selects row 1 of B. The : in the second index position selects all the elements in that row. As a result, the expression B[1, :] returns a one-dimensional array with two elements, containing all the values from row 1 of B.
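One detail is worth a quick experiment. Reading a slice leaves B unchanged, but a basic slice is a view that shares memory with B, so writing to the slice would change B too. Here’s a small sketch, using the same B as above with illustrative variable names, that shows a few more slices and how to take an independent copy:

```python
import numpy as np

B = np.array([[1, 4], [1, 5], [9, 2]])

first_column = B[:, 0]   # every row, column 0
top_two_rows = B[:2, :]  # rows 0 and 1, every column

# Basic slices are views: they share memory with the original array.
shares = np.shares_memory(first_column, B)

# To change a selection without touching B, take an explicit copy first.
row_copy = B[1, :].copy()
row_copy[0] = 100

print(first_column)        # [1 1 9]
print(top_two_rows.shape)  # (2, 2)
print(shares)              # True
print(B[1, 0])             # 1
```

The last line confirms that modifying the copy left B intact.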

If you need to work with matrices having three or more dimensions, then NumPy has you covered. The syntax is flexible enough to cover any case. In this tutorial, though, you’ll only deal with one- and two-dimensional arrays.

If you have any questions as you play with NumPy, the official NumPy docs are thorough and well-written. You’ll find them indispensable if you do serious development using NumPy.

NumPy’s max(): The Maximum Element in an Array

In this section, you’ll become familiar with np.max(), a versatile tool for finding maximum values in various circumstances.

np.max() is the tool that you need for finding the maximum value or values in a single array. Ready to give it a go?

Using max()

To illustrate the max() function, you’re going to create an array named n_scores containing the test scores obtained by the students in Professor Newton’s linear algebra class.

Each row represents one student, and each column contains the scores on a particular test. So column 0 contains all the student scores for the first test, column 1 contains the scores for the second test, and so on. Here’s the n_scores array:

Python
>>> import numpy as np
>>> n_scores = np.array([
...        [63, 72, 75, 51, 83],
...        [44, 53, 57, 56, 48],
...        [71, 77, 82, 91, 76],
...        [67, 56, 82, 33, 74],
...        [64, 76, 72, 63, 76],
...        [47, 56, 49, 53, 42],
...        [91, 93, 90, 88, 96],
...        [61, 56, 77, 74, 74],
... ])

You can copy and paste this code into your Python console if you want to follow along. Once you’ve done that, the n_scores array is in memory. You can ask the interpreter for some of its attributes:

Python
>>> n_scores.size
40
>>> n_scores.shape
(8, 5)

The .shape and .size attributes, as above, confirm that you have 8 rows representing students and 5 columns representing tests, for a total of 40 test scores.

Suppose now that you want to find the top score achieved by any student on any test. For Professor Newton’s little linear algebra class, you could find the top score fairly quickly just by examining the data. But there’s a quicker method that’ll show its worth when you’re dealing with much larger datasets, containing perhaps thousands of rows and columns.

Try using the array’s .max() method:

Python
>>> n_scores.max()
96

The .max() method has scanned the whole array and returned the largest element. Using this method is exactly equivalent to calling np.max(n_scores).

But perhaps you want some more detailed information. What was the top score for each test? Here you can use the axis parameter:

Python
>>> n_scores.max(axis=0)
array([91, 93, 90, 91, 96])

The new parameter axis=0 tells NumPy to find the largest value out of all the rows. Since n_scores has five columns, NumPy does this for each column independently. This produces five numbers, each of which is the maximum value in that column. The axis parameter uses the standard convention for indexing dimensions. So axis=0 refers to the rows of an array, and axis=1 refers to the columns.
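If the axis convention feels slippery, it can help to watch the shapes. The sketch below uses a small made-up scores array, not Professor Newton’s data, and also shows the optional keepdims parameter, which keeps the reduced axis in place with length 1:

```python
import numpy as np

# Hypothetical scores: 3 students (rows) by 2 tests (columns).
scores = np.array([[63, 72],
                   [44, 53],
                   [71, 77]])

per_test = scores.max(axis=0)     # collapses the rows: one value per column
per_student = scores.max(axis=1)  # collapses the columns: one value per row
per_student_2d = scores.max(axis=1, keepdims=True)  # keeps a length-1 axis

print(per_test, per_test.shape)        # [71 77] (2,)
print(per_student, per_student.shape)  # [72 53 77] (3,)
print(per_student_2d.shape)            # (3, 1)
```

A handy way to remember it: the axis you name is the one that disappears from the result’s shape.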

The top score for each student is just as easy to find:

Python
>>> n_scores.max(axis=1)
array([83, 57, 91, 82, 76, 56, 96, 77])

This time, NumPy has returned an array with eight elements, one per student. The n_scores array contains one row per student. The parameter axis=1 told NumPy to find the maximum value for each student, across the columns. Therefore, each element of the output contains the highest score attained by the corresponding student.

Perhaps you want the top scores per student, but you’ve decided to exclude the first and last tests. Slicing does the trick:

Python
>>> filtered_scores = n_scores[:, 1:-1]
>>> filtered_scores.shape
(8, 3)

>>> filtered_scores
array([[72, 75, 51],
       [53, 57, 56],
       [77, 82, 91],
       [56, 82, 33],
       [76, 72, 63],
       [56, 49, 53],
       [93, 90, 88],
       [56, 77, 74]])

>>> filtered_scores.max(axis=1)
array([75, 57, 91, 82, 76, 56, 93, 77])

You can understand the slice notation n_scores[:, 1:-1] as follows. The first index range, represented by the lone :, selects all the rows in the slice. The second index range after the comma, 1:-1, tells NumPy to take the columns, starting at column 1 and ending 1 column before the last. The result of the slice is stored in a new array named filtered_scores.

With a bit of practice, you’ll learn to do array slicing on the fly, so you won’t need to create the intermediate array filtered_scores explicitly:

Python
>>> n_scores[:, 1:-1].max(axis=1)
array([75, 57, 91, 82, 76, 56, 93, 77])

Here you’ve performed the slice and the method call in a single line, but the result is the same. NumPy returns the per-student set of maximum scores for the restricted set of tests.

Handling Missing Values in np.max()

So now you know how to find maximum values in any completely filled array. But what happens when a few array values are missing? This is pretty common with real-world data.

To illustrate, you’ll create a small array containing a week’s worth of daily temperature readings, in Celsius, from a digital thermometer, starting on Monday:

Python
>>> temperatures_week_1 = np.array([7.1, 7.7, 8.1, 8.0, 9.2, np.nan, 8.4])
>>> temperatures_week_1.size
7

It seems the thermometer had a malfunction on Saturday, and the corresponding temperature value is missing, a situation indicated by the np.nan value. This is the special value Not a Number, which is commonly used to mark missing values in real-world data applications.

So far, so good. But a problem arises if you innocently try to apply .max() to this array:

Python
>>> temperatures_week_1.max()
nan

Since np.nan reports a missing value, NumPy’s default behavior is to flag this by reporting that the maximum, too, is unknown. For some applications, this makes perfect sense. But for your application, perhaps you’d find it more useful to ignore the Saturday problem and get a maximum value from the remaining, valid readings. NumPy has provided the np.nanmax() function to take care of such situations:

Python
>>> np.nanmax(temperatures_week_1)
9.2

This function ignores any nan values and returns the largest numerical value, as expected. Notice that np.nanmax() is a function in the NumPy library, not a method of the ndarray object.
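Before ignoring missing values, you may want to know how many there are and where the valid maximum sits. Here’s a short sketch along those lines. It uses the same readings as above, with np.isnan() and np.nanargmax() brought in as extra helpers and variable names chosen for illustration:

```python
import numpy as np

temperatures = np.array([7.1, 7.7, 8.1, 8.0, 9.2, np.nan, 8.4])

missing = np.isnan(temperatures)   # boolean mask, True where a reading is missing
n_missing = int(missing.sum())     # number of missing readings
warmest = np.nanmax(temperatures)  # maximum of the valid readings
warmest_day = int(np.nanargmax(temperatures))  # index of that maximum

print(n_missing)    # 1
print(warmest)      # 9.2
print(warmest_day)  # 4
```

If every value is nan, then np.nanmax() issues a RuntimeWarning and returns nan, while np.nanargmax() raises a ValueError, so it’s worth checking the mask first when that can happen.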

You’ve now seen the most common examples of NumPy’s maximum-finding capabilities for single arrays. But there are a few more NumPy functions related to maximum values that are worth knowing about.

For example, instead of the maximum values in an array, you might want the indices of the maximum values. Let’s say you want to use your n_scores array to identify the student who did best on each test. The .argmax() method is your friend here:

Python
>>> n_scores.argmax(axis=0)
array([6, 6, 6, 2, 6])

It appears that student 6 obtained the top score on every test but one. Student 2 did best on the fourth test.
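Called without an axis, .argmax() returns a single index into the flattened array, which isn’t very readable for a two-dimensional table. One way to turn it back into a row and column is np.unravel_index(). The sketch below assumes the same n_scores data as above:

```python
import numpy as np

n_scores = np.array([
    [63, 72, 75, 51, 83],
    [44, 53, 57, 56, 48],
    [71, 77, 82, 91, 76],
    [67, 56, 82, 33, 74],
    [64, 76, 72, 63, 76],
    [47, 56, 49, 53, 42],
    [91, 93, 90, 88, 96],
    [61, 56, 77, 74, 74],
])

flat_index = n_scores.argmax()  # position in the flattened array
row, column = np.unravel_index(flat_index, n_scores.shape)

print(flat_index)             # 34
print(int(row), int(column))  # 6 4
print(n_scores[row, column])  # 96
```

So the overall top score of 96 belongs to student 6 on the last test.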

You’ll recall that you can also apply np.max() as a function of the NumPy package, rather than as a method of a NumPy array. In this case, the array must be supplied as the first argument of the function. For historical reasons, the package-level function np.max() has an alias, np.amax(), which is identical in every respect apart from the name:

Python
>>> n_scores.max(axis=1)
array([83, 57, 91, 82, 76, 56, 96, 77])

>>> np.max(n_scores, axis=1)
array([83, 57, 91, 82, 76, 56, 96, 77])

>>> np.amax(n_scores, axis=1)
array([83, 57, 91, 82, 76, 56, 96, 77])

In the code above, you’ve called .max() as a method of the n_scores object, and as a stand-alone library function with n_scores as its first parameter. You’ve also called the alias np.amax() in the same way. All three calls produce exactly the same results.

Now you’ve seen how to use np.max(), np.amax(), or .max() to find maximum values for an array along various axes. You’ve also used np.nanmax() to find the maximum values while ignoring nan values, as well as np.argmax() or .argmax() to find the indices of the maximum values.

You won’t be surprised to learn that NumPy has an equivalent set of minimum functions: np.min(), np.amin(), .min(), np.nanmin(), np.argmin(), and .argmin(). You won’t deal with those here, but they behave exactly like their maximum cousins.
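As a quick check that the minimum functions really do mirror their maximum cousins, here’s a brief sketch that reuses the n_scores data and the first week of temperature readings:

```python
import numpy as np

n_scores = np.array([
    [63, 72, 75, 51, 83],
    [44, 53, 57, 56, 48],
    [71, 77, 82, 91, 76],
    [67, 56, 82, 33, 74],
    [64, 76, 72, 63, 76],
    [47, 56, 49, 53, 42],
    [91, 93, 90, 88, 96],
    [61, 56, 77, 74, 74],
])
temperatures = np.array([7.1, 7.7, 8.1, 8.0, 9.2, np.nan, 8.4])

lowest = n_scores.min()                        # lowest score overall
lowest_per_test = n_scores.min(axis=0)         # lowest score on each test
lowest_student = n_scores.argmin(axis=0)       # who scored lowest on each test
coldest = np.nanmin(temperatures)              # ignores the missing reading

print(lowest)                    # 33
print(lowest_per_test.tolist())  # [44, 53, 49, 33, 42]
print(lowest_student.tolist())   # [1, 1, 5, 3, 5]
print(coldest)                   # 7.1
```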

NumPy’s maximum(): Maximum Elements Across Arrays

Another common task in data science involves comparing two similar arrays. NumPy’s maximum() function is the tool of choice for finding maximum values across arrays. Since maximum() always involves two input arrays, there’s no corresponding method. The np.maximum() function expects the input arrays as its first two parameters.

Using np.maximum()

Continuing with the previous example involving class scores, suppose that Professor Newton’s colleague—and archrival—Professor Leibniz is also running a linear algebra class with eight students. Construct a new array with the values for Leibniz’s class:

Python
>>> l_scores = np.array([
...         [87, 73, 71, 59, 67],
...         [60, 53, 82, 80, 58],
...         [92, 85, 60, 79, 77],
...         [67, 79, 71, 69, 87],
...         [86, 91, 92, 73, 61],
...         [70, 66, 60, 79, 57],
...         [83, 51, 64, 63, 58],
...         [89, 51, 72, 56, 49],
... ])

>>> l_scores.shape
(8, 5)

The new array, l_scores, has the same shape as n_scores.

You’d like to compare the two classes, student by student and test by test, to find the higher score in each case. NumPy has a function, np.maximum(), specifically designed for comparing two arrays in an element-by-element manner. Check it out in action:

Python
>>> np.maximum(n_scores, l_scores)
array([[87, 73, 75, 59, 83],
       [60, 53, 82, 80, 58],
       [92, 85, 82, 91, 77],
       [67, 79, 82, 69, 87],
       [86, 91, 92, 73, 76],
       [70, 66, 60, 79, 57],
       [91, 93, 90, 88, 96],
       [89, 56, 77, 74, 74]])

If you visually check the arrays n_scores and l_scores, then you’ll see that np.maximum() has indeed picked out the higher of the two scores for each [row, column] pair of indices.

What if you only want to compare the best test results in each class? You can combine np.max() and np.maximum() to get that effect:

Python
>>> best_n = n_scores.max(axis=0)
>>> best_n
array([91, 93, 90, 91, 96])

>>> best_l = l_scores.max(axis=0)
>>> best_l
array([92, 91, 92, 80, 87])

>>> np.maximum(best_n, best_l)
array([92, 93, 92, 91, 96])

As before, each call to .max() returns an array of maximum scores for all the students in the relevant class, one element for each test. But this time, you’re feeding those returned arrays into the maximum() function, which compares the two arrays and returns the higher score for each test across the arrays.

You can combine those operations into one by dispensing with the intermediate arrays, best_n and best_l:

Python
>>> np.maximum(n_scores.max(axis=0), l_scores.max(axis=0))
array([92, 93, 92, 91, 96])

This gives the same result as before, but with less typing. You can choose whichever method you prefer.

Handling Missing Values in np.maximum()

Remember the temperatures_week_1 array from an earlier example? If you use a second week’s temperature records with the maximum() function, you may spot a familiar problem.

First, you’ll create a new array to hold the new temperatures:

Python
>>> temperatures_week_2 = np.array(
...     [7.3, 7.9, np.nan, 8.1, np.nan, np.nan, 10.2]
... )

There are missing values in the temperatures_week_2 data, too. Now see what happens if you apply the np.maximum function to these two temperature arrays:

Python
>>> np.maximum(temperatures_week_1, temperatures_week_2)
array([ 7.3,  7.9,  nan,  8.1,  nan,  nan, 10.2])

All the nan values in both arrays have popped up as missing values in the output. There’s a good reason for NumPy’s approach to propagating nan. Often it’s important for the integrity of your results that you keep track of the missing values, rather than brushing them under the rug. But here, you just want to get the best view of the weekly maximum values. The solution, in this case, is another NumPy package function, np.fmax():

Python
>>> np.fmax(temperatures_week_1, temperatures_week_2)
array([ 7.3,  7.9,  8.1,  8.1,  9.2,  nan, 10.2])

Now, two of the missing values have simply been ignored, and the remaining floating-point value at that index has been taken as the maximum. But the Saturday temperature can’t be fixed in that way, because both source values are missing. Since there’s no reasonable value to insert here, np.fmax() just leaves it as a nan.

Just as np.max() and np.nanmax() have the parallel minimum functions np.min() and np.nanmin(), so too do np.maximum() and np.fmax() have corresponding functions, np.minimum() and np.fmin(), that mirror their functionality for minimum values.
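Here’s a minimal sketch of those two minimum functions applied to the same two weeks of readings. The shortened variable names are just for illustration:

```python
import numpy as np

week_1 = np.array([7.1, 7.7, 8.1, 8.0, 9.2, np.nan, 8.4])
week_2 = np.array([7.3, 7.9, np.nan, 8.1, np.nan, np.nan, 10.2])

coldest_strict = np.minimum(week_1, week_2)  # nan if either value is missing
coldest_lenient = np.fmin(week_1, week_2)    # nan only if both values are missing

print(coldest_strict.tolist())   # [7.1, 7.7, nan, 8.0, nan, nan, 8.4]
print(coldest_lenient.tolist())  # [7.1, 7.7, 8.1, 8.0, 9.2, nan, 8.4]
```

As with np.fmax(), only the Saturday reading stays missing in the lenient version, because neither week has a value for it.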

Advanced Usage

You’ve now seen examples of all the basic use cases for NumPy’s max() and maximum(), plus a few related functions. Now you’ll investigate some of the more obscure optional parameters to these functions and find out when they can be useful.

Reusing Memory

When you call a function in Python, a value or object is returned. You can use that result immediately by printing it or writing it to disk, or by feeding it directly into another function as an input parameter. You can also save it to a new variable for future reference.

If you call the function in the Python REPL but don’t use it in one of those ways, then the REPL prints out the return value on the console so that you’re aware that something has been returned. All of this is standard Python stuff, and not specific to NumPy.

NumPy’s array functions are designed to handle huge inputs, and they often produce huge outputs. If you call such a function many hundreds or thousands of times, then you’ll be allocating very large amounts of memory. This can slow your program down and, in an extreme case, might even cause it to run out of memory.

This problem can be avoided by using the out parameter, which is available for both np.max() and np.maximum(), as well as for many other NumPy functions. The idea is to pre-allocate a suitable array to hold the function result, and keep reusing that same chunk of memory in subsequent calls.

You can revisit the temperature problem to create an example of using the out parameter with the np.maximum() function. You’ll also pass dtype to np.empty() to control the element type of the buffer, and therefore of the returned array:

Python
>>> temperature_buffer = np.empty(7, dtype=np.float32)
>>> temperature_buffer.shape
(7,)

>>> np.maximum(temperatures_week_1, temperatures_week_2, out=temperature_buffer)
array([ 7.3,  7.9,  nan,  8.1,  nan,  nan, 10.2], dtype=float32)

The initial values in temperature_buffer don’t matter, since they’ll be overwritten. But the array’s shape is important in that it must match the output shape. The displayed result looks like the output that you received from the original np.maximum() example. So what’s changed? The difference is that you now have the same data stored in temperature_buffer:

Python
>>> temperature_buffer
array([ 7.3,  7.9,  nan,  8.1,  nan,  nan, 10.2], dtype=float32)

The np.maximum() return value has been stored in the temperature_buffer variable, which you previously created with the right shape to accept that return value. Since you also specified dtype=np.float32 when you declared this buffer, NumPy will do its best to convert the output data to that type.

Remember to use the buffer contents before they’re overwritten by the next call to this function.
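The out parameter shows its value in a loop. The following sketch keeps a running element-wise maximum over several batches of readings, writing every result into one pre-allocated buffer. The batch data and the names batches and running_max are hypothetical:

```python
import numpy as np

# Hypothetical data: three batches of readings from the same five sensors.
batches = [
    np.array([1.0, 5.0, 2.0, 8.0, 3.0]),
    np.array([4.0, 2.0, 9.0, 1.0, 3.5]),
    np.array([0.5, 6.0, 7.0, 2.0, 9.0]),
]

# Allocate the result buffer once, starting below any possible reading.
running_max = np.full(5, -np.inf)

for batch in batches:
    # Write each result back into the same buffer instead of a new array.
    np.maximum(running_max, batch, out=running_max)

print(running_max.tolist())  # [4.0, 6.0, 9.0, 8.0, 9.0]
```

Using the buffer as both an input and the output works here because np.maximum() computes each output element from the inputs at the same position.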

Filtering Arrays

Another parameter that’s occasionally useful is where. This applies a filter to the input array or arrays, so that only those values for which the where condition is True will be included in the comparison. The other values will be ignored, and the corresponding elements of the output array will be left unaltered. In most cases, this will leave them holding arbitrary values.

For the sake of the example, suppose you’ve decided, for whatever reason, to ignore all scores less than 60 for calculating the per-student maximum values in Professor Newton’s class. Your first attempt might go like this:

Python
>>> n_scores
array([[63, 72, 75, 51, 83],
       [44, 53, 57, 56, 48],
       [71, 77, 82, 91, 76],
       [67, 56, 82, 33, 74],
       [64, 76, 72, 63, 76],
       [47, 56, 49, 53, 42],
       [91, 93, 90, 88, 96],
       [61, 56, 77, 74, 74]])

>>> n_scores.max(axis=1, where=(n_scores >= 60))
Traceback (most recent call last):
...
ValueError: reduction operation 'maximum' does not have an identity,
            so to use a where mask one has to specify 'initial'

The problem here is that NumPy doesn’t know what to do with the students in rows 1 and 5, who didn’t achieve a single test score of 60 or better. The solution is to provide an initial parameter:

Python
>>> n_scores.max(axis=1, where=(n_scores >= 60), initial=60)
array([83, 60, 91, 82, 76, 60, 96, 77])

With the two new parameters, where and initial, n_scores.max() considers only the elements greater than or equal to 60. For the rows where there is no such element, it returns the initial value of 60 instead. So the lucky students at indices 1 and 5 got their best score boosted to 60 by this operation! The original n_scores array is untouched.
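If you’d rather not invent a score for students with no qualifying results, one option is to pass an initial value that can’t be a real score and treat it as a marker. This is only a sketch of that idea, using a small made-up array and -1 as an assumed sentinel:

```python
import numpy as np

# Hypothetical scores: 3 students (rows) by 3 tests (columns).
scores = np.array([[63, 51, 83],
                   [44, 53, 48],
                   [67, 56, 74]])

passing = scores >= 60  # boolean mask with the same shape as scores

# -1 can't occur as a real score here, so it marks "no qualifying value".
best_passing = scores.max(axis=1, where=passing, initial=-1)

print(best_passing.tolist())  # [83, -1, 74]
```

The student in row 1 has no score of 60 or better, so the result for that row is the sentinel rather than a boosted grade.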

Comparing Differently Shaped Arrays With Broadcasting

You’ve learned how to use np.maximum() to compare arrays with identical shapes. But it turns out that this function, along with many others in the NumPy library, is much more versatile than that. NumPy has a concept called broadcasting that provides a very useful extension to the behavior of most functions involving two arrays, including np.maximum().

Whenever you call a NumPy function that operates on two arrays, A and B, it checks their .shape properties to see if they’re compatible. If they have exactly the same .shape, then NumPy just matches the arrays element by element, pairing up the element at A[i, j] with the element at B[i, j]. np.maximum() works like this too.

Broadcasting enables NumPy to operate on two arrays with different shapes, provided there’s still a sensible way to match up pairs of elements. The simplest example of this is to broadcast a single element over an entire array. You’ll explore broadcasting by continuing the example of Professor Newton and his linear algebra class. Suppose he asks you to ensure that none of his students receives a score below 75. Here’s how you might do it:

Python
>>> np.maximum(n_scores, 75)
array([[75, 75, 75, 75, 83],
       [75, 75, 75, 75, 75],
       [75, 77, 82, 91, 76],
       [75, 75, 82, 75, 75],
       [75, 76, 75, 75, 76],
       [75, 75, 75, 75, 75],
       [91, 93, 90, 88, 96],
       [75, 75, 77, 75, 75]])

You’ve applied the np.maximum() function to two arguments: n_scores, whose .shape is (8, 5), and the single scalar parameter 75. You can think of this second parameter as a 1 × 1 array that’ll be stretched inside the function to cover eight rows and five columns. The stretched array can then be compared element by element with n_scores, and the pairwise maximum can be returned for each element of the result.

The result is the same as if you had compared n_scores with an array of its own shape, (8, 5), but with the value 75 in each element. This stretching is just conceptual—NumPy is smart enough to do all this without actually creating the stretched array. So you get the notational convenience of this example without compromising efficiency.
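You can verify both claims with a small experiment. The sketch below uses a made-up two-row array, builds the stretched array explicitly with np.full_like(), and then uses np.broadcast_to() to peek at the kind of zero-copy view that makes the stretching purely conceptual:

```python
import numpy as np

scores = np.array([[63, 72, 75],
                   [44, 53, 57]])

# Explicitly build the "stretched" array that broadcasting only imagines.
floor_full = np.full_like(scores, 75)

# broadcast_to returns a read-only view that repeats the scalar without copying.
floor_view = np.broadcast_to(75, scores.shape)

same = np.array_equal(np.maximum(scores, 75), np.maximum(scores, floor_full))

print(same)                # True
print(floor_view.shape)    # (2, 3)
print(floor_view.strides)  # (0, 0)
```

The zero strides mean that every position in the view points at the same single value in memory.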

You can do much more with broadcasting. Professor Leibniz has noticed Newton’s skulduggery with his n_scores array, and decides to engage in a little data manipulation of her own.

Leibniz’s plan is to artificially boost all her students’ scores to be at least equal to the average score for a particular test. This will have the effect of increasing all the below-average scores—and thus produce some quite misleading results! How can you help the professor achieve her somewhat nefarious ends?

Your first step is to use the array’s .mean() method to create a one-dimensional array of means per test. Then you can use np.maximum() and broadcast this array over the entire l_scores matrix:

Python
>>> mean_l_scores = l_scores.mean(axis=0, dtype=int)
>>> mean_l_scores
array([79, 68, 71, 69, 64])

>>> np.maximum(mean_l_scores, l_scores)
array([[87, 73, 71, 69, 67],
       [79, 68, 82, 80, 64],
       [92, 85, 71, 79, 77],
       [79, 79, 71, 69, 87],
       [86, 91, 92, 73, 64],
       [79, 68, 71, 79, 64],
       [83, 68, 71, 69, 64],
       [89, 68, 72, 69, 64]])

The broadcasting happens in the np.maximum() call. The one-dimensional mean_l_scores array has been conceptually stretched to match the two-dimensional l_scores array. The output array has the same .shape as the larger of the two input arrays, l_scores.
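A per-test array lines up with the columns automatically, but a per-student array needs a little help. A one-dimensional array of row means wouldn’t be compatible with the table’s columns, so one approach is to keep the means as a column by passing keepdims=True. Here’s a sketch with a small made-up array:

```python
import numpy as np

# Hypothetical scores: 3 students (rows) by 2 tests (columns).
scores = np.array([[50, 70],
                   [90, 60],
                   [80, 80]])

# keepdims=True keeps the reduced axis with length 1, giving shape (3, 1).
row_means = scores.mean(axis=1, keepdims=True)

# The (3, 1) column is stretched across both columns of scores.
boosted = np.maximum(scores, row_means)

print(row_means.shape)   # (3, 1)
print(boosted.tolist())  # [[60.0, 70.0], [90.0, 75.0], [80.0, 80.0]]
```

Each student’s below-average scores have been raised to that student’s own mean.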

Following Broadcasting Rules

So, what are the rules for broadcasting? A great many NumPy functions accept two array arguments. np.maximum() is just one of these. Arrays that can be used together in such functions are termed compatible, and their compatibility depends on the number and size of their dimensions—that is, on their .shape.

The simplest case occurs if the two arrays, say A and B, have identical shapes. Each element in A is matched, for the function’s purposes, to the element at the same index address in B.

Broadcasting rules get more interesting when A and B have different shapes. The elements of compatible arrays must somehow be unambiguously paired together so that each element of the larger array can interact with an element of the smaller array. In the examples in this tutorial, the output array has the .shape of the larger of the two input arrays. More generally, each output dimension is the larger of the corresponding input dimensions. So compatible arrays must follow these rules:

  1. If one array has fewer dimensions than the other, only the trailing dimensions are matched for compatibility. The trailing dimensions are those that are present in the .shape of both arrays, counting from the right. So if A.shape is (99, 99, 2, 3) and B.shape is (2, 3), then A and B are compatible because (2, 3) are the trailing dimensions of each. You can completely ignore the two leftmost dimensions of A.

  2. Even if the trailing dimensions aren’t equal, the arrays are still compatible if one of those dimensions is equal to 1 in either array. So if A.shape is (99, 99, 2, 3) as before and B.shape is (1, 99, 1, 3) or (1, 3) or (1, 2, 1) or (1, 1), then B is still compatible with A in each case.
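If you’d like to check these rules without building any arrays, np.broadcast_shapes() reports the shape that a broadcast would produce. This function was added in NumPy 1.20, so the sketch below assumes a version at least that recent:

```python
import numpy as np

a_shape = (99, 99, 2, 3)

# Each of these shapes is compatible with a_shape under the two rules.
results = {}
for b_shape in [(2, 3), (1, 99, 1, 3), (1, 3), (1, 2, 1), (1, 1)]:
    results[b_shape] = np.broadcast_shapes(a_shape, b_shape)
    print(b_shape, "->", results[b_shape])

# Neither input has to be "the larger one": both can be stretched.
both_stretched = np.broadcast_shapes((3, 1), (1, 4))
print(both_stretched)  # (3, 4)

# Incompatible shapes raise a ValueError.
try:
    np.broadcast_shapes((2, 3, 4), (2, 2, 4))
    compatible = True
except ValueError:
    compatible = False
print(compatible)  # False
```

Every shape in the loop broadcasts to (99, 99, 2, 3), while the last pair fails because the middle dimensions, 3 and 2, differ and neither is 1.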

You can get a feel for the broadcasting rules by playing around in the Python REPL. You’ll be creating some toy arrays to illustrate how broadcasting works and how the output array is generated:

Python
>>> A = np.arange(24).reshape(2, 3, 4)
>>> A
array([[[ 0,  1,  2,  3], [ 4,  5,  6,  7], [ 8,  9, 10, 11]],
       [[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]])

>>> A.shape
(2, 3, 4)

>>> B = np.array(
...     [
...         [[-7, 11, 10,  2], [-6,  7, -2, 14], [ 7,  4,  4, -1]],
...         [[18,  5, 22,  7], [25,  8, 15, 24], [31, 15, 19, 24]],
...     ]
... )

>>> B.shape
(2, 3, 4)

>>> np.maximum(A, B)
array([[[ 0, 11, 10,  3], [ 4,  7,  6, 14], [ 8,  9, 10, 11]],
       [[18, 13, 22, 15], [25, 17, 18, 24], [31, 21, 22, 24]]])

There’s nothing really new to see here yet. You’ve created two arrays of identical .shape and applied the np.maximum() operation to them. Notice that the handy .reshape() method lets you build arrays of any shape. You can verify that the result is the element-by-element maximum of the two inputs.

The fun starts when you experiment with comparing two arrays of different shapes. Try slicing B to make a new array, C:

Python
>>> C = B[:, :1, :]
>>> C
array([[[-7, 11, 10,  2]],
       [[18,  5, 22,  7]]])

>>> C.shape
(2, 1, 4)

>>> np.maximum(A, C)
array([[[ 0, 11, 10,  3], [ 4, 11, 10,  7], [ 8, 11, 10, 11]],
       [[18, 13, 22, 15], [18, 17, 22, 19], [20, 21, 22, 23]]])

The two arrays, A and C, are compatible because the new array’s second dimension is 1, and the other dimensions match. Notice that the .shape of the result of the maximum() operation is the same as A.shape. That’s because C, the smaller array, is being broadcast over A. Whenever a smaller array is broadcast over a larger one like this, the result has the .shape of the larger array.
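To picture what broadcasting does with C, you can stretch it by hand. The sketch below rebuilds C from its literal values so that it runs on its own, then uses np.broadcast_to() to create a read-only view of C with the same .shape as A:

```python
import numpy as np

A = np.arange(24).reshape(2, 3, 4)
# Same values as B[:, :1, :] in the example above.
C = np.array([[[-7, 11, 10, 2]], [[18, 5, 22, 7]]])

# np.broadcast_to() returns a read-only view of C stretched to A's shape:
# the single row in each block is repeated three times.
C_stretched = np.broadcast_to(C, A.shape)
print(C_stretched.shape)

# Comparing against the stretched view gives the same result as
# letting np.maximum() broadcast C itself.
same = np.array_equal(np.maximum(A, C), np.maximum(A, C_stretched))
print(same)
```

The stretched view doesn't copy any data, which is part of why broadcasting is cheap.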

Now you can try an even more radical slicing of B:

Python
>>> D = B[:, :1, :1]
>>> D
array([[[-7]],[[18]]])

>>> D.shape
(2, 1, 1)

>>> np.maximum(A, D)
array([[[ 0,  1,  2,  3], [ 4,  5,  6,  7], [ 8,  9, 10, 11]],
       [[18, 18, 18, 18], [18, 18, 18, 19], [20, 21, 22, 23]]])

Once again, the trailing dimensions of A and D are all either equal or 1, so the arrays are compatible and the broadcast works. The result has the same .shape as A.
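In the examples so far, only the smaller array gets stretched. Broadcasting can also stretch both inputs at once, when each has a dimension of 1 where the other doesn't. Here's a small sketch with two made-up arrays, col and row, that aren't part of the examples above:

```python
import numpy as np

col = np.array([[0], [5], [10]])  # shape (3, 1)
row = np.array([[2, 4, 6, 8]])    # shape (1, 4)

# col is stretched across 4 columns and row is stretched down 3 rows,
# so each output dimension is the larger of the paired input dimensions.
result = np.maximum(col, row)
print(result.shape)
print(result)
```

Here the result is a 3 × 4 array, which is bigger than either input.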

Perhaps the most extreme type of broadcasting occurs when one of the array parameters is passed as a scalar:

Python
>>> np.maximum(A, 10)
array([[[10, 10, 10, 10], [10, 10, 10, 10], [10, 10, 10, 11]],
       [[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]])

NumPy automatically treats the second parameter, 10, as a zero-dimensional array with .shape (), determines that this converted parameter is compatible with the first, and duly broadcasts it over the entire 2 × 3 × 4 array A.
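One way to convince yourself that the scalar really is compared with every element of A is to build a full-size array of tens by hand and compare the two results. This is only an illustrative check, using np.full_like() to create the explicit array:

```python
import numpy as np

A = np.arange(24).reshape(2, 3, 4)

# Build a full-size array of 10s explicitly. Broadcasting saves you
# from having to create this array yourself.
tens = np.full_like(A, 10)

broadcast_result = np.maximum(A, 10)
explicit_result = np.maximum(A, tens)
print(np.array_equal(broadcast_result, explicit_result))
```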

Finally, here’s a case where broadcasting fails:

Python
>>> E = B[:, 1:, :]
>>> E
array([[[-6,  7, -2, 14], [ 7,  4,  4, -1]],
       [[25,  8, 15, 24], [31, 15, 19, 24]]])

>>> E.shape
(2, 2, 4)

>>> np.maximum(A, E)
Traceback (most recent call last):
...
ValueError: operands could not be broadcast together with shapes (2,3,4) (2,2,4)

If you refer back to the broadcasting rules above, you’ll see the problem: the second dimensions of A and E don’t match, and neither is equal to 1, so the two arrays are incompatible.
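In your own code, you might catch this error and then reshape or slice one of the inputs. The sketch below shows one possible fix, assuming that what you really wanted was to compare E with the last two rows of each block of A. E is rebuilt from its literal values so that the snippet runs on its own:

```python
import numpy as np

A = np.arange(24).reshape(2, 3, 4)
E = np.array(
    [
        [[-6, 7, -2, 14], [7, 4, 4, -1]],
        [[25, 8, 15, 24], [31, 15, 19, 24]],
    ]
)

try:
    np.maximum(A, E)
except ValueError as error:
    print(f"Broadcast failed: {error}")

# Assumption: E should line up with rows 1 and 2 of each block of A.
# Slicing A the same way gives both arrays the shape (2, 2, 4).
result = np.maximum(A[:, 1:, :], E)
print(result.shape)
```

Whether slicing is the right fix depends on what the mismatched dimension means in your data, so treat this as a pattern rather than a recipe.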

You can read more about broadcasting in Look Ma, No for Loops: Array Programming With NumPy. There’s also a good description of the rules in the NumPy docs.

The broadcasting rules can be confusing, so it’s a good idea to play around with some toy arrays until you get a feel for how they work!

Conclusion

In this tutorial, you’ve explored the NumPy library’s max() and maximum() operations to find the maximum values within or across arrays.

Here’s what you’ve learned:

  • Why NumPy has its own max() function, and how you can use it
  • How the maximum() function differs from max(), and when it’s needed
  • Which practical applications exist for each function
  • How you can handle missing data so your results make sense
  • How you can apply your knowledge to the complementary task of finding minimum values

Along the way, you’ve learned or refreshed your knowledge of the basics of NumPy syntax. NumPy is a hugely popular library because of its powerful support for array operations.

Now that you’ve mastered the details of NumPy’s max() and maximum(), you’re ready to use them in your applications, or continue learning about more of the hundreds of array functions supported by NumPy.

If you’re interested in using NumPy for data science, then you’ll also want to investigate pandas, a very popular data-science library built on top of NumPy. You can learn about it in The Pandas DataFrame: Make Working With Data Delightful. And if you want to produce compelling images from data, take a look at Python Plotting With Matplotlib (Guide).

The applications of NumPy are limitless. Wherever your NumPy adventure takes you next, go forth and matrix-multiply!


About Charles de Villiers

Charles teaches Physics and Math. When he isn't teaching or coding, he spends way too much time playing online chess.
