The NumPy library supports expressive, efficient numerical programming in Python. Finding extreme values is a very common requirement in data analysis. The NumPy max() and maximum() functions are two examples of how NumPy lets you combine the coding comfort offered by Python with the runtime efficiency you’d expect from C.
In this tutorial, you’ll learn how to:
- Use the NumPy max() function
- Use the NumPy maximum() function and understand why it’s different from max()
- Solve practical problems with these functions
- Handle missing values in your data
- Apply the same concepts to finding minimum values
This tutorial includes a very short introduction to NumPy, so even if you’ve never used NumPy before, you should be able to jump right in. With the background provided here, you’ll be ready to continue exploring the wealth of functionality to be found in the NumPy library.
NumPy is short for Numerical Python. It’s an open source Python library that enables a wide range of applications in the fields of science, statistics, and data analytics through its support of fast, parallelized computations on multidimensional arrays of numbers. Many of the most popular numerical packages use NumPy as their base library.
The NumPy library is built around a class named
np.ndarray and a set of methods and functions that leverage Python syntax for defining and manipulating arrays of any shape or size.
NumPy’s core code for array manipulation is written in C. You can use functions and methods directly on an
ndarray as NumPy’s C-based code efficiently loops over all the array elements in the background. NumPy’s high-level syntax means that you can simply and elegantly express complex programs and execute them at high speeds.
You can use a regular Python
list to represent an array. However, NumPy arrays are far more efficient than lists, and they’re supported by a huge library of methods and functions. These include mathematical and logical operations, sorting, Fourier transforms, linear algebra, array reshaping, and much more.
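As a rough illustration of the difference in style, the sketch below doubles every value in a small collection twice: once with a plain Python list and an explicit loop, and once with a NumPy array. The variable names are made up for this example, and np.array() is introduced properly a little further on.

```python
import numpy as np

readings = [3, 7, 2, 4, 5]

# Plain Python: a list comprehension visits each element in turn.
doubled_list = [2 * value for value in readings]

# NumPy: the arithmetic is written once and applied to the whole array,
# with the element-by-element loop running inside NumPy's compiled code.
doubled_array = 2 * np.array(readings)

print(doubled_list)            # [6, 14, 4, 8, 10]
print(doubled_array.tolist())  # [6, 14, 4, 8, 10]
```

Both approaches give the same numbers. The array version simply pushes the loop down into NumPy.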
NumPy is easy to install with your package manager, for example
conda. For detailed instructions plus a more extensive introduction to NumPy and its capabilities, take a look at NumPy Tutorial: Your First Steps Into Data Science in Python or the NumPy Absolute Beginner’s Guide.
In this tutorial, you’ll learn how to take your very first steps in using NumPy. You’ll then explore NumPy’s max() and maximum() operations.
You’ll start your investigation with a quick overview of NumPy arrays, the flexible data structure that gives NumPy its versatility and power.
The fundamental building block for any NumPy program is the ndarray. An ndarray is a Python object wrapping an array of numbers. It may, in principle, have any number of dimensions of any size. You can declare an array in several ways. The most straightforward method starts from a regular Python list or tuple:
>>> import numpy as np
>>> A = np.array([3, 7, 2, 4, 5])
>>> A
array([3, 7, 2, 4, 5])
>>> B = np.array(((1, 4), (1, 5), (9, 2)))
>>> B
array([[1, 4],
       [1, 5],
       [9, 2]])
You’ve imported numpy under the alias np. This is a standard, widespread convention, so you’ll see it in most tutorials and programs.
In this example,
A is a one-dimensional array of numbers, while
B is two-dimensional.
Notice that the
np.array() factory function expects a Python list or tuple as its first parameter, so the list or tuple must therefore be wrapped in its own set of brackets or parentheses, respectively. Just throwing in an unwrapped bunch of numbers won’t work:
>>> np.array(3, 7, 2, 4, 5)
Traceback (most recent call last):
  ...
TypeError: array() takes from 1 to 2 positional arguments but 5 were given
With this syntax, the interpreter sees five separate positional arguments, so it’s confused.
In your constructor for array B, the nested tuple argument needs an extra pair of parentheses to identify it, in its entirety, as the first parameter of np.array().
Addressing the array elements is straightforward. NumPy’s indices start at zero, like all Python sequences. By convention, a two-dimensional array is displayed so that the first index refers to the row, and the second index refers to the column. So A[0] is the first element of the one-dimensional array A, and B[2, 1] is the second element in the third row of the two-dimensional array B:
>>> A[0]  # First element of A
3
>>> A[4]  # Fifth and last element of A
5
>>> A[-1]  # Last element of A, same as above
5
>>> A[5]  # This won't work because A doesn't have a sixth element
Traceback (most recent call last):
  ...
IndexError: index 5 is out of bounds for axis 0 with size 5
>>> B[2, 1]  # Second element in third row of B
2
So far, it seems that you’ve simply done a little extra typing to create arrays that look very similar to Python lists. But looks can be deceptive! Each
ndarray object has approximately a hundred built-in properties and methods, and you can pass it to hundreds more functions in the NumPy library.
Almost anything that you can imagine doing to an array can be achieved in a few lines of code. In this tutorial, you’ll only be using a few functions, but you can explore the full power of arrays in the NumPy API documentation.
You’ve already created some NumPy arrays from Python sequences. But arrays can be created in many other ways. One of the simplest is np.arange(), which behaves rather like a souped-up version of Python’s built-in range() function:
>>> np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.arange(2, 3, 0.1)
array([2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9])
In the first example above, you only specified the upper limit of 10. NumPy follows the standard Python convention for ranges and returns an ndarray containing the integers 0 to 9. The second example specifies a starting value of 2, an upper limit of 3, and an increment of 0.1. Unlike Python’s standard range(), np.arange() can handle non-integer increments, and it automatically generates an array with np.float64 elements in this case.
NumPy arrays can contain various types of integers, floating-point numbers, and complex numbers, but all the elements in an array must be of the same type.
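To see the single-type rule in action, here’s a small sketch that inspects the .dtype attribute of a few arrays. The array names are illustrative only, and the exact integer width that NumPy picks can vary by platform, so the example checks only that the type is some kind of integer.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])  # One float is enough to make every element a float

print(ints.dtype.kind)              # i  (some integer type)
print(mixed.dtype)                  # float64
print(mixed.tolist())               # [1.0, 2.5, 3.0]
print(np.arange(2, 3, 0.1).dtype)   # float64
```

NumPy chooses a common type that can hold all the supplied values, which is why the integers 1 and 3 are stored as 1.0 and 3.0 in the second array.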
You’ll start by using built-in ndarray properties to understand the arrays A and B:
>>> A.size
5
>>> A.shape
(5,)
>>> B.size
6
>>> B.shape
(3, 2)
The .size attribute counts the elements in the array, and the .shape attribute contains an ordered tuple of dimensions, which NumPy calls axes. A is a one-dimensional array with one row containing five elements. Because A has only one axis, A.shape returns a one-element tuple.
By convention, in a two-dimensional matrix, axis
0 corresponds to the rows, and axis
1 corresponds to the columns, so the output of
B.shape tells you that
B has three rows and two columns.
Python strings and lists have a very handy feature known as slicing, which allows you to select sections of a string or list by specifying indices or ranges of indices. This idea generalizes very naturally to NumPy arrays.
For example, you can extract just the parts you need from
B, without affecting the original array:
>>> B[2, 0]
9
>>> B[1, :]
array([1, 5])
In the first example above, you picked out the single element in row 2 and column 0 using B[2, 0]. The second example uses a slice to pick out a sub-array. Here, the index 1 in B[1, :] selects row 1 of B. The : in the second index position selects all the elements in that row. As a result, the expression B[1, :] returns a one-dimensional array of two elements, containing all the elements from row 1 of B.
If you need to work with matrices having three or more dimensions, then NumPy has you covered. The syntax is flexible enough to cover any case. In this tutorial, though, you’ll only deal with one- and two-dimensional arrays.
If you have any questions as you play with NumPy, the official NumPy docs are thorough and well-written. You’ll find them indispensable if you do serious development using NumPy.
In this section, you’ll become familiar with
np.max(), a versatile tool for finding maximum values in various circumstances.
Note: NumPy has both a package-level function and an
ndarray method named
max(). They work in the same way, though the package function
np.max() requires the target array name as its first parameter. In what follows, you’ll be using the function and the method interchangeably.
Python also has a built-in
max() function that can calculate maximum values of iterables. You can use this built-in
max() to find the maximum element in a one-dimensional NumPy array, but it has no support for arrays with more dimensions. When dealing with NumPy arrays, you should stick to NumPy’s own maximum functions and methods.
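The following sketch shows what goes wrong when the built-in max() meets a two-dimensional array. It reuses the small array B from earlier, and the flag name builtin_failed is just an illustrative choice.

```python
import numpy as np

B = np.array([[1, 4], [1, 5], [9, 2]])

# The built-in max() iterates over the rows of B and tries to compare
# whole rows with each other. Each comparison yields an array of booleans,
# which NumPy refuses to collapse into a single True or False.
try:
    max(B)
    builtin_failed = False
except ValueError:
    builtin_failed = True

print(builtin_failed)  # True
print(np.max(B))       # 9
```

NumPy’s own function has no such difficulty, because it reduces over the elements rather than comparing rows.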
For the rest of this tutorial,
max() will always refer to the NumPy version.
np.max() is the tool that you need for finding the maximum value or values in a single array. Ready to give it a go?
To illustrate the
max() function, you’re going to create an array named
n_scores containing the test scores obtained by the students in Professor Newton’s linear algebra class.
Each row represents one student, and each column contains the scores on a particular test. So column
0 contains all the student scores for the first test, column
1 contains the scores for the second test, and so on. Here’s the code:
>>> import numpy as np
>>> n_scores = np.array([
...     [63, 72, 75, 51, 83],
...     [44, 53, 57, 56, 48],
...     [71, 77, 82, 91, 76],
...     [67, 56, 82, 33, 74],
...     [64, 76, 72, 63, 76],
...     [47, 56, 49, 53, 42],
...     [91, 93, 90, 88, 96],
...     [61, 56, 77, 74, 74],
... ])
You can copy and paste this code into your Python console if you want to follow along.
Once you’ve done that, the
n_scores array is in memory. You can ask the interpreter for some of its attributes:
>>> n_scores.size
40
>>> n_scores.shape
(8, 5)
The .shape and .size attributes, as above, confirm that you have
8 rows representing students and
5 columns representing tests, for a total of
40 test scores.
Suppose now that you want to find the top score achieved by any student on any test. For Professor Newton’s little linear algebra class, you could find the top score fairly quickly just by examining the data. But there’s a quicker method that’ll show its worth when you’re dealing with much larger datasets, containing perhaps thousands of rows and columns.
Try using the array’s .max() method:
>>> n_scores.max()
96
The .max() method has scanned the whole array and returned the largest element. Using this method is exactly equivalent to calling np.max(n_scores).
But perhaps you want some more detailed information. What was the top score for each test?
Here you can use the axis parameter:
>>> n_scores.max(axis=0)
array([91, 93, 90, 91, 96])
The new parameter
axis=0 tells NumPy to find the largest value out of all the rows. Since
n_scores has five columns, NumPy does this for each column independently. This produces five numbers, each of which is the maximum value in that column.
The axis parameter uses the standard convention for indexing dimensions. So axis=0 refers to the rows of an array, and axis=1 refers to the columns.
The top score for each student is just as easy to find:
>>> n_scores.max(axis=1)
array([83, 57, 91, 82, 76, 56, 96, 77])
This time, NumPy has returned an array with eight elements, one per student. The
n_scores array contains one row per student. The parameter
axis=1 told NumPy to find the maximum value for each student, across the columns. Therefore, each element of the output contains the highest score attained by the corresponding student.
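One way to keep the two cases straight is to remember that the axis you name is the one that gets collapsed. The sketch below uses a cut-down, illustrative array holding only the first two students and the first three tests, and prints the shape of each result.

```python
import numpy as np

# Two students (rows) and three tests (columns): a small slice of the class data.
scores = np.array([
    [63, 72, 75],
    [44, 53, 57],
])

per_test = scores.max(axis=0)     # Collapse the rows: one value per column
per_student = scores.max(axis=1)  # Collapse the columns: one value per row

print(per_test.tolist(), per_test.shape)        # [63, 72, 75] (3,)
print(per_student.tolist(), per_student.shape)  # [75, 57] (2,)
```

A shape of (2, 3) becomes (3,) when axis 0 is collapsed and (2,) when axis 1 is collapsed.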
Perhaps you want the top scores per student, but you’ve decided to exclude the first and last tests. Slicing does the trick:
>>> filtered_scores = n_scores[:, 1:-1]
>>> filtered_scores.shape
(8, 3)
>>> filtered_scores
array([[72, 75, 51],
       [53, 57, 56],
       [77, 82, 91],
       [56, 82, 33],
       [76, 72, 63],
       [56, 49, 53],
       [93, 90, 88],
       [56, 77, 74]])
>>> filtered_scores.max(axis=1)
array([75, 57, 91, 82, 76, 56, 93, 77])
You can understand the slice notation
n_scores[:, 1:-1] as follows. The first index range, represented by the lone
:, selects all the rows in the slice.
The second index range after the comma,
1:-1, tells NumPy to take the columns, starting at column
1 and ending
1 column before the last. The result of the slice is stored in a new array named filtered_scores.
With a bit of practice, you’ll learn to do array slicing on the fly, so you won’t need to create the intermediate array:
>>> n_scores[:, 1:-1].max(axis=1)
array([75, 57, 91, 82, 76, 56, 93, 77])
Here you’ve performed the slice and the method call in a single line, but the result is the same. NumPy returns the per-student set of maximum
n_scores for the restricted set of tests.
So now you know how to find maximum values in any completely filled array. But what happens when a few array values are missing? This is pretty common with real-world data.
To illustrate, you’ll create a small array containing a week’s worth of daily temperature readings, in Celsius, from a digital thermometer, starting on Monday:
>>> temperatures_week_1 = np.array([7.1, 7.7, 8.1, 8.0, 9.2, np.nan, 8.4])
>>> temperatures_week_1.size
7
It seems the thermometer had a malfunction on Saturday, and the corresponding temperature value is missing, a situation indicated by the
np.nan value. This is the special value Not a Number, which is commonly used to mark missing values in real-world data applications.
So far, so good. But a problem arises if you innocently try to apply
.max() to this array:
>>> temperatures_week_1.max()
nan
Since np.nan reports a missing value, NumPy’s default behavior is to flag this by reporting that the maximum, too, is unknown. For some applications, this makes perfect sense. But for your application, perhaps you’d find it more useful to ignore the Saturday problem and get a maximum value from the remaining, valid readings. NumPy has provided the np.nanmax() function to take care of such situations:
>>> np.nanmax(temperatures_week_1)
9.2
This function ignores any
nan values and returns the largest numerical value, as expected.
Notice that np.nanmax() is a function in the NumPy library, not a method of the ndarray object.
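If it helps to see what ignoring nan amounts to, here’s a rough equivalent built from np.isnan() and a Boolean mask. This is only a sketch of the idea, not a description of how np.nanmax() is implemented, and the names temps and valid are illustrative.

```python
import numpy as np

temps = np.array([7.1, 7.7, 8.1, 8.0, 9.2, np.nan, 8.4])

# Build a mask that is True wherever a real reading exists.
valid = ~np.isnan(temps)

print(int(valid.sum()))     # 6 valid readings out of 7
print(temps[valid].max())   # 9.2
print(np.nanmax(temps))     # 9.2
```

Selecting the valid readings first and then calling .max() gives the same answer as np.nanmax() for this data.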
You’ve now seen the most common examples of NumPy’s maximum-finding capabilities for single arrays. But there are a few more NumPy functions related to maximum values that are worth knowing about.
For example, instead of the maximum values in an array, you might want the indices of the maximum values. Let’s say you want to use your n_scores array to identify the student who did best on each test. The .argmax() method is your friend here:
>>> n_scores.argmax(axis=0)
array([6, 6, 6, 2, 6])
It appears that student
6 obtained the top score on every test but one. Student
2 did best on the fourth test.
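When you call .argmax() without an axis, NumPy returns a single index into the flattened array. The sketch below shows one way to turn that flat index back into a row and column, using np.unravel_index(). This function isn’t covered elsewhere in this tutorial, so treat it as an optional extra.

```python
import numpy as np

n_scores = np.array([
    [63, 72, 75, 51, 83],
    [44, 53, 57, 56, 48],
    [71, 77, 82, 91, 76],
    [67, 56, 82, 33, 74],
    [64, 76, 72, 63, 76],
    [47, 56, 49, 53, 42],
    [91, 93, 90, 88, 96],
    [61, 56, 77, 74, 74],
])

flat_index = n_scores.argmax()  # Position of the top score in the flattened array
row, col = np.unravel_index(flat_index, n_scores.shape)

print(int(flat_index))          # 34
print(int(row), int(col))       # 6 4
print(int(n_scores[row, col]))  # 96
```

So the overall top score of 96 belongs to student 6 on the test in column 4.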
You’ll recall that you can also apply
np.max() as a function of the NumPy package, rather than as a method of a NumPy array. In this case, the array must be supplied as the first argument of the function.
For historical reasons, the package-level function
np.max() has an alias,
np.amax(), which is identical in every respect apart from the name:
>>> n_scores.max(axis=1)
array([83, 57, 91, 82, 76, 56, 96, 77])
>>> np.max(n_scores, axis=1)
array([83, 57, 91, 82, 76, 56, 96, 77])
>>> np.amax(n_scores, axis=1)
array([83, 57, 91, 82, 76, 56, 96, 77])
In the code above, you’ve called
.max() as a method of the
n_scores object, and as a stand-alone library function with
n_scores as its first parameter. You’ve also called the alias
np.amax() in the same way. All three calls produce exactly the same results.
Now you’ve seen how to use
.max() to find maximum values for an array along various axes. You’ve also used
np.nanmax() to find the maximum values while ignoring
nan values, as well as
.argmax() to find the indices of the maximum values.
You won’t be surprised to learn that NumPy has an equivalent set of minimum functions: .min(), np.nanmin(), and .argmin(). You won’t deal with those here, but they behave exactly like their maximum cousins.
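For completeness, here’s a brief sketch of the minimum counterparts applied to the same data used above. Nothing new is assumed beyond the arrays already defined in this tutorial.

```python
import numpy as np

n_scores = np.array([
    [63, 72, 75, 51, 83],
    [44, 53, 57, 56, 48],
    [71, 77, 82, 91, 76],
    [67, 56, 82, 33, 74],
    [64, 76, 72, 63, 76],
    [47, 56, 49, 53, 42],
    [91, 93, 90, 88, 96],
    [61, 56, 77, 74, 74],
])
temperatures_week_1 = np.array([7.1, 7.7, 8.1, 8.0, 9.2, np.nan, 8.4])

print(int(n_scores.min()))                # 33
print(n_scores.min(axis=0).tolist())      # [44, 53, 49, 33, 42]
print(n_scores.argmin(axis=0).tolist())   # [1, 1, 5, 3, 5]
print(np.nanmin(temperatures_week_1))     # 7.1
```

The axis parameter and the nan handling work just as they do for the maximum versions.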
Another common task in data science involves comparing two similar arrays. NumPy’s
maximum() function is the tool of choice for finding maximum values across arrays. Since
maximum() always involves two input arrays, there’s no corresponding method. The
np.maximum() function expects the input arrays as its first two parameters.
Continuing with the previous example involving class scores, suppose that Professor Newton’s colleague—and archrival—Professor Leibniz is also running a linear algebra class with eight students. Construct a new array with the values for Leibniz’s class:
>>> l_scores = np.array([
...     [87, 73, 71, 59, 67],
...     [60, 53, 82, 80, 58],
...     [92, 85, 60, 79, 77],
...     [67, 79, 71, 69, 87],
...     [86, 91, 92, 73, 61],
...     [70, 66, 60, 79, 57],
...     [83, 51, 64, 63, 58],
...     [89, 51, 72, 56, 49],
... ])
>>> l_scores.shape
(8, 5)
The new array, l_scores, has the same shape as n_scores.
You’d like to compare the two classes, student by student and test by test, to find the higher score in each case.
NumPy has a function,
np.maximum(), specifically designed for comparing two arrays in an element-by-element manner.
Check it out in action:
>>> np.maximum(n_scores, l_scores)
array([[87, 73, 75, 59, 83],
       [60, 53, 82, 80, 58],
       [92, 85, 82, 91, 77],
       [67, 79, 82, 69, 87],
       [86, 91, 92, 73, 76],
       [70, 66, 60, 79, 57],
       [91, 93, 90, 88, 96],
       [89, 56, 77, 74, 74]])
If you visually check the arrays n_scores and l_scores, then you’ll see that
np.maximum() has indeed picked out the higher of the two scores for each [row, column] pair of indices.
What if you only want to compare the best test results in each class? You can combine .max() and np.maximum() to get that effect:
>>> best_n = n_scores.max(axis=0)
>>> best_n
array([91, 93, 90, 91, 96])
>>> best_l = l_scores.max(axis=0)
>>> best_l
array([92, 91, 92, 80, 87])
>>> np.maximum(best_n, best_l)
array([92, 93, 92, 91, 96])
As before, each call to
.max() returns an array of maximum scores for all the students in the relevant class, one element for each test. But this time, you’re feeding those returned arrays into the
maximum() function, which compares the two arrays and returns the higher score for each test across the arrays.
You can combine those operations into one by dispensing with the intermediate arrays, best_n and best_l:
>>> np.maximum(n_scores.max(axis=0), l_scores.max(axis=0))
array([92, 93, 92, 91, 96])
This gives the same result as before, but with less typing. You can choose whichever method you prefer.
Remember the temperatures_week_1 array from an earlier example? If you use a second week’s temperature records with the
maximum() function, you may spot a familiar problem.
First, you’ll create a new array to hold the new temperatures:
>>> temperatures_week_2 = np.array(
...     [7.3, 7.9, np.nan, 8.1, np.nan, np.nan, 10.2]
... )
There are missing values in the
temperatures_week_2 data, too. Now see what happens if you apply the
np.maximum() function to these two temperature arrays:
>>> np.maximum(temperatures_week_1, temperatures_week_2)
array([ 7.3,  7.9,  nan,  8.1,  nan,  nan, 10.2])
All the nan values in both arrays have popped up as missing values in the output. There’s a good reason for NumPy’s approach to propagating nan. Often it’s important for the integrity of your results that you keep track of the missing values, rather than brushing them under the rug. But here, you just want to get the best view of the weekly maximum values. The solution, in this case, is another NumPy package function, np.fmax():
>>> np.fmax(temperatures_week_1, temperatures_week_2)
array([ 7.3,  7.9,  8.1,  8.1,  9.2,  nan, 10.2])
Now, two of the missing values have simply been ignored, and the remaining floating-point value at that index has been taken as the maximum. But the Saturday temperature can’t be fixed in that way, because both source values are missing. Since there’s no reasonable value to insert here,
np.fmax() just leaves it as a nan.
Just as np.max() and np.nanmax() have the parallel minimum functions np.min() and np.nanmin(), so too do np.maximum() and np.fmax() have corresponding functions, np.minimum() and np.fmin(), that mirror their functionality for minimum values.
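As a quick check of that symmetry, the sketch below applies np.minimum() and np.fmin() to the same two weeks of temperature data. The results are converted to lists purely to make the printed output easy to read.

```python
import numpy as np

temperatures_week_1 = np.array([7.1, 7.7, 8.1, 8.0, 9.2, np.nan, 8.4])
temperatures_week_2 = np.array([7.3, 7.9, np.nan, 8.1, np.nan, np.nan, 10.2])

# np.minimum() propagates nan, just as np.maximum() does.
lows_strict = np.minimum(temperatures_week_1, temperatures_week_2)

# np.fmin() ignores a nan when the other value is a valid number.
lows_lenient = np.fmin(temperatures_week_1, temperatures_week_2)

print(lows_strict.tolist())   # [7.1, 7.7, nan, 8.0, nan, nan, 8.4]
print(lows_lenient.tolist())  # [7.1, 7.7, 8.1, 8.0, 9.2, nan, 8.4]
```

As with np.fmax(), the Saturday entry stays nan in both results because neither week has a valid reading for that day.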
You’ve now seen examples of all the basic use cases for NumPy’s max() and maximum(), plus a few related functions.
Now you’ll investigate some of the more obscure optional parameters to these functions and find out when they can be useful.
When you call a function in Python, a value or object is returned. You can use that result immediately by printing it or writing it to disk, or by feeding it directly into another function as an input parameter. You can also save it to a new variable for future reference.
If you call the function in the Python REPL but don’t use it in one of those ways, then the REPL prints out the return value on the console so that you’re aware that something has been returned. All of this is standard Python stuff, and not specific to NumPy.
NumPy’s array functions are designed to handle huge inputs, and they often produce huge outputs. If you call such a function many hundreds or thousands of times, then you’ll be allocating very large amounts of memory. This can slow your program down and, in an extreme case, might even exhaust the available memory.
This problem can be avoided by using the out parameter, which is available for both np.max() and np.maximum(), as well as for many other NumPy functions. The idea is to pre-allocate a suitable array to hold the function result, and keep reusing that same chunk of memory in subsequent calls.
You can revisit the temperature problem to create an example of using the out parameter with the np.maximum() function. You’ll also use the dtype parameter of np.empty() to control the type of the buffer, and therefore of the returned array:
>>> temperature_buffer = np.empty(7, dtype=np.float32)
>>> temperature_buffer.shape
(7,)
>>> np.maximum(temperatures_week_1, temperatures_week_2, out=temperature_buffer)
array([ 7.3,  7.9,  nan,  8.1,  nan,  nan, 10.2], dtype=float32)
The initial values in
temperature_buffer don’t matter, since they’ll be overwritten. But the array’s shape is important in that it must match the output shape. The displayed result looks like the output that you received from the original
np.maximum() example. So what’s changed? The difference is that you now have the same data stored in temperature_buffer:
>>> temperature_buffer
array([ 7.3,  7.9,  nan,  8.1,  nan,  nan, 10.2], dtype=float32)
The np.maximum() return value has been stored in the
temperature_buffer variable, which you previously created with the right shape to accept that return value. Since you also specified
dtype=np.float32 when you declared this buffer, NumPy will do its best to convert the output data to that type.
Remember to use the buffer contents before they’re overwritten by the next call to this function.
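Here’s a sketch of how buffer reuse might look across repeated calls. It keeps a running record of the highest temperature seen on each weekday, and writes every update back into the same array instead of allocating a new one. The three weeks of readings are invented for the illustration, and the names weeks and record_highs are hypothetical.

```python
import numpy as np

# Hypothetical readings for three complete weeks, Monday to Sunday.
weeks = [
    np.array([7.1, 7.7, 8.1, 8.0, 9.2, 8.8, 8.4]),
    np.array([7.3, 7.9, 8.3, 8.1, 9.0, 9.5, 10.2]),
    np.array([6.9, 8.2, 8.0, 7.5, 9.4, 9.1, 9.9]),
]

# Start below any possible reading, then reuse this one array throughout.
record_highs = np.full(7, -np.inf)

for week in weeks:
    result = np.maximum(record_highs, week, out=record_highs)
    # With out= supplied, the function returns the very same array object.
    assert result is record_highs

print(record_highs.tolist())  # [7.3, 8.2, 8.3, 8.1, 9.4, 9.5, 10.2]
```

No matter how many weeks you process, only one seven-element result array is ever allocated.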
Another parameter that’s occasionally useful is
where. This applies a filter to the input array or arrays, so that only those values for which the
where condition is
True will be included in the comparison. The other values will be ignored, and the corresponding elements of the output array will be left unaltered. In most cases, this will leave them holding arbitrary values.
For the sake of the example, suppose you’ve decided, for whatever reason, to ignore all scores less than
60 for calculating the per-student maximum values in Professor Newton’s class. Your first attempt might go like this:
>>> n_scores
array([[63, 72, 75, 51, 83],
       [44, 53, 57, 56, 48],
       [71, 77, 82, 91, 76],
       [67, 56, 82, 33, 74],
       [64, 76, 72, 63, 76],
       [47, 56, 49, 53, 42],
       [91, 93, 90, 88, 96],
       [61, 56, 77, 74, 74]])
>>> n_scores.max(axis=1, where=(n_scores >= 60))
Traceback (most recent call last):
  ...
ValueError: reduction operation 'maximum' does not have an identity, so to use a where mask one has to specify 'initial'
The problem here is that NumPy doesn’t know what to do with the students in rows 1 and 5, who didn’t achieve a single test score of 60 or better. The solution is to provide an initial parameter:
>>> n_scores.max(axis=1, where=(n_scores >= 60), initial=60)
array([83, 60, 91, 82, 76, 60, 96, 77])
With the two new parameters,
n_scores.max() considers only the elements greater than or equal to
60. For the rows where there is no such element, it returns the
initial value of
60 instead. So the lucky students at indices 1 and 5 got their best score boosted to 60 by this operation! The original n_scores array is untouched.
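If you’d rather flag those students than boost them, one option is to pass an initial value that can’t occur as a real score and then look for it in the result. The sketch below uses -1 as that sentinel. The choice of -1 and the variable names are assumptions made for this example.

```python
import numpy as np

n_scores = np.array([
    [63, 72, 75, 51, 83],
    [44, 53, 57, 56, 48],
    [71, 77, 82, 91, 76],
    [67, 56, 82, 33, 74],
    [64, 76, 72, 63, 76],
    [47, 56, 49, 53, 42],
    [91, 93, 90, 88, 96],
    [61, 56, 77, 74, 74],
])

# -1 can never be a real score, so it marks rows with no score of 60 or more.
best = n_scores.max(axis=1, where=(n_scores >= 60), initial=-1)
no_qualifying_score = np.flatnonzero(best == -1)

print(best.tolist())                 # [83, -1, 91, 82, 76, -1, 96, 77]
print(no_qualifying_score.tolist())  # [1, 5]
```

Keep in mind that the initial value takes part in the comparison, so it should be no larger than any result you want to preserve.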
You’ve learned how to use
np.maximum() to compare arrays with identical shapes. But it turns out that this function, along with many others in the NumPy library, is much more versatile than that. NumPy has a concept called broadcasting that provides a very useful extension to the behavior of most functions involving two arrays, including np.maximum().
Whenever you call a NumPy function that operates on two arrays, say A and B, it checks their .shape properties to see if they’re compatible. If they have exactly the same .shape, then NumPy just matches the arrays element by element, pairing up the element at A[i, j] with the element at B[i, j]. np.maximum() works like this too.
Broadcasting enables NumPy to operate on two arrays with different shapes, provided there’s still a sensible way to match up pairs of elements. The simplest example of this is to broadcast a single element over an entire array. You’ll explore broadcasting by continuing the example of Professor Newton and his linear algebra class. Suppose he asks you to ensure that none of his students receives a score below
75. Here’s how you might do it:
>>> np.maximum(n_scores, 75)
array([[75, 75, 75, 75, 83],
       [75, 75, 75, 75, 75],
       [75, 77, 82, 91, 76],
       [75, 75, 82, 75, 75],
       [75, 76, 75, 75, 76],
       [75, 75, 75, 75, 75],
       [91, 93, 90, 88, 96],
       [75, 75, 77, 75, 75]])
You’ve applied the np.maximum() function to two arguments: n_scores, whose .shape is (8, 5), and the single scalar parameter
75. You can think of this second parameter as a 1 × 1 array that’ll be stretched inside the function to cover eight rows and five columns. The stretched array can then be compared element by element with
n_scores, and the pairwise maximum can be returned for each element of the result.
The result is the same as if you had compared
n_scores with an array of its own shape, (8, 5), but with the value
75 in each element.
This stretching is just conceptual—NumPy is smart enough to do all this without actually creating the stretched array. So you get the notational convenience of this example without compromising efficiency.
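You can peek at this conceptual stretching with np.broadcast_to(), which builds a read-only view of the stretched array without copying any data. The sketch below uses a two-row excerpt of the scores to stay short. The names scores and stretched are illustrative.

```python
import numpy as np

# First two rows of the class scores, enough to show the idea.
scores = np.array([
    [63, 72, 75, 51, 83],
    [44, 53, 57, 56, 48],
])

# A view that looks like a full array of 75s but stores the value only once.
stretched = np.broadcast_to(75, scores.shape)

print(stretched.shape)    # (2, 5)
print(stretched.strides)  # (0, 0): every position points at the same stored value

# Comparing against the explicit view matches comparing against the scalar.
same = np.array_equal(np.maximum(scores, 75), np.maximum(scores, stretched))
print(same)               # True
```

The zero strides are the giveaway that no stretched copy was created in memory.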
You can do much more with broadcasting. Professor Leibniz has noticed Newton’s skulduggery with his
best_n_scores array, and decides to engage in a little data manipulation of her own.
Leibniz’s plan is to artificially boost all her students’ scores to be at least equal to the average score for a particular test. This will have the effect of increasing all the below-average scores—and thus produce some quite misleading results! How can you help the professor achieve her somewhat nefarious ends?
Your first step is to use the array’s .mean() method to create a one-dimensional array of means per test. Then you can use np.maximum() and broadcast this array over the entire l_scores matrix:
>>> mean_l_scores = l_scores.mean(axis=0, dtype=int)
>>> mean_l_scores
array([79, 68, 71, 69, 64])
>>> np.maximum(mean_l_scores, l_scores)
array([[87, 73, 71, 69, 67],
       [79, 68, 82, 80, 64],
       [92, 85, 71, 79, 77],
       [79, 79, 71, 69, 87],
       [86, 91, 92, 73, 64],
       [79, 68, 71, 79, 64],
       [83, 68, 71, 69, 64],
       [89, 68, 72, 69, 64]])
The broadcasting happens in the call to np.maximum(). The one-dimensional mean_l_scores array has been conceptually stretched to match the two-dimensional l_scores array. The output array has the same .shape as the larger of the two input arrays, l_scores.
So, what are the rules for broadcasting? A great many NumPy functions accept two array arguments.
np.maximum() is just one of these.
Arrays that can be used together in such functions are termed compatible, and their compatibility depends on the number and size of their dimensions, that is, on their .shape.
The simplest case occurs if the two arrays, say A and B, have identical shapes. Each element in A is matched, for the function’s purposes, to the element at the same index address in B.
Broadcasting rules get more interesting when A and B have different shapes. The elements of compatible arrays must somehow be unambiguously paired together so that each element of the larger array can interact with an element of the smaller array. In the examples that follow, the output array will have the .shape of the larger of the two input arrays. So compatible arrays must follow these rules:
- If one array has fewer dimensions than the other, only the trailing dimensions are matched for compatibility. The trailing dimensions are those that are present in the .shape of both arrays, counting from the right. So if A.shape is (99, 99, 2, 3) and B.shape is (2, 3), then A and B are compatible because (2, 3) are the trailing dimensions of each. You can completely ignore the two leftmost dimensions of A.
- Even if the trailing dimensions aren’t equal, the arrays are still compatible if one of those dimensions is equal to 1 in either array. So if A.shape is (99, 99, 2, 3) as before and B.shape is (1, 99, 1, 3) or (1, 2, 1) or (1, 1), then B is still compatible with A in each case.
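If you want to test these rules without building any arrays, np.broadcast_shapes() reports the shape that a broadcast would produce, or raises a ValueError if the shapes are incompatible. It’s available in NumPy 1.20 and later. The sketch below is just one way to experiment, and the helper name is_compatible() is made up for this example.

```python
import numpy as np

def is_compatible(shape_a, shape_b):
    """Return True if two shapes can be broadcast together (illustrative helper)."""
    try:
        np.broadcast_shapes(shape_a, shape_b)
        return True
    except ValueError:
        return False

a_shape = (99, 99, 2, 3)

print(np.broadcast_shapes(a_shape, (2, 3)))         # (99, 99, 2, 3)
print(np.broadcast_shapes(a_shape, (1, 99, 1, 3)))  # (99, 99, 2, 3)
print(np.broadcast_shapes(a_shape, (1, 2, 1)))      # (99, 99, 2, 3)
print(is_compatible((2, 3, 4), (2, 2, 4)))          # False

# Both inputs can be stretched at once: each output dimension is the larger one.
print(np.broadcast_shapes((3, 1), (1, 4)))          # (3, 4)
```

The last line shows a case that goes beyond the examples in this tutorial, where neither input has the shape of the result.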
You can get a feel for the broadcasting rules by playing around in the Python REPL. You’ll be creating some toy arrays to illustrate how broadcasting works and how the output array is generated:
>>> A = np.arange(24).reshape(2, 3, 4)
>>> A
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
>>> A.shape
(2, 3, 4)
>>> B = np.array(
...     [
...         [[-7, 11, 10, 2], [-6, 7, -2, 14], [7, 4, 4, -1]],
...         [[18, 5, 22, 7], [25, 8, 15, 24], [31, 15, 19, 24]],
...     ]
... )
>>> B.shape
(2, 3, 4)
>>> np.maximum(A, B)
array([[[ 0, 11, 10,  3],
        [ 4,  7,  6, 14],
        [ 8,  9, 10, 11]],

       [[18, 13, 22, 15],
        [25, 17, 18, 24],
        [31, 21, 22, 24]]])
There’s nothing really new to see here yet. You’ve created two arrays of identical
.shape and applied the
np.maximum() operation to them. Notice that the handy
.reshape() method lets you build arrays of any shape. You can verify that the result is the element-by-element maximum of the two inputs.
The fun starts when you experiment with comparing two arrays of different shapes. Try slicing
B to make a new array, C:
>>> C = B[:, :1, :]
>>> C
array([[[-7, 11, 10,  2]],

       [[18,  5, 22,  7]]])
>>> C.shape
(2, 1, 4)
>>> np.maximum(A, C)
array([[[ 0, 11, 10,  3],
        [ 4, 11, 10,  7],
        [ 8, 11, 10, 11]],

       [[18, 13, 22, 15],
        [18, 17, 22, 19],
        [20, 21, 22, 23]]])
The two arrays, A and C, are compatible because the new array’s second dimension is
1, and the other dimensions match.
Notice that the
.shape of the result of the
maximum() operation is the same as
A.shape. That’s because
C, the smaller array, is being broadcast over
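A. When one array is broadcast over another like this, the result has the .shape of the larger array. More generally, each dimension of the result is the larger of the two corresponding input dimensions.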
Now you can try an even more radical slicing of B:
>>> D = B[:, :1, :1]
>>> D
array([[[-7]],

       [[18]]])
>>> D.shape
(2, 1, 1)
>>> np.maximum(A, D)
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[18, 18, 18, 18],
        [18, 18, 18, 19],
        [20, 21, 22, 23]]])
Once again, the trailing dimensions of A and D are all either equal or 1, so the arrays are compatible and the broadcast works. The result has the same .shape as A.
Perhaps the most extreme type of broadcasting occurs when one of the array parameters is passed as a scalar:
>>> np.maximum(A, 10)
array([[[10, 10, 10, 10],
        [10, 10, 10, 10],
        [10, 10, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
NumPy automatically converts the second parameter, 10, to an ndarray, which you can think of as having .shape (1,), determines that this converted parameter is compatible with the first, and duly broadcasts it over the entire 2 × 3 × 4 array A.
Finally, here’s a case where broadcasting fails:
>>> E = B[:, 1:, :]
>>> E
array([[[-6,  7, -2, 14],
        [ 7,  4,  4, -1]],

       [[25,  8, 15, 24],
        [31, 15, 19, 24]]])
>>> E.shape
(2, 2, 4)
>>> np.maximum(A, E)
Traceback (most recent call last):
  ...
ValueError: operands could not be broadcast together with shapes (2,3,4) (2,2,4)
If you refer back to the broadcasting rules above, you’ll see the problem: the second dimensions of A and E don’t match, and neither is equal to
1, so the two arrays are incompatible.
You can read more about broadcasting in Look Ma, No for Loops: Array Programming With NumPy.
There’s also a good description of the rules in the NumPy docs.
The broadcasting rules can be confusing, so it’s a good idea to play around with some toy arrays until you get a feel for how it works!
In this tutorial, you’ve explored the NumPy library’s max() and maximum() operations to find the maximum values within or across arrays.
Here’s what you’ve learned:
- Why NumPy has its own max() function, and how you can use it
- How the maximum() function differs from max(), and when it’s needed
- Which practical applications exist for each function
- How you can handle missing data so your results make sense
- How you can apply your knowledge to the complementary task of finding minimum values
Along the way, you’ve learned or refreshed your knowledge of the basics of NumPy syntax. NumPy is a hugely popular library because of its powerful support for array operations.
Now that you’ve mastered the details of NumPy’s max() and maximum(), you’re ready to use them in your applications, or continue learning about more of the hundreds of array functions supported by NumPy.
If you’re interested in using NumPy for data science, then you’ll also want to investigate pandas, a very popular data-science library built on top of NumPy. You can learn about it in The Pandas DataFrame: Make Working With Data Delightful. And if you want to produce compelling images from data, take a look at Python Plotting With Matplotlib (Guide).
The applications of NumPy are limitless. Wherever your NumPy adventure takes you next, go forth and matrix-multiply!