# How to Use Conditional Expressions With NumPy where()

by Ian Eyre Sep 11, 2024

The NumPy `where()` function is a powerful tool for filtering array elements in lists, tuples, and NumPy arrays. It works by using a conditional predicate, similar to the logic used in the WHERE or HAVING clauses in SQL queries. It’s okay if you’re not familiar with SQL—you don’t need to know it to follow along with this tutorial.

You would typically use `np.where()` when you have an array and need to analyze its elements differently depending on their values. For example, you might need to replace negative numbers with zeros or replace missing values such as `None` or `np.nan` with something more meaningful. When you run `where()`, you’ll produce a new array containing the results of your analysis.

You generally supply three parameters when using `where()`. First, you provide a condition against which each element of your original array is matched. Then, you provide two additional parameters: the first defines what you want to do if an element matches your condition, while the second defines what you want to do if it doesn’t.
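As a minimal sketch of this three-parameter form, the snippet below replaces negative values with zero. The `readings` array and the `cleaned` name are made up for illustration and aren't part of the tutorial's data:

```python
import numpy as np

# Hypothetical sample data, used only for this illustration.
readings = np.array([4.0, -1.5, 0.0, -3.2, 2.5])

# Arguments: the condition, the value used where the condition is True,
# and the value used where the condition is False.
cleaned = np.where(readings < 0, 0.0, readings)

print(cleaned)
```

The original `readings` array is left untouched because `where()` builds a new array.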

If you think this all sounds similar to Python’s ternary operator, you’re correct. The logic is the same.
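To make the comparison concrete, here's a small sketch that applies the same rule twice: once with a ternary expression inside a list comprehension, and once with `where()`. The `values` list is hypothetical sample data:

```python
import numpy as np

values = [3, -7, 0, -2]  # Hypothetical sample data

# Plain Python: the ternary expression runs once per element.
flipped_python = [value * -1 if value < 0 else value for value in values]

# NumPy: the same rule applied to the whole array in one call.
array = np.array(values)
flipped_numpy = np.where(array < 0, array * -1, array)

print(flipped_python)  # [3, 7, 0, 2]
print(flipped_numpy.tolist())  # [3, 7, 0, 2]
```

One difference worth keeping in mind is that Python evaluates only one branch of a ternary expression, whereas both arrays passed to `where()` are computed in full before the function selects from them.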

Before you start, you should familiarize yourself with NumPy arrays and how to use them. It will also be helpful if you understand the subject of broadcasting, particularly for the latter part of this tutorial.

In addition, you may want to use the data analysis tool Jupyter Notebook as you work through the examples in this tutorial. Alternatively, JupyterLab will give you an enhanced notebook experience, but feel free to use any Python environment.

The NumPy library is not part of core Python, so you’ll need to install it. If you’re using a Jupyter Notebook, create a new code cell and type `!python -m pip install numpy` into it. When you run the cell, the library will install. If you’re working at the command line, use the same command, only without the exclamation point (!).

With these preliminaries out of the way, you’re now good to go.

Take the Quiz: Test your knowledge with our interactive “How to Use Conditional Expressions With NumPy where()” quiz. You’ll receive a score upon completion to help you track your learning progress:

Interactive Quiz

How to Use Conditional Expressions With NumPy where()

This quiz aims to test your understanding of the np.where() function. You won't find all the answers in the tutorial, so you'll need to do additional research. It's recommended that you make sure you can do all the exercises in the tutorial before tackling this quiz. Enjoy!

## How to Write Conditional Expressions With NumPy `where()`

One of the most common scenarios for using `where()` is when you need to replace certain elements in a NumPy array with other values depending on some condition.

Consider the following array:

``````python
>>> import numpy as np

>>> test_array = np.array(
...     [
...         [3.1688358, 3.9091694, 1.66405549, -3.61976783],
...         [7.33400434, -3.25797286, -9.65148913, -0.76115911],
...         [2.71053173, -6.02410179, 7.46355805, 1.30949485],
...     ]
... )
``````

To begin with, you need to import the NumPy library into your program. It’s standard practice to do so using the alias `np`, which allows you to refer to the library using this abbreviated form.

The resulting array has a shape of three rows and four columns, each containing a floating-point number.

Now suppose you wanted to replace all the negative numbers with their positive equivalents:

``````python
>>> np.where(
...     test_array < 0,
...     test_array * -1,
...     test_array,
... )
array([[3.1688358 , 3.9091694 , 1.66405549, 3.61976783],
       [7.33400434, 3.25797286, 9.65148913, 0.76115911],
       [2.71053173, 6.02410179, 7.46355805, 1.30949485]])
``````

The result is a new NumPy array with the negative numbers replaced by positives. Compare the original `test_array` with the corresponding elements of the resulting array, and you’ll see that the result is exactly what you wanted.

Before moving on to other use cases of `where()`, you’ll take a closer look at how this all works. To achieve your aim in the previous example, you passed in `test_array < 0` as the condition. In NumPy, this creates a Boolean array that `where()` uses:

``````python
>>> test_array < 0
array([[False, False, False,  True],
       [False,  True,  True,  True],
       [False,  True, False, False]])
``````

The Boolean array, often called the mask, consists only of elements that are either `True` or `False`. If an element matches the condition, the corresponding element in the Boolean array will be `True`. Otherwise, it’ll be `False`.

This mask array is always the same shape as the original array it’s based on, producing a one-to-one correspondence between the two. This means that the elements in the mask can be matched against the corresponding elements in the `test_array` to determine how the conditions are applied to each `test_array` element.
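You can check this one-to-one correspondence directly. The following sketch rebuilds `test_array` so that it runs on its own, and the name `mask` is only an illustrative choice:

```python
import numpy as np

test_array = np.array(
    [
        [3.1688358, 3.9091694, 1.66405549, -3.61976783],
        [7.33400434, -3.25797286, -9.65148913, -0.76115911],
        [2.71053173, -6.02410179, 7.46355805, 1.30949485],
    ]
)

# Illustrative name for the Boolean array produced by the condition.
mask = test_array < 0

print(mask.shape == test_array.shape)  # True: one mask element per array element
print(mask.dtype)  # bool
print(np.count_nonzero(mask))  # 5 elements match the condition
```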

To see this, take a look at the top-left element of `test_array`, which is `3.1688358`. Since this is not less than zero, the top-left element in the Boolean array is `False`. Conversely, the final element in the top row of `test_array` does match the condition because `-3.61976783` is indeed less than zero. The final element in the top row of the Boolean array is, therefore, `True`.

In this example, you want to apply `test_array * -1` to each element matching a `True` value of this Boolean array. Conversely, if the original element is zero or more, the original `test_array` element will be applied instead. In other words, it’ll remain unchanged.
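If it helps to see the selection spelled out, the sketch below writes the same logic as an explicit loop and compares it with `where()`. The loop is for illustration only and isn't how NumPy implements the function. The names `mask`, `looped`, and `vectorized` are assumptions made for this example:

```python
import numpy as np

test_array = np.array(
    [
        [3.1688358, 3.9091694, 1.66405549, -3.61976783],
        [7.33400434, -3.25797286, -9.65148913, -0.76115911],
        [2.71053173, -6.02410179, 7.46355805, 1.30949485],
    ]
)
mask = test_array < 0

# Element-by-element version of the selection, written out for illustration.
looped = np.empty_like(test_array)
for index, is_negative in np.ndenumerate(mask):
    if is_negative:
        looped[index] = test_array[index] * -1
    else:
        looped[index] = test_array[index]

# The same selection in a single call.
vectorized = np.where(mask, test_array * -1, test_array)

print(np.array_equal(looped, vectorized))  # True
```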

Take a careful look back at the original `test_array` and the resulting array. You’ll see that all negative elements from `test_array` have been replaced with their positive counterparts, while the original positive elements haven’t been changed. Had an element been `0`, it wouldn’t have changed either.

Congratulations! You’ve now written some code that demonstrates the basic use case of the `where()` function. If you’re ready for more, read on to learn how to use more complex conditions.

## How to Use Multiple Conditional Expressions

In the previous section, you successfully replaced all negative numbers with their positive counterparts. Suppose you wanted to do this only for values between -2 and 3, while leaving all others unchanged. To do this, you need to apply a more complex condition.

With your existing knowledge of `if-else`, you might be tempted to try something like this:

``````python
>>> import numpy as np

>>> test_array = np.array(
...     [
...         [3.1688358, 3.9091694, 1.66405549, -3.61976783],
...         [7.33400434, -3.25797286, -9.65148913, -0.76115911],
...         [2.71053173, -6.02410179, 7.46355805, 1.30949485],
...     ]
... )

>>> np.where(
...     (test_array > -2) and (test_array < 3),
...     test_array * -1,
...     test_array,
... )
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ValueError: The truth value of an array with more than one element is ambiguous.
``````

Instead of giving you an answer, your code has raised a `ValueError` exception and crashed. Not exactly what you’d hoped for.

This happened because the `and` operator can only work with individual truth values. It doesn’t understand arrays of values. When you write code such as `test_array > -2`, NumPy creates a Boolean array behind the scenes. While the `where()` function can cope with this as its condition parameter, using it with `and` raises the error.
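You can reproduce the underlying problem in isolation. To evaluate `and`, Python needs a single truth value for its operand, which amounts to calling `bool()` on the array. The short sketch below uses a hypothetical three-element `mask` to trigger the same exception:

```python
import numpy as np

mask = np.array([True, False, True])  # Hypothetical Boolean array

# Asking for one truth value from a multi-element array raises ValueError.
try:
    bool(mask)
except ValueError as error:
    message = str(error)
    print(message)
```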

The solution is to use the bitwise AND operator (`&`) instead. In NumPy, this operator is overloaded to do elementwise AND operations. It compares the values of both Boolean arrays element by element and returns a single Boolean array of the result. This result can then be understood by the `where()` function and safely applied to the original array.

The code below shows the steps required to produce such an array:

``````python
>>> test_array > -2
array([[ True,  True,  True, False],
       [ True, False, False,  True],
       [ True, False,  True,  True]])

>>> test_array < 3
array([[False, False,  True,  True],
       [False,  True,  True,  True],
       [ True,  True, False,  True]])

>>> (test_array > -2) & (test_array < 3)
array([[False, False,  True, False],
       [False, False, False,  True],
       [ True, False, False,  True]])
``````

As you already know, running `test_array > -2` and `test_array < 3` produces two Boolean arrays. This time, to compute the logical conjunction of both, you used the `&` operator, which has successfully created a third Boolean array based on the result of applying `&` against each pair of elements. The resulting array will contain `True` values if the corresponding elements in both Boolean arrays are `True`. All other values will be `False`.
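NumPy also provides the `np.logical_and()` function, which gives the same elementwise result for Boolean arrays. The following sketch, with `test_array` rebuilt so that it runs on its own, checks that both spellings agree. The variable names are illustrative:

```python
import numpy as np

test_array = np.array(
    [
        [3.1688358, 3.9091694, 1.66405549, -3.61976783],
        [7.33400434, -3.25797286, -9.65148913, -0.76115911],
        [2.71053173, -6.02410179, 7.46355805, 1.30949485],
    ]
)

# Operator form, as used in the tutorial.
with_operator = (test_array > -2) & (test_array < 3)

# Function form of the same elementwise AND.
with_function = np.logical_and(test_array > -2, test_array < 3)

print(np.array_equal(with_operator, with_function))  # True
print(np.count_nonzero(with_operator))  # 4 elements satisfy both conditions
```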

When this Boolean array is passed into `where()`, the result is far more palatable:

``````python
>>> np.where(
...     (test_array > -2) & (test_array < 3),
...     test_array * -1,
...     test_array,
... )
array([[ 3.1688358 ,  3.9091694 , -1.66405549, -3.61976783],
       [ 7.33400434, -3.25797286, -9.65148913,  0.76115911],
       [-2.71053173, -6.02410179,  7.46355805, -1.30949485]])
``````

This time, only values between `-2` and `3` have changed their sign. In other words, those values that are both greater than `-2` and less than `3` are replaced.

Knowing how to use multiple conditions and understanding how to use parentheses to control operator precedence allows you to create some really complex analysis conditions and unleash the real power of `where()`.
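To see why the parentheses matter, consider this sketch with a small hypothetical `sample` array. Because `&` binds more tightly than the comparison operators, leaving the parentheses out changes how Python groups the expression, and the attempt fails:

```python
import numpy as np

sample = np.array([-3.5, -1.0, 2.0, 4.5])  # Hypothetical sample data

# With parentheses, each comparison is evaluated before & combines them.
with_parentheses = (sample > -2) & (sample < 3)
print(with_parentheses.tolist())  # [False, True, True, False]

# Without parentheses, Python groups this as sample > (-2 & sample) < 3,
# which isn't a valid operation on an array of floats.
try:
    sample > -2 & sample < 3
    failed = False
except (TypeError, ValueError) as error:
    failed = True
    print(type(error).__name__)
```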

As another example, suppose you wanted to flip the signs in your original `test_array`, but only if the number is less than or equal to -2 or greater than or equal to 3:

``````python
>>> np.where(
...     (test_array <= -2) | (test_array >= 3),
...     test_array * -1,
...     test_array,
... )
array([[-3.1688358 , -3.9091694 ,  1.66405549,  3.61976783],
       [-7.33400434,  3.25797286,  9.65148913, -0.76115911],
       [ 2.71053173,  6.02410179, -7.46355805,  1.30949485]])
``````

Here, you’ve used the bitwise OR operator (`|`). Similarly to `&`, this operator has been overloaded to do elementwise OR operations. The expression `(test_array <= -2) | (test_array >= 3)` once again produces two Boolean arrays before using the `|` operator to combine them into one. This time, `True` will appear in the resulting Boolean array if, and only if, at least one of the corresponding elements is `True`. Attempting this with `or` will again produce a `ValueError` for the same reason `and` did earlier.
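The sketch below confirms that `or` fails on arrays. It also shows that this OR condition is the exact complement of the earlier AND condition, which you can express with NumPy's elementwise NOT operator (`~`). The variable names are illustrative, and `test_array` is rebuilt so that the code runs on its own:

```python
import numpy as np

test_array = np.array(
    [
        [3.1688358, 3.9091694, 1.66405549, -3.61976783],
        [7.33400434, -3.25797286, -9.65148913, -0.76115911],
        [2.71053173, -6.02410179, 7.46355805, 1.30949485],
    ]
)

# The or operator asks each array for a single truth value, which fails.
try:
    (test_array <= -2) or (test_array >= 3)
    or_failed = False
except ValueError:
    or_failed = True
print(or_failed)  # True

# The elementwise OR selects everything the earlier AND condition left out.
outside = (test_array <= -2) | (test_array >= 3)
inside = (test_array > -2) & (test_array < 3)
print(np.array_equal(outside, ~inside))  # True
```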

Take a careful look at both your original `test_array` and the resulting array, and you’ll see that the conversion has only been applied to numbers that fall outside the (-2, 3) interval.

It’s time to consolidate your learning with an exercise. Have a try at this:

Create a five-row by four-column array using the following code:

``````python
>>> question_1 = np.arange(-10, 10).reshape(5, 4)
``````

Your array will contain all numbers from `-10` to `9` in a sequence. Now, use it to solve the following challenges:

1. Use `where()` to create an array that replaces the elements in `question_1` with the number `9` if they are either negative or even. Before you run your code, see if you can predict how many nines there will be in the new array. Were you correct?

2. Next, use `where()` to create an array that has squared each negative odd number in `question_1`.

3. Finally, use `where()` to create an array to replace all elements in `question_1` that are between `3` and `7` or equal to `1`, with `-10`. For all other elements, subtract one from them. Oh, and do take care with operator precedence.

One possible solution for the first question is:

``````python
>>> np.where(
...     (question_1 < 0) | (question_1 % 2 == 0),
...     9,
...     question_1,
... )
array([[9, 9, 9, 9],
       [9, 9, 9, 9],
       [9, 9, 9, 1],
       [9, 3, 9, 5],
       [9, 7, 9, 9]])
``````

Here, you used the less than operator (`<`) to select negative numbers and the modulo operator (`%`) to select each even number. By combining the two conditions with the elementwise `|` operator, you selected values that matched either condition.

If you said there would be sixteen nines in the result, well done. If you said there would only be fifteen, well done on understanding how `where()` works. Unfortunately, you forgot to count the existing `9`.
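If you'd like to verify both counts rather than tally them by eye, one possible check uses `np.count_nonzero()`. The names `condition`, `result`, `replaced`, and `nines` are chosen for this sketch only:

```python
import numpy as np

question_1 = np.arange(-10, 10).reshape(5, 4)

condition = (question_1 < 0) | (question_1 % 2 == 0)
result = np.where(condition, 9, question_1)

# Number of elements that where() replaced.
replaced = np.count_nonzero(condition)

# Number of nines in the result, including the 9 that was already there.
nines = np.count_nonzero(result == 9)

print(replaced)  # 15
print(nines)  # 16
```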

One possible solution for the second question is:

``````python
>>> np.where(
...     (question_1 < 0) & (question_1 % 2 != 0),
...     np.square(question_1),
...     question_1,
... )
array([[-10,  81,  -8,  49],
       [ -6,  25,  -4,   9],
       [ -2,   1,   0,   1],
       [  2,   3,   4,   5],
       [  6,   7,   8,   9]])
``````

This time, you used the less than operator (`<`) to select negative numbers and the modulo operator (`%`) to select each odd number. By combining the two conditions with the `&` operator, you selected values that matched both conditions. The `np.square()` function did the squaring of the selected elements for you.

One possible solution for the third question is:

``````python
>>> np.where(
...     ((question_1 > 3) & (question_1 < 7)) | (question_1 == 1),
...     -10,
...     question_1 - 1,
... )
array([[-11, -10,  -9,  -8],
       [ -7,  -6,  -5,  -4],
       [ -3,  -2,  -1, -10],
       [  1,   2, -10, -10],
       [-10,   6,   7,   8]])
``````

You used the `&` operator to filter values between `3` and `7`. Then, you took the resulting Boolean array and used the `|` operator to include values that equal `1`.
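On the subject of operator precedence, `&` binds more tightly than `|`, so the outer parentheses around the `&` group make the intent explicit rather than changing the result. Moving them, however, does change it. The sketch below compares three groupings, using illustrative variable names:

```python
import numpy as np

question_1 = np.arange(-10, 10).reshape(5, 4)

# Grouping used in the solution.
explicit = ((question_1 > 3) & (question_1 < 7)) | (question_1 == 1)

# Same grouping, relying on & binding more tightly than |.
implicit = (question_1 > 3) & (question_1 < 7) | (question_1 == 1)

# A different grouping: the element equal to 1 is no longer selected.
regrouped = (question_1 > 3) & ((question_1 < 7) | (question_1 == 1))

print(np.array_equal(explicit, implicit))  # True
print(np.array_equal(explicit, regrouped))  # False
```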

With that workout complete, it’s time for you to move on and learn how to perform array broadcasting conditionally.

## How to Use Array Broadcasting in Conditional Expressions

In the examples you’ve seen so far, the replacement values have been calculated from the existing array’s elements. While this is a very common use case for `where()`, you can also use the function to replace elements in an array with those from other arrays, depending on the result of your condition.

To make this possible, the arrays you supply as replacement values must be broadcast compatible with the Boolean array produced by your condition, and therefore with the original array whose values you want to replace. Broadcasting allows you to perform operations between arrays with different shapes without having to write complicated loops.

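Two arrays are broadcast compatible if, when you compare their shapes dimension by dimension starting from the rightmost, each pair of dimensions is either identical or contains a 1. Any missing leading dimensions are treated as 1. Once your arrays are broadcast compatible, you can use them together with the `where()` function.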
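If you want to check compatibility without building any arrays, one option is `np.broadcast_shapes()`, which is available in NumPy 1.20 and later. The shapes below are arbitrary examples chosen for this sketch:

```python
import numpy as np

# Compatible shapes: the function returns the shape of the broadcast result.
print(np.broadcast_shapes((4, 3), (1, 3)))  # (4, 3)
print(np.broadcast_shapes((4, 3), (3,)))  # (4, 3)
print(np.broadcast_shapes((4, 3), ()))  # (4, 3), where () is the shape of a single number

# Incompatible shapes: the rightmost dimensions are 3 and 4, and neither is 1.
try:
    np.broadcast_shapes((4, 3), (4,))
    compatible = True
except ValueError:
    compatible = False
print(compatible)  # False
```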

As an example, suppose you have the following array:

``````python
>>> booking_data = np.array(
...     [
...         [np.nan, np.nan, 1],
...         [1, 1, np.nan],
...         [1, np.nan, 1],
...         [1, 1, 1],
...     ]
... )
``````

Next, imagine that your `booking_data` array contains the details of meal reservations for a hotel. Each row represents a separate guest, while each column represents menu requirements. You use a `1` in the leftmost column to represent a breakfast request, a `1` in the center column to represent a lunch request, and a `1` in the rightmost column to represent an evening meal request. `np.nan` indicates that the meal hasn’t been requested.

Your array contains four rows and three columns. You can confirm this with the `booking_data` array’s `.shape` attribute:

``````python
>>> booking_data.shape
(4, 3)
``````

Now consider this array:

``````python
>>> meal_prices = np.array([5.1, 8.2, 20.3]).reshape(1, 3)
>>> no_charge = 0
``````

The `meal_prices` array contains one row and three columns of price information. The price of breakfast is \$5.10, lunch is \$8.20, and an evening meal is \$20.30. The array shape this time is (1, 3).

The point to note here is that `booking_data` and `meal_prices` are broadcast compatible because their rightmost dimensions of `3` are identical and the remaining dimension of `meal_prices` is `1`. This allows you to replace elements in one array with those from the other.

You’ve also created a `no_charge` variable and assigned it a value of `0`. Although this is a single number and not an array, single numbers are broadcastable across any size of array. In other words, they are always broadcast compatible.
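To picture what broadcasting does with these values, you can stretch them to the shape of `booking_data` yourself using `np.broadcast_to()`. This is only an illustration of the idea, since `where()` handles the broadcasting for you. The names `stretched_prices` and `stretched_zero` are assumptions made for this sketch:

```python
import numpy as np

meal_prices = np.array([5.1, 8.2, 20.3]).reshape(1, 3)
no_charge = 0

# The single row of prices is repeated for each of the four guests.
stretched_prices = np.broadcast_to(meal_prices, (4, 3))
print(stretched_prices)

# The single number is repeated across every position.
stretched_zero = np.broadcast_to(no_charge, (4, 3))
print(stretched_zero)
```

The arrays returned by `np.broadcast_to()` are read-only views, so no data is actually copied.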

Now, suppose you want to clean up your `booking_data` array by creating a new `booking_prices` array that replaces each `1` with its corresponding price and each `np.nan` with a `0`. The `where()` function can do this for you:

``````python
>>> booking_prices = np.where(booking_data == 1, meal_prices, no_charge)
>>> booking_prices
array([[ 0. ,  0. , 20.3],
       [ 5.1,  8.2,  0. ],
       [ 5.1,  0. , 20.3],
       [ 5.1,  8.2, 20.3]])
``````

As you can see, wherever `booking_data == 1` is `True`, the corresponding element from `meal_prices` has been inserted into `booking_prices`. Otherwise, a `0` has been inserted.

Although this is certainly a powerful use of `where()`, the principles here are the same as in earlier use cases. The `booking_data == 1` parameter created a Boolean array. In cases where an element in this Boolean array is `True`, the corresponding element from the `meal_prices` array is used in the result. Where an element is `False`, the value of `no_charge`, or `0`, is used instead.
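The condition above tests for `1` rather than for `np.nan`, and there's a good reason to avoid the latter: `np.nan` never compares equal to anything, including itself. If you'd rather test for the missing values directly, one possible alternative is `np.isnan()` with the two replacement arguments swapped. The names `by_equality` and `by_isnan` are illustrative:

```python
import numpy as np

booking_data = np.array(
    [
        [np.nan, np.nan, 1],
        [1, 1, np.nan],
        [1, np.nan, 1],
        [1, 1, 1],
    ]
)
meal_prices = np.array([5.1, 8.2, 20.3]).reshape(1, 3)
no_charge = 0

# Comparing against np.nan with == matches nothing at all.
print(np.count_nonzero(booking_data == np.nan))  # 0

# Approach used in the tutorial: test for the requested meals.
by_equality = np.where(booking_data == 1, meal_prices, no_charge)

# Alternative sketch: test for the missing values and swap the branches.
by_isnan = np.where(np.isnan(booking_data), no_charge, meal_prices)

print(np.array_equal(by_equality, by_isnan))  # True
```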

You may have noticed that the inserted value of `no_charge` is a `float`. This is because `meal_prices` already contains floats, so NumPy automatically upcasts the integer `0` to a `float` to keep the resulting array homogeneous.
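You can observe this type promotion by inspecting the `.dtype` attribute of the result. The arrays in this sketch are hypothetical and exist only to contrast an all-integer call with a mixed one:

```python
import numpy as np

condition = np.array([True, False, True])
integers = np.array([1, 2, 3])
floats = np.array([0.5, 1.5, 2.5])

# Both replacement arrays hold integers, so the result stays an integer array.
both_integers = np.where(condition, integers, integers * 10)
print(both_integers.dtype)  # An integer type, such as int64 on most platforms

# One replacement array holds floats, so the whole result becomes float64.
mixed = np.where(condition, integers, floats)
print(mixed.dtype)  # float64
print(mixed.tolist())  # [1.0, 1.5, 3.0]
```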

Time for another workout:

Create a `question_2` array using this code:

``````python
>>> question_2 = np.arange(12).reshape(3, 4)
``````

Next, create two variables—`high` and `low`—and assign them strings as shown:

``````python
>>> high = "HIGH"
>>> low = "LOW"
``````

Now, use the `where()` function to replace all numbers greater than `6` with the string “HIGH”, and everything else with “LOW” using the following three techniques:

1. Use the `question_2`, `high`, and `low` variables as defined above.

2. Assign new `high` and `low` variables with arrays that contain the strings “HIGH” and “LOW”, respectively.

3. As an extra challenge, see if you can make both of these arrays different shapes, but still broadcast compatible with `question_2`.

In each case, the result should be identical.

One possible solution for the first question is:

``````python
>>> question_2 = np.arange(12).reshape(3, 4)  # Shape (3, 4)

>>> high = "HIGH"
>>> low = "LOW"

>>> np.where(question_2 > 6, high, low)
array([['LOW', 'LOW', 'LOW', 'LOW'],
       ['LOW', 'LOW', 'LOW', 'HIGH'],
       ['HIGH', 'HIGH', 'HIGH', 'HIGH']], dtype='<U4')
``````

One possible solution for the second question is:

``````python
>>> question_2 = np.arange(12).reshape(3, 4)  # Shape (3, 4)

>>> high = np.array(["HIGH"])  # Shape (1,)
>>> low = np.array(["LOW"])  # Shape (1,)

>>> np.where(question_2 > 6, high, low)
array([['LOW', 'LOW', 'LOW', 'LOW'],
       ['LOW', 'LOW', 'LOW', 'HIGH'],
       ['HIGH', 'HIGH', 'HIGH', 'HIGH']], dtype='<U4')
``````

One possible solution for the third question is:

``````python
>>> question_2 = np.arange(12).reshape(3, 4)  # Shape (3, 4)

>>> high = np.array(["HIGH", "HIGH", "HIGH", "HIGH"])  # Shape (4,)
>>> low = np.array(["LOW"])  # Shape (1,)

>>> np.where(question_2 > 6, high, low)
array([['LOW', 'LOW', 'LOW', 'LOW'],
       ['LOW', 'LOW', 'LOW', 'HIGH'],
       ['HIGH', 'HIGH', 'HIGH', 'HIGH']], dtype='<U4')
``````

In this last solution, you could have swapped the shapes of `high` and `low` around.
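Other shape combinations work as well. As a further sketch, the version below uses a column-shaped `high` array of shape `(3, 1)` alongside a `low` array of shape `(4,)`. Building `high` with `np.full()` is just one convenient choice, and the `labels` name is illustrative:

```python
import numpy as np

question_2 = np.arange(12).reshape(3, 4)  # Shape (3, 4)

high = np.full((3, 1), "HIGH")  # Shape (3, 1): one value per row
low = np.array(["LOW", "LOW", "LOW", "LOW"])  # Shape (4,): one value per column

labels = np.where(question_2 > 6, high, low)

print(labels.shape)  # (3, 4)
print(labels.tolist())
```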

To finish off, you’ll see what is effectively the simplest use case of `where()`. You’ll also learn the importance of reading documentation carefully to highlight such use cases.

## How Not to Use np.where() - A Final Quirk

When you read the official documentation for `where()`, the definition of the function may make it look a little more complicated than it is:

``````
numpy.where(condition, [x, y, ]/)
``````

As with all Python documentation, it’s tempting to skip over this information and look at some examples instead. However, if you take some time to read it, you’ll gain a better understanding of the different ways the function can be used.

First of all, the definition ends with a forward slash (/) character. You might think this represents a division or line continuation symbol, but it’s neither. By placing the forward slash special parameter at the end, the documentation is telling you that each parameter passed must be passed by position and not by keyword.

The first parameter is the condition the elements are tested against, while the second and third parameters, formally documented as `x` and `y`, define the true or false actions to be taken depending on the result of the condition. However, because all parameters are positional-only, you can’t pass them by keyword using these names.

You may also notice that the `x` and `y` parameters are encased in square brackets. You could be forgiven for thinking this is telling you to supply these parameters as a Python list. In fact, the square brackets here indicate that both `x` and `y` are optional. You should also note that you can’t pass only one of them.
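You can confirm both restrictions with a quick experiment. The `try_call()` helper below is hypothetical and exists only to record which exception, if any, a call raises. The exact exception types and messages may differ between NumPy versions, so the sketch catches both `TypeError` and `ValueError`:

```python
import numpy as np

condition = np.array([True, False, True])


def try_call(call):
    """Return the name of the exception raised by call(), or None."""
    try:
        call()
    except (TypeError, ValueError) as error:
        return type(error).__name__
    return None


# Passing x without y isn't accepted.
only_x = try_call(lambda: np.where(condition, 1))

# Passing x and y by keyword isn't accepted either.
by_keyword = try_call(lambda: np.where(condition, x=1, y=0))

# Passing all three arguments by position works as usual.
by_position = try_call(lambda: np.where(condition, 1, 0))

print(only_x, by_keyword, by_position)
```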

In this tutorial, you’ve always used three parameters because this is the most common approach. However, now that you know only the first parameter is mandatory, you may be wondering what happens if you omit the other two. To find out, take a look at the code shown below:

``````python
>>> import numpy as np

>>> mostly_zeroes = np.array(
...     [[9, 0, 0],
...      [0, 8, 5],
...      [0, 0, 7]])
>>> np.where(mostly_zeroes != 0)
(array([0, 1, 1, 2]), array([0, 1, 2, 2]))
``````

If you provide the `where()` function with only a `condition` parameter, it’ll return a Python tuple containing arrays of the indices of those elements for which the condition is `True`, or non-zero. There will be one array for each dimension. This is why two arrays are returned in the above example: `mostly_zeroes` has two dimensions, with a shape of `(3, 3)`.

This somewhat confusing output tells you that the elements at positions (0, 0), (1, 1), (1, 2), and (2, 2) are all non-zero. In other words, they correspond to `True` values in the underlying Boolean array produced by the condition. The other elements are zero.
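If the tuple of index arrays is hard to read, you can unpack it into row and column indices and pair them up. You can also use the same indices to pull out the matching values. The names `rows`, `columns`, `coordinates`, and `values` are chosen for this sketch:

```python
import numpy as np

mostly_zeroes = np.array(
    [[9, 0, 0],
     [0, 8, 5],
     [0, 0, 7]])

# One index array per dimension: row indices first, then column indices.
rows, columns = np.where(mostly_zeroes != 0)

# Pair the indices up into (row, column) positions.
coordinates = list(zip(rows.tolist(), columns.tolist()))
print(coordinates)  # [(0, 0), (1, 1), (1, 2), (2, 2)]

# Use the index arrays to select the non-zero values themselves.
values = mostly_zeroes[rows, columns]
print(values.tolist())  # [9, 8, 5, 7]
```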

This is extremely useful for highlighting non-zero elements in a data analysis.

The documentation doesn’t recommend using `where()` this way, but instead advises you to use the `nonzero()` function directly:

``````python
>>> np.nonzero(mostly_zeroes)
(array([0, 1, 1, 2]), array([0, 1, 2, 2]))
``````

The result is identical to the previous example because passing only a `condition` argument to `where()` results in a call to `nonzero()` behind the scenes. There’s little point in using `where()` to do this because it only adds overhead to the `nonzero()` call. You can also use `nonzero()` to find the indices of other conditions:

``````python
>>> np.nonzero(mostly_zeroes == 5)
(array([1]), array([2]))
``````

In this case, only the element at (1, 2) is equal to five. This works because, as you’ve seen earlier, the condition `mostly_zeroes == 5` is interpreted as a Boolean array. Then, in that mask `True` is interpreted as `1` and `False` as `0`. In other words, all elements satisfying the condition are non-zero.
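To see this interpretation for yourself, you can convert the mask to integers. The sketch below rebuilds `mostly_zeroes` so that it runs on its own, and the names `mask`, `as_integers`, and `indices` are illustrative:

```python
import numpy as np

mostly_zeroes = np.array(
    [[9, 0, 0],
     [0, 8, 5],
     [0, 0, 7]])

mask = mostly_zeroes == 5

# True becomes 1 and False becomes 0.
as_integers = mask.astype(int)
print(as_integers.tolist())  # [[0, 0, 0], [0, 0, 1], [0, 0, 0]]

# The only non-zero entry of the mask is at row 1, column 2.
indices = np.nonzero(mask)
print(indices[0].tolist(), indices[1].tolist())  # [1] [2]
```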

## Conclusion

You now have a comprehensive understanding of how to use NumPy’s `where()` function, its parameters, and how they’re used to perform tasks on array elements depending on the value of those elements.

Congratulations on completing this tutorial, and enjoy applying these newfound skills to your future data analysis projects!


Ian is an avid Pythonista and Real Python contributor who loves to learn and teach others.
