# Using the NumPy Random Number Generator

by Ian Eyre Jun 05, 2023

Random numbers are a very useful feature in many different types of programs, from mathematics and data analysis through to computer games and encryption applications. You may be surprised to learn that it’s actually quite difficult to get a computer to generate true randomness. However, if you’re careful, the NumPy random number generator can generate random enough numbers for everyday purposes.

Maybe you’ve already worked with randomly generated data in Python. While modules like `random` are great options for producing random scalars, using the `numpy.random` module will unlock even more possibilities for you.

In this tutorial, you’ll learn how to:

• Generate NumPy arrays of random numbers
• Randomize NumPy arrays
• Randomly select parts of NumPy arrays
• Take random samples from statistical distributions

Before starting this tutorial, you should understand the basics of NumPy arrays. With that knowledge, you’re ready to dive in.

## Understanding the NumPy Pseudo-Random Number Generator

When you ask a computer to perform any task for you, it does so by following a set of instructions defined by an algorithm. When you need it to generate random numbers, the computer uses a pseudo-random number generator (PRNG) algorithm. There are several of these available, some of which are better than others.

Python’s standard library provides the `random` module, which generates numbers using the Mersenne Twister algorithm. While this is still widely used in Python code, it’s possible to predict the numbers that it generates, and it requires significant computing power.

Since version 1.17, NumPy uses the more efficient permuted congruential generator-64 (PCG64) algorithm. This produces less-predictable numbers, as shown by its performance in the industry-standard TestU01 statistical test. PCG64 is also faster and requires fewer resources to work.

PRNGs are called pseudo-random because they’re not random! PRNGs are deterministic, which means they generate sequences of numbers that are reproducible. PRNGs require a seed number to initialize their number generation. PRNGs that use the same seed will generate the same numbers.

PRNGs also have a period property, which is the number of iterations they go through before they start repeating. Because the generated numbers depend on the seed, they’re not truly random but are instead pseudo-random.

Because seeds should be random, you need one random number to generate another. For this purpose, when you don’t supply a seed, NumPy gathers fresh entropy from the operating system. This means that running number generators consecutively results in different seed values and therefore different sequences of random numbers. NumPy also uses a hashing technique to ensure that the seed is 128 bits long, even if you only supply a 64-bit integer.

The period does mean that the same numbers could reappear. In practice, this isn’t a concern because the period lengths are huge. The period of PCG64, for example, is about 50 billion times the number of atoms that exist inside of you!

The core of NumPy’s number generation is the `BitGenerator` class. This class allows you to specify an algorithm and seed. To access the random numbers, the `BitGenerator` is passed into a separate `Generator` object. Generators have methods that allow you to access a range of random numbers and perform several randomizing operations. The `numpy.random` module provides this capability.
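As a rough sketch of how these two pieces fit together, the following wires a `PCG64` bit generator into a `Generator` by hand. The seed value `42` is an arbitrary choice for illustration:

```python
import numpy as np
from numpy.random import Generator, PCG64

# Build the bit generator first: it fixes the algorithm (PCG64) and the seed.
bit_generator = PCG64(seed=42)

# Wrap it in a Generator, which exposes the methods that hand out numbers.
rng = Generator(bit_generator)

print(type(rng.bit_generator).__name__)  # PCG64
print(rng.random())  # A float in [0.0, 1.0)
```

In everyday code, you rarely need to build these objects separately, but doing so makes the division of labor between the two classes easier to see.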

You may have noticed that the `numpy.random` documentation also contains information about the `RandomState` class. This is a container class for the slower Mersenne Twister PRNG. The more modern `Generator` class has now superseded `RandomState`, which you should no longer use in new code. However, `RandomState` is still around for existing legacy applications.

Before you go any further, be aware that the NumPy PRNGs are not suitable for cryptographic purposes. They’re only suitable for data analysis tasks. If you need random numbers for cryptographic purposes, then you need a cryptographically secure pseudo-random number generator (CSPRNG).
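If you do need security-sensitive values, one option outside NumPy is Python’s standard-library `secrets` module. The sketch below is only an illustration, and the variable names `token` and `pin` are made up for this example:

```python
import secrets

# secrets draws from the operating system's CSPRNG rather than a seedable PRNG.
token = secrets.token_hex(16)    # 16 random bytes as 32 hexadecimal characters
pin = secrets.randbelow(10_000)  # An integer in the range [0, 10000)

print(token)
print(f"{pin:04d}")
```

Notice that there’s no seed parameter here. Reproducibility is exactly what you don’t want for passwords, tokens, or keys.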

## Generating Random Data With the NumPy Random Number Generator

Now that you understand a computer’s capabilities for generating random numbers, in this section, you’ll learn how to generate both floating-point numbers and integers randomly using NumPy. After generating individual numbers, you’ll learn how to generate NumPy arrays of random numbers.

### Random Numbers

If you’re happy to let NumPy perform all of your random number generation work for you, you can use its default values. In other words, your `BitGenerator` will use PCG64 with a seed drawn from your operating system’s entropy source. To facilitate the defaults, NumPy provides a very handy `default_rng()` function. This sets everything up for you and returns a reference to a `Generator` object for you to use to produce random numbers using its range of powerful methods.

To begin with, this code generates a floating-point number using NumPy’s defaults:

``````
>>> import numpy as np

>>> default_rng = np.random.default_rng()
>>> default_rng
Generator(PCG64) at 0x1E9F2ABBF20

>>> default_rng.random()
0.47418635476614734
``````

As you can see, `BitGenerator` uses PCG64. To actually generate a pseudo-random number, you call the generator’s `.random()` method. To satisfy yourself that the code is indeed generating a random number, run it several times and notice that you get a different number each time. Remember, this is because the seed value passed will be different.

By default, `Generator.random()` returns a 64-bit float in the half-open interval [0.0, 1.0). This notation is used to define a number range. The [ is the closed parameter and indicates inclusivity. In this example, 0.0 could be one of the numbers randomly generated. The ) is the open parameter and indicates the value 1.0 is just beyond what could be generated. In other words, [0.0, 1.0) defines the range 0.0 ≤ x < 1.0.

Earlier you learned how passing a seed value determines the sequence of random numbers generated. Recall that passing identical seed values into separate `BitGenerator` objects forces them to produce the same output. In the following example, you’ll see this for yourself.

In the code snippet below, you seed two separate identical `Generator` objects, both with a seed value of `100`:

``````
>>> rng1 = np.random.default_rng(seed=100)
>>> rng1.random()
0.7852902058808499
>>> rng1.random()
0.7142492625022044

>>> rng2 = np.random.default_rng(seed=100)
>>> rng2.random()
0.7852902058808499
>>> rng2.random()
0.7142492625022044
``````

As expected, each `Generator` has generated two numbers, but they’re actually pseudo-random numbers! As you can see, the results give you a feeling of déjà vu. Seeding `Generator` objects identically always produces identical results, provided that they use the same algorithm. It doesn’t matter if both use the default PCG64 or both use the updated PCG64DXSM. In either case, the two generators would match each other.

The `random()` method also includes a `dtype` parameter, which is rarely used. By default, this is set to `np.float64`, which generates 64-bit floats. If you set this parameter to `np.float32`, you can generate 32-bit floats instead.
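To see the effect of `dtype`, you can compare the data types of two small arrays. This sketch uses the `size` parameter, which the tutorial covers later, because an array reports its `dtype` directly:

```python
import numpy as np

rng = np.random.default_rng()

# The default produces 64-bit floats.
doubles = rng.random(size=3)

# Passing dtype=np.float32 halves the storage needed for each number.
singles = rng.random(size=3, dtype=np.float32)

print(doubles.dtype)  # float64
print(singles.dtype)  # float32
```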

### Random Floating-Point Numbers

You already know the `.random()` method will happily generate random floating-point numbers in the range [0.0, 1.0). Suppose you want to specify your own range. Unfortunately, this isn’t directly possible with `.random()`, unless you start adding some arithmetic to its output. To specify a range of floats, you can use the `.uniform()` method.

The `.uniform()` method’s signature is `uniform(low=0.0, high=1.0, size=None)`. As you can probably guess from these defaults, it generates a floating-point number in the same way that `.random()` does. However, unlike with `.random()`, you can optionally specify your own low and high parameters. The `.uniform()` method also contains a `size` parameter. You’ll learn more about this when you learn to generate random NumPy arrays later.

In its most basic form, you can call the `.uniform()` method with no parameters:

``````
>>> import numpy as np
>>> rng = np.random.default_rng()

>>> rng.uniform()
0.5425301829704396
``````

When you run the above code, `.uniform()` will generate a single random floating-point number in the range [0, 1). In other words, the lowest number will be 0, while the largest number will be just less than 1. Should you want a number from 0 to just before 10, for example, you’d need to multiply the output by 10. However, this is limited because the lower bound will always be zero, and the intent of the code is less clear.

A far better way is to utilize the real power of `.uniform()` by passing in its `low` and `high` parameters:

``````
>>> rng.uniform(low=3.4, high=5.6)
4.656018709365851
``````

Here, because you set `low=3.4` and `high=5.6`, the `.uniform()` method generates another single float, but this time in the range [3.4, 5.6). Again, remember, 3.4 is inclusive, while 5.6 is exclusive.

You may wonder why a method used to generate floats is called `.uniform()`. The `.uniform()` method draws its numbers randomly from a uniform probability distribution. A uniform probability distribution means each of the values in the specified range [low, high) has an equal chance of being chosen. You’ll see later that this isn’t the only type of probability distribution NumPy supports.
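For comparison, here’s a sketch of the arithmetic that you’d otherwise have to write yourself around `.random()`, next to the equivalent `.uniform()` call. The variable names are illustrative, and the `size` parameter used to draw many values at once is explained later in the tutorial:

```python
import numpy as np

rng = np.random.default_rng()
low, high = 3.4, 5.6

# Manual approach: stretch [0.0, 1.0) to the desired width, then shift it.
manual = low + (high - low) * rng.random(size=1_000)

# Built-in approach: let .uniform() handle the scaling for you.
built_in = rng.uniform(low=low, high=high, size=1_000)

print(manual.min() >= low, manual.max() <= high)
print(built_in.min() >= low, built_in.max() <= high)
```

Both approaches draw from the same range, but the `.uniform()` call states the bounds directly, which makes the intent easier to read.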

### Random Integer Numbers

If you need to, you can also generate random integers. You do this using the `Generator` object’s `.integers()` method.

In its most basic form, you use `.integers()` with its one mandatory parameter:

``````
>>> import numpy as np

>>> rng = np.random.default_rng()
>>> for i in range(5):
...     rng.integers(3)
...
1
2
0
1
2
``````

When you call `.integers()` with a single parameter, that parameter defines the upper exclusive bound of the numbers generated. In this example, because you’ve passed in `3`, the possible outputs are in the range [0, 3). In other words, you might get 0, 1, or 2.

Because you call `.integers()` five times using a `for` loop and the `range()` function, you get five values.

While you might find the above example your most common way of using `.integers()`, the method is actually far more flexible. Unfortunately, it’s also a little confusing. The `.integers()` method’s signature includes three parameters for defining the range of numbers from which the method will generate your random numbers: `low`, `high`, and `endpoint`. Unfortunately, they aren’t quite as intuitive to use as their names suggest.

The `low` parameter is the only one that’s mandatory. While its name suggests its value will define the lowest integer that the `.integers()` method can select, this is actually only true if you provide a value for the `high` parameter as well. So if you set `low=1` and `high=4`, then the `.integers()` method will select a random number in the range [1, 4):

``````
>>> for count in range(5):
...     rng.integers(low=1, high=4)
...
3
2
3
1
2
``````

Here, you’ve only generated the numbers 1, 2, and 3. Once again, the `low` value is a possible result, but the `high` value isn’t.

Now, here’s where confusion may occur. If you pass only a `low` argument and accept the default `high` value of `None`, then `.integers()` sets its `low` parameter to `0` and uses the value you provide for the upper limit. So, for example, if you call `.integers(low=7)` or `.integers(7)`, although `7` is the `low` parameter, the method will return a value in the range [0, 7):

``````
>>> for count in range(5):
...     rng.integers(low=7)
...
3
4
2
6
0
``````

In the above code, you generate integers in the range from 0 to 6. Naming `low` here is misleading. Instead you should call `rng.integers(7)` or use the more explicit `rng.integers(low=0, high=7)` to do the same thing. In general, you should always include `high` if you pass `low` as a keyword argument.

The third parameter that defines the range is the `endpoint` parameter, which determines whether the interval includes the `high` value. Remember how the default is a half-open interval? That’s because `endpoint` defaults to `False`, meaning the sample interval is [low, high). However, if you set it to `True`, then the interval becomes inclusive at both ends, [low, high]. Knowing this, you can now write an inclusive interval:

``````
>>> for count in range(5):
...     rng.integers(low=1, high=4, endpoint=True)
...
1
4
4
3
2

>>> for count in range(5):
...     rng.integers(7, endpoint=True)
...
5
7
6
2
0
``````

This time, the first loop will generate either 1, 2, 3, or 4. The second loop will generate numbers in the range 0 to 7, inclusive. Setting `endpoint=True` might make your integer intervals more intuitive.

The `.integers()` method produces 64-bit integers by default. This is because its `dtype` parameter is set to `np.int64`. If you set `dtype` to `np.int32`, then you’d obtain 32-bit integers instead.
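The short sketch below pulls these parameters together. It draws many values at once with `size`, which is covered in the next section, so that you can inspect which integers each interval allows:

```python
import numpy as np

rng = np.random.default_rng()

half_open = rng.integers(low=1, high=4, size=1_000)
closed = rng.integers(low=1, high=4, size=1_000, endpoint=True)
small = rng.integers(low=1, high=4, size=5, dtype=np.int32)

print(np.unique(half_open))  # Only values from 1, 2, 3 can appear
print(np.unique(closed))     # Only values from 1, 2, 3, 4 can appear
print(small.dtype)           # int32
```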

### Random NumPy Arrays

When working with NumPy, you may wish to produce a NumPy array containing random numbers. You do this using the `size` parameter of either the `.random()`, `.uniform()`, or `.integers()` method of the `Generator` object. In all three methods, the default value of `size` is `None`, which causes a single number to be generated. However, if you assign a tuple to `size`, then you’ll generate an array.

In the example below, you generate a variety of NumPy arrays using different-size tuples:

``````
>>> import numpy as np

>>> rng = np.random.default_rng()

>>> rng.random(size=(5,))
array([0.18097689, 0.19402707, 0.82936953, 0.29470017, 0.73697751])

>>> rng.random(size=(5, 3))
array([[0.85815152, 0.44158512, 0.49992378],
       [0.99656444, 0.40376014, 0.93886646],
       [0.31424733, 0.23561498, 0.43465744],
       [0.02478389, 0.60644643, 0.52940267],
       [0.54349223, 0.77175087, 0.2834884 ]])

>>> rng.random(size=(3, 4, 2))
array([[[0.78538958, 0.93149463],
        [0.73405027, 0.32193268],
        [0.14362809, 0.59825765],
        [0.07675847, 0.0434828 ]],

       [[0.28681395, 0.66925531],
        [0.59575724, 0.18366003],
        [0.54722312, 0.02620217],
        [0.06019602, 0.48735061]],

       [[0.40764258, 0.00756601],
        [0.32556725, 0.44165999],
        [0.05679186, 0.01690106],
        [0.87091753, 0.46327738]]])
``````

If you want to create a one-dimensional array, then you set the `size` parameter to an integer or a tuple with a single element. Either `size=x` or `size=(x, )` allows you to generate a one-dimensional array with `x` elements.

Similarly, if you want to create a two-dimensional array with `x` rows and `y` columns, then you use `size=(x, y)`. Setting the `size` parameter to a tuple with the elements `(x, y, z)` allows you to generate a three-dimensional array with `x` sets of `y` rows and `z` columns.
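A quick way to confirm how `size` maps onto array dimensions is to inspect each result’s `.shape` attribute. The sizes below are arbitrary examples:

```python
import numpy as np

rng = np.random.default_rng()

print(rng.random(size=4).shape)          # (4,)
print(rng.random(size=(4,)).shape)       # (4,)
print(rng.random(size=(2, 3)).shape)     # (2, 3)
print(rng.random(size=(2, 3, 4)).shape)  # (2, 3, 4)
```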

In this next example, you randomly generate two arrays, but this time, you specify the acceptable ranges of numbers:

``````
>>> rng = np.random.default_rng()
>>> rng.integers(size=(2, 3), low=1, high=5)
array([[4, 2, 3],
       [1, 1, 2]], dtype=int64)

>>> rng.uniform(size=(2, 3), low=1, high=5)
array([[4.97441068, 1.02042664, 1.43584549],
       [2.87965746, 1.99063036, 2.86212453]])
``````

As you can see, the first array contains integers, while the second one contains floating-point numbers. Both are in the range [1, 5).

Now that you’ve gained confidence in creating random integers and floats, both individually and in NumPy arrays, you’ll next see how you can randomize NumPy arrays themselves.

## Randomizing Existing NumPy Arrays

Once you have a NumPy array, regardless of whether you’ve generated it randomly or obtained it from a more ordered source, there may be times when you need to select elements from it randomly or reorder its structure randomly. You’ll learn how to do this next.

### Selecting Array Elements Randomly

Suppose you have a NumPy array of data collected from a survey, and you wish to use a random sample of its elements for analysis. The `Generator` object’s `.choice()` method allows you to select random samples from a given array in a variety of different ways. You’ll give this a whirl in the next few examples:

``````
>>> import numpy as np

>>> rng = np.random.default_rng()

>>> input_array_1d = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
>>> rng.choice(input_array_1d, size=3, replace=False)
array([ 6, 12, 10])

>>> rng.choice(input_array_1d, size=(2, 3), replace=False)
array([[ 8, 12, 11],
       [10,  7,  5]])
``````

In this example, you’re analyzing a one-dimensional NumPy array. The first `.choice()` call creates a one-dimensional array containing three elements chosen randomly from the original array. The second `.choice()` call creates a two-by-three array of six random elements from the original data.

The previous code makes use of the `replace` parameter. If you set `replace` to `False`, then you can’t select the same element more than once. By default, it’s set to `True`, meaning the same element might be selected multiple times. This is analogous to selecting a ball from a bag, replacing it, and then selecting again. When you need to avoid duplication, you should set `replace` to `False`.

One point to note is that the `.choice()` method selects elements based on their position in the original array. Should the same value appear twice in the original data, you could end up having both values selected regardless of which `replace` parameter setting you use.
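A small sketch makes this point concrete. The array `data` below is a made-up example in which the value 7 appears at two different positions:

```python
import numpy as np

rng = np.random.default_rng()

# The value 7 appears twice, at positions 0 and 1.
data = np.array([7, 7, 1, 2])

# Drawing every position exactly once returns both 7s, despite replace=False.
sample = rng.choice(data, size=4, replace=False)

print(sorted(sample.tolist()))  # [1, 2, 7, 7]
```

If you need unique values rather than unique positions, then one option is to remove duplicates first, for example with `np.unique()`.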

### Selecting Rows and Columns Randomly

Suppose you wanted to randomly select one or more entire rows or columns from an array. The `.choice()` method allows this by means of its `axis` parameter. The `axis` parameter allows you to specify the direction in which you wish to analyze. For a two-dimensional array, setting `axis=0`, which is the default, means that you’ll be analyzing by row, while setting `axis=1` means that you’ll analyze by column.

Here are some examples of random selection in both directions:

``````
>>> import numpy as np

>>> rng = np.random.default_rng()

>>> input_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

>>> rng.choice(input_array, size=2)
array([[10, 11, 12],
       [10, 11, 12]])

>>> rng.choice(input_array, size=2, axis=1)
array([[ 2,  1],
       [ 5,  4],
       [ 8,  7],
       [11, 10]])
``````

You’re using an original NumPy array with four rows and three columns. The first analysis randomly selects two rows. Because `replace` defaults to `True`, the same row can be selected more than once. In this case, the same row has been selected twice, but this won’t always be the case. As you might expect, the output is a two-by-three NumPy array. The second analysis randomly selects two columns. Again, duplicates may occur when you run the code, but you haven’t gotten any on this occasion.

To prevent the possibility of the same row or column being chosen multiple times, you set the `replace` parameter of `.choice()` to `False`, rather than its default value of `True`:

``````
>>> rng.choice(input_array, size=3, replace=False)
array([[10, 11, 12],
       [ 1,  2,  3],
       [ 4,  5,  6]])
``````

You can run the above code as many times as you wish, and you’ll never see any row—or column if `axis` was set to `1`—more than once!

The `.choice()` method also contains a `shuffle` parameter that adds in an extra layer of randomness. It allows entire rows—or columns if `axis=1`—to be reordered after their initial random selection. Individual element order will remain the same within each row or column.

For `shuffle` to have an effect, you must first set `replace` to `False`. That effectively makes the shuffling operation available, but only if `shuffle` is at its default of `True`. If you set `shuffle` to `False` or set `replace` to `True`, then the additional shuffling operation doesn’t occur. In the next few examples, you’ll explore the effects of changing `replace` and `shuffle`. Pay careful attention to the output data and its order:

``````
>>> rng = np.random.default_rng(seed=100)
>>> rng.choice(input_array, size=3, replace=False, shuffle=False)
array([[4, 5, 6],
       [7, 8, 9],
       [1, 2, 3]])
``````

As the output above shows, you’ve selected three rows from the array. These have been displayed in their selected order. Setting `shuffle` to `False` removes the extra shuffle operation that NumPy otherwise does by default, so you’ve sped up your code. Although setting `replace` to `False` made shuffling a possibility, setting `shuffle` to `False` prevented it from occurring.

To add more randomization, you can keep the default by omitting `shuffle`, or set `shuffle` to `True` explicitly:

``````
>>> rng = np.random.default_rng(seed=100)
>>> rng.choice(input_array, size=3, replace=False, shuffle=True)
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
``````

This time, the same three rows were selected because the same seed value was used, but the `shuffle` parameter has reordered them. By setting `shuffle` to `True`, you’ve shuffled the output.

Try running the previous code repeatedly, and you’ll find that the output each time is identical. Not only is the original row selection pseudo-random, but the subsequent shuffling is pseudo-random as well! Both are based on the seed.

Now if you set `replace` to `True`, effectively switching `shuffle` off, then you’re in for a surprise:

``````
>>> rng = np.random.default_rng(seed=100)
>>> rng.choice(input_array, size=3, replace=True, shuffle=False)
array([[10, 11, 12],
       [10, 11, 12],
       [ 1,  2,  3]])

>>> rng = np.random.default_rng(seed=100)
>>> rng.choice(input_array, size=3, replace=True, shuffle=True)
array([[10, 11, 12],
       [10, 11, 12],
       [ 1,  2,  3]])
``````

When you run the code this time, both outputs are identical. This is because both seed values are identical, and `shuffle` only has an effect if `replace` is set to `False`. However, the selected rows are different from those selected in the previous code, despite the fact that the seed values have remained the same.

The seed itself hasn’t changed. What differs is the sampling method: selecting with replacement and selecting without replacement consume the generator’s stream of random numbers in different ways, so the same seed produces different selections.

### Shuffling Arrays Randomly

It’s possible for you to randomize the order of the elements in a NumPy array by using the `Generator` object’s `.shuffle()` method. Although you can reach for it in several use cases, a very common application is a card game simulation.

To begin with, you create a function that produces a NumPy array of concatenated strings representing the various cards in a deck, albeit without the jokers:

``````
>>> import numpy as np

>>> def create_deck():
...     RANKS = "2 3 4 5 6 7 8 9 10 J Q K A".split()
...     SUITS = "♣ ♢ ♡ ♠".split()
...     return np.array([r + s for s in SUITS for r in RANKS])
...
>>> create_deck()
array(['2♣', '3♣', '4♣', '5♣', '6♣', '7♣', '8♣', '9♣', '10♣', 'J♣', 'Q♣',
       'K♣', 'A♣', '2♢', '3♢', '4♢', '5♢', '6♢', '7♢', ...,
       '7♠', '8♠', '9♠', '10♠', 'J♠', 'Q♠', 'K♠', 'A♠'], dtype='<U3')
``````

The `create_deck()` function returns a NumPy array containing strings such as `"2♣"`, `"3♢"`, `"8♡"`, and `"K♠"`. As you can see, it does this by defining two constants containing the card ranks and their suits, and then it uses a list comprehension to create a Python list, which is then converted into a NumPy array.

Now suppose you want to draw three cards randomly from the deck after a shuffle. The `.shuffle()` method allows you to modify an array in place by shuffling its contents. By performing the shuffle in place, you save valuable memory space:

``````
>>> rng = np.random.default_rng()
>>> deck_of_cards = create_deck()

>>> rng.shuffle(deck_of_cards)
>>> deck_of_cards[0:3]
array(['4♡', '6♠', '2♠'], dtype='<U3')

>>> rng.shuffle(deck_of_cards)
>>> deck_of_cards[0:3]
array(['K♠', '2♣', '6♠'], dtype='<U3')
``````

Your code shuffles the NumPy array card deck and then displays the first three elements, which are the top three cards, from the shuffled version. You then reshuffle the deck and draw the top three cards again. Well done if you noticed that you performed both shuffles on the complete deck!

Although you could’ve used the `.choice()` method that you learned about earlier to select the cards, the `.shuffle()` method is a better option because it actually randomizes the array elements in place. This saves memory.
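For contrast, here’s a sketch of the `.choice()` alternative mentioned above. It rebuilds the deck inline so that it runs on its own, and the names `deck`, `original_order`, and `hand` are illustrative:

```python
import numpy as np

rng = np.random.default_rng()

ranks = "2 3 4 5 6 7 8 9 10 J Q K A".split()
suits = "♣ ♢ ♡ ♠".split()
deck = np.array([rank + suit for suit in suits for rank in ranks])
original_order = deck.copy()

# .choice() returns a new array and leaves the deck in its original order.
hand = rng.choice(deck, size=3, replace=False)

print(hand)
print(np.array_equal(deck, original_order))  # True
```

This approach may suit you when the original order needs to survive the draw. If you want the deck itself to end up shuffled, then `.shuffle()` remains the more direct tool.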

### Reordering Arrays Randomly

Previously, you learned how to select entire rows or columns of a NumPy array at random. Now suppose you wanted to randomize the elements of a multidimensional array. The generator offers `.shuffle()` that you’ve seen already, as well as the `.permutation()` and `.permuted()` methods for this purpose.

The `.permutation()` method randomly rearranges entire rows or columns. In other words, the elements within each row or column will stay in those rows and columns, but their order will be changed.

To illustrate these methods, you’ll use an altered version of your `create_deck()` function:

``````
>>> import numpy as np

>>> def create_high_cards():
...     HIGH_CARDS = "10 J Q K A".split()
...     SUITS = "♣ ♢ ♡ ♠".split()
...     return np.array([r + s for s in SUITS for r in HIGH_CARDS])
...
``````

This time, you produce a deck containing only the tens, aces, and faces. This will make the results of the next few examples easier for you to see.

The initial deck looks like this:

``````
>>> NUMBER_OF_SUITS = 4
>>> NUMBER_OF_RANKS = 5
>>> high_deck = create_high_cards().reshape((NUMBER_OF_SUITS, NUMBER_OF_RANKS))
>>> high_deck
array([['10♣', 'J♣', 'Q♣', 'K♣', 'A♣'],
       ['10♢', 'J♢', 'Q♢', 'K♢', 'A♢'],
       ['10♡', 'J♡', 'Q♡', 'K♡', 'A♡'],
       ['10♠', 'J♠', 'Q♠', 'K♠', 'A♠']], dtype='<U3')
``````

As you can see, the suit order is ♣, ♢, ♡, and ♠, and the cards within each suit are in ascending order from ten to ace.

Now you want to organize the cards from each suit in a random order. To do this, you randomize the position of the rows within the array:

``````
>>> NUMBER_OF_SUITS = 4
>>> NUMBER_OF_RANKS = 5
>>> high_deck = create_high_cards().reshape((NUMBER_OF_SUITS, NUMBER_OF_RANKS))

>>> rng = np.random.default_rng()
>>> rng.permutation(high_deck, axis=0)
array([['10♡', 'J♡', 'Q♡', 'K♡', 'A♡'],
       ['10♢', 'J♢', 'Q♢', 'K♢', 'A♢'],
       ['10♣', 'J♣', 'Q♣', 'K♣', 'A♣'],
       ['10♠', 'J♠', 'Q♠', 'K♠', 'A♠']], dtype='<U3')

>>> rng.permutation(high_deck, axis=0)
array([['10♢', 'J♢', 'Q♢', 'K♢', 'A♢'],
       ['10♠', 'J♠', 'Q♠', 'K♠', 'A♠'],
       ['10♡', 'J♡', 'Q♡', 'K♡', 'A♡'],
       ['10♣', 'J♣', 'Q♣', 'K♣', 'A♣']], dtype='<U3')
``````

You populate the initial array using your new `create_high_cards()` function. This time, you reshape the array into four rows of suits and five columns of card ranks. Then the `.permutation()` method works row-wise because `axis=0`. It randomizes the position of each row, but the content of each row remains in its original order.

Also note that the original `high_deck` is untouched. The previous operations randomized copies of the deck:

``````
>>> high_deck
array([['10♣', 'J♣', 'Q♣', 'K♣', 'A♣'],
       ['10♢', 'J♢', 'Q♢', 'K♢', 'A♢'],
       ['10♡', 'J♡', 'Q♡', 'K♡', 'A♡'],
       ['10♠', 'J♠', 'Q♠', 'K♠', 'A♠']], dtype='<U3')
``````

The suits are still in the original order. Now say you want to organize the cards from each value in a random order. To do this, you randomize the position of the columns:

``````
>>> rng.permutation(high_deck, axis=1)
array([['10♣', 'Q♣', 'A♣', 'J♣', 'K♣'],
       ['10♢', 'Q♢', 'A♢', 'J♢', 'K♢'],
       ['10♡', 'Q♡', 'A♡', 'J♡', 'K♡'],
       ['10♠', 'Q♠', 'A♠', 'J♠', 'K♠']], dtype='<U3')

>>> rng.permutation(high_deck, axis=1)
array([['Q♣', 'K♣', 'J♣', 'A♣', '10♣'],
       ['Q♢', 'K♢', 'J♢', 'A♢', '10♢'],
       ['Q♡', 'K♡', 'J♡', 'A♡', '10♡'],
       ['Q♠', 'K♠', 'J♠', 'A♠', '10♠']], dtype='<U3')
``````

In the above code, the `.permutation()` method works column-wise because `axis=1`. This time, you’ve randomized the position of each column with `.permutation()`, but the content of each column remains in the initial order. As you can see in the second result, the queens have taken the place of the ten rank as the first column, but you’ll notice that the suits are in the same, original order. That’s because you’ve rearranged the columns but kept the rows intact.

It’s easy to become confused when comparing the output from this new `.permutation()` method and from your earlier `.shuffle()` method. Both methods rearrange array elements in the same way. The difference is that the `.permutation()` method creates a new array of results, while `.shuffle()` updates the original array.

In the examples that you just coded, the `.permutation()` method never changed the original array. Each call to `.permutation()` resulted in a randomized copy of the original array.

With the `.shuffle()` method, you would’ve replaced the original array with the randomized version.

As before, you start off by creating a deck of high cards:

``````
>>> NUMBER_OF_SUITS = 4
>>> NUMBER_OF_RANKS = 5
>>> high_deck = create_high_cards().reshape((NUMBER_OF_SUITS, NUMBER_OF_RANKS))
``````

The four suits of the high cards are in this mini-deck, in their original order. Pay attention to the order of the cards and the fact that you’re referencing them through a variable named `high_deck`.

Next you shuffle the cards:

``````
>>> rng = np.random.default_rng()
>>> rng.shuffle(high_deck, axis=0)
>>> high_deck
array([['10♢', 'J♢', 'Q♢', 'K♢', 'A♢'],
       ['10♠', 'J♠', 'Q♠', 'K♠', 'A♠'],
       ['10♡', 'J♡', 'Q♡', 'K♡', 'A♡'],
       ['10♣', 'J♣', 'Q♣', 'K♣', 'A♣']], dtype='<U3')
``````

As you can see, setting the `axis` parameter to `0` has caused the row-order to be randomized. However, the content of each row remains the same.

Now carefully watch what happens when you call `.shuffle()` a second time:

``````
>>> rng.shuffle(high_deck, axis=1)
>>> high_deck
array([['J♢', 'A♢', 'K♢', '10♢', 'Q♢'],
       ['J♠', 'A♠', 'K♠', '10♠', 'Q♠'],
       ['J♡', 'A♡', 'K♡', '10♡', 'Q♡'],
       ['J♣', 'A♣', 'K♣', '10♣', 'Q♣']], dtype='<U3')
``````

Again, as expected, because you’ve set `axis` to `1`, the column order is randomized. However, the content of each column remains the same, and the suit order from the first shuffle is still in place. This is because `.shuffle()` works in place, so the second call has randomized the previously randomized card deck. In contrast, `.permutation()` wouldn’t have done this because it would’ve randomized the original, unrandomized version of the deck.

As a final point, you can make both `.permutation()` and `.shuffle()` do the same thing. To do this, you would use `high_deck = rng.permutation(high_deck, axis=0)` or `rng.shuffle(high_deck, axis=0)`. Do remember, however, that your results will probably be different due to the randomization effects.
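The copy-versus-in-place distinction can be sketched with a small integer array, which is easier to check than a deck of cards. The name `grid` is made up for this illustration:

```python
import numpy as np

rng = np.random.default_rng()
grid = np.arange(12).reshape(4, 3)

# .permutation() returns a shuffled copy, so grid itself doesn't change.
shuffled_copy = rng.permutation(grid, axis=0)
print(np.array_equal(grid, np.arange(12).reshape(4, 3)))  # True

# .shuffle() returns None and reorders the rows of grid directly.
result = rng.shuffle(grid, axis=0)
print(result)  # None
print(grid)
```

Because `.shuffle()` returns `None`, assigning its result back to your variable would discard the array, whereas `.permutation()` only has a lasting effect if you keep its return value.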

The `.permuted()` method randomizes row or column elements independently of the other rows or columns and places the result into a new array. This is best seen by example.

Suppose you wanted to mix up the rows:

``````
>>> NUMBER_OF_SUITS = 4
>>> NUMBER_OF_RANKS = 5
>>> high_deck = create_high_cards().reshape((NUMBER_OF_SUITS, NUMBER_OF_RANKS))

>>> rng = np.random.default_rng()
>>> rng.permuted(high_deck, axis=0)
array([['10♡', 'J♠', 'Q♠', 'K♠', 'A♠'],
       ['10♢', 'J♣', 'Q♡', 'K♣', 'A♣'],
       ['10♣', 'J♡', 'Q♣', 'K♢', 'A♡'],
       ['10♠', 'J♢', 'Q♢', 'K♡', 'A♢']], dtype='<U3')
``````

With `axis=0`, the `.permuted()` method shuffles along the row axis of the original array. In other words, it randomly rearranges the elements of each column independently of the other columns. Each column still contains the same cards, but their order is randomized. As a result, the rows contain different suits. In other words, you’ve randomly shuffled the cards of each rank.

As you’ve probably guessed, you could also mix up the columns:

``````
>>> rng.permuted(high_deck, axis=1)
array([['J♣', 'A♣', '10♣', 'Q♣', 'K♣'],
       ['K♢', 'A♢', 'J♢', 'Q♢', '10♢'],
       ['J♡', 'K♡', '10♡', 'Q♡', 'A♡'],
       ['J♠', 'Q♠', 'A♠', '10♠', 'K♠']], dtype='<U3')
``````

This time, you’re working along the column axis (`axis=1`) of the original array. Now you’re randomly rearranging the elements of each row independently of the other rows. Each row still contains the same cards, but their order is randomized. As a result, the columns contain different ranks. In other words, you’ve randomly shuffled the same-suit cards.
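One way to convince yourself of what `.permuted()` keeps fixed is to sort the result along the shuffled axis and compare it with the input. The sketch below uses a made-up integer array named `grid`, whose rows and columns both start out in ascending order:

```python
import numpy as np

rng = np.random.default_rng()
grid = np.arange(20).reshape(4, 5)

down_columns = rng.permuted(grid, axis=0)  # Shuffles within each column
along_rows = rng.permuted(grid, axis=1)    # Shuffles within each row

# Sorting along the shuffled axis restores the original, which shows that
# values never leave their column (axis=0) or their row (axis=1).
print(np.array_equal(np.sort(down_columns, axis=0), grid))  # True
print(np.array_equal(np.sort(along_rows, axis=1), grid))    # True
```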

Finally, suppose you wanted to completely randomize the deck:

``````>>> rng.permuted(rng.permuted(high_deck, axis=1), axis=0)
array([['A♡', 'Q♡', 'A♢', 'K♡', 'J♡'],
       ['Q♢', 'Q♠', '10♡', 'J♠', 'K♣'],
       ['10♠', 'J♣', 'K♠', 'A♣', 'K♢'],
       ['Q♣', 'J♢', '10♣', '10♢', 'A♠']], dtype='<U3')
``````

To perform a complete shuffle, you call the `.permuted()` method twice, first along `axis=1` and then along `axis=0`. As a result, every card can move to a different row and a different column. This time, you’ve shuffled the entire deck, so it’s ready for dealing. Also note that you could alternatively use `rng.permuted(rng.permuted(high_deck, axis=0), axis=1)`. This would still randomize everything to the same level.
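
One caveat is worth keeping in mind. With the two axis-wise passes in the order shown above, two cards that start in the same row always finish in different columns, so not every arrangement of the deck can appear. If you need every arrangement to be equally likely, one option is to shuffle a flattened copy and restore the shape afterward. The sketch below assumes a hypothetical integer array, `grid`, standing in for `high_deck`:

```python
import numpy as np

rng = np.random.default_rng()

# Hypothetical stand-in for high_deck: 4 rows by 5 columns.
grid = np.arange(20).reshape(4, 5)

# Flatten, shuffle the 1-D copy, then restore the original shape.
fully_shuffled = rng.permutation(grid.ravel()).reshape(grid.shape)

print(fully_shuffled.shape)  # (4, 5)

# The shuffled array still holds exactly the same 20 values.
same_values = np.array_equal(np.sort(fully_shuffled, axis=None), grid.ravel())
print(same_values)  # True
```

For casual use, such as dealing a quick hand, the two-pass version is probably mixed up enough. The flattened version is the safer choice when the result feeds into a simulation or a statistical test.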

## Selecting Random Poisson Samples

The Poisson distribution is a popular probability distribution that you can use to determine the probability that a specific number of events will occur, assuming you know the average number of such events occurring. This is usually measured over a period of time.

As an example of the kinds of problems that the Poisson statistical distribution can help you solve, consider the following. Suppose a school safety department wants to investigate the traffic passing a school. They know, on average, one car passes the school every fifteen seconds. The safety department wants to know the probability of each of the following occurring:

• No cars will pass in any given minute.
• Four cars will pass in any given minute.
• Eight cars will pass in any given minute.

To calculate this, you use the Poisson probability mass function (PMF).

The probability of some random variable, X, taking on some discrete value, k, is given by:

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$

Here, λ is the mean number of events that you expect to occur in the timescale under consideration, e is Euler’s number (approximately 2.718), and k is the number of events whose probability you wish to determine.

Looking back at the cars example, you can assume that the number of cars passing each minute follows a Poisson distribution. The example asks you to determine the probabilities for zero, four, and eight cars passing over a period of one minute. The first value that you need to determine is λ. You know that one car passes, on average, every fifteen seconds, so the average number of cars passing per minute is four. This means λ equals four.

You then need to use the above formula to work out the probabilities. You can do that with the following code:

``````>>> import math

>>> lam = 4
>>> cars_per_minute = [0, 4, 8]

>>> for cars in cars_per_minute:
...     probability = lam**cars * math.exp(-lam) / math.factorial(cars)
...     print(f"P({cars}) = {probability:.1%}")
...
P(0) = 1.8%
P(4) = 19.5%
P(8) = 3.0%
``````

You’ve set the lambda parameter, `lam`, to four and have added the desired set of k values, `cars_per_minute`, to a list. You’ve passed each element of the list into the formula and printed the results.

The main point that you should take away from this example is that the three answers (1.8, 19.5, and 3.0 percent) all come from the same Poisson probability distribution.
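
One way to see that these values belong to a single probability distribution is to check that the probabilities over all possible values of k add up to one. The following sketch wraps the formula in a helper function, `poisson_pmf()`, which is a name made up for this illustration, and sums the first fifty terms:

```python
import math


def poisson_pmf(k, lam):
    """Return P(X = k) for a Poisson distribution with mean lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)


lam = 4

# The terms beyond k = 49 are far too small to affect the total.
total = sum(poisson_pmf(k, lam) for k in range(50))

print(f"{total:.6f}")  # 1.000000
```

Strictly speaking, k can be any non-negative integer, so the sum never ends. In practice, the probabilities shrink so quickly that a few dozen terms are enough.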

To allow you to visualize this, you could run the previous calculation again with more values and plot them. One way to do this is to take advantage of NumPy’s vectorization capabilities:

``````import numpy as np
import matplotlib.pyplot as plt
from scipy.special import factorial

lam = 4
k_values = np.arange(0, 30)
probabilities = np.power(lam, k_values) * np.exp(-lam) / factorial(k_values)

plt.plot(k_values, probabilities, "ro")
plt.title("Sample Poisson Distribution.")
plt.xlabel("k")
plt.ylabel("P(k)")
plt.show()
``````

To create the plot, you must install two additional libraries, Matplotlib and SciPy, and import from them. The `matplotlib.pyplot` module allows you to create a visualization of the data. The `scipy.special` module includes a `factorial()` function that can operate on each element of a NumPy array.

The code once more assumes lambda to be four, but this time, it works out the probability of thirty k values starting at zero. This is achieved by passing a NumPy array of thirty numbers, 0 to 29, into the `scipy.special.factorial()` function. Your resulting plot forms the shape of a Poisson distribution curve:

The shape of this curve is characteristic of a Poisson distribution. It tells you that the most probable outcomes are three or four cars passing the school in any chosen minute, each with a chance of about 20 percent, or 0.2. The probability rises sharply up to this peak before falling quickly away again. It is, for example, highly unlikely that ten or more cars will pass the school in any chosen minute.
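
If you’d like numbers to go with the picture, you could read them off the same kind of array. The sketch below is one possible approach that avoids SciPy by building the factorials with `math.factorial()`. It locates the peak of the distribution and estimates the chance of ten or more cars by subtracting the probabilities for zero to nine cars from one:

```python
import math

import numpy as np

lam = 4.0  # A float keeps the powers below in floating point
k_values = np.arange(0, 30)

# math.factorial() works on one integer at a time, so build the array
# element by element instead of relying on SciPy.
factorials = np.array([math.factorial(int(k)) for k in k_values], dtype=float)
probabilities = lam**k_values * np.exp(-lam) / factorials

# Three and four cars are equally likely when lam is exactly 4.
print(np.isclose(probabilities[3], probabilities[4]))  # True

# Probability of ten or more cars, via the complement of zero to nine.
ten_or_more = 1 - probabilities[:10].sum()
print(f"{ten_or_more:.1%}")  # 0.8%
```

The tie between three and four cars isn’t a coincidence. Whenever λ is a whole number, the values λ - 1 and λ share the highest probability.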

As you’ll now see, it’s possible to generate a range of random sample data that follows a Poisson distribution. To achieve this, you call the `Generator` object’s `.poisson()` method.

The `.poisson()` method takes two parameters: `lam` and `size`. The `lam` parameter takes the known lambda value for the data under consideration. In the earlier example, this would’ve been `4`. The `size` parameter determines the quantity and format of the data that’s produced.

Before you see some examples of this in action, keep in mind that you’re generating random values fitting a Poisson distribution. There are several contexts in which you’d do this. The generated numbers could, for example, refer to the modern example of cars passing a school in a given minute, a historic concern, like accidental deaths by horse kick of soldiers in the Prussian army, or anything else you like.

It’s possible to generate a single number, an array of numbers, or a multidimensional array of numbers, all of which belong to a Poisson distribution:

``````>>> import numpy as np
>>> rng = np.random.default_rng()

>>> scalar = rng.poisson(lam=5)
>>> scalar
4

>>> sample_1d_array = rng.poisson(lam=5, size=4)
>>> sample_1d_array
array([4, 9, 6, 3], dtype=int64)

>>> sample_2d_array = rng.poisson(lam=5, size=(2, 3))
>>> sample_2d_array
array([[6, 6, 6],
       [4, 1, 7]], dtype=int64)
``````

The first sample, `scalar`, contains one number. You can’t say anything about the sample distribution of one number, but repeatedly calling `rng.poisson()` would yield a set of numbers that conform to a Poisson distribution.

The second sample, `sample_1d_array`, contains a one-dimensional array of four Poisson numbers. The final sample, `sample_2d_array`, contains a two-by-three array of randomly distributed Poisson variables. You can sample as much as you like in as many dimensions as you like.
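
The `lam` parameter isn’t limited to a single number. According to the NumPy documentation, it also accepts an array-like of rates, which is broadcast against `size`. As a rough sketch, using three made-up rates that aren’t part of the earlier examples:

```python
import numpy as np

rng = np.random.default_rng()

# Three made-up average rates, one per column of the result.
rates = [2.0, 4.0, 8.0]

# Each of the 1,000 rows holds one draw for every rate.
samples = rng.poisson(lam=rates, size=(1_000, 3))

print(samples.shape)  # (1000, 3)
print(samples.mean(axis=0))  # Roughly [2. 4. 8.], varies between runs
```

Each column’s mean should land near its own rate, which is a handy way to confirm that the rates were matched to the columns as intended.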

To show that the data from `.poisson()` conforms to a Poisson distribution, you can plot the method’s output:

``````import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng()
samples = rng.poisson(lam=5, size=10_000)
values, frequency = np.unique(samples, return_counts=True)

plt.title("Random Poisson Distribution.")
plt.xlabel("Values")
plt.ylabel("Frequency")
plt.plot(values, frequency, "ro")
plt.show()
``````

With this code, you produce the following plot:

You first generate a NumPy array of ten thousand random `samples` from the Poisson distribution whose λ value is `5`. NumPy’s `unique()` function then produces a frequency distribution by counting how often each unique value appears in `samples`. You then plot the frequency of each individual value, and the plot’s shape suggests that the ten thousand random `samples` conform to a Poisson distribution.
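
A plot gives a visual impression, but you could also compare the samples with the theory numerically. The sketch below is one possible check, not a formal statistical test. It relies on the fact that the mean and the variance of a Poisson distribution both equal λ, and it compares the observed relative frequencies with the probabilities from the PMF:

```python
import math

import numpy as np

rng = np.random.default_rng()
lam = 5.0
samples = rng.poisson(lam=lam, size=10_000)

# For a Poisson distribution, the mean and the variance both equal lam.
print(samples.mean(), samples.var())  # Both close to 5, varies between runs

# Relative frequency of each value from 0 up to the largest one drawn.
counts = np.bincount(samples)
observed = counts / samples.size

# Theoretical probabilities from the PMF for the same values.
k_values = np.arange(counts.size)
factorials = np.array([math.factorial(int(k)) for k in k_values], dtype=float)
expected = lam**k_values * np.exp(-lam) / factorials

largest_gap = np.abs(observed - expected).max()
print(f"{largest_gap:.4f}")  # Small, typically well under 0.02
```

With ten thousand samples, the observed and expected values usually agree to within a percentage point or so. Drawing more samples should narrow the gap further.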

## Conclusion

You’re now familiar with how pseudo-random number generation works and which random number generation features NumPy offers. You can use your new skills to generate random numbers both individually and as NumPy arrays. You know how to select random items, rows, and columns from an array and how to randomize them. Finally, you gained insight into how NumPy supports random selection from statistical distributions.

In this tutorial, you’ve learned:

• How computers perform pseudo-random number generation
• How to generate NumPy arrays of random numbers
• How to randomize NumPy arrays
• How to randomly select elements, rows, and columns from a NumPy array
• How to select random samples from the Poisson statistical distribution

If you’d like to continue exploring NumPy’s capabilities, then your next step could be to explore getting normally distributed random numbers. If you want even more, then you can take a look at the range of the `Generator` object’s statistical methods in the NumPy documentation.

Do you have an interesting example of using random numbers? Perhaps you have a card trick to entertain your fellow programmers with, or a lottery number predictor that can make us all rich. Share your ideas with the community in the comments below.
