Random Data Generation in Python
Hello and welcome to the Real Python video series, Generating Random Data in Python. In these videos, you’ll explore a variety of ways to create random—or seemingly random—data in your programs and see how Python makes randomness happen.
Why You May Want to Generate Random Data
Why might you want to generate random data in your programs? There are many reasons (games, testing, and so on), but these reasons generally fall under two broad categories:
- Simulation
- Security
The random Module
For simulation, it’s best to start with the random module.
Most people getting started in Python are quickly introduced to this module, which is part of the Python Standard Library. This means that it ships with Python, so there is nothing extra to install. The random module provides a number of useful tools for generating what we call pseudo-random data. It’s known as a Pseudo-Random Number Generator, or PRNG.
We’ll come back to that term later, because it’s important, but right now, let’s consider this guess-a-number game.
Importing random gives us a method, randint(), that will generate a random integer within a range we specify. This range includes its bounds. In other words, invoking randint(1, 100), as the game does, can generate any integer from 1 through 100.
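The on-screen code from the video is not reproduced on this page, so here is a minimal sketch of what such a guess-a-number game might look like. The names check_guess() and play(), and the read_guess parameter, are assumptions made for illustration and are not taken from the video.

```python
import random


def check_guess(secret, guess):
    """Compare a guess with the secret number and return a hint."""
    if guess < secret:
        return "Too low"
    if guess > secret:
        return "Too high"
    return "Correct"


def play(low=1, high=100, read_guess=input):
    """Run one game and return the number of guesses it took.

    read_guess is a hypothetical hook: it defaults to input(), but any
    callable that accepts a prompt and returns a number will work.
    """
    # randint() includes both bounds, so low and high are possible secrets.
    secret = random.randint(low, high)
    attempts = 0
    while True:
        guess = int(read_guess(f"Guess a number from {low} to {high}: "))
        attempts += 1
        hint = check_guess(secret, guess)
        print(hint)
        if hint == "Correct":
            return attempts


# Call play() to try the game interactively.
```

Keeping the comparison in its own small function makes the game logic easy to check without typing guesses by hand.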
In this program, we’re having the computer simulate a person who is thinking of a number in their head and having a second person (in our case, the user) make a series of guesses until they guess correctly. Let’s look at another example.
Here, represented by a list of strings, is a deck of 52 cards. The first part of each string is the card’s value, and the last character is the suit. We’re going to simulate drawing a random card from this deck.
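One way such a deck could be built is sketched below. The rank and suit abbreviations (for example "10H" for the ten of hearts) are assumptions; the video's own list may spell the cards differently.

```python
# Thirteen card values: 2 through 10, then the face cards and the ace.
ranks = [str(n) for n in range(2, 11)] + ["J", "Q", "K", "A"]

# One letter per suit: clubs, diamonds, hearts, spades (assumed abbreviations).
suits = ["C", "D", "H", "S"]

# Each card is the value followed by a single suit character.
deck = [rank + suit for suit in suits for rank in ranks]

print(len(deck))
print(deck[:5])
```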
The choice() method in random is a good candidate for this simulation. Its job is to return a random element from a list or sequence. Yes, we could do the same thing by generating a random index value using randint(), but choice() reads a lot better. Just pass the sequence as an argument.
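As a quick sketch of the comparison just described, here are both approaches side by side. The deck construction is an assumed stand-in for the list shown in the video.

```python
import random

# Assumed stand-in for the 52-card deck from the video.
ranks = [str(n) for n in range(2, 11)] + ["J", "Q", "K", "A"]
deck = [rank + suit for suit in "CDHS" for rank in ranks]

# choice() returns one random element of the sequence.
card = random.choice(deck)

# The same idea with randint(): pick a valid index, then look it up.
# The upper bound is len(deck) - 1 because randint() includes both ends.
index = random.randint(0, len(deck) - 1)
card_by_index = deck[index]

print(card, card_by_index)
```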
It might seem logical that if we wanted to simulate a full game of cards, say Blackjack or Go Fish, then we could repeatedly invoke random.choice() for the number of cards in our hand.
However, this could generate duplicates. Each call to random.choice() uses the original sequence, so there is the potential to pull the same random value more than once. In fact, there is a method in the random module, choices(), that saves us the trouble of repeating random.choice().
But again, the potential for duplicates is there, and that is why the documentation for choices() uses the words “with replacement.” The simulation, in this case, is like placing each card back in the deck after it’s pulled. What if we wanted to avoid pulling duplicates?
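A short sketch of drawing with replacement follows. The deck is an assumed stand-in for the one in the video, and the two-card list exists only to make the duplicates easy to see.

```python
import random

# Assumed stand-in for the 52-card deck from the video.
ranks = [str(n) for n in range(2, 11)] + ["J", "Q", "K", "A"]
deck = [rank + suit for suit in "CDHS" for rank in ranks]

# Five separate choice() calls: every call draws from the full deck.
hand_a = [random.choice(deck) for _ in range(5)]

# choices() does the same in one call; k is the number of draws.
hand_b = random.choices(deck, k=5)

# With only two cards to draw from, five draws are certain to repeat a card.
tiny_hand = random.choices(["AS", "KH"], k=5)

print(hand_a)
print(hand_b)
print(tiny_hand)
```

Neither hand_a nor hand_b is guaranteed to contain five different cards, which is exactly the "with replacement" behavior.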
It just so happens that random provides another useful method, sample(), which pulls a number of random values from a sequence without replacement. This means that, unlike choice() and choices(), when a card is pulled using sample(), it is no longer in play for the rest of that draw. This is more appropriate for the real-life example of dealing a hand from a deck of cards, since we won’t get duplicates.
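Dealing a hand without replacement could be sketched like this. The deck is again an assumed stand-in for the list in the video.

```python
import random

# Assumed stand-in for the 52-card deck from the video.
ranks = [str(n) for n in range(2, 11)] + ["J", "Q", "K", "A"]
deck = [rank + suit for suit in "CDHS" for rank in ranks]

# sample() picks k distinct positions, so no card appears twice in the hand.
hand = random.sample(deck, k=5)
print(hand)

# sample() returns a new list; the deck itself still holds all 52 cards.
print(len(deck))

# Asking for more cards than the deck contains is an error.
try:
    random.sample(deck, k=53)
    oversized_allowed = True
except ValueError:
    oversized_allowed = False
```

Because sample() leaves the deck untouched, a second call could hand out some of the same cards again. Removing the dealt cards from the deck is a separate step.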
Speaking of cards, we also get a shuffle() method from random. To use shuffle(), we need to pass it a mutable sequence. In other words, you can pass shuffle() a list, but not a tuple or string. This is because shuffle() does not return a new value but rather shuffles what you give it in place. If you shuffle this deck of cards, then the original order is lost.
Sometimes, Python programmers forget this and place a variable assignment in front of random.shuffle(). This variable will hold the value None because shuffle() did the shuffling job on the actual list and had nothing to return.
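The in-place behavior and the None pitfall can be seen in a small sketch. The five-card list is made up for brevity.

```python
import random

cards = ["AS", "KS", "QS", "JS", "10S"]

# The pitfall: shuffle() rearranges the list in place and returns None.
result = random.shuffle(cards)
print(result)
print(cards)

# An immutable sequence such as a tuple cannot be rearranged in place.
try:
    random.shuffle(("AS", "KS", "QS"))
    tuple_rejected = False
except TypeError:
    tuple_rejected = True
print(tuple_rejected)
```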
If we needed to shuffle a sequence but retain the original order, then we’d have to make a copy and shuffle the copy.
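A minimal sketch of shuffling a copy is shown below. The variable names are illustrative. The second approach relies on sample() returning a new list, which gives a shuffled copy in one step when k equals the length of the sequence.

```python
import random

original = ["AS", "KS", "QS", "JS", "10S"]

# Copy first, then shuffle the copy; the original order is untouched.
shuffled = original.copy()
random.shuffle(shuffled)

# Alternative: sample() with k equal to the length returns a new shuffled list.
also_shuffled = random.sample(original, k=len(original))

print(original)
print(shuffled)
print(also_shuffled)
```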
What if we wanted to use shuffle() to create a scrambled word puzzle? Strings aren’t mutable. What we could do is create a list, which is mutable and therefore can be scrambled, out of the string. We could then shuffle it and use .join() to piece it back into a string.
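Those three steps might be sketched as follows. The function name scramble() is an assumption for illustration.

```python
import random


def scramble(word):
    """Return the letters of word in a random order."""
    letters = list(word)       # a string is immutable, a list is not
    random.shuffle(letters)    # shuffle the list in place
    return "".join(letters)    # glue the letters back into a string


puzzle = scramble("python")
print(puzzle)
```

Since the shuffle is random, the result can occasionally match the original word, so a real puzzle might reshuffle when that happens.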
In the next video, you’ll learn why pseudo-randomness makes for a great modeling tool and see a few of its operations in data science using the NumPy library. See you there!
00:00 Hello and welcome to Real Python’s video series Generating Random Data in Python. In these videos, you’ll explore a variety of ways to create random or seemingly random data in your programs, and see how Python makes randomness happen.
00:16 So, why might you want to generate random data in your programs? There are many reasons—games, testing, et cetera. But these reasons generally fall under two broad categories, simulation and security.
00:30 For simulation, it’s best to start with the random module. Most people getting started in Python are quickly introduced to this module, which is part of the Python standard library, meaning it’s built into the language.
00:43 random provides a number of useful tools for generating what we call pseudo-random data. It’s known as a pseudo-random number generator, or a PRNG.
00:54 We’ll come back to this term later because it’s important. But right now, as an introduction to the module, let’s consider this very simple guess-a-number game.
01:04 Importing random gives us the method randint(), which will generate a random integer within a range we specify. This range includes its bounds.
01:13 In other words, invoking the randint() method as we’ve done here can generate any number between, and including, 1 and 100. In this program, we are having the computer simulate a person thinking of a number in their head and having a second person make a series of guesses until they guess correctly.
01:33 Let’s look at another example.
01:37 Here, represented by a simple list of strings, is a deck of 52 cards. The first part of each string is the card’s value and the last character is the suit.
01:48 We’re going to simulate drawing a random card from this deck. random’s choice() method is a good candidate for this simulation.
01:56 Its job is to return a random element from a list or sequence. Yes, we could do the same thing by generating a random index value using randint(), but the choice() method reads a lot better.
02:08 Simply pass the sequence as an argument. It might seem logical that if we wanted to simulate a full game of cards, say Blackjack or Go Fish, we could repeat random.choice() for the number of cards that we needed in our hand. However, this could generate duplicates.
02:26 Each call to random.choice() uses the same original sequence, so there’s the potential to pull the same random value more than once. In fact, there’s a method in the random module called choices(), with an s, which saves us the trouble of repeating random.choice(). But again, the potential for duplicates is there, and that is why the documentation for the choices() method mentions the words “with replacement.” The simulation, in this case, is like placing each card back in the deck after pulling it. So, what do we do if we want to avoid pulling duplicates?
03:04 It just so happens, random provides another useful method called sample(). sample() pulls a number of random values from a sequence without replacement, meaning unlike choice() and choices(), when a card is pulled using sample(), it is no longer in play.
03:20 This is more appropriate for the real life example of dealing a hand from a deck of cards, since we won’t get duplicates.
03:29 Speaking of cards, we also get a shuffle() method from random. To use shuffle(), we need to pass it a mutable sequence. In other words, you can pass shuffle() a list, but not a tuple or a string.
03:42 The reason why is, shuffle() does not return a new value; it shuffles what you give it. So if you shuffle this deck of cards, the original sequence is lost. Sometimes Python programmers forget this and place a variable assignment in front of random.shuffle().
03:58 This variable will hold the value None because shuffle() did the shuffling job on the actual list and returned nothing. If we needed to shuffle a sequence and retain the original order, we’d have to make a copy and shuffle the copy.
04:14 What if we wanted to use shuffle() to create a scrambled word puzzle? Strings aren’t mutable, so what do we do? What we could do is create a mutable and therefore scramble-able list out of the string, shuffle that, and use string’s .join() method to piece it back into a string.
04:36 Coming up in the next video, you’ll learn why pseudo-randomness is ideal for modeling and simulation, and you’ll learn about another random module, this one incorporated in the NumPy package. See you there.
Jackie Wilson RP Team on June 29, 2019
Thanks for the feedback!
Chaitanya on June 29, 2019
i would say its better to give more examples explaining the concept. just a suggestion and this is not an offense.