Hello and welcome to the Real Python video series, Generating Random Data in Python. In these videos, you’ll explore a variety of ways to create random—or seemingly random—data in your programs and see how Python makes randomness happen.
Why might you want to generate random data in your programs? There are many reasons (games, testing, and so on), but these reasons generally fall under two broad categories:
For simulation, it’s best to start with the random module. Most people getting started in Python are quickly introduced to this module, which is part of the Python Standard Library. This means that it ships with Python, so there’s nothing extra to install.
random provides a number of useful tools for generating what we call pseudo-random data. It’s known as a Pseudo-Random Number Generator, or PRNG.
We’ll come back to that term later, because it’s important, but right now, let’s consider this guess-a-number game.
random gives us a method, randint(), that will generate a random integer within a range we specify. This range includes its bounds. In other words, invoking randint(), as done here, can generate any integer between and including the two bounds passed to it.
In this program, we’re having the computer simulate a person who is thinking of a number in their head and having a second person (in our case, the user) make a series of guesses until they guess correctly. Let’s look at another example.
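As a rough sketch of the game described above, the logic might look something like this. The video’s actual code isn’t reproduced in this transcript, so the bounds 1 and 10 and the helper names pick_secret() and play() are assumptions made purely for illustration:

```python
import random

# Assumed bounds; the range used in the video is not shown in this transcript.
LOW, HIGH = 1, 10


def pick_secret():
    """Return a random integer N with LOW <= N <= HIGH (both ends included)."""
    return random.randint(LOW, HIGH)


def play(secret, guesses):
    """Check guesses in order.

    Return the number of attempts it took to hit the secret,
    or None if none of the guesses matched.
    """
    for attempt, guess in enumerate(guesses, start=1):
        if guess == secret:
            return attempt
    return None


secret = pick_secret()
attempts = play(secret, range(LOW, HIGH + 1))  # guessing every value always wins
```

In an interactive version, the guesses would come from input() rather than from a prepared sequence.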
Here, represented by a list of strings, is a deck of 52 cards. The first part of each string is the card’s value, and the last character is the suit. We’re going to simulate drawing a random card from this deck.
The choice() method in random is a good candidate for this simulation. Its job is to return a random element from a list or other sequence. Yes, we could do the same thing by generating a random index value using randint(), but choice() reads a lot better. Just pass the sequence as an argument.
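Here’s a minimal sketch of drawing one card. The exact strings in the video’s deck aren’t shown in this transcript, so the encoding below (value first, a single suit letter last) and the names VALUES and SUITS are assumptions:

```python
import random

# Assumed card encoding: the value, then one suit letter as the last character.
VALUES = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["C", "D", "H", "S"]

# 13 values x 4 suits = 52 cards, e.g. "10H" or "AS".
deck = [value + suit for suit in SUITS for value in VALUES]

# Draw one random card; just pass the sequence as an argument.
card = random.choice(deck)

# The same idea with a hand-rolled random index, which is harder to read.
# randint() includes both bounds, so the upper bound must be len(deck) - 1.
card_by_index = deck[random.randint(0, len(deck) - 1)]
```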
It might seem logical that if we wanted to simulate a full game of cards, say Blackjack or Go Fish, then we could repeatedly invoke random.choice() for the number of cards in our hand.
However, this could generate duplicates. Each call to random.choice() uses the original sequence, so there is the potential to pull the same random value more than once. In fact, there is a method in the random module, choices(), that saves us the trouble of repeating random.choice(). But again, the potential for duplicates is there, and that is why the documentation for choices() uses the words “with replacement.” The simulation, in this case, is like placing each card back in the deck after it’s pulled. What if we wanted to avoid pulling duplicates?
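To make “with replacement” concrete, here’s a small sketch. The deck encoding and the hand size of 5 are assumptions for illustration. Drawing 53 cards from a 52-card deck is only possible because cards go back in after each draw, which also means at least one card must repeat:

```python
import random

# Assumed card encoding: value first, one suit letter last.
VALUES = "2 3 4 5 6 7 8 9 10 J Q K A".split()
deck = [value + suit for suit in "CDHS" for value in VALUES]

# Five draws with replacement; the same card may show up more than once.
hand = random.choices(deck, k=5)

# 53 draws from 52 distinct cards: a repeat is guaranteed.
oversized = random.choices(deck, k=53)
has_repeat = len(set(oversized)) < len(oversized)
```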
It just so happens that random provides another useful method, sample(), which pulls a number of random values from a sequence without replacement. This means that, unlike choices(), when a card is pulled using sample(), it is no longer in play. This is more appropriate for the real-life example of dealing a hand from a deck of cards, since we won’t get duplicates.
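A short sketch of dealing a hand this way follows. Again, the deck encoding and the hand size of 5 are assumptions. Note that sample() leaves the deck itself untouched and refuses to deal more cards than the deck holds:

```python
import random

# Assumed card encoding: value first, one suit letter last.
VALUES = "2 3 4 5 6 7 8 9 10 J Q K A".split()
deck = [value + suit for suit in "CDHS" for value in VALUES]

# Five distinct cards, drawn without replacement.
hand = random.sample(deck, k=5)

# Asking for more cards than the deck contains raises ValueError.
try:
    random.sample(deck, k=53)
    too_many_allowed = True
except ValueError:
    too_many_allowed = False
```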
Speaking of cards, we also get a shuffle() method from random. To use shuffle(), we need to pass it a mutable sequence. In other words, you can pass shuffle() a list, but not a tuple or string. This is because shuffle() does not return a new value but rather shuffles what you give it in place. If you shuffle this deck of cards, then the original order is lost.
Sometimes, Python programmers forget this and place a variable assignment in front of random.shuffle(). This variable will hold the value None, because shuffle() did the shuffling job on the actual list and had nothing to return.
If we needed to shuffle a sequence but retain the original order, then we’d have to make a copy and shuffle the copy.
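Both points can be seen in a small sketch. The five-card list is just a stand-in for the full deck, and the variable names are assumptions for illustration:

```python
import random

deck = ["AS", "KS", "QS", "JS", "10S"]  # tiny stand-in deck

# The common mistake: shuffle() works in place and returns None.
result = random.shuffle(deck)
# result is None here; deck itself has been reordered.

# To keep the original order, shuffle a copy instead.
ordered = ["AS", "KS", "QS", "JS", "10S"]
shuffled_copy = ordered.copy()
random.shuffle(shuffled_copy)
# ordered is unchanged; shuffled_copy holds the same cards in a new order.

# Immutable sequences can't be shuffled in place.
try:
    random.shuffle(("AS", "KS", "QS"))
    tuple_shuffled = True
except TypeError:
    tuple_shuffled = False
```

Another option, if a new list is what we’re after, is random.sample(ordered, k=len(ordered)), which returns a shuffled list and leaves the original alone.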
What if we wanted to use shuffle() to create a scrambled word puzzle? Strings aren’t mutable. What we could do is create a list, which is mutable and therefore can be scrambled, out of the string. We could then shuffle it and use .join() to piece it back into a string.
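Those three steps can be sketched like this. The function name scramble() and the word "python" are made up for illustration:

```python
import random


def scramble(word):
    """Return the letters of word in a random order, as a new string."""
    letters = list(word)       # strings are immutable, so work on a list
    random.shuffle(letters)    # shuffles the list in place
    return "".join(letters)    # piece the letters back into a string


puzzle = scramble("python")
```

Keep in mind that a shuffle can occasionally land on the original order, so a real puzzle might want to reshuffle when the result matches the input.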
In the next video, you’ll learn why pseudo-randomness makes for a great modeling tool and see a few of its operations in data science using the NumPy library. See you there!