Hello and welcome to the Real Python video series, Generating Random Data in Python. In these videos, you’ll explore a variety of ways to create random—or seemingly random—data in your programs and see how Python makes randomness happen.
Why You May Want to Generate Random Data
Why might you want to generate random data in your programs? There are many reasons (games, testing, and so on), but these reasons generally fall under two broad categories:
- Simulation
- Security
The random Module
For simulation, it’s best to start with the random module.
Most people getting started in Python are quickly introduced to this module, which is part of the Python Standard Library. This means that it ships with Python, so there’s nothing extra to install. random provides a number of useful tools for generating what we call pseudo-random data. Under the hood, it relies on a Pseudo-Random Number Generator, or PRNG.
We’ll come back to that term later, because it’s important, but right now, let’s consider this guess-a-number game.
Importing random gives us a function, randint(), that generates a random integer within a range we specify. This range includes its bounds. In other words, invoking randint(1, 100), as done here, can generate any whole number from 1 to 100, including 1 and 100 themselves.
In this program, we’re having the computer simulate a person who is thinking of a number in their head and having a second person (in our case, the user) make a series of guesses until they guess correctly. Let’s look at another example.
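A minimal sketch of such a guess-a-number game follows. The helper names check_guess() and play() are hypothetical, not taken from the video. play() accepts the function used to read a guess (defaulting to the built-in input()), so the same game logic can also be driven without a keyboard.

```python
import random


def check_guess(guess, secret):
    """Return a hint comparing the player's guess to the secret number."""
    if guess < secret:
        return "Too low"
    if guess > secret:
        return "Too high"
    return "Correct"


def play(get_guess=input):
    """Run one round of the game and return how many guesses it took."""
    # randint() includes both bounds, so 1 and 100 are both possible secrets
    secret = random.randint(1, 100)
    attempts = 0
    while True:
        guess = int(get_guess("Guess a number between 1 and 100: "))
        attempts += 1
        hint = check_guess(guess, secret)
        print(hint)
        if hint == "Correct":
            return attempts
```

Calling play() with no arguments starts an interactive round in the terminal.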
Here, represented by a list of strings, is a deck of 52 cards. The first part of each string is the card’s value, and the last character is the suit. We’re going to simulate drawing a random card from this deck.
The choice() function in random is a good candidate for this simulation. Its job is to return a random element from a non-empty sequence, such as a list. Yes, we could do the same thing by generating a random index value using randint(), but choice() reads a lot better. Just pass the sequence as an argument.
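Here’s a sketch of the card example. The exact card strings aren’t shown in this transcript, so the deck below assumes values "2" through "10", "J", "Q", "K", and "A" followed by a one-letter suit ("C", "D", "H", or "S"); the names VALUES, SUITS, and deck are illustrative. It also shows the randint() index approach that choice() replaces.

```python
import random

# Assumed encoding: card value followed by a one-letter suit, e.g. "10H" or "QS"
VALUES = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["C", "D", "H", "S"]
deck = [value + suit for suit in SUITS for value in VALUES]

# Readable: let choice() pick the element for us
card = random.choice(deck)

# Equivalent but clunkier: pick a random index with randint().
# randint() includes its upper bound, so the largest valid index is len(deck) - 1
card_by_index = deck[random.randint(0, len(deck) - 1)]

print(card, card_by_index)
```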
It might seem logical that if we wanted to simulate a full game of cards, say Blackjack or Go Fish, we could invoke random.choice() once for each card in our hand.
However, this could generate duplicates. Each call to random.choice() draws from the full, unchanged sequence, so the same card can be pulled more than once. The random module also provides choices(), which saves us the trouble of calling random.choice() repeatedly: pass it the sequence along with k, the number of elements you want back.
But again, the potential for duplicates is there, which is why the documentation for choices() uses the words “with replacement.” The simulation, in this case, is like placing each card back in the deck after it’s pulled. What if we wanted to avoid pulling duplicates?
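The sketch below contrasts repeated choice() calls with a single choices() call, reusing the same assumed deck encoding as before (VALUES, SUITS, and deck are illustrative names). Because both draw with replacement, drawing 53 cards from a 52-card deck is guaranteed to repeat at least one card.

```python
import random

# Assumed deck encoding: value followed by a one-letter suit
VALUES = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["C", "D", "H", "S"]
deck = [value + suit for suit in SUITS for value in VALUES]

# Calling choice() once per card: every draw sees the full deck
hand_from_choice = [random.choice(deck) for _ in range(5)]

# choices() does the same in one call; k is the number of draws
hand_from_choices = random.choices(deck, k=5)

# Both draw with replacement. With more draws than cards,
# at least one card must repeat (pigeonhole principle).
big_draw = random.choices(deck, k=53)
print(len(big_draw), len(set(big_draw)))  # set() collapses the repeats
```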
It just so happens that random provides another useful function, sample(), which pulls a number of random values from a sequence without replacement. This means that, unlike with choice() and choices(), once a card is pulled using sample(), it’s no longer in play. This makes sample() a better fit for the real-life example of dealing a hand from a deck of cards, since we won’t get duplicates.
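A short sketch of dealing with sample(), again using an assumed deck encoding. sample() returns a new list and leaves the deck itself unchanged; asking for more cards than the deck holds raises a ValueError.

```python
import random

# Assumed deck encoding: value followed by a one-letter suit
VALUES = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["C", "D", "H", "S"]
deck = [value + suit for suit in SUITS for value in VALUES]

# Deal 5 cards without replacement: no card can appear twice
hand = random.sample(deck, k=5)
print(hand)

# Even dealing all 52 cards produces no repeats
full_deal = random.sample(deck, k=len(deck))

# Without replacement, you can't draw more cards than exist
try:
    random.sample(deck, k=53)
    too_many_error = None
except ValueError as error:
    too_many_error = error
```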
Speaking of cards, random also gives us a shuffle() function. To use shuffle(), we need to pass it a mutable sequence. In other words, you can pass shuffle() a list, but not a tuple or a string. This is because shuffle() doesn’t return a new value; instead, it rearranges the sequence you give it in place. If you shuffle this deck of cards, then its original order is lost.
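The sketch below shows shuffle() rearranging a list in place, and failing on a tuple because tuples can’t be modified. The short five-card deck is a stand-in for illustration.

```python
import random

deck = ["AS", "KH", "QD", "JC", "10S"]  # a short stand-in deck
random.shuffle(deck)  # reorders the list itself
print(deck)

# Tuples (and strings) are immutable, so shuffle() can't rearrange them
try:
    random.shuffle(("AS", "KH", "QD"))
    tuple_error = None
except TypeError as error:
    tuple_error = error
    print("TypeError:", error)
```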
Sometimes, Python programmers forget this and put a variable assignment in front of random.shuffle(). That variable will hold the value None, because shuffle() did the shuffling on the actual list and had nothing to return.
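Here’s that mistake in a few lines, using a short stand-in deck:

```python
import random

deck = ["AS", "KH", "QD", "JC", "10S"]

shuffled = random.shuffle(deck)  # mistake: shuffle() works in place and returns None
print(shuffled)  # prints None
print(deck)      # the list itself is what got shuffled
```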
If we needed to shuffle a sequence but retain the original order, then we’d have to make a copy and shuffle the copy.
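A sketch of both ways to keep the original order, using a short stand-in deck. The first copies the list and shuffles the copy; the second uses sample() with k equal to the sequence’s length, which the random documentation suggests for getting a new shuffled list.

```python
import random

deck = ["AS", "KH", "QD", "JC", "10S"]

# Option 1: shuffle a copy so the original order survives
shuffled_deck = deck.copy()
random.shuffle(shuffled_deck)

# Option 2: sample() with k=len(deck) returns a new shuffled list
also_shuffled = random.sample(deck, k=len(deck))

print(deck)  # still in the original order
```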
What if we wanted to use shuffle() to create a scrambled word puzzle? Strings aren’t mutable, but we can build a list of characters from the string, since lists are mutable and therefore can be scrambled. We could then shuffle that list and use .join() to piece it back into a string.
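A minimal sketch of that word scrambler; the helper name scramble() is hypothetical:

```python
import random


def scramble(word):
    """Return the letters of word in a random order."""
    letters = list(word)     # strings are immutable, so copy the characters into a list
    random.shuffle(letters)  # shuffle the list in place
    return "".join(letters)  # glue the characters back into a string


print(scramble("python"))
```

Keep in mind that a shuffle can occasionally hand back the letters in their original order, so a puzzle generator might want to check for that.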
In the next video, you’ll learn why pseudo-randomness makes for a great modeling tool and see a few of its uses in data science with the NumPy library. See you there!