# Randomness for Modeling and Simulation

Welcome to video 2 in **Generating Random Data in Python**. In the last video, you heard that the `random` module provides pseudo-randomness.

That means the random data generated from the methods in `random` are not truly random. The `random` module is an example of a PRNG, the P being for **Pseudo**. A **True** random number generator would be a TRNG and typically involves hardware. In the real world, rolling an unbiased die is an example of a TRNG.

What makes the `random` module a PRNG? First, it’s implemented in software, and by design can be seeded to be deterministic. In other words, we can recreate and predict the generated series of random values. Data generated from `random` are produced based on a value we call the seed. You can think of the seed as a starting point to get the random generation going.

When you invoked the `random` methods you learned in the last video, the `random` module had to come up with its own seed, typically your system time. It then uses that seed in an algorithm to generate values. The `random` module also has a method called `random()`. Let’s see it in action.

`random.random()` generates a float value equal to or greater than `0.0` but less than `1.0`, which is conveyed with the notation `[0.0, 1.0)` to indicate that the first value is inclusive, but the second value is exclusive.
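A quick sketch of that range guarantee, drawing a batch of values and checking the bounds:

```python
import random

# random.random() returns a float in the half-open interval [0.0, 1.0)
value = random.random()
print(value)

# Every value drawn this way is at least 0.0 and strictly less than 1.0
samples = [random.random() for _ in range(1_000)]
print(all(0.0 <= s < 1.0 for s in samples))  # True
```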

While it’s convenient that the `random` module can seed off of system time, sometimes you’ll want to repeat a random sequence for testing or demonstration.

For this purpose, there is the `seed()` method. Pass an `int` argument, and the method will use it as the seed. As a side note, you may also pass `seed()` a string, bytes, or byte array, and those values will be converted to an integer before use.

In this example, you’ll see the effect of explicitly seeding `random()`. It provides us a way to duplicate the same random generation, which is a handy tool for testing.
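Here’s a minimal sketch of that idea. The seed value `1234` is arbitrary; any integer reproduces its own sequence:

```python
import random

# Seeding with the same integer reproduces the same sequence of values
random.seed(1234)
first_run = [random.random() for _ in range(3)]

random.seed(1234)
second_run = [random.random() for _ in range(3)]

print(first_run == second_run)  # True
```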

In addition to seeding, we can capture the state of `random()` at any time with the `getstate()` method. This returns a tuple that we can then pass to a companion `setstate()` method to duplicate the generation at that moment.
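A short sketch of the `getstate()` and `setstate()` round trip:

```python
import random

random.seed(42)
state = random.getstate()  # capture the generator's current state

first = [random.random() for _ in range(3)]

random.setstate(state)     # rewind to the captured state
second = [random.random() for _ in range(3)]

print(first == second)  # True
```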

## Data Science: The `numpy.random` Module

Because simulation is such a common application of pseudo-random generation, it’s important to talk about its use in data science, and in the NumPy package.

This video will cover a few of these functions in NumPy, but NumPy could be a course all on its own. There are many tutorials covering NumPy in depth available on *Real Python*, and one of them is all about random number generation in NumPy.

NumPy contains its own `random` module. Where the standard `random` module provided us a convenient way of generating random scalar values, NumPy’s `random` implementation is more geared towards random series of data. Let’s go ahead and import it and get to work.

Here we’re using a Jupyter notebook to demonstrate some basic NumPy `random` methods. We first import NumPy with the alias `np`. See how NumPy’s `random` duplicates many of the same methods and method names as the standard `random` module? These include `random()`, `randint()`, `seed()`, and others.

These methods mostly function the same. Both `random()` and `seed()` work similarly to their counterparts in the standard `random` module.

It appears `randint()` also works in a similar way, but there are a couple of differences that I’ll explain later.

Here, you see that we can re-run our random seed cell to reset our `randint()` results.

For sequences, we also have a similar `choice()` method. But in NumPy, there is no `choices()` method, and the `sample()` method in NumPy’s `random` is very different: if you pass it a sequence argument, it’s read as the size for a multi-dimensional array.
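A quick sketch of both methods (the color list is just an arbitrary example sequence):

```python
import numpy as np

np.random.seed(0)

# choice() picks elements from a sequence, like the standard module's choice()
colors = ["red", "green", "blue"]
print(np.random.choice(colors, size=5))

# sample() reads its argument as a shape and returns
# that shape of floats drawn from [0.0, 1.0)
grid = np.random.sample((2, 3))
print(grid.shape)  # (2, 3)
```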

In this next code, we’re running `randint()` to simulate the roll of a die. This is to illustrate some differences from the standard `randint()`:

- The upper bound is exclusive, requiring us to pass 6 + 1 as our upper bound in order for the 6 to be included in the possibilities.
- We can pass a third argument to get an array with that number of elements, in this case 100 rolls.
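Both differences show up in one call:

```python
import numpy as np

np.random.seed(0)

# The upper bound is exclusive, so 6 + 1 allows sixes;
# the third argument produces an array of 100 rolls
rolls = np.random.randint(1, 6 + 1, 100)

print(rolls[:10])
print(rolls.min() >= 1 and rolls.max() <= 6)  # True
```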

Repeatedly rolling a die would result in a uniform distribution of values between 1 and 6, and there is an `np.random.uniform()` method we could use with the same arguments, but it produces floats.

When it comes to rolling two dice, the totals will look more like a normal distribution, or bell curve. We can see this is true if we create a second die roll and combine it with the first. When added together, the most likely result would be 7, and the least likely results would be 2 and 12.
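The two-dice idea can be sketched with a second array of rolls (100 paired rolls here, matching the earlier example):

```python
import numpy as np

np.random.seed(0)

# Two independent arrays of 100 die rolls each
die1 = np.random.randint(1, 7, 100)
die2 = np.random.randint(1, 7, 100)

# Element-wise sums range from 2 to 12, clustering around 7
totals = die1 + die2
values, counts = np.unique(totals, return_counts=True)
print(values[np.argmax(counts)])  # the most frequent total
```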

We can see the result graphically with Matplotlib, but it’s better illustrated if we increase our data samples to 5000.

That brings us to the `normal()` method, but like `uniform()`, it produces floats. It will give us values that resemble a bell curve, however. In the standard `random` module, we do have a `normalvariate()` method. It requires mean and standard deviation arguments, and it returns only one value.

NumPy’s `randn()` gives us a normal distribution in the shape we specify in the arguments.
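A rough sketch of all three options side by side (the mean of 100 and standard deviation of 15 are arbitrary illustration values):

```python
import random
import numpy as np

# Standard library: one value at a time, mean and standard deviation required
one_value = random.normalvariate(100, 15)
print(one_value)

# NumPy: normal() takes mean, standard deviation, and a size
many_values = np.random.normal(100, 15, 5000)
print(many_values.shape)  # (5000,)

# randn() draws from the standard normal (mean 0, std 1)
# in whatever shape the arguments specify
grid = np.random.randn(2, 3)
print(grid.shape)  # (2, 3)
```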

Now for just one more illustration. We know some factors increase or decrease relative to other factors. This is known as **correlation**. NumPy can build correlated random data given a mathematical covariance. This function here will get that for us.

Let’s suppose we have a correlation matrix with `1`, `0.9`, and `0.9`, `1`. This means we have a strong correlation.

Let’s suppose we’re talking about age as one data set, and percentage of gray hair as the second data set. As age grows, so does the chance that percentage will increase. My numbers might be off from real life, but bear with me.

You can see that our ages and percentages are floats, and some of our gray hair percentages are negative, but that’s more because I couldn’t think of a good example. You see, however, that the older people in this cross section of data do have higher percentages of gray hair.

If we scatter plot these points, we see the diagonal trend that suggests our correlation between age and gray hair.
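The age and gray-hair example can be sketched with NumPy’s `multivariate_normal()`, which draws correlated pairs given means and a covariance matrix. The means (50 years, 30 percent) and standard deviations below are made-up illustration values, not taken from the lesson:

```python
import numpy as np

np.random.seed(0)

# Hypothetical means: average age 50, average gray-hair percentage 30
means = [50, 30]

# Build a covariance matrix from a 0.9 correlation and assumed
# standard deviations of 15 (age) and 20 (gray-hair percentage)
corr = 0.9
std_age, std_gray = 15, 20
cov = [
    [std_age**2, corr * std_age * std_gray],
    [corr * std_age * std_gray, std_gray**2],
]

data = np.random.multivariate_normal(means, cov, size=1000)
ages, gray = data[:, 0], data[:, 1]

# The sample correlation should come out close to 0.9
print(np.corrcoef(ages, gray)[0, 1])
```

As in the lesson, some generated percentages can come out negative, because a normal distribution isn’t bounded at zero.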

## Comparing `random` vs `numpy.random`

Let’s wrap this up by comparing some of the features in the standard `random` side by side with the corresponding features in NumPy `random`.

Finally, remember that if you only need a single random value or a small sequence, then the standard `random` module is usually the faster and better option. NumPy is specialized for building large, multi-dimensional arrays.

You’ve now seen the benefits of pseudo-randomness along with situations where you might want to repeat your random data generation. This feature makes PRNGs like the `random` module great for simulation, but not so great for security. In the next video, you’ll know why. See you there!

**00:00**
Welcome to video number two in Generating Random Data in Python. In the last video, you heard me briefly mention that the `random` module provides pseudo-randomness. Put simply, that means the random data generated from `random` methods are not truly random.

**00:17**
The `random` module’s an example of a PRNG, the *P* being *Pseudo*. A true random number generator, on the other hand, would be a TRNG, and would typically involve hardware. In the real world, rolling an unbiased die is an example of a TRNG.

**00:39**
What makes the `random` module a PRNG? First, it’s implemented in software, and by design, can be seeded to be deterministic. In other words, we can recreate and predict the generated series of random values.

**00:55**
Data generated from `random` are produced based on a value we call the seed. You can think of the seed as a starting point to get the random generation going. When you invoked the `random` methods you learned in the last video, the `random` module had to come up with its own seed, typically your system time.

**01:15**
It then used that seed in an algorithm to generate the values. Within the `random` module is also a method called `random()`. Let’s see it in action.

**01:27**
`random.random()` generates a float value equal to or greater than `0.0` but less than `1.0`, which is conveyed with the notation *[0.0, 1.0)*.

**01:41**
This indicates that the first value is inclusive with the square bracket, but the second value is exclusive with the ending parenthesis. So the potential values include 0.0, but never 1.0.

**01:54**
Under the hood, `random` uses an algorithm known as the Mersenne Twister. While it’s convenient that the `random` module can seed off of system time, sometimes you will want to repeat a random sequence for testing or demonstration. For this purpose, there is the `random.seed()` method.

**02:12**
Simply pass this method an integer argument and `random` will use that as the seed. As a side note, you may also pass the `seed()` method a string, bytes, or `bytearray`, and those values will first be converted to an integer before seeding.

**02:30**
In this example, you’ll see the effect of explicitly seeding `random`.

**02:37**
It provides us a way to duplicate the same random number generation, which is a handy tool for testing. In addition to seeding, we can capture `random`’s state at any time with the `getstate()` method.

**02:51**
This returns a tuple that we can then pass to a companion `setstate()` method to duplicate the generation from that moment.

**03:01**
Because simulation is such a common implementation of pseudo-random generation, it’s important to talk about its application in data science, and in particular, the NumPy package.

**03:14**
This video is only going to cover a few of the functions available in NumPy. NumPy is huge and powerful, and there are many courses already available on the web that cover NumPy in depth. You’ll find some of these in Real Python. So for now, we’re just going to stick to some basics, just to give you a sampling of some of the random features in this package.

**03:37**
NumPy actually contains its own `random` module. Where the standard `random` module provided us an easy way of generating random scalar values, NumPy’s `random` implementation is more geared towards random series of data.

**03:51**
Let’s go ahead, import it, and get to work. Here, we’re using a Jupyter Notebook to demonstrate some basic `numpy.random` methods. We first import `numpy` with the alias `np`. Right away, you should notice that `numpy.random` duplicates many of the same methods you’ve seen in the standard `random` module.

**04:10**
These include `random()`, `randint()`, `seed()`, and others, and they mostly function the same. Both the `random()` and `seed()` methods work in a similar way to standard `random`.

**04:21**
It appears `randint()` also works similarly, but that’s not entirely true, and we’ll talk about the differences a little bit later.

**04:33**
Here, you see we can rerun our `random.seed()` cell and reset our original `randint()` result of `67`. For sequences, we also have a similar `choice()` method.

**04:46**
In NumPy, there is no `choices()` method, however, with an `s`. And `numpy.random`’s `sample()` is very different.

**04:53**
If you pass a sequence argument to `sample()`, it’s read as the size for a multi-dimensional array.

**05:01**
And, we also have a `shuffle()` method.

**05:05**
In this next code, we’re running `randint()` to simulate the roll of a die. This is to illustrate some differences from the standard `randint()`, the first being that the upper bound is exclusive, requiring us to have 6 plus 1, or `7`, as our upper bound in order for the `6` to be included in our possibilities.

**05:23**
The second difference is we can pass a third argument to get an array with that number of elements, in this case, `100` rolls. Repeatedly rolling a die would result in a uniform distribution of values between `1` and `6`, meaning each number is equally likely to appear.

**05:39**
There is an `np.random.uniform()` method we could use with the same arguments, but it produces floats. When it comes to rolling two dice, that will look more like a normal distribution or bell curve. We can see this is true if we create a second die roll and combine them with the first die roll. When added together, the most likely result will be around `7` and the least likely results will be `2` and `12`.

**06:06**
We can see the result graphically with Matplotlib, but it’s better illustrated if we increase our data samples to `5000`.

**06:21**
Here we can see the curve a little better. That brings us to the `normal()` method, but like `uniform()`, it produces floats. It will give us values that, if plotted, would resemble a bell curve.

**06:37**
In the standard `random`, we do have a `normalvariate()` method. It requires mean and standard deviation arguments and it only returns one value.

**06:47**
Another method, `randn()`, gives us a normal distribution in the shape we specify for the arguments. Just one more illustration. We know some factors grow or decrease relative to other factors.

**06:59**
This is known as correlation. NumPy can build correlated data given a mathematical covariance. This function here will get that for us. Let’s suppose we have a correlation matrix with 1, 0.9 and 0.9, 1.0.

**07:15**
This means we have a strong correlation. Let’s suppose we’re talking about age as one dataset and percentage of gray hair as the second dataset. As age grows, so does the chance that the percentage will increase.

**07:28**
My numbers might be off from real life, but bear with me. You can see that our ages and percentages are floats and some of the gray hair percentages are negative, but that’s more because I couldn’t think of a good example. You see, however, that the older people in this cross section of data do have higher percentages of gray hair. If we scatter plot these points, we see the diagonal trend that suggests our correlation between age and gray hair.

**07:55**
Let’s wrap this up by comparing some of standard `random`’s features alongside the corresponding features in `numpy.random`. Finally, remember, if you only need a single random value or even a small sequence, the standard library’s `random` is usually the faster and better option.

**08:13**
NumPy is specialized for building large, multi-dimensional arrays. You’ve now seen the benefit of pseudo-randomness along with situations where you might want to repeat your random data generation.

**08:27**
This feature makes the PRNGs, or pseudo-random number generators, like the `random` module great for simulation, but not so great for security. In the next video, you’ll know why.
