# Cryptographically Secure Random Data in Python


Welcome to video 4 in Generating Random Data in Python. In the previous video, we learned why, in secure applications, it’s important to generate a random number in a cryptographically secure way through entropy. But how do we effectively incorporate entropy in code?

## `os.urandom()`

There are two standard library modules in Python, `secrets` and `uuid`, that provide us with the necessary entropy to generate cryptographically secure random numbers. Both modules get entropy from your operating system, through the `os` module’s `urandom()` function. Let’s take a look at this function first:

```python
>>> import math, os, sys
>>> sec = os.urandom(32)
>>> sec = int.from_bytes(sec, sys.byteorder)
>>> len(str(sec))
>>> math.log10(sec)
```

After importing the necessary modules, we invoke `os.urandom()`, passing it the size we need, and it returns a value of type `bytes`. To make this easier to look at, let’s convert it to an integer where we can see that its length is `77` digits, a sufficiently sized random number.
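As a rough sketch of where those digits come from (the variable names are illustrative, not from the video): 32 bytes is 256 bits, so the integer is always below `2 ** 256`, a number with 78 decimal digits. The exact length varies from run to run.

```python
import os
import sys

raw = os.urandom(32)                        # 32 random bytes from the OS
value = int.from_bytes(raw, sys.byteorder)  # interpret the bytes as one integer

print(len(raw) * 8)         # 256 bits of randomness requested
print(value < 2 ** 256)     # True: a 32-byte value always fits in 256 bits
print(len(str(2 ** 256)))   # 78: upper bound on the number of decimal digits
print(len(str(value)))      # varies per run, never more than 78
```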

## `secrets`

As of Python 3.6, we have `secrets`, a short page of code that’s basically a wrapper around `os.urandom()`.

`secrets` comes from PEP 506, which was introduced to more or less protect developers from themselves: in other words, developers who didn’t thoroughly read the documentation and used the `random` module for secure applications. If this describes you, then don’t be too embarrassed. A quick Google search will show you that there are many others in this camp.

From now on, however, you don’t have an excuse. Use `secrets` instead.

`secrets` exports a handful of functions for generating random numbers, bytes, and strings. Let’s look at some examples.

After importing `secrets` and specifying a size, we can generate secure tokens from our system random as bytes, hex, or URL-safe strings. We also have the familiar `choice()` function:

```python
>>> import secrets
>>> n = 16
>>> secrets.token_bytes(n)
b'\xe7\x04S\x9fv\xe9\xc4\x90\xf1\xb8\x98X\xb2\x0f\xf3B'
>>> secrets.token_hex(n)
'065c070f35c8ea534c3fc0f0d6c3e8d6'
>>> secrets.token_urlsafe(n)
'qNElCiWqsg_psF_mYeRnEw'
>>> secrets.choice('abcde')
'b'
```
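Building on `choice()`, here is a minimal sketch of generating a random password, along the lines of the recipes in the `secrets` documentation. The alphabet and the length of 12 are arbitrary choices for illustration, and `randbelow()` is shown as one more function the module provides.

```python
import secrets
import string

# Candidate characters: upper- and lowercase letters plus digits
alphabet = string.ascii_letters + string.digits

# Pick 12 characters independently, each with the secure choice()
password = ''.join(secrets.choice(alphabet) for _ in range(12))
print(password)

# randbelow(n) returns a secure random integer in the range [0, n)
roll = secrets.randbelow(6) + 1
print(roll)
```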

Let’s see `secrets` in action with a URL shortener application. Real-life versions are a bit more involved, but ours is kept simple to demonstrate `token_urlsafe()`. This function, as its name suggests, returns a URL-safe string generated from the requested number of random bytes. We’ve incorporated it in our `shorten()` function, where we keep track of our URL mappings in a global `DATABASE` variable.

In our main program, we pass a couple URL strings, and we print out our returned result for each, along with their database entries.

The reason we get 7-character strings back when specifying 5 bytes is that `token_urlsafe()` uses base64 encoding, where each character carries 6 bits. The result length is therefore the ceiling of 8 * 5 / 6, which is 7.
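The length arithmetic can be checked directly. In this small sketch, `expected_length()` is a hypothetical helper written for illustration: each base64 character carries 6 bits, so `nbytes` bytes need the ceiling of `8 * nbytes / 6` characters once the padding is stripped.

```python
import math
from secrets import token_urlsafe

def expected_length(nbytes: int) -> int:
    """Characters needed to encode nbytes at 6 bits per character."""
    return math.ceil(8 * nbytes / 6)

# Compare the predicted length with what token_urlsafe() actually returns
for nbytes in (1, 5, 16, 32):
    token = token_urlsafe(nbytes)
    print(nbytes, expected_length(nbytes), len(token))
```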

shortly.py:

```python
from secrets import token_urlsafe

# Maps each short token to its original URL
DATABASE = {}

def shorten(url: str, nbytes: int = 5) -> str:
    ext = token_urlsafe(nbytes=nbytes)
    if ext in DATABASE:
        # Token already taken: try again with a fresh one
        return shorten(url, nbytes=nbytes)
    else:
        DATABASE.update({ext: url})
        return f'short.ly/{ext}'
```

In this example, we pass a URL string to the `shorten()` function, which generates a random token for the URL to map to. If the token already exists, we rerun until it’s unique. We specify 5 bytes as the default length. Here’s how it’s used:

```python
>>> urls = (
...     'https://realpython.com/',
...     'https://docs.python.org/3/library/secrets.html'
... )

>>> for u in urls:
...     print(shorten(u))
short.ly/p_Z4fLI
short.ly/fuxSyNY

>>> DATABASE
{'p_Z4fLI': 'https://realpython.com/',
 'fuxSyNY': 'https://docs.python.org/3/library/secrets.html'}
```

## `uuid`

The second module from the standard library, mentioned earlier, is `uuid`. UUID stands for universally unique identifier. A UUID is 128 bits long, which is 16 bytes or 32 hex digits. The `uuid` module provides a function, `uuid4()`.

There are others, ending in 1, 3, and 5, but those variations take input, such as your machine’s network address (`uuid1()`) or a namespace and name (`uuid3()` and `uuid5()`), whereas `uuid4()` uses system random, so it’s the one that’s secure. Let’s see how `uuid4()` works:

```python
>>> import uuid
>>> tok = uuid.uuid4()
```

Notice that `uuid4()` doesn’t return a string, but rather an instance of the `UUID` class. This offers some convenience, as the instance has the attributes `hex`, `int`, and `bytes`.
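A short sketch of what the returned object offers. The actual values differ on every run, so only the shapes are noted in the comments.

```python
import uuid

tok = uuid.uuid4()

print(type(tok))        # <class 'uuid.UUID'>
print(str(tok))         # canonical form: 36 characters including hyphens
print(tok.hex)          # the same value as 32 hex digits, no hyphens
print(len(tok.bytes))   # 16
print(tok.int == int(tok.hex, 16))  # True: int and hex describe the same number
print(tok.version)      # 4
```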

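If you’re wondering about collisions (another word for generating duplicates), the chances are super small. A version 4 UUID has 122 random bits, since 6 of its 128 bits are fixed to mark the version and variant, so any two of them match with a probability of one in 2^122, improbable enough to be considered secure.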

## `SystemRandom`

If you’ve been following along by looking at the standard library documentation for the modules we’ve been working with, then you may have noticed that the random module does provide a `SystemRandom` class that uses `os.urandom()`.

You might be wondering why Python’s `random` module wouldn’t simply default to using the safer, more secure system random. First, as we noted, it’s often necessary to reproduce test or modeling data. Second, cryptographically secure random generation tends to be slower.
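The reproducibility trade-off can be sketched as follows. The seed value `444` is arbitrary. `SystemRandom` accepts a `seed()` call for API compatibility but ignores it, because its values come from the operating system.

```python
import random

# The default generator is reproducible: same seed, same sequence
random.seed(444)
first = [random.randint(0, 9) for _ in range(5)]
random.seed(444)
second = [random.randint(0, 9) for _ in range(5)]
print(first == second)   # True

# SystemRandom draws from os.urandom(), so it cannot be replayed
sysrand = random.SystemRandom()
sysrand.seed(444)        # accepted but has no effect
print(sysrand.randint(0, 9))
print(sysrand.choice('abcde'))
```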

## Hashing

Sometimes, there’s confusion about whether hashing involves randomness. In short, it does not. It’s an algorithm that produces a one-way, fixed-size string from a given input. A hash function will always produce the same string if given the same input. Its value is that it’s not reversible and can be used to verify digital integrity.
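A minimal sketch of that determinism, using SHA-256 from the standard library’s `hashlib` as one example hash function (the video does not name a specific algorithm):

```python
import hashlib

digest_a = hashlib.sha256(b'hello world').hexdigest()
digest_b = hashlib.sha256(b'hello world').hexdigest()
digest_c = hashlib.sha256(b'hello world!').hexdigest()

print(digest_a == digest_b)  # True: same input, same hash
print(digest_a == digest_c)  # False: a one-character change gives a different hash
print(len(digest_a))         # 64 hex digits, whatever the input size
```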

Some applications store hashes of user passwords so they can avoid storing plaintext passwords. The user types in their password, and then the app hashes it and compares the hash to the database.

A single hash cycle of the password is not secure enough for user passwords because it’s trivial to generate what’s known as a rainbow table, which is a sort of lookup guide for common words and their hashed equivalents. To safeguard against this, it’s common for systems to repeat or salt the hash. Salting the hash means adding some extra data to the original before it is hashed.

Sometimes that salt is generated randomly, but otherwise hashing and randomness are not related.
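As a sketch of repeating and salting together, the standard library’s `hashlib.pbkdf2_hmac()` applies a hash many times over a salted password. The helper name `hash_password()`, the iteration count, and the sample passwords are assumptions for illustration only, not recommendations from the video.

```python
import hashlib
import secrets

ITERATIONS = 100_000  # illustrative value; how many times the hash is repeated

def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a salted, repeated hash of the password (hypothetical helper)."""
    return hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, ITERATIONS)

# The salt is the random part; it is stored alongside the hash
salt = secrets.token_bytes(16)
stored = hash_password('correct horse', salt)

# Verification: hash the attempt with the same salt and compare
print(secrets.compare_digest(stored, hash_password('correct horse', salt)))  # True
print(secrets.compare_digest(stored, hash_password('wrong guess', salt)))    # False
```

Because each user gets a different salt, identical passwords no longer produce identical stored hashes, which is what defeats a precomputed lookup table.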

## Recap

We’ve covered a lot of ground in this video series, so let’s take a moment to recap.

We started with the `random` module and many of its most useful methods and operations.

We then took a look at NumPy’s version of `random` and how it can be useful in basic data science applications.

Finally, we wrapped up with cryptographically secure Python in the form of `secrets`, which wraps `os.urandom()`, our system entropy, and another module, `uuid`, which uses that same entropy to generate unique IDs.

I hope you found the video series useful. If you have feedback or questions, please let us know in the comments below. Thanks for watching.

Justin Cletus

Hi @Jackie, this is a great tutorial for learning the Python random module along with other useful libraries.

Cody Roche

Hey, one odd thing I found on Windows was I needed to import sys for this to work. Not hard to figure out, and I’m not sure if it’s Windows or version specific, but figured you’d want to know.

Cody Roche

This was a great overview of random generation in Python. The examples in the standard library were great and full featured. I felt like I came away from them with a much deeper understanding of how they work!

That said, I feel like the coverage of secrets and uuid was much more shallow. Enough to get the basics, which is a great foundation. It left me wanting a follow-up that delves into more detail of cryptographically secure randoms though.

Any chance there’ll be a part 2 covering that?

Jackie Wilson RP Team

Thanks for your feedback, Cody, and the information on the Windows issue. I’m making a note of your feedback… there has been discussion on developing more videos in the networking and security areas. This particular video series was intended as a complement to this written tutorial, realpython.com/python-random/ . You may be able to find some missing pieces there. Thanks again for your comments!

Cody Roche

Funny, I was just starting that video and guide tonight. I also found the Cryptography and PyNaCl packages on PyPi. If I’m looking to incorporate basic cryptography in my code I’m guessing using a package makes more sense than coding the crypto from the ground up, since just building the crypto functions into a module and getting them working right looks like it could take a while.

lironhayman

Cody is right, you need:

```python
import sys
```

lironhayman

Also I think this should say: 8 * 5 bytes / 6.

Great article!

carykinsfather

Pygator

You have a very soothing voice. Good presentation of most topics and explaining where the functions come from in the various modules. I’ll read the tutorial next to dive more into secrets!

Jackie Wilson RP Team

Thanks Pygator! RP has an excellent audio person :)

Marco Belo

I would suggest changing the course name; it focuses a lot on security and keys/hashes… When I saw it I thought that it would be a course explaining some library like Faker or model-mommy.

Very good presentation and nice voice 😊😊

mikesult

Thank you Jackie for this valuable information. This series on random is rich in new material for me and certainly worth multiple viewings.

Ghani

Great tutorial; thank you so much!