Cryptographically Secure Random Data in Python
Welcome to video 4 in Generating Random Data in Python. In the previous video, we learned why, in secure applications, it’s important to generate a random number in a cryptographically secure way through entropy. But how do we effectively incorporate entropy in code?
os.urandom()
There are two standard library modules in Python, secrets and uuid, that provide us with the necessary entropy to generate cryptographically secure random numbers. Both modules get entropy from your operating system through the os module’s os.urandom() function. Let’s take a look at this function first:
>>> import math, os, sys
>>> sec = os.urandom(32)
>>> sec = int.from_bytes(sec, sys.byteorder)
>>> len(str(sec))
>>> math.log10(sec)
After importing the necessary modules, we invoke os.urandom(), passing it the number of bytes we need (here 32 bytes, or 256 bits), and it returns a value of type bytes. To make this easier to look at, let’s convert it to an integer. Its length is typically 77 digits (a 256-bit integer has at most 78), a sufficiently sized random number.
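To see where the roughly 77 digits come from, here is a minimal script version of the session above that also checks the bounds. The variable names are illustrative. The only bound it relies on is that 32 bytes hold at most 256 bits.

```python
import math
import os
import sys

# Ask the operating system for 32 bytes (256 bits) of entropy.
raw = os.urandom(32)

# Interpret those bytes as one unsigned integer in the platform's native byte order.
sec = int.from_bytes(raw, sys.byteorder)

# 32 bytes hold at most 256 bits, so the value is always below 2**256.
print(sec.bit_length())       # at most 256
print(len(str(sec)))          # usually 77, occasionally 78 or fewer
print(math.log10(2 ** 256))   # roughly 77.06, an upper bound on log10(sec)
```

Because 2**256 is about 1.16 × 10**77, most 256-bit values land in the 77-digit range. Values at or above 10**77 have 78 digits.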
secrets
As of Python 3.6, we have secrets, a short page of code that’s basically a wrapper around os.urandom().
secrets comes from PEP 506, which was introduced to more or less protect developers from themselves. In other words, it targets developers who didn’t thoroughly read the documentation and used the random module for secure applications. If this describes you, then don’t be too embarrassed. A quick Google search will show you that there are many others in this camp.
From now on, however, you don’t have an excuse. Use secrets instead.
secrets exports a handful of functions for generating random numbers, bytes, and strings. Let’s look at some examples.
After importing secrets and specifying a size in bytes, we can generate secure tokens from the system’s random source as bytes, hex, or URL-safe strings. We also have the familiar choice() function:
>>> import secrets
>>> n = 16
>>> secrets.token_bytes(n)
b'\xe7\x04S\x9fv\xe9\xc4\x90\xf1\xb8\x98X\xb2\x0f\xf3B'
>>> secrets.token_hex(n)
'065c070f35c8ea534c3fc0f0d6c3e8d6'
>>> secrets.token_urlsafe(n)
'qNElCiWqsg_psF_mYeRnEw'
>>> secrets.choice('abcde')
'b'
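The session above covers bytes, hex, and strings. For the "random numbers" part, secrets also provides randbelow() and randbits(). This sketch uses them, together with choice(), in a small password generator. make_password() and ALPHABET are illustrative names and are not part of the module.

```python
import secrets
import string

# secrets.randbelow(n) returns a random int in the range [0, n).
roll = secrets.randbelow(6) + 1          # a die roll from 1 to 6

# secrets.randbits(k) returns a random int with k random bits.
key_material = secrets.randbits(128)

# Hypothetical helper: pick each character independently from an alphabet.
ALPHABET = string.ascii_letters + string.digits

def make_password(length: int = 12) -> str:
    """Return a random password drawn from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(roll, key_material.bit_length() <= 128, make_password())
```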
Let’s see secrets in action with a URL shortener application. Real-life versions of these are a bit more involved, but ours will be simple enough to demonstrate how token_urlsafe() operates. As its name suggests, this function returns a URL-safe text string, Base64-encoded from the requested number of random bytes. We’ve incorporated it in our shorten() function, where we keep track of our URL mappings in a global DATABASE variable.
In our main program, we pass a couple of URL strings, and we print out the returned result for each, along with their database entries.
The reason we’re getting 7-character strings back when specifying 5 bytes is that token_urlsafe() uses Base64 encoding, where each character carries 6 bits. The result length is therefore the ceiling of (8 × 5) / 6: 40 / 6 ≈ 6.67, which rounds up to 7.
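The arithmetic can be checked directly. This sketch compares the length of real token_urlsafe() output with ⌈8n/6⌉. It then reproduces the 5-byte case by Base64-encoding five bytes by hand. expected_urlsafe_length() is an illustrative helper. It relies on CPython’s token_urlsafe() stripping the trailing "=" padding, which is why 5 bytes give 7 characters rather than 8.

```python
import base64
import math
import secrets

def expected_urlsafe_length(nbytes: int) -> int:
    """Characters from token_urlsafe(nbytes): 6 bits per Base64 char, padding stripped."""
    return math.ceil(8 * nbytes / 6)

for n in (5, 16, 32):
    token = secrets.token_urlsafe(n)
    print(n, len(token), expected_urlsafe_length(n))

# The same count derived by hand: encode 5 bytes, then drop the '=' padding.
padded = base64.urlsafe_b64encode(b"\x00" * 5)   # b'AAAAAAA='
print(len(padded.rstrip(b"=")))                  # 7
```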
shortly.py:
from secrets import token_urlsafe

DATABASE = {}  # maps each short key to its original URL

def shorten(url: str, nbytes: int = 5) -> str:
    ext = token_urlsafe(nbytes=nbytes)
    if ext in DATABASE:
        # Key collision: try again with a fresh token.
        return shorten(url, nbytes=nbytes)
    else:
        DATABASE.update({ext: url})
        return f'short.ly/{ext}'
In this example, we pass a string to the shorten() function, which generates a random token for the URL to map to. If the token already exists in DATABASE, we rerun until it’s unique. We specify 5 bytes as the default length. Here’s the function in use:
>>> from shortly import DATABASE, shorten
>>> urls = (
...     'https://realpython.com/',
...     'https://docs.python.org/3/library/secrets.html'
... )
>>> for u in urls:
...     print(shorten(u))
...
short.ly/p_Z4fLI
short.ly/fuxSyNY
>>> DATABASE
{'p_Z4fLI': 'https://realpython.com/',
 'fuxSyNY': 'https://docs.python.org/3/library/secrets.html'}
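The recursive retry in shorten() works, but each collision adds a stack frame. Here is one possible loop-based alternative. It keeps the example’s DATABASE dictionary and short.ly prefix and changes only the retry strategy. It is a design variation, not the course’s code.

```python
from secrets import token_urlsafe

DATABASE = {}  # maps each short key to its original URL

def shorten(url: str, nbytes: int = 5) -> str:
    """Map url to a fresh random key, retrying in a loop instead of recursing."""
    while True:
        ext = token_urlsafe(nbytes)
        if ext not in DATABASE:      # accept only keys that are not taken yet
            DATABASE[ext] = url
            return f"short.ly/{ext}"

link = shorten("https://realpython.com/")
print(link)
```

With 5 bytes there are 2**40 (about 1.1 trillion) possible keys, so retries stay rare until the table grows very large.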
uuid
The second module from the standard library mentioned earlier is uuid. UUID stands for universally unique identifier. A UUID is 128 bits, or 16 bytes, or 32 hex digits. The uuid module has a function, uuid4().
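There are others, ending in 1, 3, and 5, but those variations are built from other inputs. uuid1() uses your machine’s hardware address and the current time, and uuid3() and uuid5() hash a namespace and a name that you pass in. uuid4(), by contrast, uses system random, so it’s the one that’s secure. Let’s see how uuid4() works: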
>>> import uuid
>>> tok = uuid.uuid4()
Notice that uuid4() doesn’t return a string, but rather an instance of the uuid.UUID class. This offers some convenience, as the instance has the attributes .hex, .int, and .bytes.
If you’re wondering about collisions (another word for generating duplicates), the chances are extremely small. Six of the 128 bits are fixed to mark the version and variant, so any two version 4 UUIDs match with a probability of 1 in 2^122. That is improbable enough to be negligible in practice.
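To make those attributes concrete, this sketch inspects a fresh uuid4() value. It then contrasts it with uuid5(), which is deterministic. The name "python.org" is an arbitrary example input.

```python
import uuid

tok = uuid.uuid4()

# uuid4() returns a uuid.UUID instance with several convenient views of the value.
print(type(tok).__name__)   # UUID
print(tok.hex)              # 32 hexadecimal digits
print(len(tok.bytes))       # 16
print(tok.int < 2 ** 128)   # True
print(tok.version)          # 4

# By contrast, uuid5() hashes a namespace and a name, so the same input
# always yields the same UUID: fine as an identifier, useless as a secret.
a = uuid.uuid5(uuid.NAMESPACE_DNS, "python.org")
b = uuid.uuid5(uuid.NAMESPACE_DNS, "python.org")
print(a == b)               # True
```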
SystemRandom
If you’ve been following along by looking at the standard library documentation for the modules we’ve been working with, then you may have noticed that the random module does provide a SystemRandom class that uses os.urandom().
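SystemRandom exposes the same interface as the random module’s default generator, so switching to it is usually a one-line change. Here is a minimal sketch. The variable name sysrand is illustrative.

```python
import random

# SystemRandom draws from os.urandom() but offers the familiar random API.
sysrand = random.SystemRandom()

print(sysrand.randint(1, 6))
print(sysrand.choice(["heads", "tails"]))
print(sysrand.random())     # a float in [0.0, 1.0)

# Unlike random.Random, calling sysrand.seed() has no effect,
# so a SystemRandom sequence cannot be replayed.
```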
You might be wondering why Python’s random module doesn’t simply default to the safer, more secure system random. First, as we noted, it’s often necessary to reproduce test or modeling data. Second, generating cryptographically secure random numbers tends to be slower.
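The first point, reproducibility, is easy to demonstrate. A seeded random.Random instance replays exactly the same sequence, which is precisely what a secure generator must never do. sample() is an illustrative helper.

```python
import random

def sample(seed: int, k: int = 5) -> list:
    """Return k pseudorandom ints from a generator seeded with seed."""
    rng = random.Random(seed)          # independent Mersenne Twister instance
    return [rng.randint(0, 99) for _ in range(k)]

# Same seed, same sequence: handy for repeatable tests and simulations.
print(sample(42) == sample(42))        # True
```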
Hashing
Sometimes, there’s confusion about whether hashing involves randomness. In short, it does not. A hash function is an algorithm that produces a one-way, fixed-size string from a given input. It will always produce the same string if given the same input. Its value is that it’s not reversible and can be used to verify data integrity.
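The standard library’s hashlib makes this determinism easy to check. SHA-256 is used here only as an example algorithm, since the text doesn’t name one, and sha256_hex() is an illustrative helper.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

first = sha256_hex("correct horse battery staple")
second = sha256_hex("correct horse battery staple")
other = sha256_hex("correct horse battery staplf")

print(first == second)                              # True: same input, same hash
print(len(first), len(sha256_hex("x" * 10_000)))    # 64 64: fixed size
print(first == other)                               # False: one changed character
```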
Some applications store hashes of user passwords so they can avoid storing plaintext passwords. The user types in their password, and then the app hashes it and compares the result to the hash stored in the database.
A single hash of the password is not secure enough for user passwords, because it’s practical to precompute what’s known as a rainbow table, a sort of lookup guide for common words and their hashed equivalents. To safeguard against this, it’s common for systems to salt the hash and to repeat the hashing many times. Salting the hash means adding some extra data, ideally unique to each password, to the original before it is hashed, so a precomputed table no longer matches.
Sometimes that salt is generated randomly, but otherwise, hashing and randomness are not related.
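Both defenses can be combined with the standard library. A random per-password salt comes from secrets, and a deliberately slow, iterated hash comes from hashlib.pbkdf2_hmac(). This is a minimal sketch, not a production recipe. The helper names, the 16-byte salt, and the iteration count are illustrative assumptions rather than values from this course.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 200_000   # illustrative work factor; choose one for your hardware

def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest) using a fresh random salt and PBKDF2-HMAC-SHA256."""
    salt = secrets.token_bytes(16)     # random, unique per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)

salt, stored = hash_password("hunter2")
print(verify_password("hunter2", salt, stored))   # True
print(verify_password("hunter3", salt, stored))   # False
```

Because each call draws a new salt, hashing the same password twice produces different stored digests. This is what makes a single precomputed rainbow table useless.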
Recap
We’ve covered a lot of ground in this video series, so let’s take a moment to recap.
We started with the random module and many of its most useful methods and operations.
We then took a look at NumPy’s version of random and how it can be useful in basic data science applications.
Finally, we wrapped up with cryptographically secure Python in the form of secrets, which wraps os.urandom(), our system entropy, and another module, uuid, which uses that same entropy to generate unique IDs.
I hope you found the video series useful. If you have feedback or questions, please let us know in the comments below. Thanks for watching.
Congratulations, you made it to the end of the course! What’s your #1 takeaway or favorite thing you learned? How are you going to put your newfound skills to use? Leave a comment in the discussion section and let us know.
00:00 Welcome to video number four in Generating Random Data in Python. In the previous video, we learned why in secure applications it’s important to generate random numbers in a cryptographically secure way through entropy.
00:14 But how do we effectively incorporate entropy through code?
00:20
There are two standard library modules in Python, secrets and uuid, that provide us with the necessary entropy to generate cryptographically secure random numbers, and are therefore CSPRNGs.
00:34
Both modules get entropy from the operating system, through the os module’s os.urandom() method. So, let’s start off by taking a look at this method.
00:45
After importing the necessary modules, we invoke os.urandom(), passing it the size we need. It then returns a value of type bytes, and we can see that here. To make it easier to look at, let’s convert it to an integer.
01:03
Then, we can see that its length is 77 digits, a sufficiently sized random number. As of Python 3.6, we have secrets, a short module that’s basically a wrapper around os.urandom(). secrets is from PEP 506, which was introduced to more or less protect developers from themselves.
01:26
In other words, developers who didn’t thoroughly read the documentation and used the random module for secure applications. If this describes you, don’t be too embarrassed, as a simple Google search will show you that many others are in this camp. From now on, however, you don’t have an excuse. Use secrets instead.
01:44
secrets exports a handful of functions for generating random numbers, bytes, and strings. Let’s look at some examples. After importing secrets and specifying a size, we can generate secure tokens in bytes, hex, or string format.
01:58
We also have the familiar choice() method for sequences. Let’s see secrets in action with a URL shortening application. The real-life versions of these apps are a bit more involved, but ours will be simple enough to demonstrate the operation of token_urlsafe().
02:13
This method, as its name suggests, returns a URL-safe string built from the number of bytes requested. We’ve incorporated this method in our shorten() function, where we also keep track of our URL mappings in a global DATABASE variable.
02:28
In the main program, we pass a couple of URL strings and print out the returned result for each, along with their database entries. The reason we’re getting seven-character strings back when specifying five bytes is that token_urlsafe() uses Base64 encoding. The second module from the standard library mentioned earlier is uuid.
02:48
UUID stands for universally unique identifier. A UUID is 128 bits, or 16 bytes, or 32 hex digits. Within the uuid module is a method, uuid4(). There are others ending in 1, 3, and 5, but those variations take input, such as your machine’s hardware address or a name you supply, whereas uuid4() uses system random as input, so it’s the one that’s secure.
03:12
Let’s see how uuid4() works.
03:18
Notice the uuid4() method doesn’t return a string or digits, but rather a UUID object. This offers some convenience, as the object has the attributes .hex, .int, and .bytes. If you’re wondering about collisions, which is another word for generating duplicates, the chances are super small, about 1 in 2 to the 122nd power, and improbable enough to be considered secure.
03:43
If you’ve been following along with the standard library documentation for the modules we’ve been working with, you may have noticed the random module does provide a SystemRandom class that uses os.urandom().
03:56
You might be wondering why Python’s random module wouldn’t simply default to using the safer, more secure system random. First, as we noted, it’s often necessary to reproduce test or modeling data.
04:10 And second, generating cryptographically secure random numbers tends to be slower. Now for a word about hashing. Sometimes there’s confusion about whether hashing is random, because it looks random.
04:25 In short, hashing is not random. It’s an algorithm that produces a one-way, fixed-size string from a given input. A hash function will always produce the same string if given that same input. Its value is that it’s not reversible and can be used to verify digital integrity.
04:46 Some applications store hashes of user passwords so they can avoid storing plaintext passwords. The user types in the password, the app hashes it and compares the hash to the database.
04:59 A single hashing of the password is not secure enough for user passwords because it’s practical to generate what’s known as a rainbow table, a sort of lookup guide for common words and their hashed equivalents. To safeguard against this, it’s common for systems to repeat the hash, salt it, or both. Salting the hash means adding some extra data to the original before it’s hashed.
05:23 Sometimes that salt is generated randomly, but otherwise, hashing and randomness are not related.
05:32
We’ve covered a lot of ground in this video series, so let’s take a moment to recap. We started with the random module and many of its most useful methods and operations.
05:43
We then took a look at NumPy’s version of random and how it can be useful in basic data science applications. Finally, we wrapped up with cryptographically secure Python in the form of secrets, which wraps os.urandom(), our system entropy, and another module, uuid, which uses that same entropy to generate unique IDs.
06:09 I hope you found the video series useful and enlightening. If you have feedback or questions, please let us know in the comments below. Thank you so much for watching.
Cody Roche on July 22, 2019
Hey, one odd thing I found on Windows was I needed to import sys for this to work. Not hard to figure out and I’m not sure if it’s Windows or version specific but figured you’d want to know.
Cody Roche on July 22, 2019
This was a great overview of random generation in Python. The examples in the standard library were great and full featured. I felt like I came away from them with a much deeper understanding of how they work!
That said, I feel like the coverage of secrets and uuid was much more shallow. Enough to get the basics, which is a great foundation. It left me wanting a follow-up that delves into more detail of cryptographically secure randoms though.
Any chance there’ll be a part 2 covering that?
Jackie Wilson RP Team on July 24, 2019
Thanks for your feedback, Cody, and the information on the Windows issue. I’m making a note of your feedback… there has been discussion on developing more videos in the networking and security areas. This particular video series was intended as a complement to this written tutorial, realpython.com/python-random/ . You may be able to find some missing pieces there. Thanks again for your comments!
Cody Roche on July 28, 2019
Funny, I was just starting that video and guide tonight. I also found the Cryptography and PyNaCl packages on PyPI. If I’m looking to incorporate basic cryptography in my code I’m guessing using a package makes more sense than coding the crypto from the ground up, since just building the crypto functions into a module and getting them working right looks like it could take a while.
lironhayman on Aug. 7, 2019
Cody is right, you need:
import sys
lironhayman on Aug. 7, 2019
Also I think this should say: 8 * 5 bytes / 6.
Great article!
carykinsfather on Aug. 10, 2019
More secrets info please!
Pygator on Sept. 2, 2019
You have a very soothing voice. Good presentation of most topics and explaining where the functions come from in the various modules. I’ll read the tutorial next to dive more into secrets!
Jackie Wilson RP Team on Sept. 10, 2019
Thanks Pygator! RP has an excellent audio person :)
Marco Belo on Oct. 28, 2019
I would suggest changing the course name; it focuses a lot on security and keys/hashes… When I saw it I thought that it would be a course explaining some library like Faker or model-mommy.
Ranit Pradhan on April 15, 2020
Very good presentation and nice voice 😊😊
mikesult on June 25, 2020
Thank you Jackie for this valuable information. This series on random is rich in new material for me and certainly worth multiple viewings.
Ghani on Nov. 3, 2020
Great tutorial; thank you so much!
Adam Masiarek on June 13, 2021
I like the SPEED / TERSE approach of this video. Thank you - great video / training.
Justin Cletus on July 6, 2019
Hi @Jackie, it is a great tutorial about learning the Python random module along with other useful libraries.