Welcome to video 3 of Generating Random Data in Python. In the last video, you saw how Python and NumPy’s random modules could prove useful in simulation and modeling. They are known as pseudo-random number generators (PRNGs) because their output is deterministic: given the same seed, they produce the same sequence.
This same feature, however, makes them poor candidates for security.
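To make that predictability concrete, here is a minimal sketch using the standard library’s random module. The seed value 1234 is arbitrary and chosen only for illustration; any fixed seed behaves the same way.

```python
import random

# Seeding the generator fixes the entire sequence that follows.
random.seed(1234)  # arbitrary seed, chosen only for this illustration
first_run = [random.randint(0, 99) for _ in range(5)]

# Re-seeding with the same value replays exactly the same "random" numbers.
random.seed(1234)
second_run = [random.randint(0, 99) for _ in range(5)]

print(first_run == second_run)  # True
```

Anyone who learns or guesses the seed can reproduce every value, which is great for repeatable simulations and a problem for secrets.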
How does randomness help us secure data? The short answer is that it helps us hide it. If you think about the children’s game hide and seek, to the seeker, the other children have chosen seemingly random places to hide.
In a similar way, random numbers can help us hide data using encryption. Encryption is used to secure data in transit, such as on a network, or at rest, such as on an encrypted hard drive.
Encryption uses an algorithm along with a key to sort of scramble the original data. The key is oftentimes randomly generated.
While there are many examples of encryption in computing, we’ll use the example of a secure web transaction, which involves both symmetric and asymmetric cryptography:
- In symmetric key cryptography, the sender uses a key to encrypt data, and the receiver uses that same key to decrypt data.
- In asymmetric key cryptography, the sender and receiver use different keys, but we’ll cover that a little later.
We’re going to use a basic example, so if you’re familiar with cryptography’s role in computing, then you could consider skipping this next part.
This example is going to be pretty elementary, but let’s say the data you need to secure is simply the number 2. The encryption key that you’ll use to encrypt the data is the number 3. Your encryption algorithm, let’s say, is simple addition.
We apply the algorithm and the key, and our encrypted result is the number 5. The receiver, to decrypt the 5, needs to know two things:
- The algorithm you used for the encryption
- The encryption key
Without knowing those two things, the receiver can’t make any sense out of the number 5 and has no way to know the original data.
In the real world, the encryption algorithms are typically widely known, but the key needs to be kept secure.
So in this basic example, the receiver (and possible attackers) would already know that subtraction will decrypt the data, but only the receiver should know the number 3 is the missing piece. By knowing that the key is 3, the receiver can deduce the original data as being 2.
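The toy scheme above can be sketched in a few lines of Python. The function names encrypt and decrypt are hypothetical helpers made up for this illustration, and addition is of course not a real cipher.

```python
def encrypt(data, key):
    """Toy 'algorithm' from the example: add the key to the data."""
    return data + key


def decrypt(ciphertext, key):
    """Reverse the toy algorithm: subtract the key."""
    return ciphertext - key


KEY = 3  # the shared secret from the example

ciphertext = encrypt(2, KEY)
print(ciphertext)                # 5
print(decrypt(ciphertext, KEY))  # 2
```

The algorithm is public in both functions. The only thing standing between an attacker and the original data is the value of KEY.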
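So given that basic explanation, why is randomness important? Because the choice of 3 was deliberate, and a key that someone picks deliberately is a key that an attacker has a chance of guessing. A good key needs to be unpredictable.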
Let’s say you want to complete a secure transaction on the web with ABC company. To scramble your communication, you need to share a secret, random encryption key (like that number 3) so your traffic is safe. You can see the conundrum that presents itself: how do you establish and share that key without it being exposed?
This is where asymmetric cryptography comes in. Asymmetric cryptography algorithms are able to generate key pairs where one key encrypts while the other decrypts. The same key cannot be used to do both.
They can only operate as a pair. One is considered public, and one is considered private. In a public key infrastructure (PKI for short), ABC company would generate a pair of keys, keep one key private and secret, and provide the world with the other key. A certificate authority, or CA, acts like a notary that says, “Yep, this is genuinely ABC’s public key.”
This is how the public keys work. If you want to encrypt data for only ABC company to see, then you encrypt it with ABC’s public key. Then only ABC can decrypt that data because they hold the only key that can do so, the companion private key. This is how you can share a random symmetric key for the rest of the session.
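As a rough illustration of a key pair, here is a sketch using textbook RSA with tiny primes. RSA is an assumption on my part, since no specific algorithm is named here, and the numbers are far too small to be secure. The variable names and the session key value 42 are made up for the example, and pow with a negative exponent needs Python 3.8 or later.

```python
# Toy RSA key pair for ABC company (illustrative only, NOT secure).
p, q = 61, 53              # tiny primes; real keys use primes hundreds of digits long
n = p * q                  # modulus, part of both keys
phi = (p - 1) * (q - 1)

e = 17                     # public exponent: (e, n) is the public key
d = pow(e, -1, phi)        # private exponent: (d, n) is the private key

session_key = 42           # stand-in for a random symmetric key, must be < n

# Anyone can encrypt with ABC's public key...
ciphertext = pow(session_key, e, n)

# ...but only the holder of the private key can decrypt.
recovered = pow(ciphertext, d, n)

print(recovered == session_key)  # True
```

The point to take away is the asymmetry: the value e used to encrypt is published, while the value d needed to decrypt never leaves ABC.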
In a reverse application, if ABC wanted to make a statement or a digital commitment publicly, then they could encrypt the digital contract with their private key. The fact that anyone in the world can use ABC’s public key to decrypt it means that ABC is the true originator of the data. This, along with hashing, facilitates digital signatures.
As mentioned earlier, secure web transactions are just one application of encryption. Randomness is also important in establishing secure wireless communication, generating nonces, one-time pads, and so on.
For a random number to be useful in security, it needs to be what we call cryptographically secure. The two factors that give a random number this distinction are entropy and size.
Entropy is basically the external randomness that goes into the seed, something that can’t be easily guessed, such as system state. Nature is often an excellent source of entropy. The size of the random number needs to be sufficiently large that an attacker could not easily deduce the pattern.
Let’s look at another simplified example.
Data are stored and transmitted as bits. Let’s say I want to encrypt the letter h. To keep things simple, let’s use its ASCII equivalent.
A repeating encryption key of one bit would have to be a 0 or a 1. In real life, my encryption algorithm would be more complex, but to keep it simple, let’s say it’s the XOR operation. A zero would produce the original data, and a one would produce a mirror image, with every bit flipped. With one bit, there is effectively no encryption.
If we increased the key to 8 bits (or one byte), then our keys could be any value ranging from 0 to 255. If we picked the key 85, which is 01010101 in binary, we’d get a slightly more difficult encryption to crack. But it would only take a max of 256 iterations to figure it out. You can see where this is going.
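The XOR walkthrough can be tried out directly in Python. This is only a sketch of the toy scheme: the variable names are made up, and the brute-force step assumes the attacker can recognize the correct plaintext when it appears.

```python
data = ord("h")        # ASCII code 104, which is 01101000 in binary

# A repeating one-bit key: all zeros changes nothing, all ones flips every bit.
print(format(data ^ 0b00000000, "08b"))  # 01101000
print(format(data ^ 0b11111111, "08b"))  # 10010111

# An 8-bit key: 85 is 01010101 in binary.
key = 85
ciphertext = data ^ key
print(format(ciphertext, "08b"))  # 00111101
print(chr(ciphertext ^ key))      # h (XOR with the same key decrypts)

# Brute force: at most 256 candidate keys to try.
matching_keys = [k for k in range(256) if ciphertext ^ k == data]
print(matching_keys)  # [85]
```

Because XOR is its own inverse, the same operation both encrypts and decrypts, and a loop over 256 values is all it takes to recover the key.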
It turns out that if you have 32 bytes of entropy, then you’re cryptographically secure. That’s 256 bits, which gives 2 to the 256 possible values, a 78-digit number in decimal. That number of iterations in a reasonable amount of time is out of reach for computers today.
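Python’s arbitrary-precision integers make it easy to check the size of that search space. A quick sketch, with variable names chosen just for this example:

```python
one_byte_keyspace = 2 ** 8      # possible values for an 8-bit key
big_keyspace = 2 ** (32 * 8)    # possible values for a 32-byte (256-bit) key

print(one_byte_keyspace)        # 256
print(len(str(big_keyspace)))   # 78 (number of decimal digits)
```

Every extra bit doubles the number of values an attacker has to try, which is why going from 1 byte to 32 bytes turns a trivial loop into an impractical one.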
We’ve established that the random module, as a PRNG, is not cryptographically secure. What we need is a cryptographically secure pseudo-random number generator, or CSPRNG. How do we achieve the necessary 32 bytes of entropy using Python? One answer is coming up in the next video.