A Brief Introduction to Cryptography
00:00 In the previous lesson, I showed you how to write a simple Flask application server and use it on an obscure port to try and hide a message for your friends.
00:09 It was easy to show that this isn’t really secure. In this lesson, I’m going to introduce you to cryptography: a way of securing these kinds of messages.
00:19 Cryptography is the act of using codes or ciphers to protect secrets. Code is a word or phrase substitution. In the old movie The Shadow, when one character says to the other, “The sun is shining, but the ice is slippery,” meaning that he’s ready to start the action—that’s a code phrase.
00:38 A cipher, by contrast, is something that does letter substitution or substitution on groups of letters. For example, any time in your message the letter A comes up you replace it with the letter Z, the letter B comes up you replace it with the letter Y, the letter C comes up you replace it with the letter X, et cetera—this is a cipher. Substitution ciphers like this have been in use as early as 1500 BC, and although the usage far predates Julius Caesar, these are often called Caesar ciphers.
01:10 They’re called that because Caesar used them to encrypt his personal letters. The Caesar cipher is a letter to letter cipher. This is a monoalphabetic cipher—one letter replaces another. Monoalphabetic ciphers are vulnerable to frequency analysis.
01:26 This means, by looking at how letters are used, you can take a pretty good guess as to what’s been replaced. In the English language, the letter E is extremely common.
01:36 If you look at enciphered text that is long enough, you can start counting up the letters. The most common letter was probably an E. Using this kind of frequency analysis, you can break the cipher. Polyalphabetic ciphers, or ciphers where multiple letters replace multiple letters or single letters replace groupings of letters, started to appear in the 1400s.
01:58 These are harder to crack than monoalphabetic ciphers, but still are vulnerable to frequency analysis. A very famous example of a broken cipher is the Zimmerman Telegram.
02:09 This was a telegram between Germany and Mexico during World War I telling Mexico that Germany would support it if they attacked the United States to try to take back some of the southern states.
02:21 The British had broken the cipher and shared the results with the United States. This prompted the USA to enter World War I. Another famous example is the Enigma machine.
02:31 You’ve probably seen this mentioned in movies and books. This was a mechanical device that did encoding and decoding of letter ciphers. The Germans used it to communicate with their subs. Through a variety of mechanisms, including frequency analysis, the British had broken this code, unknown to the Germans.
02:49 It is suspected that the British having this code information shortened World War II by up to two or three years.
02:57 Public key—or asymmetric key—algorithms became public knowledge in 1976. Diffie and Hellman created the concept known as the Diffie-Hellman key exchange.
03:07 The reason I say this became public knowledge in 1976 is it’s known that GCHQ, the British coding institute, had developed a very similar mechanism seven years prior to that, but—them being a spy agency—they didn’t tell anyone. Modern ciphering mechanisms are based on mathematical transformations of data using keys. Generally speaking, the larger the key, the harder it is to attack the algorithm. In the 70s and 80s, DES was one of the standard algorithms.
03:35 It was used in the pin pads for bank machines. Because it was only 56 bits long, once computers got faster, this started to become easy to break. It was replaced by Triple-DES and eventually, even that became worrisome, and AES is now one of the modern encryption standards. RSA, which used to be inside of TLS, uses keys which are typically two kilobits long.
04:01 A symmetric cipher is one—if you and I are keeping a secret—where we have a shared key. The problem with this is now, we have to manage those keys. If we wish to share our secrets with somebody else, we have to give them the key.
04:14 If we wish to stop sharing secrets with someone else, we have to change our keys. This can all be problematic. Symmetric ciphers can be immune to a frequency analysis if the length of the key is same as the message length and the keys are not reused.
04:30 The first aspect means that the repetition of the letter E in the message that I used in the example before doesn’t matter because the letter doesn’t routinely map to the same thing.
04:40 Reuse of keys can be problematic, though, because that means from message to message you can start looking for patterns between the key’s use. If you don’t want to reuse your keys, you have a real key management problem, because every single time you send a message, you need a new key. Enter asymmetric encryption. Asymmetric encryption provides a
05:01 way of sharing your keys publicly. This is typically done with a pair of public and private keys. The public key, which is available to everyone, can be used by Alice to encrypt a message that only Bob can decrypt using his private key.
05:17 Hence, asymmetric. The math behind this kind of encryption is something called a one-way function. A one-way function is something that’s easy to compute in one direction, but hard to undo.
05:29 An example of this is multiplication and factoring. It’s easy to multiply two large prime numbers together, but very hard to factor them out and determine the original primes. In RSA, this pairing of primes is used to generate the public and private keys.
05:48 Enough history. Let’s write some code. First off, I’m going to use a dictionary comprehension to build a cipher.
05:57
The magic number 97
here is the ASCII value for the little letter 'a'
. 123
is 'z'
. I’m looping through the entire small-letter alphabet and mapping the character value, 97
, which is 'a'
, to the character value following it, 'b'
.
06:16
So, I’m mapping from 'a'
to 'b'
, 'b'
to 'c'
, 'd'
to 'e'
, et cetera.
06:21
You can see that here. In order to keep this cipher purely alphabetic, I can replace 'z'
with 'a'
.
06:33
And there’s the result. Using a list comprehension, I can enumerate through each letter in the word I want to encode and map it using the dictionary. In this case, I’m, enciphering the word "gorgonzola"
.
06:46
Everybody loves cheese! Rejoining the list into a single string, and here’s the output. Remember what I said about frequency analysis. "gorgonzola"
has a fair amount of repetition in it.
06:58
There’s two 'g'
s and three 'o'
s. That same repetition shows up in the enciphered text. This kind of pattern is what leads to frequency analysis breaking codes. In order to decipher this data, I’m going to build another dictionary comprehension—this time, swapping the key
and value
for value
and key
.
07:21
This creates a new dictionary where 'b'
maps to 'a'
, 'c'
maps to 'b'
, et cetera, undoing the cipher. Running the same list comprehension, using the encodable phrase and the decipher
dictionary,
07:37
joining it, and there you go—back to 'gorgonzola'
. You may have heard Python referred to as a batteries-included language. Well, you don’t have to do all this by hand. It’s built-in. The string
library has a translate mechanism that does essentially the same thing.
07:57
First off, create the cipher using the maketrans()
function. This takes two arguments. The first is the letters that you’re trying to translate, and the second—the letters that you’re trying to translate them to. In this case, I’ve put the alphabet offset by one letter.
08:13 This is the same cipher as I just created by hand.
08:17
maketrans()
actually stores the ASCII numbers inside of the dictionary, rather than the values themselves. You can see 97
maps to 98
, which is ASCII 'a'
mapping to ASCII 'b'
.
08:34
Calling the translate()
function, passing in this cipher, gives the same result as what was done before using our custom dictionary. One limitation of the translate()
function is if you pass in something that isn’t translated, you get the same result. In "Hello World!"
, here, The capital 'H'
, the capital 'W'
, the space, and the exclamation mark are not part of the Caesar cipher. They don’t get translated.
08:57 Our custom cipher from before has this problem, but worse: instead of passing it through, you’ll get an exception.
09:10
There is no capital "H"
in the cipher, so you get a KeyError
from the cipher dictionary. If you’re going to use the maketrans()
and translate()
features of the string
library, you need to make sure that you’ve got coverage of all the letters that you’re going to include inside of your strings. Alternatively, you can use the codecs
library to accomplish something similar.
09:32
This library includes something called the "rot_13"
encoding. "rot_13"
is short for rotation 13. This is a Caesar cipher where each letter maps to the letter 13 positions after it.
09:46 The advantage to this is, because it’s half of 26, calling it once encodes and calling the exact same algorithm the second time, decodes.
09:58 Here’s a little bit of frequency analysis in action. I’ve taken some texts and I’ve enciphered it using a Caesar cipher.
10:06 The table on the right-hand side is the letter frequency of the English language. E is the most common letter and shows up 11% of the time. By generating a similar table for this text, you can start making guesses as to how things work.
10:22 There are 51 R’s in this text and 32 G’s. The R’s are so frequent, it’s probably the E. Let’s assume that and fill it in.
10:35 Looking at what’s been discovered, there’s a fair number of three-letter words in here that end in e. Gur also is
10:43 something that can be used to start a sentence.
10:47 Going back to the frequency table, you can see that G and V are the next two most common letters, which map to probably T or probably A. That can give you a fairly strong feeling that this is the word The, or Are. Looking at the middle letter that maps to U, that probably maps to H. That removes R, gives us The, and all of a sudden, a lot more of this message is being filled in.
11:14 Similar reasoning can be applied to this word. rnpu maps to something that starts with E and ends in H. A quick look at a crossword dictionary will tell you that there really is only one word in English that has this pattern.
11:29 There’s three, but I had to look up what the other two even meant—they’re not common. The common usage is the word each. That tells you that the N maps to an P and the P maps to a C. Even more information is now available on the original message.
11:46 It’s starting to look like something that you could read. Another common word is showing up in this text that is capitalized and begins with Cae. Given the rules of English, this is probably a proper noun. Seeing as the topic has been ciphers and Caesar ciphers, it’s probably a reasonable guess to assume that this is the word Caesar.
12:07 That sounds like I might be cheating, but this has actually happened in history. In World War II, telegrams encoded by the Nazis using the Enigma device almost always signed off with HH—short for Heil Hitler—and then the name of the place that the telegram was being sent from. Using this information, the Brits saw the HH and the name, and because they knew where it was coming from—because they captured the information from that location—they had a whole bunch of context about the message. Using that same kind of context, I can guess that this word is Caesar. In the bottom corner here, I have another capitalized letter with some repetition in it.
12:49 If you think about it for a minute, Julius Caesar? Well, that gives you a lot of extra letters, and you’ve got most of the message. Similar techniques can be used to finish it off and fully decipher the code.
13:03
Not only have I figured out what the message says, but I now can create an exact mapping to decode any future messages that use this exact same key. This particular text is the first paragraph of the Wikipedia page on the Caesar cipher. The first attempt at cryptography using ciphers probably isn’t strong enough. Next up, I’m going to show you block substitution and how to use the Python cryptography
library to get much stronger encoding.
Become a Member to join the conversation.