A Brief Introduction to Cryptography
00:19 Cryptography is the act of using codes or ciphers to protect secrets. Code is a word or phrase substitution. In the old movie The Shadow, when one character says to the other, “The sun is shining, but the ice is slippery,” meaning that he’s ready to start the action—that’s a code phrase.
00:38 A cipher, by contrast, is something that does letter substitution or substitution on groups of letters. For example, any time in your message the letter A comes up you replace it with the letter Z, the letter B comes up you replace it with the letter Y, the letter C comes up you replace it with the letter X, et cetera—this is a cipher. Substitution ciphers like this have been in use as early as 1500 BC, and although the usage far predates Julius Caesar, these are often called Caesar ciphers.
01:10 They’re called that because Caesar used them to encrypt his personal letters. The Caesar cipher is a letter to letter cipher. This is a monoalphabetic cipher—one letter replaces another. Monoalphabetic ciphers are vulnerable to frequency analysis.
01:36 If you look at enciphered text that is long enough, you can start counting up the letters. The most common letter was probably an E. Using this kind of frequency analysis, you can break the cipher. Polyalphabetic ciphers, or ciphers where multiple letters replace multiple letters or single letters replace groupings of letters, started to appear in the 1400s.
02:09 This was a telegram between Germany and Mexico during World War I telling Mexico that Germany would support it if they attacked the United States to try to take back some of the southern states.
02:31 You’ve probably seen this mentioned in movies and books. This was a mechanical device that did encoding and decoding of letter ciphers. The Germans used it to communicate with their subs. Through a variety of mechanisms, including frequency analysis, the British had broken this code, unknown to the Germans.
03:07 The reason I say this became public knowledge in 1976 is it’s known that GCHQ, the British coding institute, had developed a very similar mechanism seven years prior to that, but—them being a spy agency—they didn’t tell anyone. Modern ciphering mechanisms are based on mathematical transformations of data using keys. Generally speaking, the larger the key, the harder it is to attack the algorithm. In the 70s and 80s, DES was one of the standard algorithms.
03:35 It was used in the pin pads for bank machines. Because it was only 56 bits long, once computers got faster, this started to become easy to break. It was replaced by Triple-DES and eventually, even that became worrisome, and AES is now one of the modern encryption standards. RSA, which used to be inside of TLS, uses keys which are typically two kilobits long.
04:01 A symmetric cipher is one—if you and I are keeping a secret—where we have a shared key. The problem with this is now, we have to manage those keys. If we wish to share our secrets with somebody else, we have to give them the key.
04:14 If we wish to stop sharing secrets with someone else, we have to change our keys. This can all be problematic. Symmetric ciphers can be immune to a frequency analysis if the length of the key is same as the message length and the keys are not reused.
04:40 Reuse of keys can be problematic, though, because that means from message to message you can start looking for patterns between the key’s use. If you don’t want to reuse your keys, you have a real key management problem, because every single time you send a message, you need a new key. Enter asymmetric encryption. Asymmetric encryption provides a
05:01 way of sharing your keys publicly. This is typically done with a pair of public and private keys. The public key, which is available to everyone, can be used by Alice to encrypt a message that only Bob can decrypt using his private key.
05:29 An example of this is multiplication and factoring. It’s easy to multiply two large prime numbers together, but very hard to factor them out and determine the original primes. In RSA, this pairing of primes is used to generate the public and private keys.
The magic number
97 here is the ASCII value for the little letter
'z'. I’m looping through the entire small-letter alphabet and mapping the character value,
97, which is
'a', to the character value following it,
And there’s the result. Using a list comprehension, I can enumerate through each letter in the word I want to encode and map it using the dictionary. In this case, I’m, enciphering the word
'g's and three
'o's. That same repetition shows up in the enciphered text. This kind of pattern is what leads to frequency analysis breaking codes. In order to decipher this data, I’m going to build another dictionary comprehension—this time, swapping the
joining it, and there you go—back to
'gorgonzola'. You may have heard Python referred to as a batteries-included language. Well, you don’t have to do all this by hand. It’s built-in. The
string library has a translate mechanism that does essentially the same thing.
First off, create the cipher using the
maketrans() function. This takes two arguments. The first is the letters that you’re trying to translate, and the second—the letters that you’re trying to translate them to. In this case, I’ve put the alphabet offset by one letter.
translate() function, passing in this cipher, gives the same result as what was done before using our custom dictionary. One limitation of the
translate() function is if you pass in something that isn’t translated, you get the same result. In
"Hello World!", here, The capital
'H', the capital
'W', the space, and the exclamation mark are not part of the Caesar cipher. They don’t get translated.
There is no capital
"H" in the cipher, so you get a
KeyError from the cipher dictionary. If you’re going to use the
translate() features of the
string library, you need to make sure that you’ve got coverage of all the letters that you’re going to include inside of your strings. Alternatively, you can use the
codecs library to accomplish something similar.
10:06 The table on the right-hand side is the letter frequency of the English language. E is the most common letter and shows up 11% of the time. By generating a similar table for this text, you can start making guesses as to how things work.
10:47 Going back to the frequency table, you can see that G and V are the next two most common letters, which map to probably T or probably A. That can give you a fairly strong feeling that this is the word The, or Are. Looking at the middle letter that maps to U, that probably maps to H. That removes R, gives us The, and all of a sudden, a lot more of this message is being filled in.
11:14 Similar reasoning can be applied to this word. rnpu maps to something that starts with E and ends in H. A quick look at a crossword dictionary will tell you that there really is only one word in English that has this pattern.
11:29 There’s three, but I had to look up what the other two even meant—they’re not common. The common usage is the word each. That tells you that the N maps to an P and the P maps to a C. Even more information is now available on the original message.
11:46 It’s starting to look like something that you could read. Another common word is showing up in this text that is capitalized and begins with Cae. Given the rules of English, this is probably a proper noun. Seeing as the topic has been ciphers and Caesar ciphers, it’s probably a reasonable guess to assume that this is the word Caesar.
12:07 That sounds like I might be cheating, but this has actually happened in history. In World War II, telegrams encoded by the Nazis using the Enigma device almost always signed off with HH—short for Heil Hitler—and then the name of the place that the telegram was being sent from. Using this information, the Brits saw the HH and the name, and because they knew where it was coming from—because they captured the information from that location—they had a whole bunch of context about the message. Using that same kind of context, I can guess that this word is Caesar. In the bottom corner here, I have another capitalized letter with some repetition in it.
12:49 If you think about it for a minute, Julius Caesar? Well, that gives you a lot of extra letters, and you’ve got most of the message. Similar techniques can be used to finish it off and fully decipher the code.
Not only have I figured out what the message says, but I now can create an exact mapping to decode any future messages that use this exact same key. This particular text is the first paragraph of the Wikipedia page on the Caesar cipher. The first attempt at cryptography using ciphers probably isn’t strong enough. Next up, I’m going to show you block substitution and how to use the Python
cryptography library to get much stronger encoding.
Become a Member to join the conversation.