In this lesson, you’ll learn about bits, bytes, octal, and hex notations in Python. To better understand Unicode and UTF-8 encoding, you need to be familiar with binary and hexadecimal numbers. You’ll also learn about three alternative integer literal forms in Python: binary, octal, and hex literals.
Working in Binary: Bits, Bytes, Oct, and Hex
In the previous lesson, I showed you the Python string module and the ASCII constants inside of it. In order to understand Unicode, you need to understand a little bit more about byte-wise representation, so in this lesson, I’ll be talking about bits, bytes, octal, and hex notations.
00:15 The way you were taught to count as a child was actually in decimal notation, or powers of 10. A number like 1234 breaks down into 1000 + 200 + 30 + 4.
Or, if you come at it from the right-hand side, 4 * 10^0 + 3 * 10^1 + 2 * 10^2 + 1 * 10^3. Now, why did I break it down this way? Well, because if you change the base, the same concept works, but instead of powers of 10, you switch to the powers of the base. In binary, the base is 2. To type a binary number in Python, you can prefix it with 0b. Here’s an example.
0b1001 is the binary number 1001. Like before, you can start on the right-hand side using powers of 2. 2^0, which is 1—in this case, 1 * 1. Then, 2^1—but we’re skipping this digit. 2^2—skipping this digit as well.
01:25 2^3 is 8. So, the two parts of this number that have a 1 are 2^3 and 2^0, which is 8 + 1, giving you 9 in decimal. In computer science, you often run into powers of 2, powers of 8, the normal decimal, and powers of 16. To show the differences here, let’s examine the decimal number 539 represented in the different bases.
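Before looking at 539, the two breakdowns above (decimal 1234 and binary 0b1001) can be checked with a few lines of Python. This is only a sketch of the arithmetic described in the lesson. The variable names are made up for illustration.

```python
# Positional notation: each digit is multiplied by a power of the base,
# starting with power 0 on the right-hand side.
decimal_parts = [4 * 10**0, 3 * 10**1, 2 * 10**2, 1 * 10**3]
decimal_total = sum(decimal_parts)  # 4 + 30 + 200 + 1000

# 0b1001 has a 1 only in the 2^0 and 2^3 positions.
binary_parts = [1 * 2**0, 0 * 2**1, 0 * 2**2, 1 * 2**3]
binary_total = sum(binary_parts)  # 1 + 0 + 0 + 8

print(decimal_total)           # 1234
print(binary_total)            # 9
print(binary_total == 0b1001)  # True
```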
01:51 First off, binary—power of 2. 2^0 is included, because there’s a 1. 2^1 is included because there’s a 1. 2^2 is skipped. 2^3, 2^4, and then skip 5 through 8, and 2^9.
02:08 If you sum up the right-hand side, you’ll get the result of 539. The octal representation, or power of 8, uses the numbers 0 through 7. The same principles apply here.
02:21 1 * 8^3 + 0 * 8^2 + 3 * 8^1 + 3 * 8^0 gives you a grand total of 539 in decimal. Decimal: old hat, no problem. And finally, hex, or hexadecimal. What do you do when you need a digit larger than 9?
You can’t use 10 because in hex, that would mean 16 + 0, so instead, letters are used. A is decimal 10, B is 11, et cetera. So on the right-hand side, B turns into 11 * 16^0 + 1 * 16^1 + 2 * 16^2, for a total of 539 in decimal.
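The same breakdown works for any base. Here is a minimal sketch that splits 539 into digits for bases 2, 8, 10, and 16 and then sums the digits back up. The helpers `to_digits()` and `from_digits()` are hypothetical names written for this illustration. They are not part of the standard library.

```python
def to_digits(n, base):
    """Return the digits of a non-negative int in the given base, most significant first."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(remainder)
    return digits[::-1]


def from_digits(digits, base):
    """Sum digit * base^position, counting positions from the right starting at 0."""
    return sum(digit * base**position
               for position, digit in enumerate(reversed(digits)))


for base in (2, 8, 10, 16):
    digits = to_digits(539, base)
    # In base 16, the digit 11 is the one written as B.
    print(base, digits, from_digits(digits, base))
```

Every row ends in 539, and the base 16 row shows the digits 2, 1, and 11, which matches hex 21B.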
03:03 You can find handy charts like this on the internet quite easily. If you’re doing a lot of work in binary and hex, it’s actually useful to just memorize this chart. So, what’s the big deal?
03:12 Why use hex? Well, it turns out it’s really, really easy to map hex to binary. Each hex digit maps to 4 binary digits.
03:22 That makes it really, really easy to convert back and forth. Writing out full binary numbers is quite long, so anywhere you want to use binary, it’s easy to switch into hex.
An example of this: back to 539. Taking hex 2 and looking it up in the chart gives 0010. The 1 is 0001, and B, or decimal 11, is 1011.
03:47 The three digits on the left easily map to three groups of 4 bits on the right-hand side. 4 bits, or half a byte, is called a nibble. Each nibble becomes a hex digit. Converting in the other direction is just as easy.
The first nibble turns into F, the second into A, the third into 3, and finally, 9. And now into the REPL.
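The nibble mapping described above can be sketched in Python before moving on. The function names `hex_to_nibbles()` and `nibbles_to_hex()` are invented for this example. The bit patterns for F, A, 3, and 9 are derived from the hex digits themselves rather than copied from the slide.

```python
def hex_to_nibbles(hex_digits):
    """Map each hex digit to its 4-bit group."""
    return [f"{int(digit, 16):04b}" for digit in hex_digits]


def nibbles_to_hex(nibbles):
    """Map each 4-bit group back to a single uppercase hex digit."""
    return "".join(f"{int(nibble, 2):X}" for nibble in nibbles)


print(hex_to_nibbles("21B"))                             # ['0010', '0001', '1011']
print(nibbles_to_hex(["1111", "1010", "0011", "1001"]))  # FA39
```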
Starting with 539, I’m putting it in a variable named number. For starters, I’m going to convert this into text using an f-string. No real surprise there: '539'. Using the format option of the f-string, it can be converted to other representations.
0b tells Python to convert it into binary.
You can also use capital X so that the letters in your hexadecimal number are capitalized.
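The REPL input is not preserved in this transcript, so here is a sketch of what such a session might look like. It assumes the bare format types `b`, `o`, `x`, and `X` from Python's format specification.

```python
number = 539

as_text = f"{number}"         # plain decimal text
as_binary = f"{number:b}"     # binary digits
as_octal = f"{number:o}"      # octal digits
as_hex = f"{number:x}"        # hex with lowercase letters
as_hex_upper = f"{number:X}"  # hex with uppercase letters

print(as_text)       # 539
print(as_binary)     # 1000011011
print(as_octal)      # 1033
print(as_hex)        # 21b
print(as_hex_upper)  # 21B
```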
The int() function allows you to go in the other direction, taking the string '539' and turning it into an int.
You can also pass in a base. Now note, what’s happening here is not converting decimal 539 into hex, but taking the hex number '539' and turning it into an int, which by default is displayed in decimal. So '539' hex is 1337 in decimal.
You can do this with binary as well, but of course, 5, 3, and 9 are not valid digits in binary, so you get an exception. 1 and 1 are valid base-2 digits, so binary '11' gets converted into 3.
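A short sketch of the int() calls described above. The variable names are made up for illustration, and the exception is caught so the script keeps running.

```python
from_decimal = int("539")       # the default base is 10
from_hex = int("539", base=16)  # 5 * 16^2 + 3 * 16^1 + 9 * 16^0
from_binary = int("11", base=2)

print(from_decimal)  # 539
print(from_hex)      # 1337
print(from_binary)   # 3

# 5, 3, and 9 are not binary digits, so this raises ValueError.
try:
    int("539", base=2)
    raised = False
except ValueError as error:
    raised = True
    print(f"ValueError: {error}")
```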
05:41 Here’s a hex number directly,
05:45 the octal version, and finally, the lengthy binary version.
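The literals typed in the video are not preserved in this transcript. Assuming they were the three spellings of 539 worked out earlier in the lesson, they would look like this:

```python
# Assumption: these are the 539 spellings from earlier in the lesson,
# not necessarily the exact values typed in the video.
hex_literal = 0x21B
octal_literal = 0o1033
binary_literal = 0b1000011011

# Python stores all three as the same int and displays it in decimal.
print(hex_literal, octal_literal, binary_literal)  # 539 539 539
```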
This little snippet of code is handy to make binary a little more readable. It converts 539 into a str (string) and then iterates over it digit by digit. ord() gives the code point of each digit, and an f-string presents it in binary, thus giving you byte-by-byte chunks of the three digits 539.
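The snippet itself did not survive in this transcript, so the following is a reconstruction under assumptions: the function name `make_bit_sequence()` is hypothetical, and the `08b` format (zero-padded to 8 bits) is one reasonable way to get byte-sized chunks.

```python
def make_bit_sequence(text):
    """Show each character's code point as an 8-bit binary chunk."""
    return " ".join(f"{ord(character):08b}" for character in text)


# The characters '5', '3', and '9' have code points 53, 51, and 57.
print(make_bit_sequence(str(539)))  # 00110101 00110011 00111001
```

Note that 8 bits per character only holds for code points below 256, which covers the ASCII digits used here.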
06:20 Well, that’s enough math for one day. Now, on to Unicode.
Thanks so much for pointing that out. Good catch! We’re working on a fix and will get it up shortly.
Could somebody elaborate on what’s going on with int('539', base=16)?
Is the rule equal to 5 * 16^2 + 3 * 16^1 + 9 * 16^0?
Yes, base=16 tells int() to treat the incoming string as if it is in base 16, also known as hexadecimal. You are correct. To convert hex into decimal you use the general math formula of:
digit_value * 16 ^ (digit_position - 1)
digit_position starts at 1 and counts from the right-hand side. The results for all of the digits are then added together.
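The formula can be written out as a small sketch in Python. The function name `hex_to_decimal()` is made up for this example. In practice, int() with base=16 does the same job.

```python
def hex_to_decimal(hex_string):
    """Apply digit_value * 16 ** (digit_position - 1) to each digit and sum the results."""
    total = 0
    # digit_position starts at 1 and counts from the right-hand side.
    for digit_position, character in enumerate(reversed(hex_string), start=1):
        digit_value = int(character, 16)  # A to F become 10 to 15
        total += digit_value * 16 ** (digit_position - 1)
    return total


print(hex_to_decimal("539"))                         # 1337
print(hex_to_decimal("539") == int("539", base=16))  # True
```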
Eric Koston Jr on July 22, 2020
There seems to be a typo when giving an example of converting binary to hex @4:02.
The last nibble is shown as 0b0101 which is said to convert to 0x9.
Doesn’t 0b0101 map to 0x5? Shouldn’t it be 0b1001?