Python Bitwise AND, OR, and NOT

Binary, Bytes, and Bitwise Operators in Python Christopher Trudeau 10:38

00:00 In the previous lesson, I introduced you to the two ways Python stores integers. In this lesson, I’ll show you the AND, OR, and NOT bitwise operators in Python.

00:12 This course has a lot of theory in it, and in case you skipped straight to the Python stuff, here is a quick recap. Binary storage is founded on base-2 math, with each bit in a binary value representing a power of 2. Negative numbers can be stored in several different ways, the most common of which is called two’s complement representation.

00:34 Python has two internal mechanisms for storing integers, which it switches back and forth between: fixed precision, which is typically an eight-byte number, and arbitrary precision, which is a bignum representation with no upper limit.

00:50 One complication for bitwise operations is that Python does not have a native unsigned integer data type. Okay, good enough. Let’s open up the REPL and do some bitwise math.

01:03 I’ll start out with a lowly small integer. When I type it in, the REPL shows the decimal representation. Python supports the specification of an integer in a number of formats.

01:16 The 0o prefix specifies octal. 52 in octal is 42 in decimal. Note that some other programming languages use a different prefix for octal, just a leading zero. Python explicitly disallows this to avoid confusion.

01:35 The 0x prefix specifies a hexadecimal number. 2A in hex is 42 decimal as well. Want to take a guess at the next one? Yep!

01:47 0b is the prefix for binary. Yet another way of getting 42. Note that these different input methods are not generating different kinds of numbers.

01:56 They’re just different ways of inputting the same 42 stored the same way internally every time. Let’s put this in a variable. No surprises so far. Now, what if you want to see the binary representation of this value?
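As a quick sketch of the literal forms described above, the four notations below all produce the same integer. The variable names are hypothetical and chosen only for illustration.

```python
# Four ways of writing the same integer; only the notation differs.
decimal_form = 42
octal_form = 0o52        # 5 * 8 + 2
hex_form = 0x2A          # 2 * 16 + 10
binary_form = 0b101010   # 32 + 8 + 2

# Every literal ends up as the same int object value.
all_equal = decimal_form == octal_form == hex_form == binary_form
print(decimal_form, octal_form, hex_form, binary_form)
print(all_equal)
```

The REPL always echoes the decimal form, whichever prefix was used to type the number in.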

02:14 There are several ways of accomplishing this. If you’re okay with a string representation, you can use the formatting feature of the f-string. Here, the :b says to output the number in binary format.

02:28 This is often one of the easiest ways of debugging a number that you need to see in binary format. You can further adjust the format specifier to indicate how many digits.

02:41 This f-string shows an eight-bit byte. There is also a series of functions that you can use to convert to the string versions of different number representations.
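The f-string formatting just described might look like the following. The variable name number is an assumption, and 08b is one way to ask for eight zero-padded binary digits.

```python
number = 42  # hypothetical variable name

# :b formats the integer as binary digits with no prefix.
plain = f"{number:b}"

# :08b pads with leading zeros to a width of eight digits (one byte).
padded = f"{number:08b}"

print(plain)   # 101010
print(padded)  # 00101010
```

Both results are strings, so they are good for display and debugging but not for further math.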

02:52 oct() returns the octal format, hex() returns the hexadecimal format, and bin() returns the binary format. Let me finally follow through on my promise and do some bit math.
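Here is a small sketch of the three conversion functions mentioned above, applied to 42. The variable names are hypothetical.

```python
number = 42

# Each built-in returns a string that includes the matching literal prefix.
as_octal = oct(number)
as_hex = hex(number)
as_binary = bin(number)

print(as_octal)   # 0o52
print(as_hex)     # 0x2a
print(as_binary)  # 0b101010

# int() with an explicit base turns the string back into a number.
round_trip = int(as_binary, 2)
print(round_trip)  # 42
```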

03:10 Here, I’ve combined two binary values. Recall that AND (&) looks for bits that are in both numbers. In this case, that’s the third bit. The third position is 2 to the power of 2 (remember, the first bit is 2 to the 0), and of course, 2 squared is 4.

03:26 Just like before, this is a numeric calculation, so the REPL shows the decimal result.

03:35 Wrapping the same call in bin() shows the binary string equivalent. And as you can see, the third bit is on. Let’s do the same thing with OR.
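The transcript doesn't record the exact values typed into the REPL, so the two operands below are assumptions, picked so that only the third bit is shared. This is a sketch of the AND step, not a copy of the on-screen session.

```python
# Assumed operands: only the third bit (value 4) is set in both.
first = 0b0110
second = 0b1101

both = first & second   # a bit survives only if it is 1 in both operands

print(both)             # 4
print(bin(both))        # 0b100
print(f"{both:04b}")    # 0100, padded so the bit position is easier to see
```

Note that bin() drops leading zeros, which is why the padded f-string can be easier to read when comparing bit positions.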

03:48 A 1 bit in either value causes a 1 bit in the result: in this case, four 1 bits. Well, that’s AND and OR. How about NOT (~)? Well, that’s a bit unexpected. What happened here? For starters, understand that just because you input four bits doesn’t mean the math is being done on a four-bit number. Python integers on most platforms are eight bytes long.
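A sketch of the OR and NOT steps follows. The OR operands are assumed values chosen to give four 1 bits, since the transcript doesn't list them; 0110 is the value the lesson inverts.

```python
# Assumed operands: between them, all four low bits are covered.
first = 0b0110
second = 0b1101

either = first | second   # a bit is 1 if it is 1 in at least one operand
print(bin(either))        # 0b1111

# NOT on the four-bit-looking value gives the surprising result.
inverted_text = bin(~0b0110)
print(inverted_text)      # -0b111
```

The minus sign in the last line is the surprise: the result is a negative number, not the four-bit pattern 1001.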

04:15 The number here isn’t really 0110. It’s sixty 0s, then 0110. When I used the tilde (~), all sixty-four of those bits were inverted.

04:27 That gives you sixty-one 1 bits, two 0s, and then a final 1 bit. This is the two’s complement representation of -7 decimal. The bin() function then tries to be helpful. Rather than show the two’s complement number, it pulls the sign out front and then shows the positive value.

04:47 What you’re seeing here is minus (-) then binary 7. I believe I warned you about messing with bitwise math and signed integers. Hopefully, you’re starting to see why.
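One way to see the sixty-four bit pattern described above is to mask the result before formatting it. This is a sketch that assumes a 64-bit view; WIDTH and MASK are hypothetical names, and the mask is only a display trick, not how Python stores the value.

```python
WIDTH = 64
MASK = (1 << WIDTH) - 1   # sixty-four 1 bits

inverted = ~0b0110        # -7 as a regular Python int

# ANDing with the mask yields the non-negative integer whose low 64 bits
# match the two's complement pattern of -7.
as_64_bits = format(inverted & MASK, f"0{WIDTH}b")

print(as_64_bits)
print(as_64_bits.count("1"))  # 62 ones in total: sixty-one up front, one at the end
```

The printed string is sixty-one 1 bits, then two 0s, then a final 1, which matches the description of -7 in two's complement.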

05:00 Let’s do that again without the call to bin(). Like I said, -7. Doing it another way. It might be weird, but it is consistent. You might recall from the lesson on two’s complement that NOT 0 is -1.

05:20 And calling bin() gives the signed-magnitude output once again. Okay. So, this is a bit ugly. Is there anything else that can be done? Well, sort of. Let me import the c_uint8 (C unsigned int8) object from the ctypes module.
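The consistency mentioned above can be checked directly. For Python integers, ~x always equals -x - 1; the loop below is just an illustrative spot check over a few assumed sample values.

```python
six = 0b0110

print(~six)      # -7
print(~0)        # -1
print(bin(~0))   # -0b1

# Spot check of the identity ~x == -x - 1 on a handful of sample values.
samples = [0, 1, 6, 42, 255]
identity_holds = all(~x == -x - 1 for x in samples)
print(identity_holds)
```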

05:44 This module is typically used to interact with C language extensions. c_uint8 is short for C-language unsigned integer eight bits. I’ve renamed that to the slightly more friendly unsigned_byte during the import.

06:02 Creating an unsigned byte with a value of -42 gives 214 decimal. The byte is unsigned, so the eight bits that represent -42 in two’s complement are read as a plain magnitude instead, and that same bit pattern is 214.

06:21 You can use the value accessor of the unsigned byte object to get the contents back out as a regular old integer. Great! So this solves our problem. Why did I spend all that time talking about the fact that Python doesn’t have an unsigned integer? Why not just use the ctypes library?
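A minimal sketch of the conversion described above, using the same unsigned_byte alias as the lesson. The variable names wrapped and plain are hypothetical.

```python
from ctypes import c_uint8 as unsigned_byte

# ctypes does no overflow checking, so -42 wraps around to 256 - 42.
wrapped = unsigned_byte(-42)

# .value hands back an ordinary Python int.
plain = wrapped.value

print(plain)        # 214
print(type(plain))  # <class 'int'>
```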

06:38 Well…
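Here is a sketch of what happens when a bitwise operator is applied to one of these objects. The names are hypothetical, and the masking at the end is one common workaround rather than anything the ctypes module provides.

```python
from ctypes import c_uint8 as unsigned_byte

value = unsigned_byte(6)

# The ctypes object itself has no bitwise operators defined.
try:
    ~value
    supports_invert = True
except TypeError:
    supports_invert = False

print(supports_invert)  # False

# Workaround: do the math on the plain int, then mask back down to 8 bits.
inverted = ~value.value & 0xFF
print(inverted)         # 249
```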

06:47 It turns out ctypes objects don’t support bitwise operations. They’re helpful for conversions, but can’t be used as a replacement for an unsigned int. What’s that old thing about skinning cats? Let’s look at the array object from the array module.

07:05 You can construct an array object using the small-b template, meaning signed bytes.

07:15 Or you can construct one with the capital-B template, meaning unsigned. Using the .tobytes() method on the signed value and importing that into the unsigned object with the .frombytes() method

07:37 gives the familiar 214. This is another way of converting things back and forth. You can see the bits with the bin() function,

07:51 but unfortunately, if you start doing bitwise operations on these, it converts them to integers first. Not the same problem as ctypes, but it still doesn’t solve the issue.
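The array round trip described above could be sketched like this. The second element, 42, is an assumption added to show that positive values pass through unchanged.

```python
from array import array

# 'b' holds signed bytes, 'B' holds unsigned bytes.
signed = array("b", [-42, 42])
unsigned = array("B")

# Copy the raw bytes across so the same bits get reinterpreted as unsigned.
unsigned.frombytes(signed.tobytes())

print(list(unsigned))    # [214, 42]
print(bin(unsigned[0]))  # 0b11010110

# Indexing returns a regular int, so NOT goes negative again.
print(~unsigned[0])      # -215
```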

08:02 There is yet a third way of converting byte values.

08:09 The struct module is useful when dealing with packed binary data. Like the array object, it takes a template.

08:21 Capital 'BB' means two unsigned bytes, using the same letter code as the array. This takes the signed -42 and gives you the unsigned interpretation of the same bits.

08:31 But also like the array, if you start manipulating these values, you’re back in integer land and all the problems that that entails. There is a solution, but it doesn’t come standard with the Python library.
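A sketch of the struct conversion follows. The transcript doesn't show the exact call, so packing with 'bb' and unpacking with 'BB' is an assumption about how the signed-to-unsigned step was done, and the second value, 42, is only there for illustration.

```python
from struct import pack, unpack

# Pack two signed bytes into a bytes object...
packed = pack("bb", -42, 42)

# ...then read the same two bytes back as unsigned.
unsigned_pair = unpack("BB", packed)

print(unsigned_pair)      # (214, 42)

# The tuple holds regular ints, so NOT is signed again.
print(~unsigned_pair[0])  # -215
```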

08:44 NumPy is a third-party scientific calculation library. It is quite popular and the underlying implementation is built in C, so it is also very performant.

08:55 NumPy provides a number of alternate data types, including unsigned ones. To install NumPy, use pip. As always, it is best practice to do this in a virtual environment.

09:08 With NumPy installed, I can import the unsigned 32-bit integer.

09:18 Like everything in Python, this is an object. Unlike the ctypes objects, NumPy’s objects support bitwise operators.

09:30 And remember good old -7? A solution! This isn’t a negative number, and it also isn’t the weird signed-magnitude binary string. This is an actual inverted unsigned int. The REPL shows the integer value, but underneath, this is still a NumPy object.

09:53 And looking at the type, you see it’s a numpy.uint32. NumPy provides different byte sizes for its unsigned int. Here’s the eight-bit one.

10:08 And the ugly -7 problem has gone away. Depending on your use case, NumPy is probably overkill. If you’re just keeping some binary flags and sticking with positive numbers, you don’t need this. But if you’re doing hardcore bit manipulation, having access to a true unsigned value could be useful.
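The NumPy steps above might look like the following sketch. It assumes NumPy is already installed, and the variable names are hypothetical. The results are converted with int() for printing because the REPL display of NumPy scalars differs between NumPy versions.

```python
import numpy as np  # third-party package, installed with pip

# Inverting an unsigned 32-bit value stays unsigned and stays 32 bits wide.
inverted_32 = ~np.uint32(6)
print(int(inverted_32))            # 4294967289
print(type(inverted_32).__name__)  # uint32

# The eight-bit type gives the byte-sized result.
inverted_8 = ~np.uint8(6)
print(int(inverted_8))             # 249
print(f"{int(inverted_8):08b}")    # 11111001

# AND between two unsigned values also keeps the NumPy type.
both = np.uint8(0b0110) & np.uint8(0b1101)
print(int(both))                   # 4
```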

10:28 You’ve seen AND, OR, and NOT. Next up, let’s get shifty with it! I apologize to Rick and Morty fans everywhere.
