Defining bytes Objects With bytes()

In the last lesson, you saw how you could create a bytes object using a string literal with the addition of a 'b' prefix. In this lesson, you’ll learn how to use bytes() to create a bytes object. You’ll explore three different forms of using bytes():

  1. bytes(<s>, <encoding>) creates a bytes object from a string.
  2. bytes(<size>) creates a bytes object consisting of null (0x00) bytes.
  3. bytes(<iterable>) creates a bytes object from an iterable.

Here’s how to use bytes(<s>, <encoding>):

>>> a = bytes('bacon and egg', 'utf8')
>>> a
b'bacon and egg'
>>> type(a)
<class 'bytes'>

>>> b = bytes('Hello ∑ €', 'utf8')
>>> b
b'Hello \xe2\x88\x91 \xe2\x82\xac'

>>> len(a)
13
>>> a
b'bacon and egg'
>>> b
b'Hello \xe2\x88\x91 \xe2\x82\xac'
>>> len(b)
13
>>> a[0]
98
>>> a[1]
97
>>> a[2]
99
>>> b[0]
72
>>> b[1]
101
>>> b[5]
32
>>> b[6]
226
>>> b[7]
136
>>> b[8]
145
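
As a minimal sketch of the first form, the snippet below compares bytes(<s>, <encoding>) with the str.encode() method and then decodes the result again. The variable names are made up for illustration, and the sample text is the same string used above.

```python
# Sample text from the session above; any str would work here.
text = 'Hello ∑ €'

# bytes(<s>, <encoding>) and str.encode() build the same bytes object.
# 'utf8' is an accepted alias for 'utf-8'.
from_constructor = bytes(text, 'utf8')
from_method = text.encode('utf-8')
print(from_constructor == from_method)

# Decoding with the same encoding recovers the original string.
round_trip = from_constructor.decode('utf-8')
print(round_trip == text)

# A string passed without an encoding is rejected with a TypeError.
try:
    bytes(text)
except TypeError as error:
    print('TypeError:', error)
    missing_encoding_failed = True
else:
    missing_encoding_failed = False
```

The encoding argument is required whenever the first argument is a string, which is why the last call fails.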

Here’s how to use bytes(<size>):

>>> c = bytes(8)
>>> c
b'\x00\x00\x00\x00\x00\x00\x00\x00'
>>> len(c)
8
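
To make the difference between a run of null bytes and an empty bytes object concrete, here is a short sketch. The names zeros, empty, and digit are chosen only for this example.

```python
zeros = bytes(8)             # eight null bytes
empty = bytes()              # no bytes at all
digit = bytes('8', 'utf8')   # the character '8', not the integer 8

# The integer form sets the length; the string form encodes text.
print(len(zeros), len(empty), len(digit))

# Every element of the integer form is the value 0.
print(list(zeros))

# The character '8' is stored as its code point, 56.
print(digit[0])
```

Passing the integer 8 and passing the string '8' look similar but give very different results, so it is worth checking the argument type when a bytes object comes out with an unexpected length.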

Here’s how to use bytes(<iterable>):

>>> d = bytes([115, 112, 97, 109, 33])
>>> d
b'spam!'
>>> type(d)
<class 'bytes'>
>>> d[0]
115
>>> d[3]
109
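
The iterable does not have to be a list. The sketch below, with illustrative variable names, passes a range and a tuple, converts a bytes object back to a list of integers, and shows what happens when a value falls outside 0 to 255.

```python
# Any iterable of integers in the range 0 to 255 is accepted.
letters = bytes(range(65, 70))
from_tuple = bytes((104, 105))
print(letters, from_tuple)

# list() turns a bytes object back into its integer values.
values = list(bytes([115, 112, 97, 109, 33]))
print(values)

# A value above 255 does not fit in one byte and raises ValueError.
try:
    bytes([255, 256])
except ValueError as error:
    print('ValueError:', error)
    out_of_range_failed = True
else:
    out_of_range_failed = False
```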

00:00 Another way to define a bytes object is by using the built-in bytes() function.

00:06 There are three different approaches that you’re going to practice. The first is to call the bytes() function and pass it two arguments: a string followed by the type of encoding.

00:17 This will create a bytes object from that string. The second way is by using the bytes() function and entering in a size. This is going to create a bytes object consisting of null bytes that look like this, 0x00. It’s not an empty bytes object, which you will see when you try it out.

00:37 The third way to use the bytes() function is to pass an iterable into it. In this case, you can pass a list with integers in the range of 0 to 255.

00:51 To try the first technique, you’re going to create an object called a.

00:56 And from here, bpython helps out again. It shows the variety of things that you can put inside. For this first one, we’re going to focus on this one, where you can choose a string and then an encoding type, and that’s going to output bytes.

01:08 So the string in this case will be

01:12 'bacon and egg'. Here, you choose the encoding. The type of encoding that you’re going to use in this case is called 'utf8'. UTF stands for Unicode Transformation Format, and it’s basically breaking those Unicode-type characters into bytes.

01:28 And you’ll see what it does, kind of, on the output. This is a simple string that’s just consisting of ASCII characters, but it can convert non-ASCII characters from the Unicode set into these UTF8 chunks. a has been created, and you can see, in this case, it’s very simple—just a bytes object with a b at the front of it. And if you were to type it, it is of <class 'bytes'>.

01:53 Let’s create an object called b, and this time using a non-ASCII character, a character like Sigma ('∑'). Again, that was created on my keyboard using Option + W. Or potentially the Euro symbol ('€'). Again, that one was a little more difficult, Shift + Option + 2.

02:13 So those ones go beyond the basic 128 characters that are in ASCII.

02:19 What happens? Please note that 'utf8' has to be enclosed in quotation marks, because it is passed as a string value. So here’s b. And you can see that, with these escape codes here, it has created an encoding. So if you were to look at individual values, going back to a, well, first look at the length of it.

02:41 len(a). So a is 13 bytes long. Okay. How long is this? One, two, three, four, five, six, seven, eight, nine. Okay. So what should the length of b be? Well, b is actually 13 bytes long. Instead of just nine bytes, one per character, it’s had to encode those higher-number Unicode characters into multiple bytes to represent those characters. And you can see the individual bytes also.
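
The count of nine characters versus 13 bytes can be checked with a small sketch like this one. The names text, sizes, and total are chosen for illustration.

```python
text = 'Hello ∑ €'

# Encode each character on its own to see how many bytes it needs.
sizes = [(char, len(char.encode('utf-8'))) for char in text]
for char, size in sizes:
    print(repr(char), size)

# The per-character sizes add up to the length of the bytes object.
total = sum(size for _, size in sizes)
print(len(text), total, len(bytes(text, 'utf8')))
```

The ASCII characters take one byte each, while the two symbols take three bytes each, which accounts for the four extra bytes.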

03:09 Like if you went to a and looked at the very first letter in the string, which would be 0 using your indexing technique, its value is actually 98 as an integer.

03:19 That’s the letter 'b' in 'bacon'. 'a' being 97. You might guess here 'c' would be 99. Yep.

03:29 So there it is. 'a' is 97, 'b' is 98, and so forth, going along the letters that are here that we’re indexing. What does that look like for our second string, b?

03:41 Well, again, the beginning—

03:45 there’s 'H', a capital 'H', which is 72, and 101 for the letter 'e'. Let’s move forward a little bit.

03:51 So this index would be 5—actually 5 would be the space (' '). 6. Let’s just check. 5 should be 32, a space, and 6—well, that’s way above 128. It’s 226.

04:02 And the next byte is 136. Again, it’s still above 127. So these three bytes here are the encoding for the '∑'. Pretty cool!
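
One way to confirm that indexes 6 through 8 belong together is to slice them out and decode them, as in this sketch. The names sigma_bytes and partial_failed are made up for the example.

```python
b = bytes('Hello ∑ €', 'utf8')

# Slice out the three bytes at indexes 6, 7, and 8.
sigma_bytes = b[6:9]
print(list(sigma_bytes))

# Decoded together, the three bytes give back a single character.
sigma = sigma_bytes.decode('utf-8')
print(sigma)

# Only two of the three bytes are an incomplete sequence in UTF-8.
try:
    b[6:8].decode('utf-8')
except UnicodeDecodeError as error:
    print('UnicodeDecodeError:', error)
    partial_failed = True
else:
    partial_failed = False
```

Slicing a bytes object gives another bytes object, while indexing a single position gives an integer.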

04:12 I know I’ve kind of taken you into the weeds. I just wanted to show you a little bit of what’s happening inside of here. Again, computers, at the heart of them, speak in bits and bytes, so this is kind of showing you how the information is encoded and then can be decoded. Here’s another way that you can use bytes(). In this case, you have an object named c, and instead of a string you put an integer into it. If I use 8, it’s going to create an object of eight bytes. These are null bytes.

04:42 So each one of these is the lowest possible value for a byte, and it’s created a sequence of eight of them. In fact, you can check again with len(): 8 bytes. That’s how long that bytes object is. Pretty cool!

04:53 It’s not an empty bytes object—those are eight separate null values. The last way to create using the bytes() function is to give it an iterable.

05:04 So in this case, the items in the iterable have to be of type int. So those integers, if you’re dealing with ASCII, again are going to be below 128. So I could say [115, 112, 97, 109, 33]. Great!

05:23 So, what did you create with this? You’ve made a bytes object that says 'spam!' with an exclamation point. It still is a bytes object, but if you were to address individual indexes, you again would see—oh, I typed b. It needs to be d.

05:39 So as you address the individual indexes, you can see how they match from the iterable that you put into the bytes() function.

05:51 Next up, what are the types of operations that can be applied to bytes objects?

theramstoss on June 4, 2020

Question for you: why does bytes('\x80', 'utf8') evaluate to b'\xc2\x80'?

Thank you!

Chris Bailey RP Team on June 4, 2020

Hi @theramstoss,

You are heading in a deeper direction when you start to look at encodings. The UTF-8 standard encodes characters using a varying number of bytes. There is an article that covers this, and there will be a video course for it soon. They really do a good deep dive. Here is a code snippet from the article, showing characters just outside the ASCII group, in this case letters with accents, being encoded in UTF-8 as 2 bytes each. The ASCII characters are still encoded as single bytes.

>>> "résumé".encode("utf-8")
b'r\xc3\xa9sum\xc3\xa9'
>>> "El Niño".encode("utf-8")
b'El Ni\xc3\xb1o'

>>> b"r\xc3\xa9sum\xc3\xa9".decode("utf-8")
'résumé'
>>> b"El Ni\xc3\xb1o".decode("utf-8")
'El Niño'

The value you have picked, '\x80', is the character with code point 128, which takes you just outside ASCII and its range of 0-127. That is why UTF-8 encodes it as two bytes.
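
As a rough illustration of that boundary, the sketch below encodes the code points on either side of it. The latin-1 comparison and the variable names are additions for this example and are not part of the original question.

```python
# Code point 127 is the last ASCII character: one byte in UTF-8.
last_ascii = bytes('\x7f', 'utf8')

# Code point 128 is just past ASCII: two bytes in UTF-8.
first_non_ascii = bytes('\x80', 'utf8')

# latin-1 maps code points 0 to 255 directly onto single bytes.
as_latin1 = bytes('\x80', 'latin-1')

# The iterable form stores the integer 128 as one raw byte.
raw_byte = bytes([0x80])

print(last_ascii, first_non_ascii, as_latin1, raw_byte)
```

In the string literal, '\x80' names a character, while in the bytes results the same escape names a raw byte value, so the two are not interchangeable.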
