Defining bytes Objects With bytes()

Strings and Character Data in Python Christopher Bailey 05:55

In the last lesson, you saw how you could create a bytes object using a string literal with the addition of a 'b' prefix. In this lesson, you’ll learn how to use bytes() to create a bytes object. You’ll explore three different forms of using bytes():

bytes(<s>, <encoding>) creates a bytes object from a string.
bytes(<size>) creates a bytes object consisting of null (0x00) bytes.
bytes(<iterable>) creates a bytes object from an iterable.

Here’s how to use bytes(<s>, <encoding>):

>>> a = bytes('bacon and egg', 'utf8')
>>> a
b'bacon and egg'
>>> type(a)
<class 'bytes'>

>>> b = bytes('Hello ∑ €', 'utf8')
>>> b
b'Hello \xe2\x88\x91 \xe2\x82\xac'

>>> len(a)
13
>>> a
b'bacon and egg'
>>> b
b'Hello \xe2\x88\x91 \xe2\x82\xac'
>>> len(b)
13
>>> a[0]
98
>>> a[1]
97
>>> a[2]
99
>>> b[0]
72
>>> b[1]
101
>>> b[5]
32
>>> b[6]
226
>>> b[7]
136
>>> b[8]
145
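
The gap between nine characters and 13 bytes comes from UTF-8 using a variable number of bytes per character. Here is a minimal sketch that reuses the same string; the names `text`, `encoded`, and `widths` are only for illustration and are not part of the lesson:

```python
text = 'Hello ∑ €'
encoded = bytes(text, 'utf8')

# Pair each character with the number of UTF-8 bytes it needs.
widths = [(ch, len(ch.encode('utf8'))) for ch in text]

print(len(text))     # number of characters in the string
print(len(encoded))  # number of bytes in the encoded result
print(widths)
```

The ASCII characters should each report a width of 1, while '∑' and '€' each report 3, which accounts for the four extra bytes.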

Here’s how to use bytes(<size>):

>>> c = bytes(8)
>>> c
b'\x00\x00\x00\x00\x00\x00\x00\x00'
>>> len(c)
8
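
To make the difference between null bytes and an empty bytes object concrete, here is a small sketch. The names `zeros` and `empty` are illustrative only:

```python
zeros = bytes(8)   # eight bytes, each holding the value 0
empty = bytes()    # no bytes at all

print(len(zeros), len(empty))
print(zeros == b'\x00' * 8)   # same content as eight repeated null bytes
print(list(zeros))            # the integer value stored in each position
```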

Here’s how to use bytes(<iterable>):

>>> d = bytes([115, 112, 97, 109, 33])
>>> d
b'spam!'
>>> type(d)
<class 'bytes'>
>>> d[0]
115
>>> d[3]
109
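
The iterable doesn't have to be a list, and each integer must fit in a single byte. The following sketch assumes nothing beyond the built-in bytes() function; the names `e` and `error_message` are illustrative:

```python
d = bytes([115, 112, 97, 109, 33])
print(d)

# Any iterable of integers works, for example a range.
e = bytes(range(97, 102))
print(e)

# A value outside 0-255 doesn't fit in one byte, so bytes() rejects it.
error_message = None
try:
    bytes([115, 256])
except ValueError as err:
    error_message = str(err)
    print(error_message)
```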

00:00 Another way to define a bytes object is by using the built-in bytes() function.

00:06 There are three different approaches that you’re going to practice. The first is to use the bytes() function and pass in, as arguments, a string followed by the type of encoding.

00:17 This will create a bytes object from that string. The second way is by using the bytes() function and entering in a size. This is going to create a bytes object consisting of null bytes that look like this, 0x00. It’s not an empty bytes object, which you will see when you try it out.

00:37 The third way to use the bytes() function is to pass an iterable into it. In this case, you can pass a list with integers in the range of 0 to 255.

00:51 To try the first technique, you’re going to create an object called a.

00:56 And from here, bpython helps out again. It shows the variety of things that you can put inside. For this first one, we’re going to focus on this one, where you can choose a string and then an encoding type, and that’s going to output bytes.

01:08 So the string in this case will be

01:12 'bacon and egg'. Here, you choose the encoding. The type of encoding that you’re going to use in this case is called 'utf8'. It stands for Unicode Transformation Format, and it’s basically breaking those Unicode-type characters into bytes.

01:28 And you’ll see what it does, kind of, on the output. This is a simple string that’s just consisting of ASCII characters, but it can convert non-ASCII characters from the Unicode set into these UTF8 chunks. a has been created, and you can see, in this case, it’s very simple—just a bytes object with a b at the front of it. And if you were to type it, it is of <class 'bytes'>.

01:53 Let’s create an object called b, and this time using a non-ASCII character, a character like Sigma ('∑'). Again, that was created on my keyboard using Option + W. Or potentially the Euro symbol ('€'). Again, that one was a little more difficult, Shift + Option + 2.

02:13 So those ones go beyond the basic 128 characters that are in ASCII.

02:19 What happens? Please note that 'utf8' has to be enclosed in quotation marks, since it’s passed as a string value. So here’s b. And you can see that, with these escape codes here, it has created an encoding. So if you were to look at individual values, going back to a, and looking... well, first look at the length of it.

02:41 len(a). So a is 13 bytes long. Okay. How long is this? One, two, three, four, five, six, seven, eight, nine. Okay. So what should the length of b be? Well, b is actually 13 bytes long. Instead of just nine bytes for nine characters, it’s had to encode those higher-numbered Unicode characters into multiple bytes each. And you can see the individual bytes also.

03:09 Like if you went to a and looked at the very first letter in the string, which would be 0 using your indexing technique, its value is actually 98 as an integer.

03:19 That’s the letter 'b' in 'bacon'. 'a' being 97. You might guess here 'c' would be 99. Yep.

03:29 So there it is. 'a' is 97, 'b' is 98, and so forth, going along the letters that are here that we’re indexing. What does that look like for our second string, b?

03:41 Well, again, the beginning—

03:45 there’s 'H', a capital 'H', which is 72, and 101 for the letter 'e'. Let’s move forward a little bit.

03:51 So this index would be 5—actually 5 would be the space (' '). 6. Let’s just check. 5 should be 32, a space, and 6—well, that’s way above 128. It’s 226.

04:02 And the next byte is 136. Again, it’s still above 127. So these three bytes here are the encoding for the '∑'. Pretty cool!
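
One way to check that those three bytes belong together is to slice them out and decode them. This is a sketch rather than something shown in the video, and the name `sigma_bytes` is illustrative:

```python
b = bytes('Hello ∑ €', 'utf8')

# Slicing a bytes object returns another bytes object.
sigma_bytes = b[6:9]
print(list(sigma_bytes))           # the three integer values
print(sigma_bytes.decode('utf8'))  # decoded back to a single character

# The first of the three bytes is not valid UTF-8 by itself.
try:
    b[6:7].decode('utf8')
except UnicodeDecodeError as err:
    print('cannot decode:', err.reason)
```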

04:12 I know I’ve kind of taken you into the weeds, but I just wanted to show you a little bit of what’s happening inside of here. Again, computers, at the heart of them, speak in bits and bytes, so this is showing you how the information is encoded and then can be decoded. Another way that you can use bytes() is to pass an integer into it. Say you create an object named c. If I use 8, it’s going to create an object of eight bytes. These are null bytes.

04:42 So each one of these is the lowest possible value for a byte, and it’s created a sequence of eight of them. In fact, you can see again, using len(), that it’s 8 bytes. That’s how long that bytes object is. Pretty cool!

04:53 It’s not an empty bytes object—those are eight separate null values. The last way to create using the bytes() function is to give it an iterable.

05:04 So in this case, the iterable has to contain integers. Those integers, if you’re dealing with ASCII, again are going to be below 128. So I could say [115, 112, 97, 109, 33]. Great!

05:23 So, what did you create with this? You’ve made a bytes object that says 'spam!' with an exclamation point. It still is a bytes object, but if you were to address individual indexes, you again would see—oh, I typed b. It needs to be d.

05:39 So as you address the individual indexes, you can see how they match from the iterable that you put into the bytes() function.

05:51 Next up, what are the types of operations that can be applied to bytes objects?

theramstoss on June 4, 2020

Question for you: why does bytes('\x80', 'utf8') evaluate to b'\xc2\x80'?

Thank you!

Chris Bailey RP Team on June 4, 2020

Hi @theramstoss,

You are heading in a deeper direction when you start to look at encodings. The UTF-8 standard encodes characters using a varying number of bytes. This article, which will soon have a video course to go with it, does a good deep dive. Here is a code snippet from the article, showing characters just outside the ASCII group, in this case accented letters, being encoded in UTF-8 as 2 bytes each. The ASCII characters, on the other hand, each take a single byte.

>>> "résumé".encode("utf-8")
b'r\xc3\xa9sum\xc3\xa9'
>>> "El Niño".encode("utf-8")
b'El Ni\xc3\xb1o'

>>> b"r\xc3\xa9sum\xc3\xa9".decode("utf-8")
'résumé'
>>> b"El Ni\xc3\xb1o".decode("utf-8")
'El Niño'

The value you have picked, '\x80', is equal to 128, which takes you just outside the ASCII range of 0-127, so UTF-8 encodes it as two bytes.
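
As a rough illustration of that boundary, the sketch below encodes code points 127 and 128. The comparison with 'latin-1' is an addition for contrast and was not part of the original question:

```python
ch = '\x80'   # the character with code point 128
print(ord(ch))

print(bytes(ch, 'utf8'))      # two bytes in UTF-8
print(bytes(ch, 'latin-1'))   # a single byte in Latin-1

print(bytes('\x7f', 'utf8'))  # code point 127 still fits in one byte
```

In UTF-8, byte values from 128 upward only appear as part of multi-byte sequences, which is why code point 128 can't be stored as the single byte 0x80.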

Bhavesh Sharma on Jan. 16, 2022

I’m having difficulty understanding the concept in this video. If ASCII characters only have values from 0-127, then how does it accept values from 0-255? Not getting it.

Bartosz Zaczyński RP Team on Jan. 17, 2022

@Bhavesh Sharma The original ASCII standard allocated only 7 bits corresponding to values between 0 and 127 to represent Latin alphabet letters, digits, punctuation, and a few other symbols. It was enough for the hardware of the sixties. The later addition of the 8th bit allowed for implementing various extensions known as code pages, which made it possible to use more exotic characters like ąćęłńóśźż.
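
As a small sketch of what a code page does, the same byte value above 127 can stand for different characters depending on which code page is used to decode it. The choice of 'cp1250' and 'latin-1' here is only an example:

```python
raw = bytes([0xB3])   # one byte with the value 179, above the ASCII range

print(raw.decode('cp1250'))   # a Central European code page
print(raw.decode('latin-1'))  # a Western European code page

# The same letter needs two bytes in UTF-8 but only one in cp1250.
print('ł'.encode('utf8'), 'ł'.encode('cp1250'))
```

This is why a bytes object accepts values up to 255: a byte has 8 bits, and ASCII only defines the meaning of the lower half.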

Biggz78 on May 27, 2022

Hi, I am new to learning Python. I bought the book or books a while ago, and just this week I started to study. The book points me to this course, and that’s all good. I’m like, String Manipulation, check, then we move on to built-in String Methods, cool cool, I can follow, and then section 3 ....

Bytes Objects,

And I’m like, UHMMMM, WTF, you lost me. Is this the way to learn? Going from beginner, just learning about strings, and then going over to bytes objects? I don’t even know what they are or what they do…

Isn’t this too early? Or did I miss something here?

Bartosz Zaczyński RP Team on May 30, 2022

@Biggz78 It might sound strange jumping from string to bytes, but don’t let the “bytes” name scare you away. The reason why bytes are mentioned right after strings is that both data types in Python share a wealth of common attributes and are closely related. They’re both sequences that behave almost the same. When you look at their attributes, you’ll notice that most are identical:

>>> len(set(dir(str)) | set(dir(bytes)))
83
>>> len(set(dir(str)) & set(dir(bytes)))
72

Strings and bytes have 83 attributes combined, 72 of which are the same. Also, you can go from strings to bytes and the other way around:

>>> "Hello, World!".encode("utf-8")
b'Hello, World!'
>>> b"Hello, World!".decode("utf-8")
'Hello, World!'

Biggz78 on May 31, 2022

@Bartosz Zaczyński,

I understand where you’re coming from, but try to understand my point of view. I have no clue what bytes are… so all the “attributes” (whatever they are) they share are all fun and that for the more seasoned Python user.

I am learning from the book, Python Basics, and just finished chapter 5 today. Assuming you know the book, no mention about bytes, so then to be referred by the book to this resource (end of chapter 4, String and String Methods) again, not having gotten to bytes just doesn’t make sense.

Biggz78 on May 31, 2022

@Bartosz Zaczyński,

I think I made a “boo boo”. I might have come across this tutorial by accident, went back to the book to “fact check” and there is no link to this course but to:

Python String Formatting Best Practices

and

Splitting, Concatenating, and Joining Strings in Python

I think I might have made a search error or something, but I ended up with the wrong tutorial, so my bad, and my apologies.

Ross on June 5, 2024

What kind of problem would you solve using bytes objects?
