Converting Between Strings and Lists
In this lesson, you’ll explore string methods that convert between a string and some composite data type by either pasting objects together to make a string, or by breaking a string up into pieces. These methods operate on or return iterables, the general Python term for a sequential collection of objects.
Many of these methods return either a list or a tuple. A list encloses the collection of objects in square brackets ([]) and is mutable. A tuple encloses its objects in parentheses (()) and is immutable.
Here are methods for converting between strings and lists:
str.join(<iterable>)
str.partition(<sep>)
str.rpartition(<sep>)
str.split(sep=None, maxsplit=-1)
str.rsplit(sep=None, maxsplit=-1)
str.splitlines([<keepends>])
Here’s how to use str.join():
>>> mylist = ['spam', 'egg', 'sausage', 'bacon', 'lobster']
>>> mylist
['spam', 'egg', 'sausage', 'bacon', 'lobster']
>>> '; '.join(mylist)
'spam; egg; sausage; bacon; lobster'
>>> ','.join(mylist)
'spam,egg,sausage,bacon,lobster'
>>> word = 'lobster'
>>> type(word)
<class 'str'>
>>> ':'.join(word)
'l:o:b:s:t:e:r'
>>> mylist2 = ['spam', 23, 'egg']
>>> type(mylist2)
<class 'list'>
>>> ', '.join(mylist2)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    ', '.join(mylist2)
TypeError: sequence item 1: expected str instance, int found
>>> mylist3 = ['spam', str(23), 'egg']
>>> ', '.join(mylist3)
'spam, 23, egg'
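When a list holds several non-string items, converting each one by hand gets tedious. A minimal sketch of one way to handle that, assuming a hypothetical list named mixed that is not part of the lesson, is to apply str() to every item while joining:

```python
# Hypothetical list mixing strings and numbers (not from the lesson).
mixed = ['spam', 23, 'egg', 4.5]

# str() is applied to each item before .join() sees it,
# so the non-string items no longer raise a TypeError.
joined = ', '.join(str(item) for item in mixed)
print(joined)  # spam, 23, egg, 4.5
```

The generator expression leaves the original list untouched, which may matter if the numbers are still needed as numbers later.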
Here’s how to use str.partition():
>>> s = 'egg.spam'
>>> s.partition('.')
('egg', '.', 'spam')
>>> t = 'egg$$spam$$bacon'
>>> t.partition('$$')
('egg', '$$', 'spam$$bacon')
Here’s how to use str.rpartition():
>>> t = 'egg$$spam$$bacon'
>>> t.rpartition('$$')
('egg$$spam', '$$', 'bacon')
>>> t.partition('.')
('egg$$spam$$bacon', '', '')
>>> t.rpartition('.')
('', '', 'egg$$spam$$bacon')
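Because the result always has three parts, it can be unpacked directly into three names, and an empty middle part signals that the separator was not found. The sketch below assumes a hypothetical helper, parse_setting(), and made-up 'key=value' inputs purely for illustration:

```python
def parse_setting(line):
    """Split a 'key=value' string at the first '=' (hypothetical helper)."""
    key, sep, value = line.partition('=')
    if not sep:
        # Separator missing: partition() returned (line, '', '').
        return line.strip(), None
    return key.strip(), value.strip()

print(parse_setting('color = blue'))  # ('color', 'blue')
print(parse_setting('path=a=b'))      # ('path', 'a=b')
print(parse_setting('verbose'))       # ('verbose', None)
```

Only the first '=' is used as the split point, so any later '=' characters stay in the value.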
Here’s how to use str.split():
>>> s = 'spam bacon sausage egg'
>>> s.split()
['spam', 'bacon', 'sausage', 'egg']
>>> s = 'spam\tbacon\nsausage egg'
>>> s.split()
['spam', 'bacon', 'sausage', 'egg']
>>> t = 'spam.bacon.sausage.egg'
>>> t.split('.')
['spam', 'bacon', 'sausage', 'egg']
>>> t = 'bacon...lobster...bacon'
>>> t.split('.')
['bacon', '', '', 'lobster', '', '', 'bacon']
>>> q = 'bacon\n lobster\t\n egg'
>>> q.split()
['bacon', 'lobster', 'egg']
>>> link = 'www.realpython.com'
>>> link.split('.', maxsplit=1)
['www', 'realpython.com']
Here’s how to use str.rsplit():
>>> link = 'www.realpython.com'
>>> link.rsplit('.', maxsplit=1)
['www.realpython', 'com']
>>> link.split('.')
['www', 'realpython', 'com']
>>> link.rsplit('.')
['www', 'realpython', 'com']
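The difference between the two methods only shows up when maxsplit limits the number of splits. As a rough illustration, assuming a hypothetical filename that is not part of the lesson:

```python
filename = 'archive.tar.gz'  # hypothetical example value

# One split from the left versus one split from the right.
first_cut = filename.split('.', maxsplit=1)
last_cut = filename.rsplit('.', maxsplit=1)
print(first_cut)  # ['archive', 'tar.gz']
print(last_cut)   # ['archive.tar', 'gz']

# With no limit, both methods return the same list.
same = filename.split('.') == filename.rsplit('.')
print(same)  # True
```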
Here’s how to use str.splitlines():
>>> moby = 'Call me Ishmael.\nSome years ago- never mind how long precisely-\nhaving little or no money in my purse,\nand nothing particular to interest me on shore,\nI thought I would sail about a little and see the watery part of the world.\n'
>>> mobysplit = moby.splitlines()
>>> mobysplit
['Call me Ishmael.', 'Some years ago- never mind how long precisely-', 'having little or no money in my purse,', 'and nothing particular to interest me on shore,', 'I thought I would sail about a little and see the watery part of the world.']
>>> mobysplit[0]
'Call me Ishmael.'
>>> mobysplit[1]
'Some years ago- never mind how long precisely-'
00:00 For this last video on string methods, I’m going to show you a little bit about converting between strings and lists. The methods that are in this group convert between a string and some composite data type by either pasting objects together to make a string or by breaking a string apart into its pieces.
00:16 These methods operate on or return iterables. Iterable is a general Python term for a sequential collection of objects. Iterating, or walking through, all the members of a collection is a common technique done inside of Python.
00:33 I will include links below this video for more information about Python iterables. Many of these methods return either a list or a tuple, which are very similar collections of ordered objects, but they have a couple of differences.
00:47 A list is enclosed within square brackets and it’s mutable, meaning that the contents can change. Whereas a tuple, also sometimes pronounced “tup-ple”, is enclosed within parentheses and is immutable.
01:00 This is a very quick introduction to these topics, but you need to know a little bit about what they look like for this next set of methods and their examples. Again, as a note, I’ll also include more information about lists and tuples below this video.
01:15 The first method you’re going to try is .join(), which takes an iterable into it. It concatenates strings from that iterable. To start, I’ll have you create a list.
01:29 A list is contained within square brackets. This list will be a sequence of strings. Make sure to open and close each string object with a single quote (') or double quote (") and place a comma (,) in between the objects.
01:44 Now that you’ve created mylist (and you can check its type, it is the <class 'list'>), how can you use the method .join()? Well, .join() works with those iterables.
01:54 In this case, you can give it a separator. Maybe we’ll use, I don’t know, let’s try a semicolon and a space. That’s your separator, which is a string. And since it’s a string, you can see all the methods that are there, and .join() is one of them.
02:07 And you can see that it’ll concatenate any number of strings. So here, you can see the example’s showing joining this list together. So this will be returned as a new string. Let’s try it with ours, mylist,
02:21 and see what returns. It’s a single string, again, joined together using this separator. You could have used a ',' and left it without a space even.
02:34 Great! So one kind of interesting thing to think about is that any string is an iterable also. Say you had a string of just a word, and let’s say that word is 'lobster'. Right now, what type is 'lobster'? It’s a string. Okay.
02:48 What if you were to .join(), say using a colon, word? It will take all the individual letters that it can iterate through and create a new string separating them. So, one note: say you had a list, and in this case, the list had a mix of types of things, let’s say an integer along with a handful of strings. Again, the type is definitely a list. What happens if you were to try
03:19 using this as your separator, and join mylist2? This one is saying that there is a problem with it. There’s an exception, a TypeError. It expected another string and found an integer instead.
03:32 So in this case, all three of these would have to be strings in order to do that .join(). Well, you learned a function earlier that could fix that. In this case, you could take that integer of 23 and have it converted into a string for us, using str(). So now,
03:55 if you were to use mylist3 and join it together, you’ll not end up with that TypeError. The next method is .partition(), which takes a separator string.
04:07 It divides a string based upon that separator. The return value is a three-part tuple consisting of the portion preceding the separator, the separator itself, and the portion of the string following that separator.
04:23 Let’s say we had a string with 'egg' and 'spam' and a period ('.') separating the two. With .partition(), it’s going to separate it into three parts given a specific separator. Let’s say our separator in this case is a period ('.').
04:39 It’s going to return a tuple with three parts: the part before the separator, the separator itself, and the part after it. Okay. Let’s see what that looks like. And there you go! Again, you can see the parentheses as opposed to these square brackets, indicating this is a tuple and not a list. Kind of neat!
04:56 If you were to have a longer string, with these three words separated with dollar signs ('$'), how would .partition() work here? Oh, t is our new string. So in this case, if you added the '$$', .partition() made the partition based upon the first occurrence of that string.
05:18 .rpartition() divides a string based upon a separator again. It functions exactly like .partition(), except that the string is split at the last occurrence instead of the first occurrence of the separator. So starting from the right, like the other r methods you’ve learned earlier.
05:40 .rpartition() would do the same thing, but it would work from the right side. If you were to take t and .partition() it based on something that’s not within the string, like let’s say a '.', it would simply return the entire string and then an empty string followed by another empty string in your tuple. Here’s your string t. And t.rpartition(), if you were to use a character that’s not in the string, would do the reverse: two empty strings followed by the entire string.
06:08 Again, .partition() works from the left, and .rpartition() works from the right.
06:16 Next is .split(). It splits a string into a list of substrings. Without any arguments, .split() will take your string and divide it into substrings delimited by any sequence of whitespace. It will return those substrings as a list.
06:31 If a separator is specified, it will be used for delimiting the split. The maxsplit value by default is -1, which means it will split all the way across the entire string.
06:42 But if a value is put inside maxsplit, it will start from the left side and make at most that many splits.
06:50 So in this case, here’s a string with words separated by whitespace, in this case just the space character (' '). If you were to apply .split() to it, as you can see here, without a separator the default value is going to split based upon any whitespace and discard empty strings from the results. So let’s try .split() just by itself, and you can see it returns this list, separating all the words based upon the whitespace between them. And that whitespace could be tabs ('\t'), newlines ('\n'), or simply just plain old space (' ').
07:32 If you have a string that uses a period ('.') as a separator, .split(), in that case, if you were to enter in the value, will split based upon that, using that delimiter.
07:46 If you have something kind of unique with multiple periods, if you were to split this, using that as the delimiter, it will return empty strings as part of your list.
08:00 So there would be a little string there, another string there. It’s still splitting based upon that delimiter, just be aware if you have repeating characters, that’s how it would behave.
08:09 Another quick note, if you had a string
08:14 that had multiple characters of whitespace in between your words,
08:23 in that case, even though these are repetitions, it will just take it all as one chunk of whitespace in between.
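The contrast between the default whitespace behavior and an explicit separator can be sketched as follows, assuming a made-up string with repeated spaces that is not from the video:

```python
spaced = 'spam   bacon  egg'  # three spaces, then two spaces

# No separator: any run of whitespace counts as a single delimiter.
by_whitespace = spaced.split()
# Explicit ' ': every single space is a delimiter, so empty strings appear.
by_single_space = spaced.split(' ')

print(by_whitespace)    # ['spam', 'bacon', 'egg']
print(by_single_space)  # ['spam', '', '', 'bacon', '', 'egg']
```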
08:33 Let’s do one more example here. Create a string named link with 'www.realpython.com' inside of it. And in this case, take that link and split it. The delimiter is '.', sure, but this time let’s put in the value for maxsplit.
08:48 Let’s say the maximum splits are only 1. That would take it from the left side here, 'www.realpython.com', and separate the two out.
08:59 .rsplit() works the same as .split(). The only thing that has changed is the maxsplit value is counted from the right side.
09:08 So, try out your same link and this time, try .rsplit() with it. You can see that it shows almost the entire same information, except for it’s going to split starting from the end. With a maxsplit of 1.
09:23 And there you can see the separation starting with the last period. In all other ways, .rsplit() behaves the same as .split().
09:34 And in the case of the default,
09:41 meaning the maxsplit is set to -1, both methods behave exactly the same.
09:53 .splitlines() will take a long string and break it based upon line boundaries. It will return them as a list. Any of the following characters or character sequences included in this table is considered to be a line boundary. It could be newline ('\n'), a carriage return ('\r'), or any of these other escape sequences included inside here, with their Unicode or ASCII equivalents. If the optional keepends argument is set to True, it will include those line ending characters.
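As a small sketch of what keepends changes, assuming a hypothetical sample string with mixed line endings rather than the text used in the video:

```python
text = 'first line\nsecond line\r\nthird line\n'  # hypothetical sample

without_ends = text.splitlines()
with_ends = text.splitlines(keepends=True)

print(without_ends)  # ['first line', 'second line', 'third line']
print(with_ends)     # ['first line\n', 'second line\r\n', 'third line\n']

# Keeping the line endings means the pieces rebuild the original exactly.
rebuilt = ''.join(with_ends)
print(rebuilt == text)  # True
```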
10:28 If you have a large text input that’s been read in from a file, very often it’ll have newline escape sequences inside of it. It might look something like this. This is an excerpt from Moby Dick.
10:40 Instead of typing this out, you can copy it from the text below this video. You can see here, at each line there’s a line break, '\n'. Make a new variable, call it mobysplit. We’ll take moby after splitting it into lines. And note, the line breaks will not be included in the resulting list unless you set that optional argument to True. So basically, you’re taking this text, applying this method to it, and then putting that into this new variable mobysplit.
11:15 So, what does that look like? Well, mobysplit is a list as you can see here, and each one of these text strings is separated by a comma. And the same way that you can access portions of a string, you can access this list.
11:30 So this would be the first line, in this case. Or in the case of here, we could say, “Oh, give me the second line.” So it’s pretty powerful, what you can do with .splitlines() if you have this large chunk of text, and how you could access it later as an iterable list instead. Now that you covered the majority of string methods, it’s time to talk about bytes, starting in Section 3.
Chris Bailey RP Team on Dec. 31, 2019
Thanks km, I’m glad you liked the lesson! You ask a good question. I looked at the python documentation, to see if there was an answer I could readily find. I didn’t really find a specific reason.
The one reason I could think of for .partition() returning a tuple is that it will always return 3 items. Tuples are immutable, and are more efficient, meaning they are faster and use less memory. When designing methods that are parts of the fundamental pieces of the language, in this case the string type, they would want it to be as efficient as possible.
For .split() the number of items returned can vary and it’s possible that the best way to work with that method is to have a list which is mutable.
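The fixed-size versus variable-size difference can be checked directly. This is only an illustrative sketch with made-up inputs, and it does not say anything about why the language designers chose these return types:

```python
samples = ['egg', 'egg.spam', 'egg.spam.bacon']  # hypothetical inputs

# .partition() always yields a 3-item tuple, however many separators exist.
partition_lengths = [len(text.partition('.')) for text in samples]
# .split() yields a list whose length depends on the input.
split_lengths = [len(text.split('.')) for text in samples]

print(partition_lengths)  # [3, 3, 3]
print(split_lengths)      # [1, 2, 3]
```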
nnpdba on Feb. 23, 2023
Hi Chris, Thank you for a great tutorial. I am a bit confused with join() and hope you can help me understand. Please correct me if I am wrong, but what I have understood so far in learning Python is that methods are called on a variable with the syntax variable.method(), whereas in the case of join we are passing the variable like a parameter. Why is the join method different compared to the others?
Chris Bailey RP Team on Feb. 23, 2023
Thanks @nnpdba. Of the methods covered, .join() is fairly unique. Even though you are calling it on a string, that string is being used to combine the elements of the iterable that’s being passed into it. Very often it’s called on an empty string - "".join(<iterable>) so that all the elements of the iterable are strung together with no spacing. It is a bit of an odd duck of the bunch. The Python docs don’t give much insight to the thought processes as to the design, maybe I could get someone on the podcast to answer that excellent question.
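One practical effect of calling .join() on the separator is that the argument can be any iterable of strings, not only a list. A short sketch, using hypothetical values that are not from the lesson:

```python
letters = ('s', 'p', 'a', 'm')            # a tuple of strings
squares = (str(n * n) for n in range(4))  # a generator of strings

# The empty string joins the pieces with nothing in between.
word = ''.join(letters)
# Any other string works as the separator in the same way.
dashed = '-'.join(squares)

print(word)    # spam
print(dashed)  # 0-1-4-9
```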
nnpdba on Feb. 23, 2023
Thank you for your response!
km on Dec. 31, 2019
Thanks a lot, another great video. One query: Why is that some methods return TUPLES and some LIST. EG: Partition returns tuple but split returns LIST