Splitting Strings in Python

This lesson covers splitting strings using the .split() method. .split() has optional parameters that allow you to fine-tune how strings are split. You’ll see how a bare call to .split() splits a string on whitespace and returns a list of words:

Python
>>> 'this is my string'.split()
['this', 'is', 'my', 'string']

You’ll also learn how to specify separators and how to limit the number of splits in calls to .split() using the maxsplit parameter:

Python
>>> s = "this is my string"
>>> s.split(maxsplit=1)
['this', 'is my string']

00:00 Hello everyone! Welcome to our second video in the series on Splitting, Concatenating, and Joining Strings in Python. This video will focus on the string method .split(). .split() creates a list of substrings from the string instance that invoked it. Before we get started, let’s review two things from the previous video.

00:21 First, strings are immutable sequences, meaning they can’t be changed. This property is key to understanding string methods in general. String methods can’t modify the string instance they’re invoked upon. Therefore, the string methods we discuss here will all be creating new values or objects.
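To make the immutability point concrete, here is a minimal sketch. The sample string is the one from the lesson description, and the variable names are only illustrative.

```python
# .split() builds a new list; the string it was called on is left untouched.
sentence = "this is my string"
words = sentence.split()

print(words)     # ['this', 'is', 'my', 'string']
print(sentence)  # this is my string
```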

00:39 Second, although string methods can be invoked directly from the str class, with the string passed as the first argument, this is not recommended. Going forward, we will explore our string methods by invoking them the usual way, from their string object instance.
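As a quick illustration of the two calling styles, the sketch below compares them on a sample string. The variable names are assumptions made for this example.

```python
sentence = "this is my string"

# Calling through the class works, but the string must be passed in explicitly.
via_class = str.split(sentence)

# The usual, recommended form: call the method on the string instance.
via_instance = sentence.split()

print(via_class == via_instance)  # True
```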

00:53 Now let’s talk about .split(). Here we see an example of the .split() method. The job of .split() is to create substrings based on the position of a separator character or string within the string instance.

01:09 .split() then returns a list object with those substrings as elements. The separator used to delimit these string portions depends on what, if any, character or string was passed as an argument.

01:23 Since this argument is used in this way, only strings can be passed as the separator. Other than that the separator cannot be an empty string, there are no restrictions on the type of character or length of string that serves this purpose.
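The rules for the separator can be sketched as follows. The sample text and variable names are made up for illustration. The exact error messages vary between Python versions, so only the exception types are shown.

```python
text = "one--two--three"  # hypothetical sample string

# A separator may be longer than one character.
parts = text.split("--")
print(parts)  # ['one', 'two', 'three']

# A non-string separator such as an integer is rejected.
try:
    text.split(2)
except TypeError as error:
    int_sep_error = type(error).__name__
print(int_sep_error)  # TypeError

# An empty string is rejected as well.
try:
    text.split("")
except ValueError as error:
    empty_sep_error = type(error).__name__
print(empty_sep_error)  # ValueError
```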

01:36 The .split() method can be invoked without passing any arguments. If nothing is passed, .split() uses whitespace as its default separator.

01:45 As you can see, invoking .split() from our sentence string effectively creates a list of words. With this default behavior, .split() can be helpful in generating a word count on large blocks of text.

01:58 Once .split() is called on the text block, len() can be used to count the elements in the returned list.
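A word count along these lines might look like the sketch below. The text block is a made-up sample, not the one shown in the video.

```python
# Hypothetical two-line text block used only for this example.
text_block = """Strings are immutable sequences.
String methods return new objects."""

# With no arguments, .split() treats any run of whitespace,
# including the newline, as a single separator.
words = text_block.split()
word_count = len(words)

print(word_count)  # 9
```

Note that punctuation stays attached to the neighboring word ('sequences.' counts as one element), so this is only a rough word count.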

02:07 Sometimes, however, we don’t want .split() to use whitespace as our string delimiter. For example, say we wanted to break apart our sentence string at the lowercase 's'. We simply pass the lowercase 's' as a string argument to .split(). .split() will then use that character as a separator to generate the list elements, which you can see from this result. Note also that the separator character, the 's', is missing from the substrings that were generated.
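Assuming the sentence string is the one from the lesson description, splitting on a lowercase 's' would look roughly like this:

```python
sentence = "this is my string"

# Every 's' acts as a cut point and is dropped from the result.
# Spaces are ordinary characters here, so they stay in the substrings.
pieces = sentence.split("s")

print(pieces)  # ['thi', ' i', ' my ', 'tring']
```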

02:36 So, how could this be useful? You may have seen data in a format known as CSV, or comma-separated values.

02:44 Here’s an example of United States zip code data in this format. .split() can be useful in reading CSV data, because we can specify the comma character (',') as our split delimiter, thereby identifying the field values within each record or row.
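The zip code data from the video is not reproduced in this transcript, so the sketch below assumes a record of the form zip code, city, state. The sample value is only a stand-in.

```python
# Assumed sample record in "zip,city,state" order (not the video's actual data).
record = "90210,Beverly Hills,CA"

# Splitting on the comma yields one list element per field.
fields = record.split(",")

print(fields)  # ['90210', 'Beverly Hills', 'CA']
```

This works for simple records like the one above. For CSV files whose fields may contain quoted commas, the standard library's csv module is the safer choice, since a plain .split(',') would cut inside the quotes.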

03:03 Since .split() uses whitespace as its default separator, you might assume that explicitly passing a space character (' ') string argument would function the same, but there is a slight difference.

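03:16 This difference is apparent in the returned list when working with a string that has leading or trailing whitespace. In this situation, passing a literal space character to .split() produces empty strings in the list result. The same happens with runs of consecutive spaces inside the string, whereas the default behavior treats any run of whitespace as a single separator and discards leading and trailing whitespace.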
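The difference can be seen side by side in a short sketch. The padded string is a made-up example with two leading spaces, a double space in the middle, and one trailing space.

```python
padded = "  this is  my string "

# Default: runs of whitespace count as one separator,
# and leading/trailing whitespace is discarded.
default_result = padded.split()
print(default_result)  # ['this', 'is', 'my', 'string']

# Explicit ' ': every single space is a separator,
# so empty strings appear wherever spaces touch each other or the ends.
space_result = padded.split(" ")
print(space_result)  # ['', '', 'this', 'is', '', 'my', 'string', '']
```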

03:32 Another useful feature of .split() involves the use of its optional second parameter, maxsplit. The default value of maxsplit is -1, which means there is no limit on the number of splits. However, if a positive integer is passed as the maxsplit argument, that number will be the maximum number of times .split() will use the separator, so the returned list has at most maxsplit + 1 elements.

03:55 For example, if a 1 is specified for maxsplit, the .split() method’s job is complete after the first separator position is found. The remaining separators will then be ignored.
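Here is a small sketch of how different maxsplit values behave, using the sentence from the lesson description. The variable names are only illustrative.

```python
s = "this is my string"

# maxsplit=2: split at the first two separators, leave the rest intact.
two_splits = s.split(maxsplit=2)
print(two_splits)  # ['this', 'is', 'my string']

# maxsplit can also be passed positionally after an explicit separator.
positional = s.split(" ", 2)
print(positional)  # ['this', 'is', 'my string']

# maxsplit=0: no splitting at all, so the list holds the whole string.
no_splits = s.split(maxsplit=0)
print(no_splits)  # ['this is my string']

# maxsplit=-1 (the default): no limit on the number of splits.
unlimited = s.split(maxsplit=-1)
print(unlimited)  # ['this', 'is', 'my', 'string']
```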

04:11 So, how might maxsplit be useful? If we revisit our zip code CSV example, let’s suppose we want to treat the city and state as one field value called LOCATION. Without specifying the maxsplit argument, a split on the comma character would render a three-string list. However, specifying the integer 1 for maxsplit would stop splitting after the first comma, the one following the zip code.

04:37 Now .split() returns a two-string list, where we can treat our city and state as one field. Here you see the comma after the city is now part of the string in the second element.
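Again assuming a record in zip code, city, state order (the sample value is a stand-in, not the video's data), the LOCATION idea can be sketched like this:

```python
# Assumed sample record in "zip,city,state" order.
record = "90210,Beverly Hills,CA"

# Stop after the first comma: the zip code is split off,
# and everything after it stays together as one field.
zip_code, location = record.split(",", maxsplit=1)

print(zip_code)  # 90210
print(location)  # Beverly Hills,CA
```

Unpacking into two names works here because maxsplit=1 produces exactly two elements whenever the record contains at least one comma. A record with no comma would yield a single element and the unpacking would raise a ValueError.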

04:49 Hopefully, this video showed how .split() can be useful, and perhaps you can use .split() in some of your own applications. Now that we’ve broken our strings apart, let’s put them back together again in the next video.
