Splitting, Concatenating, and Joining Strings in Python


by Kyle Stratis Oct 01, 2018 basics python

There are few guarantees in life: death, taxes, and programmers needing to deal with strings. Strings can come in many forms. They could be unstructured text, usernames, product descriptions, database column names, or really anything else that we describe using language.

With the near-ubiquity of string data, it’s important to master the tools of the trade when it comes to strings. Luckily, Python makes string manipulation very simple, especially when compared to other languages and even older versions of Python.

In this article, you will learn some of the most fundamental string operations: splitting, concatenating, and joining. Not only will you learn how to use these tools, but you will walk away with a deeper understanding of how they work under the hood.

Splitting Strings

In Python, strings are represented as str objects, which are immutable: this means that the object as represented in memory cannot be directly altered. These two facts can help you learn (and then remember) how to use .split().

Have you guessed how those two features of strings relate to splitting functionality in Python? If you guessed that .split() is an instance method because strings are a special type, you would be correct! In some other languages (like Perl), the original string serves as an input to a standalone split() function rather than a method called on the string itself.
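Because .split() is a method defined on the str type, you can also reach it through the class and pass the string in explicitly. The short sketch below (variable names are illustrative only) shows that both spellings give the same result:

```python
text = 'this is my string'

# The usual way: call the method on the string instance
via_instance = text.split()

# Equivalent: look the method up on the str class and pass the instance yourself
via_class = str.split(text)

print(via_instance == via_class)  # True
```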

What about string immutability? This should remind you that string methods are not in-place operations: they leave the original string untouched and return a new object in memory.
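Here is a minimal illustration of that point, using .split() alongside .upper() as a second example of a string method. Both return new objects, and the original string is unchanged afterward:

```python
sentence = 'this is my string'

# Both methods build and return brand-new objects...
words = sentence.split()
shouted = sentence.upper()

# ...while the original string stays exactly as it was
print(sentence)  # this is my string
print(words)     # ['this', 'is', 'my', 'string']
print(shouted)   # THIS IS MY STRING
```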

Splitting Without Parameters

Before going deeper, let’s look at a simple example:

>>> 'this is my string'.split()
['this', 'is', 'my', 'string']

This is actually a special case of a .split() call, which I chose for its simplicity. Without any separator specified, .split() will count any whitespace as a separator.
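"Any whitespace" means more than the space character. As a small sketch, a string mixing tabs, newlines, and a double space splits into the same words as a plainly spaced one:

```python
# Spaces, tabs (\t), and newlines (\n) all count as whitespace for a bare .split()
mixed = 'this\tis\nmy  string'
words = mixed.split()
print(words)  # ['this', 'is', 'my', 'string']
```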

Another feature of the bare call to .split() is that it automatically cuts out leading and trailing whitespace, and it treats runs of consecutive whitespace as a single separator. Compare calling .split() on the following string without a separator parameter and with ' ' as the separator parameter:

>>> s = ' this   is  my string '
>>> s.split()
['this', 'is', 'my', 'string']
>>> s.split(' ')
['', 'this', '', '', 'is', '', 'my', 'string', '']

The first thing to notice is that this showcases the immutability of strings in Python: subsequent calls to .split() work on the original string, not on the list result of the first call to .split().

The second, and main, thing you should see is that the bare .split() call extracts the words in the sentence and discards any whitespace.

Specifying Separators

.split(' '), on the other hand, is much more literal. When there are leading or trailing separators, you’ll get an empty string, which you can see in the first and last elements of the resulting list.

Where there are multiple consecutive separators (such as between “this” and “is” and between “is” and “my”), the first one will be used as the separator, and the subsequent ones will find their way into your result list as empty strings.
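The same literal behavior applies to any separator you pass in, including one longer than a single character. A quick sketch with made-up strings:

```python
# A separator can be longer than a single character
notes = 'do, re, mi'.split(', ')
print(notes)  # ['do', 're', 'mi']

# Each pair of adjacent separators leaves an empty string between them
gappy = 'do,,re,mi'.split(',')
print(gappy)  # ['do', '', 're', 'mi']
```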

Limiting Splits With Maxsplit

.split() has another optional parameter called maxsplit. By default, .split() will make all possible splits when called. When you give a value to maxsplit, however, at most that many splits will be made, counting from the left. Using a tidier version of our previous example string, we can see maxsplit in action:

>>> s = "this is my string"
>>> s.split(maxsplit=1)
['this', 'is my string']

As you see above, if you set maxsplit to 1, the first whitespace region is used as the separator, and the rest are ignored. Let’s do some exercises to test out everything we’ve learned so far.

What happens when you give a negative number as the maxsplit parameter?

.split() will split your string on all available separators, which is also the default behavior when maxsplit isn’t set.
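To check that answer, and to see how a few other maxsplit values behave, here is a small sketch using the same example string. Note that asking for more splits than there are separators is not an error; you simply get every possible split:

```python
s = 'this is my string'

unlimited = s.split(maxsplit=-1)  # same result as calling s.split()
two_splits = s.split(maxsplit=2)  # stop after the second split
too_many = s.split(maxsplit=10)   # only 3 splits are possible here

print(unlimited)   # ['this', 'is', 'my', 'string']
print(two_splits)  # ['this', 'is', 'my string']
print(too_many)    # ['this', 'is', 'my', 'string']
```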

You were recently handed a comma-separated value (CSV) file that was horribly formatted. Your job is to extract each row into a list, with each element of that list representing one column of that file. What makes it badly formatted? The “address” field includes multiple commas but needs to be represented in the list as a single element!

Assume that your file has been loaded into memory as the following multiline string:

Name,Phone,Address
Mike Smith,15554218841,123 Nice St, Roy, NM, USA
Anita Hernandez,15557789941,425 Sunny St, New York, NY, USA
Guido van Rossum,315558730,Science Park 123, 1098 XG Amsterdam, NL

Your output should be a list of lists:

[
    ['Mike Smith', '15554218841', '123 Nice St, Roy, NM, USA'],
    ['Anita Hernandez', '15557789941', '425 Sunny St, New York, NY, USA'],
    ['Guido van Rossum', '315558730', 'Science Park 123, 1098 XG Amsterdam, NL']
]

Each inner list represents one row of the CSV that we’re interested in (the header row is left out), while the outer list holds it all together.

Here’s my solution. There are a few ways to attack this. The important thing is that you used .split() with all its optional parameters and got the expected output:

input_string = """Name,Phone,Address
Mike Smith,15554218841,123 Nice St, Roy, NM, USA
Anita Hernandez,15557789941,425 Sunny St, New York, NY, USA
Guido van Rossum,315558730,Science Park 123, 1098 XG Amsterdam, NL"""

def string_split_ex(unsplit):
    results = []

    # Bonus points for using splitlines() here instead, 
    # which will be more readable
    for line in unsplit.split('\n')[1:]:
        results.append(line.split(',', maxsplit=2))

    return results

print(string_split_ex(input_string))
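Picking up the "bonus points" hint in the comment, one possible variant uses .splitlines() and a list comprehension. This is a sketch of an alternative, not the solution above; the function name string_split_splitlines is hypothetical:

```python
input_string = """Name,Phone,Address
Mike Smith,15554218841,123 Nice St, Roy, NM, USA
Anita Hernandez,15557789941,425 Sunny St, New York, NY, USA
Guido van Rossum,315558730,Science Park 123, 1098 XG Amsterdam, NL"""

def string_split_splitlines(unsplit):
    # splitlines() breaks the text on line boundaries; [1:] skips the header row
    rows = unsplit.splitlines()[1:]
    # Split each row on at most two commas so the address stays in one piece
    return [row.split(',', maxsplit=2) for row in rows]

print(string_split_splitlines(input_string))
```

One practical advantage of .splitlines() is that it also treats Windows-style \r\n line endings as a single boundary, whereas .split('\n') would leave a trailing \r on each line.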

We call .split() twice here. The first usage can look intimidating, but don’t worry! We’ll step through it, and you’ll get comfortable with expressions like these. Let’s take another look at the first .split() call: unsplit.split('\n')[1:].

The first element is unsplit, which is just the variable that points to your input string. Then we have our .split() call: .split('\n'). Here, we are splitting on a special character called the newline character.

What does \n do? As the name implies, it tells whatever is reading the string that every character after it should be shown on the next line. In a multiline string like our input_string, there is a hidden \n at the end of each line except the last one.
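You can make those hidden characters visible by counting them. In this small sketch with a made-up three-line string, the closing quotes come right after the last line, so that line has no trailing newline:

```python
block = """first line
second line
last line"""

# Two hidden newline characters separate the three lines
newline_count = block.count('\n')
print(newline_count)  # 2

lines = block.split('\n')
print(lines)  # ['first line', 'second line', 'last line']
```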

The final part might be new: [1:]. The statement so far gives us a new list in memory, and [1:] looks like list index notation. It is, kind of! This extended index notation gives us a list slice. In this case, we take the element at index 1 and everything after it, discarding the element at index 0 (the header row).
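Here is the slice on its own, applied to a shortened stand-in for the list that .split('\n') produces (the row strings are placeholders):

```python
lines = ['Name,Phone,Address', 'row one', 'row two']

# [1:] starts at index 1 and runs to the end, dropping the header at index 0
body = lines[1:]
print(body)   # ['row one', 'row two']

# Slicing builds a new list; the original list is left unchanged
print(lines)  # ['Name,Phone,Address', 'row one', 'row two']
```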

In all, we iterate through a list of strings, where each element represents each line in the multiline input string except for the very first line.

At each string, we call .split() again using , as the split character, but this time we are using maxsplit to split only on the first two commas, leaving the address intact. We then append the result of that call to the aptly named results list and return it to the caller.
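To see why maxsplit matters here, compare both calls on one of the data rows:

```python
line = 'Mike Smith,15554218841,123 Nice St, Roy, NM, USA'

# Without maxsplit, every comma splits, breaking the address apart
every_comma = line.split(',')
print(every_comma)
# ['Mike Smith', '15554218841', '123 Nice St', ' Roy', ' NM', ' USA']

# With maxsplit=2, only the first two commas split; the address stays whole
first_two = line.split(',', maxsplit=2)
print(first_two)
# ['Mike Smith', '15554218841', '123 Nice St, Roy, NM, USA']
```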

Concatenating and Joining Strings

The other fundamental string operation is the opposite of splitting strings: string concatenation. If you haven’t seen this word, don’t worry. It’s just a fancy way of saying “gluing together.”

Concatenating With the + Operator

There are a few ways of doing this, depending on what you’re trying to achieve. The simplest and most common method is to use the plus symbol (+) to add multiple strings together. Simply place a + between as many strings as you want to join together:

>>> 'a' + 'b' + 'c'
'abc'

In keeping with the math theme, you can also multiply a string to repeat it:

>>> 'do' * 2
'dodo'
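A few more cases of string repetition, as a quick sketch: the integer can come on either side, and a count of zero (or any negative number) produces an empty string:

```python
tripled = 'do' * 3
int_first = 3 * 'do'   # the int can come first as well
nothing = 'do' * 0     # zero or a negative count gives an empty string

print(tripled)        # dododo
print(int_first)      # dododo
print(repr(nothing))  # ''
```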

Remember, strings are immutable! If you concatenate or repeat a string stored in a variable, you will have to assign the new string to another variable in order to keep it.

>>> orig_string = 'Hello'
>>> orig_string + ', world'
'Hello, world'
>>> orig_string
'Hello'
>>> full_sentence = orig_string + ', world'
>>> full_sentence
'Hello, world'

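If + modified strings in place instead of returning new ones, orig_string would already hold 'Hello, world' after the first concatenation, and full_sentence would instead output 'Hello, world, world'.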
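The augmented assignment operator += follows the same rule. It looks like an in-place update, but for strings it rebinds the name to a new object. This sketch keeps a second name pointing at the original to show that the original object never changes:

```python
greeting = 'Hello'
alias = greeting        # both names refer to the same string object

greeting += ', world'   # produces a new string and rebinds `greeting` to it

print(greeting)  # Hello, world
print(alias)     # Hello  (the original object was never modified)
```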

Another note is that Python does not do implicit string conversion. If you try to concatenate a string with a non-string type, Python will raise a TypeError:

>>> 'Hello' + 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be str, not int

This is because you can only concatenate strings with other strings, which may be new behavior for you if you’re coming from a language like JavaScript, which attempts to do implicit type conversion.
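The fix is to convert the value yourself, for example with the built-in str(). A minimal sketch follows; note that the exact wording of the TypeError message differs between Python versions, so it may not match the traceback above character for character:

```python
count = 2

# Convert the int explicitly with str() before concatenating
message = 'Hello' + str(count)
print(message)  # Hello2

# Without the conversion, Python refuses to guess and raises TypeError
try:
    'Hello' + count
except TypeError as exc:
    print('TypeError:', exc)  # message wording varies by Python version
```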

Concatenating With .join()

There is another, more powerful, way to join strings together: the .join() method.

The common use case here is when you have an iterable—like a list—made up of strings, and you want to combine those strings into a single string. Like .split(), .join() is a string instance method. If all of your strings are in an iterable, which one do you call .join() on?

This is a bit of a trick question. Remember that when you use .split(), you call it on the string you want to split and pass the separator as an argument. .join() is the opposite operation, so the roles are reversed: you call it on the string or character you want to use as the separator and pass the iterable of strings you want to join together:

>>> strings = ['do', 're', 'mi']
>>> ','.join(strings)
'do,re,mi'

Here, we join each element of the strings list with a comma (,), calling .join() on the comma string rather than on the strings list.
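Since the separator is just the string you call .join() on, any string works, including an empty one or a multi-character one. A short sketch with the same list:

```python
strings = ['do', 're', 'mi']

# Whatever string .join() is called on ends up between the elements
squashed = ''.join(strings)
dashed = ' - '.join(strings)
stacked = '\n'.join(strings)

print(squashed)  # doremi
print(dashed)    # do - re - mi
print(stacked)   # do, re, and mi, each on its own line
```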

How could you make the output text more readable?

One thing you could do is add spacing:

>>> strings = ['do', 're', 'mi']
>>> ', '.join(strings)
'do, re, mi'

By doing nothing more than adding a space to our join string, we’ve vastly improved the readability of our output. This is something you should always keep in mind when joining strings for human readability.

.join() is smart in that it inserts your “joiner” in between the strings in the iterable you want to join, rather than just adding your joiner at the end of every string in the iterable. This means that if you pass an iterable of size 1, you won’t see your joiner:

>>> 'b'.join(['a'])
'a'

Using our web scraping tutorial, you’ve built a great weather scraper. However, it loads string information in a list of lists, each holding a unique row of information you want to write out to a CSV file:

[
    ['Boston', 'MA', '76F', '65% Precip', '0.15 in'],
    ['San Francisco', 'CA', '62F', '20% Precip', '0.00 in'],
    ['Washington', 'DC', '82F', '80% Precip', '0.19 in'],
    ['Miami', 'FL', '79F', '50% Precip', '0.70 in']
]

Your output should be a single string that looks like this:

"""
Boston,MA,76F,65% Precip,0.15 in
San Francisco,CA,62F,20% Precip,0.00 in
Washington,DC,82F,80% Precip,0.19 in
Miami,FL,79F,50% Precip,0.70 in
"""

For this solution, I used a list comprehension, which is a powerful feature of Python that allows you to rapidly build lists. If you want to learn more about them, check out this great article that covers all the comprehensions available in Python.

Below is my solution, starting with a list of lists and ending with a single string:

input_list = [
    ['Boston', 'MA', '76F', '65% Precip', '0.15 in'],
    ['San Francisco', 'CA', '62F', '20% Precip', '0.00 in'],
    ['Washington', 'DC', '82F', '80% Precip', '0.19 in'],
    ['Miami', 'FL', '79F', '50% Precip', '0.70 in']
]

# We start with joining each inner list into a single string
joined = [','.join(row) for row in input_list]

# Now we transform the list of strings into a single string
output = '\n'.join(joined)

print(output)

Here we use .join() not once, but twice. First, we use it in the list comprehension, which does the work of combining all the strings in each inner list into a single string. Next, we join each of these strings with the newline character \n that we saw earlier. Finally, we simply print the result so we can verify that it is as we expected.
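If you prefer, the two steps can be collapsed into one expression by passing a generator expression straight to the outer .join(). This is a stylistic alternative to the solution above, not a replacement; the intermediate variable simply goes away:

```python
input_list = [
    ['Boston', 'MA', '76F', '65% Precip', '0.15 in'],
    ['San Francisco', 'CA', '62F', '20% Precip', '0.00 in'],
    ['Washington', 'DC', '82F', '80% Precip', '0.19 in'],
    ['Miami', 'FL', '79F', '50% Precip', '0.70 in']
]

# Join each row with commas, then join the rows with newlines, in one expression
output = '\n'.join(','.join(row) for row in input_list)

print(output)
```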

Tying It All Together

While this concludes this overview of the most basic string operations in Python (splitting, concatenating, and joining), there is still a whole universe of string methods that can make your experiences with manipulating strings much easier.

Once you have mastered these basic string operations, you may want to learn more. Luckily, we have a number of great tutorials to help you complete your mastery of Python’s features that enable smart string manipulation:


About Kyle Stratis


Kyle is a self-taught developer working as a senior data engineer at PatientsLikeMe and a cofounder of Danqex (formerly Nasdanq) and Encryptid Gaming.

