Watch Now This tutorial has a related video course created by the Real Python team. Watch it together with the written tutorial to deepen your understanding: Check if a Python String Contains a Substring
If you’re new to programming or come from a programming language other than Python, you may be looking for the best way to check whether a string contains another string in Python.
Identifying such substrings comes in handy when you’re working with text content from a file or after you’ve received user input. You may want to perform different actions in your program depending on whether a substring is present or not.
In this tutorial, you’ll focus on the most Pythonic way to tackle this task, using the membership operator in. Additionally, you’ll learn how to identify the right string methods for related, but different, use cases.
Finally, you’ll also learn how to find substrings in pandas columns. This is helpful if you need to search through data from a CSV file. You could use the approach that you’ll learn in the next section, but if you’re working with tabular data, it’s best to load the data into a pandas DataFrame and search for substrings in pandas.
Free Download: Click here to download the sample code that you’ll use to check if a string contains a substring.
How to Confirm That a Python String Contains Another String
If you need to check whether a string contains a substring, use Python’s membership operator in. In Python, this is the recommended way to confirm the existence of a substring in a string:
>>> raw_file_content = """Hi there and welcome.
... This is a special hidden file with a SECRET secret.
... I don't want to tell you The Secret,
... but I do want to secretly tell you that I have one."""
>>> "secret" in raw_file_content
True
The in membership operator gives you a quick and readable way to check whether a substring is present in a string. You may notice that the line of code almost reads like English.
Note: If you want to check whether the substring is not in the string, then you can use not in:
>>> "secret" not in raw_file_content
False
Because the substring "secret" is present in raw_file_content, the not in operator returns False.
When you use in, the expression returns a Boolean value:
True if Python found the substring
False if Python didn’t find the substring
You can use this intuitive syntax in conditional statements to make decisions in your code:
>>> if "secret" in raw_file_content:
...     print("Found!")
...
Found!
In this code snippet, you use the membership operator to check whether "secret" is a substring of raw_file_content. If it is, then you’ll print a message to the terminal. Any indented code will only execute if the Python string that you’re checking contains the substring that you provide.
Note: Python always considers the empty string to be a substring of any other string, so checking for the empty string in a string returns True:
>>> "" in "secret"
True
This may be surprising because Python considers empty strings to be falsy, but it’s an edge case that is helpful to keep in mind.
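If the substring comes from user input, the empty-string edge case can turn every check into a match. The following is a minimal sketch of one way to guard against that. The helper name contains_nonempty is hypothetical and not part of the tutorial’s sample code:

```python
def contains_nonempty(text: str, substring: str) -> bool:
    """Hypothetical helper: True only for a non-empty substring found in text."""
    # "" is "in" every string, so an empty substring is rejected explicitly.
    return bool(substring) and substring in text


print("" in "secret")                       # True
print(contains_nonempty("secret", ""))      # False
print(contains_nonempty("secret", "cre"))   # True
```

Whether an empty search term should count as a match depends on your program, so treat this guard as one possible design choice.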
The membership operator in is your best friend if you just need to check whether a Python string contains a substring.
However, what if you want to know more about the substring? If you read through the text stored in raw_file_content, then you’ll notice that the substring occurs more than once, and even in different variations!
Which of these occurrences did Python find? Does capitalization make a difference? How often does the substring show up in the text? And what’s the location of these substrings? If you need the answer to any of these questions, then keep on reading.
Generalize Your Check by Removing Case Sensitivity
Python strings are case sensitive. If the substring that you provide uses different capitalization than the same word in your text, then Python won’t find it. For example, if you check for the lowercase word "secret" on a title-case version of the original text, the membership operator check returns False:
>>> title_cased_file_content = """Hi There And Welcome.
... This Is A Special Hidden File With A Secret Secret.
... I Don't Want To Tell You The Secret,
... But I Do Want To Secretly Tell You That I Have One."""
>>> "secret" in title_cased_file_content
False
Despite the fact that the word secret appears multiple times in the title-case text title_cased_file_content, it never shows up in all lowercase. That’s why the check that you perform with the membership operator returns False. Python can’t find the all-lowercase string "secret" in the provided text.
Humans have a different approach to language than computers do. This is why you’ll often want to disregard capitalization when you check whether a string contains a substring in Python.
You can generalize your substring check by converting the whole input text to lowercase:
>>> file_content = title_cased_file_content.lower()
>>> print(file_content)
hi there and welcome.
this is a special hidden file with a secret secret.
i don't want to tell you the secret,
but i do want to secretly tell you that i have one.
>>> "secret" in file_content
True
Converting your input text to lowercase is a common way to account for the fact that humans think of words that only differ in capitalization as the same word, while computers don’t.
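Lowercasing only the text still fails if the substring itself contains capital letters. As a sketch of a slightly more general check, you could normalize both sides. The helper name contains_ignore_case is hypothetical, and the use of .casefold() instead of .lower() is an assumption that goes beyond the tutorial’s approach:

```python
def contains_ignore_case(text: str, substring: str) -> bool:
    """Hypothetical helper: case-insensitive substring check."""
    # .casefold() is a stricter variant of .lower() intended for caseless
    # comparisons. Both the text and the substring are normalized.
    return substring.casefold() in text.casefold()


title_cased = "This Is A Special Hidden File With A Secret Secret."

print("secret" in title_cased)                       # False
print(contains_ignore_case(title_cased, "secret"))   # True
print(contains_ignore_case(title_cased, "SECRET"))   # True
```

For plain English text, .lower() and .casefold() give the same result, so either works here.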
Note: For the following examples, you’ll keep working with file_content, the lowercase version of your text.
If you work with the original string (raw_file_content) or the one in title case (title_cased_file_content), then you’ll get different results because they aren’t in lowercase. Feel free to give that a try while you work through the examples!
Now that you’ve converted the string to lowercase to avoid unintended issues stemming from case sensitivity, it’s time to dig further and learn more about the substring.
Learn More About the Substring
The membership operator in is a great way to descriptively check whether there’s a substring in a string, but it doesn’t give you any more information than that. It’s perfect for conditional checks, but what if you need to know more about the substrings?
Python provides many additional string methods that allow you to check how many target substrings the string contains, to search for substrings according to elaborate conditions, or to locate the index of the substring in your text.
In this section, you’ll cover some additional string methods that can help you learn more about the substring.
Note: You may have seen the following methods used to check whether a string contains a substring. This is possible—but they aren’t meant to be used for that!
Programming is a creative activity, and you can always find different ways to accomplish the same task. However, for your code’s readability, it’s best to use methods as they were intended in the language that you’re working with.
By using in, you confirmed that the string contains the substring. But you didn’t get any information on where the substring is located.
If you need to know where in your string the substring occurs, then you can use .index() on the string object:
>>> file_content = """hi there and welcome.
... this is a special hidden file with a secret secret.
... i don't want to tell you the secret,
... but i do want to secretly tell you that i have one."""
>>> file_content.index("secret")
59
When you call .index() on the string and pass it the substring as an argument, you get the index position of the first character of the first occurrence of the substring.
Note: If Python can’t find the substring, then .index() raises a ValueError exception.
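Because .index() raises an exception when there’s no match, code that looks up positions usually needs to handle that case. Here’s a minimal sketch. The helper name locate is hypothetical, and the shortened sample text is only for illustration:

```python
file_content = "this is a special hidden file with a secret secret."


def locate(text, substring):
    """Hypothetical helper: index of the first occurrence, or None."""
    try:
        return text.index(substring)
    except ValueError:
        # .index() signals "not found" with an exception.
        return None


print(locate(file_content, "secret"))     # 37
print(locate(file_content, "treasure"))   # None

# The related .find() method returns -1 instead of raising an exception.
print(file_content.find("treasure"))      # -1
```

Returning None is just one design choice. If a missing substring is a real error in your program, letting the ValueError propagate may be the better option.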
But what if you want to find other occurrences of the substring? The .index() method also takes a second argument that can define at which index position to start looking. By passing specific index positions, you can therefore skip over occurrences of the substring that you’ve already identified:
>>> file_content.index("secret", 60)
66
When you pass a starting index that’s past the first occurrence of the substring, then Python searches starting from there. In this case, you get another match and not a ValueError.
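The start argument can also be applied repeatedly to walk through the text one occurrence at a time. The sketch below assumes a hypothetical helper named find_all and uses .find() rather than .index(), so that the loop can stop on -1 instead of catching a ValueError:

```python
file_content = """hi there and welcome.
this is a special hidden file with a secret secret.
i don't want to tell you the secret,
but i do want to secretly tell you that i have one."""


def find_all(text, substring):
    """Hypothetical helper: return the start index of every occurrence."""
    positions = []
    start = text.find(substring)
    while start != -1:
        positions.append(start)
        # Resume the search one character past the previous hit.
        start = text.find(substring, start + 1)
    return positions


print(find_all(file_content, "secret"))  # [59, 66, 103, 128]
```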
That means that the text contains the substring more than once. But how often is it in there?
You can use .count() to get your answer quickly using descriptive and idiomatic Python code:
>>> file_content.count("secret")
4
You used .count() on the lowercase string and passed the substring "secret" as an argument. Python counted how often the substring appears in the string and returned the answer. The text contains the substring four times. But what do these substrings look like?
You can inspect all the substrings by splitting your text at default word borders and printing the words to your terminal using a for loop:
>>> for word in file_content.split():
...     if "secret" in word:
...         print(word)
...
secret
secret.
secret,
secretly
In this example, you use .split() to separate the text at whitespaces into strings, which Python packs into a list. Then you iterate over this list and use in on each of these strings to see whether it contains the substring "secret".
Note: Instead of printing the substrings, you could also save them in a new list, for example by using a list comprehension with a condition:
>>> [word for word in file_content.split() if "secret" in word]
['secret', 'secret.', 'secret,', 'secretly']
In this case, you build a list from only the words that contain the substring, which essentially filters the text.
Now that you can inspect all the substrings that Python identifies, you may notice that Python doesn’t care whether there are any characters after the substring "secret" or not. It finds the word whether it’s followed by whitespace or punctuation. It even finds words such as "secretly".
That’s good to know, but what can you do if you want to place stricter conditions on your substring check?
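For the simple case of matching only the exact word, one lightweight option is to strip punctuation from each word before comparing. This is a sketch under the assumption that words are separated by whitespace and that string.punctuation covers the punctuation in your text:

```python
import string

file_content = """hi there and welcome.
this is a special hidden file with a secret secret.
i don't want to tell you the secret,
but i do want to secretly tell you that i have one."""

# Strip leading and trailing punctuation from each word, then compare exactly.
exact_matches = [
    word
    for word in file_content.split()
    if word.strip(string.punctuation) == "secret"
]

print(exact_matches)  # ['secret', 'secret.', 'secret,']
```

The word "secretly" no longer shows up because it isn’t equal to "secret" after stripping. More involved conditions quickly outgrow this approach.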
Find a Substring With Conditions Using Regex
You may only want to match occurrences of your substring followed by punctuation, or identify words that contain the substring plus other letters, such as "secretly".
For such cases that require more involved string matching, you can use regular expressions, or regex, with Python’s re module.
For example, if you want to find all the words that start with "secret" but are then followed by at least one additional letter, then you can use the regex word character (\w) followed by the plus quantifier (+):
>>> import re
>>> file_content = """hi there and welcome.
... this is a special hidden file with a secret secret.
... i don't want to tell you the secret,
... but i do want to secretly tell you that i have one."""
>>> re.search(r"secret\w+", file_content)
<re.Match object; span=(128, 136), match='secretly'>
The re.search() function returns a Match object that holds both the substring that matched the condition and its start and end index positions, rather than just True!
You can then access these attributes through methods on the Match object, which is assigned to the variable m here:
>>> m = re.search(r"secret\w+", file_content)
>>> m.group()
'secretly'
>>> m.span()
(128, 136)
These results give you a lot of flexibility to continue working with the matched substring.
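Keep in mind that re.search() returns None when the pattern doesn’t match, so calling .group() on the result without a check can fail. Here’s a minimal sketch of a guarded lookup. The two short sample strings are made up for illustration:

```python
import re

pattern = r"secret\w+"

match = re.search(pattern, "i do want to secretly tell you")
if match is not None:
    # Only access the Match object after confirming that there is one.
    print(match.group(), match.span())  # secretly (13, 21)

# No word continues past "secret" here, so there's no Match object.
no_match = re.search(pattern, "a secret.")
print(no_match)  # None
```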
For example, you could search for only the substrings that are followed by a comma (,) or a period (.):
>>> re.search(r"secret[\.,]", file_content)
<re.Match object; span=(66, 73), match='secret.'>
There are two potential matches in your text, but you only matched the first result fitting your query. When you use re.search(), Python again finds only the first match. What if you wanted all the mentions of "secret" that fit a certain condition?
To find all the matches using re, you can work with re.findall():
>>> re.findall(r"secret[\.,]", file_content)
['secret.', 'secret,']
By using re.findall(), you can find all the matches of the pattern in your text. Python saves all the matches as strings in a list for you.
When you use a capturing group, you can specify which part of the match you want to keep in your list by wrapping that part in parentheses:
>>> re.findall(r"(secret)[\.,]", file_content)
['secret', 'secret']
By wrapping secret in parentheses, you defined a single capturing group. The findall() function returns a list of strings matching that capturing group, as long as there’s exactly one capturing group in the pattern. By adding the parentheses around secret, you managed to get rid of the punctuation!
Note: Remember that there were four occurrences of the substring "secret" in your text, and by using re, you filtered out two specific occurrences that you matched according to special conditions.
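Other conditions can be expressed in the same way. As a sketch that goes slightly beyond the patterns above, the word boundary anchor (\b) restricts matches to the whole word, and the re.IGNORECASE flag removes the need to lowercase the text first. Both are standard features of the re module, but their use here is an illustration and not part of the tutorial’s sample code:

```python
import re

raw_file_content = """Hi there and welcome.
This is a special hidden file with a SECRET secret.
I don't want to tell you The Secret,
but I do want to secretly tell you that I have one."""

# \b matches at word boundaries, so "secretly" is excluded.
# re.IGNORECASE makes the pattern match any capitalization.
whole_words = re.findall(r"\bsecret\b", raw_file_content, flags=re.IGNORECASE)

print(whole_words)  # ['SECRET', 'secret', 'Secret']
```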
Using re.findall() with match groups is a powerful way to extract substrings from your text. But you only get a list of strings, which means that you’ve lost the index positions that you had access to when you were using re.search().
If you want to keep that information around, then re can give you all the matches in an iterator:
>>> for match in re.finditer(r"(secret)[\.,]", file_content):
...     print(match)
...
<re.Match object; span=(66, 73), match='secret.'>
<re.Match object; span=(103, 110), match='secret,'>
When you use re.finditer() and pass it a search pattern and your text content as arguments, you can access each Match object that contains the substring, as well as its start and end index positions.
You may notice that the punctuation shows up in these results even though you’re still using the capturing group. That’s because the string representation of a Match object displays the whole match rather than just the first capturing group.
But the Match object is a powerful container of information and, like you’ve seen earlier, you can pick out just the information that you need:
>>> for match in re.finditer(r"(secret)[\.,]", file_content):
...     print(match.group(1))
...
secret
secret
By calling .group() and specifying that you want the first capturing group, you picked the word secret without the punctuation from each matched substring.
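The same group number also works with .start() and .end(), so you can keep the index positions of just the captured part. Here’s a short sketch that collects both pieces of information. The variable name hits is arbitrary:

```python
import re

file_content = """hi there and welcome.
this is a special hidden file with a secret secret.
i don't want to tell you the secret,
but i do want to secretly tell you that i have one."""

# For each match, keep the text of group 1 and its own start and end index.
hits = [
    (match.group(1), match.start(1), match.end(1))
    for match in re.finditer(r"(secret)[.,]", file_content)
]

print(hits)  # [('secret', 66, 72), ('secret', 103, 109)]
```

Note that the end positions are one lower than in the full-match spans because the punctuation isn’t part of the capturing group.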
You can go into much more detail with your substring matching when you use regular expressions. Instead of just checking whether a string contains another string, you can search for substrings according to elaborate conditions.
Note: If you want to learn more about using capturing groups and composing more complex regex patterns, then you can dig deeper into regular expressions in Python.
Using regular expressions with re is a good approach if you need information about the substrings, or if you need to continue working with them after you’ve found them in the text. But what if you’re working with tabular data? For that, you’ll turn to pandas.
Find a Substring in a pandas DataFrame Column
If you work with data that doesn’t come from a plain text file or from user input, but from a CSV file or an Excel sheet, then you could use the same approach as discussed above.
However, there’s a better way to identify which cells in a column contain a substring: you’ll use pandas! In this example, you’ll work with a CSV file that contains fake company names and slogans. You can download the file below if you want to work along:
Free Download: Click here to download the sample code that you’ll use to check if a string contains a substring.
When you’re working with tabular data in Python, it’s usually best to load it into a pandas DataFrame first:
>>> import pandas as pd
>>> companies = pd.read_csv("companies.csv")
>>> companies.shape
(1000, 2)
>>> companies.head()
             company                                     slogan
0      Kuvalis-Nolan      revolutionize next-generation metrics
1  Dietrich-Champlin  envisioneer bleeding-edge functionalities
2           West Inc            mesh user-centric infomediaries
3         Wehner LLC               utilize sticky infomediaries
4      Langworth Inc                 reinvent magnetic networks
In this code block, you loaded a CSV file that contains one thousand rows of fake company data into a pandas DataFrame and inspected the first five rows using .head().
Note: You’ll need to create a virtual environment and install pandas in order to work with the library.
After you’ve loaded the data into the DataFrame, you can quickly query the whole pandas column to filter for entries that contain a substring:
>>> companies[companies.slogan.str.contains("secret")]
              company                                  slogan
7          Maggio LLC                    target secret niches
117      Kub and Sons              brand secret methodologies
654       Koss-Zulauf              syndicate secret paradigms
656      Bernier-Kihn  secretly synthesize back-end bandwidth
921      Ward-Shields               embrace secret e-commerce
945  Williamson Group             unleash secret action-items
You can use .str.contains() on a pandas column and pass it the substring as an argument to filter for rows that contain the substring.
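Real-world columns often contain missing values and mixed capitalization. The .str.contains() method accepts the case and na parameters for those situations. The sketch below uses a tiny made-up DataFrame rather than the tutorial’s companies.csv, so the company names and slogans are assumptions for illustration only:

```python
import pandas as pd

# Made-up sample data, including a missing slogan and an uppercase match.
companies = pd.DataFrame(
    {
        "company": ["Alpha Inc", "Beta LLC", "Gamma Group"],
        "slogan": ["target SECRET niches", None, "reinvent magnetic networks"],
    }
)

# case=False ignores capitalization, and na=False treats missing
# values as "no match" so that the mask contains only Booleans.
mask = companies["slogan"].str.contains("secret", case=False, na=False)

print(mask.tolist())  # [True, False, False]
print(companies[mask])
```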
Note: The indexing operator ([]) and attribute operator (.) offer intuitive ways of getting a single column or slice of a DataFrame.
However, if you’re working with production code that’s concerned with performance, pandas recommends using the optimized data access methods for indexing and selecting data.
When you’re working with .str.contains() and you need more complex match scenarios, you can also use regular expressions! You just need to pass a regex-compliant search pattern as the substring argument:
>>> companies[companies.slogan.str.contains(r"secret\w+")]
          company                                  slogan
656  Bernier-Kihn  secretly synthesize back-end bandwidth
In this code snippet, you’ve used the same pattern that you used earlier to match only words that contain secret but then continue with one or more word characters (\w+). Only one of the companies in this fake dataset seems to operate secretly!
You can write any complex regex pattern and pass it to .str.contains() to carve from your pandas column just the rows that you need for your analysis.
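Because .str.contains() interprets its argument as a regex by default, characters such as the period carry special meaning. If you want a plain substring search, you can pass regex=False. The following sketch uses a small made-up Series to show the difference:

```python
import pandas as pd

# Made-up sample data for illustration.
slogans = pd.Series(["version 1.5 released", "model 125 shipped"])

# As a regex, the period matches any character, so "125" matches too.
as_regex = slogans.str.contains("1.5")

# With regex=False, the argument is treated as a literal substring.
as_literal = slogans.str.contains("1.5", regex=False)

print(as_regex.tolist())    # [True, True]
print(as_literal.tolist())  # [True, False]
```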
FAQs
Like a persistent treasure hunter, you found each "secret", no matter how well it was hidden! You’ve covered a lot of ground, and here, you’ll find a few questions and answers that sum up the most important concepts that you’ve covered in this tutorial.
You can use them to check your understanding or to recap and solidify what you’ve just learned. Time to dive in!
The recommended operator to use in Python to check if a string contains a substring is the in membership operator. This operator provides a quick and readable way to check whether a substring is present in a string.
Python strings are case sensitive, so if the substring that you provide uses different capitalization from the same word in your text, then Python won’t find it. By converting the whole input text to lowercase, you can disregard capitalization and make your substring check more generalized.
The .count() and .index() string methods in Python are not primarily meant for checking whether a string contains a substring. Instead, you use the .count() method to count the occurrences of a substring in a string. On the other hand, you use the .index() method to get the index position of the first character of the first occurrence of the substring.
To find substrings in Python according to more advanced conditions, you can use regular expressions with Python’s re module. Regular expressions allow you to search for substrings according to elaborate conditions, such as finding all the words that start with a certain substring and are then followed by at least one additional letter.
To check which entries in a pandas DataFrame contain a substring, you can use the .str.contains() method on a pandas column and pass it the substring as an argument. This will return a mask with True values for all rows that contain the substring, and False otherwise. You can use this mask to filter your DataFrame for only rows where the column contains the substring.
You now know how to pick the most idiomatic approach when you’re working with substrings in Python. Keep using the most descriptive method for the job, and you’ll write code that’s delightful to read and quick for others to understand.