Filter DataFrame Rows With .str.contains()
00:00
We’re going to stick with the example that we worked with before, and we’re going to keep looking for secrets. You can do this pretty effectively using pandas. Specifically, you can run the search on a Series object.
00:12
So you can access the slogan column with this dot notation, and then use .str.contains(), and then you need to pass the substring that you’re looking for.
00:26
So here we put in "secret". Now, two things about that. What it returns might not be what you’re expecting: you see a lot of False values here, but this is essentially a Boolean mask that pandas creates.
00:40
It has the same number of values as the Series object that you called it on, and it has a value of False wherever the substring wasn’t found.
00:48
And somewhere in here, there are a couple that have the value True, where the substring was found. Now you can apply this mask to the original DataFrame.
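To make the mask concrete, here is a minimal sketch. The small companies DataFrame below is a hypothetical stand-in for the course's dataset: the company names and the slogans that don't mention "secret" are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the course's companies DataFrame.
# Company names and the non-"secret" slogans are invented for illustration.
companies = pd.DataFrame({
    "company": ["Acme", "Globex", "Initech", "Umbrella"],
    "slogan": [
        "target secret niches",
        "deliver seamless portals",
        "brand secret methodologies",
        "scale robust networks",
    ],
})

# .str.contains() checks each slogan and returns a Boolean Series (the mask).
mask = companies.slogan.str.contains("secret")
print(mask)
```

The mask has one True or False entry per row, in the same order as the slogan column.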
00:59
You can do that by saying companies, using the square bracket notation ([]), and then what you had typed before: companies.slogan.str.contains() with the substring "secret".
01:15
Okay, close the parentheses, and also close the square bracket. And now if you press Enter, you see that the DataFrame was filtered for only the rows that contain the substring "secret" inside of the slogan column.
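The filtering step described above might look like this in code. Again, the companies DataFrame here is a small hypothetical stand-in with made-up company names, not the dataset used in the video.

```python
import pandas as pd

# Hypothetical stand-in for the course's companies DataFrame.
companies = pd.DataFrame({
    "company": ["Acme", "Globex", "Initech", "Umbrella"],
    "slogan": [
        "target secret niches",
        "deliver seamless portals",
        "brand secret methodologies",
        "scale robust networks",
    ],
})

# Pass the Boolean mask inside square brackets to keep only the True rows.
filtered = companies[companies.slogan.str.contains("secret")]
print(filtered)
```

The result is a new DataFrame. The rows that remain keep their original index labels, so you can still trace them back to the unfiltered data.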
01:32
So I can see that in these two, three, four, five, six rows from the DataFrame, the companies have slogans that contain the word "secret": "target secret niches", "brand secret methodologies", "syndicate secret paradigms", et cetera.
01:48
But similar to using the in operator in earlier lessons, searching for "secret" like this also returns slogans that contain the word "secretly", not just "secret".
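A quick way to see this substring behavior is with a tiny Series. The slogans below are made up for illustration, apart from "target secret niches".

```python
import pandas as pd

# Hypothetical slogans; only the first one appears in the lesson.
slogans = pd.Series([
    "target secret niches",
    "secretly scale networks",
    "deliver seamless portals",
])

# "secret" is matched anywhere in the string, so "secretly" counts too.
mask = slogans.str.contains("secret")
print(slogans[mask])
```

Both of the first two slogans pass the filter, because .str.contains() looks for the characters "secret" anywhere in the string rather than for a whole word.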
01:58
In the next lesson, you’ll learn how to adjust your filter to be more precise.