Filtering With Operators
00:00
Data filtering is another powerful feature of pandas. It works similarly to indexing with Boolean arrays in NumPy. Let me show you what I mean. Let’s suppose we pick off the 'py-score' column and we use the Boolean comparison operator greater than or equal to (>=) and put a value of 40.
00:22
What this will do is return a pandas Series, and the values of the Series that are False are going to correspond to the rows that had a value less than 40. So in our DataFrame, we’ve got the row with label 11, score of 25, and that is not at least 40, so we have a False, and so on.
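As a rough sketch of what that comparison returns, here is a small stand-in DataFrame. The labels, the scores, and the django-score column are invented for illustration and are not the data shown in the video.

```python
import pandas as pd

# Hypothetical stand-in data: labels and scores are made up.
df = pd.DataFrame(
    {
        "py-score": [25.0, 38.0, 31.0, 80.0, 68.0, 61.0],
        "django-score": [86.0, 93.0, 55.0, 95.0, 91.0, 57.0],
    },
    index=[11, 12, 13, 14, 15, 16],
)

# The comparison runs element-wise and yields a Boolean Series
# that keeps the row labels of the original column.
mask = df["py-score"] >= 40
print(mask)
```

The row labeled 11 has a score of 25, so its entry in mask is False.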
00:45
So if we wanted to extract from this DataFrame all of the candidates that had a score on their Python test of at least 40, we could simply pass this pandas Series as a selector in the DataFrame. So for example, if I go like this, and I’m simply adding this space at the right and left just for a little bit more readability for you to see, this is going to pick off only the rows where the score in the Python test was at least 40.
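A minimal sketch of passing the Boolean Series as a selector, again on invented data (the labels and scores are assumptions, not the values from the video):

```python
import pandas as pd

# Hypothetical stand-in data: labels and scores are made up.
df = pd.DataFrame(
    {
        "py-score": [25.0, 38.0, 31.0, 80.0, 68.0, 61.0],
        "django-score": [86.0, 93.0, 55.0, 95.0, 91.0, 57.0],
    },
    index=[11, 12, 13, 14, 15, 16],
)

# Only the rows where the Boolean Series is True are kept.
passed = df[df["py-score"] >= 40]
print(passed)
```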
01:16
And from this, we see that 15, 14, and row 16 all had that. So let’s run this, and let’s see what we get!
01:28
So, here we go. We’ve got all of the candidates whose score on the Python test was at least 40. Now, if you’re going to be creating conditions like this that might get more complicated or with more logic, a good thing to do is to simply create a new pandas Series.
01:46
Maybe we’ll call this filter_, and maybe we’ll save that, then run that again on the DataFrame using the filter_ Series that we just created.
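Saving the condition under a name might look like the sketch below. The data is invented for illustration; the trailing underscore in filter_ avoids shadowing Python's built-in filter().

```python
import pandas as pd

# Hypothetical stand-in data: labels and scores are made up.
df = pd.DataFrame(
    {
        "py-score": [25.0, 38.0, 31.0, 80.0, 68.0, 61.0],
        "django-score": [86.0, 93.0, 55.0, 95.0, 91.0, 57.0],
    },
    index=[11, 12, 13, 14, 15, 16],
)

# Store the Boolean Series once, then reuse it as the selector.
filter_ = df["py-score"] >= 40
selected = df[filter_]
print(selected)
```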
01:58
What this allows me to do is if I did also want to extract not only the candidates that scored at least 40 in their Python score, but also, say, at least 90 in their Django score, then I can use the AND operator (&) to add in that we want also the rows whose JS score is at least 90.
02:20
So if we compare these two: on the left of the & operator, we’ve got a pandas Series object, which we saw before, and then on the right, we have another pandas Series object. And so the & operator is going to compare element-wise: if both of the elements are True, then the resulting element in the resulting pandas Series is going to have a value of True, and False otherwise. I’ll run that.
02:47 This, actually, is a good thing that happened. We got an error. And the reason why we get an error is that with these operators, we need to enclose each condition in parentheses: the one on the left, the one on the right, and any others if we have more conditions.
03:02
And the reason why we need to add the parentheses is because the bitwise & operator, it can only act on integers. What’s happening here is that the bitwise operators AND (&) and OR (|), they have higher precedence than the comparison operators, and so Python is first evaluating the bitwise &.
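To see the precedence issue in isolation, here is a minimal sketch on invented data. The column name django-score is an assumption, and the exact exception type depends on the column dtype, so the sketch catches both TypeError and ValueError rather than promising one of them.

```python
import pandas as pd

# Hypothetical stand-in data: labels and scores are made up.
df = pd.DataFrame(
    {
        "py-score": [25.0, 38.0, 31.0, 80.0, 68.0, 61.0],
        "django-score": [86.0, 93.0, 55.0, 95.0, 91.0, 57.0],
    },
    index=[11, 12, 13, 14, 15, 16],
)

error_name = None
try:
    # Without parentheses, Python groups this as
    # df["py-score"] >= (40 & df["django-score"]) >= 90
    df["py-score"] >= 40 & df["django-score"] >= 90
except (TypeError, ValueError) as err:
    error_name = type(err).__name__
    print(error_name, err)
```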
03:22
And we see there, we’ve got this 40 and also the column there on the right. And so this is what the error is. Now, in addition, though, you may be wondering why we’re using the bitwise & operator.
03:36
The reason is that we want to do an element-wise AND, and so if we use the regular Boolean and operator, then Python will implicitly obtain the truth values of each of the operands, and so we’ll return a final, either True Boolean or False Boolean, but what we want is an array of Boolean values.
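A short sketch of the difference, on invented data with an assumed django-score column. In practice, pandas refuses to reduce a multi-element Series to a single truth value, so the and operator raises a ValueError instead of returning one Boolean.

```python
import pandas as pd

# Hypothetical stand-in data: labels and scores are made up.
df = pd.DataFrame(
    {
        "py-score": [25.0, 38.0, 31.0, 80.0, 68.0, 61.0],
        "django-score": [86.0, 93.0, 55.0, 95.0, 91.0, 57.0],
    },
    index=[11, 12, 13, 14, 15, 16],
)

a = df["py-score"] >= 40
b = df["django-score"] >= 90

and_failed = False
try:
    a and b  # asks for the truth value of the whole Series
except ValueError:
    and_failed = True

# The bitwise operator combines the two Series element by element.
both = a & b
print(both)
```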
03:59
And so pandas overloads these bitwise & and | operators to do exactly that. That’s why we need to use this bitwise & to get our element-wise AND operations. Let’s run that again.
04:16
We get that, and then if we run that again, then we see that we only get the candidates that had a Python score of at least 40 and a Django score of at least 90.
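With the parentheses in place, the combined filter could be sketched as follows. The data and the django-score column name are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in data: labels and scores are made up.
df = pd.DataFrame(
    {
        "py-score": [25.0, 38.0, 31.0, 80.0, 68.0, 61.0],
        "django-score": [86.0, 93.0, 55.0, 95.0, 91.0, 57.0],
    },
    index=[11, 12, 13, 14, 15, 16],
)

# Each comparison is wrapped in parentheses so it runs before &.
filter_ = (df["py-score"] >= 40) & (df["django-score"] >= 90)
both_conditions = df[filter_]
print(both_conditions)
```

In this made-up data, the row labeled 16 passes the Python condition but not the second one, so it drops out.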
04:29 If you recall, from our previous filter, that got rid of one candidate.
04:34
Now, you can use other operators, for example, the OR (|) operator. So maybe instead of asking for both the Python score to be at least 40 and the JS score to be at least 90, we can say, “Well, if one of the conditions is True, then that’s the filter that we want.” So if we run that, in this case, we’ll probably get a few more candidates.
04:56
So here are the candidates that had either at least a 40 in their Python score (we’ve got three) or at least a 90 in their Django score.
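The OR version differs only in the operator. As before, this is a sketch on invented data with an assumed django-score column.

```python
import pandas as pd

# Hypothetical stand-in data: labels and scores are made up.
df = pd.DataFrame(
    {
        "py-score": [25.0, 38.0, 31.0, 80.0, 68.0, 61.0],
        "django-score": [86.0, 93.0, 55.0, 95.0, 91.0, 57.0],
    },
    index=[11, 12, 13, 14, 15, 16],
)

# A row is kept if at least one of the two conditions is True.
filter_ = (df["py-score"] >= 40) | (df["django-score"] >= 90)
either_condition = df[filter_]
print(either_condition)
```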
05:08
All right, so that is a brief overview of the ways that you can filter data out of a DataFrame. There are a couple of other methods that can be considered as filtering data from a DataFrame, and these are .where() and .filter(), and so we’ll take a look at those next.