Filtering With Operators
Data filtering is another powerful feature of pandas. It works similarly to indexing with Boolean arrays in NumPy. Let me show you what I mean. Let’s suppose we pick off the 'py-score' column, use the Boolean comparison operator greater than or equal to (>=), and put in a value of 40. What this will do is return a pandas Series of Boolean values, and the entries that are False correspond to the rows that had a score less than 40. So in our DataFrame, we’ve got the row with label 11 and a score of 25, which is not greater than or equal to 40, so we have a False, and so on.
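Here’s a minimal sketch of that comparison, assuming a small made-up DataFrame. Only the 'py-score' column name and the row with label 11 and a score of 25 come from the lesson; the rest of the data is hypothetical.

```python
import pandas as pd

# Hypothetical data: only the 'py-score' name and row 11's score of 25
# are taken from the lesson.
df = pd.DataFrame(
    {"py-score": [88.0, 25.0, 40.0, 31.0]},
    index=[10, 11, 12, 13],
)

# Comparing a column against a scalar yields a Boolean Series
# with the same row labels as the DataFrame.
mask = df["py-score"] >= 40
print(mask)
```

Each entry in mask is True where the score is at least 40 and False otherwise.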
So if we wanted to extract from this DataFrame all of the candidates that had a score on their Python test of at least 40, we could simply pass this pandas Series as a selector to the DataFrame. So for example, if I go like this (I’m adding a space on the left and right just for a little more readability), this is going to pick off only the rows where the score on the Python test was at least 40. And from this, we see that several rows, including row 14 and row 16, all had that. So let’s run this, and let’s see what we get!
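As a rough sketch, passing the Boolean Series as a selector could look like this. The DataFrame is made up for illustration, and the name column and the variable passed are assumptions, not part of the lesson.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cho", "Dev"],
        "py-score": [88.0, 25.0, 40.0, 31.0],
    },
    index=[10, 11, 12, 13],
)

# The inner expression builds the Boolean Series; the outer df[...]
# keeps only the rows where that Series is True.
passed = df[ df["py-score"] >= 40 ]
print(passed)
```

The extra spaces inside the outer brackets are purely cosmetic and don’t change the result.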
So, here we go. We’ve got all of the candidates whose score on the Python test was at least 40. Now, if you’re going to be creating conditions like this that might get more complicated or involve more logic, a good thing to do is to save the condition as a new pandas Series. Maybe we’ll call this filter_ and save it, then run the selection again on the DataFrame using the filter_ Series that we just created.
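A small sketch of that pattern, again with made-up data. The name filter_ comes from the lesson; everything else here is an assumption.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame(
    {"py-score": [88.0, 25.0, 40.0, 31.0]},
    index=[10, 11, 12, 13],
)

# Save the condition under a name so it can be reused or extended later.
# The trailing underscore avoids shadowing Python's built-in filter().
filter_ = df["py-score"] >= 40

# Selecting with the saved Series gives the same rows as the inline condition.
selected = df[filter_]
print(selected)
```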
What this allows me to do is extend the condition. If I wanted to extract not only the candidates that scored at least 40 in their Python score but also, say, at least 90 in their Django score, then I can use the AND operator (&) to add that we also want the rows whose JS score is at least 90.
So if we look at the two sides of the & operator, on the left we’ve got a pandas Series object, which we saw before, and on the right we have another pandas Series object. The & operator compares them element-wise: if both of the elements are True, then the corresponding element of the resulting pandas Series is True, and it’s False otherwise. I’ll run that.
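To see the element-wise behavior in isolation, here’s a small sketch with two made-up Boolean Series. The names left, right, and both are just for illustration.

```python
import pandas as pd

# Two hypothetical Boolean Series covering every combination of values.
left = pd.Series([True, True, False, False])
right = pd.Series([True, False, True, False])

# & pairs up the elements by label and is True only where both are True.
both = left & right
print(both)
```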
02:47 This is actually a good thing that happened. We got an error. The reason we get an error is that with these operators, we need to enclose each condition, the one on the left, the one on the right, and any others if we have more, in parentheses.
And the reason why we need to add the parentheses is that the bitwise & operator, in its ordinary numeric sense, only acts on integers, and what’s happening here is a matter of precedence. The bitwise operators AND (&) and OR (|) have higher precedence than the comparison operators, and so Python is first evaluating the bitwise & operator. And we see there that its operands are this 40 on the left and the column on the right, which isn’t what we meant, and so this is what the error is. Now, in addition, you may be wondering why we’re using the bitwise & operator in the first place. The reason is that we want to do an element-wise AND. If we use the regular Boolean and operator, then Python will implicitly try to obtain a single truth value for each of the operands, so at best it could return a single True or False Boolean (and for a pandas Series, asking for a single truth value raises an error), but what we want is an array of Boolean values.
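Both failure modes can be reproduced in a short sketch. The data and the 'js-score' column name are assumptions for illustration, and the exact exception type for the unparenthesized case may depend on the column dtype and the pandas version, so the code catches both likely candidates.

```python
import pandas as pd

# Hypothetical data; the 'js-score' column name is an assumption.
df = pd.DataFrame(
    {
        "py-score": [88.0, 25.0, 40.0, 31.0],
        "js-score": [71.0, 95.0, 90.0, 79.0],
    },
    index=[10, 11, 12, 13],
)

# Without parentheses, & binds tighter than >=, so Python reads this as
# df["py-score"] >= (40 & df["js-score"]) >= 90 and fails.
precedence_error = None
try:
    df["py-score"] >= 40 & df["js-score"] >= 90
except (TypeError, ValueError) as exc:
    precedence_error = type(exc).__name__

# The plain 'and' keyword asks each Series for one truth value,
# which pandas refuses to provide because it is ambiguous.
and_error = None
try:
    (df["py-score"] >= 40) and (df["js-score"] >= 90)
except ValueError as exc:
    and_error = type(exc).__name__

print(precedence_error, and_error)
```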
And so pandas overrides these bitwise & and | operators to do exactly that. That’s why we need to use the bitwise & to get our element-wise AND operation. Let’s run that again.
We get that, and then if we run the selection again, we see that we only get the candidates that had a Python score of at least 40 and a Django score of at least 90.
04:29 If you recall, from our previous filter, that got rid of one candidate.
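Here’s a sketch of the corrected, parenthesized version. The data is made up, and the second column name 'js-score' is an assumption; substitute whichever score column you’re filtering on.

```python
import pandas as pd

# Hypothetical data; the 'js-score' column name is an assumption.
df = pd.DataFrame(
    {
        "py-score": [88.0, 25.0, 40.0, 31.0],
        "js-score": [71.0, 95.0, 90.0, 79.0],
    },
    index=[10, 11, 12, 13],
)

# Each comparison is wrapped in parentheses so it is evaluated first,
# and & then combines the two Boolean Series element-wise.
filter_ = (df["py-score"] >= 40) & (df["js-score"] >= 90)
both = df[filter_]
print(both)
```

With this sample data, adding the second condition narrows the selection compared to filtering on the Python score alone.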
Now, you can use other operators, for example, the OR (|) operator. So maybe instead of asking for both conditions to hold, at least 40 for the Python score and at least 90 for the JS score, we can say, “Well, if either one of the conditions is True, then that’s the filter that we want.” So if we run that, in this case, we’ll probably get a few more candidates.
So here are the candidates that either had at least a 40 in their Python score (we’ve got three) or got at least a 90 in their Django score.
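The OR version can be sketched the same way. As before, the data is hypothetical and the 'js-score' column name is an assumption.

```python
import pandas as pd

# Hypothetical data; the 'js-score' column name is an assumption.
df = pd.DataFrame(
    {
        "py-score": [88.0, 25.0, 40.0, 31.0],
        "js-score": [71.0, 95.0, 90.0, 79.0],
    },
    index=[10, 11, 12, 13],
)

# | is True wherever at least one of the two conditions is True.
filter_ = (df["py-score"] >= 40) | (df["js-score"] >= 90)
either = df[filter_]
print(either)
```

Because a row only needs to satisfy one condition, the OR filter never returns fewer rows than the AND filter on the same conditions.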
All right, so that is a brief overview of the ways that you can filter data out of a DataFrame. There are a couple of other methods that can be considered as filtering data from a DataFrame, including .filter(), and we’ll take a look at those next.