Filtering With .where() and .filter()
Another method that you may be interested in is the .where() method on a DataFrame. It replaces values in the DataFrame or Series you’re working with wherever a condition is not satisfied.
The condition is our Boolean Series. So in this js-score example, any value that is not greater than 80 is replaced by some other value that we want, and that value is passed in through the other keyword argument.
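As a minimal sketch of that replacement, assuming a small made-up DataFrame (the index labels and scores below are illustrative, not the lesson's data), the call might look like this:

```python
import pandas as pd

# Hypothetical sample data; the scores are invented for illustration.
df = pd.DataFrame(
    {"py-score": [88.0, 79.0, 81.0], "js-score": [71.0, 95.0, 80.0]},
    index=[10, 11, 12],
)

# Keep each value where the condition is True;
# replace it with `other` where the condition is False.
replaced = df["js-score"].where(cond=df["js-score"] > 80, other=0.0)
print(replaced)
```

Here 71.0 and 80.0 are not greater than 80, so both become 0.0, while 95.0 is kept. The original DataFrame is left unchanged because .where() returns a new object by default.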
So whenever you want to change a value in either a Series or in several columns of a DataFrame to some default value when a certain condition isn’t satisfied, your best bet is to use the .where() method.
01:52 The .filter() method, by contrast, doesn’t filter on the values. Instead, it filters on the labels. So, for example, let’s suppose that we had a really large DataFrame with columns that we were interested in, but it was too tedious to type out all of their names.
But we knew, for instance, that all of the columns we were interested in contained, say, the "score" string. What we could do is call the .filter() method on the DataFrame.
The .filter() method can take a keyword argument called items. The items keyword argument accepts the names of the columns that we want to keep. So, for example, we could keep the "py-score" and "js-score" columns, and also the "django-score" column.
Now, we know how to do this separately if we were to just use the accessor methods, but this is another way to do that. So again, this is just going to pull out the columns that have the score. Now, if you had a lot of columns whose names contained "score", then instead of passing in the items keyword argument, you would pass in the like keyword argument.
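To make the items form concrete, here is a small sketch that compares it with plain column access. The DataFrame and its values are hypothetical stand-ins for the lesson's data:

```python
import pandas as pd

# Hypothetical sample data; names and scores are invented.
df = pd.DataFrame({
    "name": ["Ann", "Bob"],
    "py-score": [88.0, 79.0],
    "django-score": [86.0, 81.0],
    "js-score": [71.0, 95.0],
})

score_cols = ["py-score", "js-score", "django-score"]

# Select columns by exact label with .filter(items=...)
by_items = df.filter(items=score_cols)

# The same selection with the square-bracket accessor
by_accessor = df[score_cols]

print(by_items.equals(by_accessor))
```

Both approaches return the three score columns in the order given in the list.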
What this does is take the string that you pass in and keep the columns whose names contain that string. So, for example, we know that the Python score, JS score, and Django score columns all have, as a substring, "score".
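A minimal sketch of the like form, again on a made-up DataFrame whose values are assumptions for illustration:

```python
import pandas as pd

# Hypothetical sample data; names and scores are invented.
df = pd.DataFrame({
    "name": ["Ann", "Bob"],
    "py-score": [88.0, 79.0],
    "django-score": [86.0, 81.0],
    "js-score": [71.0, 95.0],
})

# Keep every column whose label contains the substring "score"
by_like = df.filter(like="score")
print(list(by_like.columns))
```

The "name" column is dropped because its label does not contain "score", and the remaining columns keep their original order.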
Now, by default, the axis whose labels are tested is the columns for a DataFrame. So in this case, the axis keyword is effectively set to 1, and that gives us the score columns. If we set it to 0, we’re going to get an empty DataFrame because, of course, none of the row labels in our DataFrame contain the string "score".
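The effect of the axis keyword can be sketched as follows. The data and the integer row labels are assumptions chosen so that no row label contains "score":

```python
import pandas as pd

# Hypothetical sample data with integer row labels.
df = pd.DataFrame(
    {"name": ["Ann", "Bob"], "py-score": [88.0, 79.0], "js-score": [71.0, 95.0]},
    index=[10, 11],
)

# axis=1 tests the column labels (the default for a DataFrame)
by_columns = df.filter(like="score", axis=1)

# axis=0 tests the row labels instead; none of them contain "score"
by_rows = df.filter(like="score", axis=0)

print(by_columns.shape)
print(by_rows.empty)
```

With axis=0 the result still has all three columns but no rows, which is why it shows up as an empty DataFrame.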
One last thing to note is that instead of either items or like, you can also pass in a regular expression through the regex keyword argument. If you’re familiar with regular expressions, in this case, if we just wrote in "score", this would match a substring of the column labels for the "py-score", "django-score", and "js-score" columns.
04:42 We started off with filtering data using comparison operators, and the basic idea there is that you want to extract rows in the DataFrame where the value in a certain column satisfies a certain condition.
So in this example, we are focusing on the 'py-score' column. We want to know which values there are at least 80. And then we saw that this returns a pandas Series object of True and False values. Whenever we have a True, the corresponding row is extracted, so we get the rows where that pandas Series has a value of True.
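As a brief sketch of this recap, using invented scores rather than the lesson's data:

```python
import pandas as pd

# Hypothetical sample data; the scores are invented.
df = pd.DataFrame(
    {"py-score": [88.0, 79.0, 81.0], "js-score": [71.0, 95.0, 80.0]},
    index=[10, 11, 12],
)

# The comparison returns a Boolean Series aligned with the rows
mask = df["py-score"] >= 80
print(mask)

# Indexing with the mask keeps only the rows where it is True
selected = df[mask]
print(selected)
```

Rows 10 and 12 are kept because their 'py-score' values are at least 80, and row 11 is dropped.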
A way to combine several conditions is to use logical operators. So, for example, if we also wanted to pull out all of those rows that had a 'py-score' of at least 80 and a 'js-score' of at least 80, then we would use the bitwise AND operator (&). The bitwise operators are overloaded in pandas so that they work in an element-wise way.
So in this case, both of these pandas Series objects are compared element-wise, and whenever both elements are True, the resulting pandas Series has True at that position. Those are the rows that we’re going to extract from the main DataFrame.
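The combined condition can be sketched like this, again with made-up scores. Each comparison is wrapped in parentheses because & binds more tightly than >= in Python:

```python
import pandas as pd

# Hypothetical sample data; the scores are invented.
df = pd.DataFrame(
    {"py-score": [88.0, 79.0, 81.0], "js-score": [71.0, 95.0, 80.0]},
    index=[10, 11, 12],
)

# Element-wise AND of two Boolean Series
both_mask = (df["py-score"] >= 80) & (df["js-score"] >= 80)

# Keep only the rows where both conditions hold
both = df[both_mask]
print(both)
```

Only row 12 satisfies both conditions in this sample, so it is the only row extracted.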
With the .where() method, whenever we have a False value in a particular element of the Boolean Series object, we replace the corresponding value in whatever column we are specifying with a value that we want to set.
And with the .filter() method, if you have a really large DataFrame whose column labels follow a pattern that you want to match and you only want to keep those columns, then you could use a regular expression. Or, if the columns that you’re interested in share a certain substring, you can use the like keyword argument. All right!
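For the regular-expression variant mentioned in this recap, a minimal sketch might look like the following. The data is invented, and the second pattern is a hypothetical example of a stricter match that is not part of the lesson:

```python
import pandas as pd

# Hypothetical sample data; names and scores are invented.
df = pd.DataFrame({
    "name": ["Ann", "Bob"],
    "py-score": [88.0, 79.0],
    "django-score": [86.0, 81.0],
    "js-score": [71.0, 95.0],
})

# A plain string used as a regex matches any label that contains it
by_regex = df.filter(regex="score")

# Hypothetical stricter pattern: only the py and js score columns
py_js_only = df.filter(regex=r"^(py|js)-score$")

print(list(by_regex.columns))
print(list(py_js_only.columns))
```

Note that items, like, and regex are mutually exclusive, so only one of them can be passed in a single .filter() call.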
07:20 So, that’s a rundown there on the different ways that you can filter either rows or columns from a pandas DataFrame. Up next, we’ll see how you can compute basic statistics on the columns of a DataFrame that contain numerical values.