Sorting DataFrames
00:00
Sorting is going to be one of the main operations that you do on a DataFrame. To sort a DataFrame, we’re going to use the .sort_values() method, which takes one required positional argument.
00:12
This is going to be the column that we want to sort by. So, for example, if we wanted to sort our DataFrame by the js-score column, we simply pass in the name of the column as a string.
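A minimal sketch of a single-column sort, assuming a small made-up DataFrame. The names, scores, and index labels below are hypothetical and are not the course’s dataset:

```python
import pandas as pd

# Hypothetical candidate data, invented for illustration only
df = pd.DataFrame(
    {
        "name": ["Ann", "Ben", "Cara", "Dev"],
        "py-score": [79.0, 88.0, 61.0, 84.0],
        "js-score": [95.0, 71.0, 91.0, 91.0],
    },
    index=[10, 11, 12, 13],
)

# Pass the column label as the one required positional argument;
# rows come back ordered from lowest to highest js-score
sorted_df = df.sort_values("js-score")
print(sorted_df)
```

The original df keeps its row order, since .sort_values() hands back a new DataFrame by default.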
00:25
The default is to go from low to high. If we want to change the order, in other words to go from highest to lowest, we pass in a value to the ascending keyword argument.
00:39
The default is True, and so it sorts from smallest to largest, but if we want to change that and sort from highest to lowest, we pass in a value of False. In this case, we’re going from highest JS score to lowest JS score.
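A short sketch of the descending sort, again on hypothetical made-up data rather than the course’s DataFrame:

```python
import pandas as pd

# Hypothetical candidate data, invented for illustration only
df = pd.DataFrame(
    {
        "name": ["Ann", "Ben", "Cara", "Dev"],
        "py-score": [79.0, 88.0, 61.0, 84.0],
        "js-score": [95.0, 71.0, 91.0, 91.0],
    }
)

# ascending=False flips the order: highest js-score first
desc = df.sort_values("js-score", ascending=False)
print(desc[["name", "js-score"]])
```

The two rows tied at 91.0 are not ordered by anything else yet, so their relative order should not be relied on.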
00:57
Now, the name of this required positional argument is by, so we can also type that in to make the call a little more readable. We can also sort by more than one criterion, that is, by more than one column name.
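A quick check, on hypothetical data, that naming the argument with by= gives the same result as passing it positionally:

```python
import pandas as pd

# Hypothetical candidate data, invented for illustration only
df = pd.DataFrame(
    {
        "name": ["Ann", "Ben", "Cara", "Dev"],
        "py-score": [79.0, 88.0, 61.0, 84.0],
        "js-score": [95.0, 71.0, 91.0, 91.0],
    }
)

# The same sort, written positionally and with the by keyword
positional = df.sort_values("js-score", ascending=False)
keyword = df.sort_values(by="js-score", ascending=False)

# .equals() compares values and row/column labels
print(positional.equals(keyword))
```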
01:12
So, for example, notice that in the js-score column there are two candidates who have a 91.0. If we then wanted to also sort by a second criterion, or a second column, we would pass a list to the by keyword argument containing that second column, or as many columns as we wanted. So, for example, if we then wanted to sort by py-score, and we also wanted this to be from highest to lowest, we would pass a list of Booleans to ascending, one per column, with a False for the py-score criterion as well,
01:44
then we get js-score from highest to lowest, and py-score acts as the tiebreaker for any rows that share the same js-score. So in this case, the two candidates who had the same js-score are sorted from highest to lowest by their py-score.
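A sketch of the two-column sort with a tiebreaker, assuming made-up data in which two candidates share a js-score of 91.0:

```python
import pandas as pd

# Hypothetical candidate data, invented for illustration only
df = pd.DataFrame(
    {
        "name": ["Ann", "Ben", "Cara", "Dev"],
        "py-score": [79.0, 88.0, 61.0, 84.0],
        "js-score": [95.0, 71.0, 91.0, 91.0],
    }
)

# Sort by js-score first; py-score breaks ties.
# ascending takes one Boolean per column listed in by.
ranked = df.sort_values(by=["js-score", "py-score"], ascending=[False, False])
print(ranked["name"].tolist())  # ['Ann', 'Dev', 'Cara', 'Ben']
```

Dev lands ahead of Cara because both have 91.0 in js-score and Dev’s py-score (84.0) is higher than Cara’s (61.0).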
02:06
Now, similar to a lot of the other methods that we’ve looked at already for DataFrames, .sort_values() is going to return a new DataFrame.
02:15
So we could either assign the result to a new DataFrame or just redefine the DataFrame df that we’ve been working with, or we can use the inplace keyword argument. Its default value is False. If we pass in True, the sort is applied to df itself and the method returns None. Running that and then taking a look at the current state of our DataFrame, we’ve got it sorted: again, js-score from highest to lowest, and for any JS scores that are equal, the tiebreaker is py-score, also from highest to lowest.
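A minimal sketch contrasting the two ways of keeping a sorted result, on hypothetical data. The variable names reassigned and result are illustrative only:

```python
import pandas as pd

# Hypothetical candidate data, invented for illustration only
df = pd.DataFrame(
    {
        "name": ["Ann", "Ben", "Cara", "Dev"],
        "py-score": [79.0, 88.0, 61.0, 84.0],
        "js-score": [95.0, 71.0, 91.0, 91.0],
    }
)

# Option 1: store the returned copy under a name (df itself is unchanged here)
reassigned = df.sort_values(by=["js-score", "py-score"], ascending=[False, False])

# Option 2: sort df itself; with inplace=True the method returns None
result = df.sort_values(by=["js-score", "py-score"], ascending=[False, False], inplace=True)

print(result)               # None
print(df["name"].tolist())  # ['Ann', 'Dev', 'Cara', 'Ben']
```

A common mistake is writing df = df.sort_values(..., inplace=True), which rebinds df to None.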
02:50
It’s worth mentioning that you can also sort a pandas DataFrame using a row, rather than a column, as the sort key, in which case what you’re actually doing is sorting the columns. In this case, that doesn’t really make sense, because any given row contains different data types: we’ve got strings and we’ve got floats.
03:09
But depending on what you’re doing, you may actually want to sort the columns. Because sorting is an important operation on a DataFrame, let’s do a quick recap of the different keyword arguments to the .sort_values() method.
03:23
The .sort_values() method takes one required positional argument and several keyword arguments. It actually accepts more keyword arguments than the ones listed here, but these are probably the most commonly used ones, so we’ll focus on these.
03:40
The required positional argument by, as we saw, is a string or a list of strings. It determines which column, or which list of columns, is used to sort the rows in the DataFrame.
03:56
For the axis keyword argument, the default value is 0, which means that you want to sort the rows. Instead of 0, you can also pass the string 'index', which makes the call a little more readable: you’re saying that you want to sort the rows.
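A small check, on hypothetical data, that axis='index' and axis=0 describe the same row sort:

```python
import pandas as pd

# Hypothetical candidate data, invented for illustration only
df = pd.DataFrame(
    {
        "name": ["Ann", "Ben", "Cara", "Dev"],
        "py-score": [79.0, 88.0, 61.0, 84.0],
        "js-score": [95.0, 71.0, 91.0, 91.0],
    }
)

# Both spellings sort the rows by the py-score column
by_label = df.sort_values("py-score", axis="index")
by_number = df.sort_values("py-score", axis=0)

print(by_label.equals(by_number))  # True
print(by_label["name"].tolist())   # ['Cara', 'Ann', 'Dev', 'Ben']
```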
04:18
If, however, you want to sort the columns, you can pass in a value of 1 for axis, or the string 'columns'. In that case, the string or list of strings that you pass into the by argument names the row (index) labels that you want to use as the sort key. In other words, to use a particular row to sort the columns by, pass axis='columns' or axis=1.
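A sketch of sorting columns by a row. To sidestep the mixed-type problem mentioned earlier, it assumes a hypothetical numeric-only frame indexed by candidate name; the "django-score" column and all values are made up so there is something to reorder:

```python
import pandas as pd

# Hypothetical numeric-only frame: one row per candidate, one column per score
scores = pd.DataFrame(
    {
        "py-score": [79.0, 88.0],
        "django-score": [81.0, 86.0],
        "js-score": [95.0, 71.0],
    },
    index=["Ann", "Ben"],
)

# With axis="columns", by names a row label; the columns are reordered
# by that row's values, from lowest to highest
by_ben = scores.sort_values(by="Ben", axis="columns")
print(by_ben.columns.tolist())  # ['js-score', 'django-score', 'py-score']

# axis=1 is the numeric spelling of the same thing
by_ben_numeric = scores.sort_values(by="Ben", axis=1)
```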
04:48
Next, as we saw, is the ascending keyword argument. This is a Boolean value, or a list of Booleans with one entry per column in by. The default value is True, which means that the sorting will be done in ascending order.
04:57
If you want it the other way, just pass in False. Then there’s the inplace keyword argument. Again, this is a Boolean, and the default value is False, which means that whenever you call the .sort_values() method on a DataFrame, you get back a new DataFrame.
05:13
But if, instead, you want to sort in place, pass in a value of True to the inplace keyword argument.
05:21
Coming up next, we’ll take a look at the general operation of filtering data from a DataFrame.