Iterating Over Rows and Columns in DataFrames
00:00
pandas provides several convenient methods for iterating over the rows or columns of a DataFrame. Let’s start off with columns. If you recall, when we created this job candidates DataFrame, we used a dictionary. And a dictionary, as you may know, has a method called .items(). This method returns a view that you can iterate over, and each iteration yields a tuple containing a key of the dictionary and its corresponding value.
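As a quick refresher, here is a minimal sketch of what a dictionary’s .items() gives you. The candidate dictionary and its values are made up for illustration and are not the data used in the lesson.

```python
# Hypothetical dictionary, used only to illustrate dict.items()
candidate = {"name": "Ann", "city": "Toronto"}

# Each iteration yields a (key, value) tuple
pairs = []
for key, value in candidate.items():
    pairs.append((key, value))
    print(key, value)
```

The DataFrame version of .items() follows the same pattern, with column labels in place of keys and whole columns in place of values.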
00:31
It turns out that pandas has a similar method, also called .items(). So in the df DataFrame, we’ve got an .items() method, and this returns a generator.
00:45
If we get the first item from this generator and save it in a variable, it’s going to be a tuple. The first element of the tuple is the column label, and the second element is the actual column, which is a pandas Series object. So if I run that,
01:10 let’s take a look at our DataFrame: the first element should be name.
01:15 And that is the first column label. And the column is the actual first column of the DataFrame, so that is all of the names of the job candidates. If you’re not familiar with generators, the basic idea is that a generator yields one item at a time.
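Here is a rough sketch of pulling the first item out of the generator with the built-in next() function. The DataFrame below is a small stand-in with made-up values. Only the column labels name, city, and total and the row labels 11, 15, 14 are taken from the lesson.

```python
import pandas as pd

# Stand-in for the job candidates DataFrame (values are illustrative)
data = {
    "name": ["Xavier", "Ann", "Jana"],
    "city": ["Mexico City", "Toronto", "Prague"],
    "total": [82.0, 75.5, 90.0],
}
df = pd.DataFrame(data, index=[11, 15, 14])

columns = df.items()            # a generator, nothing is produced yet
col_label, col = next(columns)  # first (label, Series) tuple

print(col_label)
print(type(col))
print(col)
```

Calling next(columns) again would yield the city column, then total, and after that the generator is exhausted.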
01:33
And one way to do that is to use the next() function. Now, what we want to do is iterate over all the columns, and so what we’ll do is we’ll say for col_label and the actual column, col, in the generator object.
01:50
The generator is itself an iterator, so we’ll be able to use it directly in our for loop. And let’s just go ahead and print out all of the labels one at a time.
01:58
And also the actual columns. Let’s print col_label. And of course, once you’re in your for loop, depending on what you want to do, you would use these column objects, which are going to be pandas Series objects, as we saw over here.
02:15
Let’s also create a little bit of separation between the column label and the column. And then we’ll add two newline characters at the end so that each column label and its pandas Series object is separated from the next.
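The loop described here might look like the sketch below. The DataFrame is a stand-in with made-up values, and using print()’s sep and end arguments for the separation is one possible way to do it, not necessarily the exact code typed in the video.

```python
import pandas as pd

# Stand-in for the job candidates DataFrame (values are illustrative)
data = {
    "name": ["Xavier", "Ann", "Jana"],
    "city": ["Mexico City", "Toronto", "Prague"],
    "total": [82.0, 75.5, 90.0],
}
df = pd.DataFrame(data, index=[11, 15, 14])

# Each iteration unpacks a (column label, column) tuple
for col_label, col in df.items():
    # Newline between label and column, two newlines after each column
    print(col_label, col, sep="\n", end="\n\n")
```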
02:30
Let’s go ahead and run that. Again, the basic idea is that the .items() method on a DataFrame creates a generator, and the generator is going to yield a tuple.
02:42
The first element of the tuple is the column label, and the first one was name. Then the second element of the tuple is the pandas Series that is the column. Okay. Again, the column is returned as a pandas Series object. And that’s it!
02:58
That’s what the .items() method returns. Now, there’s another method called .iteritems(), and it does the exact same thing as .items(). Again, it returns a generator, and the generator yields a tuple: the column label and the actual column as a pandas Series object. Note that .iteritems() was deprecated in pandas 1.5 and removed in pandas 2.0, so use .items() in current code.
03:19 So that’s the way that you would iterate over the columns of a pandas DataFrame. Now let’s talk about iterating over the rows.
03:30
The method here is called .iterrows().
03:34 This will also return a generator, and the generator will return or yield, one at a time, a tuple. The first element of the tuple is a row label and the second one is going to be the actual row.
03:49
And, of course, this row will be a pandas Series object.
03:54
We’ll go ahead and print the same thing. We’ll print the row label and the row, separated by a newline character, with two newline characters after the row. So again, the order in which the generator yields the items is going to be the same as the order in the DataFrame. So we’ve got 11, 15, 14.
04:21
And if I go back up here and take a look at the actual DataFrame: 11, 15, 14, the same order as in the actual DataFrame.
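A minimal sketch of the row loop, again on a stand-in DataFrame with made-up values (only the row labels 11, 15, 14 and the column labels come from the lesson):

```python
import pandas as pd

# Stand-in for the job candidates DataFrame (values are illustrative)
data = {
    "name": ["Xavier", "Ann", "Jana"],
    "city": ["Mexico City", "Toronto", "Prague"],
    "total": [82.0, 75.5, 90.0],
}
df = pd.DataFrame(data, index=[11, 15, 14])

# Each iteration unpacks a (row label, row) tuple; the row is a Series
# whose index holds the column labels
for row_label, row in df.iterrows():
    print(row_label, row, sep="\n", end="\n\n")
```

The rows come out in DataFrame order, so the labels printed are 11, 15, and 14.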
04:31
Let’s go back down here. So, that’s the way that you would iterate over the rows, very similar to how you would do it with the columns. One way to remember this is to remember two methods: .iteritems() (spelled .items() in current pandas) for columns and .iterrows() for the rows. Now, there is one other method that returns an iterator over the rows, and that’s called .itertuples(). So let me get rid of .iterrows() here and write .itertuples(). What .itertuples() returns is an iterator that yields named tuples, each representing the data of one row.
05:14
And if you’re not familiar with the namedtuple, there will be a link to some resources at realpython.com that you can check out for more info on namedtuples.
05:25 Let me just print this out so you can see what these are.
05:30
So, a namedtuple. The basic idea is that it’s a tuple whose elements can be accessed not only by integer index but also through named fields. So, for example, the field names in these namedtuples are going to be exactly the column labels.
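For readers new to namedtuples, here is a small standalone sketch using collections.namedtuple from the standard library. The Candidate type and its values are hypothetical and are unrelated to the tuples pandas builds.

```python
from collections import namedtuple

# Hypothetical namedtuple type with two named fields
Candidate = namedtuple("Candidate", ["name", "city"])
ann = Candidate(name="Ann", city="Toronto")

print(ann.name)  # access by field name, using dot notation
print(ann[1])    # integer indexing still works, as with any tuple
```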
05:49
So this will also include the Index for that particular row, and then the name value, the city value, and so on. And so I can access each individual field using dot notation.
06:02 So if we write, let’s just print out the candidate’s name, and we can do this by just using dot notation. And then we’ll also print out the city,
06:16
so we’ve got row.city. And then maybe we’ll print out .total.
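Put together, the loop with dot notation could look like this sketch. The DataFrame is a stand-in with made-up values, and the field names name, city, and total are the column labels mentioned in the lesson.

```python
import pandas as pd

# Stand-in for the job candidates DataFrame (values are illustrative)
data = {
    "name": ["Xavier", "Ann", "Jana"],
    "city": ["Mexico City", "Toronto", "Prague"],
    "total": [82.0, 75.5, 90.0],
}
df = pd.DataFrame(data, index=[11, 15, 14])

# Each row is a namedtuple with an Index field plus one field per column
for row in df.itertuples():
    print(row.Index, row.name, row.city, row.total)
```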
06:23
All right, and then if we run that, we’ve got another way to iterate over the rows. Now, the .itertuples() method also accepts a couple of keyword arguments. Recall the output when we initially ran the first print statement.
06:39
So if I come back over here, it included the Index. That would have been one of the fields that we could access using dot notation. If we don’t want the Index, for whatever reason, we would pass False to the index keyword argument.
06:57 And so now there’s no index. And then also, if we don’t like this generic pandas name for our namedtuples, we can pass in something that’s a little bit more aligned with our application.
07:08
So in our case, this is a DataFrame where the rows are job candidates, so we may want a name that reflects that instead. Okay, so "JobCandidate", passed through the name keyword argument, might be a better name for these namedtuples.
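Both keyword arguments can be combined, as in the sketch below. The DataFrame is a stand-in with made-up values, while index=False and name="JobCandidate" are the arguments discussed above.

```python
import pandas as pd

# Stand-in for the job candidates DataFrame (values are illustrative)
data = {
    "name": ["Xavier", "Ann", "Jana"],
    "city": ["Mexico City", "Toronto", "Prague"],
    "total": [82.0, 75.5, 90.0],
}
df = pd.DataFrame(data, index=[11, 15, 14])

# index=False drops the Index field; name sets the namedtuple's type name
for row in df.itertuples(index=False, name="JobCandidate"):
    print(row)
```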
07:23 So those are the basic methods that you’re going to be using when you want to iterate over either the rows or the columns of a pandas DataFrame. Coming up next, we’re going to take a look at how we can work with time series in pandas.
tonypy on March 23, 2023
Just an FYI wrt the use of df.iteritems():
FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
for col_label, col in df.iteritems():