Iterating Over Rows and Columns in DataFrames
pandas provides several convenient methods for iterating over rows or columns of a DataFrame. Let’s start off with columns. If you recall, when we created this job candidates DataFrame, we used a dictionary. And a dictionary, as you may know, has a method called .items(), which lets you iterate over tuples that each contain a key and its corresponding value. A DataFrame has an .items() method as well, and this .items() method will create a generator.
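As a quick reminder of how .items() behaves on a plain dictionary, here is a minimal sketch. The dictionary below is a hypothetical stand-in, not the data used in the lesson.

```python
# Hypothetical stand-in data; not the lesson's actual dictionary.
data = {"name": ["Ann", "Ben"], "age": [28, 33]}

# dict.items() returns a view that yields (key, value) tuples.
pairs = list(data.items())
first_key, first_value = pairs[0]

print(first_key)    # name
print(first_value)  # ['Ann', 'Ben']
```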
What this generator will yield, if we get the first item and maybe save it in a variable, is a tuple. The first element of the tuple is going to be the column label, and the second element is going to be the actual column, and the actual column is going to be a pandas Series object. So let’s run that.
01:15 And that is the first column label. And the column is the actual first column of the DataFrame, so that is all of the names of the job candidates. If you’re not familiar with generators, the basic idea is that the generator will yield one item at a time.
And one way to get the next item is to use the built-in next() function. Now, what we want to do is iterate over all the columns, and so what we’ll do is we’ll say for col_label… and the actual column in the generator object.
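Pulling just the first item out of the generator with next() might look something like this. The DataFrame is a small hypothetical stand-in for the job candidates data, and the variable name col is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's job candidates DataFrame.
data = {
    "name": ["Ann", "Ben", "Cara"],
    "city": ["Toronto", "Prague", "Lima"],
    "age": [28, 33, 41],
}
df = pd.DataFrame(data, index=[101, 102, 103])

columns = df.items()            # generator of (column label, Series) tuples
col_label, col = next(columns)  # pull out only the first tuple

print(col_label)  # label of the first column
print(col)        # the first column itself, as a pandas Series
```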
Let’s print col_label and also the actual column. And of course, once you’re in your for loop, depending on what you want to do, you would use these column objects, which are going to be pandas Series objects as we saw over here.
Let’s also just create a little bit of separation between the column label and the column. And then we’ll also add two newline characters so that we separate each pair of column label and pandas Series from the next.
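A loop along these lines could be sketched as follows. Again, the DataFrame is a hypothetical stand-in, and using the sep and end arguments of print() is just one way to get the separation described here.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's job candidates DataFrame.
df = pd.DataFrame(
    {
        "name": ["Ann", "Ben", "Cara"],
        "city": ["Toronto", "Prague", "Lima"],
        "age": [28, 33, 41],
    },
    index=[101, 102, 103],
)

labels = []
for col_label, col in df.items():
    labels.append(col_label)
    # A newline between label and column, two newlines after each column.
    print(col_label, col, sep="\n", end="\n\n")
```

The columns come out in the same order they have in the DataFrame.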
The first element of the tuple is the column label, and the first one was name. Then the second element of the tuple is actually going to be the pandas Series that is the column. Okay. Again, the column is returned as a pandas Series object. And that’s it!
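That’s what the .items() method returns. Now, there’s another method called .iteritems(), and it does the exact same thing as .items(). Note that .iteritems() was deprecated in pandas 1.5 and removed in pandas 2.0, so on current versions you would use .items(). Again, it returns a generator, and the generator yields a tuple, column label and the actual column as a pandas Series.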
03:34 To iterate over the rows, there’s the .iterrows() method. This will also return a generator, and the generator will yield, one at a time, a tuple. The first element of the tuple is a row label and the second one is going to be the actual row.
We’ll go ahead and print the same thing. We’ll print the row label and the row, separate those with a newline character, and add two newline characters after the row. So again, the order in which the generator yields the items is going to be the same as the order in the DataFrame.
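Here is a rough sketch of that row loop with .iterrows(). The DataFrame and its row labels are hypothetical stand-ins for the lesson’s data.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's job candidates DataFrame.
df = pd.DataFrame(
    {
        "name": ["Ann", "Ben", "Cara"],
        "city": ["Toronto", "Prague", "Lima"],
        "age": [28, 33, 41],
    },
    index=[101, 102, 103],
)

row_labels = []
for row_label, row in df.iterrows():
    row_labels.append(row_label)
    # Each row arrives as a pandas Series indexed by the column labels.
    print(row_label, row, sep="\n", end="\n\n")
```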
Let’s go back down here. So, that’s the way that you would iterate over the rows, very similar to how you would do it with the columns. One way to remember this is to keep two methods in mind: .iteritems() for columns and .iterrows() for the rows. Now, there is one other method that you can use to iterate over the rows, and that’s called .itertuples(). So let me get rid of .iterrows() here and let me write .itertuples(), and what .itertuples() returns is an iterator that yields named tuples, each representing one row of data.
If you’re not familiar with a namedtuple, the basic idea is that it’s a tuple whose elements can be accessed not only by integer index but also through named fields. So, for example, the field names in these namedtuples are going to be exactly the column labels.
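To make the named fields concrete, here is a small sketch using .itertuples(). The DataFrame is a hypothetical stand-in, and its column labels were chosen to be valid Python identifiers so that they can be used as field names.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's job candidates DataFrame.
df = pd.DataFrame(
    {
        "name": ["Ann", "Ben", "Cara"],
        "city": ["Toronto", "Prague", "Lima"],
        "age": [28, 33, 41],
    },
    index=[101, 102, 103],
)

for row in df.itertuples():
    # Each row is a namedtuple, so fields are reached with dot notation.
    print(row.Index, row.name, row.city, row.age)

first = next(df.itertuples())
print(first._fields)  # the field names of the namedtuple
```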
All right, and then if we run that, we’ve got another way to iterate over the rows. Now, there’s one more thing that the .itertuples() method accepts. If you notice, when we initially ran the first print statement, the output included the Index. That would have been one of the fields that we could access using dot notation. If we don’t want the Index, for whatever reason, we would pass False into the index keyword argument.
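Leaving out the Index field could look like this. As before, the DataFrame is a hypothetical stand-in for the lesson’s data.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's job candidates DataFrame.
df = pd.DataFrame(
    {
        "name": ["Ann", "Ben", "Cara"],
        "city": ["Toronto", "Prague", "Lima"],
        "age": [28, 33, 41],
    },
    index=[101, 102, 103],
)

# index=False drops the row label from each namedtuple.
rows = list(df.itertuples(index=False))

print(rows[0]._fields)  # only the column labels remain as fields
print(rows[0].name)
```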
07:23 So those are the basic methods that you’re going to be using when you want to iterate over either the rows or the columns of a pandas DataFrame. Coming up next, we’re going to take a look at how we can work with time series in pandas.