Working With Rows and Columns in DataFrames
00:00
A frequent operation that you’ll be doing on a DataFrame is to extract rows or columns. Let’s start with accessing, say, a column. Let’s access this city column. Here, you’re going to be using notation that’s similar to a Python dictionary.
00:17
So in this case, you’re going to access the column with the label 'city', and this will return a pandas Series object. You can think of a pandas Series object as either an entire row or an entire column of a DataFrame.
00:35
So if we check out the type of the 'city' column, we get a pandas Series. Let’s save this column in the variable, say, cities.
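As a minimal sketch of the steps so far: the DataFrame below is hypothetical sample data built only for illustration. The row labels 101 to 107, the column names, and the 'Prague' entry at label 103 follow the lesson; every other value is an assumption.

```python
import pandas as pd

# Hypothetical sample data. Only the labels 101-107, the column names,
# and the 'Prague' entry at label 103 are taken from the lesson.
df = pd.DataFrame(
    {
        "city": ["Mexico City", "Toronto", "Prague", "Shanghai",
                 "Manchester", "Cairo", "Osaka"],
        "age": [41, 28, 33, 34, 38, 31, 37],
        "py-score": [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0],
    },
    index=range(101, 108),
)

# Dictionary-style access by column label returns a Series.
cities = df["city"]
print(isinstance(cities, pd.Series))  # True
print(cities)
```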
00:49
And if we take a look at this again, we see that we not only extracted the data in that column, we also extracted the index, or the row labels. And so a pandas Series object will also contain an .index attribute,
01:07
which, in this case, will be the same as the .index in the DataFrame, because we extracted an entire column of the DataFrame.
Another way that you can extract a column is to use dot notation, but this will only work if the name of the column that you want to extract is a string that’s a valid Python identifier. So, for example, if we wanted to extract, say, the age column, we would simply type .age. And then, in this case, we get that Series object.
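Both points can be checked with a short sketch. The data is again hypothetical apart from the labels 101 to 107 and the column names. Note that dot notation also fails when a column name clashes with an existing DataFrame attribute or method, so bracket notation is the safer default.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame(
    {
        "city": ["Mexico City", "Toronto", "Prague", "Shanghai",
                 "Manchester", "Cairo", "Osaka"],
        "age": [41, 28, 33, 34, 38, 31, 37],
    },
    index=range(101, 108),
)

cities = df["city"]

# The extracted column carries the same row labels as the DataFrame.
same_index = cities.index.equals(df.index)
print(same_index)  # True

# Dot notation works because "age" is a valid Python identifier.
ages = df.age
print(ages.equals(df["age"]))  # True
```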
01:37
But if we tried this with the Python score column, so .py-score,
01:46
we’re going to get an AttributeError because Python reads this as extracting a column called py and then subtracting some other object called score from it.
01:58
So if we wanted to extract that py-score column, we’d have to use the bracket notation and simply write out the full column name.
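Here is a small sketch of that failure and the fix, using hypothetical scores. Python evaluates the left side of the subtraction first, so the lookup of the nonexistent attribute py fails before the name score is ever needed.

```python
import pandas as pd

# Hypothetical scores for illustration only.
df = pd.DataFrame(
    {"py-score": [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0]},
    index=range(101, 108),
)

message = ""
try:
    # Parsed as (df.py) - score; df.py is looked up first and fails.
    df.py-score
except AttributeError as error:
    message = str(error)
    print("AttributeError:", message)

# Bracket notation accepts any column label, hyphens included.
scores = df["py-score"]
print(scores)
```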
02:12
Now let’s talk about extracting rows. So the row labels, if you remember, are found in the .index attribute. We know that these are 101 to 108, but not including 108, so the last one’s 107.
02:28
So to access a whole row, say with index value 103, we use the .loc accessor. So the way to do this is to write the DataFrame followed by .loc, then bracket notation, and then the actual label of the row that we want to access, so let’s say 103.
02:50
This returns a pandas Series object as well. Let’s see the data type for that just to make sure.
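A minimal sketch of the row lookup, assuming the same hypothetical sample data as before (only the labels 101 to 107 and 'Prague' at 103 come from the lesson):

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame(
    {
        "city": ["Mexico City", "Toronto", "Prague", "Shanghai",
                 "Manchester", "Cairo", "Osaka"],
        "age": [41, 28, 33, 34, 38, 31, 37],
    },
    index=range(101, 108),
)

# The row labels run from 101 up to, but not including, 108.
print(list(df.index))  # [101, 102, 103, 104, 105, 106, 107]

# .loc selects by label; a single label returns the row as a Series
# whose index holds the column names.
row = df.loc[103]
print(isinstance(row, pd.Series))  # True
print(row["city"])  # Prague
```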
02:59
And so we have a pandas Series. Now, if you recall, we also have the cities Series, and its index is exactly the same as the DataFrame’s. However, with Series objects, contrary to when we were working with a DataFrame, where we had to use the .loc accessor, we can directly access an index label just using bracket notation like this.
03:26
So that’s one key difference between Series objects and DataFrame objects. So if we go ahead and try that, we’ll directly get the only value for that index, which is 'Prague'.
03:39
Whereas when we use the index value on a DataFrame, we needed to use the .loc accessor, and we saw then, of course, that this returns a whole Series object.
03:50
Whereas if we’re working with a Series object and we use one of the values of the index to access a value, we’re going to get a single value.
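The contrast can be sketched side by side, again on hypothetical sample data. The last step is an addition to what the lesson shows: plain brackets on a DataFrame look for a column label, so using a row label there raises a KeyError.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame(
    {
        "city": ["Mexico City", "Toronto", "Prague", "Shanghai",
                 "Manchester", "Cairo", "Osaka"],
        "age": [41, 28, 33, 34, 38, 31, 37],
    },
    index=range(101, 108),
)
cities = df["city"]

# Series: brackets with an index label return a single value.
print(cities[103])  # Prague

# DataFrame: .loc with the same label returns the whole row as a Series.
print(df.loc[103])

# DataFrame: plain brackets search the column labels, not the row labels.
found_column = True
try:
    df[103]
except KeyError:
    found_column = False
    print("KeyError: 103 is not a column label")
```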
04:02
All right! So, with this lesson and the previous lesson, you got a broad overview of the pandas library and some of the basics of creating a pandas DataFrame and also working with the Series objects that make up a DataFrame.
04:17
We also quickly went over how you access rows and columns in a DataFrame. In the next lesson, we’ll take a look at other ways to create a pandas DataFrame.