Understanding DataFrame Attributes
00:00 Let’s go over some of the ways that we can access the data in a DataFrame. We already did a little bit of this back in a previous lesson when we did the broad overview of pandas, so let’s quickly go over this.
Probably the two most important attributes of a DataFrame are the .index and the .columns attributes. To take a look at the index of a DataFrame, we simply type df.index.
In this case, with our job candidates DataFrame, the index is a RangeIndex that starts at 101 and stops at 108.
Now, this is a sequence-type object, so I can access individual elements of this RangeIndex by using list notation. To access the first element, I use the index value of 0, which gives 101, and for the second element I use the index value of 1, and so on.
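As a minimal sketch of the index lookup described above: the data values below are a hypothetical stand-in for the lesson's job candidates DataFrame (only the column names, the seven rows, and the 101 to 108 range come from the lesson), and the name df is assumed.

```python
import pandas as pd

# Hypothetical stand-in data; the actual values in the lesson may differ.
data = {
    "name": ["Xavier", "Ann", "Jana", "Yi", "Robin", "Amal", "Nori"],
    "city": ["Mexico City", "Toronto", "Prague", "Shanghai",
             "Manchester", "Cairo", "Osaka"],
    "age": [41, 28, 33, 34, 38, 31, 37],
    "py-score": [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0],
}
# The row labels start at 101 and stop before 108, as in the lesson.
df = pd.DataFrame(data, index=pd.RangeIndex(start=101, stop=108))

print(df.index)        # the RangeIndex holding the row labels

first = df.index[0]    # first row label (positions are zero-based)
second = df.index[1]   # second row label
print(first, second)
```

The positions inside the square brackets count from 0, so they are not the same thing as the labels 101, 102, and so on.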
To get the column labels, we use the
.columns attribute of a DataFrame.
This returns an
Index object, and it’s also a sequence, so we can access individual elements of this
Index object just using regular list notation. So for example, if I want to access the third element, which would give me
'age', I can just use regular list notation. Now, both the .index and the .columns attributes return Index objects, and Index objects are immutable.
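Here is a small sketch of both points, reading a column label by position and seeing what immutability means in practice. The data and the replacement label "first_name" are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical stand-in data; the actual values in the lesson may differ.
data = {
    "name": ["Xavier", "Ann", "Jana", "Yi", "Robin", "Amal", "Nori"],
    "city": ["Mexico City", "Toronto", "Prague", "Shanghai",
             "Manchester", "Cairo", "Osaka"],
    "age": [41, 28, 33, 34, 38, 31, 37],
    "py-score": [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0],
}
df = pd.DataFrame(data, index=pd.RangeIndex(start=101, stop=108))

# The third column label sits at position 2.
third = df.columns[2]
print(third)

# Assigning to a single element of an Index raises a TypeError.
error_message = None
try:
    df.columns[0] = "first_name"  # hypothetical new label
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```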
So for example, I could access an individual
Index object element, and if I wanted to change it to, say,
I would get a
TypeError because an
Index object in pandas doesn’t support mutable operations. And I would get a similar error if I wanted to change an individual element of the
However, I can change the entire index. If we recall, this is a RangeIndex from 101 to 108. So, for example, I could change it to a range that starts at 10 and goes to 16 by assigning a NumPy array created with arange(), called with a start of 10 and a stop of 17.
This is similar to Python’s range() function: arange() creates a range, and the stop value is not included in the numbers that are generated.
Now the .index attribute is an Int64Index object starting at 10 and going all the way up to 16.
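A minimal sketch of replacing the whole index at once, again with hypothetical stand-in data. One caveat: pandas 2.0 removed the Int64Index class, so recent versions display the result as a plain Index with an integer dtype rather than Int64Index.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the actual values in the lesson may differ.
data = {
    "name": ["Xavier", "Ann", "Jana", "Yi", "Robin", "Amal", "Nori"],
    "city": ["Mexico City", "Toronto", "Prague", "Shanghai",
             "Manchester", "Cairo", "Osaka"],
    "age": [41, 28, 33, 34, 38, 31, 37],
    "py-score": [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0],
}
df = pd.DataFrame(data, index=pd.RangeIndex(start=101, stop=108))

# Individual labels cannot be changed, but the whole index can be replaced.
# np.arange(10, 17) excludes the stop value, so it yields 10 through 16.
df.index = np.arange(10, 17)

index_values = df.index.tolist()
print(df.index)
print(index_values)
```

The new index must have exactly as many labels as the DataFrame has rows, which is why the stop value is 17 for seven rows starting at 10.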
02:40 All right, so that’s how you access the index and the columns of a DataFrame. Now, if you remember, there’s a third piece to a DataFrame, and those are the actual values.
To access the values, you use the .values attribute.
This returns a two-dimensional NumPy array where each row of the array is a row of the DataFrame. There’s also a method called .to_numpy(), which does the same thing. Now, the pandas documentation suggests that you should use the
.to_numpy() method instead because it does offer a little bit of flexibility by passing in a couple of keyword arguments. So read up on that if you want to specify the data type of the resulting NumPy array, or if you want to use the original data from the DataFrame by passing in a
False value to the
copy keyword or a
True value if you want to make a copy of the data. Now, another important attribute of the DataFrame is the .dtypes attribute.
This returns a Series object with the column names as the labels and the corresponding data types as the values. So in this case, we see that the name and city columns both have a data type of object, whereas age has an int64 data type and py-score has a float64 data type.
The object data type is used for strings or for a column with mixed data types.
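The following sketch pulls the values out as a NumPy array and then inspects the column data types. The data is a hypothetical stand-in, and selecting only the two numeric columns before converting to float32 is an illustrative choice, not something shown in the lesson.

```python
import pandas as pd

# Hypothetical stand-in data; the actual values in the lesson may differ.
data = {
    "name": ["Xavier", "Ann", "Jana", "Yi", "Robin", "Amal", "Nori"],
    "city": ["Mexico City", "Toronto", "Prague", "Shanghai",
             "Manchester", "Cairo", "Osaka"],
    "age": [41, 28, 33, 34, 38, 31, 37],
    "py-score": [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0],
}
df = pd.DataFrame(data, index=pd.RangeIndex(start=101, stop=108))

# All values as a 2D NumPy array: one array row per DataFrame row.
values = df.to_numpy()
print(values.shape)

# The dtype and copy keywords control the result type and whether
# the data is copied. Here only the numeric columns are converted.
scores = df[["age", "py-score"]].to_numpy(dtype="float32", copy=True)
print(scores.dtype)

# A Series mapping each column name to its data type.
dtypes = df.dtypes
print(dtypes)
```

How the two text columns are reported can depend on the pandas version, so the sketch makes no claim about them beyond what the lesson shows.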
Most of the time, you’re going to rely on pandas to specify the data types when you create a DataFrame, but if you did want to change the data types, you could use the
.astype() method on a
DataFrame, and it relies on passing in a keyword argument called
dtype, which is a dictionary, and the keys of the dictionary are going to be the columns that you want to change and the values of the dictionary are going to be the data types that you want to convert to. So, for example, let’s suppose we wanted to change the
age column to have a NumPy
int32 data type and the
py-score column to have a NumPy
float32 data type. So this would save some memory.
Now, this would return a new DataFrame, and so let’s just save this in a DataFrame called
df_. Let’s run that, and now let’s take a look at the data types for this new DataFrame.
Well, you see that age is now int32 and py-score is now float32.
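The conversion described above can be sketched as follows. The data is a hypothetical stand-in; the names df and df_ follow the lesson.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the actual values in the lesson may differ.
data = {
    "name": ["Xavier", "Ann", "Jana", "Yi", "Robin", "Amal", "Nori"],
    "city": ["Mexico City", "Toronto", "Prague", "Shanghai",
             "Manchester", "Cairo", "Osaka"],
    "age": [41, 28, 33, 34, 38, 31, 37],
    "py-score": [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0],
}
df = pd.DataFrame(data, index=pd.RangeIndex(start=101, stop=108))

# Keys are the columns to convert, values are the target data types.
# Columns that are not listed keep their current data type.
df_ = df.astype(dtype={"age": np.int32, "py-score": np.float32})

print(df_.dtypes)   # data types of the new DataFrame
print(df.dtypes)    # the original DataFrame is left unchanged
```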
All right, now let’s take a look at some attributes that give us the dimensions and the size of a pandas DataFrame, and these are going to be similar to the NumPy array attributes:
A DataFrame has a
.ndim attribute. This is the number of dimensions—in this case,
2. And then the
.size is going to return the total number of elements, so
28. And if we take a look at the
.shape, this is a 7 by 4 tabular DataFrame.
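For reference, the three attributes can be checked like this, using hypothetical stand-in data with the same 7 by 4 layout as the lesson's DataFrame.

```python
import pandas as pd

# Hypothetical stand-in data; the actual values in the lesson may differ.
data = {
    "name": ["Xavier", "Ann", "Jana", "Yi", "Robin", "Amal", "Nori"],
    "city": ["Mexico City", "Toronto", "Prague", "Shanghai",
             "Manchester", "Cairo", "Osaka"],
    "age": [41, 28, 33, 34, 38, 31, 37],
    "py-score": [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0],
}
df = pd.DataFrame(data, index=pd.RangeIndex(start=101, stop=108))

n_dims = df.ndim         # number of dimensions
n_values = df.size       # total number of elements
n_rows, n_cols = df.shape  # (rows, columns) tuple

print(n_dims, n_values, (n_rows, n_cols))
```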
We’ve got 7 rows and 4 columns, and that’s why we got a size of
28. 7 times 4? 28. And the last attribute or method that you might find useful is the amount of memory used by your DataFrame.
This is obtained by using the
.memory_usage() method. This returns a
Series object with the column names as the labels of the
Series object and the memory usage in bytes as the data values.
So in this case, the last two columns, age and py-score, each use 28 bytes of memory. That’s because each of these columns has seven values stored in a 32-bit data type (int32 for age and float32 for py-score), which takes up 4 bytes per value, and 7 values times 4 bytes gives us 28 bytes.
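The arithmetic above can be checked with a short sketch. It assumes the 28-byte figures refer to the converted DataFrame df_ with 32-bit columns, and it uses hypothetical stand-in data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the actual values in the lesson may differ.
data = {
    "name": ["Xavier", "Ann", "Jana", "Yi", "Robin", "Amal", "Nori"],
    "city": ["Mexico City", "Toronto", "Prague", "Shanghai",
             "Manchester", "Cairo", "Osaka"],
    "age": [41, 28, 33, 34, 38, 31, 37],
    "py-score": [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0],
}
df = pd.DataFrame(data, index=pd.RangeIndex(start=101, stop=108))
df_ = df.astype({"age": np.int32, "py-score": np.float32})

# A Series of memory usage in bytes, labeled by column name.
# By default it also includes an "Index" entry for the row labels.
usage = df_.memory_usage()
print(usage)

# 7 values * 4 bytes each for the 32-bit columns.
age_bytes = usage["age"]
score_bytes = usage["py-score"]

# The original 64-bit columns take twice as much: 7 values * 8 bytes.
original_age_bytes = df.memory_usage()["age"]
print(age_bytes, score_bytes, original_age_bytes)
```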
Let’s do a quick recap of some of these attributes and basic methods that we’ve discussed. We went over the
.index and the
.columns attributes. These return the row labels of a DataFrame and the column labels.
The third component of a DataFrame is the values, and these are stored in the .values attribute, which can also be obtained by using the .to_numpy() method on a DataFrame.
This returns a 2D NumPy array of values. The
.dtypes attribute is a Series object containing the data type of each column, and the index of the Series object holds the names of the columns.
And then if we wanted to change the data types of the DataFrame that we’re working with, we can use the
.astype() method, and this will return a new DataFrame with the specified data types of the columns that we want to change.
Then there are three attributes that describe the size of the DataFrame, and these are very similar to the attributes of a NumPy array.
.ndim returns the number of dimensions of the DataFrame,
.size is the total number of values, and then
.shape returns a tuple containing the size of each of the dimensions of the DataFrame.
07:55 All right, so in the next lesson, we’ll talk about accessing and modifying data in a DataFrame.