Understanding DataFrame Attributes
00:00 Let’s go over some of the ways that we can access the data in a DataFrame. We already did a little bit of this back in a previous lesson when we did the broad overview of pandas, so let’s quickly go over this.
Now, the .index attribute returns a RangeIndex, and this is a sequence-type object. So, for example, I can access individual elements of this RangeIndex by using list notation. To access the first element, I use an index value of 0, which gives me 101, and so on, with the second element using an index value of 1.
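As a rough sketch of the list-style access described here, the snippet below builds a small stand-in DataFrame. The row values are hypothetical; only the column names and the 101 to 107 row labels follow the lesson.

```python
import pandas as pd

# Hypothetical sample rows; only the column names and row labels match the lesson.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cal", "Dee", "Eli", "Fay", "Gus"],
        "city": ["Lima", "Oslo", "Rome", "Kyiv", "Bern", "Doha", "Riga"],
        "age": [41, 28, 33, 34, 38, 31, 37],
        "py-score": [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0],
    },
    index=pd.RangeIndex(101, 108),  # labels 101 through 107
)

# The index is a sequence, so positions start at 0.
first_label = df.index[0]
second_label = df.index[1]
print(type(df.index).__name__, first_label, second_label)
```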
The .columns attribute returns an Index object, and it's also a sequence, so we can access individual elements of this Index object just using regular list notation. So, for example, if I want to access the third element, which would give me 'age', I can just use an index value of 2. Now, both the .index and the .columns attributes return Index objects, and Index objects are immutable.
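A minimal sketch of both points, again using hypothetical sample rows: reading a column label by position works, while assigning to a single position raises a TypeError.

```python
import pandas as pd

# Hypothetical sample rows; only the column names and row labels match the lesson.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cal", "Dee", "Eli", "Fay", "Gus"],
        "city": ["Lima", "Oslo", "Rome", "Kyiv", "Bern", "Doha", "Riga"],
        "age": [41, 28, 33, 34, 38, 31, 37],
        "py-score": [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0],
    },
    index=pd.RangeIndex(101, 108),
)

# Third column label, at position 2.
third_column = df.columns[2]

# Index objects reject item assignment.
assignment_failed = False
try:
    df.columns[2] = "years"
except TypeError as error:
    assignment_failed = True
    print("Cannot modify a single label:", error)
```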
However, I can change the entire index. So if we recall, this is a RangeIndex that starts at 101 and stops at 108, and so, for example, I can change this to, say, a range that starts at 10 and goes to 16 by passing in an arange NumPy object, so this'll go from 10 to 17, where 17 isn't included. This is similar to Python's range().
To get the data as a NumPy array, there's the .values attribute, and there's also the .to_numpy() method, which does the same thing. Now, the pandas documentation suggests that you should use the .to_numpy() method instead because it offers a little bit of flexibility through a couple of keyword arguments. So read up on that if you want to specify the data type of the resulting NumPy array, or if you want to use the original data from the DataFrame where possible by passing in a False value to the copy keyword, or a True value if you want to make a copy of the data.
Now, another important attribute of the DataFrame is the .dtypes attribute.
This returns a Series object with the column names as the labels and the corresponding data types as the values. So in this case, we see that the name and the city columns both have a data type of object, whereas age has an int64 data type and py-score has a float64 data type. The object data type is going to be used for strings or if you have a column with mixed data types.
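The last few points (replacing the whole index, converting to a NumPy array, and inspecting the column data types) can be sketched together as follows. The sample rows are hypothetical, and the choice of dtype and copy values in the second to_numpy() call is only an illustration of the keyword arguments.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; only the column names and row labels match the lesson.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cal", "Dee", "Eli", "Fay", "Gus"],
        "city": ["Lima", "Oslo", "Rome", "Kyiv", "Bern", "Doha", "Riga"],
        "age": [41, 28, 33, 34, 38, 31, 37],
        "py-score": [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0],
    },
    index=pd.RangeIndex(101, 108),
)

# Replace the whole index at once: np.arange(10, 17) yields 10 through 16.
df.index = np.arange(10, 17)
new_labels = list(df.index)

# Convert all of the data to a 2D NumPy array.
values = df.to_numpy()

# Keyword arguments: request a data type and force a copy (numeric columns only).
numeric = df[["age", "py-score"]].to_numpy(dtype="float32", copy=True)

# .dtypes is a Series labeled by column name.
# Note: the lesson shows object for text columns; newer pandas versions may
# report a dedicated string dtype instead.
column_types = df.dtypes
print(column_types)
```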
Most of the time, you're going to rely on pandas to specify the data types when you create a DataFrame, but if you did want to change the data types, you could use the .astype() method on a DataFrame. It takes an argument called dtype, which can be a dictionary: the keys of the dictionary are the columns that you want to change, and the values are the data types that you want to convert to. So, for example, let's suppose we wanted to change the age column to have a NumPy int32 data type and the py-score column to have a NumPy float32 data type. This would save some memory.
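A minimal sketch of that conversion, using hypothetical sample rows and a hypothetical name, df_small, for the result:

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; only the column names and row labels match the lesson.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cal", "Dee", "Eli", "Fay", "Gus"],
        "city": ["Lima", "Oslo", "Rome", "Kyiv", "Bern", "Doha", "Riga"],
        "age": [41, 28, 33, 34, 38, 31, 37],
        "py-score": [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0],
    },
    index=pd.RangeIndex(101, 108),
)

# Keys are the columns to convert; values are the target data types.
df_small = df.astype(dtype={"age": np.int32, "py-score": np.float32})

print(df_small.dtypes)
```

Columns that are not listed in the dictionary keep their data types, and the original df is left unchanged.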
All right, now let's take a look at some attributes that give us the dimensions and the size of a pandas DataFrame, and these are going to be similar to the NumPy array attributes. A DataFrame has a .ndim attribute. This is the number of dimensions, in this case, 2. Then .size returns the total number of elements, so 28. And if we take a look at .shape, it's (7, 4), so this is a 7 by 4 tabular DataFrame.
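For illustration, the three attributes on a stand-in DataFrame with seven hypothetical rows and the lesson's four columns:

```python
import pandas as pd

# Hypothetical sample rows; only the column names and row labels match the lesson.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cal", "Dee", "Eli", "Fay", "Gus"],
        "city": ["Lima", "Oslo", "Rome", "Kyiv", "Bern", "Doha", "Riga"],
        "age": [41, 28, 33, 34, 38, 31, 37],
        "py-score": [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0],
    },
    index=pd.RangeIndex(101, 108),
)

dimensions = df.ndim      # rows and columns, so 2
total_values = df.size    # 7 rows * 4 columns
rows, columns = df.shape  # tuple of (rows, columns)

print(dimensions, total_values, (rows, columns))
```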
So in this case, the last two columns, age and py-score, each use 28 bytes of memory. That's because each of those columns has seven values in a 32-bit data type, int32 for age and float32 for py-score, and each value takes up 4 bytes, so 7 values times 4 bytes gives us 28 bytes.
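The arithmetic can be checked with a short sketch. The transcript does not name the method used here, so the call to .memory_usage() is an assumption about what was shown, and the sample rows and the df_small name are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; only the column names and row labels match the lesson.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cal", "Dee", "Eli", "Fay", "Gus"],
        "city": ["Lima", "Oslo", "Rome", "Kyiv", "Bern", "Doha", "Riga"],
        "age": [41, 28, 33, 34, 38, 31, 37],
        "py-score": [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0],
    },
    index=pd.RangeIndex(101, 108),
)

# Downcast the two numeric columns to 32-bit types.
df_small = df.astype(dtype={"age": np.int32, "py-score": np.float32})

# Assumption: .memory_usage() reports bytes per column (plus the index).
before = df.memory_usage()
after = df_small.memory_usage()

# 7 values * 8 bytes before, 7 values * 4 bytes after.
print(before["age"], after["age"], after["py-score"])
```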
Let's do a quick recap of some of these attributes and basic methods that we've discussed. We went over the .index and the .columns attributes. These return the row labels of a DataFrame and the column labels. The .to_numpy() method returns a 2D NumPy array of values. The .dtypes attribute is a Series object containing the data types of each of the columns, and the index of the Series object is the names of the columns.
And then if we wanted to change the data types of the DataFrame that we're working with, we can use the .astype() method, and this will return a new DataFrame with the specified data types for the columns that we want to change.
Then there are three attributes that describe the size of the DataFrame, and these are very similar to the attributes of a NumPy array. These are .ndim, .size, and .shape: .ndim returns the number of dimensions of the DataFrame, .size is the total number of values, and then .shape returns a tuple containing the size of each of the dimensions of the DataFrame.