Changing Columns in a DataFrame
00:00
In this lesson, you’ll actually make changes to the columns of the DataFrame. Start off by making a copy of the existing nba DataFrame.
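As a rough sketch of this first step, the snippet below copies a DataFrame and shows that edits to the copy don't reach the original. The three rows are made up for illustration and only stand in for the course's nba data; nba_copy is the name used later in the lesson.

```python
import pandas as pd

# Hypothetical stand-in for the course's nba DataFrame (made-up rows).
nba = pd.DataFrame({
    "pts": [66, 68, 63],
    "opp_pts": [68, 66, 47],
})

# .copy() returns an independent DataFrame (a deep copy by default).
nba_copy = nba.copy()

# Overwrite a column in the copy only.
nba_copy["pts"] = 0

print(nba["pts"].tolist())       # the original values are unchanged
print(nba_copy["pts"].tolist())  # only the copy was modified
```

Working on a copy keeps the loaded data available for comparison without re-reading the file.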
00:10
You can create a new column just by assigning a value to it. The new column will be added dynamically. To store the point spread in each game, simply take the difference of the 'pts' (points) and 'opp_pts' (opposition points) columns.
00:25
This will create a Series, and since a column in a DataFrame is just a Series, simply assign to a column named 'difference'.
00:34
The new column 'difference' is appended to the DataFrame.
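The column arithmetic described here might look like the following. The data is a made-up three-row sample, not the real nba dataset; only the column names 'pts', 'opp_pts', and 'difference' come from the lesson.

```python
import pandas as pd

# Made-up sample rows standing in for nba_copy.
nba_copy = pd.DataFrame({
    "pts": [66, 68, 63],
    "opp_pts": [68, 66, 47],
})

# Subtracting one column from another yields a Series, row by row.
spread = nba_copy["pts"] - nba_copy["opp_pts"]
print(type(spread).__name__)

# Assigning to a label that doesn't exist yet adds it as the last column.
nba_copy["difference"] = spread
print(nba_copy.columns.tolist())
```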
00:40
Using the aggregation functions from the previous lesson, you can see that the maximum point spread for a game is 68 points. You can also rename the columns.
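The aggregation mentioned above can be sketched with .max() on the new column. Because the rows below are invented for illustration, the result here is not the 68 reported for the full dataset.

```python
import pandas as pd

# Made-up sample rows; the real dataset is much larger.
nba_copy = pd.DataFrame({
    "pts": [66, 68, 63],
    "opp_pts": [68, 66, 47],
})
nba_copy["difference"] = nba_copy["pts"] - nba_copy["opp_pts"]

# .max() reduces the column to its single largest value.
largest_spread = nba_copy["difference"].max()
print(largest_spread)
```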
00:51
Use the .rename() method to shorten several of the column names. The columns keyword argument expects a dictionary. The names of the existing columns are the keys and the updated names are the values.
01:04
In the updated DataFrame, game_result is now result while game_location is simply location.
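A minimal sketch of the renaming step, using the two column names from the lesson. The cell values are placeholders, and renamed is a hypothetical variable name.

```python
import pandas as pd

# Placeholder values; only the column names come from the lesson.
nba_copy = pd.DataFrame({
    "game_result": ["L", "W"],
    "game_location": ["H", "A"],
})

# Keys are the existing column names, values are the new names.
renamed = nba_copy.rename(
    columns={"game_result": "result", "game_location": "location"}
)
print(renamed.columns.tolist())
```

Column names that don't appear as keys in the dictionary are left as they are.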
01:13
Notice that .rename(), by default, returns the updated DataFrame, leaving the original untouched. However, you can provide the inplace keyword argument and set it to True.
01:24
This will modify the DataFrame that .rename() was called on. And this works with other methods as well. For example, the .drop() method will remove columns from a DataFrame.
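To make the difference between the two modes concrete, here is a small sketch with placeholder data. The variable names renamed and columns_before are illustrative only.

```python
import pandas as pd

# Placeholder values; only the column names come from the lesson.
nba_copy = pd.DataFrame({
    "game_result": ["L", "W"],
    "game_location": ["H", "A"],
})

# Default behavior: a new DataFrame comes back, nba_copy stays as it was.
renamed = nba_copy.rename(columns={"game_result": "result"})
columns_before = nba_copy.columns.tolist()
print(columns_before)

# With inplace=True, nba_copy itself is changed.
nba_copy.rename(columns={"game_result": "result"}, inplace=True)
print(nba_copy.columns.tolist())
```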
01:37
Now, something went wrong here, and it’s a common mistake. The error tells you that the selected columns were not found in the axis. But what’s an axis? Remember that DataFrames have two dimensions: the rows and the columns.
01:53
These dimensions are the axes. The 0 axis is the rows and the 1 axis is the columns. By default, the axis is 0, so .drop() will try to find rows whose labels match the names in elo_columns, and those don’t exist.
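The mistake can be reproduced on a small scale. In this sketch the contents of elo_columns and all values are assumptions chosen for illustration, since the transcript doesn't list them; error_message is a hypothetical helper variable.

```python
import pandas as pd

# Assumed column names and made-up values for illustration.
nba_copy = pd.DataFrame({
    "pts": [66, 68],
    "elo_i": [1300.0, 1300.0],
    "elo_n": [1293.3, 1306.7],
})
elo_columns = ["elo_i", "elo_n"]

error_message = None
try:
    # axis defaults to 0, so the labels are looked up in the row index,
    # which here only contains 0 and 1.
    nba_copy.drop(elo_columns)
except KeyError as error:
    error_message = str(error)
    print("KeyError:", error_message)
```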
02:11
But if you set the axis keyword argument to 1, it will look in the columns and be able to remove them. And now all of the elo_columns have been removed. Or have they?
02:26
They haven’t. By default, .drop() returns the modified DataFrame and leaves the original unchanged. You could reassign the result to the original or use the inplace keyword argument to handle it all in one step. Now nba_copy only has 21 columns.
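Both ways of keeping the result can be sketched as follows. As before, the contents of elo_columns and the values are assumptions for illustration, and trimmed and columns_after_unassigned_drop are hypothetical names.

```python
import pandas as pd

# Assumed column names and made-up values for illustration.
nba_copy = pd.DataFrame({
    "pts": [66, 68],
    "elo_i": [1300.0, 1300.0],
    "elo_n": [1293.3, 1306.7],
})
elo_columns = ["elo_i", "elo_n"]

# The result is not stored anywhere, so nba_copy keeps its columns.
nba_copy.drop(elo_columns, axis=1)
columns_after_unassigned_drop = nba_copy.columns.tolist()
print(columns_after_unassigned_drop)

# Option 1: keep the returned DataFrame (it could also be assigned
# back to nba_copy).
trimmed = nba_copy.drop(elo_columns, axis=1)
print(trimmed.columns.tolist())

# Option 2: modify nba_copy directly.
nba_copy.drop(elo_columns, axis=1, inplace=True)
print(nba_copy.columns.tolist())
```

Passing columns=elo_columns instead of axis=1 is an equivalent spelling that some readers find easier to remember.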
02:43
In the next lesson, you’ll learn more about using specific data types.
Thomas on Aug. 5, 2021
Hi Douglas,
First of all: very interesting course.
I have a question: why do you decide at this point in time to work with a copy of your DataFrame?
Is this because now you really start to make changes to the DataFrame? But if that is the case, why exactly would you do that? You also can easily re-run your notebook load and work with the original data, right?
Kind regards,
Thomas