Changing Columns in a DataFrame

Explore Your Dataset With pandas Douglas Starnes 02:49

00:00 In this lesson, you’ll actually make changes to the columns of the DataFrame. Start off by making a copy of the existing nba DataFrame.
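As a minimal sketch of that first step: the three-row nba DataFrame below is a hypothetical stand-in with made-up scores, not the course's real dataset. It only illustrates that .copy() gives you an independent object to experiment on.

```python
import pandas as pd

# Hypothetical three-game stand-in for the course's nba DataFrame.
nba = pd.DataFrame(
    {
        "pts": [66, 68, 63],
        "opp_pts": [68, 66, 47],
    }
)

# .copy() returns an independent DataFrame (a deep copy by default).
nba_copy = nba.copy()

# Changing the copy leaves the original untouched.
nba_copy.loc[0, "pts"] = 0
print(nba.loc[0, "pts"])       # 66
print(nba_copy.loc[0, "pts"])  # 0
```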

00:10 You can create a new column just by assigning a value to it. The new column will be added dynamically. To store the point spread in each game, simply take the difference of the 'pts' (points) and 'opp_pts' (opposition points) columns.

00:25 This will create a Series, and since a column in a DataFrame is just a Series, you can simply assign it to a column named 'difference'.

00:34 The new column 'difference' is appended to the DataFrame.
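The column assignment described above might look like the following sketch. The scores are invented for illustration, and the intermediate name spread is an assumption, not something from the lesson.

```python
import pandas as pd

# Hypothetical stand-in data; the real nba dataset has many more rows.
nba_copy = pd.DataFrame({"pts": [66, 68, 63], "opp_pts": [68, 66, 47]})

# Subtracting two columns yields a Series aligned on the row index.
spread = nba_copy["pts"] - nba_copy["opp_pts"]

# Assigning to a column name that does not exist yet creates the column.
nba_copy["difference"] = spread

print(list(nba_copy.columns))        # ['pts', 'opp_pts', 'difference']
print(nba_copy["difference"].max())  # 16
```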

00:40 Using the aggregation functions from the previous lesson, you can see that the maximum point spread for a game is 68 points. You can also rename the columns.

00:51 Use the .rename() method to shorten several of the column names. The columns keyword argument expects a dictionary. The names of the existing columns are the keys and the updated names are the values.

01:04 In the updated DataFrame, game_result is now result while game_location is simply location.
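A small sketch of that .rename() call, using a hypothetical three-row DataFrame in place of the real data. The name renamed is chosen here for illustration only.

```python
import pandas as pd

# Hypothetical stand-in with the two columns being renamed.
nba_copy = pd.DataFrame(
    {
        "game_result": ["L", "W", "W"],
        "game_location": ["H", "A", "H"],
        "pts": [66, 68, 63],
    }
)

# Keys are the existing column names, values are the new names.
renamed = nba_copy.rename(
    columns={"game_result": "result", "game_location": "location"}
)

print(list(renamed.columns))  # ['result', 'location', 'pts']
```

Columns that are not mentioned in the dictionary, such as pts here, keep their names.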

01:13 Notice that .rename(), by default, returns a new, updated DataFrame, leaving the original untouched. However, you can provide the inplace keyword argument and set it to True.
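To compare the two behaviors side by side, here is a short sketch on a hypothetical two-row DataFrame (the data and the name renamed are assumptions for illustration):

```python
import pandas as pd

nba_copy = pd.DataFrame({"game_result": ["L", "W"], "pts": [66, 68]})

# Default: a new DataFrame comes back and nba_copy keeps its old names.
renamed = nba_copy.rename(columns={"game_result": "result"})
print(list(renamed.columns))   # ['result', 'pts']
print(list(nba_copy.columns))  # ['game_result', 'pts']

# inplace=True: nba_copy itself is changed, so no reassignment is needed.
nba_copy.rename(columns={"game_result": "result"}, inplace=True)
print(list(nba_copy.columns))  # ['result', 'pts']
```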

01:24 This will modify the DataFrame that .rename() was called on. And this works with other methods as well. For example, the .drop() method will remove columns from a DataFrame.

01:37 Now, something went wrong here, and it’s a common mistake: .drop() was called with only the column names in elo_columns. The error tells you that the selected columns were not found in the axis. But what’s an axis? Remember that DataFrames have two dimensions: the rows and the columns.

01:53 These dimensions are the axes. The 0 axis is the rows and the 1 axis is the columns. By default, the axis is 0, so .drop() will try to find rows with labels in the elo_columns, and those don’t exist.
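The failure can be reproduced on a tiny example. This is only a sketch: the DataFrame is a hypothetical stand-in, elo_columns is shortened to two names, and the flag drop_failed exists purely for illustration.

```python
import pandas as pd

# Hypothetical stand-in with two Elo columns and made-up ratings.
nba_copy = pd.DataFrame(
    {"pts": [66, 68], "elo_i": [1300.0, 1300.0], "elo_n": [1293.3, 1306.7]}
)
elo_columns = ["elo_i", "elo_n"]

try:
    # No axis given, so pandas searches the row labels (axis 0).
    nba_copy.drop(elo_columns)
except KeyError as error:
    print("KeyError:", error)
    drop_failed = True
else:
    drop_failed = False

# The row labels are 0 and 1, so the column names cannot be found there.
print(list(nba_copy.index))  # [0, 1]
```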

02:11 But if you set the axis keyword argument to 1, it will look in the columns and be able to remove them. And now all of the elo_columns have been removed. Or have they?

02:26 They haven’t: nba_copy still contains them. This is because, by default, .drop() returns a modified copy and leaves the original DataFrame untouched. You could reassign the result to the original or use the inplace keyword argument to handle it all in one step. Now nba_copy only has 21 columns.
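Here is a sketch of both fixes on a hypothetical stand-in DataFrame. The data, the shortened elo_columns list, and the name dropped are assumptions for illustration, so the column counts differ from the real dataset.

```python
import pandas as pd

nba_copy = pd.DataFrame(
    {"pts": [66, 68], "elo_i": [1300.0, 1300.0], "elo_n": [1293.3, 1306.7]}
)
elo_columns = ["elo_i", "elo_n"]

# axis=1 targets the columns, but the result is a new DataFrame.
dropped = nba_copy.drop(elo_columns, axis=1)
print(list(dropped.columns))   # ['pts']
print(list(nba_copy.columns))  # ['pts', 'elo_i', 'elo_n']

# Option 1 would be to reassign:
#     nba_copy = nba_copy.drop(elo_columns, axis=1)
# Option 2 modifies nba_copy directly:
nba_copy.drop(elo_columns, axis=1, inplace=True)
print(list(nba_copy.columns))  # ['pts']
```

The columns keyword argument, as in nba_copy.drop(columns=elo_columns), is an equivalent spelling that avoids having to remember the axis number.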

02:43 In the next lesson, you’ll learn more about using specific data types.

Thomas on Aug. 5, 2021

Hi Douglas,

First of all: very interesting course.

I have a question: why do you decide at this point in time to work with a copy of your DataFrame?

Is this because now you really start to make changes to the DataFrame? But if that is the case, why exactly would you do that? You can also easily re-run your notebook load and work with the original data, right?

Kind regards,

Thomas