Deleting and Inserting Columns in a DataFrame

Cesar Aguilar

The pandas DataFrame: Working With Data Efficiently Cesar Aguilar 05:16

Transcript

00:00 Now let’s go over how you would insert and delete columns. The way to do it in pandas is to follow the same idea as when you add or remove an item in a dictionary. So, for example, if we want to add a new column that contains, say, the scores from a JavaScript test, then we would simply create this new column by passing in the name of the column

00:24 and then the data for that column. So in this case, it’s going to be an array that contains the scores for some JavaScript test. This can either be a list or a NumPy array, so let’s just put in some values here. You can really put anything here, just as long as there are seven scores that we’re putting in, because we’ve got seven rows. And then in the same cell, we’ll just call the DataFrame so that we can see it after we’ve added that new column. And there we go, we’ve got that new column. Now, maybe another way to do it would also be to specify a single value, and that single value will be used to fill every row in a new column that you’re adding.
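As a minimal sketch of the dictionary-style assignment described above: the DataFrame below is a hypothetical stand-in for the one used in the lesson, and all names and score values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's seven-row DataFrame.
df = pd.DataFrame({
    "name": ["Xavier", "Ann", "Jana", "Yi", "Robin", "Amal", "Nori"],
    "city": ["Mexico City", "Toronto", "Prague", "Shanghai",
             "Manchester", "Cairo", "Osaka"],
    "age": [41, 28, 33, 34, 38, 31, 37],
})

# Assigning to a new key creates a new column, just like a dictionary.
# The list must hold one value per row (seven here).
df["js-score"] = [71.0, 95.0, 88.0, 79.0, 91.0, 91.0, 80.0]

print(df)
```

If the list has a different number of values than the DataFrame has rows, pandas raises a ValueError instead of adding the column.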

01:05 So for example, let’s suppose that we later wanted to add, say, some sort of total score column to the DataFrame.

01:12 Initially, we’ll set all values equal to zero, so we’ll just pass in a 0.0 single value. And then let’s take a look at the DataFrame after we do that.

01:22 We get a new column labeled total-score, all the values set to 0.0. Now notice that when we add a new column, by default, it gets placed at the very end.
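Here is a small sketch of filling a new column with a single value. The DataFrame is again a hypothetical stand-in with made-up data; only the 'total-score' label and the 0.0 value come from the lesson.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's DataFrame.
df = pd.DataFrame({
    "name": ["Xavier", "Ann", "Jana", "Yi", "Robin", "Amal", "Nori"],
    "age": [41, 28, 33, 34, 38, 31, 37],
})

# A single value is repeated for every row of the new column.
df["total-score"] = 0.0

# The new column is appended at the far right.
print(df.columns[-1])
print(df)
```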

01:34 So we add a new column at the very right of the DataFrame. And if it’s important to you to add a column at a particular location, then you can use the .insert() method. So, df.insert().

01:47 We need to pass in the location where we want to add the column. Again, this is all zero-based indexing, so the name column is 0, city is 1, age is 2, and so on. Let’s suppose we wanted to add this at position 4, so we’ll pass in a value of 4 for the loc keyword argument.

02:07 And we want to name this column, say, 'django-score',

02:13 and we’ll pass in the values. Let me just grab the list that we used up here to put in the js-score,

02:21 and we’ll use that same list for the values for the django-score.

02:27 Let’s also call the DataFrame so that we see it when we add it. So notice, 0, 1, 2, 3, 4 is the position of where this new column was added. Now let’s talk about deleting the columns.
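The .insert() call described above could look like the following sketch. The column labels 'js-score', 'total-score', and 'django-score' come from the lesson; the 'py-score' column and all of the data values are assumptions made for illustration.

```python
import pandas as pd

# Made-up scores, reused for both js-score and django-score as in the lesson.
scores = [71.0, 95.0, 88.0, 79.0, 91.0, 91.0, 80.0]

# Hypothetical stand-in for the lesson's DataFrame at this point.
df = pd.DataFrame({
    "name": ["Xavier", "Ann", "Jana", "Yi", "Robin", "Amal", "Nori"],
    "city": ["Mexico City", "Toronto", "Prague", "Shanghai",
             "Manchester", "Cairo", "Osaka"],
    "age": [41, 28, 33, 34, 38, 31, 37],
    "py-score": [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0],  # assumed column
    "js-score": scores,
})
df["total-score"] = 0.0

# Insert at zero-based position 4; later columns shift one place right.
# .insert() modifies df in place and returns None.
df.insert(loc=4, column="django-score", value=scores)

print(list(df.columns))
```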

02:41 Let’s suppose we wanted to delete that total-score column. We’re going to be using similar notation as you would with a Python dictionary, so we can use del (delete), and then the name of the column—in this case, 'total-score'.

02:56 And then again, let’s call the DataFrame after we do that. Now, total-score is gone.
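A minimal sketch of the del statement on a column, using a hypothetical DataFrame with made-up data:

```python
import pandas as pd

# Hypothetical stand-in for the lesson's DataFrame.
df = pd.DataFrame({
    "name": ["Xavier", "Ann", "Jana", "Yi", "Robin", "Amal", "Nori"],
    "age": [41, 28, 33, 34, 38, 31, 37],
})
df["total-score"] = 0.0

# Same syntax as deleting a key from a dictionary; df changes in place.
del df["total-score"]

print(list(df.columns))
```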

03:02 Another way to do this is to use the .pop() method. The .pop() method will delete the column that you want, and it will also return it, very similar to how .pop() works with a list or a dictionary.

03:15 Let’s add the total-score column again, set it all equal to 0.0,

03:24 and then let’s delete it. This time, we’ll delete it with the .pop() method, and this will be 'total-score'. And then, so you can see what this will return, let’s just save this with, say, the name total_score,

03:39 and then we’ll go ahead and view total_score. There we go. We’ve removed total-score and we saved it in a Series object.
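The .pop() steps above can be sketched like this. The DataFrame is a hypothetical stand-in with made-up data; the variable name total_score follows the lesson.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's DataFrame.
df = pd.DataFrame({
    "name": ["Xavier", "Ann", "Jana", "Yi", "Robin", "Amal", "Nori"],
    "age": [41, 28, 33, 34, 38, 31, 37],
})
df["total-score"] = 0.0

# .pop() removes the column from df and returns it as a Series.
total_score = df.pop("total-score")

print(type(total_score))
print(total_score)
print(list(df.columns))
```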

03:49 Let’s take a look at our current version of the DataFrame, which has the total-score column removed. Now, if you wanted to remove more than one column, you can use the .drop() method just like we did with the rows, but in this case, you would have to pass a value of 1 to the axis keyword argument.

04:08 So, with .drop(), suppose we want to remove the 'age' column, so we pass in 'age'. By default, this is going to be looking for the row with label 'age', and of course, our DataFrame doesn’t have such a row.

04:22 Instead, we want to tell pandas that this is a column label, so that we’re removing a column rather than a row, and so we pass in a value of 1. The default value is 0, which means that it’s going to be removing a row. In this case, we want to remove a column, so we just pass in axis=1. And again, we can either use inplace=True to modify the DataFrame directly, or if we don’t, this will return a new DataFrame.

04:46 Maybe we’ll assign the result back to df, and then we’ll view the value of df after we do that operation. And so now we’ve removed the age column. Again, if you wanted to remove more than one, you would just pass the labels in as a list, and so on. All right!
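Here is a sketch of the .drop() behavior described above, on a hypothetical DataFrame with made-up data. The 'py-score' column is an assumption. The last call uses the columns keyword, which is an alternative to axis=1 that the lesson does not mention.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's DataFrame.
df = pd.DataFrame({
    "name": ["Xavier", "Ann", "Jana", "Yi", "Robin", "Amal", "Nori"],
    "city": ["Mexico City", "Toronto", "Prague", "Shanghai",
             "Manchester", "Cairo", "Osaka"],
    "age": [41, 28, 33, 34, 38, 31, 37],
    "py-score": [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0],  # assumed column
})

# With the default axis=0, pandas looks for a row labeled 'age' and fails.
try:
    df.drop("age")
except KeyError as err:
    print("No row with that label:", err)

# axis=1 tells pandas the label is a column; a new DataFrame is returned.
without_age = df.drop("age", axis=1)

# Pass a list of labels to remove several columns at once.
trimmed = df.drop(["city", "age"], axis=1)

# Equivalent spelling with the columns keyword.
same = df.drop(columns=["city", "age"])

print(list(without_age.columns))
print(list(trimmed.columns))
```

Because inplace is not used here, df itself still has all four columns after these calls.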

05:04 So that’s a rundown on inserting and deleting data in a DataFrame. Next up, we’ll talk about applying arithmetic operations on pandas Series and DataFrame objects.

macro84 on Nov. 9, 2021

I know this is not a help forum, so I'm not expecting anything, but if someone can answer this, great! I was hoping this tutorial would help me answer my question, but it did not. I cannot figure this one out. I have googled so many websites and followed so many videos, but I am simply stuck. I am sure the solution is simple. I have a DataFrame whose output is as follows (the date is the index):

Date         col1   col2   col3   col4   col5
1959-01-01   NaN    NaN    1.35   4.21   NaN
1959-02-01   NaN    NaN    2.14   6.30   5.75
1959-03-01   1.97   NaN    NaN    7.35   6.23
1959-04-01   2.19   3.14   NaN    NaN    7.15
1959-05-01   3.16   2.74   NaN    NaN    8.42
1959-06-01   2.91   3.63   NaN    NaN    8.36
1959-07-01   2.72   4.98   NaN    NaN    NaN

I want to delete columns that have NaN between the dates 1959-03-01 and 1959-06-01. I want the output to look like this:

Date         col1   col5
1959-01-01   NaN    NaN
1959-02-01   NaN    5.75
1959-03-01   1.97   6.23
1959-04-01   2.19   7.15
1959-05-01   3.16   8.42
1959-06-01   2.91   8.36
1959-07-01   2.72   NaN

Thanks for your help!

torrepreciado on Jan. 19, 2022

@macro84 I’ve come up with a couple of ideas for your question:

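import pandas as pd

# Assumes df is the DataFrame from the question, indexed by timestamps.
min_date = pd.Timestamp(year=1959, month=3, day=1)
max_date = pd.Timestamp(year=1959, month=6, day=1)

# 1st way: drop columns that have a NaN in any row inside the date range
index_dates = df.index[[min_date <= x <= max_date for x in df.index]]
df.dropna(axis=1, subset=index_dates, how='any', inplace=True)

# 2nd way: build a Boolean mask of the columns to drop, then keep the rest
index_dates = [min_date <= x <= max_date for x in df.index]
mask_na = df.loc[index_dates].isna().any()
df = df.loc[:, ~mask_na]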
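To try this end to end, here is a self-contained sketch that rebuilds the sample data from the question and applies both ideas. It assumes the index is a DatetimeIndex of month-start dates, and it selects the rows with label slicing rather than the list comprehension above, which is a swapped-in variant and not the original code.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Sample data from the question; a month-start DatetimeIndex is assumed.
df = pd.DataFrame(
    {
        "col1": [nan, nan, 1.97, 2.19, 3.16, 2.91, 2.72],
        "col2": [nan, nan, nan, 3.14, 2.74, 3.63, 4.98],
        "col3": [1.35, 2.14, nan, nan, nan, nan, nan],
        "col4": [4.21, 6.30, 7.35, nan, nan, nan, nan],
        "col5": [nan, 5.75, 6.23, 7.15, 8.42, 8.36, nan],
    },
    index=pd.date_range("1959-01-01", periods=7, freq="MS", name="Date"),
)

# Rows inside the date range (label slicing includes both endpoints).
window = df.loc["1959-03-01":"1959-06-01"]

# Mask approach: flag columns with any NaN in the window, keep the others.
has_nan = window.isna().any()
result = df.loc[:, ~has_nan]

# dropna approach: with axis=1, subset names the row labels to inspect.
result_dropna = df.dropna(axis=1, subset=window.index, how="any")

print(result)
```

Both results keep only col1 and col5, with all seven rows intact, which matches the output requested in the question.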
