Deleting and Inserting Rows in a DataFrame
00:00 pandas provides several convenient ways to delete and insert rows and columns in a DataFrame. Let’s first work with rows.
00:09
Now, we’ve seen before that a row in a DataFrame is a pandas Series object. So for example, let’s take a look at the row with label 16.
00:20
This is the last row in our DataFrame, and this is a Series object, a pandas Series object. And the same thing goes for a column, for example, the age column. That’s also a pandas Series object.
00:35
Now, a Series object is like a NumPy array that has labels, access labels. These are called, as we know, the index. So for example, if I take a look at the last row, the index is going to be the columns of the DataFrame from which we extracted the Series object. In this case, it’s name, city, age, and py-score. Also notice that it’s got this Name attribute, which in this case is 16, which corresponds to the row label in the original DataFrame.
01:07
So for example, if I take a look at .name here, we’re going to get 16. And if we do a similar thing with the column, say, for example, 'age',
01:17
in this case, the .name is going to be 'age'. And if we see the whole thing, in this case, the index is going to be 10 through 16.
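The following is a minimal sketch of what this part of the lesson inspects. The DataFrame is a stand-in built for illustration: the column names and the row labels 10 through 16 come from the lesson, but the cell values are placeholders, not the course's actual data.

```python
import pandas as pd

# Placeholder DataFrame: column names and labels 10..16 follow the lesson,
# the values themselves are made up for illustration.
df = pd.DataFrame(
    {
        "name": ["Xavier", "Ann", "Jana", "Yi", "Robin", "Amal", "Nori"],
        "city": ["Mexico City", "Toronto", "Prague", "Shanghai",
                 "Manchester", "Cairo", "Osaka"],
        "age": [41, 28, 33, 34, 38, 31, 37],
        "py-score": [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0],
    },
    index=range(10, 17),
)

row = df.loc[16]   # a single row comes back as a Series
col = df["age"]    # a single column comes back as a Series too

print(type(row).__name__, type(col).__name__)
print(list(row.index))  # the row's index holds the DataFrame's column labels
print(row.name)         # the row's name is its label in the DataFrame
print(col.name)         # the column's name is the column label
print(list(col.index))  # the column's index holds the row labels
```

Seen this way, a row and a column are the same kind of object. Only the roles of the index and the name are swapped.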
01:26
So, a way to think about what a pandas DataFrame is, is just a collection of Series objects, whether you view them as rows or whether you view them as columns.
01:35
This gives us a way to think about how we can add either rows or columns to a DataFrame. So, for example, let’s suppose we wanted to add a new row corresponding to a new job candidate, and let’s call this Series john. To create a Series object, we call the constructor Series() from the pandas module.
02:01
This is going to take a couple of keyword arguments. First, data: this is going to be the actual data representing this new row, so we need a name, a city, an age, and, of course, a py-score.
02:15
So we’re going to give this name, again, 'John'. John’s from 'Boston'.
02:22
John is 34 and he scored 79 in his Python test.
02:27 Now, we also want to give this Series an index. And in this case, because we’re creating a new row that we’re going to add to the DataFrame, we want the index to be the same as the column labels of the DataFrame, just as we saw when we pulled out a row.
02:44
So we’re going to pass in for the keyword argument index the column labels of our DataFrame, which we know we can access via the .columns attribute.
02:53
And then we’re going to pass in a new name, and the name is going to be the row label that we want to associate with this new row that we’re going to be adding, which, in this case, we’re going to put as 17. All right, so that creates a new Series object. Let’s go ahead and run that.
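As a sketch, the constructor call described above could look like this. The two-row DataFrame is a placeholder so the snippet runs on its own. Only the column names, John's values, and the label 17 come from the lesson.

```python
import pandas as pd

# Small placeholder DataFrame with the lesson's column names (values made up).
df = pd.DataFrame(
    {
        "name": ["Amal", "Nori"],
        "city": ["Cairo", "Osaka"],
        "age": [31, 37],
        "py-score": [61.0, 84.0],
    },
    index=[15, 16],
)

# data  -> the values for the new row, in column order
# index -> the DataFrame's column labels (.columns is an attribute, not a method)
# name  -> the row label the new row should get
john = pd.Series(data=["John", "Boston", 34, 79], index=df.columns, name=17)
print(john)
```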
03:10
And now what we’ll do is add this using the .append() method. So on the DataFrame, we’ve got the .append() method. A useful way to think about this is as appending an element to a list.
03:23
In this case, we’re appending a row, and this is going to be a Series object, and we want to append the john Series object that we just created. Now, this is going to append this new row to the DataFrame, but it’s going to return a new DataFrame.
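DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the call described above fails on current versions. A minimal sketch of the same step with pd.concat(), again on a placeholder DataFrame, might look like this. The name new_df is chosen here for illustration.

```python
import pandas as pd

# Placeholder DataFrame with the lesson's column names (values made up).
df = pd.DataFrame(
    {
        "name": ["Amal", "Nori"],
        "city": ["Cairo", "Osaka"],
        "age": [31, 37],
        "py-score": [61.0, 84.0],
    },
    index=[15, 16],
)
john = pd.Series(data=["John", "Boston", 34, 79], index=df.columns, name=17)

# to_frame() turns the Series into a one-column DataFrame; .T transposes it
# into a one-row DataFrame whose row label is the Series name (17).
new_df = pd.concat([df, john.to_frame().T])

print(new_df)
print(len(df))  # the original DataFrame is left unchanged
```

Like the old .append(), pd.concat() returns a new DataFrame rather than modifying df. One caveat: because the one-row frame holds mixed types, numeric columns in the result may end up with object dtype.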
03:38
We’re going to assign that new DataFrame to the same name, df, that we’ve been using all along. And so if we take a look at the DataFrame,
03:49
we’ve got that new row corresponding to label 17, with the information for the john candidate. Now to drop a row, we’re going to use the .drop() method.
03:59
So if we call .drop(), we pass in the labels of the rows we want to drop using the keyword argument labels, and this can be a list or it can be a single label. In this case, we’re going to drop the last row that we just added. And if we want, we’ll save this under the same name, df. What’s happening here is that .drop() is going to return a new DataFrame, and so we can simply reassign the value of df. So by default, the .drop() method returns the pandas DataFrame with the specified rows removed.
04:34
In this case, we’re just removing one, with label 17. But there’s also a keyword argument called inplace, which actually exists for many of the methods in pandas. The default value is False, and so if we were to pass a value of True, then the original DataFrame will be modified by removing that row, and the return value will be None.
04:58 So we can either reassign df, the way that we were just about to do it, or pass inplace=True. If we do it this way and then take a look at the DataFrame, we’ve removed that last row.
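The two ways of dropping a row can be compared side by side in a short sketch. The DataFrame below is a placeholder, and the names trimmed and result are chosen here for illustration.

```python
import pandas as pd

# Placeholder DataFrame that already contains the row with label 17.
df = pd.DataFrame(
    {
        "name": ["Amal", "Nori", "John"],
        "city": ["Cairo", "Osaka", "Boston"],
        "age": [31, 37, 34],
        "py-score": [61.0, 84.0, 79.0],
    },
    index=[15, 16, 17],
)

# Default (inplace=False): returns a new DataFrame and leaves df untouched.
trimmed = df.drop(labels=[17])
print(list(trimmed.index), list(df.index))

# inplace=True: modifies df itself and returns None.
result = df.drop(labels=17, inplace=True)
print(result, list(df.index))
```

With inplace=True there is nothing useful to assign, so writing df = df.drop(labels=17, inplace=True) would leave df set to None.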
05:10 All right! So, in this lesson, we’ve seen how we can delete and insert rows in a DataFrame. In the next lesson, let’s see how we can do this with columns.
Fuxuan Jia on March 13, 2023
py:1: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead. df.append(John)
Martin Breuss RP Team on March 13, 2023
@Orlando Borden and @Fuxuan Jia that’s correct, with (and against) a lot of commentary, the core team has decided to deprecate .append().
You can learn how to use .concat() in our course on Combining Data in pandas With concat() and merge().
Dina on July 2, 2023
From Stack Overflow I found this: Use loc to add to a pandas DataFrame: df.loc[17] = ['John', 'Boston', 34, 79]
samjeezy on Feb. 8, 2024
Thank you Dina! That df.loc[17] method worked like a charm.
bhchurch6 on Oct. 4, 2024
It would have been helpful to show how to insert rows into a DataFrame rather than just add them to the end of the DataFrame. I did figure out how to use the to_frame() function in conjunction with the concat() function to replace the deprecated append() function.
Lahiru Ramesh on Nov. 13, 2024
We can use this way as per the documentation.
john = pd.Series(data=['John', 'Boston', 24, 20], index=df.columns, name=7)
pd.concat([df, john.to_frame().T])
Orlando Borden on April 24, 2022
FYI, the append method is being deprecated.
“The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.”