In this lesson, you’ll be using
.assign() to carry out a cleaning operation on your university towns data. The
.assign() method is called on a DataFrame, and it’s used to assign new columns and overwrite existing ones.
It uses keyword arguments named after the columns, so if you have a column called
state, then within the
.assign() call, you can use a keyword argument called
state= and pass it a new column. You can also use a name that doesn’t exist yet, and that will create a new column.
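To make the keyword-argument idea concrete, here is a minimal sketch. The `towns` DataFrame, its values, and the `country` column are made up for illustration; only the `state` column name comes from the lesson.

```python
import pandas as pd

# Hypothetical toy data: the values are invented for this sketch.
towns = pd.DataFrame(
    {"state": ["Alabama", "Alaska"], "town": ["Auburn", "Fairbanks"]}
)

# Existing name: the "state" column is overwritten in the returned DataFrame.
overwritten = towns.assign(state=["AL", "AK"])

# New name: a "country" column (hypothetical) is appended instead.
extended = towns.assign(country=["USA", "USA"])

print(overwritten)
print(extended)
print(towns)  # the original is unchanged; .assign() returns a new DataFrame
```

Because `.assign()` returns a new DataFrame rather than modifying the original, it fits well into a chain of method calls.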
You can also use a function or a lambda function to transform the data. To understand
.assign() a bit better, take a look at this very simplified DataFrame. It’s very similar to what was used before, except this time it’s instantiated with a dictionary.
The keys represent the column names, and the values are lists, which represent the values that end up in the table. So you can call
.assign() on the DataFrame directly, and then you can use a keyword argument that is the same as the column name.
And this will tell
.assign() that you want to replace
day with whatever is here. So you can simply pass in a list (say we’ll make it
29), and that will return a DataFrame with the
day replaced with those values.
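The replacement described above might look something like the following sketch. The name `data` and the `day` column come from the lesson; the `month` column and all the values are assumptions for illustration. The second call shows the lambda form mentioned earlier.

```python
import pandas as pd

# Dictionary keys become column names, the lists become the column values.
# Only "day" and the name `data` come from the lesson; the rest is made up.
data = pd.DataFrame({"day": [1, 2, 3], "month": [6, 6, 6]})

# Replace "day" with an explicit list (one value per row).
with_list = data.assign(day=[29, 29, 29])

# Same keyword, but with a lambda: .assign() calls it with the DataFrame,
# and whatever it returns becomes the new "day" column.
# The parameter name `df` is only a convention.
with_lambda = data.assign(day=lambda df: df["day"] + 28)

print(with_list)
print(with_lambda)
```

The lambda form is the one that matters in a method chain, because it sees the DataFrame as it is at that point in the chain rather than the original variable.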
It’s important to understand what’s going on with this statement, so we’re going to break it down. You’ve got your DataFrame here,
data, and you’re calling the
.assign() method on it. And to the .assign() method, you’re passing a keyword argument,
day, and then you’re passing in a lambda function. The parameter
df here can be called whatever you like, but here it’s called
df because it stands for
DataFrame, and it’s because the
data DataFrame is passed in to the
lambda function as its argument. Back to the example … you’ve got the
lambda function that’s getting the
You just have to be sure that it accepts a DataFrame. All right. So back to the data … you’ve got everything here and you’re going to chain on a
.assign() call here, and you’re going to be working with state and
town (those are the columns), and within them, you’re going to use lambda functions, this one selecting "state"
and this one
"town". So right now, all that’s happening is that it’s assigning the
state column to the result of this
lambda function. The
lambda function is passed the DataFrame—the whole DataFrame, as it is at this stage—and then within the function, it selects the
"state" column and returns that to
You’re trying to remove the suffix, and in this case, the suffix is the
"" in square brackets string. You just pass it a string, a normal string. No regex magic in here, just a normal string.
So we can put in
.removesuffix(), and we’ll pass in the string that we want to remove, then run this and see if it works. So now press up (↑) to get the last command, Control + Enter to run this … and good, it seems that none of them contain
 anymore. Perfect.
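As a rough sketch of that step: the suffix `"[note]"` and the rows below are stand-ins invented for this example, not the actual values from the university towns data. Note that `.str.removesuffix()` needs pandas 1.4 or later.

```python
import pandas as pd

# Hypothetical rows; "[note]" is a placeholder suffix, not the one in the real data.
data = pd.DataFrame(
    {
        "state": ["Alabama[note]", "Alaska[note]"],
        "town": ["Auburn", "Fairbanks"],
    }
)

cleaned = data.assign(
    # The lambda receives the whole DataFrame, selects "state",
    # and returns the column with the plain-string suffix removed.
    state=lambda df: df["state"].str.removesuffix("[note]")
)

# Check whether any value still contains the suffix.
# regex=False makes this a plain substring test, so the brackets are literal.
still_there = cleaned["state"].str.contains("[note]", regex=False).any()
print(cleaned)
print(still_there)
```

Since `.removesuffix()` takes a plain string, the square brackets need no escaping, which is the "no regex magic" point made above.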
Okay, so what is this doing? The
r is just a way to say that this string is a raw string, so that it will interpret all these characters literally. Raw strings are usually used for regex. Here, you’re starting your first capture group in the regex with these opening and closing parentheses.
This is the first capture group, and it’s the only capture group that is defined, so this is the only thing that will be returned. Inside it, you’re matching any character, any number of times. Then there’s going to be a space, and then you’re matching a literal opening bracket, (.
If you look back at the data here, every town is followed by a space and an opening bracket, so that will be able to extract that. And then the
.extract() will return the first capture group. Note that it comes back as a Series only if you pass expand=False. By default, you get a one-column DataFrame.
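Here is a small sketch of the extraction. The exact pattern from the lesson isn't reproduced here, so `pattern` below is an assumption reconstructed from the description (one capture group, then a space and a literal opening bracket), and the rows are invented.

```python
import pandas as pd

# Hypothetical rows: each town is followed by a space and an opening bracket.
data = pd.DataFrame(
    {
        "town": [
            "Auburn (Auburn University)",
            "Fairbanks (University of Alaska Fairbanks)",
        ]
    }
)

# Assumed pattern: capture any characters, then require " (" right after them.
# The backslash escapes the bracket so it is matched literally.
pattern = r"(.*) \("

towns = data.assign(
    # expand=False returns a Series when there is a single capture group.
    town=lambda df: df["town"].str.extract(pattern, expand=False)
)

print(towns)
```

One thing to keep in mind with this assumed pattern: `.*` is greedy, so a value with more than one " (" would be captured up to the last one.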
In this lesson, you’ve used the .assign() method to carry out some more complex cleaning operations on your dataset. In the next lesson, you’ll be moving on to the third and final dataset of this course, the books dataset.