Creating Columns With Arithmetic Operations and NumPy
You can apply basic arithmetic operations such as addition, subtraction, multiplication, and division to pandas
DataFrame objects in pretty much the same way as you would with NumPy arrays. So, for example, with a NumPy array, you could take a column and multiply the whole column by
2, and you could do the same thing with a DataFrame.
So here, we’re taking two columns, and these are going to be two pandas
Series objects. They’re going to have the same index, and so pandas will just know how to match up the indices and add up the corresponding elements.
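Here's a minimal sketch of that idea. The scores and the index labels (101, 102, 103) are made up for illustration and aren't the data from the lesson. It shows multiplying a whole column by 2 and adding two columns, with pandas matching elements by index label rather than by position.

```python
import pandas as pd

# Hypothetical scores; the index labels are invented for this sketch.
df = pd.DataFrame(
    {"py-score": [80.0, 90.0, 70.0], "js-score": [70.0, 60.0, 90.0]},
    index=[101, 102, 103],
)

# Scalar arithmetic applies to every element, just like a NumPy array.
doubled = df["py-score"] * 2

# Adding two columns adds the elements that share an index label.
combined = df["py-score"] + df["js-score"]

# Even if one Series is in a different order, pandas aligns on the labels.
reversed_js = df["js-score"].iloc[::-1]
combined_by_label = df["py-score"] + reversed_js

print(doubled)
print(combined)
print(combined_by_label.sort_index())
```

The last sum gives the same value for each label as `combined`, because the match is done on the index and not on the row position.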
So, for example, let’s suppose that I take the
'js-score' and the
'py-score', and I also want to add up the
'django-score'. Maybe here, the idea is that we want to find some sort of average, and maybe the
py-score—that’s going to be worth, say, 40% of the average.
And then maybe the other two scores are going to be worth
0.3, so we can bring in those numbers and these multiplication operations. This is going to give us a new Series. And maybe what we want to do is save the Series as a column in our DataFrame, and that would give us a total score for our job candidates.
Let’s create a new column, call it
'total', and we’ll create it using this arithmetic operation. So let’s run that, and then let’s take a look at our DataFrame, and so now we’ve got this sort of total score based on all of the columns in the DataFrame relating to the score for each of the candidates.
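As a rough sketch, the weighted total could look like this. The column names and the 0.4/0.3/0.3 weights come from the lesson, but the score values and index labels here are invented for illustration.

```python
import pandas as pd

# Hypothetical candidate scores; only the column names follow the lesson.
df = pd.DataFrame(
    {
        "py-score": [80.0, 90.0, 70.0],
        "django-score": [90.0, 80.0, 60.0],
        "js-score": [70.0, 60.0, 90.0],
    },
    index=[101, 102, 103],
)

# py-score counts for 40%, the other two scores for 30% each.
df["total"] = (
    0.4 * df["py-score"] + 0.3 * df["django-score"] + 0.3 * df["js-score"]
)

print(df)
```

Assigning to `df["total"]` creates the column if it doesn't exist yet, and overwrites it if it does.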
Now, in addition to using just the basic arithmetic operations, you can also apply most NumPy and SciPy routines to pandas
DataFrame objects. So, let me show you another way that we could have done this.
I’m going to create a pandas
Series object, and I’m going to call it
wgts (weights). This is going to be basically keeping track of the weights of the individual tests, and that’ll give us another way to compute this total.
Then, notice that the index of this Series object is exactly the same as the column labels that we want to work with. So what we can easily do is pull out of the DataFrame the columns that we want, and these columns are the ones from the index of the wgts Series.
And then if we just multiply this by the
wgts Series, pandas knows that what we want to do here is take the
py-score value in the
Series object and multiply the column with the
py-score values, and similarly for the
django-score and the
js-score. And so this creates a DataFrame.
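A small sketch of this step, again with made-up scores: the `wgts` name and the column labels follow the lesson, everything else is an assumption. Multiplying a DataFrame by a Series matches the Series index against the DataFrame's column labels, so each column gets scaled by its own weight.

```python
import pandas as pd

# Hypothetical candidate scores; only the column names follow the lesson.
df = pd.DataFrame(
    {
        "py-score": [80.0, 90.0, 70.0],
        "django-score": [90.0, 80.0, 60.0],
        "js-score": [70.0, 60.0, 90.0],
    },
    index=[101, 102, 103],
)

# The index of wgts is the same as the column labels we want to weight.
wgts = pd.Series(
    data=[0.4, 0.3, 0.3],
    index=["py-score", "django-score", "js-score"],
)

# Select the columns named in wgts.index, then scale each column by its weight.
weighted = df[wgts.index] * wgts

print(weighted)
```

The result is still a DataFrame with the same rows and columns, just with every score multiplied by the weight for its column.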
Then what we want to do is use the
sum() function in NumPy. Maybe we should import
numpy first, so let’s go
import numpy as np. In here, what we want to do is we want to take the
So in other words, we fix a column, and this is going to add up along the rows once we fix a column. So let me just show you that. We got that. Let me move this over here so that we’re not getting this exact same line.
So you’re basically saying “Sum along the columns,” right? We want to fix a row, sum it along the columns. That gives us, then, the total score in another way. And if we compare that over here, we’ve got
Xavier, and we’ve got
67, and so on, and that’s exactly what we are getting over here.
So, this would give us another way to define, or to create, that
total column in the DataFrame by combining the fact that we can multiply
Series objects with
DataFrame objects and use any of the NumPy basic routines on DataFrames.
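Putting the second approach together might look something like the sketch below. The data is hypothetical, and passing `axis=0` and `axis=1` to `np.sum()` is my reading of the two cases described above, so treat it as an illustration and not as the exact code from the video.

```python
import numpy as np
import pandas as pd

# Hypothetical candidate scores; only the column names follow the lesson.
df = pd.DataFrame(
    {
        "py-score": [80.0, 90.0, 70.0],
        "django-score": [90.0, 80.0, 60.0],
        "js-score": [70.0, 60.0, 90.0],
    },
    index=[101, 102, 103],
)
wgts = pd.Series(
    data=[0.4, 0.3, 0.3],
    index=["py-score", "django-score", "js-score"],
)
weighted = df[wgts.index] * wgts

# axis=0: fix a column and add up down the rows (one value per column).
per_column = np.sum(weighted, axis=0)

# axis=1: fix a row and add up across the columns (one value per candidate).
df["total"] = np.sum(weighted, axis=1)

# The same total written out with plain arithmetic, for comparison.
explicit = (
    0.4 * df["py-score"] + 0.3 * df["django-score"] + 0.3 * df["js-score"]
)

print(per_column)
print(df)
print(np.allclose(df["total"], explicit))
```

Here `np.sum()` returns a pandas Series in both cases, so the `axis=1` result lines up with the DataFrame's index and can be assigned directly as a new column.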
All right! So, these are a few of the many things that you can do in pandas by combining basic arithmetic operations and some of the built-in NumPy routines on pandas Series and pandas DataFrames to use them to possibly create new columns in your DataFrame. All right, up next, we’ll take a look at sorting a pandas DataFrame.