Grouping the Data to Calculate Final Scores
00:00 To compute the final grade, let’s first define the weights that we’ll use in the final computation. We’ll use a Series for this, and we’re going to name the keys in the dictionary that we’ll pass in.
00:17 These are going to serve as the index for the Series. We want to name them exactly as we’ve named the columns in the DataFrame that store the exam scores, the homework score, and the quiz score.
00:31 So we’ll have "Exam 1 Score", and the percentage there was 5%. Then we’ll also have "Exam 2 Score", and in this case, it was 10%. Then "Exam 3 Score", this was 15%.
00:57 Then the "Quiz Score", this was going to be 30%. And then, finally, the "Homework Score".
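As a rough sketch, the weights described above could be stored like this. The name weightings is hypothetical, the 30% is assumed to belong to the quiz score column, and the homework weight of 0.40 is an assumption chosen only so that the weights sum to 1.

```python
import pandas as pd

# Keys must match the score column names in the DataFrame exactly,
# because they become the index of the Series.
# ASSUMPTION: 0.30 is the quiz weight, and 0.40 for homework is not
# stated in the lesson text; it is chosen so the weights sum to 1.
weightings = pd.Series(
    {
        "Exam 1 Score": 0.05,
        "Exam 2 Score": 0.10,
        "Exam 3 Score": 0.15,
        "Quiz Score": 0.30,
        "Homework Score": 0.40,
    }
)

print(weightings)
```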
01:06 And so now what we need to do is from the main DataFrame pull out those five columns, multiply them by the weights—each column—and then just add up the columns. All right, so let’s run that.
01:22 And what we’re saying is from the final DataFrame, let’s pull out the index associated with the weights,
01:32 because these are the columns that we need. Exam 1, 2, 3, quiz score, and homework score. Then we simply multiply them by the weights, and so these are just going to be these numbers that are less than 1.
01:48 And we want to sum each row, and so let’s sum with axis=1. And that would be it! So this would give us the final score for each individual student.
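Here is a minimal sketch of that weighted sum on made-up data. The names final_data and weightings, the two toy students, and the 0.40 homework weight are all assumptions for illustration. The multiplication lines up the index of the Series with the column labels of the DataFrame, which is why the names have to match.

```python
import pandas as pd

# ASSUMPTION: weights as in the lesson, with homework set to 0.40
# so that everything sums to 1.
weightings = pd.Series(
    {
        "Exam 1 Score": 0.05,
        "Exam 2 Score": 0.10,
        "Exam 3 Score": 0.15,
        "Quiz Score": 0.30,
        "Homework Score": 0.40,
    }
)

# Toy data: every score is stored as a fraction between 0 and 1.
final_data = pd.DataFrame(
    {
        "Name": ["Student A", "Student B"],
        "Exam 1 Score": [0.80, 0.60],
        "Exam 2 Score": [0.90, 0.60],
        "Exam 3 Score": [0.70, 0.60],
        "Quiz Score": [1.00, 0.60],
        "Homework Score": [0.50, 0.60],
    }
)

# Select only the weighted columns, scale each column by its weight,
# then add across each row (axis=1) to get one number per student.
weighted = final_data[weightings.index] * weightings
final_data["Final Score"] = weighted.sum(axis=1)

print(final_data[["Name", "Final Score"]])
```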
02:05 Let’s store this in the final DataFrame. We’ll introduce a new column and this will simply be, say, 'Final Score'. All right. Now, likely we’re going to want to input these scores as a whole number and not a decimal number, not something less than one.
02:26 So what we’ll do is we’ll take this 'Final Score' column, multiply it by 100 (so we’ll take each of those scores and multiply by 100), and then finally we get to use the NumPy module.
02:41 We’ll be rounding those up, and so we’ll have 75.0, 80.0, and so on. Let’s call this the 'Ceiling Score'.
02:54 This is the rounded score.
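The rounding step might look like the sketch below. It assumes the NumPy function meant here is np.ceil, which fits the 'Ceiling Score' name, and it uses a toy final_data with invented scores.

```python
import numpy as np
import pandas as pd

# Toy final scores as fractions less than one (values are invented).
final_data = pd.DataFrame({"Final Score": [0.742, 0.80, 0.915]})

# Scale to a 0-100 range, then round every value up to the next whole number.
# np.ceil keeps the float dtype, so the results look like 75.0, 80.0, ...
final_data["Ceiling Score"] = np.ceil(final_data["Final Score"] * 100)

print(final_data)
```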
02:58 Now, it’s likely that once we have these ceiling scores, or these rounded scores, we need to compute a letter grade, which will actually be the final data that we’re going to have to input in some sort of system at, say, the registrar’s office.
03:12 What we’ll do is we’ll simply define a function that will compute the letter grade based on the actual numerical grade. Let’s define a utility function. We’ll call it, say, get_letter_grade().
03:27 It’ll take in a score or a grade and then simply if the score is greater than or equal to, say, 90, then this is going to be an 'A'.
03:40 If it’s greater than or equal to, say, 80, then we’ll return a 'B',
03:49 and then we’ll sort of keep doing this filtering down of the score. If it’s greater than or equal to 70, then this will be a 'C',
04:00 greater than or equal to 60, this will be a 'D', and then lastly, anything under 60 is going to be an 'F'.
04:12 Hopefully, we don’t have too many F’s in our course. All right, so score comes in. If it’s greater than or equal to 90, that’s an 'A'.
04:20 If it’s greater than or equal to 80, that’s a 'B', and so on all the way down to an 'F'. So, we’ll define this utility function.
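A function following the thresholds just described could be sketched like this. The cutoffs 90, 80, 70, and 60 come from the walkthrough; the exact body is an illustration and may differ from what is typed on screen.

```python
def get_letter_grade(score):
    """Return a letter grade for a numeric score on a 0-100 scale."""
    # The checks run from the highest cutoff down, so the first
    # matching branch decides the grade.
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    elif score >= 70:
        return "C"
    elif score >= 60:
        return "D"
    else:
        return "F"


print(get_letter_grade(85.0))
```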
04:27 Then in the 'Ceiling Score' column,
04:33 we’re going to use the .map() method. We can map this function over every cell in this Series. If we take a look at this, we’re going to be getting the letter grades for each individual student.
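To make the mapping step concrete, here is a small sketch that applies such a function to a toy 'Ceiling Score' Series. The scores are invented, and the function is repeated only so the example runs on its own.

```python
import pandas as pd


def get_letter_grade(score):
    """Return a letter grade for a numeric score on a 0-100 scale."""
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    elif score >= 70:
        return "C"
    elif score >= 60:
        return "D"
    return "F"


# Toy ceiling scores (invented values).
ceiling_scores = pd.Series([95.0, 80.0, 72.0, 59.0], name="Ceiling Score")

# .map() calls the function once per value and returns a new Series
# with the same index.
letter_grades = ceiling_scores.map(get_letter_grade)

print(letter_grades)
```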
04:54 Now, suppose we later want to plot or compare letter grades, say, in some sort of visualization of letter grades from 'F' to 'A'. We’d want these to be interpreted as being ordered, because, you know, usually we would think of 'A' as being the highest grade, 'B' as the second highest grade, and so on. So when we actually add this to our main DataFrame, we want to do this by adding a Categorical series. Let’s define sort of a temporary letter_grades Series. And then in our main DataFrame, we’re going to add one more column called 'Final Grade', and this is going to be a Categorical series.
05:46 The data for this is this newly computed letter_grades Series, and the categories are just the letters. The lowest grade is going to be 'F' and then the next is 'D', 'C', 'B', and 'A'.
06:04 Now, a category may not necessarily have an order. So, for example, if the data was on colors and the colors were maybe, you know, red, green, blue, and whatever, then these wouldn’t necessarily have some sort of order to them, although they possibly could depending on your application. But in this case, the letter grades are ordered. 'F' is usually considered the lowest grade, and then 'D', 'C', 'B', and 'A'.
06:31 There is a keyword argument called ordered, and the default is False, but we actually, in this case, want it to be True because we do want the values of this series to be interpreted as having an order.
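The ordered categorical column could be built roughly as follows. The toy final_data and the sample grades are assumptions; pd.Categorical with the categories and ordered arguments is the standard pandas constructor for this.

```python
import pandas as pd

# Toy stand-ins for the main DataFrame and the temporary Series.
final_data = pd.DataFrame({"Name": ["Ann", "Bo", "Cy", "Di"]})
letter_grades = pd.Series(["B", "A", "F", "C"])

# Categories are listed from lowest to highest; ordered=True tells
# pandas that this listing order is meaningful.
final_data["Final Grade"] = pd.Categorical(
    letter_grades,
    categories=["F", "D", "C", "B", "A"],
    ordered=True,
)

print(final_data["Final Grade"].dtype)

# Because the categories are ordered, comparisons and min/max work.
passed = final_data["Final Grade"] >= "C"
print(final_data["Final Grade"].min(), final_data["Final Grade"].max())
```

With ordered=False the same comparison would raise a TypeError, since pandas would have no way to rank one letter against another.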
06:47 So let’s run that, and then if we just take a look at the Series
06:54 with all of the grades, we see that we’ve got a data type as category, we’ve got all the letter grades, and the lowest grade is 'F', and then 'D', 'C', 'B', and 'A'. So for the purposes of computing the final grade for each student, this pretty much does it.
07:12 And then maybe one last thing that we want to do, you know, once we’ve computed the final grades, we’re probably going to have to upload the data somewhere.
07:22 We’ll have to do this, possibly, via section. We have all of these students and there are three sections, and so maybe the last thing that we want to do is to create CSV files containing the grades for each of the individual sections.
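One possible way to write a CSV file per section is sketched below. The 'Section' column name, the file naming pattern, and the temporary output directory are all hypothetical choices for illustration, not something shown in this lesson.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy data; the "Section" column name is an assumption.
final_data = pd.DataFrame(
    {
        "Name": ["Ann", "Bo", "Cy", "Di"],
        "Section": [1, 2, 1, 3],
        "Final Grade": ["A", "B", "C", "D"],
    }
)

# Write into a throwaway directory so the sketch leaves no files behind
# in the working directory.
output_dir = Path(tempfile.mkdtemp())
written = []

# groupby yields one (section, sub-DataFrame) pair per distinct section.
for section, table in final_data.groupby("Section"):
    path = output_dir / f"section_{section}_grades.csv"  # hypothetical name
    table.to_csv(path, index=False)
    written.append(path.name)

print(written)
```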