Calculating the Homework Scores
and also if the 'Max' word, or string, is in the column heading. So if I run that, we just get the names of the maximum-points columns, and those are the ones that I want to pull out. So let's take a look at those.
00:53 You can see that, of course, not all the homework assignments are worth the same number of points. When this happens, you have to decide how you're going to compute an average score for, in this case, the homework assignments.
01:11 There are two ways to do that. The first is by total score: take the sum of all the raw homework scores and, separately, the sum of all the maximum points, and then simply take the ratio of the two.
01:38 The second way is to take an average of the averages. In other words, we divide each raw score by its respective maximum points, so we get a ratio for each individual homework assignment, and then we just average those ratios.
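Written out with assumed symbols ($s_i$ for the raw score on homework $i$, $m_i$ for its maximum points, and $n$ for the number of assignments; none of these symbols appear in the lesson itself), the two methods are:

```latex
\text{by total} = \frac{\sum_{i=1}^{n} s_i}{\sum_{i=1}^{n} m_i},
\qquad
\text{by average} = \frac{1}{n} \sum_{i=1}^{n} \frac{s_i}{m_i}
```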
01:59 Computing by total score favors students who did well on the homework assignments worth more points. For example, if a student did really well on the assignments worth 100, 90, or 80 points, that's going to outweigh the lower-point assignments the student didn't do so well on.
02:22 Conversely, computing by average score favors students who performed consistently. If a student scored well, percentage-wise, on the homework assignments throughout the semester, the average-score method rewards that.
02:38 What we're going to do is compute both homework scores, by total score and by average score, and then take the maximum of the two for each student. First, let me give you a quick example of how the two methods can differ.
The first homework assignment is worth 50 points and the student scored 25, the second homework assignment is worth 90 points and the student scored 85, and so on. By total, we simply take the sum of the homework scores and divide by the sum of the maximum points the student could have earned on each homework. By average, we take the ratio for each individual homework assignment.
03:35 So we take the score divided by the maximum number of points, and then we sum up all those ratios and divide by the number of homework assignments. In other words, we're taking an average of the ratios. For this hypothetical data, by total score the student would have a homework score of 82%, whereas by average score they would only have 76%.
That's reflected in the fact that the student did really well on the homework assignments worth 90 and 100 points, whereas on the ones worth fewer points, such as 50, they didn't do so well. So percentage-wise they didn't do well on those, but because the student did so well on the higher-value assignments, the total-score method gives this particular student the better homework score.
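A minimal Python sketch of the two methods, using only the two assignments whose numbers are given above (25 out of 50 and 85 out of 90). The rest of the hypothetical data isn't listed, so this won't reproduce the 82% and 76% figures, and the function names are illustrative rather than taken from the lesson.

```python
def hw_by_total(scores, max_points):
    """Sum of raw scores divided by the sum of maximum points."""
    return sum(scores) / sum(max_points)


def hw_by_average(scores, max_points):
    """Mean of the per-assignment ratios (the average of the averages)."""
    ratios = [s / m for s, m in zip(scores, max_points)]
    return sum(ratios) / len(ratios)


# Only the first two assignments from the example: 25/50 and 85/90
scores = [25, 85]
max_points = [50, 90]
print(round(hw_by_total(scores, max_points), 4))    # 0.7857
print(round(hw_by_average(scores, max_points), 4))  # 0.7222
```

Even with just these two assignments, the by-total score (about 79%) beats the by-average score (about 72%), because the assignment the student did best on is also the one worth more points.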
04:31 Let’s go back to Jupyter and do this computation. We’ll compute a homework score based on total score and a homework score based on average score and then we’ll take the maximum and that will be the final homework score for the student. So back here in Jupyter, we’re going to pull out the column names that have to do with the maximum number of points for the homework, and also just the scores for each individual homework assignment.
05:00 I'm going to keep this list comprehension; as we saw, it gives us the column names that contain the max points for each homework assignment. We'll call these the homework max columns. Then we'll do a similar thing for the homework score columns: I'll take the same list comprehension, and instead of 'Max' in x we want 'Max' not in x. All right, let me run that so that you can see what these contain.
We know that hw_max_cols (homework max columns) just contains the column headings for the homework assignment max points, and hw_cols (homework columns) contains the column names for the homework scores.
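A sketch of the two list comprehensions on a small made-up gradebook. The column names (such as 'Homework 1 - Max Points') and NetIDs are assumptions, and because the condition before 'Max' in x is cut off at the start of this section, checking for 'Homework' in the heading is an assumption too.

```python
import pandas as pd

# Hypothetical gradebook; real column names in the course data may differ.
final_df = pd.DataFrame(
    {
        "Homework 1": [25, 45],
        "Homework 1 - Max Points": [50, 50],
        "Homework 2": [85, 60],
        "Homework 2 - Max Points": [90, 90],
        "Exam 1": [80, 90],
    },
    index=pd.Index(["abc123", "xyz789"], name="NetID"),
)

# Max-points columns: homework headings that contain "Max"
hw_max_cols = [x for x in final_df.columns if "Homework" in x and "Max" in x]
# Score columns: homework headings that do not contain "Max"
hw_cols = [x for x in final_df.columns if "Homework" in x and "Max" not in x]

print(hw_max_cols)  # ['Homework 1 - Max Points', 'Homework 2 - Max Points']
print(hw_cols)      # ['Homework 1', 'Homework 2']
```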
For the .sum() method of this DataFrame, we're going to pass the keyword argument axis=1. If I run that, it gives me the sum of the scores of all the homework assignments for each individual student.
Let's save this as, say, hw_score_by_total (homework score by total). Okay. Then we'll create a new column in our DataFrame, final_df['HW by Total'], and set it equal to this new Series that we just computed.
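Here is one way the by-total computation could look end to end, again on hypothetical data. The lesson's visible steps only show the row-wise sum of the scores; dividing by the row-wise sum of the max points follows the by-total definition given earlier and is an assumption about the step not shown here.

```python
import pandas as pd

# Hypothetical data: two students (indexed by NetID), two homework assignments
final_df = pd.DataFrame(
    {
        "Homework 1": [25, 45],
        "Homework 1 - Max Points": [50, 50],
        "Homework 2": [85, 60],
        "Homework 2 - Max Points": [90, 90],
    },
    index=pd.Index(["abc123", "xyz789"], name="NetID"),
)
hw_max_cols = [x for x in final_df.columns if "Max" in x]
hw_cols = [x for x in final_df.columns if "Max" not in x]

# axis=1 sums across each row: points earned and points possible per student.
# The two Series share the NetID index, so the division lines up by student.
hw_score_by_total = final_df[hw_cols].sum(axis=1) / final_df[hw_max_cols].sum(axis=1)
final_df["HW by Total"] = hw_score_by_total
print(final_df["HW by Total"].round(4))
```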
07:41 Let’s just make sure all is good. Let’s take a look at the first five rows. Now at the very end of the DataFrame, we’ve got that HW by Total. All right, so let’s clear that up and now let’s do everything by average.
08:01 Let’s get rid of this cell. To do this by average, what we sort of want to do abstractly is take the columns that give us the scores for the homework assignments and we want to divide these by the columns that give us the maximum number of points for each individual homework assignment.
Both of these DataFrames have the same index: this one is indexed by NetID, and so is the second one. But there aren't any matching columns, so if we try to run this, we're going to get a whole bunch of NaN (not a number) values, because pandas lines up the division by column label, and the columns in these two DataFrames are distinct.
08:49 There's nothing to match up column by column. So what we need to do is change the column names of one DataFrame, for example the max-points one, so that they match the column names of the numerator DataFrame.
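To see the NaN problem in isolation, here is a small sketch with made-up frames that share an index but no column labels:

```python
import pandas as pd

# Hypothetical frames: same default index, but no column labels in common
scores = pd.DataFrame({"Homework 1": [25, 45], "Homework 2": [85, 60]})
max_points = pd.DataFrame(
    {"Homework 1 - Max Points": [50, 50], "Homework 2 - Max Points": [90, 90]}
)

# pandas aligns on labels, so the result has the union of both column sets
# and no column has a partner to divide by: every cell is NaN.
ratios = scores / max_points
print(ratios.columns.tolist())
print(ratios.isna().all().all())  # True
```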
One way to do this is with the DataFrame's .set_axis() method. We pass it new labels for the columns, which live on axis number 1, the second axis: we simply give these columns the same names as the columns of the homework assignment scores. All right, if we run this now, we've got exactly the homework assignments and the fraction of points earned on each of them.
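A sketch of the .set_axis() fix on the same made-up frames. Note that set_axis relabels by position, so this assumes the max-points columns are in the same order as the score columns.

```python
import pandas as pd

scores = pd.DataFrame({"Homework 1": [25, 45], "Homework 2": [85, 60]})
max_points = pd.DataFrame(
    {"Homework 1 - Max Points": [50, 50], "Homework 2 - Max Points": [90, 90]}
)

# Relabel axis 1 (the columns) of the max-points frame with the score names.
# set_axis returns a new DataFrame; max_points itself is left unchanged.
renamed_max = max_points.set_axis(scores.columns, axis=1)
ratios = scores / renamed_max  # now divides column by column
print(ratios)
```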
09:43 Now, for the average homework score, we simply add up all of these homework ratios for each student and then divide by the number of homework assignments. In other words, we're computing the average of the averages.
The denominator will be that DataFrame containing the maximum number of points for each homework, but now its columns carry the homework names, such as Homework 2. Then we take final_data[hw_cols] (homework columns), just like we had before, and divide it by that renamed DataFrame.
10:37 Then we sum across the columns so that we get the sum of all the homework ratios for each student. To turn that into an average, we divide by the number of homework assignments.
For this, we can use the length of hw_cols, which is the number of homework columns and therefore the total number of homework assignments. This final Series is what we could call, say, hw_score_by_avg (homework score by average).
And it looks like I made a little typo: final_data should be final_df. All right, this is the Series that we're going to add to the DataFrame. We'll call the new column 'HW by Average' and set it to the by-average Series we just computed. Let's take a look at what we've got so far.
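Putting the by-average steps together, here is a minimal sketch on hypothetical data, with column names and NetIDs assumed as in the earlier sketches:

```python
import pandas as pd

# Hypothetical data: two students, two homework assignments
final_df = pd.DataFrame(
    {
        "Homework 1": [25, 45],
        "Homework 1 - Max Points": [50, 50],
        "Homework 2": [85, 60],
        "Homework 2 - Max Points": [90, 90],
    },
    index=pd.Index(["abc123", "xyz789"], name="NetID"),
)
hw_max_cols = [x for x in final_df.columns if "Max" in x]
hw_cols = [x for x in final_df.columns if "Max" not in x]

# Relabel the max-points columns to match hw_cols, then divide cell by cell
hw_max_renamed = final_df[hw_max_cols].set_axis(hw_cols, axis=1)
ratios = final_df[hw_cols] / hw_max_renamed

# Sum the ratios across each row, then divide by the number of assignments
hw_score_by_avg = ratios.sum(axis=1) / len(hw_cols)
final_df["HW by Average"] = hw_score_by_avg
print(final_df["HW by Average"].round(4))
```

Dividing the row sums by len(hw_cols) gives the same result as ratios.mean(axis=1) when no scores are missing; mean skips NaN by default, so the two differ if any entries are missing.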
We've got another new column, HW by Average. Notice there's a small difference for some students: if a student did noticeably better on the homework assignments that carried more points, their score by total is a little higher than their score by average.
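The section stops before the final step announced earlier: taking each student's maximum of the two scores. One way to do it, assuming a hypothetical 'Homework Score' column name and the two made-up students from the earlier sketches:

```python
import pandas as pd

# Hypothetical per-student results from the two methods
final_df = pd.DataFrame(
    {
        "HW by Total": [110 / 140, 105 / 140],
        "HW by Average": [(25 / 50 + 85 / 90) / 2, (45 / 50 + 60 / 90) / 2],
    },
    index=pd.Index(["abc123", "xyz789"], name="NetID"),
)

# Row-wise maximum: keep whichever method gives each student the higher score
final_df["Homework Score"] = final_df[["HW by Total", "HW by Average"]].max(axis=1)
print(final_df.round(4))
```

With these numbers, the first student keeps the by-total score and the second keeps the by-average score, since the second student did relatively better on the 50-point assignment.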