Calculating the Homework Scores

Using pandas to Make a Gradebook in Python Cesar Aguilar 13:17


00:00 Let’s take a look at all of the columns that have the maximum number of points for the homework assignments. I just want to view these, I want to show you something.

00:09 Let’s pull these out using a list comprehension. I’m going to take a look at the columns in the DataFrame. This is simply an Index holding the titles of the columns.

00:22 I want to keep only some of those, so I’ll say x for x in the columns, keeping the ones that contain the word 'Home', because that’ll match 'Homework',

00:35 and also if the 'Max' word, or string, is in the column heading. So if I run that, we just get the maximum number of points field names, and those are the ones that I want to pull out. So let’s take a look at those.
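
As a rough sketch of the list comprehension described here: the small DataFrame below is a hypothetical stand-in for the lesson's data, and the exact column names (such as 'Homework 1 - Max Points') and the name `max_cols` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's DataFrame; column names are assumed.
final_df = pd.DataFrame(
    {
        "Homework 1": [25, 45],
        "Homework 1 - Max Points": [50, 50],
        "Homework 2": [85, 45],
        "Homework 2 - Max Points": [90, 90],
        "Exam 1": [80, 65],
    },
    index=pd.Index(["abc123", "def456"], name="NetID"),
)

# Keep only headings that mention both 'Home' and 'Max'.
max_cols = [x for x in final_df.columns if "Home" in x and "Max" in x]

print(max_cols)
print(final_df[max_cols])
```

Indexing the DataFrame with that list shows only the maximum-points columns, which makes it easy to see that the assignments are not all worth the same.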

00:53 You see that, of course, not all the homework assignments are worth the same number of points. And when this happens, you have to make a decision about how you’re going to compute an average score for, in this case, the homework assignments.

01:07 There’s a couple of ways to do that.

01:11 The two ways to do that are, one, by total score. This involves taking the sum of all of the raw scores of the homework assignments and then the sum of all of the maximum points independently, and then we just simply take the ratio.

01:27 We just sum up all the scores that the student earned for the homework assignments and we divide it by the sum of all the maximum points, and then we get our percentage that way.

01:38 The second way is to take an average of the averages. In other words, we divide each raw score by its respective maximum points, so we get an average for each individual homework assignment, and then we just average the resulting ratios.

01:54 We’re taking an average of the averages.

01:59 By total score, it will favor students who did well on homework assignments that were worth more points. So, for example, if a student did really well on the homework assignments that were worth 100 points or 90 points or 80 points, then that’s going to outweigh all of the assignments that maybe the student didn’t do so well that were worth fewer points.

02:22 Conversely, computing by average score favors students who performed consistently. If they did well percentage-wise on the homework assignments throughout the semester, the average score works in their favor.

02:38 What we’re going to want to do is compute both homework scores, using total score and average score, and then take the maximum of the two for each student. So let me just give you a quick example of how the total score and the average score can differ.

02:57 Here’s just some hypothetical data. We’ve got a list of homework scores and then we’ve got the maximum number of points for each of the homework assignments.

03:07 The first homework assignment is worth 50 and the student scored 25, then the second homework assignment was worth 90 and the student scored 85, and so on. By total, all we do is simply take the sum of the homework scores and we just divide by the sum of all of the maximum number of points that the student could have earned for each homework. And then if we do it by average, we just simply take the ratio of each individual homework assignment.

03:35 So we take the score divided by the maximum number of points, take that ratio, and then we sum up all those ratios and we divide by the number of homework assignments. So in other words, we’re taking an average of all of the ratios. And then for this hypothetical data, by total score, the student would have a homework score of 82%, whereas by average score, they would only have a 76%.

04:03 That’s reflected in the fact that for the homework assignment that was worth 90 and the one that was worth 100, they did really well, whereas, for example, in the ones that were worth fewer—so 50—they didn’t do so well. So percentage-wise, they didn’t do well, but because these two, worth 90 and 100, the student did so well, it favors that particular student to have a homework score based on total score.
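
Here is a minimal sketch of the two calculations in plain Python. The numbers are illustrative only: the first two assignments match the ones mentioned (25 out of 50, 85 out of 90), but the third entry is made up, so the results will not reproduce the 82% and 76% from the lesson's slide.

```python
# Illustrative data only; the third assignment is an assumption.
scores = [25, 85, 95]
max_points = [50, 90, 100]

# By total: sum of raw scores divided by sum of maximum points.
by_total = sum(scores) / sum(max_points)

# By average: ratio for each assignment, then the mean of those ratios.
ratios = [s / m for s, m in zip(scores, max_points)]
by_average = sum(ratios) / len(ratios)

print(f"By total:   {by_total:.1%}")
print(f"By average: {by_average:.1%}")
```

With this data the total-based score comes out higher, because the weak result falls on the assignment worth the fewest points.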

04:31 Let’s go back to Jupyter and do this computation. We’ll compute a homework score based on total score and a homework score based on average score and then we’ll take the maximum and that will be the final homework score for the student. So back here in Jupyter, we’re going to pull out the column names that have to do with the maximum number of points for the homework, and also just the scores for each individual homework assignment.

05:00 I’m going to keep this list comprehension, and as we saw, this is giving us the column names that contain the max points for each homework assignment. We’ll call this “homework max,” and these are the columns,

05:15 and then we’ll do a similar thing for the homework columns. I’ll take the same list comprehension and instead of having 'Max' in x we want 'Max' not in x. All right, so let me run that just so that you can see what these contain.

05:37 We know that hw_max_cols (homework max columns), this just contains the heading names for the homework assignment max points, and then hw_cols (homework columns), these are just the column names for the homework.
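
A sketch of the two lists, assuming column headings like 'Homework 1' and 'Homework 1 - Max Points' (the transcript does not spell out the exact max-points headings, and the data below is hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the lesson's DataFrame; column names are assumed.
final_df = pd.DataFrame(
    {
        "Homework 1": [25, 45],
        "Homework 1 - Max Points": [50, 50],
        "Homework 2": [85, 45],
        "Homework 2 - Max Points": [90, 90],
        "Exam 1": [80, 65],
    },
    index=pd.Index(["abc123", "def456"], name="NetID"),
)

# Headings for the maximum points of each homework assignment.
hw_max_cols = [x for x in final_df.columns if "Home" in x and "Max" in x]

# Headings for the homework scores themselves: same filter, but 'Max' not in x.
hw_cols = [x for x in final_df.columns if "Home" in x and "Max" not in x]

print(hw_max_cols)
print(hw_cols)
```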

05:53 The reason why I’m doing this is just so that we can easily pull these out whenever we need to, when we’re computing the homework score by total or by average.

06:04 All right, so let’s clear that up.

06:07 Let’s first compute the homework score by total. This is the one that’s pretty straightforward because all we need to do is, from the DataFrame, let’s pull out the homework scores.

06:20 These are just the homework values, that’s its own DataFrame. And then all we need to do is sum along the columns, right? So we want to fix a row and sum along the columns.

06:32 So the .sum() method on this DataFrame, we’re going to call it with the keyword argument axis=1. If I run that, that’s just giving me the sum of the scores of all of the homework assignments for each individual student.

06:50 Then all we need to do is divide this by the maximum number of points.

07:00 And likewise, we need to sum along the columns, and so we’re going to pass in an axis value of 1. Let me run that. And so that’s the easy one. This is the homework score by total.

07:14 Maybe what we’ll do is let’s save this as, say, hw_score_by_total (homework score by total). Okay. Then we will create a new column in our DataFrame, so we’ll say final_df, we’ll say 'HW by Total', and we’ll set that equal to this new Series that we just computed, hw_score_by_total.

07:41 Let’s just make sure all is good. Let’s take a look at the first five rows. Now at the very end of the DataFrame, we’ve got that HW by Total. All right, so let’s clear that up and now let’s do everything by average.
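
The by-total computation might look roughly like this. The names `final_df`, `hw_cols`, `hw_max_cols`, `hw_score_by_total`, and 'HW by Total' come from the lesson; the data and the max-points headings are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's DataFrame; column names are assumed.
final_df = pd.DataFrame(
    {
        "Homework 1": [25, 45],
        "Homework 1 - Max Points": [50, 50],
        "Homework 2": [85, 45],
        "Homework 2 - Max Points": [90, 90],
    },
    index=pd.Index(["abc123", "def456"], name="NetID"),
)

hw_max_cols = [x for x in final_df.columns if "Home" in x and "Max" in x]
hw_cols = [x for x in final_df.columns if "Home" in x and "Max" not in x]

# axis=1 fixes a row (one student) and sums across the columns.
hw_score_by_total = final_df[hw_cols].sum(axis=1) / final_df[hw_max_cols].sum(axis=1)

# Store the resulting Series as a new column.
final_df["HW by Total"] = hw_score_by_total

print(final_df.head())
```

Both sums produce a Series indexed by NetID, so the division lines up student by student.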

08:01 Let’s get rid of this cell. To do this by average, what we sort of want to do abstractly is take the columns that give us the scores for the homework assignments and we want to divide these by the columns that give us the maximum number of points for each individual homework assignment.

08:24 For both of these DataFrames, the index is the same: each one has the NetID. But there aren’t going to be any matching columns, so if we try to run this, we’re going to get a whole bunch of NaN (not a number). Because the columns in these two DataFrames are distinct, pandas has nothing to match up column by column for the division.

08:53 So what we need to do is change the column names of, for example, this DataFrame so that they match the column names for the numerator DataFrame. A way to do this is we’re going to use the .set_axis() method for the DataFrame and we’re going to give the columns, which are contained in the axis number 1, which is the second axis—we want to simply call these columns the same name as the columns in the homework assignment scores. All right, so if we run this now, now we’ve got exactly the homework assignments and the percentages for all of the homework assignments.
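
To illustrate the alignment issue and the fix, here is a small sketch on hypothetical data. The variable names `mismatched` and `aligned` and the max-points headings are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's DataFrame; column names are assumed.
final_df = pd.DataFrame(
    {
        "Homework 1": [25, 45],
        "Homework 1 - Max Points": [50, 50],
        "Homework 2": [85, 45],
        "Homework 2 - Max Points": [90, 90],
    },
    index=pd.Index(["abc123", "def456"], name="NetID"),
)

hw_max_cols = [x for x in final_df.columns if "Home" in x and "Max" in x]
hw_cols = [x for x in final_df.columns if "Home" in x and "Max" not in x]

# No column labels in common, so every cell of the result is NaN.
mismatched = final_df[hw_cols] / final_df[hw_max_cols]
print(mismatched)

# Relabel the max-points columns (axis=1) to match the score columns.
aligned = final_df[hw_cols] / final_df[hw_max_cols].set_axis(hw_cols, axis=1)
print(aligned)
```

pandas aligns on both row and column labels before dividing, which is why renaming the columns is enough to get one ratio per assignment.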

09:43 Now, the average homework score, then, what we want to do is simply add all of these homework ratios for each student and then divide by the number of homework assignments. In other words, we want to do the same thing by computing the average of the averages.

10:01 So, because this is getting a little long, let’s take that DataFrame and define, say, a new DataFrame called hw_max_data (homework max data).

10:13 That’ll be that DataFrame that contains the maximum number of points for each homework but now the columns are named Homework 1, Homework 2, and so on. And then we’re going to take

10:28 the final_data[hw_cols] (homework columns), just like we had, and we’re going to divide this by hw_max_data.

10:37 Then we want to sum up along the columns so that we get the sum of all of the homework ratios for each student. Then we want to take the average of those, so we want to divide by the number of homework assignments.

10:54 For this, we can use, for example, the length—so, the number of columns, right? That would give us the total number of homework assignments. And this final series here is what we could call, say, hw_score_by_avg (homework score by average).

11:15 And it looks like I made a little typo. This should be final_df. All right, so, this is the Series, then, that we’re going to add to the DataFrame, and this one we’ll call 'HW by Average' and this is going to be set to this Series that we just computed, which is by average. All right, so let’s take a look at what we’ve got so far.
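
Putting the by-average steps together, a sketch could look like the following. It uses `final_df` throughout, as corrected above; the data and the max-points headings are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's DataFrame; column names are assumed.
final_df = pd.DataFrame(
    {
        "Homework 1": [25, 45],
        "Homework 1 - Max Points": [50, 50],
        "Homework 2": [85, 45],
        "Homework 2 - Max Points": [90, 90],
    },
    index=pd.Index(["abc123", "def456"], name="NetID"),
)

hw_max_cols = [x for x in final_df.columns if "Home" in x and "Max" in x]
hw_cols = [x for x in final_df.columns if "Home" in x and "Max" not in x]

# Max points relabeled as 'Homework 1', 'Homework 2', ... so the columns align.
hw_max_data = final_df[hw_max_cols].set_axis(hw_cols, axis=1)

# Sum the per-assignment ratios for each student, then divide by the count.
hw_score_by_avg = (final_df[hw_cols] / hw_max_data).sum(axis=1) / len(hw_cols)

final_df["HW by Average"] = hw_score_by_avg

print(final_df.head())
```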

11:41 We’ve got another new column, and this is going to be HW by Average. So, notice there is a little difference in some of them, right? If a student did particularly better in a homework that had more weight or more points, then they would have scored a little bit better than if we did the homework by the average.

12:05 And then the final homework score that we want to assign to each student is going to be the maximum of these two columns. So if I pull out

12:19 'HW by Total', and let’s also pull out 'HW by Average',

12:28 and we want to simply take the maximum

12:33 of these two values per row, and so we want to take the max along the columns—this is then going to give us the final

12:44 column that will determine the homework score for each student. Let’s take a look at that,

12:55 and that’s the final homework score. So far, we’ve got the exam scores for 1, 2, and 3. We’ve got the homework score. All of these are percentages. We’ll multiply these by the weights.
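
A sketch of that last step on made-up values. The transcript does not name the final column, so 'Homework Score' is an assumed name.

```python
import pandas as pd

# Hypothetical percentages standing in for the two columns computed earlier.
final_df = pd.DataFrame(
    {
        "HW by Total": [0.79, 0.64],
        "HW by Average": [0.72, 0.70],
    },
    index=pd.Index(["abc123", "def456"], name="NetID"),
)

# Take the larger of the two values in each row (max along the columns).
# 'Homework Score' is an assumed column name.
final_df["Homework Score"] = final_df[["HW by Total", "HW by Average"]].max(axis=1)

print(final_df)
```

Each student ends up with whichever of the two methods gives them the better percentage.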

13:09 The only thing we need to do now is do what we did for the homework score to the quizzes.

jrtirado5933 on Aug. 24, 2021

Anyone know how we can view all columns if we’re running this in PyCharm?
