Calculating the Quiz and Exam Scores

Using pandas to Make a Gradebook in Python Cesar Aguilar 11:42


00:00 Computing a final quiz score is going to be similar to how we did the homework score. Now, before we do that, I want to bring to your attention something that maybe you’ve already noticed and were a little bit uneasy about: how we renamed the columns in the DataFrame that we used over here to pull out the maximum points for the homework assignments.

00:27 We renamed the labels on axis 1, which is the column axis, using the same labels as the homework columns. Now, when we pulled out the columns associated with the maximum number of points for the homework assignments, we implicitly assumed an ordering of the columns: in particular, that these columns were coming out in the same order as the homework assignments themselves. This relies on the assumption that the columns in the actual CSV files also came ordered by homework, right? So we had Homework 1 and then the max points for Homework 1, and then Homework 2, and so on. And this is probably a good assumption, but to be safe, what we could do is sort the headings, or the column names, for the max homework points and also for the homework columns.

01:31 Then, that way, we’re guaranteed that they’re going to be sorted in the exact same order and we don’t introduce any type of bugs in our program.
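As a rough sketch of the sorting idea described here, using a tiny made-up DataFrame: the `"Homework N - Max Points"` column naming, the student labels, and the use of `set_axis` for the renaming are assumptions for illustration and may not match the lesson's notebook exactly.

```python
import pandas as pd

# Toy data standing in for the course DataFrame. The columns are
# deliberately out of order to show why sorting helps.
final_data = pd.DataFrame(
    {
        "Homework 2": [40, 35],
        "Homework 2 - Max Points": [50, 50],
        "Homework 1": [18, 20],
        "Homework 1 - Max Points": [20, 20],
    },
    index=["student_a", "student_b"],
)

# Sort both sets of labels so they line up assignment by assignment.
hw_cols = sorted(["Homework 2", "Homework 1"])
max_cols = sorted(["Homework 2 - Max Points", "Homework 1 - Max Points"])

homework_scores = final_data[hw_cols]
# Relabel the max-points columns positionally with the homework labels.
# This is only safe because both lists were sorted the same way.
homework_max_points = final_data[max_cols].set_axis(hw_cols, axis=1)

ratios = homework_scores / homework_max_points
print(ratios)
```

Without the two `sorted()` calls, the positional relabeling would silently pair Homework 2 scores with Homework 1 maximums in this toy example.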

01:40 Now, we may not have to do this. Again, this is probably going to be the case when we get that CSV file, but to be safe, it doesn’t hurt to do this. So if I run that—and let me introduce a new cell here just so that we can see these column headings—

01:59 now we’re going from Homework 1 and then 10. So lexicographically, Homework 10 comes before Homework 2 and 3 and 4 and so on. And the same thing for the headings for the homework assignments, we’re also going to start with 10. Now, because I ran this cell right here after computing a homework score and adding that new column, we’re also going to get that column.
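To see the lexicographic ordering in isolation, here is a small sketch in plain Python. The `"Homework N - Max Points"` naming is an assumption. The point is that both lists end up in the same (non-numeric) order, which is all the later division needs.

```python
# Labels as they might appear in the gradebook (the max-points naming is assumed).
hw_labels = sorted(f"Homework {n}" for n in range(1, 11))
max_labels = sorted(f"Homework {n} - Max Points" for n in range(1, 11))

# String sorting compares character by character, so "10" lands before "2".
print(hw_labels[:3])

# Pull out just the homework number from each label to compare the orderings.
hw_order = [label.split()[1] for label in hw_labels]
max_order = [label.split()[1] for label in max_labels]
print(hw_order == max_order)
```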

02:26 So what we’re going to do is we’re going to delete the column for the homework score and simply rerun all of the code so that we make sure that we’re not introducing any bugs in our script.

02:39 Let’s go ahead and use what’s called the .drop() method on the DataFrame. We want to drop the column. And here, usually, if we want to drop more than one column, we can pass in a list containing the names of the columns, but in this case, we just want to drop the "Homework Score" column.

03:01 This will return a new DataFrame without having a "Homework Score" column, but we actually want to do this in place. The default value for the keyword argument inplace is False but we want to do this in place, and so we’ll pass in a value of True.
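A minimal sketch of the two behaviors of `.drop()`, on a made-up two-student DataFrame. The data and the `without_score` name are hypothetical, and the `columns=` keyword is used here as one way to target the column axis.

```python
import pandas as pd

final_data = pd.DataFrame(
    {"Homework 1": [18, 20], "Homework Score": [0.9, 1.0]},
    index=["student_a", "student_b"],
)

# Default (inplace=False): a new DataFrame comes back, the original is untouched.
without_score = final_data.drop(columns=["Homework Score"])
print(list(final_data.columns))

# inplace=True: the original DataFrame is modified and nothing useful is returned.
final_data.drop(columns=["Homework Score"], inplace=True)
print(list(final_data.columns))
```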

03:20 So we’ll run that. That’s going to delete that column. Let me get rid of this, and let me go back over here and run that cell and then let’s now make sure that we’ve got just Homework 1 through 10, and same thing for the max columns.

03:41 But again, the reason why I think it’s a good idea to do this is because when we get that CSV file, there’s no guarantee that the homework assignments are going to be ordered, and so this will help make our code a little bit more robust. So we’ll get rid of this and we’ll get rid of that cell.

04:02 I’m hitting Escape and then just X. Now that we’ve got that, let’s just rerun this cell and the remaining cells,

04:14 so just Shift + Enter.

04:17 If I go down here and take a look—all right, we get the exact same values for the "Homework Score". Okay, so now let’s do the exact same thing that we did for the homework.

04:28 Let’s do that for the quizzes. What we need to do is, first, let’s pull out the quiz scores and we’re going to do this, instead of just typing out the quiz labels, we’re going to use the .filter() method.

04:43 Maybe this is just another way for you to pull out columns in the DataFrame, and we’re going to pass in what are the columns that we want to keep. We want to keep the values for Quiz 1, Quiz 2, Quiz 3, Quiz 4, and Quiz 5.

04:59 Now we can either pass in a list of columns or we can pass in a regular expression as a keyword argument. We want to use a raw string and we want to match the label r"Quiz", and then the title of every column of the quiz is always "Quiz", space, and then the number of the quiz.

05:21 The digit is just simply one digit, and so that’s how we would write down what the escape sequence would have to be to specify that we just want one digit after the space. And we want to match this so that the very first thing in the string for the label starts with "Quiz ", and the digit is the last thing in the string, and so we use a dollar sign ($). If you’re not familiar with regular expressions in Python and how they work, or just regular expressions in general, there’ll be a link to a course in Real Python that you can take a look at.

06:01 But again, we could just pass in a list of items that contain the heading labels that we want to keep. All right, now, the axis is the columns axis, so we have to pass a keyword argument of axis=1.
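Here is a small sketch of both ways of calling `.filter()`, on invented data. The extra `"Quiz Score"` and `"Homework 1"` columns are only there to show what the pattern rejects.

```python
import pandas as pd

final_data = pd.DataFrame(
    {
        "Quiz 1": [9, 11],
        "Quiz 2": [12, 15],
        "Quiz Score": [0.8, 1.0],
        "Homework 1": [18, 20],
    },
    index=["student_a", "student_b"],
)

# ^ anchors the start of the label, \d matches exactly one digit,
# and $ anchors the end, so "Quiz Score" is not kept.
quiz_scores = final_data.filter(regex=r"^Quiz \d$", axis=1)
print(list(quiz_scores.columns))

# The same selection with an explicit list of labels.
quiz_scores_by_items = final_data.filter(items=["Quiz 1", "Quiz 2"], axis=1)
print(quiz_scores.equals(quiz_scores_by_items))
```

The regular expression scales to any number of single-digit quizzes without editing a list, which is presumably why the lesson prefers it.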

06:16 This is going to pull out the quiz scores, let me run that. Let’s just save this DataFrame in quiz_scores.

06:26 And we’re going to need the number of quizzes because we’re going to be computing, again, an average of the averages, so let’s introduce a new variable—say, n_quiz—and this is simply quiz_scores, and here I’ll maybe make that quiz_scores. Run that again.

06:43 And the .shape attribute is a tuple. Let me just show you what that is. It’s a tuple containing the number of rows and the number of columns. And so the number of columns is 5, and that’s going to be the number of quizzes, so this is going to be the second element in the tuple. Now, unlike the homework assignments, where we had the maximum number of points as part of the DataFrame, for the quizzes, we don’t have that.
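As a quick illustration of reading the number of quizzes off `.shape`, here is a sketch with a made-up three-student, five-quiz DataFrame (the scores are placeholders):

```python
import pandas as pd

# Three hypothetical students, five quiz columns.
quiz_scores = pd.DataFrame({f"Quiz {n}": [10, 12, 9] for n in range(1, 6)})

# .shape is a (rows, columns) tuple.
n_rows, n_cols = quiz_scores.shape
print(quiz_scores.shape)

# The number of quizzes is the second element.
n_quiz = quiz_scores.shape[1]
print(n_quiz)
```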

07:15 So this would be, perhaps, other information that you would have if you were actually, you know, creating a DataFrame for grades and computing grades—you would have the maximum number of points for the quizzes.

07:27 So we are going to create a new Series that contains the maximum scores for these quizzes. And again, this would just be part of the information that you would have if you were teaching such a course.

07:46 Now, we’re going to do this by passing in a dictionary and the keys are going to be the indices for this Series. The maximum score in quiz number 1 is 11,

07:58 and for quiz number 2, we’re going to say 15.

08:02 For quiz number 3, it’s 17.

08:08 For quiz number 4, 14. And lastly, quiz number 5 is 12. Okay, so this would just be part of the data that you know if you were computing final grades for a course.

08:22 So we’ve got Quiz 1, 2, 3, 4, 5. All right, that creates the Series, and so now we can go ahead and start computing the quiz score by total and the quiz scores by average.
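Put together, the Series described here could be built as follows. The values are the ones read out in the lesson, and the variable name `quiz_max_points` is the one used later in the transcript.

```python
import pandas as pd

# Dictionary keys become the index labels of the Series.
quiz_max_points = pd.Series(
    {"Quiz 1": 11, "Quiz 2": 15, "Quiz 3": 17, "Quiz 4": 14, "Quiz 5": 12}
)
print(quiz_max_points)
```

Using the quiz column names as index labels is what later lets pandas match each maximum to its quiz column.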

08:37 So, the quiz by total is the sort of easier one to think about. Let’s go quiz_score_by_total. We just want the sum of the scores of the quizzes divided by the sum of all the maximum number of points.

08:53 In the quiz_scores DataFrame, we want to sum over the columns, right? So if I sum up along the columns, I want an axis of 1.

09:05 So again, the idea is that we’re fixing each row, and so that corresponds to a student, and then we’re summing up along the columns, and so we pass in an axis of 1, and then we’d simply want to divide by the maximum points, and we want to sum all of these maximum values for each of the quizzes.

09:27 So we’re simply adding up 11, 15, 17, 14, and 12.
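A sketch of the by-total calculation on two invented students (the scores and the `student_a`/`student_b` labels are made up, and `sum_of_quiz_scores` is a helper name introduced here for readability):

```python
import pandas as pd

quiz_scores = pd.DataFrame(
    {
        "Quiz 1": [11, 5],
        "Quiz 2": [15, 15],
        "Quiz 3": [17, 17],
        "Quiz 4": [14, 14],
        "Quiz 5": [12, 12],
    },
    index=["student_a", "student_b"],
)
quiz_max_points = pd.Series(
    {"Quiz 1": 11, "Quiz 2": 15, "Quiz 3": 17, "Quiz 4": 14, "Quiz 5": 12}
)

# axis=1 sums across the columns, giving one total per student (row).
sum_of_quiz_scores = quiz_scores.sum(axis=1)

# Total points earned divided by total points available (11+15+17+14+12).
quiz_score_by_total = sum_of_quiz_scores / quiz_max_points.sum()
print(quiz_score_by_total)
```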

09:33 All right, and so this is going to be the very first one by total. Then we need it also by average, so quiz_score_by_average, we’ll say.

09:45 And because quiz_max_points is a Series and the actual values of the indices are the same as the columns in the quiz_scores DataFrame, all we need to do is go quiz_scores

10:01 divided by quiz_max_points. We then want to add along the columns, and so we’ll have a .sum(axis=1), and then we need to divide by the number of quizzes. All right? So again, we are taking an average of the ratios for each of the quiz scores.
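The by-average version can be sketched on the same kind of toy data. The scores are invented, and `quiz_ratios` is an intermediate name introduced here only to make the two steps visible.

```python
import pandas as pd

quiz_scores = pd.DataFrame(
    {
        "Quiz 1": [11, 5],
        "Quiz 2": [15, 15],
        "Quiz 3": [17, 17],
        "Quiz 4": [14, 14],
        "Quiz 5": [12, 12],
    },
    index=["student_a", "student_b"],
)
quiz_max_points = pd.Series(
    {"Quiz 1": 11, "Quiz 2": 15, "Quiz 3": 17, "Quiz 4": 14, "Quiz 5": 12}
)
n_quiz = quiz_scores.shape[1]

# DataFrame / Series matches the Series index against the column labels,
# so each quiz column is divided by its own maximum.
quiz_ratios = quiz_scores / quiz_max_points

# Average of the per-quiz ratios for each student.
quiz_score_by_average = quiz_ratios.sum(axis=1) / n_quiz
print(quiz_score_by_average)
```

For `student_b`, one weak quiz out of five pulls the average of ratios down differently than it pulls the by-total score, which is why the lesson computes both.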

10:22 All right! And so, lastly, just like we did with the homework, we need to find the max. And because these are two separate Series, we’re going to create a DataFrame by concatenating these two, so we’ll have quiz_score_by_total and then quiz_score_by_average.

10:44 We want to concatenate these side by side as columns, not stack them on top of each other. We then want to take the maximum of each of the rows, and so we pass in an axis of 1.

10:55 This is going to be our final quiz score for each of the students, and we’ll create a new column in our DataFrame and we’re going to call it 'Quiz Score'.
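A minimal sketch of this last step, with hand-picked values standing in for the two computed Series (the numbers and student labels are hypothetical):

```python
import pandas as pd

students = ["student_a", "student_b"]
final_data = pd.DataFrame({"Quiz 1": [9, 11]}, index=students)

# Stand-ins for the two Series computed earlier.
quiz_score_by_total = pd.Series([0.80, 0.95], index=students)
quiz_score_by_average = pd.Series([0.85, 0.90], index=students)

# axis=1 in concat places the Series side by side as two columns.
# max(axis=1) then picks the larger value in each row.
final_data["Quiz Score"] = pd.concat(
    [quiz_score_by_total, quiz_score_by_average], axis=1
).max(axis=1)
print(final_data)
```

Each student ends up with whichever of the two methods is more favorable to them.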

11:10 All right! So, let’s make sure things are okay.

11:15 So, the very last column that we’ve just added, that’s going to contain the quiz score for each student.

11:24 And so with that, we’ve computed the exam scores percentage-wise and the same thing for the homework and now for the quizzes, and so now we just need to go ahead and compute the final grade for each student. We’ll do that next.

fertorresmx on May 17, 2023

To get the same results in the final DataFrame, you have to put the quiz_max_points Series in the same order too:

quiz_max_points = pd.Series(
    {'Quiz 5': 12, 'Quiz 2': 15, 'Quiz 4': 14, 'Quiz 1': 11, 'Quiz 3': 17}
)
