Calculating the Quiz and Exam Scores
00:00 Computing a final quiz score is going to be similar to how we computed the homework score. Before we do that, though, I want to point out something you may have already noticed, and maybe felt a little uneasy about: the way we renamed the columns in the DataFrame we used to pull out the maximum points for the homework assignments.
00:27 We renamed axis 1, the column axis, using the same labels as the homework columns. In doing that, we implicitly assumed that the max-points columns came out in the same order as the homework assignments they belong to. That in turn relies on the CSV file listing its columns in a consistent order: Homework 1, then the max points for Homework 1, then Homework 2, and so on. This is probably a good assumption, but to be safe, we can sort the column names for both the max homework points and the homework scores.
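Here's a minimal sketch of why the ordering matters, assuming a tiny hypothetical gradebook (the column names, including the " - Max Points" suffix, and the values are made up for illustration). It uses set_axis() as one way to relabel columns by position: without sorting, a max-points column can end up paired with the wrong homework assignment.

```python
import pandas as pd

# Hypothetical data for two students. The max-points columns deliberately
# arrive in a different order than the homework columns.
homework = pd.DataFrame({"Homework 1": [8, 9], "Homework 2": [15, 18]})
max_points = pd.DataFrame(
    {"Homework 2 - Max Points": [20, 20], "Homework 1 - Max Points": [10, 10]}
)

# Relabeling by position silently pairs "Homework 2 - Max Points"
# with the "Homework 1" label.
misaligned = max_points.set_axis(homework.columns, axis=1)

# Sorting both sets of column labels first makes the positional
# relabeling match each max with its own assignment.
homework_sorted = homework.sort_index(axis=1)
max_sorted = max_points.sort_index(axis=1).set_axis(homework_sorted.columns, axis=1)

# Element-wise ratios now use the correct maximum for each assignment.
ratios = homework_sorted / max_sorted
print(ratios)
```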
01:40 We may not strictly need this, since the CSV file will probably be ordered that way anyway, but it doesn't hurt. So let me run that, and let me add a new cell so that we can see these column headings.
01:59 Now the columns go from Homework 1 to Homework 10. The labels are sorted lexicographically, as strings, so Homework 10 comes before Homework 2, 3, 4, and so on. The same is true for the homework score headings: after Homework 1 comes Homework 10. And because I ran this cell after computing the homework score and adding it as a new column, that "Homework Score" column shows up here as well.
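A quick way to see the lexicographic ordering for yourself, using hypothetical labels and values: sort_index(axis=1) compares the column labels as strings, character by character, and reorders the data along with them.

```python
import pandas as pd

# Hypothetical homework columns, deliberately out of order.
labels = [f"Homework {n}" for n in (3, 1, 10, 2)]
df = pd.DataFrame([[1, 2, 3, 4]], columns=labels)

# String comparison puts "Homework 10" between "Homework 1" and "Homework 2".
sorted_df = df.sort_index(axis=1)
print(list(sorted_df.columns))
# ['Homework 1', 'Homework 10', 'Homework 2', 'Homework 3']
```

The order isn't numeric, but what matters for the pairing is that both sets of labels sort the same way. That holds for the hypothetical " - Max Points" suffix used in the previous sketch; whether it holds for the real CSV depends on how its columns are named.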
Let's remove the "Homework Score" column using the DataFrame's .drop() method. If we wanted to drop more than one column, we could pass in a list containing the column names, but in this case we just want to drop the "Homework Score" column. By default, this returns a new DataFrame without the "Homework Score" column, but we want to modify the DataFrame in place. The default value for the inplace keyword argument is False, so we'll pass in a value of True.
03:20 So we'll run that, which deletes the column. Let me get rid of this cell, go back and rerun the sorting cell, and now we can confirm that we've got just Homework 1 through 10, and the same for the max-points columns.
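The transcript doesn't show the exact call, so here's a small sketch on a hypothetical two-column DataFrame. It uses the columns= keyword to name the column to drop (passing the label together with axis=1 is another option) and contrasts the default behavior with inplace=True.

```python
import pandas as pd

# Hypothetical DataFrame holding one homework column and the computed score.
df = pd.DataFrame({"Homework 1": [8, 9], "Homework Score": [0.8, 0.9]})

# Default (inplace=False): returns a new DataFrame and leaves df unchanged.
trimmed = df.drop(columns="Homework Score")
print(list(df.columns))  # still has "Homework Score" at this point

# inplace=True: modifies df directly and returns None.
result = df.drop(columns="Homework Score", inplace=True)
print(list(df.columns))
```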
03:41 Again, the reason I think this is a good idea is that there's no guarantee the homework columns in the CSV file will be ordered, so sorting makes our code a little more robust. So we'll get rid of this cell and that one.
Now let's do the same for the quizzes. First, we'll pull out the quiz scores, and instead of typing out the quiz labels, we're going to use the .filter() method.
04:43 This is just another way to pull out columns from a DataFrame: we pass in the columns we want to keep, which here are the values for Quiz 1, Quiz 2, Quiz 3, Quiz 4, and Quiz 5.
Now, we can either pass in a list of columns or pass in a regular expression with the regex keyword argument. We want to use a raw string that matches the label: the title of every quiz column is always "Quiz", a space, and then the number of the quiz. That number is just a single digit, so we use the escape sequence \d to specify exactly one digit after the space. We also want the label to start with "Quiz ", so we begin the pattern with a caret (^), and we want the digit to be the last thing in the string, so we end it with a dollar sign ($). That gives us r"^Quiz \d$". If you're not familiar with regular expressions in Python, or regular expressions in general, there'll be a link to a Real Python course you can take a look at.
But again, we could instead just pass in a list of the heading labels we want to keep with the items keyword argument. Finally, the axis we're filtering is the columns axis, so we pass the keyword argument axis=1.
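Here's a sketch of both ways of calling .filter() on a hypothetical DataFrame. The distractor columns are made up to show what the anchored pattern r"^Quiz \d$" excludes; with the real data, the two calls should select the same quiz columns.

```python
import pandas as pd

# Hypothetical gradebook slice: two quiz columns plus distractor columns.
df = pd.DataFrame(
    {
        "Quiz 1": [10, 8],
        "Quiz 2": [12, 9],
        "Quiz 10": [5, 5],       # two digits after the space: not matched
        "Quiz 1 Notes": [0, 0],  # text after the digit: not matched
        "Homework 1": [7, 6],    # doesn't start with "Quiz ": not matched
    }
)

# ^ anchors the start, \d matches exactly one digit, $ anchors the end.
by_regex = df.filter(regex=r"^Quiz \d$", axis=1)

# Equivalent selection by listing the labels explicitly.
by_items = df.filter(items=["Quiz 1", "Quiz 2"], axis=1)

print(list(by_regex.columns))  # ['Quiz 1', 'Quiz 2']
```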
We're also going to need the number of quizzes, because we'll again be computing an average of the ratios, so let's introduce a new variable, say n_quiz. This is simply quiz_scores.shape[1]. Let me just show you what .shape is: it's a tuple containing the number of rows and the number of columns. The number of columns is 5, which is the number of quizzes, so that's the second element of the tuple, at index 1. Now, unlike the homework assignments, where the maximum number of points was part of the DataFrame, for the quizzes we don't have that.
07:15 That's information you would normally have if you were actually building a DataFrame of grades and computing them: you'd know the maximum number of points for each quiz.
07:27 So we'll create a new Series containing the maximum number of points for each quiz. Again, this is just information you would have if you were teaching such a course.
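A minimal sketch of the setup, assuming made-up quiz scores for two students and made-up maximum points (the lesson's actual values aren't given here). The Series is indexed by the quiz column names so that pandas can align it with the quiz columns later.

```python
import pandas as pd

# Hypothetical quiz scores: two students (rows) on five quizzes (columns).
quiz_scores = pd.DataFrame(
    {"Quiz 1": [5, 10], "Quiz 2": [20, 10], "Quiz 3": [5, 10],
     "Quiz 4": [20, 10], "Quiz 5": [5, 10]}
)

# .shape is (number of rows, number of columns); index 1 is the column count.
n_quiz = quiz_scores.shape[1]

# Hypothetical maximum points per quiz, labeled with the quiz column names.
quiz_max_points = pd.Series(
    {"Quiz 1": 10, "Quiz 2": 20, "Quiz 3": 10, "Quiz 4": 20, "Quiz 5": 10}
)

print(quiz_scores.shape, n_quiz)  # (2, 5) 5
```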
As with the homework, there are two ways to compute the quiz score, and the score by total is the easier one to think about. Let's call it quiz_score_by_total. We just want the sum of each student's quiz scores divided by the sum of the maximum points for all the quizzes.
So again, we're fixing each row, which corresponds to a student, and summing along the columns, so we pass in an axis value of 1. Then we simply divide by the maximum points, which means summing the maximum values for all of the quizzes.
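A sketch of the score by total, reusing the same hypothetical scores and maximum points as above (made-up numbers, redefined so the block runs on its own).

```python
import pandas as pd

# Hypothetical data, as in the earlier sketch.
quiz_scores = pd.DataFrame(
    {"Quiz 1": [5, 10], "Quiz 2": [20, 10], "Quiz 3": [5, 10],
     "Quiz 4": [20, 10], "Quiz 5": [5, 10]}
)
quiz_max_points = pd.Series(
    {"Quiz 1": 10, "Quiz 2": 20, "Quiz 3": 10, "Quiz 4": 20, "Quiz 5": 10}
)

# Sum each student's points across the quiz columns (axis=1), then divide
# by the total points available across all quizzes (70 here).
quiz_score_by_total = quiz_scores.sum(axis=1) / quiz_max_points.sum()
print(quiz_score_by_total)  # 55/70 and 50/70
```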
For the score by average, we divide quiz_scores by quiz_max_points, which gives each student's ratio on each quiz. We then add along the columns with .sum(axis=1) and divide by the number of quizzes, n_quiz. So again, we're taking the average of the ratios for each of the quiz scores.
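A sketch of the score by average on the same hypothetical data. The name quiz_score_by_average is an illustrative choice, not necessarily the one used in the lesson. Dividing a DataFrame by a Series aligns the Series index with the DataFrame's column labels, so each quiz is divided by its own maximum.

```python
import pandas as pd

# Hypothetical data, as in the earlier sketches.
quiz_scores = pd.DataFrame(
    {"Quiz 1": [5, 10], "Quiz 2": [20, 10], "Quiz 3": [5, 10],
     "Quiz 4": [20, 10], "Quiz 5": [5, 10]}
)
quiz_max_points = pd.Series(
    {"Quiz 1": 10, "Quiz 2": 20, "Quiz 3": 10, "Quiz 4": 20, "Quiz 5": 10}
)
n_quiz = quiz_scores.shape[1]

# Per-quiz ratios (aligned on column labels), summed per student,
# then divided by the number of quizzes.
quiz_score_by_average = (quiz_scores / quiz_max_points).sum(axis=1) / n_quiz
print(quiz_score_by_average)  # 0.7 and 0.8
```

With these numbers, the first student scores higher by total and the second scores higher by average, which is why both methods are worth computing.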
Lastly, just like we did with the homework, we need to take the larger of the two scores for each student. Because these are two separate Series, we'll combine them into a DataFrame by concatenating them with pd.concat() along the columns, passing quiz_score_by_total and then the score by average, and then take the maximum of each row.
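A sketch of the final step on the same hypothetical data: concatenating the two Series with axis=1 puts them side by side as columns, and .max(axis=1) keeps the better score for each student. The names quiz_score_by_average and quiz_score are illustrative.

```python
import pandas as pd

# Hypothetical data, as in the earlier sketches.
quiz_scores = pd.DataFrame(
    {"Quiz 1": [5, 10], "Quiz 2": [20, 10], "Quiz 3": [5, 10],
     "Quiz 4": [20, 10], "Quiz 5": [5, 10]}
)
quiz_max_points = pd.Series(
    {"Quiz 1": 10, "Quiz 2": 20, "Quiz 3": 10, "Quiz 4": 20, "Quiz 5": 10}
)
n_quiz = quiz_scores.shape[1]

quiz_score_by_total = quiz_scores.sum(axis=1) / quiz_max_points.sum()
quiz_score_by_average = (quiz_scores / quiz_max_points).sum(axis=1) / n_quiz

# Two Series side by side as columns, then the row-wise maximum.
quiz_score = pd.concat(
    [quiz_score_by_total, quiz_score_by_average], axis=1
).max(axis=1)
print(quiz_score)  # 55/70 for the first student, 0.8 for the second
```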
11:24 And so with that, we’ve computed the exam scores percentage-wise and the same thing for the homework and now for the quizzes, and so now we just need to go ahead and compute the final grade for each student. We’ll do that next.