Loading the Quiz Files
All right, so now we’re ready to load up all of the quiz files. Let’s go ahead and put a little bit of markup here and we’ll just say
Load quiz grades. And if you remember, these files were called
quiz_ and then the number of the quiz and then
grades, and then we’ve got a CSV file, and there were five of these quizzes, so five of these files. Okay, so we’re going to load those up. Now, the way we’re going to do that is we’re going to create one DataFrame to contain all of the grades, all of the quiz grades.
00:39 What we’ll do is we’ll need to loop over each of the quiz files, load them up as DataFrames, and then simply concatenate them to a running DataFrame that we’re going to create as the final one that contains all the grades.
this is a
Path object and it’s a directory, and so it contains this, we can use the
.glob() method that will match a pattern and that will return a generator. All right, and each of the elements that the generator returns is a path to the file matching the pattern. So in this case, the pattern is
"quiz", underscore (
"_"), and that wildcard character (
*) is what’s changing in the different file names, and they’re all named
01:38 So, this will return a generator and we want to loop over the generator. It’s going to be returning a file path that matches the pattern. What do we want to do with this file? Well, we’re going to want to read it in, obviously, get the CSV file.
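As a rough sketch of the looping step just described, the snippet below builds a temporary folder and matches the quiz files in it. The folder variable `data_folder`, the sample file contents, and the full pattern `"quiz_*_grades.csv"` are assumptions made for illustration, based on the file naming described earlier.

```python
import tempfile
from pathlib import Path

# Hypothetical setup: a temporary folder stands in for the real data directory.
with tempfile.TemporaryDirectory() as tmp:
    data_folder = Path(tmp)
    for n in range(1, 6):
        (data_folder / f"quiz_{n}_grades.csv").write_text("Email Address,Grade\n")
    # A file that should NOT match the pattern.
    (data_folder / "roster.csv").write_text("unrelated\n")

    # .glob() yields one Path per file whose name matches the pattern;
    # the * wildcard stands for the part that changes (the quiz number).
    names = [file_path.name for file_path in data_folder.glob("quiz_*_grades.csv")]

# Sorted only for display; .glob() itself makes no promise about order.
print(sorted(names))
```

Only the five quiz files are picked up; the roster file is skipped because its name does not fit the pattern.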
And the DataFrame that we’re building here is going to contain five columns, one for each of the quiz grade files, and what we want to do is call each of those field names just either
Quiz 3, depending on what file we’re reading in.
Then for the index, or the indices, of the
quiz_grades DataFrame, we’re going to be using the email address. If you remember these quiz files, they all contain the email address for the student.
02:33 Here’s what we’ll do. Let’s load up, individually, each of the quiz files. This is similar to what we did before. We’re going to read in the CSV file and this is going to be, of course, the file path, right?
03:14 We’re going to use the email address as the index. If you remember, in all these quiz files we had just the first name and the last name of the student, the email address, and then simply the grade.
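Here is a minimal sketch of reading one quiz file with the email address as the index. The column names `"First Name"`, `"Last Name"`, and `"Email Address"`, the sample rows, and the choice to keep only the email and grade columns are assumptions; an in-memory string stands in for the real file path.

```python
import io

import pandas as pd

# Hypothetical contents of a single quiz file (column names are assumed).
csv_text = (
    "First Name,Last Name,Email Address,Grade\n"
    "Ada,Lovelace,ada@example.edu,9\n"
    "Alan,Turing,alan@example.edu,7\n"
)

# io.StringIO stands in for the file path that the loop would provide.
quiz = pd.read_csv(
    io.StringIO(csv_text),
    index_col="Email Address",           # rows are labeled by email address
    usecols=["Email Address", "Grade"],  # assumption: names are not needed here
)
print(quiz)
```

With the email address as the index, rows from different quiz files can later be lined up by student rather than by row position.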
Then we are going to take each individual quiz and we’re just simply going to concatenate it to this running
quiz_grades DataFrame. So we’ll concatenate running
quiz_grades DataFrame with the current quiz DataFrame.
And we want to do this along the second axis. We don’t want to stack these DataFrames, we want to concatenate them along columns. In other words, we’re adding a new series, or a new column, containing the current quiz grade to this running DataFrame.
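The column-wise concatenation can be sketched with two small hand-made DataFrames. The email addresses and grades are invented sample data; the two frames deliberately list the students in different orders to show that rows are aligned on the index.

```python
import pandas as pd

# Two hypothetical quiz DataFrames, indexed by email, in different row orders.
quiz_a = pd.DataFrame({"Grade": [9, 7]}, index=["ada@example.edu", "alan@example.edu"])
quiz_b = pd.DataFrame({"Grade": [8, 10]}, index=["alan@example.edu", "ada@example.edu"])

# Start from an empty running DataFrame and add one column per quiz.
quiz_grades = pd.DataFrame()
for quiz in (quiz_a, quiz_b):
    # axis=1 places the new data side by side instead of stacking rows.
    quiz_grades = pd.concat([quiz_grades, quiz], axis=1)

print(quiz_grades)
```

Each student ends up with one row and two columns, and both columns carry the same label because neither input was renamed.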
Okay, so this looks good. The only thing is that, of course, because all of the fields or all of the columns in the individual quiz files were all
Grade, then that’s what they come in as, right? When we concatenate them into the
quiz_grades DataFrame, they’re all the same name. Now, of course, we know that if these were put in where this one was the first quiz, the second quiz, the third quiz, and the fourth and the fifth—assuming of course that the
.glob() method generated them in that order, and there’s no guarantee that that’s going to happen.
So what we want to do is before we concatenate the current quiz, let’s rename the column grades. We need to pass in a dictionary and the column, or the field, that we want to rename is the
What we want here is basically we need to pull out this information from the
file_path object, because that’s going to contain the quiz and then the number. And so here, what we’ll do is before we pass that into the
.rename() of the columns, let’s create a
So essentially, we’re going to be getting
"quiz_", the number, and then
"_grades", and that’s what’s in the stem. Then we’re going to take that string and we’re just going to capitalize the first letter of the string, which is
"quiz", and then we just simply want to split by underscore (
And we only want, essentially, this number, but we might as well just keep the
"quiz" as well. This will create a list. Right now, this will be a list containing the elements
And we only want the first two elements, so we’re going to go from the first to the second, so we go all the way up to
2. That’ll be a list, and then all we need to do is join using a blank space (
" ") so that we then get just the word
"Quiz", capitalized, and then with a space and then the number, right?
So this is essentially going to give us
"Quiz 3", and so on. That is what we want to rename the
"Grade" field when we create this DataFrame, and so here we’ll pass in
07:25 Oh, we got an error. All right, so it looks like I just forgot to write down that I want to join that list of elements—only the first two—just with a blank space. All right, so let’s run that again.
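The name-building steps described above can be sketched on a single path. The example path is hypothetical, and `str.capitalize()` is used here as one way to capitalize the first letter; `str.title()` would give the same final result for these file names.

```python
from pathlib import Path

file_path = Path("quiz_3_grades.csv")  # hypothetical example path

stem = file_path.stem                   # file name without the .csv suffix
parts = stem.capitalize().split("_")    # capitalize first letter, split on "_"
quiz_name = " ".join(parts[:2])         # keep the first two pieces, join with a space

print(quiz_name)
```

Leaving out the `" ".join(...)` step would hand a list, not a string, to the rename step, which is the kind of slip that produces an error like the one mentioned above.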
Notice that the
.glob() method doesn’t guarantee any particular ordering. Whatever order the generator returns the files in is the order in which we’re going to be creating these fields, or these columns, and that’s fine.
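Putting the pieces together, the whole loop might look roughly like the sketch below. The temporary folder, the `data_folder` name, the `"Email Address"` column name, and the sample grades are assumptions for illustration, not the course's actual data.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    data_folder = Path(tmp)  # hypothetical stand-in for the real data folder
    for n in (1, 2, 3):
        (data_folder / f"quiz_{n}_grades.csv").write_text(
            "Email Address,Grade\n"
            f"ada@example.edu,{n + 5}\n"
            f"alan@example.edu,{n + 6}\n"
        )

    quiz_grades = pd.DataFrame()
    for file_path in data_folder.glob("quiz_*_grades.csv"):
        # "quiz_3_grades" -> "Quiz 3"
        quiz_name = " ".join(file_path.stem.capitalize().split("_")[:2])
        quiz = pd.read_csv(file_path, index_col="Email Address")
        # Rename before concatenating so each column keeps its quiz label.
        quiz = quiz.rename(columns={"Grade": quiz_name})
        quiz_grades = pd.concat([quiz_grades, quiz], axis=1)

# Column order follows whatever order .glob() produced; sorted here for display.
print(sorted(quiz_grades.columns))
```

Because each column is labeled with its quiz name, looking up a grade by name works the same way regardless of the order in which the files were read.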
and we’ve got all of the homework and exam grades in this DataFrame called
hw_exam_grades (homework exam grades). And then lastly, of course, we’ve got the roster, the roster contains just the NetIDs, the email address, and then the section number of the student.