Writing the Data to CSV Files
The main idea is that we want to pull out all of the students who were in Section 1. This isn’t hard to do. We could say something like, “Let’s pull out the rows where the Section column is equal to 1.” This would be all of the grades just for the students in Section 1.
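A minimal sketch of this kind of filter, assuming a small made-up DataFrame in place of the real gradebook. The name final_data and all of the values are hypothetical; only the column names come from the lesson. It also narrows the result to the three columns that end up in the CSV file.

```python
import pandas as pd

# Hypothetical stand-in for the gradebook; names and values are made up.
final_data = pd.DataFrame(
    {
        "Email Address": ["a@example.com", "b@example.com", "c@example.com"],
        "Section": [1, 2, 1],
        "Ceiling Score": [88, 92, 75],
        "Final Grade": ["B", "A", "C"],
    }
)

# Boolean mask: keep only the rows whose Section value equals 1,
# and at the same time keep only the columns we plan to write out.
columns = ["Email Address", "Ceiling Score", "Final Grade"]
section_1 = final_data.loc[final_data["Section"] == 1, columns]
print(section_1)
```

Doing the row and column selection in a single .loc call avoids chained indexing, but two separate steps would give the same result here.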
And instead of pulling out all of the columns, we just care about the columns that we’re going to write to the CSV file: again, "Email Address", "Ceiling Score", and "Final Grade". These are all the students in Section 1. At this point, we could simply write to the CSV file and repeat this for each of the individual sections. There is a nice method that’s used quite a bit for this, though, and that is .groupby(), which returns a GroupBy object.
For example, one of the attributes that this object has is .groups. This returns a dictionary whose keys are the group names. These are the values in the Section column, just 1, 2, and 3. For each key, the value holds the index labels of all of the rows that had that value, for example 1, for the Section column.
If we use this in a for loop, we can unpack two loop variables: section, and then group, or maybe table. When we use the GroupBy object as an iterator, this creates a generator, and what the generator returns is a tuple. The first element is the name of the group, that is, the value that defines the particular section. In this case, the section variable will take on the values 1, 2, and 3. The second element in the tuple is the DataFrame that consists of all of the grades that have that value of section. To make it clear that this is a DataFrame, let’s call it df. What we want to do with this DataFrame is write the data for that section to a CSV file.
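The two behaviours described above, the .groups dictionary and iteration over (name, DataFrame) tuples, can be sketched as follows. The DataFrame and its values are made up for illustration, and final_data, grouped, and sizes are hypothetical names.

```python
import pandas as pd

# Hypothetical data; only the "Section" column name comes from the lesson.
final_data = pd.DataFrame(
    {
        "Email Address": ["a@example.com", "b@example.com",
                          "c@example.com", "d@example.com"],
        "Section": [1, 2, 1, 3],
        "Final Grade": ["B", "A", "C", "B"],
    }
)

grouped = final_data.groupby("Section")

# .groups maps each group name to the index labels of the rows in that group.
print(grouped.groups)

# Iterating over the GroupBy object yields (group name, sub-DataFrame) tuples.
sizes = {}
for section, df in grouped:
    sizes[section] = len(df)
    print(f"Section {section}: {len(df)} student(s)")
```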
So let’s create a variable that will store the name of the CSV file. We’re going to need the DATA_FOLDER variable that we had (this was a Path object), and the filename is going to be, let’s just call it f"section" and then the actual section number.
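One way this file name could be built, assuming DATA_FOLDER points at a folder called data. Both that folder name and the exact pattern section_N.csv are assumptions made for this sketch; the lesson only says the name is "section" followed by the section number.

```python
from pathlib import Path

# Assumption: the real DATA_FOLDER was defined earlier; "data" is a placeholder.
DATA_FOLDER = Path("data")

section = 1  # in the loop, this comes from iterating over the GroupBy object

# The / operator on Path objects joins path components portably.
section_file = DATA_FOLDER / f"section_{section}.csv"
print(section_file)
```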
Then let’s sort things alphabetically. We’ve got this DataFrame consisting of just one particular section. Let’s sort it by "Last Name" and then "First Name", in case we have students with the same last name.
05:56 And it looks like that worked well! This is maybe the last step in this particular case, where the data has a column that you can naturally group by. Here, we take the section information and create a separate CSV file of grades for each section. If we go in and check, say, Section 2, we can make sure that was done well.
06:25 We’ve got those students. And then Section 3, and that’s done as well. So this pretty much does the job of finding the final grades and then writing all of the grades to individual CSV files based on section. We can now reuse this Jupyter Notebook if we have different data coming in from a different course, possibly making some changes to make it a little more robust so that it handles different types of assessments.
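Putting the pieces together, the group, sort, and write steps might be sketched like this. This is not the notebook's actual code: the data is made up, the output goes to a temporary folder instead of DATA_FOLDER, and the file name pattern and the index=False choice are assumptions.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical gradebook; only the column names come from the lesson.
final_data = pd.DataFrame(
    {
        "Last Name": ["Smith", "Adams", "Smith", "Baker"],
        "First Name": ["Zoe", "Ann", "Al", "Bo"],
        "Section": [1, 1, 1, 2],
        "Final Grade": ["B", "A", "C", "B"],
    }
)

# Assumption: a temporary folder stands in for DATA_FOLDER.
out_dir = Path(tempfile.mkdtemp())

written = {}
for section, df in final_data.groupby("Section"):
    section_file = out_dir / f"section_{section}.csv"
    # Sort by last name, breaking ties with first name.
    ordered = df.sort_values(by=["Last Name", "First Name"])
    # index=False leaves the row labels out of the file (an assumption here).
    ordered.to_csv(section_file, index=False)
    written[section] = section_file

print(pd.read_csv(written[1]))
```

Reading one of the files back, as in the last line, is a quick way to confirm that each section landed in its own file in the expected order.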
06:55 Maybe the last thing that we may want to do, from an analytical point of view, is look at the grade distribution to see whether this course performed, on average, worse or better than other courses and, in particular, how close the grades are to being normally distributed.
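As a first look at the distribution, counting how many students received each letter grade is often enough. The grades below are made up, and final_grades is a hypothetical name for the "Final Grade" column.

```python
import pandas as pd

# Hypothetical letter grades standing in for the "Final Grade" column.
final_grades = pd.Series(
    ["A", "B", "B", "C", "B", "D", "C", "A"], name="Final Grade"
)

# Count each letter grade, then order the result by grade rather than by count.
counts = final_grades.value_counts().sort_index()
print(counts)
```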