Writing the Data to CSV Files
00:00
To write out the data in separate CSV files for each section, let’s first define a list of the columns that we want to write. We’ll call this, say, cols_to_write (columns to write).
00:14 The columns that we’ll want to write for each section are going to be, say, the student’s last name, first name, email address,
00:28 the ceiling score, and the final grade.
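As a rough sketch, the list could look like the following. The column names are the ones quoted in this lesson, so they are an assumption about how your own data is labeled.

```python
# Columns to keep in each section's CSV file.
# The names are assumed to match the column labels in the grades table.
cols_to_write = [
    "Last Name",
    "First Name",
    "Email Address",
    "Ceiling Score",
    "Final Grade",
]
```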
00:35
The main idea is going to be that we want to pull out all of the students that were in section number 1. Now, this isn’t that hard to do. Of course, we could say something like “Let’s pull out the rows where the .Section column, say, is equal to 1.” This would be all of the grades just for the students in Section 1.
00:58
And instead of pulling out all of the columns, we just care about the columns that we’re going to be writing to the CSV file, and this is going to be, again, "Last Name", "First Name", "Email Address", the "Ceiling Score", and the "Final Grade", and these are all the students in Section 1. So basically, at this point here, we could simply write to the CSV file and do this for each of the individual sections. Now, there is a nice method that’s used quite a bit, though, and that is the .groupby() method.
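Here is a minimal sketch of that manual approach. The grades DataFrame and its values are made-up toy data standing in for the table built earlier in the course, and the name grades itself is hypothetical.

```python
import pandas as pd

# Hypothetical toy data standing in for the course's grades table.
grades = pd.DataFrame(
    {
        "Last Name": ["Smith", "Jones", "Adams", "Brown"],
        "First Name": ["Ann", "Bob", "Cat", "Dan"],
        "Email Address": [
            "ann@example.com",
            "bob@example.com",
            "cat@example.com",
            "dan@example.com",
        ],
        "Section": [1, 2, 1, 3],
        "Ceiling Score": [91, 78, 85, 66],
        "Final Grade": ["A", "C", "B", "D"],
    }
)

cols_to_write = [
    "Last Name",
    "First Name",
    "Email Address",
    "Ceiling Score",
    "Final Grade",
]

# Boolean mask picks the rows for Section 1; the list picks the columns.
section_1 = grades.loc[grades["Section"] == 1, cols_to_write]
print(section_1)
```

Repeating this for every section value works, but it means writing the same filter once per section.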
01:30
The .groupby() method will basically create groups based on a column or multiple columns. In this case, the column that we want is "Section".
01:43
This will create a GroupBy object, and it’s going to describe the groups based on, in this case, the column "Section". This object can also be iterated over.
01:57
Actually, let’s just take a look at this object. Let’s just call it g for now. We’re not going to use g this way, but let me just run that.
02:05
And so, for example, one of the attributes that this object has is .groups. What this will return is a dictionary whose keys are the group names. These are the values in the "Section" column (just 1, 2, and 3), and for each key we’ll have the index labels of all of the rows that had that value, for example 1, for "Section".
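To make that concrete, here is a small sketch using a made-up three-column table (the grades name and its values are hypothetical). Note that the values in the .groups mapping are pandas Index objects rather than plain lists.

```python
import pandas as pd

# Hypothetical toy data; only the "Section" column matters for grouping.
grades = pd.DataFrame(
    {
        "Last Name": ["Smith", "Jones", "Adams", "Brown"],
        "Section": [1, 2, 1, 3],
        "Final Grade": ["A", "C", "B", "D"],
    }
)

# Group the rows by the values found in the "Section" column.
g = grades.groupby("Section")

# Mapping of group name -> index labels of the rows in that group.
print(g.groups)

# Index labels of the rows that belong to Section 1.
section_1_labels = list(g.groups[1])
print(section_1_labels)
```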
02:33 We can also get a group.
02:37
We can say .get_group(1), and so this is essentially equivalent to what we had before, where we’re simply getting the data just for the students that were in section number 1.
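A quick way to convince yourself of that equivalence is to compare the two results side by side. This sketch uses made-up data, and the grades name is hypothetical.

```python
import pandas as pd

# Hypothetical toy data standing in for the grades table.
grades = pd.DataFrame(
    {
        "Last Name": ["Smith", "Jones", "Adams", "Brown"],
        "Section": [1, 2, 1, 3],
        "Final Grade": ["A", "C", "B", "D"],
    }
)

# Manual filter with a Boolean mask.
by_mask = grades[grades["Section"] == 1]

# Same rows, fetched from the GroupBy object by group name.
by_group = grades.groupby("Section").get_group(1)

# Both approaches should select the same row labels.
same_rows = by_mask.index.equals(by_group.index)
print(by_group)
print(same_rows)
```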
02:50
But a nice thing with this GroupBy object is that we can iterate over it. Let me get rid of a few cells here
03:01
and let me go back over here. Let’s use this in a for loop: so, for and then we’ll have section and group, or maybe table. Okay, so when we use the GroupBy object as an iterator, this will create a generator, and what the generator will return is a tuple. The first element is the name of the section, that is, the value that defines the particular group. In this case, the section variable will take on the values 1, 2, and 3. The second element in the tuple is going to be the DataFrame that consists of all of the rows whose "Section" value is equal to section. Maybe just to make it clear, let’s call this df (DataFrame). What we want to do with this DataFrame is write the data for that section to a CSV file.
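The loop described here can be sketched as follows, again with made-up data and a hypothetical grades name. Each pass through the loop unpacks one (name, DataFrame) tuple.

```python
import pandas as pd

# Hypothetical toy data standing in for the grades table.
grades = pd.DataFrame(
    {
        "Last Name": ["Smith", "Jones", "Adams", "Brown"],
        "Section": [1, 2, 1, 3],
        "Final Grade": ["A", "C", "B", "D"],
    }
)

sizes = {}
for section, df in grades.groupby("Section"):
    # section is the group name; df holds only that section's rows.
    sizes[int(section)] = len(df)
    print(f"Section {section}: {len(df)} student(s)")
```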
03:59
So let’s create a variable that will store the name of the CSV file. We’re going to need the DATA_FOLDER variable that we had (this was a Path object), and the filename is going to be, let’s just call it "section" and then the actual section number.
04:18
This is the first element in the tuple. And then we’ll just call it "_grades.csv".
04:27 Then the DataFrame that we’re getting for the corresponding section, we only want to write these columns up here that we defined above, so let’s pull these out.
04:40
Then let’s sort things alphabetically. So, we’ve got this DataFrame consisting of just one particular section. Let’s sort this by "Last Name" and then "First Name" in case we have students with the same last name.
04:58
Then just call the .to_csv() method
05:03 with the section filename.
05:07
Let’s run that and… Oh yeah, we’re getting a NameError here. The variable was actually called DATA_DIR (data directory), so let’s just change that and run that again. And so there we go!
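Putting the steps together, the whole loop might look roughly like this. It is a sketch under a few assumptions: the data is made up, grades is a hypothetical name, DATA_DIR points at a temporary folder instead of the course's real data directory, and the filename pattern follows the "section_1_grades.csv" name used later in this lesson.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for the course's DATA_DIR; a temporary folder keeps this self-contained.
DATA_DIR = Path(tempfile.mkdtemp())

# Hypothetical toy data standing in for the grades table.
grades = pd.DataFrame(
    {
        "Last Name": ["Smith", "Jones", "Adams", "Brown"],
        "First Name": ["Ann", "Bob", "Cat", "Dan"],
        "Email Address": [
            "ann@example.com",
            "bob@example.com",
            "cat@example.com",
            "dan@example.com",
        ],
        "Section": [1, 2, 1, 3],
        "Ceiling Score": [91, 78, 85, 66],
        "Final Grade": ["A", "C", "B", "D"],
    }
)

cols_to_write = [
    "Last Name",
    "First Name",
    "Email Address",
    "Ceiling Score",
    "Final Grade",
]

for section, df in grades.groupby("Section"):
    # Build the output path from the section number.
    section_file = DATA_DIR / f"section_{section}_grades.csv"
    # Keep only the wanted columns, sort by name, then write the file.
    df[cols_to_write].sort_values(by=["Last Name", "First Name"]).to_csv(
        section_file
    )

written = sorted(p.name for p in DATA_DIR.glob("section_*_grades.csv"))
print(written)
```

Called this way, .to_csv() also writes the index labels as the first column. Passing index=False would leave them out, if that suits your files better.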
05:19 This will have created three CSV files containing the grades for each of the individual sections. And just to make sure that this actually worked, why don’t we, say, open one of these up?
05:34
Let’s call read_csv(), and this is going to be "section",
05:42
and we should probably use the DATA_DIR Path object and say "section_1_grades.csv".
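A check like this one can be sketched as a small round trip. Here the file is created on the spot with made-up rows so the example runs on its own, DATA_DIR is a temporary stand-in folder, and index_col=0 is my own choice for reading the saved index back in rather than something shown in the lesson.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for the course's DATA_DIR.
DATA_DIR = Path(tempfile.mkdtemp())

# Hypothetical rows, written out so there is a file to read back.
sample = pd.DataFrame(
    {
        "Last Name": ["Adams", "Smith"],
        "First Name": ["Cat", "Ann"],
        "Final Grade": ["B", "A"],
    },
    index=[2, 0],
)
sample.to_csv(DATA_DIR / "section_1_grades.csv")

# Read the file back; index_col=0 restores the saved index labels.
check = pd.read_csv(DATA_DIR / "section_1_grades.csv", index_col=0)
print(check)
```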
05:56 And it looks like that worked well! So, you know, this is maybe the last step in a case like this, where your data has some column that you can naturally group by. Here, we take the section information and create a different CSV file with the grades for each section. Let’s also try, say, section number 2 to make sure that was done well.
06:25 We’ve got those students. And then section number 3, and that’s done as well. So, this pretty much does the job of finding the final grades and then writing all of the grades to individual CSV files based on section. We can now reuse this Jupyter Notebook if we have different data coming in from a different course, possibly making some changes to make it a little more robust so that it can handle different types of assessments.
06:55 Maybe the last thing that we may want to do, just from an analytical point of view, is to look at the grade distribution and see if this course performed, on average, worse or better than other courses and, in particular, see how close the grades are to being normally distributed.