Writing the Data to CSV Files

Using pandas to Make a Gradebook in Python Cesar Aguilar 07:19

00:00 To write out the data in separate CSV files for each section, let’s first define a column list of the columns that we want to write. We’ll call this, say, cols_to_write (columns to write).

00:14 The columns that we’ll want to write for each section are going to be, say, the student’s last name, first name,

00:24 email address,

00:28 the ceiling score, and the final grade.
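As a rough sketch of the column list described here: the exact column labels ("Email Address", "Ceiling Score", "Final Grade") and the small final_data table are assumptions made for illustration, not the course's actual data.

```python
import pandas as pd

# Hypothetical stand-in for the gradebook DataFrame; the column labels are assumed.
final_data = pd.DataFrame(
    {
        "Section": [1, 2, 1],
        "Last Name": ["Smith", "Jones", "Adams"],
        "First Name": ["Ann", "Bob", "Cal"],
        "Email Address": ["ann@example.com", "bob@example.com", "cal@example.com"],
        "Ceiling Score": [91, 78, 85],
        "Final Grade": ["A", "C", "B"],
        "Homework Score": [0.9, 0.7, 0.8],
    }
)

# The columns to write out for each section.
cols_to_write = [
    "Last Name",
    "First Name",
    "Email Address",
    "Ceiling Score",
    "Final Grade",
]

# Indexing with a list of labels keeps only those columns, in that order.
subset = final_data[cols_to_write]
```

Keeping the labels in one list means the same selection can be reused for every section later on.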

00:35 The main idea is going to be that we want to pull out all of the students that were in section number 1. Now, this isn’t that hard to do. Of course, we could say something like “Let’s pull out the rows where the .Section column, say, is equal to 1.” This would be all of the grades just for the students in Section 1.
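Pulling out one section by hand could look something like the following. This is a minimal sketch: the final_data table here is a made-up example, and only the "Section" column name comes from the lesson.

```python
import pandas as pd

# Hypothetical gradebook with three sections.
final_data = pd.DataFrame(
    {
        "Section": [1, 2, 3, 1],
        "Last Name": ["Smith", "Jones", "Lee", "Adams"],
        "Final Grade": ["A", "C", "B", "B"],
    }
)

# Boolean mask: True for the rows whose "Section" value equals 1.
in_section_1 = final_data["Section"] == 1

# Indexing with the mask keeps only the matching rows.
section_1 = final_data[in_section_1]
```

This works, but it has to be repeated for every section number, which is what motivates grouping.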

01:30 The .groupby() method will basically create groups based on a column or multiple columns. In this case, the column that we want is "Section".

01:43 This will create a GroupBy object, and it’s going to describe the groups based on, in this case, the column "Section". This object can also be iterated over.

01:57 Actually, let’s just take a look at this object. Let’s just call it g for now. We’re not going to use g this way, but let me just run that.

02:05 And so, for example, one of the attributes that this object has is .groups. What this returns is a dictionary whose keys are the group names. These are the values in the "Section" column (just 1, 2, and 3), and for each key we get the index labels of all of the rows that had that value for "Section".

02:33 We can also get a group.

02:37 We can say .get_group(1), and so this is essentially equivalent to what we had before, where we’re simply getting the data just for the students that were in section number 1.
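To make the GroupBy object a bit more concrete, here is a small sketch of .groups and .get_group() on a made-up table. The data and the name group_labels are assumptions for illustration.

```python
import pandas as pd

# Hypothetical gradebook with a default 0..3 integer index.
final_data = pd.DataFrame(
    {
        "Section": [1, 2, 3, 1],
        "Last Name": ["Smith", "Jones", "Lee", "Adams"],
    }
)

# Group the rows by the values in the "Section" column.
g = final_data.groupby("Section")

# .groups maps each group name to the index labels of its rows.
# Converting the labels to plain lists makes them easier to read.
group_labels = {section: list(labels) for section, labels in g.groups.items()}

# .get_group(1) returns the rows belonging to section 1 as a DataFrame.
section_1 = g.get_group(1)
```

Here section_1 holds the same rows that a boolean filter on "Section" equal to 1 would select.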

02:50 But a nice thing with this GroupBy object is that we can iterate over it. Let me get rid of a few cells here.

03:59 So let’s create a variable that will store the name of the CSV file. We’re going to need the DATA_FOLDER variable that we had—this was a Path object—and the filename is going to be, let’s just call it "section" and then the actual section number.

04:18 This is the first element in the tuple. And then we’ll just call it "_grades.csv".

04:27 Then the DataFrame that we’re getting for the corresponding section, we only want to write these columns up here that we defined above, so let’s pull these out.

04:40 Then let’s sort things alphabetically. So, we’ve got this DataFrame consisting of just one particular section. Let’s sort this by "Last Name" and then "First Name" in case we have students with the same last name.

04:58 Then just call the .to_csv() method

05:03 with the section filename.

05:07 Let’s run that and… Oh yeah, we’re getting a NameError here. This was the DATA_DIR (data directory), so let’s just change that, run that again. And so there we go!
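Putting the steps together, the loop might look roughly like this. It is a sketch under several assumptions: the table contents are invented, the column labels other than "Section", "Last Name", and "First Name" are guesses, and DATA_DIR points at a temporary folder here rather than the course's data directory.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Assumption: a temporary folder stands in for the course's DATA_DIR.
DATA_DIR = Path(tempfile.mkdtemp())

# Hypothetical gradebook data.
final_data = pd.DataFrame(
    {
        "Section": [1, 2, 1, 3, 1],
        "Last Name": ["Smith", "Jones", "Adams", "Lee", "Adams"],
        "First Name": ["Ann", "Bob", "Cal", "Dee", "Bea"],
        "Email Address": ["a@x.org", "b@x.org", "c@x.org", "d@x.org", "e@x.org"],
        "Ceiling Score": [91, 78, 85, 88, 70],
        "Final Grade": ["A", "C", "B", "B", "C"],
    }
)

cols_to_write = [
    "Last Name",
    "First Name",
    "Email Address",
    "Ceiling Score",
    "Final Grade",
]

# Iterating over a GroupBy object yields (group name, group DataFrame) tuples.
for section, table in final_data.groupby("Section"):
    # The Path "/" operator joins the folder and the filename.
    section_file = DATA_DIR / f"section_{section}_grades.csv"
    # Keep the chosen columns, sort by last name then first name, and write.
    table[cols_to_write].sort_values(by=["Last Name", "First Name"]).to_csv(
        section_file
    )
```

Sorting by both name columns keeps students who share a last name in a predictable order.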

05:19 This will have created three CSV files containing the grades for each of the individual sections. And just to make sure that this actually worked, why don’t we, say, open one of these up?

05:34 Let’s read_csv(), and this is going to be "section",

05:42 and we should probably use the DATA_DIR Path object and say "section_1_grades.csv".
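Reading one of the files back in could be sketched as follows. So that the example runs on its own, it first writes a tiny made-up file to a temporary folder standing in for DATA_DIR. Passing index_col=0 is an assumption: .to_csv() writes the index as the first column by default, so this restores it as the index.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Assumption: a temporary folder stands in for the course's DATA_DIR.
DATA_DIR = Path(tempfile.mkdtemp())

# Write a small hypothetical section file so there is something to read.
written = pd.DataFrame(
    {"Last Name": ["Adams", "Smith"], "Final Grade": ["B", "A"]},
    index=[3, 0],
)
written.to_csv(DATA_DIR / "section_1_grades.csv")

# Read the file back, using the first column as the index.
check = pd.read_csv(DATA_DIR / "section_1_grades.csv", index_col=0)
```

Without index_col=0, the saved index would come back as an extra unnamed column.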

06:55 Maybe the last thing that we may want to do, just from an analytical point of view, is to look at the grade distribution and see whether this course performed, on average, worse or better than other courses and, in particular, see how close the grades are to being normally distributed.

07:15 We’ll do that next.
