Using pandas to Make a Gradebook (Overview)

One of the jobs that all teachers have in common is evaluating students. Whether you use exams, homework assignments, quizzes, or projects, you usually have to turn students’ scores into a letter grade at the end of the term. This often involves a bunch of calculations that you might do in a spreadsheet. Instead, you can consider using Python and pandas.

In this course, you’ll learn how to:

  • Load and merge data from multiple sources with pandas
  • Filter and group data in a pandas DataFrame
  • Calculate and plot grades in a pandas DataFrame


00:00 Hi, I’m Cesar. Welcome to this Real Python project-based video course on using pandas to create a gradebook.

00:09 The pandas module is a data analysis and manipulation tool. It’s fast and it’s powerful and it’s easy to use. pandas is written on top of the NumPy module, which is widely considered the de facto scientific computing module in Python. Let’s go over what this course is about.

00:30 As I said, this is a project-based course using pandas to solve a real-world data problem.

00:36 The goal of the project is to compute the final grade for each student in a large course containing 150 students and divided into 3 sections.

00:46 The data associated with each student is spread out in various CSV files. This is the type of situation that you may encounter in a real-world problem where you’ve got different sources of data, different services providing you that data.

01:00 So maybe you’ve got one service that’s giving you, say, the name of the students and basic identifier information and then maybe you’ve got another service that’s keeping track of, say, homework assignments and quizzes, and then maybe another service that’s keeping track of exams, and then you need to merge this data so that you can compute a final grade for each student. The grade is going to be computed using several assessments, like I just mentioned.

01:27 So things like many homework assignments and quizzes and exams, and each of these different assessments will have a different weighting to compute the final grade. Although not completely necessary, you’ll get the most out of this course if you know a little bit about pandas already or maybe NumPy. If you know only NumPy, then I still think you should be able to follow along, but let me give you a few resources in case you either just need to freshen up on pandas, or maybe you’ve never worked with pandas before. There are a few really good tutorials that go over some of the basics of pandas.

02:04 So working with the data structures in pandas, DataFrames and Series, and also just getting practice with data visualization. So if you just need a refresher or you’re very new to pandas, then check these out just to get caught up on pandas.

02:21 We’re also going to be using the Jupyter Notebook. If you don’t have much experience with that, here’s another really good tutorial that you may want to check out.

02:29 It’s a Jupyter Notebook introduction in Real Python. But it’s really easy to learn and if you’re already writing code in an editor, then, you know, this should be pretty straightforward for you to pick up.

02:42 All right, let’s do a quick overview of what we’re going to do in this course. The first thing we’re going to want to do is just explore the data. All of our sources come to us in CSV files, so we’re going to be able to open these files up in an editor, just do a quick exploration of the data—what the fields are, and just to see if there’s anything in particular that we may want to be careful about.

03:04 Then we’ll load the data and we’ll be careful in case we need to do some sort of conversion of the field names in the files, just so that it makes it easy for us to, say, merge the data, which is what we’ll do next.
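The loading step described above might look roughly like the sketch below. The CSV content and the column names (`Student ID`, `First Name`, `Section`) are made up for illustration and are not taken from the course files; the point is only to show field names being converted to one consistent style before any merge.

```python
import io

import pandas as pd

# Hypothetical stand-in for one of the CSV files; the real field names may differ.
raw_csv = io.StringIO(
    "Student ID,First Name,Section\n"
    "jdoe01,Jane,1\n"
    "asmith02,Alan,2\n"
)

roster = pd.read_csv(raw_csv)

# Convert the field names to lowercase with underscores so that every
# table can later be merged on an identically named key column.
roster.columns = [name.strip().lower().replace(" ", "_") for name in roster.columns]

print(roster.columns.tolist())
```

With a real file, you would pass a file path to `pd.read_csv()` instead of the in-memory buffer.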

03:17 We’ll merge the data so that we get one big DataFrame, and then on that DataFrame, we’ll be able to do all of our computations, which will be the fourth thing. We’ll be using just basic arithmetic operations on the columns of the DataFrame.
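As a rough illustration of merging several sources into one big DataFrame, here is a minimal sketch. The three small tables and the shared `student_id` key are assumptions made for this example, not the course's actual data.

```python
import pandas as pd

# Three hypothetical sources that share a student identifier column.
roster = pd.DataFrame({"student_id": ["jdoe01", "asmith02"], "section": [1, 2]})
homework = pd.DataFrame({"student_id": ["jdoe01", "asmith02"], "homework_1": [18, 15]})
exams = pd.DataFrame({"student_id": ["asmith02", "jdoe01"], "exam_1": [71, 88]})

# Left merges keep every student on the roster, even if another
# source happens to have no row for that student.
grades = (
    roster
    .merge(homework, on="student_id", how="left")
    .merge(exams, on="student_id", how="left")
)

print(grades)
```

Note that the rows in `exams` are in a different order from the roster; the merge matches on the key, not on row position.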

03:33 We’ll use some of the built-in functions in NumPy like the max function and the ceiling function, and again, just work with basic arithmetic operations to compute the final grade.
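To make the arithmetic concrete, here is one way weighted column operations and NumPy's maximum and ceiling functions could fit together. The weights, the column names, and the idea of taking the better of two quiz scores are all assumptions for this sketch, not the course's actual grading scheme.

```python
import numpy as np
import pandas as pd

# Hypothetical per-student scores, each expressed as a fraction of 1.
grades = pd.DataFrame(
    {
        "exam_score": [0.87, 0.71],
        "homework_score": [0.90, 0.76],
        "quiz_score_a": [0.80, 0.60],
        "quiz_score_b": [0.85, 0.55],
    },
    index=["jdoe01", "asmith02"],
)

# Hypothetical weights for each assessment type; they sum to 1.
weights = pd.Series({"exam_score": 0.5, "homework_score": 0.3, "quiz_score": 0.2})

# Element-wise maximum: keep whichever quiz score is better for each student.
grades["quiz_score"] = np.maximum(grades["quiz_score_a"], grades["quiz_score_b"])

# Multiply each column by its weight, add across columns, then round up
# to a whole percentage with the ceiling function.
weighted = (grades[weights.index] * weights).sum(axis=1)
grades["final_score"] = np.ceil(weighted * 100)

print(grades["final_score"])
```

Multiplying a DataFrame by a Series lines up the Series labels with the column names, which is why the weights can be applied in a single expression.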

03:44 After we’ve computed the final grades, we’re going to subdivide all of the grades into different files, and each file is going to correspond to each of the sections.

03:54 So, the whole class is divided up into three sections, and we just want to make sure that we can subdivide the data so that we get the grades that correspond to each section so that those CSV files can be used for other purposes, like submitting final grades in some sort of online form system.
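Splitting the grades into one CSV file per section could be sketched as follows. The data, the file naming pattern, and the temporary output directory are all hypothetical choices for this example.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical final grades for a handful of students.
grades = pd.DataFrame(
    {
        "student_id": ["jdoe01", "asmith02", "bwong03"],
        "section": [1, 2, 1],
        "final_grade": ["A", "C", "B"],
    }
)

# Hypothetical output location; a real project would pick its own folder.
output_dir = Path(tempfile.mkdtemp())

written = []
for section, table in grades.groupby("section"):
    # One file per section, with the students sorted by identifier.
    path = output_dir / f"section_{section}_grades.csv"
    table.sort_values("student_id").to_csv(path, index=False)
    written.append(path)

print([path.name for path in written])
```

Passing `index=False` keeps the DataFrame's row index out of the files, so each CSV contains only the columns shown.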

04:13 And then just for fun, we’ll do a little bit of plotting and some basic statistics just to give us a little bit more practice with some of these built-in features in the pandas module.
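For the basic statistics mentioned above, a minimal sketch with made-up scores might look like this. The column names and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical final scores for four students in two sections.
grades = pd.DataFrame(
    {
        "section": [1, 1, 2, 2],
        "final_score": [88.0, 71.0, 93.0, 64.0],
    }
)

# Count, mean, standard deviation, min, quartiles, and max in one call.
summary = grades["final_score"].describe()

# Average final score within each section.
by_section = grades.groupby("section")["final_score"].mean()

print(summary)
print(by_section)
```

For the plotting side, pandas exposes methods such as `Series.plot.hist()`, which rely on Matplotlib being installed.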

04:24 All right! So, I think you’ll really enjoy this course. If you like to learn via project-based learning, or if you just need a refresher with pandas, or maybe you have NumPy experience and you want to see what pandas is all about, then I really think you’ll enjoy this course.

04:41 So, let’s go ahead and start exploring the data.
