Plotting and Analyzing the Data
One of the first things we may want to do is check the letter grade distribution. This is fairly straightforward: in the 'Final Grade' column, which contains all the letter grades, we do a simple count of how many A’s, how many B’s, how many C’s, and so on, and then plot those counts. The type of plot that we want is a bar graph. This gives us a basic letter grade distribution, and we see that nobody got an 'A', nobody got an 'F', and the majority of the grades were a …
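Here is a minimal sketch of the count and the bar graph using pandas. The grades are hypothetical stand-ins for the 'Final Grade' column. The reindex step, which keeps a zero-height bar for letters nobody received, is an assumption about how you might want the chart to look, not something the lesson shows.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical letter grades standing in for the 'Final Grade' column
grades = pd.DataFrame({"Final Grade": ["B", "C", "C", "D", "B", "C", "C", "D"]})

# Count each letter grade, then reindex so that grades nobody received
# (here 'A' and 'F') still appear with a count of 0
counts = (
    grades["Final Grade"]
    .value_counts()
    .reindex(["A", "B", "C", "D", "F"], fill_value=0)
)
print(counts)

# A bar graph of the counts gives the basic letter grade distribution
ax = counts.plot(kind="bar", rot=0)
ax.set_xlabel("Letter grade")
ax.set_ylabel("Number of students")
plt.show()
```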
Next, we want to plot the data as a histogram with, let’s say, 20 bins. We’re also going to want to compare the grade distribution with the normal distribution, and we may also want an estimate of the actual distribution, so we’re going to superimpose a couple of curves on the histogram.
We’re going to load the scipy.stats module, which contains a function that gives us the values of the normal distribution. We’re going to compute those values and then plot them, so we also want to import matplotlib’s pyplot module as plt. First, we generate a set of x points centered on the mean of the actual grades and extending five standard deviations in each direction, so that we cover the whole range: from 5 standard deviations to the left of grade_mean to 5 standard deviations to the right, where the standard deviation is grade_std. We want to use 200 points.
For the y values, we want the values of the normal distribution with this mean and this standard deviation, and that’s where the scipy.stats module comes in. Its normal distribution object, norm, has a .pdf() method, which gives us the values of the normal probability density function at these x values. We pass the actual grade_mean as the mean, which is the loc keyword argument, and grade_std as the standard deviation, which is the scale keyword argument.
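A minimal sketch of these two steps with NumPy and SciPy. The scores are randomly generated stand-ins for the real grade data, and np.linspace is an assumption about how the 200 evenly spaced x points are produced, since the lesson’s exact call isn’t shown. The last lines check a known property of the normal density: its value at the mean is 1 / (scale * sqrt(2 * pi)).

```python
import numpy as np
from scipy import stats

# Hypothetical final scores standing in for the real grade data
rng = np.random.default_rng(seed=0)
scores = rng.normal(loc=75, scale=8, size=120)

grade_mean = scores.mean()
# ddof=1 gives the sample standard deviation, the default for pandas' Series.std()
grade_std = scores.std(ddof=1)

# 200 evenly spaced points from 5 standard deviations left of the mean
# to 5 standard deviations right of it (both endpoints included)
x = np.linspace(grade_mean - 5 * grade_std, grade_mean + 5 * grade_std, 200)

# Normal probability density at each x: loc is the mean, scale the standard deviation
y = stats.norm.pdf(x, loc=grade_mean, scale=grade_std)

# The density is highest at the mean, where it equals 1 / (scale * sqrt(2 * pi))
peak = stats.norm.pdf(grade_mean, loc=grade_mean, scale=grade_std)
print(peak, 1 / (grade_std * np.sqrt(2 * np.pi)))
```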
We want to use the standard deviation there. We want to plot these
y values together, or superimposed on top of the histogram, and so we’ll plot
y. We’ll label this as the
and maybe we use a
3. Let’s go ahead and run that. So, there we go. We’ve got the actual grade distribution, and superimposed we’ve got the normal distribution. And we should probably put in a legend here.
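Putting the pieces together, here is a minimal matplotlib sketch of the 20-bin histogram, the superimposed normal curve, and a legend. The scores are hypothetical, and passing density=True to hist is an assumption of this sketch: it rescales the bars so their total area is 1, which puts them on the same vertical scale as the probability density. Without it, the bars show raw counts and the normal curve would look nearly flat.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Hypothetical final scores standing in for the 'Final Score' column
rng = np.random.default_rng(seed=0)
scores = rng.normal(loc=75, scale=8, size=120)
grade_mean, grade_std = scores.mean(), scores.std(ddof=1)

x = np.linspace(grade_mean - 5 * grade_std, grade_mean + 5 * grade_std, 200)
y = stats.norm.pdf(x, loc=grade_mean, scale=grade_std)

fig, ax = plt.subplots()
# density=True scales the bars so their total area is 1, matching the PDF's scale
ax.hist(scores, bins=20, density=True, alpha=0.5, label="Final scores")
# Superimpose the normal distribution on the histogram
ax.plot(x, y, label="Normal distribution")
ax.legend()
plt.show()
```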
We also plot the density of the 'Final Score' column. This density plot gives us what’s called the kernel density estimate: an estimate of the actual distribution of the grades as a continuous distribution.
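pandas can draw this kind of plot directly with Series.plot.kde() (also available as Series.plot.density()), which uses SciPy’s gaussian_kde under the hood; which call the lesson used isn’t shown. The sketch below calls scipy.stats.gaussian_kde directly on hypothetical scores so the estimate itself is visible. Extending the evaluation grid 20 score units past the data is an arbitrary choice.

```python
import numpy as np
from scipy import stats

# Hypothetical final scores standing in for the 'Final Score' column
rng = np.random.default_rng(seed=0)
scores = rng.normal(loc=75, scale=8, size=120)

# Fit a Gaussian kernel density estimate; the bandwidth defaults to Scott's rule
kde = stats.gaussian_kde(scores)

# Evaluate the continuous estimate on a grid that extends past the data
x = np.linspace(scores.min() - 20, scores.max() + 20, 200)
density = kde(x)

# Like the normal PDF, the estimate is a density, so its area is close to 1
print(density.sum() * (x[1] - x[0]))
```

To superimpose the estimate on the histogram, plot density against x on the same axes as the normal curve.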
We see that both the normal distribution and the kernel density estimate do a pretty good job of approximating the grades, so we could probably say this is a fairly average class in terms of its grade distribution.
So, these are just a couple of things you may want to do as a first statistical look at the grade distribution. All right, let’s wrap things up with the summary.