Determining Ratios
00:00 Determining Ratios. Vertical and horizontal bar charts are often a good choice if you want to see the difference between your categories. If you’re interested in ratios, however, pie plots are the way to go.
00:14 Since cat_totals contains a few smaller categories, creating a pie plot with the code seen onscreen leads to a plot with issues.
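The onscreen code is not part of this transcript, so here is a minimal sketch of what such a pie plot could look like. The category names and totals are made up for illustration, and the Series name "Total" is an assumption. Only the cat_totals name comes from the lesson.

```python
import pandas as pd

# Hypothetical totals per major category (made-up numbers, not the course data).
cat_totals = pd.Series(
    {
        "Interdisciplinary": 12_000,
        "Agriculture": 75_000,
        "Law": 90_000,
        "Engineering": 540_000,
        "Humanities": 710_000,
        "Business": 1_300_000,
    },
    name="Total",
)

# Each slice is sized by the category's share of the overall sum.
ax = cat_totals.plot(kind="pie")

# The shares show why the plot gets crowded: the smallest categories
# each take up only a sliver of the pie.
shares = cat_totals / cat_totals.sum()
print(shares.round(3))
```

With values like these, the three smallest categories together account for well under a tenth of the pie, which is where labels tend to collide.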
00:28 There are several tiny slices with overlapping labels. To address the problem, the smaller categories can all be lumped together into a single group. This code merges all the categories with a total of under 100,000 into a category called "Other", and then creates a pie plot.
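One possible way to do the merge is sketched below. It reuses made-up totals rather than the course data, and the variable names other than cat_totals are hypothetical. It uses pd.concat to attach the "Other" entry, which may differ from what is shown onscreen.

```python
import pandas as pd

# Hypothetical totals per major category (made-up numbers, not the course data).
cat_totals = pd.Series(
    {
        "Interdisciplinary": 12_000,
        "Agriculture": 75_000,
        "Law": 90_000,
        "Engineering": 540_000,
        "Humanities": 710_000,
        "Business": 1_300_000,
    },
    name="Total",
)

threshold = 100_000

# Split the categories into those below the threshold and the rest.
small = cat_totals[cat_totals < threshold]
big = cat_totals[cat_totals >= threshold]

# Collapse the small categories into a single "Other" entry and append it.
other = pd.Series({"Other": small.sum()}, name="Total")
merged = pd.concat([big, other])

# An empty label keeps pandas from adding the Series name next to the pie.
ax = merged.plot(kind="pie", label="")
```

The merge leaves the overall sum unchanged, so the remaining slices keep their original proportions.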
01:20 Notice that the argument for label is an empty string ("").
01:26 By default, pandas adds a label with the column name. That often makes sense, but in this case it would only add some noise. Now the pie plot is much better, as you can see here.
01:36 The "Other" category still only makes up a very small slice of the pie. That’s a good sign that merging those categories was the right choice.
01:45 Zooming in on Categories. Sometimes you also want to verify whether a certain categorization makes sense. Are the members of a category more similar to one another than they are to the rest of the dataset? Again, a distribution is a good tool to get a first overview.
02:01 Generally, we expect the distribution of a category to be similar to the normal distribution but have a smaller range. This code creates a histogram plot showing the distribution of the median earnings for the engineering majors.
02:15 It will generate a histogram that you can compare to the histogram of all majors from the beginning.
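The histogram code itself is not in the transcript, so the following is only a sketch of the idea. The column names "Major_category" and "Median" are assumptions about the dataset, and the rows are made up for illustration.

```python
import pandas as pd

# Tiny stand-in for the majors dataset (made-up rows, assumed column names).
df = pd.DataFrame(
    {
        "Major_category": ["Engineering"] * 5 + ["Humanities"] * 3 + ["Business"] * 2,
        "Median": [
            40_000, 52_000, 57_000, 60_000, 75_000,  # Engineering
            28_000, 31_000, 34_000,                  # Humanities
            38_000, 45_000,                          # Business
        ],
    }
)

# Keep only the median earnings of the engineering majors.
engineering = df.loc[df["Major_category"] == "Engineering", "Median"]

# Histogram of the category, to compare against the histogram of all majors.
ax = engineering.plot(kind="hist")

# Comparing the ranges gives a quick numeric check alongside the plot.
print("Engineering:", engineering.min(), "to", engineering.max())
print("All majors: ", df["Median"].min(), "to", df["Median"].max())
```

Selecting with a Boolean mask in df.loc keeps the filter and the column selection in one step, so the same pattern works for any other category.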
02:28 The range of the major median earnings is somewhat smaller, starting at $40,000. The distribution is closer to normal, although its peak is still on the left.
02:39 So even if you’ve decided to pick a major in the engineering category, it would be wise to dive deeper and analyze your options more thoroughly.
pnmcdos on April 8, 2022
Seems like a lot of lines of code to remove the overlap of:
Was there a way to combine the two into ‘other’ with one line then plot it?