Analyze Categorical Data
00:00 Analyze Categorical Data. To process bigger chunks of information, the human mind consciously and unconsciously sorts data into categories. This technique is often useful, but it’s far from flawless.
00:15 Sometimes we put things into a category that, on further examination, aren’t all that similar. In this section, you’ll get to know some tools for examining categories and verifying whether a given categorization makes sense.
00:27 Many data sets already contain some explicit or implicit categorization. In the current example, the 173 majors are divided into 16 categories. A basic use of categories is grouping and aggregation.
00:42 You can use .groupby() to determine how popular each of the categories in the college major dataset is. With .groupby(), you create a DataFrameGroupBy object.
00:56 With the .sum() method, you create a Series.
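The on-screen code isn't captured in this transcript, so here is a minimal sketch of the grouping and summing steps. It uses a tiny made-up DataFrame in place of the real dataset, and the column names "Major_category" and "Total" are assumptions, as is the final sort.

```python
import pandas as pd

# Hypothetical stand-in for the college majors data. The column names
# "Major_category" and "Total" and all values are assumptions for illustration.
df = pd.DataFrame(
    {
        "Major": ["Accounting", "Finance", "History", "Philosophy", "Physics"],
        "Major_category": [
            "Business",
            "Business",
            "Humanities",
            "Humanities",
            "Physical Sciences",
        ],
        "Total": [200, 150, 90, 30, 40],
    }
)

# .groupby() returns a DataFrameGroupBy object; nothing is aggregated yet.
grouped = df.groupby("Major_category")

# Selecting one column and calling .sum() yields a Series indexed by category.
# Sorting is optional and only makes the totals easier to compare.
cat_totals = grouped["Total"].sum().sort_values()

print(type(grouped).__name__)
print(cat_totals)
```

With the real data, the same two calls would give you one total per major category.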
01:09 Let’s draw a horizontal bar plot showing all the category totals in cat_totals.
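As a rough sketch of the plotting step, the snippet below builds a small cat_totals Series by hand and draws it with kind="barh". The values, the axis label, and the output filename are made up for illustration, and the Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs anywhere

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical category totals standing in for the real cat_totals Series.
cat_totals = pd.Series(
    {"Physical Sciences": 40, "Humanities": 120, "Business": 350},
    name="Total",
)

# kind="barh" draws one horizontal bar per index label.
ax = cat_totals.plot(kind="barh")
ax.set_xlabel("Total students")  # label text is an assumption

plt.tight_layout()
plt.savefig("cat_totals.png")  # hypothetical filename; in a notebook the plot displays inline
```

Because the Series is sorted in ascending order, the largest category ends up at the top of the plot.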
01:20 You should see a plot with one horizontal bar for each category. As your plot shows, business is by far the most popular major category. While humanities and liberal arts is the clear second, the rest of the fields are more similar in popularity. With groups clearly established, in the next section you’ll see the best way to visually compare ratios.