# Pandas Tools

In addition to its plotting tools, Pandas also offers a convenient `.value_counts()` method that computes a histogram of non-null values and returns it as a Pandas `Series`:

```
>>> import numpy as np
>>> import pandas as pd
>>> data = np.random.choice(np.arange(10), size=10000,
... p=np.linspace(1, 11, 10) / 60)
>>> s = pd.Series(data)
>>> s.value_counts()
9 1831
8 1624
7 1423
6 1323
5 1089
4 888
3 770
2 535
1 347
0 170
dtype: int64
>>> s.value_counts(normalize=True).head()
9 0.1831
8 0.1624
7 0.1423
6 0.1323
5 0.1089
dtype: float64
```
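As a quick check on what `normalize=True` computes, here is a small sketch with a hand-made `Series`. The values, including the missing one, are made up for illustration. Dividing the raw counts by their total should give the same shares, and the `NaN` is left out of both because `.value_counts()` drops missing values by default.

```python
import numpy as np
import pandas as pd

# Hypothetical data: ten non-null values plus one missing value
s = pd.Series([9, 9, 9, 8, 8, 7, 7, 7, 7, 5, np.nan])

counts = s.value_counts()                 # NaN is dropped by default
shares = s.value_counts(normalize=True)   # relative frequencies

# The same shares computed by hand: each count over the non-null total
manual = counts / counts.sum()

print(counts)
print(shares)
print("max difference:", (shares - manual).abs().max())
```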

Elsewhere, `pandas.cut()` is a convenient way to bin values into arbitrary intervals. Let’s say you have some data on the ages of individuals and want to bucket them sensibly:

```
>>> ages = pd.Series(
... [1, 1, 3, 5, 8, 10, 12, 15, 18, 18, 19, 20, 25, 30, 40, 51, 52])
>>> bins = (0, 10, 13, 18, 21, np.inf) # The edges
>>> labels = ('child', 'preteen', 'teen', 'military_age', 'adult')
>>> groups = pd.cut(ages, bins=bins, labels=labels)
>>> groups.value_counts()
child 6
adult 5
teen 3
military_age 2
preteen 1
dtype: int64
>>> pd.concat((ages, groups), axis=1).rename(columns={0: 'age', 1: 'group'})
age group
0 1 child
1 1 child
2 3 child
3 5 child
4 8 child
5 10 child
6 12 preteen
7 15 teen
8 18 teen
9 18 teen
10 19 military_age
11 20 military_age
12 25 adult
13 30 adult
14 40 adult
15 51 adult
16 52 adult
```

What’s nice is that both of these operations ultimately utilize Cython code that makes them competitive on speed while maintaining their flexibility.

**00:00**
Now that you can use a variety of graphing libraries for your histograms, we’ll cover some tools available in Pandas to give you some more control over them.

**00:10**
The first one is the `.value_counts()` method, which computes a histogram from your data and turns it into a Pandas `Series`. So, to make a little dataset, you can just say `np.random.choice()`,

**00:27**
and then do a NumPy `arange()`,

**00:43**
and set `p=np.linspace()` from `1` to `11` with `10` values, then divide that by `60`. And then say `s` is just going to equal a Pandas `Series()` built from `data`.
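The `/ 60` step is what turns the `linspace()` weights into valid probabilities: ten evenly spaced numbers from 1 to 11 add up to 10 × (1 + 11) / 2 = 60, and `np.random.choice()` needs `p` to sum to 1. A minimal sketch that checks this; the `np.random.seed(0)` call is an addition for repeatability and is not part of the lesson.

```python
import numpy as np
import pandas as pd

weights = np.linspace(1, 11, 10)   # 1.0, 2.11..., 11.0
print(weights.sum())               # 60, up to floating-point rounding

p = weights / 60                   # probabilities that sum to 1
print(p.sum())

np.random.seed(0)                  # added only to make the run repeatable
data = np.random.choice(np.arange(10), size=10000, p=p)
s = pd.Series(data)

# Value 9 is 11 times as likely as value 0, so it should appear far more often
print(s.value_counts().head(3))
```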

**01:12**
You can go ahead and print `s` and then just call `.value_counts()`. And when you take a look at this, and if I open this up, you can see each value and how often it appears in the dataset. This is similar to before, when we turned these into dictionaries.

**01:37**
But by turning them into a Pandas `Series`, you get a couple more options. Because they’re a `Series`, you’re free to use any method that you would use on a regular Pandas `Series`.
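To illustrate, here is a sketch using randomly generated data; the seeded generator and the particular methods are illustrative choices, not from the lesson. `.head()`, `.sort_index()`, and `.sum()` all work on the `.value_counts()` result just as they would on any other `Series`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)                 # seeded so the run is repeatable
s = pd.Series(rng.integers(0, 5, size=1000))   # 1000 values drawn from 0..4

counts = s.value_counts()

top_two = counts.head(2)        # the two most frequent values
by_value = counts.sort_index()  # same counts, ordered by value 0..4 instead
total = counts.sum()            # equals the number of rows, 1000

print(top_two)
print(by_value)
print(total)
```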

**01:50**
So if I just call `.head()` on that, you can see that only the first five results show up now. Another nice built-in feature of `.value_counts()` is the ability to normalize the data, which, if you set `normalize` equal to `True`,

**02:17**
just converts each count into a proportion of the total, so every value falls between 0 and 1 and they all add up to 1. The big thing about `.value_counts()` is that it returns a Pandas `Series`, which gives you a lot of flexibility for any further processing or graphing that you need to do with that data. Another tool in Pandas is `pandas.cut()`.

**02:34**
I’m going to go ahead and make a `Series` called `ages`. It’s going to have quite a bit going on in here. We’ll say `[1, 1]`…

**03:15**
And actually, bring this over. And then assign `bins` to a list.

**04:01**
And then put these ages into groups. So you can just say `groups = pd.cut()`, pass in `ages`, set `bins=bins`, and set `labels=labels`.
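Since `bins` holds edges rather than bins, `pd.cut()` expects exactly one fewer label than there are edges. A sketch with a few made-up ages, one per group, using the lesson’s `bins` and `labels`; the mismatched call at the end is deliberate and should be rejected with a `ValueError`. The `mismatch_rejected` flag is a hypothetical helper added only to record that outcome.

```python
import numpy as np
import pandas as pd

ages = pd.Series([1, 12, 15, 19, 25])       # made-up sample, one age per group
bins = (0, 10, 13, 18, 21, np.inf)          # 6 edges -> 5 intervals
labels = ('child', 'preteen', 'teen', 'military_age', 'adult')  # 5 labels

groups = pd.cut(ages, bins=bins, labels=labels)
print(groups.tolist())

# Deliberately pass one label too few: pd.cut() raises a ValueError
mismatch_rejected = False
try:
    pd.cut(ages, bins=bins, labels=labels[:-1])
except ValueError as err:
    mismatch_rejected = True
    print("ValueError:", err)
```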

**04:19**
Then I’m just going to go ahead and take `groups` and print the `.value_counts()` from that. But before I run that, let’s take a look at what’s happening here.

**04:28**
You can see that you have five different labels and six values in `bins`. These values are actually the bin edges, so there is always one more edge than there are labels. So a `'child'` would be from `0` to `10`, a `'preteen'` would be from `10` to `13`, and so on, with each bin including its right edge.
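One detail worth checking is which side of each edge a boundary age falls on. By default (`right=True`), `pd.cut()` builds right-closed intervals, so an age of exactly `10` counts as `'child'` and `13` as `'preteen'`. Leaving out `labels` shows the intervals themselves. The sample ages below are chosen to sit on or near the edges and are not from the lesson.

```python
import numpy as np
import pandas as pd

ages = pd.Series([1, 10, 11, 13, 14, 18, 19, 21, 22])  # ages on or near the edges
bins = (0, 10, 13, 18, 21, np.inf)

# Without labels, pd.cut() returns the intervals themselves.
# Default right=True: each bin includes its right edge, e.g. (0, 10], (10, 13]
intervals = pd.cut(ages, bins=bins)
print(pd.concat((ages, intervals), axis=1))

# right=False flips this so each bin includes its left edge, e.g. [0, 10), [10, 13)
left_closed = pd.cut(ages, bins=bins, right=False)
print(pd.concat((ages, left_closed), axis=1))
```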

**04:43**
Calling `cut()` here will then assign each of the ages in this `Series` to the bin that it belongs to. And then using `.value_counts()` will print out how many ages landed in each bin.

**04:54**
So let’s take a look and see. Bring this up. And yeah, everything’s been categorized correctly. You’ve got `6` children, `5` adults, `3` teens, and so on.
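`.value_counts()` sorts by frequency, which is why adults are listed before teens. Because `pd.cut()` with `labels` returns a categorical `Series`, passing `sort=False` should instead list the bins in label order, including any bin that ended up empty. A sketch reusing the lesson’s ages, bins, and labels; the `no_preteens` subset is a hypothetical example added to show an empty bin.

```python
import numpy as np
import pandas as pd

ages = pd.Series([1, 1, 3, 5, 8, 10, 12, 15, 18, 18, 19, 20, 25, 30, 40, 51, 52])
bins = (0, 10, 13, 18, 21, np.inf)
labels = ('child', 'preteen', 'teen', 'military_age', 'adult')
groups = pd.cut(ages, bins=bins, labels=labels)

print(groups.value_counts())            # most frequent bin first
ordered = groups.value_counts(sort=False)
print(ordered)                          # bins in label order: child ... adult

# A bin with no members still shows up, with a count of 0
no_preteens = pd.cut(ages[ages != 12], bins=bins, labels=labels)
print(no_preteens.value_counts(sort=False))
```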

**05:12**
Everything we’ve looked at up until this point has set the bins automatically from the dataset, which makes all the bins the same width, based either on default values or on the number of bins that you specify.

**05:26**
Using `pandas.cut()` allows you to set your own bin edges, so the bins can have different widths. That’s very useful for cases like this one, where ages don’t necessarily fall into even ranges. All right!
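For comparison, passing an integer as `bins` makes `pd.cut()` split the range of the data into that many equal-width bins, while a sequence of edges lets each bin have its own width. A sketch using the lesson’s ages; three equal-width bins is an arbitrary choice made for illustration.

```python
import numpy as np
import pandas as pd

ages = pd.Series([1, 1, 3, 5, 8, 10, 12, 15, 18, 18, 19, 20, 25, 30, 40, 51, 52])

# Integer bins: 3 equal-width intervals spanning the min and max age
# (pandas widens the lowest edge slightly so the minimum is included)
equal_width = pd.cut(ages, bins=3)
print(equal_width.value_counts(sort=False))

# Explicit edges: each interval can have a different width
custom = pd.cut(ages, bins=(0, 10, 13, 18, 21, np.inf))
print(custom.value_counts(sort=False))
```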

**05:37**
Now you know a couple different ways to make histograms, how to plot those histograms, and some different tools in Pandas to change up how you actually produce those histograms.

**05:46**
This is quite a lot of info to take in, so in the next video, we’re going to summarize everything we talked about and talk about which applications are best for each method. Thanks for watching.
