When you’re preparing to plot a histogram, it’s simplest to not think in terms of bins but rather to report how many times each value appears (a frequency table). A Python dictionary is well-suited for this task:
>>> # Need not be sorted, necessarily
>>> a = (0, 1, 1, 1, 2, 3, 7, 7, 23)
>>> def count_elements(seq) -> dict:
...     """Tally elements from `seq`."""
...     hist = {}
...     for i in seq:
...         hist[i] = hist.get(i, 0) + 1
...     return hist
...
>>> counted = count_elements(a)
>>> counted
{0: 1, 1: 3, 2: 1, 3: 1, 7: 2, 23: 1}
count_elements() returns a dictionary with unique elements from the sequence as keys and their frequencies (counts) as values. Within the loop over seq, hist[i] = hist.get(i, 0) + 1 says, “For each element of the sequence, increment its corresponding value in hist by 1.”
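To make the role of the default argument concrete, here is a minimal trace of the same expression on a single, arbitrarily chosen key (7). The names first and second are illustrative only and do not appear in the tutorial's code:

```python
hist = {}

first = hist.get(7, 0)   # 7 has not been seen yet, so the default 0 comes back
hist[7] = first + 1      # hist is now {7: 1}

second = hist.get(7, 0)  # 7 is present, so its stored count 1 comes back
hist[7] = second + 1     # hist is now {7: 2}

print(first, second, hist)  # 0 1 {7: 2}
```

Without the default, hist[7] on the first pass would raise a KeyError, which is why the function reads through .get() rather than indexing directly.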
In fact, this is precisely what is done by the collections.Counter class from Python’s standard library, which subclasses a Python dictionary and overrides its .update() method:
>>> from collections import Counter
>>> recounted = Counter(a)
>>> recounted
Counter({1: 3, 7: 2, 0: 1, 2: 1, 3: 1, 23: 1})
You can confirm that your handmade function does virtually the same thing as collections.Counter by testing for equality between the two:
>>> recounted.items() == counted.items()
True
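The overridden .update() mentioned earlier is the main behavioral difference from a plain dictionary. The sketch below contrasts the two on small made-up mappings (plain and tally are illustrative names, not part of the tutorial's code):

```python
from collections import Counter

plain = {1: 3, 7: 2}
plain.update({1: 1, 23: 1})   # dict.update() replaces the value stored for key 1

tally = Counter({1: 3, 7: 2})
tally.update({1: 1, 23: 1})   # Counter.update() adds to the existing count for key 1
tally.update([7, 7])          # it also accepts an iterable of elements to count

print(plain)        # {1: 1, 7: 2, 23: 1}
print(dict(tally))  # {1: 4, 7: 4, 23: 1}
```

Because counts accumulate rather than overwrite, a Counter can be fed data in several batches and still end up with the same totals as a single pass.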
Technical Detail: The counting helper that collections.Counter uses internally, which does the same job as count_elements() above, is replaced by a more highly optimized C function if it’s available. Within the Python function count_elements(), one micro-optimization you could make is to declare get = hist.get before the for loop. This binds the method to a local variable, so the loop skips the attribute lookup on every iteration.
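As a rough sketch of that micro-optimization, the variant below hoists the method lookup out of the loop. The name count_elements_fast is hypothetical, chosen here only to distinguish it from the original function; any speedup is small and worth measuring before relying on it:

```python
def count_elements_fast(seq) -> dict:
    """Tally elements from `seq`, looking up `hist.get` only once."""
    hist = {}
    get = hist.get  # bound method stored in a local variable
    for i in seq:
        hist[i] = get(i, 0) + 1  # no attribute lookup inside the loop
    return hist


a = (0, 1, 1, 1, 2, 3, 7, 7, 23)
print(count_elements_fast(a))  # {0: 1, 1: 3, 2: 1, 3: 1, 7: 2, 23: 1}
```

The result is identical to count_elements(); only the way the method is reached inside the loop changes.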
chrismarkella on Sept. 6, 2019
Great presentation. Do you have any tutorial about the arrow syntax with the return type? I tried it with a dummy function but it didn’t force the type.