Locked learning resources

Join us and get access to thousands of tutorials and a community of expert Pythonistas.

Unlock This Lesson

Locked learning resources

This lesson is for members only. Join us and get access to thousands of tutorials and a community of expert Pythonistas.

Unlock This Lesson

Sets, FrozenSet, and Multisets

A set is a collection of items where each item is unique. This is derived from the mathematical concept of the same name. Python has two built-in types for sets: set and frozenset. A set is a mutable object while frozenset provides an immutable implementation.

Additionally, the collections library includes the Counter object which is an implementation of a multiset, it stores both the unique items and a count as to how many times it has been added to the container.

Here are resources and additonal documentation about sets:

Download

Course Slides (.pdf)

633.9 KB
Download

Sample Code (.zip)

3.1 KB

00:00 In the previous lesson, I wrapped up the section on records. In this lesson, I’m going to talk about sets and multisets. A set is an unordered data structure that doesn’t allow duplicate elements.

00:14 This comes directly from the mathematical concept of the same name. Sets are primarily used to determine some sort of membership and as a result are optimized to be able to find whether or not something’s in that set very, very quickly—typically in O(1) (order 1) time. And based on those same mathematical concepts, you can typically union, intersect, difference, and subset portions of your set.

00:40 Python has two built-in types for dealing with sets: the set and the frozenset. The set type is mutable, meaning it can be changed, and the frozenset type is immutable, meaning it can’t be changed. Let me show you some examples.

00:57 In Python, a set is a built-in type, so you create it calling set() and passing in some data.

01:08 You initialize a set by passing in a tuple. Each item in the tuple becomes a member in the set. There’s the values. Somewhat confusingly, if you’re used to dictionaries, sets are also shown using curly brackets.

01:22 You can tell the difference between a dictionary and a set by whether or not there’s keys and values. Because there’s no key here—it’s just a listing—that means this is a set.

01:33 I can add something new to the set by using the .add() method. Because I’ve added something that’s already there, it doesn’t change. This time, let me add something that isn’t there.

01:49 And sure enough, it’s been tacked on. It happens to show up on the end in this case, but it does not have to. Remember that sets are unordered. You can call the len() (length) function on a set to find out how many items are inside of it.

02:05 When you construct a set, whatever’s passed in is iterated over. In Python, if you use a string in an iteration context, you will get each of the characters in the string.

02:16 This is a convenient way of creating a set based on the unique letters in a string. The string 'alice' gets iterated over, and a set with 'a', 'l', 'i', 'c', and 'e' is created. Once again, notice the ordering is not maintained.

02:36 The intersection between two sets shows you what is in both sets. I can intersect name with vowels to find out what are the vowels in 'alice'.

02:52 Like lists and dictionaries, sets can also be created using comprehensions. The notation for a set comprehension uses the curly brackets, and just like the list and dictionary comprehensions, consists of a little mini for loop.

03:12 I find it easiest when reading a comprehension to start from the outside portions and move towards the middle. On the left-hand side, you see x * x. This is the thing that is making up the set.

03:24 Each item that is iterated over in the for loop is going to get multiplied by itself, and that is what’s going to be put in the set. On the right-hand side you see range(10). That’s the thing being iterated over, so the numbers 0 through 9 will be iterated over.

03:41 And, of course, the piece in the middle is the for loop. This results in the squares of the numbers 0 through 9. Again, no ordering here.

03:53 It is possible to create an empty set, but you need to be a little careful with this. Because curly brackets are used for both sets and dictionaries, you can’t use a comprehension to create an empty set.

04:05 Let me show you as a counter example. Great, I’ve got empty. It says it’s got nothing in it. I’ll look at its type, and it’s a dictionary. If you actually want an empty set, don’t use a comprehension.

04:21 Just call set() with nothing in it.

04:29 That’s better. So far, everything I’ve shown you has been mutable. Python also has a type called frozenset, which creates an immutable set. Frozensets are constructed based on a set.

04:50 Here, I’ve created a set using the curly bracket notation containing the words "sight", "hearing", "taste", "touch", and "smell", passed that into frozenset(), and the senses object is now a frozenset version of those items.

05:07 Frozensets are immutable. Any attempt to add something will result in an AttributeError. One of the restrictions that Python puts on keys for dictionaries is that they must be immutable.

05:21 This means you can actually use the frozenset as a key in a dictionary.

05:31 Attempting to do the same thing with a regular set will result in an error.

05:38 Sets are mutable, mutable things aren’t hashable, and so you get a TypeError when you try to create that dictionary. In addition to supporting sets and frozensets, Python also supports multisets. A multiset is a set that doesn’t have the unique item restriction.

05:57 This is sometimes also known as a bag. So, what good is a set that doesn’t have the unique item restriction? Isn’t that just a list? Well, it’s a little more complicated than that. The unique items are still there, but a counter is kept as to how many times something has been inserted, so you get, sort of, the best of both worlds of a list and a set.

06:17 Most implementations of a bag allow you to see the items as if they’re a set as well as the associated counts. The implementation of the multiset in Python is inside of the collections library in an object called Counter.

06:33 To use a Counter, you need to import it from the collections library.

06:41 Now I’ll create an empty Counter inside of an object named inventory.

06:48 I modify the contents of inventory by passing in a dictionary.

07:00 The keys in the dictionary indicate the things in the set and the values in the dictionary indicate how many of each there are. I’ve created a Counter that has three 'bread' items and one 'sword'.

07:15 I can call .update() again with a new dictionary,

07:25 and you can see the cumulative effect. There’s one more stored than there was before and a new item called 'apple' has been inserted. As I mentioned, counters aren’t quite lists and they aren’t quite sets—they’re sort of halfway in between. Calling the len() method on the Counter actually returns how many unique items there are.

07:48 This returns 3 in this case, for 'bread', 'sword', and 'apple'regardless of how many pieces of bread, how many swords, or how many apples you have. Similar to a dictionary, the Counter object supports a .values() method.

08:01 This returns an iterator whose contents is the counts of each one of the items inside of the Counter. For the inventory item, that would return 3, 2, and 1.

08:14 By taking this iterator and passing it into the sum() function you can get the total number of items in the bag.

08:24 Three pieces of bread, plus two swords, plus one apple gives you a total of 6. Choosing between the set data structures is pretty simple. If you need mutable, use set. If you need immutable, use a frozenset.

08:40 If you need a multiset, use Counter. Nothing more complicated than that. You can dig into sets a little more by looking at the Python documentation.

08:50 Here’s the link for set types, which gives information on both sets and frozensets. And here’s the link to the Counter object inside of the collections library.

09:01 Thanks for your time! I hope you found this course useful.

Avatar image for Alain Rouleau

Alain Rouleau on March 28, 2021

Really enjoyed the whole concept of a multiset or a so-called bag, thanks. You know, creating a “Counter” object from the “collections” module. Very interesting and something that could actually be useful.

I like not only the idea of a set but how many do I actually have of each? Think of baseball or hockey cards. Not only how many sets do I have but how many cards of each baseball or hockey player do I have in total.

Very powerful!

Avatar image for jackdev

jackdev on April 24, 2021

Great course! I’d realised I’d been overcomplicating the data structures in my head and it was making me constantly second-guess myself when designing my hobby projects (not understanding when to move things into classes etc.).

Knowing that applications are “just” data and functions, and data is boiled down into collection-types or record-types made things much simpler. Can just start small with mutable/immutable and scale up from there as needed. Already making a ton more progress on my projects.

Awesome job on this course.

Avatar image for Christopher Trudeau

Christopher Trudeau RP Team on April 24, 2021

I’m glad you enjoyed it. I’ve been coding for a long time, and I started out with fairly structured languages and always thought that was the only way to do things. I was a big fan of setters/getters and private members, protecting the programmer who was using my stuff from the internal workings.

Coming to python was an adjustment. After a while I realized it was a lot of extra effort for very little gain. Who was I protecting? The python idea of using a convention like leading underscores to warn someone but otherwise it was up to them whether to touch it or not was a bit of revelation.

Looking back now, I’m sure a younger me would be baffled by how often I just use dictionaries nowadays. Simplicity if often a lot harder to do that complexity, but frequently worth the extra effort.

Happy coding!

Become a Member to join the conversation.