Union, Intersection, and Difference

Using Sets in Python James Uejio 06:49

00:00 Let’s move on to some more set operations. In this video, you will learn union, intersection, difference, and symmetric difference. All of these set operations can be performed by a method and by an operator.

00:13 Let’s see how that works. The union of multiple sets is defined as the set of all elements in all sets. So, as I was saying, you can perform the union with a method, .union(), whose arguments need to be iterables, or with an operator.

00:33 x1 | x2 [| x3 ...], et cetera. The operands, which are x1, x2, x3, need to be sets. And again, the vertical bar is called the operator.

00:48 Here’s a visual of what the union would look like. So, the set union would be all elements in all sets, so we have two sets with three elements each, and they both contain the element 'baz'.

01:01 So the union set will have five elements, 'foo', 'bar', 'baz', 'qux', and 'quux'.

01:08 Here we see that example. x1 is {'foo', 'bar', 'baz'}. x2 is {'baz', 'qux', 'quux'}. We take the union, it gives us a set with five elements.

01:17 Notice that 'baz' appears in both sets, but only appears once in the union set. We can also do x1.union(x2), which is the same thing. Here, we can do x1.union(('baz', 'qux', 'quux')), passing a tuple, and it gives us the same set.

01:33 So .union(), remember, can take in an iterable. And then we can try x1 | ('baz', 'qux', 'quux'), the operator with a tuple, but it raises a TypeError because you cannot use the vertical bar operator, the union operator, on non-sets. Here’s an example with multiple sets.
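As a minimal sketch of the two-set union calls just described, using the same x1 and x2 from the lesson, the snippet below compares the operator and method forms. The variable names for the results are chosen here for illustration only.

```python
x1 = {'foo', 'bar', 'baz'}
x2 = {'baz', 'qux', 'quux'}

# Operator form: both operands must be sets.
union_by_operator = x1 | x2

# Method form: the argument can be any iterable, such as a tuple.
union_by_method = x1.union(x2)
union_with_tuple = x1.union(('baz', 'qux', 'quux'))

# All three results are the same five-element set; 'baz' appears once.
print(union_by_operator == union_by_method == union_with_tuple)  # True

# The operator rejects a non-set operand.
operator_error = None
try:
    x1 | ('baz', 'qux', 'quux')
except TypeError as error:
    operator_error = error
    print(type(error).__name__)  # TypeError
```

The comparison uses == rather than printing the sets because sets are unordered, so the printed order of elements can differ between runs.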

01:51 So here we have four sets: a is {1, 2, 3, 4}, b is {2, 3, 4, 5}, c is {3, 4, 5, 6}, and d is {4, 5, 6, 7}.

01:59 So if we union them using the method, we get {1, 2, 3, 4, 5, 6, 7}. We can also use the operator, and it gives us the same thing. So, the intersection of multiple sets is the set of only the elements that exist in all sets. So again, like union, we have the method x1.intersection(x2[, x3 ...]) et cetera, and all arguments need to be iterables.
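Before the intersection examples, here is a short sketch of the four-set union mentioned at the start of this passage. The names c and d for the third and fourth sets are an assumption that follows the naming used in the discussion thread.

```python
a = {1, 2, 3, 4}
b = {2, 3, 4, 5}
c = {3, 4, 5, 6}  # name assumed for the third set
d = {4, 5, 6, 7}  # name assumed for the fourth set

# The method accepts any number of iterables as arguments.
union_by_method = a.union(b, c, d)

# The operator can be chained, but every operand must be a set.
union_by_operator = a | b | c | d

print(union_by_method == union_by_operator == {1, 2, 3, 4, 5, 6, 7})  # True
```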

02:24 Sets are iterables, so we can pass in sets. And then the operator. x1 & x2 [& x3 ...]. All the operands need to be sets. Here is a visual.

02:36 We have the two sets, and the intersection set will be the set of only the elements that exist in all sets. That would be the set {'baz'}.

02:46 So that example in code: both x1.intersection(x2) and x1 & x2 give {'baz'}.
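The two-set intersection can be sketched as follows, again with the lesson's x1 and x2. The list passed to the method is an extra illustration of the "arguments need to be iterables" point and is not shown in the video.

```python
x1 = {'foo', 'bar', 'baz'}
x2 = {'baz', 'qux', 'quux'}

# Method and operator give the same one-element set.
by_method = x1.intersection(x2)
by_operator = x1 & x2
print(by_method, by_operator)  # {'baz'} {'baz'}

# The method also accepts a non-set iterable, here a list.
from_list = x1.intersection(['baz', 'qux', 'quux'])
print(from_list)  # {'baz'}

# The operator does not: a list operand raises TypeError.
operator_error = None
try:
    x1 & ['baz', 'qux', 'quux']
except TypeError as error:
    operator_error = error
```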

02:53 Now with multiple sets, {1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 5, 6}, {4, 5, 6, 7}. The intersection—well, what is the element that is shared in all sets? The number 4. And here we use the operator. Moving on to difference, the difference of multiple sets is the set of only the elements that exist in the first set, but not any of the rest.
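As a quick check of the four-set intersection before the difference examples, the sketch below runs both forms. As before, the names c and d are assumed for the third and fourth sets.

```python
a = {1, 2, 3, 4}
b = {2, 3, 4, 5}
c = {3, 4, 5, 6}  # name assumed for the third set
d = {4, 5, 6, 7}  # name assumed for the fourth set

# Only 4 is present in every one of the four sets.
common_by_method = a.intersection(b, c, d)
common_by_operator = a & b & c & d

print(common_by_method)    # {4}
print(common_by_operator)  # {4}
```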

03:18 So here we have the method x1.difference(x2[, x3 ...]), et cetera. Arguments need to be iterables. And we have the operator, x1 - x2 [- x3 ...]. Operands need to be sets. So in the visual, we would have the elements that exist only in the first set but not in any of the rest of the sets, so the set difference of x1 and x2 would be the set with 'foo' and 'bar'.

03:44 Here’s the code. {'foo', 'bar'} because those are the only elements in the first set that do not exist in any of the rest.

03:56 Here is another example: a is {1, 2, 3, 30, 300}, b is {10, 20, 30, 40}, and c is {100, 200, 300, 400}. I know, sort of arbitrary numbers, but when we take a.difference(b, c), we look at all elements in a that are not in any other set. Well, 1 is not in any, 2 is not in any, 3 is not in any, 30 is in b, and 300 is in c. So it’ll just return {1, 2, 3}. And here we use the operator.

04:28 This operation is evaluated left to right, which means when we do a - b - c, we first compare a - b. So if we do {1, 2, 3, 30, 300} - {10, 20, 30, 40}, we get the set of {1, 2, 3, 300} because those are the elements in set a that are not in set b, and then we compare that resulting set with set c, and we get rid of the 300.

04:54 So now the final set is {1, 2, 3}.
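The difference examples and the left-to-right evaluation can be traced with a small sketch. The intermediate names first_step and second_step are introduced here only to make the two stages visible.

```python
x1 = {'foo', 'bar', 'baz'}
x2 = {'baz', 'qux', 'quux'}

# Elements of x1 that are not in x2.
print(x1 - x2 == {'foo', 'bar'})  # True

a = {1, 2, 3, 30, 300}
b = {10, 20, 30, 40}
c = {100, 200, 300, 400}

by_method = a.difference(b, c)
by_operator = a - b - c

# a - b - c is evaluated left to right, as (a - b) - c.
first_step = a - b            # 30 is removed because it is in b
second_step = first_step - c  # 300 is removed because it is in c

print(first_step == {1, 2, 3, 300})  # True
print(second_step == {1, 2, 3})      # True
print(by_method == by_operator == second_step)  # True
```

Unlike union and intersection, difference depends on operand order: b - a keeps the elements of b that are not in a, which is a different set from a - b.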

04:58 The symmetric difference of two sets is defined as the set of only the elements that exist in exactly one of the two sets, but not in both. So here we have the method x1.symmetric_difference(x2).

05:09 For some reason, this only takes in one argument, not multiple. And the argument needs to be an iterable. Here’s the operator, the little caret (^). x1 ^ x2, et cetera.

05:22 And the operands need to be sets. So, that can take in multiple operands, while the method can only take in one argument. Here’s our visual. We have x1 and x2, and the symmetric difference would be only the elements that exist in one set, but not in multiple. So that would be the elements 'foo', 'bar', 'qux', and 'quux'.

05:44 So here we have {'foo', 'bar', 'baz'}, {'baz', 'qux', 'quux'}. If we take the symmetric difference, we get {'foo', 'qux', 'quux', 'bar'}. Again, unordered, so it’s not in any particular order. Same here.

05:59 Let’s try the x1.symmetric_difference(x2, x3). It would be an error because, for whatever reason, it only takes in one argument.

06:10 Here we have a set of {1, 2, 3, 4, 5}, {10, 2, 3, 4, 50}, {1, 50, 100}, and now we’re going to look at which elements exist in only one set and not multiple. {100, 5, 10}.
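The symmetric difference calls from this part of the lesson can be sketched as below. The set x3 is a hypothetical placeholder, defined only so that the two-argument method call fails for the reason described rather than with a NameError.

```python
x1 = {'foo', 'bar', 'baz'}
x2 = {'baz', 'qux', 'quux'}

# Operator and method agree for two sets; 'baz' is in both, so it is dropped.
by_operator = x1 ^ x2
by_method = x1.symmetric_difference(x2)
print(by_operator == by_method == {'foo', 'bar', 'qux', 'quux'})  # True

# The method takes exactly one argument, so a second one raises TypeError.
x3 = {'corge'}  # hypothetical third set, not from the lesson
method_error = None
try:
    x1.symmetric_difference(x2, x3)
except TypeError as error:
    method_error = error

# The operator can be chained across the three sets from the last example.
a = {1, 2, 3, 4, 5}
b = {10, 2, 3, 4, 50}
c = {1, 50, 100}
chained = a ^ b ^ c
print(chained == {5, 10, 100})  # True
```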

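06:25 So, you can see how symmetric difference can be really useful, right? It’s very hard for me to just eyeball which elements only exist in one set. One caution, though: let’s just say you have a hundred sets and you want to, for whatever reason, find what elements only exist in one set.

06:40 Chaining ^ a bunch is not quite enough for that. a ^ b ^ c is evaluated left to right as (a ^ b) ^ c, so it keeps every element that appears in an odd number of the sets, including an element that is in all three. It matched “only in one set” in the last example because no element appeared in all three sets. Now let’s move on to disjoint, superset, and subset.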
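A note on chaining ^ across several sets: because each ^ is a pairwise symmetric difference applied left to right, the chained result keeps elements that occur in an odd number of the sets, which is not always the same as "in exactly one set". The sketch below contrasts the two, using the three sets from the discussion thread that follows. The helper in_exactly_one is a hypothetical name written for this illustration, not a built-in set method.

```python
from collections import Counter
from functools import reduce
from operator import xor


def in_exactly_one(*sets):
    """Return the elements that appear in exactly one of the given sets."""
    # Each set contributes each of its elements once, so the count is the
    # number of sets that contain the element.
    counts = Counter(element for s in sets for element in s)
    return {element for element, count in counts.items() if count == 1}


s1 = {"a"}
s2 = {"a", "b"}
s3 = {"a", "b", "c"}

# Same as s1 ^ s2 ^ s3: "a" is in all three sets (an odd count), so it stays.
chained = reduce(xor, [s1, s2, s3])
print(chained == {"a", "c"})  # True

# Counting membership keeps only "c", the one element unique to a single set.
exactly_one = in_exactly_one(s1, s2, s3)
print(exactly_one)  # {'c'}
```

For two sets the two approaches always agree; they can diverge only when some element appears in three or more sets.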

Levi on March 14, 2020

Really liking the breakdown of this.

Sciencificity on April 27, 2020

Hi James, I think that symmetric_difference() only takes one argument since it does not make sense when you apply it to multiple arguments. This is due to the left-to-right nature of the operator. The in-between sets used with ^ can reintroduce elements that appeared in previous sets but were discarded by the previous use of ^. E.g., using the example we used previously, we would have the below:

a = {1,2,3,4}
b = {2,3,4,5}
c = {3,4,5,6}
d = {4,5,6,7}
print(a ^ b ^ c ^ d) # a ^ b -> {1,5} ^ c -> {1,3,4,6} ^ d -> {1,3,5,7}

{1, 3, 5, 7} # result ... but 3, 5 occur in multiple sets

Also if we amend the example used in the video slightly (add a 2 in last set):

a = {1,2,3,4,5}
b = {10,2,3,4,50}
c = {1,2,50,100}
print(a ^ b ^ c)

{2, 100, 5, 10} # result

Kumaran Ramalingam on Aug. 16, 2020

Hi James, it was really useful for me to break things down and learn.

anshetc on Feb. 5, 2024

The video is incorrect in saying the symmetric_difference operator ‘^’ selects items that are only present in 1 set. An item present in all 3 sets s1, s2, s3 will also show up in s1^s2^s3.

Bartosz Zaczyński RP Team on Feb. 6, 2024

@anshetc Not exactly. The symmetric difference is a binary operator, which takes two arguments. Python interprets your expression, s1 ^ s2 ^ s3, as two separate symmetric differences, i.e., (s1 ^ s2) ^ s3.

Consider the following example:

>>> s1 = {"a"}
>>> s2 = {"a", "b"}
>>> s3 = {"a", "b", "c"}

>>> s1 ^ s2
{'b'}

>>> {"b"} ^ s3
{'a', 'c'}

>>> s1 ^ s2 ^ s3
{'a', 'c'}

In the above example, s1 ^ s2 results in {"b"} since "b" is the symmetric difference between sets s1 and s2. When {"b"} is then calculated against s3 with the symmetric difference, you end up with {"a", "c"} because these are the elements that are either in {"b"} or s3 but not in both. Hence, s1 ^ s2 ^ s3 produces {"a", "c"}.
