# Use Sets to Remove Duplicates

**00:00**
Yes, exactly. So, it’s like the length of the markers seems to be something that we should parametrize in our code. So I guess let’s do that. I’ll introduce a `marker_length`

variable here that we can then switch between part one and part two by changing this one between `4`

and `14`

.

**00:17**
Okay. And then I’m going to think a little bit about what we’re looking for here, are sort of like duplicate characters or technically where there are no duplicate characters.

**00:29**
And Python has a built-in data structure that’s really nice to work with when we’re dealing with duplicates, namely sets.

**00:36**
So before moving on, let’s just show how sets work here. So if I’m going to just print this part, and then I’ll just comment out the rest of this for now.

**00:52**
So now my code here, I’m using the string that we have on top here, and I’m just converting this one into a set and then printing it out. And what we may be able to see here—it’s not completely obvious, I guess, since there’s so many characters—is that there are no duplicates any longer in this.

**01:12**
So you can see the, the order is kind of changed around a little bit, but for instance, there is a `j`

, which we have `j`

in the second and the fourth character, but in our set, there’s only one of the `j`

s.

**01:25**
What we can start doing now is that we can convert our small substrings of the string into sets and then see if the set means that some characters disappear essentially.

**01:36**
So since it removes duplicates, we can use that as a test for whether there are duplicates, okay, in our character sets. So as an example of this, let’s say that we’re looking for a, let’s just call it a candidate.

**01:52**
And I’ll say that this will start here by just picking out the, okay, let’s use `marker_length`

here since we have it. So here I’m just pulling out the first four characters for the `marker_length`

for first four characters from the candidate.

**02:08**
And then I’m going to print out

**02:13**
both `candidate`

and the set of `candidate`

so that we can see what happens. Okay. And what you were doing there with `stream`

and the square brackets and the colon is you’re slicing the string from the `stream`

variable, which is a string from index zero to `marker_length`

, which is four.

**02:35**
So if we look at `candidate`

that’s down here, that’s just now `mjqj`

, and then I look at the set of `candidate`

, and that’s then the letters `m`

, `q`

, and `j`

.

**02:46**
So you can see that the duplicated `j`

has disappeared.

**02:51**
The important observation here is now that the length of my set of `candidate`

is less than the `marker_length`

. It’s less than four, so that means that there has to be a duplicate somewhere here.

**03:02**
Okay. Yeah, because if it would be four unique characters, then the length of the string would be the same length as the set. Yes, exactly.

Become a Member to join the conversation.