Join us and get access to thousands of tutorials and a community of expert Pythonistas.

This lesson is for members only. Join us and get access to thousands of tutorials and a community of expert Pythonistas.

Coding Challenge

Managing and Measuring Python Code Quality Negar Vahid 06:33

00:00 Take a look at the logic. You have a function named x that is supposed to find duplicate IDs, taking an input n that represents the original list of IDs.

00:10 Inside, it creates an empty list called l, which is meant to store the duplicates it finds. And then it loops through n using i to represent the current ID being checked.

00:21 And if that item is not already in l and it appears in n more than once, it gets appended to l. And then finally, it returns l.

00:32 Okay, just by hearing that, you might have guessed that this function is not readable. But before anything else and taking care of readability, you want to make sure the code actually works. So you can test it with any list of IDs.

00:46 Here you have user IDs. And if you take a look, you can see that 1042 and 1088 are duplicates, so you expect to see them in the output.

00:55 Let’s run this. You can go to the terminal and say python filename. Here it’s duplicate

01:04 _id-version-readable. Well, not yet, but it will be.

01:10 And you do indeed get 1042 and 1088 as expected. So your logic works.

01:17 Now back to the readability issue. Because of those single letter variables, you have to read and decipher every single line just to guess what the code is supposed to do. So the next step is fixing this by replacing those letters with clear descriptive names.

01:33 Starting with the function name, instead of x, you can say find_duplicates.

01:42 Instead of n, you can say id_list. Now this l is taking care of duplicates, so you can just call it duplicates.

01:52 And you can see that the previous l and n variables that you renamed are now yellow. So you have to change them everywhere in your code. n is id_list.

02:05 You can copy it everywhere you see n, paste it. Same with l. Okay, there you go. You also have i here, which is supposed to be the current ID, so current_id. And then you have it here in the if statements.

02:31 And here when you’re appending i to duplicates. And finally, in the print statement, when you’re testing user IDs, now instead of x, you have to say find_duplicates.

02:45 That’s much better. When you look at the function, you can understand what each variable is doing. That’s for readability.

02:54 Next, you need to think about this code’s efficiency. Notice that this line here, id_list.count, this is a massive trap because it sits inside a for loop. And for every single item in that list, Python has to scan the entire list again from start to finish just to count it.

03:15 To put this into perspective, if the list has 10 items, it does 100 checks. But if you feed this function a database of, let’s say, 100,000 user IDs, it will perform 10 billion operations.

03:31 This is called Big O notation of n squared time complexity, or also sometimes called quadratic time complexity. And it will definitely freeze your application.

03:42 So what you need is a solution that only looks at each ID once, preferably. One way to do that is using a data structure called a set. Lookups in a set are amazing.

03:55 They’re practically instant. And if you manage to do that, you will get a Big O notation of n or linear time, which is amazing. In order to use sets in your current function, you need to change up the structure a little bit. This is all about finding duplicates and having a seen set where you can take care of the stuff that you’ve already seen.

04:19 So you can create a set called seen here,

04:23 seen, set, open, close parentheses. You can also turn your duplicates into a set. You can keep the for loop as is, but you need to change up the if statements inside. You want to say if the current ID has already been seen and it is in the seen set, then it’s a duplicate.

04:44 So you want to add it to the duplicate set. if current_id

04:49 in seen, you don’t need the second if statement anymore.

04:54 duplicates dot, instead of append, you need add because you’re dealing with a set now. duplicates.add(current_id).

05:03 And finally, when you’re returning duplicates, you can turn it into a list just for the ease of it. You also need to add an else block here.

05:15 So else, if the current ID has not been seen yet, you’ve seen it now, so you want to add it to the seen set.

05:24 seen.add(current_id).

05:29 And there you go. You can test your new function to see if the logic is still intact by using the same user IDs from before. You can say python duplicate_id-version-efficient, because that’s what this file is called, dot py.

05:47 And there you go. You still get 1088 and 1042 as expected.

05:53 Now you’ve solved the efficiency problem of your code. If you want to push this code even further and make it even more Pythonic, you can use the standard library and use Counter from the collections module. Your time complexity will stay O(N), linear time, and you will have a way shorter function. That’s your challenge.

06:15 Whenever you figure it out, you can share your completed code in the comment sections below. You just learned how to implement your understanding of what a high quality code is and how important it is for your function to be functional, readable, and efficient.

Become a Member to join the conversation.