Selecting Quality Packages Part 1
00:00 When you’re looking to find a quality Python package to help you out with a problem at hand, it can be a little bit overwhelming, having to select between all of these different options.
00:09 In my time as a Python developer, I’ve come up with a series of rules of thumb for selecting a great package. I’ve turned this into a seven step workflow that you can use to find and select quality Python packages.
00:21 Let me walk you through it now, step by step. Think of this workflow as a funnel. First, you’re going to find a pool of candidate packages, and just do a bunch of research and basically collect as many packages as possible that could help you with the problem at hand, and then, with each of the steps in this workflow, we’re going to successively refine this list by excluding packages.
00:46 Each step will help you gather more information and give you a better understanding of the quality of each package. The goal of this process is to make the decision which package to use, really really simple.
00:59 You’ll be starting with this long list of candidate packages and in the beginning, it will be almost impossible to tell which one is the perfect package for your use case, but as you keep narrowing down that list, by the end of this workflow, you will have narrowed down that candidate list so much and you will have built such a great understanding of the strengths and weaknesses of each library, that making that decision is going to be very easy for you.
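To make the funnel idea concrete, here is a minimal sketch of it in Python. It is only an illustration, not a tool from the video: the `Candidate` class, the `run_funnel` function, the package names, and the notes fields are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Candidate:
    name: str
    notes: Dict[str, object] = field(default_factory=dict)  # facts gathered so far


# Each step is a label plus a function that decides whether a candidate stays.
Step = Tuple[str, Callable[[Candidate], bool]]


def run_funnel(candidates: List[Candidate], steps: List[Step]) -> List[Candidate]:
    """Apply each step in order, dropping candidates that fail it."""
    remaining = list(candidates)
    for label, keep in steps:
        remaining = [c for c in remaining if keep(c)]
        print(f"after {label}: {[c.name for c in remaining]}")
    return remaining


# Hypothetical candidates and notes, for illustration only.
pool = [
    Candidate("package_a", {"mentions": 4, "looks_maintained": True}),
    Candidate("package_b", {"mentions": 1, "looks_maintained": True}),
    Candidate("package_c", {"mentions": 3, "looks_maintained": False}),
]
steps = [
    ("popularity check", lambda c: c.notes.get("mentions", 0) >= 2),
    ("maintenance check", lambda c: bool(c.notes.get("looks_maintained"))),
]
shortlist = run_funnel(pool, steps)
```

The thresholds here are made up. In practice, each step is a judgment call that you make by hand while researching.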
01:24 The ability to find and identify great Python packages, is very helpful even if you’re working on your own, but it gets so much more powerful if you have to justify your dependency decisions to a team of other developers or to your manager.
01:39 You can apply the same workflow and the same criteria and use them to explain your decisions; to give you a concrete example, you could just take this process and as you go through it, take extensive notes and basically compile a report about your decision and after you went through those seven steps in the workflow, this is going to be a pretty bulletproof report that you can then share with your team, or your manager.
02:03 Alright, let’s jump right in and then you’ll learn this important skill in no time. Let’s start with step 1: finding candidate packages. The first thing that I usually do is that I come up with a list of candidate packages that will help me solve the problem at hand.
02:19 And there are a number of ways you can fill up that list. In my mind, it really helps to come up with a series of options, so that you have a base for comparison.
02:27 Now, let’s talk about how you would fill up that candidate package list. I often start out by browsing through the curated lists I told you about earlier, so I would just open up those websites like Awesome Python, I will try and find the matching category that is relevant to my problem that I am trying to solve and then I’ll just click through that category checking out all the packages that are listed there.
02:51 Another option would be just to run a quick Google search for two to five relevant keywords, imagine you are looking for a way to upload files to Amazon’s S3 service using Python.
Here is what I would do, so for that, I just open up Google and then I would probably search for something like “s3 upload python”, you know, very focused keywords and just kind of sprinkle the minimal set of keywords that I could think about, and I just search for that.
03:19 And then the results here are going to give me a pretty good overview, so I probably just click through the first three results or so, and just check out what they have to say.
Now, this question here pretty much is what I had in mind, and looking at the first answer points me to the boto library, so I’ll probably check that out and add it to the list, and then I do the same thing for the other top search results. Now in this case, I know from personal experience that boto is a great choice.
03:48 So the fact that we’re already seeing this result is a pretty good sign. Honestly, I found that a quick Google search can really help you out here, it’s often digging up the right content immediately pointing you to results on StackOverflow or on forums like Reddit or Hacker News.
04:04 So I usually do that really early on in my research process, when I am looking for a new Python package. You’ve already seen that I looked at a StackOverflow result here, so StackOverflow is another great site you can use to find recommendations for Python packages.
04:21 If you haven’t used StackOverflow before, it’s basically a question & answer site for developers. And you can search it as well, so I am just going to punch in the same keywords that I previously searched on Google, just to see what comes up, so by default, this will be sorted by relevance, which is kind of an opaque measure, so often, I’ll just immediately switch over to the votes tab which will give me the most upvoted answers.
04:44 Alright, so let’s check out the first answer here. So, this is the number one upvoted answer for this question, I am not going in to read the full question, I just want to see what kinds of libraries and tools people recommend here.
And as I scroll down, I can immediately see that okay, boto is another library that people recommend, so again, this will be a pretty good indicator that I should really check out this boto library because it just keeps popping up again and again.
05:09 Another great recommendation for finding quality Python packages is to use community forums like Reddit or Hacker News, and sometimes you can also use Twitter like that. Let’s take a look at those now.
05:24 Reddit is a community forum website that has a pretty large Python community, you can find it at reddit.com/r/python. And Reddit has a search feature as well, again, what I would do here is I would punch in the same keywords and then I would limit my search.
05:41 In this case, we could probably drop the “python” because we’re just searching the Python forum. So, anything S3 related will pretty much be about Python.
05:50 Alright, let’s see what we got here… So this looks pretty helpful already, one interesting bit here is that you can see when the question was submitted, or when the forum thread was created.
06:01 So you want to make sure you are not looking at super old content for things that could change frequently. But let’s just check out this discussion here.
06:08 So this looks like this is not going to give me the answer immediately, but I can still learn a lot about how people talk about the problem here, what keywords they use and that could point me in the right direction to actually find the library that does what I want or I actually find a discussion where someone recommends a specific library and then other people can respond to that discussion and I can read what they have to say and that is going to give me a pretty good idea of whether or not that library might be the right choice for me.
06:36 Another helpful community forum is Hacker News. Now, by default Hacker News doesn’t have a search function built in, but you can use a third party search at hn.algolia.com that can do a full-text search on comments and stories inside Hacker News.
Again, let’s punch in “S3 upload python” and see what happens. Alright, so looking at these results again I see boto popping up here so this could be interesting, maybe this result is a little bit old, but again, this could be a good way to fill up that candidate list and identify libraries that other people recommend and use.
07:12 Even if you’re not using Twitter, just the fact that so many people share their thoughts on Twitter all the time, can be pretty powerful if you’re looking for an answer to your programming question, I know it sounds a little bit crazy but this works more often than you’d think, so let’s try it out, I don’t know what is going to happen.
07:30 Again, I am searching for the same set of keywords, and then I am just going to check out some of the responses here. Alright, so sometimes it’s going to reference other source material like StackOverflow, or blog posts, okay, so this looks pretty interesting here, this guy is talking about a script that uploads stuff to S3, so why don’t we check it out.
So just looking at the code here, it looks like this guy is not using a specific library to talk to S3, but he is using the command-line tool, this aws s3 command, so this could be another option for us to research now, maybe it’s a good choice, I don’t know. I know this process is a little bit time consuming but it’s really impressive what this process can dig up.
08:08 If you do this for an hour or two, you’re going to be pretty much an expert on what’s out there in terms of libraries that could help you with this job.
08:19 If you’ve searched all these sites and you’re still not happy with this candidate package list that you’ve built up, then it might make sense to search PyPI directly.
08:28 Personally, I find it a little bit hard to find stuff on PyPI because the interface is pretty clunky, and there is very little curation. But it might still make sense to spend a few minutes on that and see if you can dig up something useful.
08:44 Now, another option to get those candidate packages would be to actually ask a question on StackOverflow or Reddit, so on all of these sites you can create a free account, and just start asking questions, of course, you want to be mindful of questions that people have asked in the past, so I recommend that you do some research first to avoid running the danger of posting duplicate questions.
09:06 But usually, people are pretty receptive and helpful on these forums, so it might make sense to give it a shot. However, it’s rather time intensive to write and post the question and then having to wait a couple of hours or even days to get a response.
09:19 Now at the end of step one, you should have a list of candidate packages that you want to do some further research on. After you’ve generated a list of candidate packages, the next step is to check out how popular these packages are.
09:36 Usually popularity is a good sign if you’re looking for a Python package because that often means that the package is well-maintained, it’s high quality, and you can’t really go wrong with installing it and using it for your own purposes.
09:50 Now, how can you find out if a package is popular? One way to do it would be to check out the download stats. Now, you used to be able just to go to PyPI and check out the download stats for a package, but this feature was removed when the PyPI architecture changed.
10:05 So right now, you can’t really get those stats, they might come back in the future, and then I think they are a really good indicator, but right now, we’ll have to go with something else.
10:14 Another good popularity indicator would be just the number of Google results and Reddit results and StackOverflow results or recommendations you find for a given package.
10:24 And often, this step of the research process happens in combination with the first one, so as you go along and search these sites, you can take mental notes of which packages show up frequently.
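If mental notes get hard to keep straight, a written tally works just as well. The sketch below is one hypothetical way to do it with `collections.Counter`; the `tally_mentions` helper is an assumption made for illustration, and the sample sightings simply mirror the searches from step one.

```python
from collections import Counter


def tally_mentions(sightings):
    """Count how many distinct sources mentioned each package.

    sightings: iterable of (source, package_name) pairs noted during research.
    """
    # Count each package once per source so one long thread doesn't dominate.
    unique = {(source, name.lower()) for source, name in sightings}
    return Counter(name for _, name in unique)


# Sightings noted while searching for "s3 upload python".
sightings = [
    ("google", "boto"),
    ("stackoverflow", "boto"),
    ("stackoverflow", "boto"),  # a second answer in the same place
    ("hacker news", "boto"),
    ("twitter", "aws cli"),
]

for name, count in tally_mentions(sightings).most_common():
    print(name, count)
```

Counting per source rather than per mention is a design choice: a package that shows up on three different sites is a stronger signal than one that is repeated three times in a single thread.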
10:34 And this could be really valuable information when you have to make a decision about which one you are going to use. If a package is hosted on GitHub, you could also check out their GitHub page and see how many stars they have on GitHub.
10:46 So the star system on GitHub is a pretty simple voting system where people can favorite or star repositories. Now, if you are thinking about installing a library that has let’s say 5,000 or 10,000 stars, it’s pretty much a no brainer.
10:59 If it only has 10 or 20 then maybe that is not a bad sign, but it’s also probably not a super popular library. Another way to get at that information is using the python.libhunt.com website. It includes a popularity indicator that is based on some other opaque values, and sometimes it can be helpful to compare two packages and just kind of see which one has more traction.
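If you would rather not click through every repository, the star count can also be read from GitHub's public REST API, which returns a `stargazers_count` field for each repository. The following is a rough sketch under a few assumptions: the helper names are made up, unauthenticated requests are rate limited, and the counts in the ranking example are hypothetical numbers, not real measurements.

```python
import json
import urllib.request


def parse_star_count(repo_payload):
    """Pull the star count out of a GitHub repository API response."""
    return int(repo_payload["stargazers_count"])


def fetch_star_count(owner, repo):
    """Ask the GitHub REST API for a repository's star count (needs network)."""
    url = f"https://api.github.com/repos/{owner}/{repo}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_star_count(json.load(response))


def rank_by_stars(star_counts):
    """Sort a {package: stars} mapping from most to fewest stars."""
    return sorted(star_counts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical counts, for illustration only.
for name, stars in rank_by_stars({"package_a": 5000, "package_b": 20}):
    print(name, stars)
```

To look up a real repository you would call something like `fetch_star_count("some-owner", "some-repo")`, filling in the owner and repository name from the project's GitHub URL.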
11:24 Now at the end of step two, you should have a pretty good understanding of the relative popularity of your candidate packages. Once you have narrowed down the list of candidate packages I would start checking out the actual project homepages.
11:38 You could learn a lot from a project website, things like does this website actually feel helpful, is it answering my questions that I have as a new user, does the website look actively maintained, and how successful does this project look, did someone actually spend the time to make the website helpful and nice?
11:58 Let’s play through this with an actual Python project website. A great example here is the Requests library, and right away when this site loads up, this looks like a really high quality library.
12:13 It has its own logo here, it looks like it’s supporting a bunch of Python versions, it looks like it has automated tests, which is always a great thing to see, and the project maintainer is also tracking test code coverage.
12:25 Here on the left you can see that the page has this embedded GitHub stars indicator, and as you can tell, the library has a high number of stars here which is usually a good sign.
12:36 What I like here as well is that the page starts with a concrete example of what you can do with the library and what it looks like to use it. This is great, so they even have a bunch of user testimonials from really well known people in the Python community, and when I scroll down further, I can see here that it has a pretty extensive user guide that covers a number of interesting things and seems really well structured, there is also in depth API documentation which is always a good sign.
13:04 Another sign that this is a really popular and strong library is that it has a contributor guide with all kinds of information about how to contribute to the project, the code style they use, how people should report bugs, and a really small and unpopular library is usually not going to have a need for that.
13:23 So when you see something like that, that is usually a strong sign that the library is really popular and very successful. And by extension that means it’s usually a safe choice for you to use that library in your own programs.
13:36 By the way, if you’re wondering where to find a project’s homepage, if it has one, you can usually find the link on PyPI, so it will be right here on the left, and for older versions of PyPI you will typically have to scroll all the way down and then you can find the link to the project homepage there.
13:55 There we go, this is the homepage link for the Requests library. At the end of step 3, after you’ve checked a couple of project homepages, your list should have narrowed down a little bit further. At this point you are starting to get to know these projects a lot better and you have a good idea of how popular they are, how well maintained they are, and whether or not you like them.
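The same homepage link is also exposed through PyPI's JSON API at `https://pypi.org/pypi/<package>/json`, in the `info.home_page` and `info.project_urls` fields. Here is a minimal sketch of reading it from Python. The function names and the list of labels to look for are assumptions, and since projects label their links differently, this may miss some homepages.

```python
import json
import urllib.request


def extract_homepage(pypi_payload):
    """Return the homepage URL from a PyPI JSON API response, or None."""
    info = pypi_payload.get("info", {})
    if info.get("home_page"):
        return info["home_page"]
    # Newer projects often list the homepage under project_urls instead.
    project_urls = info.get("project_urls") or {}
    for label, url in project_urls.items():
        if label.lower() in {"homepage", "home", "home page"}:
            return url
    return None


def fetch_homepage(package_name):
    """Look up a package's homepage on PyPI (needs network access)."""
    url = f"https://pypi.org/pypi/{package_name}/json"
    with urllib.request.urlopen(url, timeout=10) as response:
        return extract_homepage(json.load(response))
```

For example, `fetch_homepage("requests")` would query PyPI for the Requests library. If it returns `None`, the project may simply not have a dedicated homepage.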
14:15 So maybe you can already start excluding some libraries that you are not really enjoying as much. Of course, not all libraries are going to have a dedicated website or homepage, that doesn’t automatically mean that the library is not great quality, many Python projects don’t actually have dedicated homepages, but if there is one, it absolutely makes sense to check it out.