Selecting Quality Packages Part 2
Resources & Links:
00:00 Not all Python packages are going to have a dedicated project homepage. But what every project should have is some form of README file, that introduces you to the project. So I always check those two.
00:13 What I like to see here is a README that covers the basics of the project: what does the library do, and how do I install it? You can learn a lot about the quality of a library by looking at how the maintainers communicate the value that the library provides. I also want to know what license the project is under, because that can really influence the circumstances in which you can actually use the project. And then it makes sense to quickly check who the author is: is it a group of people, a company, or an individual contributor? What have they done in the past, and do they seem trustworthy?
00:49 Let’s take a look at a real project README now. Alright, I am going to try and find the README file for the Requests library now. So, typically, what you’d be looking for is a link to the project source repository, and it already looks like this is hosted on GitHub here, so I am just going to look for a link.
01:09 Alright, there we go, Requests at GitHub, so that should be the link to the GitHub project where we can check out the project README, yes, this is it, so when I scroll down this is where GitHub displays the README file, and for other source control websites like BitBucket, they will either display the README in the same fashion or you can view it by finding the README file and then clicking directly on that.
01:29 Now the first thing that I can see here is that this looks really similar to the project homepage which isn’t a bad thing, I mean, this contains all of the information that I wanted to get out of the README and it looks like it’s really well structured and nicely formatted.
01:41 So this is great, this tells me how to install the library, and it looks really simple, it’s pointing me to the documentation and it also tells me how to contribute to the project.
01:52 If you’re wondering what should go into a great README file, I wrote this article a while ago, about how you can write a great README for a GitHub project, I am covering a number of things here that in my opinion should go into a great README, for example, it should talk about what the project actually does, how to install it, some example usage, how someone could set up a development environment, some link to a change log, and then also license and author info.
02:16 You can check out the full article in the link that you see here. Now let’s go back to the requests README file. I said that I’d like to know under which license a library was published, so let’s find that out now.
02:31 Usually you can find that information in a license file at the root of the repository. So this tells us that Requests is under the Apache license, a popular open source license. If you’re wondering what the most common open source licenses are and what their terms are, there is a really great website you should check out.
02:48 Go to choosealicense.com/licenses and they have great, simple, human-readable explanations of the conditions, permissions, and limitations of the most popular open source licenses.
03:02 So for example, this is the Apache license used by Requests, and this gives us a quick overview of the terms of the license, without actually having to drill down into the nitty-gritty details.
03:13 Another thing that I’d like to know is who the authors are, who wrote a library. Now, typically, in an open source library, you can find an authors file that will list all the contributors. Again, here with Requests you can get a really quick overview of who the core maintainers are, and then there is apparently a whole bunch of people who have submitted patches over time. This is a great sign, because it means you have project leadership and you also have a large group of people who are contributing patches to the project.
03:45 We could also check out the GitHub user account that hosts the Requests library. In this case, it’s Kenneth Reitz, and you can see that Kenneth has a number of very popular libraries in the Python space. He is working for a respectable and well-known company, and these are all indicators that Requests is a really great library.
04:04 At the end of step 4, maybe the field has narrowed down a little bit further, every Python library should have a good project README, and I find it helpful to familiarize myself with the licensing terms for the project, and the team of people working on or maintaining the library.
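If you already have a candidate installed in a virtual environment, the license and author check from this step can also be done from Python. Here is a minimal sketch using the standard library's `importlib.metadata`. The helper name `package_summary` is made up for this example, and which fields are actually filled in depends on what the package author declared, so treat an empty field as a prompt to go read the repository.

```python
from importlib import metadata
from typing import Dict, Optional


def package_summary(name: str) -> Optional[Dict[str, Optional[str]]]:
    """Return license and author fields for an installed package, or None."""
    try:
        meta = metadata.metadata(name)
    except metadata.PackageNotFoundError:
        return None  # the package is not installed in this environment
    return {
        "name": meta.get("Name"),
        # Some packages declare License-Expression instead of License.
        "license": meta.get("License-Expression") or meta.get("License"),
        "author": meta.get("Author") or meta.get("Author-email"),
        "summary": meta.get("Summary"),
    }
```

For example, after installing Requests you could call `package_summary("requests")` and compare what it reports with the license file in the repository.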
04:23 In step 5, you’re going to make sure that the project is actively maintained. In my mind, this is one of the most important quality indicators. Now, how can you find out if a project is under active development? Usually, a great way to find that information is to check out the project change log and the update history.
04:42 You could either do that directly on PyPI or by checking the project source repository, also on the source repository you can usually find a bug tracker for the project.
04:52 Now this can tell you a lot about how the project is being maintained. Are there discussions going on? Are there many open tickets for severe bugs? If there are no tickets, then that is usually not a great sign either, because in my experience, any project that gets some traction has a flood of bug reports coming in. Now I would recommend that you skim through some of those bug reports, just to make sure that there isn’t some large problem with the project that would affect your ability to use it properly.
05:20 Another piece of information you can find directly on the source repository is when the last commit to the project happened. Now, you don’t want to discount projects that do not have a lot of development activity going on at the moment. I’d rather pick a well-seasoned project that is also well maintained, or at least not abandoned, over one that’s super actively maintained but also brand new, because then you don’t really know what the future holds. Maybe the project is going to get abandoned in a few months, and then you’re stuck with it. A seasoned library that still does its job properly but is not getting a lot of feature updates could still be totally worth your while. There is nothing wrong with an older library that does its job really well.
06:00 At the end of step 5, your list of candidate projects will likely have narrowed down further, and this is a good thing. The more projects you can weed out, the easier it will be to pick the perfect library for your use case.
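If you want to check the update history for several candidates at once, you could script it. The sketch below assumes PyPI's JSON endpoint at `https://pypi.org/pypi/<package>/json` and assumes that each file listed under `releases` carries an `upload_time` string such as `2021-06-15T12:30:00`. The function names are made up for this example, and only `fetch_releases` touches the network.

```python
import json
from datetime import datetime
from typing import Optional
from urllib.request import urlopen


def fetch_releases(package: str) -> dict:
    """Fetch the 'releases' mapping from PyPI's JSON endpoint (needs network)."""
    with urlopen(f"https://pypi.org/pypi/{package}/json", timeout=10) as resp:
        return json.load(resp)["releases"]


def latest_upload(releases: dict) -> Optional[datetime]:
    """Return the most recent upload time across all release files."""
    times = [
        datetime.fromisoformat(file_info["upload_time"])
        for files in releases.values()  # one list of files per version
        for file_info in files
    ]
    return max(times) if times else None


def days_since(moment: datetime, now: datetime) -> int:
    """Whole days between an upload and the given reference time."""
    return (now - moment).days
```

A call like `latest_upload(fetch_releases("requests"))` gives you a date to compare between candidates. As said above, an older date is not automatically a red flag, so read it together with the bug tracker.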
06:12 You are almost done here. In step 6, you would spot check the source code of the project. I always like to look under the hood of a library that I am going to use in my own programs.
06:26 And usually, this is really easy to do if you’re dealing with an open source project, you just open the project repository website and browse around in the code a little bit.
06:34 Here is what I like to see. Does the code follow community best practices? For example, does it have a consistent formatting style, are there comments in the code, are there docstrings, stuff like that. Another hugely important topic for me is whether or not the code has automated test coverage. In my mind, a good quality Python package will always have an automated test suite.
06:57 Looking at the code will also give you a good idea of how experienced the developers were who wrote the library; often you can tell at a glance whether it was someone who had a deep understanding of Python who wrote a library, or if it was someone who was maybe coming from an entirely different language background and was just kind of told to write a Python library.
07:17 Now, this doesn’t automatically mean disaster, but it’s still a really good quality indicator. In the end, it all boils down to the question would you feel comfortable making small changes to this library if you had to?
07:30 Because that is what the worst case scenario is. Imagine you are building a really successful application that is using a particular library and then the original authors of the library stop maintaining it.
07:40 Well, if you don’t want to give up your project, it will pretty much come down to you maintaining this library, at least enough so you can use it for your own purposes.
07:49 This is something that I always try to keep in the back of my head when I make a decision whether to use one library or another. Alright, let’s take a look at what this looks like in practice.
08:00 So I am back here looking at the GitHub repository for the Requests library. And that gives me a really easy way to browse through the library source code, so I don’t even have to install it, I can just use the GitHub code viewer and browse around and I don’t need to pull this over into my own editor.
08:17 So what I would do here is try and find the main directory where all the source files live, and in this case, it’s the requests folder, so typically this would be named after the library, and you can see here there is a bunch of Python files in there.
08:31 This seems pretty well structured already and you can also see there is a lot of activity here so these are being updated all the time. Now let’s check out one of those files.
08:41 For example, the cookies.py file, that sounds tasty. And I would just spend some time reading that code. What I immediately like here is that there are docstrings, the imports are nicely formatted, and you can see here the classes seem like they are named properly. Again, there are extensive docstrings for everything.
09:02 This class here with these methods on it, they seem well structured, right, there are no crazy long, thousand-line methods here. This is all pretty nice and tidy, and when I scroll further through the file, it all just seems like it’s following a structure and it’s formatted in a way that makes it easy on the eyes. That is usually a really good sign. Imagine you have to maintain this code: personally, I would much rather work with code that looks like this than some convoluted mess.
09:36 And you can see here it seems to adhere to the PEP 8 formatting style, which I think is also a good sign, because if you are also using PEP 8 or something similar, then this library code is going to look similar to your application code, which also helps maintenance.
09:52 Yeah, so I would say this looks pretty good, let’s see if we can find some tests. Okay, so there is a tests folder here, and again, it looks like there are a whole bunch of tests here, so let’s check out test_structures.py. Alright, so they are using pytest, which is a library that personally I like a lot, so this would be a good sign for me. First of all, I love the fact they have an automated test suite here, and just glancing over those tests, I mean, they seem pretty reasonable, right, they seem like they are actually testing stuff. They are not just placeholders or dummy tests, they are actually doing some things.
10:36 Now, usually I wouldn’t do like a full code review for a library that I want to use, but I just want to do some spot checking to get an idea of the code quality for that library, because, in the worst case scenario I might actually have to do some maintenance work on this library, if someone stops maintaining it and it’s an important part of my own application, then I would be pretty much responsible for keeping this thing alive so that I can continue to use it.
11:03 So this is always something that is in the back of my head; of course, Requests here passes that test with flying colors, and seeing how popular that library is, it’s probably going to be maintained for a really long time so I wouldn’t be too worried about this, but, of course it helps that it has great code quality too.
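If you want a rough number to go with that spot check, you could count how many functions and classes in a file carry a docstring. This is just a sketch using the standard library's `ast` module. The `docstring_coverage` helper is made up for this example, and it says nothing about whether the docstrings are any good, so it does not replace actually reading the code.

```python
import ast


def docstring_coverage(source: str) -> float:
    """Return the fraction of functions and classes that have a docstring."""
    tree = ast.parse(source)
    definitions = [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    if not definitions:
        return 1.0  # nothing to document, so nothing is missing
    documented = sum(
        1 for node in definitions if ast.get_docstring(node) is not None
    )
    return documented / len(definitions)
```

You would read a file such as cookies.py from a local checkout and pass its text to `docstring_coverage`.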
11:19 Okay, you made it all the way to step 7, and this is the last step in this workflow. So at this point, you would have a much narrowed-down list of candidates, and now it’s time to try out a few of them.
11:33 So at this point, I would go over my notes and my memories, and take this narrowed-down list of candidates and just start installing some of them to try them out in an interpreter session, and I am always using a fresh virtual environment for that so that I am not cluttering up my system.
11:48 I would encourage you to do the same, and then you can just launch into an interpreter session, import the library, and play with it a little bit. Or you might write a couple of small programs just to get a feel for how the library works. So for example, with Requests, maybe I would write a little program that downloads a file over HTTP, and then I would try and implement the same example with a different library to get a feel for what the strengths and weaknesses of each of them are.
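A little download program like that could look roughly like the sketch below. It assumes Requests is installed in your virtual environment. The `download` helper and its optional `session` parameter are made up for this example; the parameter is only there so you can pass in a stand-in object and try the function without a network connection.

```python
from typing import Any, Optional


def download(url: str, dest: str, session: Optional[Any] = None) -> str:
    """Stream the body of `url` into the file `dest` and return `dest`."""
    if session is None:
        import requests  # third-party; install it into your virtual environment

        session = requests.Session()
    # stream=True avoids loading the whole file into memory at once.
    with session.get(url, stream=True, timeout=10) as response:
        response.raise_for_status()  # fail loudly on 4xx and 5xx responses
        with open(dest, "wb") as fh:
            for chunk in response.iter_content(chunk_size=8192):
                fh.write(chunk)
    return dest
```

Writing the same few lines against a second library gives you a direct, side-by-side feel for how the two APIs compare.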
12:16 Now, actually installing the library and then trying it out is going to tell you something very important: it’s going to tell you whether the package installs and imports cleanly. At the end of the day, that is super important. Even if you have the best library for your purpose, if it’s so painful to install or it doesn’t work on your system, then that is not going to help you.
12:37 So I always make sure to actually get some hands-on experience with my top three choices or so, so that I can be confident in the decision that I make. Another very important question is whether or not you enjoy working with the package.
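Once a candidate is installed in your fresh virtual environment, the "imports cleanly" part is easy to check in a repeatable way. Here is a minimal sketch with the standard library's `importlib`. The `imports_cleanly` helper is made up for this example, and it deliberately catches any exception, since a broken package can fail at import time with more than just an `ImportError`.

```python
import importlib


def imports_cleanly(module_name: str) -> bool:
    """Try to import a module and report whether it succeeded."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # ImportError, or anything raised at import time
        print(f"{module_name}: import failed ({exc!r})")
        return False
    return True
```

You could run this over your top three choices right after installing them, before you spend any time on the actual experiments.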
12:50 I strongly believe that developers should use tools that they enjoy working with, and this also applies to third party packages and modules and libraries.
12:57 So for me, this would always factor into the decision, now I realize that there might be business constraints and sometimes you just have to work with something that you are not enjoying as much.
13:08 But if there is a way to get the best of both worlds, a really great library that is actually fun to work with, I would always pick the one that is fun to work with and gets the job done.
Damian on April 19, 2020
dbader.org/blog/write-a-great-readme-for-your-github-project
choosealicense.com/licenses/