How to Evaluate the Quality of Python Packages

by Philipp Acsany

Installing packages with Python is just one pip install command away. That’s one of the many great qualities that the Python ecosystem has to offer.

However, you may have downloaded a third-party package once that didn’t work out for you in one way or another. For example, the package didn’t support the Python version that you were using in your project, or the package didn’t do what you expected it to do.

By understanding the characteristics of a high-quality Python package, you can avoid introducing incompatible or even harmful code into your project. In this tutorial, you’ll learn how the Python Package Index can give you a first impression of a package. Then you’ll dig even deeper by checking out Libraries.io, the GitHub repository, and the license of any Python package that you want to use.

In the end, you’ll know how to evaluate third-party packages that you find online before you add them to your Python projects.

The Python Package Index (PyPI) provides the largest collection of external Python packages. Before you install a package with pip, you should make sure that it’s available on PyPI:

Screenshot of the PyPI website

Being able to find the package on PyPI is a good indicator that it’s legit, although you still need to be careful about what you’re getting.

On PyPI, you’ve got the option to either browse categories or search for keywords. Unless you’re looking for a very niche package, chances are that PyPI will present you with a list of thousands of packages that match your topic.

To sort the order of the list, PyPI gives you two options:

  1. Relevance
  2. Date last updated

The option to sort by Relevance is ambiguous because you don’t know what PyPI takes into consideration for this order. Still, when you sort by relevance, the packages appearing on top are indeed often the ones that suit your needs.

Sometimes, it can be tricky to find the right package on PyPI, even if you know the package’s name. For example, when you search for beautiful soup, you’ll get a number of similar-looking results. That’s when Relevance can come in handy.

Out of curiosity, you may also peek into the packages that have been updated lately by using the Date last updated order. But this order usually puts irrelevant packages on top, just because they were updated recently.

In most cases, a package doesn’t need to be cutting-edge unless it addresses a security issue. Instead, it’s important that a package supports the Python version of your project. This is when the PyPI filter comes into play.

For example, when you want to use the cool new features of Python 3.11 to full capacity, you can select Python 3.11 in the list of Programming Language filters.

Additionally, you can combine filters on PyPI. It’s a good idea to also take Development Status into consideration by filtering for Production/Stable. This way, you increase the chances of working with reliable packages that have gone through thorough testing.
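
The combination of these two filters can also be expressed in code. The following is a minimal sketch, not PyPI’s own filtering logic: the function name matches_filters and the sample classifier list are made up for illustration, while the classifier strings follow the format that packages declare on PyPI.

```python
def matches_filters(classifiers, python_version, status="5 - Production/Stable"):
    """Return True if the classifiers declare both the version and the status."""
    wanted_version = f"Programming Language :: Python :: {python_version}"
    wanted_status = f"Development Status :: {status}"
    return wanted_version in classifiers and wanted_status in classifiers


# Hypothetical classifier list, as a package might declare it
sample_classifiers = [
    "Development Status :: 5 - Production/Stable",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
]

print(matches_filters(sample_classifiers, "3.11"))
print(matches_filters(sample_classifiers, "3.12"))
```

Keep in mind that classifiers are declared by the package authors themselves, so a match tells you what the authors claim, not what they’ve tested.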

When you’ve found a package that seems to fit your needs, it’s time to put it under the microscope to make sure it’s safe and reliable. For this evaluation, you’ll study the PyPI details page. Clicking a package name brings you to a page that’s dedicated to the package.

On the PyPI details page of a package, you’ll find the most important information about a package:

  • Project description
  • Supported Python versions
  • Release history
  • Project links
  • GitHub statistics
  • License information
  • Author details

Depending on the Python project that you need to add the package to, some pieces of information may be more important than others. However, you should always make sure that the project description is mature. Ideally, you get a helpful introduction to the package and find steps on how to get started with the package, as well as some code examples.
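
If you’d rather inspect these details from a script, PyPI also serves them as JSON at https://pypi.org/pypi/<package>/json. The sketch below shows one way to pull out a few fields for a first impression. The function names and the sample dictionary are assumptions made for this example, and the sample is a trimmed-down stand-in for a real response so that the code runs offline.

```python
import json
from urllib.request import urlopen


def fetch_pypi_metadata(package_name):
    """Download a package's metadata from PyPI's JSON API (needs network access)."""
    url = f"https://pypi.org/pypi/{package_name}/json"
    with urlopen(url) as response:
        return json.load(response)


def summarize(metadata):
    """Pick out the fields that matter for a first impression."""
    info = metadata.get("info", {})
    return {
        "summary": info.get("summary"),
        "requires_python": info.get("requires_python"),
        "license": info.get("license"),
        "author": info.get("author"),
        "project_urls": info.get("project_urls") or {},
    }


# Hypothetical, trimmed-down response used instead of a live request
sample = {
    "info": {
        "summary": "An example package",
        "requires_python": ">=3.8",
        "license": "MIT",
        "author": "Jane Doe",
        "project_urls": {"Source": "https://example.com/source"},
    }
}
print(summarize(sample))
```

To try it against a real package, you could call summarize(fetch_pypi_metadata("requests")). Fields that a package doesn’t declare come back as None or as an empty dictionary.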

Another proxy indicator of the quality of a package can be its author. You may prefer to use a package from a known figure in the Python community rather than one by an anonymous person with an account named something like asdf123. When you hover over the author’s name, then you can verify that the email address of the package’s author is valid:

Screenshot of the PyPI detail page with the mouse hovering over an author's name

In the sidebar of a package’s details page, you find statistics about the package next to the author’s details. Most prominent are the GitHub statistics, which you’ll investigate in a moment. It’s a little hidden, but you also find a link to another platform that gives you valuable information about the quality of external Python packages. You’ll have a look at this platform next.

Leverage Libraries.io

On the PyPI details page of a package, you can find a link to view the project’s statistics on Libraries.io. If you’re looking for high-quality Python packages, then the mission of Libraries.io will be music to your ears:

Helping developers make faster, more informed decisions about the software that they use. (Source)

But it’s not only Python developers who benefit from this mission. Libraries.io keeps track of packages in multiple programming languages, including JavaScript and Java.

When you follow the link from the PyPI details page to Libraries.io, you see a page that looks similar to the PyPI details page at first glance:

Screenshot of the libraries.io detail page for a package

If you have a closer look at a Libraries.io details page, then you can find valuable information about a package that PyPI doesn’t show. Although the statistics come from PyPI, you’d need to access the PyPI API dataset to retrieve the data.

On Libraries.io, you see other valuable data points, such as:

  • Dependent packages
  • First release
  • Contributors

If the first release of a package is years in the past, but the package is still in development, then it has probably aged well. This especially holds true when many contributors have joined forces to build the package.
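
As a rough illustration of this rule of thumb, you can compare the first and the latest release date of a package. The function below is a hypothetical helper, and the dates are invented for the example. They don’t belong to any real package.

```python
from datetime import date


def release_age(first_release, latest_release, today):
    """Return (years since the first release, days since the latest release)."""
    years_in_development = (today - first_release).days / 365.25
    days_since_update = (today - latest_release).days
    return round(years_in_development, 1), days_since_update


# Hypothetical dates, chosen only for illustration
age, idle = release_age(
    first_release=date(2015, 1, 1),
    latest_release=date(2023, 1, 1),
    today=date(2023, 3, 1),
)
print(f"{age} years in development, last release {idle} days ago")
```

A package with many years in development and a recent release matches the pattern described above. A long gap since the last release is a reason to look closer, not proof that the package is abandoned.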

Developing a package that becomes a requirement for others is a badge of honor for anyone who publishes a package on PyPI. A high number of dependent packages shows you that other developers trust the package. For their packages to work correctly, they rely on having the package in question installed.

On Libraries.io, you can find another interesting benchmark for the quality of third-party Python packages: the SourceRank. SourceRank is the proprietary score that Libraries.io gives packages based on several metrics. Having a look at the metrics gives you another checklist that you can take into consideration when evaluating Python packages:

Screenshot of the SourceRank breakdown on libraries.io

The SourceRank breakdown for the folium package shows you that it’s a package that you can trust. It had a recent release, it’s not brand-new, and over a hundred packages use it as a dependency, although it hasn’t reached version 1.0 yet.

On the SourceRank list, you see some factors listed that you already saw on PyPI. Many of them refer to the Git repository where the source code of the package is hosted. So that’s the next guidepost to have a closer look at.

Explore the GitHub Repository

The GitHub repository of a third-party package shows you how active its development is. You can dig into the repository to explore the source code yourself or browse through the issues to see how other developers use the package.

If you’ve got time on your hands, then diving into the source code is a great way to evaluate the quality of a Python package. Reading other people’s Python code also helps in your Python learning journey.

But the reason why you want to use an external package might be that you’re not knowledgeable in this particular area. That’s why the social proof of a Python package comes in handy. On GitHub, you can spot how excited others are about a project. Some of the metrics are the number of:

  • Watchers: People who have chosen to receive notifications about a repository’s activity
  • Stars: A way for users to bookmark or like a repository to keep track of it
  • Forks: Copies of a repository that someone has created in order to make changes without affecting the original codebase
  • Pull requests: Proposed changes to a repository that a contributor has submitted for review and potential merging into the main codebase
  • Issues: A way for users to report problems or suggest new features for a repository, which contributors or maintainers can then address

A high number of watchers and stars means that other people are interested in the Git repository. The number of forks indicates how many developers have copied the package’s repository to play around with the source code. In most open-source projects, other developers write their contributions to a package inside their own forks.
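
You can also read these counts from GitHub’s REST API, which describes a repository at https://api.github.com/repos/OWNER/REPO. The following sketch only processes such a response. The function name and the sample numbers are made up, and the sample dictionary stands in for a real response so that no network access is needed.

```python
def social_proof(repo_data):
    """Extract popularity counts from a GitHub repository API response."""
    return {
        # GitHub reports watchers as "subscribers" in this response
        "watchers": repo_data["subscribers_count"],
        "stars": repo_data["stargazers_count"],
        "forks": repo_data["forks_count"],
        # This field counts open issues and open pull requests together
        "open_issues_and_prs": repo_data["open_issues_count"],
    }


# Hypothetical numbers standing in for a real API response
sample_repo = {
    "subscribers_count": 120,
    "stargazers_count": 4500,
    "forks_count": 800,
    "open_issues_count": 95,
}
print(social_proof(sample_repo))
```

Unauthenticated requests to the GitHub API are rate limited, so this approach suits occasional checks better than bulk comparisons.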

A high number of pull requests means that many developers want to contribute to a project. That’s a good indicator. But looking at the numbers alone usually doesn’t give you the full picture.

For example, at the time of writing, the Django repository on GitHub has 145 open and over 16,000 closed pull requests, 8,000 of which were merged into the project. Some of them were closed and merged only hours ago. These metrics indicate that Django is a popular, actively maintained project and could be a perfect fit when you want to build a flashcards web app or create an app to manage to-do lists using Python.

When you have a look at the details of a GitHub repository’s pull requests, you can often spot a constructive discussion about the topic at hand and see developers helping each other out to make the source code better.

Still, other packages may have a high number of unmerged pull requests. A project with many open pull requests might indicate that the core developers aren’t actively monitoring and implementing changes from external contributors. However, this isn’t necessarily a bad sign. Instead, it’s worth further investigating how the discussion of the pull requests looks or how many pull requests were merged in the past.
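
One way to put pull request counts into perspective is the share of closed pull requests that were actually merged. This ratio isn’t an official GitHub metric, just a simple calculation sketched here with the rounded Django figures quoted in this tutorial.

```python
def merge_ratio(merged, closed):
    """Return the share of closed pull requests that were merged."""
    if closed == 0:
        return 0.0  # Nothing has been closed yet, so there's no ratio to report
    return merged / closed


# Rounded figures for Django as quoted in this tutorial
print(f"{merge_ratio(8_000, 16_000):.0%} of closed pull requests were merged")
```

A low ratio isn’t automatically bad, because maintainers may close duplicates or land changes in a different way. It’s a prompt to read the discussions, not a verdict.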

The same goes for issues. Instead of just looking at the number of issues, look at the issue topics and the quality of the conversations. Learn why issues may still be open or why issues were closed.

Last but not least, pay attention to the README file of the repository. Similar to the PyPI package details page, the README file is a great indicator of how much care contributors put into the package. A well-written README file with helpful information for you as a user is always a good sign. It shows that the contributors care about documenting their package.

When you keep all of the above factors in mind, you’ve got a good chance of finding high-quality Python packages. Still, even when the source code of a package meets your needs, the attached license might not, so that’s what you’ll check out next.

Look at the License

Depending on how you want to use the Python package, you may want to have a close look at the package’s license. Even when a package is open source and free to use, it may come with strings attached. Broadly speaking, a license covers three aspects:

  1. Permissions: The allowed uses for the code, such as using it for commercial or non-commercial purposes, modifying the code, or distributing it with your own package
  2. Conditions: The requirements you need to fulfill to use the code, like including a copyright notice or providing a copy of the license with your code
  3. Limitations: The restrictions on what you can do with the code—for example, not using it for illegal purposes or not claiming that you wrote the code

For personal projects, you’re usually okay with most licenses of external open-source packages. This is especially true when you run your code only locally on your computer.

When you share your project with others, you need to check the attached license of any third-party packages that you’re using. Luckily, Python packages usually don’t come with custom licenses. Instead, developers often pick one of the popular open-source licenses.

You can find more information about the license that a package uses in the sidebar of the PyPI details page:

Screenshot of a Package's License Info on PyPI

Especially when you’re using a third-party package in a commercial setting, choosing a package with the proper license is important. In that case, it’s a good idea to filter down your PyPI search by using the license option.
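
If you evaluate several packages, then you can automate a first license check based on the license classifier that a package declares. The sketch below assumes that the package uses an OSI-approved license classifier. The allowlist is purely hypothetical and depends on the requirements of your own project.

```python
LICENSE_PREFIX = "License :: OSI Approved :: "

# Hypothetical allowlist, to be adjusted to your project's requirements
ALLOWED_LICENSES = {"MIT License", "BSD License", "Apache Software License"}


def find_license(classifiers):
    """Return the first OSI-approved license named in the classifiers, or None."""
    for classifier in classifiers:
        if classifier.startswith(LICENSE_PREFIX):
            return classifier[len(LICENSE_PREFIX):]
    return None


def is_allowed(classifiers, allowed=ALLOWED_LICENSES):
    """Check the declared license against the allowlist."""
    return find_license(classifiers) in allowed


# Hypothetical classifier lists for two packages
permissive = ["License :: OSI Approved :: MIT License"]
copyleft = ["License :: OSI Approved :: GNU General Public License v3 (GPLv3)"]
print(is_allowed(permissive), is_allowed(copyleft))
```

A package without a license classifier fails this check, which is a sensible default. In that case, read the license file in the repository before you decide.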

When you’ve completed all the above steps and the license of a package fits your needs, then you’re almost good to go. But before you take off and install the external package, there are some words of caution in the next section.

Be Careful

You should be wary of files that you download from the Web. Similarly, you shouldn’t add any external packages to your code without ensuring that you can trust the source.

By now, you’ve learned the essential measures for verifying that you’re working with high-quality packages in your code. You may even have a handful of packages that you know you can trust because you’ve worked with them successfully.

Still, it’s a good idea to check their project websites now and then. Even minor version bumps of a project can cause bugs in your project that you didn’t expect.

By keeping yourself up to date with the development of your trusted packages, you know what may break with an update. In that case, you can safely stick to an older, working version.
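
Sticking to a working version usually means pinning it, for example with a name==version line in a requirements file. The following sketch compares such pins with the versions that are installed. Both helper functions and the sample data are assumptions for illustration. In a real project, you could look up installed versions with importlib.metadata.version() from the standard library.

```python
def parse_pins(requirements_text):
    """Map package names to pinned versions from 'name==version' lines."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "==" in line:
            name, pinned = line.split("==", 1)
            pins[name.strip().lower()] = pinned.strip()
    return pins


def find_drift(pins, installed):
    """Return packages whose installed version differs from the pinned one."""
    return {
        name: (pinned, installed.get(name))
        for name, pinned in pins.items()
        if installed.get(name) != pinned
    }


# Hypothetical requirements file and installed versions
requirements = "requests==2.28.1\nJinja2==3.1.2\n"
installed_versions = {"requests": "2.28.1", "jinja2": "3.1.3"}

print(find_drift(parse_pins(requirements), installed_versions))
```

Any entry in the result tells you that the environment no longer matches the versions you tested with.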

Knowing which external packages you can trust is a great accomplishment for you as a Python developer. But even when you know package names by heart, pay close attention when you install them.

There’s a chance that you’ll make a typo in your pip command. Wrongdoers may exploit this possibility by giving their packages names that imitate popular packages. Here are some details to pay attention to, which you’ll explore in more detail below:

  • Use the correct number in the package name.
  • Remember if the package name is singular or plural.
  • Guard against typosquatting.

Some Python packages contain a number in their name. Often these are version numbers that the maintainers decided to include in the package name to differentiate a package from an older version. While this may make sense to some, others may be confused about the number.

For example, when working with Jinja templates in Python 3, you may accidentally type Jinja3 instead of the correct Jinja2 name.

Another mishap can be typing a package name in the singular noun form, instead of the plural. For example, you use request without an s on the end of your pip command instead of using the correct requests name. In this case, all the thorough research that you did for Python’s requests library has been in vain.

Typos can happen, especially when you have to type the pip command for packages with long names. The Beautiful Soup package is notoriously prone to typos. Think of all the places where you could slip up in typing beautifulsoup4.

Evildoers may upload packages where they’ve switched two letters or replaced one with a neighboring letter on the keyboard. This imitation technique is known as typosquatting. Some of these packages can be considered malware and shouldn’t find their way onto your system.

More often than not, pip won’t be able to find a mistyped package, or you’ll just end up with a different package than the one you were looking for. Still, there’s a chance that somebody with bad intentions has uploaded a similar-sounding package. If you want to be on the safe side, then it’s a good idea to copy and paste the name directly from PyPI to avoid typos.
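
As an extra safety net, you can compare a name that you’re about to install against a list of packages that you already trust. The sketch below uses difflib from the standard library to flag names that are close to a trusted name without matching it exactly. The function name, the trusted list, and the similarity cutoff of 0.8 are assumptions that you’d tune for your own setup.

```python
from difflib import get_close_matches


def find_lookalike(name, trusted_names, cutoff=0.8):
    """Return the trusted name that `name` resembles, or None if it looks fine."""
    normalized = name.lower()
    trusted = [trusted_name.lower() for trusted_name in trusted_names]
    if normalized in trusted:
        return None  # Exact match, so nothing is suspicious
    matches = get_close_matches(normalized, trusted, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical list of packages that you've already vetted
trusted = ["requests", "Jinja2", "beautifulsoup4"]

for candidate in ["request", "Jinja3", "beautifulsuop4", "requests"]:
    print(candidate, "->", find_lookalike(candidate, trusted))
```

A lookalike hit doesn’t prove that a package is malicious. It only tells you to double-check the spelling before you run the pip command.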

Even installing the right package in good faith can be dangerous if an attacker has managed to take control of the author’s project. So it’s a good idea to stay alert and take note of any suspicious changes in package behavior.

If you ever spot a malicious package on PyPI, then you can report a security issue. PyPI takes security very seriously and usually takes action after any reports. However, to avoid revealing any internal practices to attackers, the security team at PyPI often remains silent about their operations. So you may or may not be updated about any follow-up steps if you report a suspicious package.

Still, reporting any suspicious package is a good idea! That way, every Python user can do their part in keeping the Python ecosystem clean.

Conclusion

When you use third-party Python packages, you’re working with external software that somebody else put on the Internet. Just like you shouldn’t download just any old file from the Internet, you shouldn’t install external packages without evaluating them first.

Before you install packages with pip, you should ask yourself these questions:

  • Does the package support the Python version that you’re working with?
  • How popular is the package?
  • Is the package’s codebase well maintained?
  • Do other packages rely on the package?
  • Does the package’s license fit your needs?
  • What’s the exact pip install command for the package?

Any third-party package worth considering should be on PyPI with a verbose details page and a link to the project’s source code. It’s also a good indicator when you can find online documentation and even a project website.

With the tool set that you’ve built in this tutorial, you’ll be well equipped to extend your Python projects with high-quality packages.

About Philipp Acsany

Philipp is a core member of the Real Python team. He creates tutorials, records video courses, and hosts Office Hours sessions to support your journey to becoming a skilled and fulfilled Python developer.
