Hint: You can adjust the default video playback speed in your account settings.
Sorry! Looks like there’s an issue with video playback 🙁 This might be due to a temporary outage or because of a configuration issue with your browser. Please see our video player troubleshooting guide to resolve the issue.

Setting Up a New Python Package

Give Feedback

This lesson shows you how you can create a new package based on the given one, which you’ll publish later to PyPI!

The package consists of five files:

config.txt is a configuration file used to specify the URL of the feed of Real Python tutorials. It’s a text file that can be read by the configparser standard library:

# config.txt

[feed]
url = https://realpython.com/atom.xml

In general, such a config file contains key-value pairs separated into sections. This particular file contains only one section (feed) and one key (url).

The first source code file we’ll look at is __main__.py. The double underscores indicate that this file has a special meaning in Python. Indeed, when running a package as a script with -m as above, Python executes the contents of the __main__.py file.

In other words, __main__.py acts as the entry point of our program and takes care of the main flow, calling other parts as needed:

# __main__.py

from configparser import ConfigParser
from importlib import resources  # Python 3.7+
import sys

from reader import feed
from reader import viewer

def main():
    """Read the Real Python article feed"""
    # Read URL of the Real Python feed from config file
    cfg = ConfigParser()
    cfg.read_string(resources.read_text("reader", "config.txt"))
    url = cfg.get("feed", "url")

    # If an article ID is given, show the article
    if len(sys.argv) > 1:
        article = feed.get_article(url, sys.argv[1])
        viewer.show(article)

    # If no ID is given, show a list of all articles
    else:
        site = feed.get_site(url)
        titles = feed.get_titles(url)
        viewer.show_list(site, titles)

if __name__ == "__main__":
    main()

Notice that main() is called on the last line. If we do not call main(), then our program would not do anything. As you saw earlier, the program can either list all articles or print one specific article. This is handled by the if-else inside main().

To read the URL to the feed from the configuration file, we use configparser and importlib.resources. The latter is used to import non-code (or resource) files from a package without having to worry about the full file path. It is especially helpful when publishing packages to PyPI where resource files might end up inside binary archives.

importlib.resources became a part of the standard library in Python 3.7. If you are using an older version of Python, you can use importlib_resources instead. This is a backport compatible with Python 2.7, and 3.4 and above. importlib_resources can be installed from PyPI:

$ pip install importlib_resources

See Barry Warzaw’s presentation at PyCon 2018 for more information.

The next file is __init__.py. Again, the double underscores in the filename tell us that this is a special file. __init__.py represents the root of your package. It should usually be kept quite simple, but it’s a good place to put package constants, documentation, and so on:

# __init__.py

# Version of the realpython-reader package
__version__ = "1.0.0"

The special variable __version__ is a convention in Python for adding version numbers to your package. It was introduced in PEP 396. We’ll talk more about versioning later.

Variables defined in __init__.py become available as variables in the package namespace:

>>>
>>> import reader
>>> reader.__version__
'1.0.0'

You should define the __version__ variable in your own packages as well.

Looking at __main__.py, you’ll see that two modules, feed and viewer, are imported and used to read from the feed and show the results. These modules do most of the actual work.

First consider feed.py. This file contains functions for reading from a web feed and parsing the result. Luckily there are already great libraries available to do this. feed.py depends on two modules that are already available on PyPI: feedparser and html2text.

feed.py contains several functions. We’ll discuss them one at a time.

To avoid reading from the web feed more than necessary, we first create a function that remembers the feed the first time it’s read:

 1 # feed.py
 2 
 3 import feedparser
 4 import html2text
 5 
 6 _CACHED_FEEDS = dict()
 7 
 8 def _feed(url):
 9     """Only read a feed once, by caching its contents"""
10     if url not in _CACHED_FEEDS:
11         _CACHED_FEEDS[url] = feedparser.parse(url)
12     return _CACHED_FEEDS[url]

feedparser.parse() reads a feed from the web and returns it in a structure that looks like a dictionary. To avoid downloading the feed more than once, it’s stored in _CACHED_FEEDS and reused for later calls to _feed(). Both _CACHED_FEEDS and _feed() are prefixed by an underscore to indicate that they are support objects not meant to be used directly.

We can get some basic information about the feed by looking in the .feed metadata. The following function picks out the title and link to the web site containing the feed:

14 def get_site(url):
15     """Get name and link to web site of the feed"""
16     info = _feed(url).feed
17     return f"{info.title} ({info.link})"

In addition to .title and .link, attributes like .subtitle, .updated, and .id are also available.

The articles available in the feed can be found inside the .entries list. Article titles can be found with a list comprehension:

19 def get_titles(url):
20     """List titles in feed"""
21     articles = _feed(url).entries
22     return [a.title for a in articles]

.entries lists the articles in the feed sorted chronologically, so that the newest article is .entries[0].

In order to get the contents of one article, we use its index in the .entries list as an article ID:

24 def get_article(url, article_id):
25     """Get article from feed with the given ID"""
26     articles = _feed(url).entries
27     article = articles[int(article_id)]
28     html = article.content[0].value
29     text = html2text.html2text(html)
30     return f"# {article.title}\n\n{text}"

After picking the correct article out of the .entries list, we find the text of the article as HTML on line 28. Next, html2text does a decent job of translating the HTML into much more readable text. As the HTML doesn’t contain the title of the article, the title is added before returning.

The final module is viewer.py. At the moment, it consists of two very simple functions. In practice, we could have used print() directly in __main__.py instead of calling viewer functions. However, having the functionality split off makes it easier to replace it later with something more advanced. Maybe we could add a GUI interface in a later version?

viewer.py contains two functions:

# viewer.py

def show(article):
    """Show one article"""
    print(article)

def show_list(site, titles):
    """Show list of articles"""
    print(f"The latest tutorials from {site}")
    for article_id, title in enumerate(titles):
        print(f"{article_id:>3} {title}")

show() simply prints one article to the console, while show_list() prints a list of titles. The latter also creates article IDs that are used when choosing to read one particular article.

Comments & Discussion

rorydaulton on May 31, 2020

In the “Setting Up a New Python Package” video at time mark 1:52 and following you say, “Now you can go ahead and copy and paste the code from below.” I do not see any code below the video. Am I missing something or did you forget to put the code in the text below the video? (And is a comment the best way to ask such a question?)

Dan Bader RP Team on May 31, 2020

@rorydaulton: Thanks for the heads up. I just added the sample files to the video description :)

Become a Member to join the conversation.