This lesson shows you how to create a new package based on the given one, which you’ll later publish to PyPI.
The package consists of five files:
config.txt is a configuration file used to specify the URL of the feed of Real Python tutorials. It’s a text file that can be read by the configparser module in the standard library:
# config.txt
[feed]
url = https://realpython.com/atom.xml
In general, such a config file contains key-value pairs separated into sections. This particular file contains only one section (feed) and one key (url).
Note: A configuration file is probably overkill for this simple package. We include it here for demonstration purposes.
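As a minimal sketch of how configparser treats sections and keys, the snippet below parses the same content from a string instead of from a file on disk. The names config_text and cfg are chosen here purely for illustration.

```python
from configparser import ConfigParser

# The contents of config.txt, inlined as a string for this illustration
config_text = """
[feed]
url = https://realpython.com/atom.xml
"""

cfg = ConfigParser()
cfg.read_string(config_text)

# Sections and keys can be listed and looked up by name
print(cfg.sections())          # ['feed']
print(cfg.options("feed"))     # ['url']
print(cfg.get("feed", "url"))  # https://realpython.com/atom.xml
```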
The first source code file we’ll look at is __main__.py. The double underscores indicate that this file has a special meaning in Python. Indeed, when running a package as a script with -m as above, Python executes the contents of the __main__.py file.
In other words, __main__.py acts as the entry point of our program and takes care of the main flow, calling other parts as needed:
# __main__.py

from configparser import ConfigParser
from importlib import resources  # Python 3.7+
import sys

from reader import feed
from reader import viewer

def main():
    """Read the Real Python article feed"""
    # Read URL of the Real Python feed from config file
    cfg = ConfigParser()
    cfg.read_string(resources.read_text("reader", "config.txt"))
    url = cfg.get("feed", "url")

    # If an article ID is given, show the article
    if len(sys.argv) > 1:
        article = feed.get_article(url, sys.argv[1])
        viewer.show(article)

    # If no ID is given, show a list of all articles
    else:
        site = feed.get_site(url)
        titles = feed.get_titles(url)
        viewer.show_list(site, titles)

if __name__ == "__main__":
    main()
Notice that main() is called on the last line. If we did not call main(), our program would not do anything. As you saw earlier, the program can either list all articles or print one specific article. This is handled by the if-else inside main().
To read the URL to the feed from the configuration file, we use configparser and importlib.resources. The latter is used to import non-code (or resource) files from a package without having to worry about the full file path. It is especially helpful when publishing packages to PyPI where resource files might end up inside binary archives.
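To illustrate looking up a resource by package name rather than by file path, here is a self-contained sketch that builds a throwaway package in a temporary directory and reads a text file from it. The package name demo_pkg is hypothetical. The sketch also uses resources.files(), which requires Python 3.9 or later and is a different call from the read_text() function used in __main__.py.

```python
import importlib
import sys
import tempfile
from importlib import resources
from pathlib import Path

# Build a throwaway package on disk so the example is self-contained.
# "demo_pkg" is a hypothetical name, not part of the reader package.
tmp_dir = tempfile.mkdtemp()
pkg_dir = Path(tmp_dir) / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "config.txt").write_text("[feed]\nurl = https://realpython.com/atom.xml\n")

# Make the temporary package importable
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

# Look up the resource by package name and file name, not by file path
text = resources.files("demo_pkg").joinpath("config.txt").read_text()
print(text)
```

The calling code never mentions tmp_dir when reading the file, which is the point: the package name is enough to locate the resource.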
importlib.resources became a part of the standard library in Python 3.7. If you are using an older version of Python, you can use importlib_resources instead. This is a backport compatible with Python 2.7, and 3.4 and above. importlib_resources can be installed from PyPI:
$ pip install importlib_resources
See Barry Warsaw’s presentation at PyCon 2018 for more information.
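One common way to support both cases is to try the standard library import first and fall back to the backport. This is a sketch of that general pattern, not code from the reader package:

```python
try:
    # Standard library version, available from Python 3.7
    from importlib import resources
except ImportError:
    # Backport from PyPI for older Python versions
    import importlib_resources as resources

# Either way, the rest of the code refers to the same name
print(resources.__name__)
```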
The next file is __init__.py. Again, the double underscores in the filename tell us that this is a special file. __init__.py represents the root of your package. It should usually be kept quite simple, but it’s a good place to put package constants, documentation, and so on:
# __init__.py
# Version of the realpython-reader package
__version__ = "1.0.0"
The special variable __version__ is a convention in Python for adding version numbers to your package. The convention was described in PEP 396. We’ll talk more about versioning later.
Variables defined in __init__.py become available as variables in the package namespace:
>>> import reader
>>> reader.__version__
'1.0.0'
You should define the __version__ variable in your own packages as well.
Looking at __main__.py, you’ll see that two modules, feed and viewer, are imported and used to read from the feed and show the results. These modules do most of the actual work.
First consider feed.py. This file contains functions for reading from a web feed and parsing the result. Luckily, there are already great libraries available to do this. feed.py depends on two modules that are already available on PyPI: feedparser and html2text.
feed.py contains several functions. We’ll discuss them one at a time.
To avoid reading from the web feed more than necessary, we first create a function that remembers the feed the first time it’s read:
 1 # feed.py
 2
 3 import feedparser
 4 import html2text
 5
 6 _CACHED_FEEDS = dict()
 7
 8 def _feed(url):
 9     """Only read a feed once, by caching its contents"""
10     if url not in _CACHED_FEEDS:
11         _CACHED_FEEDS[url] = feedparser.parse(url)
12     return _CACHED_FEEDS[url]
feedparser.parse() reads a feed from the web and returns it in a structure that looks like a dictionary. To avoid downloading the feed more than once, it’s stored in _CACHED_FEEDS and reused for later calls to _feed(). Both _CACHED_FEEDS and _feed() are prefixed by an underscore to indicate that they are support objects not meant to be used directly.
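The caching behavior can be checked without touching the network. In the sketch below, fake_parse() is a hypothetical stand-in for feedparser.parse() that simply records each call, and the example URL is made up:

```python
_CACHED_FEEDS = dict()
download_log = []  # Records every simulated download

def fake_parse(url):
    """Hypothetical stand-in for feedparser.parse() that avoids network access"""
    download_log.append(url)
    return {"url": url, "entries": []}

def _feed(url):
    """Only read a feed once, by caching its contents"""
    if url not in _CACHED_FEEDS:
        _CACHED_FEEDS[url] = fake_parse(url)
    return _CACHED_FEEDS[url]

first = _feed("https://example.com/atom.xml")
second = _feed("https://example.com/atom.xml")

print(first is second)    # True: the cached object is reused
print(len(download_log))  # 1: the feed was only "downloaded" once
```

Note that the cache lives for as long as the Python process does, so a new run of the program reads the feed again.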
We can get some basic information about the feed by looking in the .feed metadata. The following function picks out the title and link to the web site containing the feed:
14 def get_site(url):
15     """Get name and link to web site of the feed"""
16     info = _feed(url).feed
17     return f"{info.title} ({info.link})"
In addition to .title and .link, attributes like .subtitle, .updated, and .id are also available.
The articles available in the feed can be found inside the .entries list. Article titles can be found with a list comprehension:
19 def get_titles(url):
20     """List titles in feed"""
21     articles = _feed(url).entries
22     return [a.title for a in articles]
.entries lists the articles in the feed in reverse chronological order, so that the newest article is .entries[0].
In order to get the contents of one article, we use its index in the .entries list as an article ID:
24 def get_article(url, article_id):
25     """Get article from feed with the given ID"""
26     articles = _feed(url).entries
27     article = articles[int(article_id)]
28     html = article.content[0].value
29     text = html2text.html2text(html)
30     return f"# {article.title}\n\n{text}"
After picking the correct article out of the .entries list, we find the text of the article as HTML on line 28. Next, html2text does a decent job of translating the HTML into much more readable text. As the HTML doesn’t contain the title of the article, the title is added before returning.
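Because the article ID arrives as a string from the command line, int(article_id) raises ValueError for non-numeric input, and the list lookup raises IndexError for an ID that is too large. The following is one possible way to guard against both. The helper parse_article_id() is hypothetical and not part of the reader package, and unlike plain list indexing it deliberately rejects negative IDs:

```python
def parse_article_id(raw_id, num_articles):
    """Convert a command-line argument into a valid index into .entries"""
    try:
        article_id = int(raw_id)
    except ValueError:
        raise SystemExit(f"Article ID must be an integer, got {raw_id!r}") from None
    if not 0 <= article_id < num_articles:
        raise SystemExit(f"Article ID must be between 0 and {num_articles - 1}")
    return article_id

print(parse_article_id("3", 20))  # 3
```

Raising SystemExit with a message makes a command-line program print the message and exit with a non-zero status instead of showing a traceback.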
The final module is viewer.py. At the moment, it consists of two very simple functions. In practice, we could have used print() directly in __main__.py instead of calling viewer functions. However, having the functionality split off makes it easier to replace it later with something more advanced. Maybe we could add a GUI in a later version?
viewer.py contains two functions:
# viewer.py

def show(article):
    """Show one article"""
    print(article)

def show_list(site, titles):
    """Show list of articles"""
    print(f"The latest tutorials from {site}")
    for article_id, title in enumerate(titles):
        print(f"{article_id:>3} {title}")
show() simply prints one article to the console, while show_list() prints a list of titles. The latter also creates article IDs that are used when choosing to read one particular article.
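To see how enumerate() and the :>3 format specifier produce the IDs, here is a small sketch using made-up titles. The IDs start at 0 and are right-aligned in a column three characters wide:

```python
# Made-up titles for illustration; the real ones come from the feed
titles = ["First tutorial", "Second tutorial", "Third tutorial"]

lines = [f"{article_id:>3} {title}" for article_id, title in enumerate(titles)]
for line in lines:
    print(line)
```

These IDs match the indices into the .entries list, which is why the same number can be passed back on the command line to select an article.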