This lesson shows you how to create a new package based on the one provided, which you’ll later publish to PyPI.

The package consists of five files:

config.txt is a configuration file used to specify the URL of the feed of Real Python tutorials. It’s a text file that can be read by configparser from the standard library:

Config File
      
# config.txt

[feed]
url = https://realpython.com/atom.xml

In general, such a config file contains key-value pairs separated into sections. This particular file contains only one section (feed) and one key (url).

Note: A configuration file is probably overkill for this simple package. We include it here for demonstration purposes.
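To make the section and key structure concrete, here is a minimal sketch that parses the same content with configparser. The text is inlined in a string (the name CONFIG_TEXT is only for illustration) so the example runs without any file on disk:

```python
from configparser import ConfigParser

# Same content as config.txt, inlined so the example is self-contained
CONFIG_TEXT = """
[feed]
url = https://realpython.com/atom.xml
"""

cfg = ConfigParser()
cfg.read_string(CONFIG_TEXT)

sections = cfg.sections()     # names of all sections in the file
url = cfg.get("feed", "url")  # value of one key inside one section

print(sections)
print(url)
```

Adding a second section or key to the text only changes what sections() and get() can return; the parsing code stays the same.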

The first source code file we’ll look at is __main__.py. The double underscores indicate that this file has a special meaning in Python. Indeed, when running a package as a script with -m as above, Python executes the contents of the __main__.py file.

In other words, __main__.py acts as the entry point of our program and takes care of the main flow, calling other parts as needed:

Python
      
    
# __main__.py

from configparser import ConfigParser
from importlib import resources  # Python 3.7+
import sys

from reader import feed
from reader import viewer

def main():
    """Read the Real Python article feed"""
    # Read URL of the Real Python feed from config file
    cfg = ConfigParser()
    cfg.read_string(resources.read_text("reader", "config.txt"))
    url = cfg.get("feed", "url")

    # If an article ID is given, show the article
    if len(sys.argv) > 1:
        article = feed.get_article(url, sys.argv[1])
        viewer.show(article)

    # If no ID is given, show a list of all articles
    else:
        site = feed.get_site(url)
        titles = feed.get_titles(url)
        viewer.show_list(site, titles)

if __name__ == "__main__":
    main()

Notice that main() is called on the last line. If we did not call main(), the program would not do anything. As you saw earlier, the program can either list all articles or print one specific article. This is handled by the if-else inside main().
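If you want to see the role of __main__.py in isolation, the following sketch builds a throwaway package in a temporary directory and runs it with -m. The package name demo_pkg and the function name run_demo_package are made up for this illustration and are not part of the reader package:

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_demo_package():
    """Create a tiny package with a __main__.py and run it with -m."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"  # hypothetical package name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "__main__.py").write_text('print("running demo_pkg.__main__")\n')

        # With -m, Python looks for the package on sys.path (which includes
        # the current working directory) and executes its __main__.py
        result = subprocess.run(
            [sys.executable, "-m", "demo_pkg"],
            cwd=tmp,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


print(run_demo_package())
```

Deleting the __main__.py line from the sketch makes the -m call fail, which is the same situation as running a module that does not provide one.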

To read the URL to the feed from the configuration file, we use configparser and importlib.resources. The latter is used to import non-code (or resource) files from a package without having to worry about the full file path. It is especially helpful when publishing packages to PyPI where resource files might end up inside binary archives.
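The lesson code uses resources.read_text(). As a hedged alternative, the sketch below reads a resource with resources.files(), which was added in Python 3.9. To stay self-contained it creates a temporary package named demo_reader (a hypothetical name) rather than relying on the real reader package being installed:

```python
import sys
import tempfile
from importlib import resources
from pathlib import Path


def read_packaged_config():
    """Read config.txt from a temporary package via importlib.resources."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_reader"  # hypothetical package name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "config.txt").write_text(
            "[feed]\nurl = https://realpython.com/atom.xml\n"
        )

        sys.path.insert(0, tmp)  # make the temporary package importable
        try:
            # Locate the resource by package name, not by file path
            config = resources.files("demo_reader").joinpath("config.txt")
            return config.read_text(encoding="utf-8")
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_reader", None)


print(read_packaged_config())
```

The point of the design is that the calling code names the package and the resource, and never has to build a filesystem path to the installed package itself.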

importlib.resources became a part of the standard library in Python 3.7. If you are using an older version of Python, you can use importlib_resources instead. This is a backport compatible with Python 2.7, and 3.4 and above. importlib_resources can be installed from PyPI:

Shell
      
$ pip install importlib_resources

See Barry Warsaw’s presentation at PyCon 2018 for more information.
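One common way to support both cases is to try the standard library first and fall back to the backport. This is a sketch of that pattern, not something the reader package itself does:

```python
try:
    # Python 3.7 and later: part of the standard library
    from importlib import resources
except ImportError:
    # Older Pythons: requires `pip install importlib_resources`
    import importlib_resources as resources

print(resources.__name__)
```

The rest of the code can then use the name resources without caring which of the two imports succeeded.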

The next file is __init__.py. Again, the double underscores in the filename tell us that this is a special file. __init__.py represents the root of your package. It should usually be kept quite simple, but it’s a good place to put package constants, documentation, and so on:

Python
      
# __init__.py

# Version of the realpython-reader package
__version__ = "1.0.0"

The special variable __version__ is a convention in Python for adding version numbers to your package. The convention was described in PEP 396. We’ll talk more about versioning later.

Variables defined in __init__.py become available as variables in the package namespace:

Python
      
>>> import reader
>>> reader.__version__
'1.0.0'

You should define the __version__ variable in your own packages as well.

Looking at __main__.py, you’ll see that two modules, feed and viewer, are imported and used to read from the feed and show the results. These modules do most of the actual work.

First consider feed.py. This file contains functions for reading from a web feed and parsing the result. Luckily there are already great libraries available to do this. feed.py depends on two modules that are already available on PyPI: feedparser and html2text.

feed.py contains several functions. We’ll discuss them one at a time.

To avoid reading from the web feed more than necessary, we first create a function that remembers the feed the first time it’s read:

Python
      
    
# feed.py

import feedparser
import html2text

_CACHED_FEEDS = dict()

def _feed(url):
    """Only read a feed once, by caching its contents"""
    if url not in _CACHED_FEEDS:
        _CACHED_FEEDS[url] = feedparser.parse(url)
    return _CACHED_FEEDS[url]

feedparser.parse() reads a feed from the web and returns it in a structure that looks like a dictionary. To avoid downloading the feed more than once, it’s stored in _CACHED_FEEDS and reused for later calls to _feed(). Both _CACHED_FEEDS and _feed() are prefixed by an underscore to indicate that they are support objects not meant to be used directly.

We can get some basic information about the feed by looking in the .feed metadata. The following function picks out the title and link to the web site containing the feed:

Python
      
    
def get_site(url):
    """Get name and link to web site of the feed"""
    info = _feed(url).feed
    return f"{info.title} ({info.link})"

In addition to .title and .link, attributes like .subtitle, .updated, and .id are also available.

The articles available in the feed can be found inside the .entries list. Article titles can be found with a list comprehension:

Python
      
    
def get_titles(url):
    """List titles in feed"""
    articles = _feed(url).entries
    return [a.title for a in articles]

.entries lists the articles in the feed sorted from newest to oldest, so that the newest article is .entries[0].
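To experiment with the attribute access used in get_site() and get_titles() without downloading anything, you can mimic the shape of the parsed feed. The object below is built with SimpleNamespace and invented values; it only imitates the .feed and .entries attributes and is not what feedparser actually returns:

```python
from types import SimpleNamespace

# Hand-built stand-in for a parsed feed (invented titles and link)
fake_feed = SimpleNamespace(
    feed=SimpleNamespace(title="Example Feed", link="https://example.com"),
    entries=[
        SimpleNamespace(title="Newest post"),
        SimpleNamespace(title="Older post"),
    ],
)

# Same expressions as in get_site() and get_titles()
site = f"{fake_feed.feed.title} ({fake_feed.feed.link})"
titles = [entry.title for entry in fake_feed.entries]

print(site)
print(titles)
```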

In order to get the contents of one article, we use its index in the .entries list as an article ID:

Python
      
    
def get_article(url, article_id):
    """Get article from feed with the given ID"""
    articles = _feed(url).entries
    article = articles[int(article_id)]
    html = article.content[0].value
    text = html2text.html2text(html)
    return f"# {article.title}\n\n{text}"

After picking the correct article out of the .entries list, we find the text of the article as HTML in article.content[0].value. Next, html2text does a decent job of translating the HTML into much more readable text. As the HTML doesn’t contain the title of the article, the title is added before returning.
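get_article() passes the ID straight to int() and uses it as a list index, so a non-numeric or out-of-range ID ends in a traceback, and a negative ID silently counts from the end of the list. If you want friendlier behavior in your own package, one possible approach is a small validation helper. The name pick_article() and the error messages are assumptions for this sketch and do not appear in the original code:

```python
def pick_article(articles, article_id):
    """Return the article at article_id, exiting with a message if invalid."""
    try:
        index = int(article_id)
    except ValueError:
        raise SystemExit(f"Article ID must be an integer, got {article_id!r}")

    # Reject negative numbers too, since they would index from the end
    if not 0 <= index < len(articles):
        raise SystemExit(
            f"Article ID must be between 0 and {len(articles) - 1}, got {index}"
        )
    return articles[index]


print(pick_article(["first", "second"], "1"))
```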

The final module is viewer.py. At the moment, it consists of two very simple functions. In practice, we could have used print() directly in __main__.py instead of calling viewer functions. However, having the functionality split off makes it easier to replace it later with something more advanced. Maybe we could add a GUI in a later version?

viewer.py contains two functions:

Python
      
    
# viewer.py

def show(article):
    """Show one article"""
    print(article)

def show_list(site, titles):
    """Show list of articles"""
    print(f"The latest tutorials from {site}")
    for article_id, title in enumerate(titles):
        print(f"{article_id:>3} {title}")

show() simply prints one article to the console, while show_list() prints a list of titles. The latter also creates article IDs that are used when choosing to read one particular article.
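The format specifier {article_id:>3} in show_list() right-aligns each ID in a field three characters wide, which keeps the titles lined up. Here is the same formatting applied to two invented titles:

```python
titles = ["First tutorial", "Second tutorial"]  # invented sample titles

# Same f-string as in show_list(), collected into a list instead of printed
lines = [f"{article_id:>3} {title}" for article_id, title in enumerate(titles)]

for line in lines:
    print(line)
```

The number printed in front of each title is the ID you would pass on the command line to read that article.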

00:00 Now it’s time to set up your own package so that you can upload it yourself to PyPI. You may have noticed in the previous video, I called the package with a -m after python, which tells Python to run a module by name instead of by filename. In this way, you can call modules that are built into Python without knowing where the files actually are, which can be pretty helpful.

00:20 A common one I use is venv

00:26 for setting up virtual environments. Sometimes, this doesn’t work. If you go to your terminal, try typing in python -m math.

00:40 You’ll end up with an error that says there’s No code object available. This is because the math library doesn’t have a __main__.py file.

00:49 When you use -m it looks for one of these to execute, so if you want your package to be able to be executed, you need to include one. So now in your editor, go ahead and create a new folder.

01:01 I’m just going to call mine joe-reader. This is where your package is going to go. So inside of this folder, you’re going to create a couple new files. The first one, go ahead and make a config.txt,

01:20 and __main__.py—and actually let me open this up so we can see them—

01:33 __init__.py, and then feed.py,

01:49 and viewer.py. Now you can go ahead and copy and paste the code from below. The config file isn’t really needed for a project this small, but we’re including it so you can see how to include different types of files when you package up your project.

02:08 And if you look here, all this is going to contain is a URL to where the Real Python feed is located. Save that and close that out. __main__.py is pretty important.

02:21 This is how you can use that -m to call the module. I’m actually just going to close this out.

02:29 You can see there’s a lot going on in here. The big thing to take into account is that this is importing everything that your script needs to execute. It defines a main() function here, and then at the end, it calls that function.

02:46 If you were to remove these lines down here, the reader would not run. Okay, save that, and move over to __init__.py. Copy and paste the following in. The __init__.py is the root of the package.

03:03 So here, we’re just going to use it to hold the __version__, which you can tell is important because it is also wrapped in double underscores (__).

03:13 But we’ll talk more about versioning later on. Save this, and move over to feed.py. feed.py and viewer.py contain the code that actually does pretty much what you want.

03:25 So if you’re using your own project for packaging, there’s a good chance that most of your code will go in these kinds of files. So let me copy this, paste this here.

03:36 So, _feed() is actually pulling the data from the Real Python site, and it’s doing this clever caching so that it only reads what it needs to read, no matter how many times you call it during a run.

03:47 Save that, move on over to viewer, and paste in the viewer.py code. All this is doing is printing out the data to your terminal, whether it’s for one article or for the list of articles. All right!

04:01 Now you’ve got the base structure set up for your package. In the next video, you’re going to add some configuration so that you can prepare your package for publication.

rorydaulton on May 31, 2020

In the “Setting Up a New Python Package” video at time mark 1:52 and following you say, “Now you can go ahead and copy and paste the code from below.” I do not see any code below the video. Am I missing something or did you forget to put the code in the text below the video? (And is a comment the best way to ask such a question?)

Dan Bader RP Team on May 31, 2020

@rorydaulton: Thanks for the heads up. I just added the sample files to the video description :)

sshekhar on Dec. 30, 2021

Is there a tutorial or a course that explains how to create a package from scratch? This tutorial provided an existing package realpython-reader but I would like to know how I can go about constructing my own package, as in do I need a setup.py or __init__.py or some other files. What files are optional and what are required?

fhireman on March 16, 2022

Same question as sshekhar so far, it might actually be explained further in the course but for now I’m wondering how to build my own simplest-form best practices package from scratch instead of an existing one.
