Prefix and Suffix Methods & Topological Sort
00:00
In the previous lesson, I covered the new parser and changes to the generic type hints. In this lesson, I’ll be talking about prefix and suffix removal functions added to the str (string) class, and topological graph sorting.
00:14
In Python 3.9, strings now come with two new functions, .removeprefix() and .removesuffix(), for pulling off something from the beginning of a string and pulling off something from the end of a string. Prior to Python 3.9, the most common way of getting rid of things from the beginning and end of a string was the .strip() function.
00:39
If you are new to Python or the .strip() function, you could be forgiven if you thought this was a little broken. .strip(" 3.9") does not strip off the string " 3.9".
00:51
It strips off any character in that grouping: the space (" "), the "3", the dot ("."), and the "9". At the end of the string, the result that you expect is removed. At the beginning of the string, the "3" disappears as well. This might not be what you intended.
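As a small sketch of that behavior, the snippet below calls .strip(" 3.9") on the sentence used in this lesson. The variable names are only for illustration.

```python
title = "3 cool features in Python 3.9"

# The argument is treated as a set of characters, not as a substring.
# Every leading and trailing character that is a space, "3", ".", or "9"
# gets stripped, so the leading "3 " goes away along with the ending.
stripped = title.strip(" 3.9")
print(repr(stripped))  # 'cool features in Python'
```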
01:09
To actually peel off just that " 3.9" from the end, you would have to do something like a string slice, chopping the last four characters. Python 3.9 adds the .removesuffix() function.
01:21
This does what you might expect. It removes the suffix. This is actually looking for " 3.9" and will only remove that ending. And as it’s only attacking the suffix, it leaves the "3" at the beginning of the sentence alone.
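Here is a rough side-by-side of the two approaches, assuming Python 3.9 or later. The slicing version is one possible way to write the older idiom, not the only one, and the variable names are made up for this sketch.

```python
title = "3 cool features in Python 3.9"
suffix = " 3.9"

# Older approach: confirm the ending is there, then slice it off by length.
sliced = title[:-len(suffix)] if title.endswith(suffix) else title

# Python 3.9+: removes the suffix only if the string really ends with it.
trimmed = title.removesuffix(suffix)

print(repr(sliced))   # '3 cool features in Python'
print(repr(trimmed))  # '3 cool features in Python'
```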
01:42
.removeprefix() does the same thing, but at the beginning of the sentence. Notice, in this sentence, nothing’s removed, and that’s because " 3.9" was not found at the beginning. So, unlike the .strip() case, your "3 cool features in Python 3.9" is left alone.
02:06
If I ask it to remove just the "3", .removeprefix() will do that. One thing to note: if the string contains the same piece several times in a row, only one occurrence gets removed.
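The two .removeprefix() calls described above can be sketched like this, assuming Python 3.9 or later. Note that removing "3" leaves the space that followed it.

```python
title = "3 cool features in Python 3.9"

# " 3.9" is not at the start of the string, so nothing changes.
unchanged = title.removeprefix(" 3.9")
print(repr(unchanged))  # '3 cool features in Python 3.9'

# "3" is at the start, so only that one character is removed.
without_three = title.removeprefix("3")
print(repr(without_three))  # ' cool features in Python 3.9'
```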
02:16
Removing "ki" from "Waikiki" still leaves one "ki" at the end. .removesuffix() only matches once. If you do need to remove the same suffix multiple times, put .removesuffix() inside of a while loop.
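One way that while loop might look is sketched below. The helper name remove_all_suffixes is hypothetical and not part of the str class. The guard for an empty suffix is an extra precaution, since every string ends with the empty string and the loop would otherwise never finish.

```python
def remove_all_suffixes(text: str, suffix: str) -> str:
    """Hypothetical helper: strip a suffix repeatedly until it is gone."""
    if not suffix:
        # Every string "ends with" the empty string, so bail out early.
        return text
    while text.endswith(suffix):
        text = text.removesuffix(suffix)
    return text


once = "Waikiki".removesuffix("ki")
repeated = remove_all_suffixes("Waikiki", "ki")
print(once)      # Waiki
print(repeated)  # Wai
```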
02:31
A new library called graphlib has been introduced in Python 3.9. This library is used for doing what’s called a topological sort.
02:41
Consider the following graph that describes the dependencies of the realpython-reader library. In order to install realpython-reader, you need to also install html2text and feedparser.
02:53
In order to install feedparser, you have to install sgmllib3k. If you consider this as a dependency graph and you wanted to figure out what order you could install things in for realpython-reader to work, you would have to install sgmllib3k before feedparser, feedparser before realpython-reader, and html2text before realpython-reader.
03:17
Getting this linear order out of a graph is called topological sorting, and that’s what graphlib does. Let’s look at that same thing in code. First, I have to construct the graph.
03:31
graphlib expects you to do this as a dictionary. Each key in the dictionary is a node in the graph, and its value holds the nodes it depends on. The first key here, "realpython-reader", is the root of the graph, which has two children: "feedparser" and "html2text".
03:47
Notice that that’s a set, not a dictionary, after the key. Because the feedparser node also has children, you add another key, "feedparser". Once again, a set.
04:01
This set is {"sgmllib3k"}, feedparser’s dependency. It’s important to note that this needs to be a set (or another collection of node names), not a bare string. If you forget to use a set here, it will iterate over the string instead, and you’ll get weird little answers instead of what you’re actually looking for.
04:19
So, that’s my dictionary describing the graph.
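Written out, the dictionary described above might look like the sketch below. The variable name dependencies is an assumption, since the lesson does not give one.

```python
# Each key is a package; each value is the set of packages it depends on.
# The name `dependencies` is chosen for this sketch only.
dependencies = {
    "realpython-reader": {"feedparser", "html2text"},
    "feedparser": {"sgmllib3k"},
}

# Print each package next to what it needs, sorted for stable output.
for package, requirements in dependencies.items():
    print(package, "needs", sorted(requirements))
```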
04:24
Import the TopologicalSorter from graphlib.
04:30
Construct it, passing in the graph as a dictionary.
04:36
And then call the .static_order() method. You get back a generator, which is good, because that means it’s memory-efficient, but not very useful to look at inside of the REPL.
04:45
So let me do that again with a list()…
04:50
and there’s the result. This tells you that the order you need to install these dependencies in is html2text, sgmllib3k, feedparser, then realpython-reader: the topological sort of this graph.
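Putting those steps together gives roughly the following, assuming Python 3.9 or later. The names dependencies, sorter, and install_order are chosen for this sketch. The printed order can differ from run to run, so the code checks the ordering constraints rather than relying on one exact list.

```python
from graphlib import TopologicalSorter

dependencies = {
    "realpython-reader": {"feedparser", "html2text"},
    "feedparser": {"sgmllib3k"},
}

sorter = TopologicalSorter(dependencies)

# static_order() hands back a lazy iterator, so wrap it in list() to see it.
install_order = list(sorter.static_order())
print(install_order)

# Whatever order comes back, every package must appear after all of the
# packages it depends on.
position = {name: index for index, name in enumerate(install_order)}
for package, requirements in dependencies.items():
    for requirement in requirements:
        assert position[requirement] < position[package]
```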
05:06
A couple quick things to note. First, this isn’t the only valid answer. 'html2text' could come after 'feedparser', and this would still work. When you topologically sort a graph, the result does not have to be unique. Remember earlier, when I talked about the values in the graph being sets?
05:25
Suppose for a moment that you forgot to set the "feedparser" value as a set. Because "sgmllib3k" is a string, and because the TopologicalSorter iterates over the content inside of the graph, it would treat the letters "s", "g", "m", "l", et cetera, each as nodes in the graph, and you wouldn’t get the correct result.
05:47
Just remember that if you go to use this library yourself.
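The pitfall can be reproduced with a short sketch like this one, assuming Python 3.9 or later. The names broken_graph and fixed_graph are made up for illustration.

```python
from graphlib import TopologicalSorter

# Mistake: the value is a bare string rather than a set containing it,
# so each character of the string becomes its own node.
broken_graph = {"feedparser": "sgmllib3k"}
broken_order = list(TopologicalSorter(broken_graph).static_order())
print(broken_order)  # single characters, followed by 'feedparser'

# Fix: wrap the dependency name in a set.
fixed_graph = {"feedparser": {"sgmllib3k"}}
fixed_order = list(TopologicalSorter(fixed_graph).static_order())
print(fixed_order)  # ['sgmllib3k', 'feedparser']
```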
05:52
In the next lesson, I’ll show you a couple of small changes to the math library, covering GCD and the addition of LCM, and the changes to the HTTPStatus object.