In the previous lesson, I spoke about the new type hinting and annotation features of Python 3.10. In this lesson, I’ll show you two new standard library functions, aiter() and anext(), which provide support for asynchronous iteration.
You may have come across the iter() and next() functions in Python. Used together, these create an iterator and get the next item out of it.
This is actually the underlying mechanism for the
for loop. The
for loop creates an iterator on the object being looped over, and each instance of the loop calls
next() on that iterator.
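The mechanism described above can be sketched by hand. This is a minimal desugaring of a for loop, assuming nothing beyond the built-in iter() and next():

```python
# What a for loop does under the hood: create an iterator with iter(),
# then repeatedly call next() until StopIteration is raised.
items = ["a", "b", "c"]

iterator = iter(items)
collected = []
while True:
    try:
        item = next(iterator)
    except StopIteration:
        break  # the for loop catches this and exits the block
    collected.append(item)

print(collected)  # ['a', 'b', 'c'], same as looping with: for x in items
```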
The next() function raises a StopIteration exception when there is nothing left to iterate on. The for loop catches this exception and exits the code block. Python 3.5 introduced two new keywords, async and await, to implement coroutines.
A coroutine is a way of achieving asynchronous code execution, a form of concurrency. Coroutines are an alternative to using the threading library. To write a coroutine, you must declare a function using the async keyword.
An asynchronous function has some restrictions, the key one being that it can’t call synchronous code. Otherwise, it just becomes synchronous again. And you’ll never guess what kind of code iter() and next() are. Yep, they’re synchronous.
That means that up until Python 3.10, using an iterator inside your asynchronous code implicitly created a synchronous block. This was limiting. Python 3.10 has introduced two new functions, aiter() and anext(), the a meaning asynchronous. Using these, you can now create asynchronous iterators with aiter() and get items from them with anext() inside of your async functions.
Writing asynchronous code is more complicated than writing synchronous code. The notes below have links to entire courses on this subject. The only way to demonstrate aiter() and anext() is to do so inside of some asynchronous code. I’m going to do that in just a second, but if this isn’t your wheelhouse, feel free to skip to the next lesson.
02:13 The purpose of asynchronous code is to do multiple things at once. There are two kinds of parallelism available on your computer: multi-CPU and I/O bound. As the name implies, multi-CPU executes different chunks of code on different processors. I/O bound is different.
02:31 It still runs on one processor, swapping between different chunks of code. Accessing the disk or network is very slow in comparison to most CPU operations, which means synchronous code tends to sit around waiting a lot.
02:48 Coroutines are I/O bound parallelism. They operate on a single CPU but allow you to run a second code block while the first is waiting on input from disk or network.
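You can see this waiting-based concurrency in a few lines. In this sketch, asyncio.sleep() stands in for slow disk or network I/O; because both coroutines wait concurrently, the total wall time is roughly one wait, not two:

```python
import asyncio
import time

# Simulate two I/O-bound coroutines. asyncio.sleep() yields control to
# the event loop, so the second coroutine runs while the first "waits".
async def wait_a_bit(seconds):
    await asyncio.sleep(seconds)  # stand-in for slow disk/network I/O
    return seconds

async def main():
    # Run both coroutines concurrently and collect their results.
    return await asyncio.gather(wait_a_bit(0.2), wait_a_bit(0.2))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, elapsed)  # elapsed is roughly 0.2s, not 0.4s
```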
The example I’m going to show you reads in multiple files from the disk at a time and counts the number of newlines in each file. It uses a third-party library called aiofiles, so if you’re coding along with me, you’ll need to run pip install aiofiles. As always, it is best practice to do this in a virtual environment.
Here is the asynchronous line counting code. First off, it needs to import
asyncio. This library is used to manage each of the coroutines that I’ll be creating.
The aiofiles library provides alternate implementations of file operations, like open(), that are asynchronous. Let me just scroll down to the bottom here.
03:44 This code is a bit easier to understand if you start with the execution.
The run() function of the asyncio library takes an asynchronous function and executes it. This encapsulates the coroutine mechanism and the underlying event loop.
On line 30, I’m running the
all_files() async function, passing it whatever arguments were sent in on the command line.
What makes all_files() able to be asynchronous is the async keyword attached to the function declaration. This function is responsible for setting up all the coroutines and then waiting for them to complete. The for block starting on line 22 loops through the filenames passed in on the command line, and line 23 creates a coroutine for each of them. The coroutine is also an async function, this one called count_lines(), which takes a single filename as a parameter. So, if ten files are passed in on the command line, then ten coroutines are created, one for each file.
04:48 The coroutines do the work of counting newlines and then return. The process of creating the coroutine returns almost immediately. Because it’s asynchronous, it doesn’t wait for the wrapped function to return.
Line 24 appends the newly created coroutine into a list so that you can track all of the coroutines that are currently running. Now, the await keyword indicates that this is a boundary between asynchronous and synchronous code. The gather() function takes all the task coroutines that were generated and waits until all of them have finished executing.
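The structure just described can be sketched on its own. This version keeps the all_files() and count_lines() names from the lesson but fakes the file I/O, since the real code depends on aiofiles; the body of count_lines() here is a placeholder, not the lesson's implementation:

```python
import asyncio

# Sketch of the fan-out pattern: one coroutine per filename, collected
# in a list, then awaited together with asyncio.gather().
async def count_lines(filename):
    # Placeholder body: the real lesson code does asynchronous file I/O
    # with aiofiles here. await asyncio.sleep(0) just yields control.
    await asyncio.sleep(0)
    return f"counted {filename}"

async def all_files(filenames):
    tasks = []
    for filename in filenames:          # loop over command-line filenames
        coroutine = count_lines(filename)  # create a coroutine per file
        tasks.append(coroutine)            # track it in the list
    # Wait here until every coroutine has finished executing.
    return await asyncio.gather(*tasks)

results = asyncio.run(all_files(["a.txt", "b.txt"]))
print(results)  # ['counted a.txt', 'counted b.txt']
```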
Let’s look at the coroutine that counts the newlines in an individual file. Line 6 declares
count_lines() and indicates that it also is an asynchronous function.
Line 9 is an asynchronous context manager using the
open() function from the
aiofiles library. This is an asynchronous replacement for Python’s
open(). It does the same thing, namely opening a file, but in a fashion that supports asynchronous operations.
05:57 Note that this file is being opened in binary mode. Although the code is looking for newlines, and so really is only meaningful with text files, there is a gotcha here. Python by default opens text files using the standard file encoding for the operating system. On Linux and Mac, that’s UTF-8.
On Windows, it varies by locale. If a text file isn’t UTF-8 and you open it as UTF-8, your code will crash with a decoding error. Instead, then, you open the file in binary mode, and that avoids the problem. Line 10 uses the new
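The binary-mode trick is easy to demonstrate on its own with the ordinary synchronous open(). The sample file below contains a byte that is invalid UTF-8, yet counting newlines in binary mode still works:

```python
import os
import tempfile

# Write a file containing 0xFF, a byte that is invalid in UTF-8.
# Opening it in text mode as UTF-8 would raise UnicodeDecodeError.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"one\ntwo\xff\nthree\n")
    path = f.name

# "rb" = read in binary mode: no decoding happens, so the bad byte is
# harmless, and newlines can still be counted as the byte b"\n".
with open(path, "rb") as f:
    count = f.read().count(b"\n")

os.unlink(path)
print(count)  # 3
```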
aiter() function to create an asynchronous iterator based on the contents of the newly opened file.
Iterating on a file opened by the aiofiles.open() function will split it based on lines, meaning it’s looking for the newline character. Inside the while loop, anext() is called on the iterator, getting the next line. With each line, the counter is incremented.
And then, like with the synchronous iterator, anext() raises an exception if the iterator is done. Instead of being a
StopIteration exception, it rather appropriately raises a
StopAsyncIteration exception. In this case, nothing needs to be done when the iterator is empty, so the code breaks out of the infinite loop.
The final action of
count_lines() is to print a result. The extra parameters to
print() ensure that each coroutine prints on the same line and flushes the print buffer immediately. Okay, let’s run this sucker.
Here, I’m counting the newlines in all the PDF files in my
Downloads/ directory. PDF files are actually binary, but because of the UTF-8 assumption forcing the file to be read in binary mode, that doesn’t cause a problem. And although they are binary, they do have strings inside of them, which contain newlines, so there’s something to count.
08:01 The numbers are generally going to go up as the code executes, because the bigger the file, the longer it takes for the coroutine to run. To demonstrate that this is all happening asynchronously, let me run the code again.
If you follow along on the output from the two executions, you’ll note that the numbers are showing up in a different order. The first case of this is nine numbers in, where 681 has changed place.
08:31 They’re in a different order because there’s no guarantee of execution order with asynchronous code. This is one of the many things that makes debugging parallelism that much more difficult than synchronous code. Parallel coding always breaks my brain just a little bit. Time for something lightweight and breezy. Ooh, statistics?