In the previous lesson, I showed you the danger of race conditions. In this lesson, I’ll be discussing the
asyncio Python library.
asyncio was introduced in Python 3.4, and there are changes in the library as recent as 3.7. The
threading library shown two lessons ago is dependent on threads supported by the operating system.
asyncio is Python-specific.
00:26 It is independent of the operating system threading mechanism. This gives you finer-grained control over your concurrency. The concurrency model uses something called event loops and coroutines.
00:38 There is a main loop that happens inside of Python, which controls the order of things happening, and your coroutine is the code that is being run. Inside of the code, you signal that you no longer need control and you give up execution.
00:53 The event loop then schedules the next coroutine. This is cooperative concurrency. If one of your tasks doesn’t give up its turn, the other tasks starve.
01:03 But seeing as all of this is happening inside of your Python code, you aren’t fighting with some other program having to be cooperative. It’s your own code that you’re in control of, so as long as you write good coroutines you won’t have a problem.
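To see what starvation looks like, here is a small sketch (the `hog` and `polite` names are my own, not from the lesson): a coroutine that blocks with a synchronous call instead of awaiting stalls the entire event loop.

```python
import asyncio
import time

async def hog():
    # A badly written coroutine: time.sleep() is synchronous, so it
    # never gives control back to the event loop.
    time.sleep(0.2)

async def polite():
    # asyncio.sleep() yields control, letting other coroutines run.
    await asyncio.sleep(0.05)

async def main():
    start = time.monotonic()
    await asyncio.gather(hog(), polite())
    return time.monotonic() - start

# polite() cannot run until hog() returns, so the total time is about
# 0.25s (0.2 + 0.05) rather than the ~0.2s true concurrency would give.
elapsed = asyncio.run(main())
```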
asyncio is built on two keywords:
async indicates that code is to be run asynchronously, and the
await keyword is the cooperative signal that your coroutine is willing to give up execution control.
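A minimal sketch of these two keywords in action (the `worker` name and the `order` list are illustrative, not from the lesson's code):

```python
import asyncio

order = []

async def worker(name):
    # async marks this function as a coroutine.
    for step in range(2):
        order.append(f"{name}{step}")
        # await is the cooperative signal: control returns to the event
        # loop, which can then schedule the other worker.
        await asyncio.sleep(0)

async def main():
    # Run both coroutines concurrently on the same event loop.
    await asyncio.gather(worker("a"), worker("b"))

asyncio.run(main())
print(order)  # the workers interleave: ['a0', 'b0', 'a1', 'b1']
```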
The biggest challenge with
asyncio is that it’s still fairly new. To take advantage of it, the libraries you are using have to support it. As an example, the
requests library doesn’t currently support asyncio. The threaded web fetcher that I showed you in an earlier lesson needs to be rewritten using a different library to achieve the same thing in asyncio.
aiohttp is a library similar to
requests, but it is
asyncio aware. I’ll be using this library in the following examples. You’ll need to do a
pip install to make it available to yourself.
02:08 It’s not part of the Python standard library.
This is the
asyncio version of the web downloading program that I showed you earlier in both synchronous and threaded varieties. The concept’s the same.
The web page at Jython and our web page at Real Python are being downloaded 80 times each. Line 6 is the definition of
download_site(). This is the part of the code that will be run concurrently.
The function takes two parameters: a
session, which is an object from the
aiohttp library, and the
url to download. Notice the
async keyword at the beginning of the function indicating that it is going to operate asynchronously. Line 7 is a context manager using the session’s .get() method. This is what actually fetches the web page. The context manager is also marked as
async. Inside of the body, the
response object is populated.
And like before, I’m printing out a
"J" or an
"R" depending on the
url and printing something to the screen so you can see the downloads happening. Line 11 is the
download_all_sites() method, similar to the one in the previous code. Like the
download_site() mechanism, this is also marked as
async, to indicate that asynchronous code is happening within it.
Line 12 is what creates the
session object that is going to be passed into the download_site() coroutine. The asyncio library divides things that are being executed into a concept of tasks. You can examine the tasks as they are running, and so they’re collected and returned.
03:47 Line 13 creates a list where these tasks are going to be stored.
Line 15 is a future. This is similar to the concept of the pool. The
download_site() method gets called inside of the asynchronous mechanism, and a task is returned. The
download_site() method is the coroutine—this is where the execution happens.
The ensure_future() method makes the event loop aware that this code needs to be run. It returns immediately with a task, and eventually the event loop will run it.
The task itself can then be stored in the list, so at a future time you can check what is running, what isn’t running, and if anything fails. Line 18 is the
gather() method. This is similar to the .join() method in the threading library. It tells asyncio to wait here until all of these tasks are complete. The list of
tasks is passed in and
return_exceptions changes the behavior of this method if something goes wrong while the task is being executed.
I’ll come back to this later. Now let me just scroll down so you can see the rest of the code.
04:58 The list at line 21 is the same as before.
05:01 Like before, I’m printing out a message and starting a timer. Lines 28 and 29 are the key parts of executing the event loop and calling the asynchronous methods.
Line 28 gets a handle to the event loop, and line 29 schedules the asynchronous
download_all_sites() method and says that it should run this method until it completes.
Line 29 is where the futures get registered, and then where the
gather() method is called. And once all of the futures have been performed, the gathering finishes and this line returns. Lines 30 and 31 are similar to before: they finish the timer and then print out the result. Here it is in practice.
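Since the code being described is on screen rather than in the transcript, here is a runnable reconstruction of the same shape. To keep it self-contained, asyncio.sleep() stands in for the aiohttp network call; the real program awaits session.get(url) inside an aiohttp.ClientSession instead.

```python
import asyncio
import time

async def download_site(session, url):
    # A real version would do: async with session.get(url) as response: ...
    # Here, asyncio.sleep() simulates the network round trip.
    await asyncio.sleep(0.01)
    return "J" if "jython" in url else "R"

async def download_all_sites(sites):
    session = None  # a real version opens an aiohttp.ClientSession() here
    tasks = []
    for url in sites:
        # ensure_future() registers the coroutine with the event loop
        # and immediately returns a Task.
        task = asyncio.ensure_future(download_site(session, url))
        tasks.append(task)
    # gather() waits until every task is complete.
    return await asyncio.gather(*tasks, return_exceptions=True)

sites = ["http://www.jython.org", "https://realpython.com"] * 80
start = time.time()
results = asyncio.run(download_all_sites(sites))
print(f"Downloaded {len(results)} sites in {time.time() - start:.2f} seconds")
```

The lesson’s Python 3.7-era code drives the loop with asyncio.get_event_loop() and loop.run_until_complete(download_all_sites(sites)); asyncio.run(), added in 3.7, performs those same steps for you.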
Once again, concurrency is significantly speeding up the program. Notice the pattern of J and R. Like the
threading library, you have an inconsistent back and forth between the downloads.
You don’t have the strict JRJR alternation of the synchronous version. The event loop determines when the futures run and how much time gets allocated to each section of the coroutines.
Let’s drill down a little more on the call to
gather(). Notice the
return_exceptions parameter. In the code, I set this to
True. The value of this parameter indicates what to do if something goes wrong inside of the coroutine. Your choices are
return_exceptions=False, which means the exception will filter up and stop your program, similar to a synchronous execution. This is the default behavior.
Or you can register the exception inside of the task object: return_exceptions=True means the exception is returned into the task object. There are pros and cons to these two mechanisms.
06:56 The default allows you to catch problems inside of your code, because the exceptions happen like you would expect them to. But what if the exception is caused by just one execution of one of the coroutines? Do you want everything else to die?
And that’s why this is a choice. By selecting
return_exceptions=True, all of the tasks can operate and then you can introspect them afterwards to see whether or not some succeeded or failed.
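As a sketch of that introspection (the `fetch` coroutine and the failure it raises are illustrative, not from the lesson's code):

```python
import asyncio

async def fetch(url):
    # A hypothetical coroutine that fails for one bad URL,
    # standing in for a real network error.
    if "bad" in url:
        raise ValueError(f"cannot fetch {url}")
    await asyncio.sleep(0)
    return url

async def main():
    urls = ["https://realpython.com", "https://bad.example", "http://www.jython.org"]
    # return_exceptions=True: failures come back as values in the result
    # list, so the other tasks keep running instead of the whole
    # gather() call blowing up.
    return await asyncio.gather(*(fetch(u) for u in urls), return_exceptions=True)

results = asyncio.run(main())
for r in results:
    if isinstance(r, Exception):
        print("failed:", r)
    else:
        print("ok:", r)
```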
If you’re creating a new asynchronous program, I would recommend leaving the default behavior on until you’ve debugged the code, then switch to
return_exceptions=True while you’re running your actual code so that you aren’t interrupted if something goes wrong inside of only one of your coroutines. At any time during execution you can call the
.result() method on the Task object.
07:47 If the task executed successfully, then you will get back whatever the coroutine returned.
If the task caused an exception, then that exception will be fired when you call the
.result() method. You can also request that the event loop cancel running tasks.
If you’ve done that, then this Task.result() method will raise a CancelledError exception. If you call the method too early, while the task hasn’t finished operating and hasn’t been canceled, you’ll get an InvalidStateError exception.
There are other methods on the
Task object that allow you to see whether or not it’s been canceled, whether or not it’s finished, and to introspect that as you go along in your code.
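A small sketch of cancelling a task and introspecting it afterwards (the `slow` name is illustrative):

```python
import asyncio

async def slow():
    await asyncio.sleep(10)

async def main():
    task = asyncio.ensure_future(slow())
    await asyncio.sleep(0)  # let the task start running
    task.cancel()           # ask the event loop to cancel it
    try:
        await task
    except asyncio.CancelledError:
        pass
    return task

task = asyncio.run(main())
print(task.cancelled(), task.done())  # True True
# Calling .result() on a cancelled task raises CancelledError:
try:
    task.result()
except asyncio.CancelledError:
    print("result() raised CancelledError")
```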
The threading library and the
asyncio library solve similar problems—they’re both for concurrency in I/O-bound functions.
So, why would you choose one over the other? Because
asyncio concurrency is managed by Python and it doesn’t have as much overhead as the threads in an operating system, it generally tends to outperform threads. That being said, as you’ve seen from the code, it’s a little more complicated to actually implement.
And as I mentioned earlier,
asyncio is still new. Your favorite library might not support it yet. Changes are coming, and it’s being implemented slowly across a lot of libraries, but your mileage may vary.
If a lot of the coders that you’re working with are used to other languages besides Python, the threading library’s approach is probably more similar to what they’re accustomed to. Finally,
threading is preemptive.
asyncio is cooperative. If you’re in complete control of the code, this probably doesn’t matter all that much, but it’s an implementation detail that you may want to pay attention to because it could affect the design of your code. Up until now, I’ve been showing you concurrency in I/O-bound situations that run on a single processor. In the next lesson, I’ll show you how to use multiple processes.