asyncio Version

Speed Up Python With Concurrency Christopher Trudeau 09:50

Transcript

00:00 In the previous lesson, I showed you the danger of race conditions. In this lesson, I’ll be discussing the asyncio Python library.

00:09 asyncio was introduced in Python 3.4, and there are changes in the library as recent as 3.7. The threading library shown two lessons ago is dependent on threads supported by the operating system. asyncio is Python-specific.

00:26 It is independent of the operating system threading mechanism. This gives you finer-grained control over your concurrency. The concurrency model uses something called event loops and coroutines.

00:38 There is a main loop that happens inside of Python, which controls the order of things happening, and your coroutine is the code that is being run. Inside of the code, you signal that you no longer need control and you give up execution.

00:53 The event loop then schedules the next coroutine. This is cooperative concurrency. If one of your tasks doesn’t give up its turn, the other tasks starve.

01:03 But seeing as all of this is happening inside of your Python code, you aren’t fighting with some other program having to be cooperative. It’s your own code and you’re in control of it, so as long as you write good coroutines, you won’t have a problem.
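
To make the starvation point concrete, here is a minimal sketch that uses only the standard library. The coroutine names `polite` and `greedy` are hypothetical and are not from the lesson. `polite` gives up its turn on every step with `await asyncio.sleep(0)`, while `greedy` never awaits, so nothing else can run until it has finished.

```python
import asyncio

order = []  # records the order in which the steps actually ran


async def polite(name, steps):
    """Hypothetical coroutine that gives up control after every step."""
    for i in range(steps):
        order.append(f"{name}{i}")
        # await hands control back to the event loop so other tasks can run
        await asyncio.sleep(0)


async def greedy(name, steps):
    """Hypothetical coroutine that never gives up control while looping."""
    for i in range(steps):
        # No await inside the loop: no other task runs until this finishes
        order.append(f"{name}{i}")


async def main():
    await asyncio.gather(polite("a", 3), polite("b", 3), greedy("g", 3))


asyncio.run(main())
print(order)
```

In the printed list, the `a` and `b` steps take turns, but the three `g` steps always appear back to back, because `greedy` holds on to the event loop until it returns.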

01:17 asyncio is built on two keywords: async and await. async indicates that code is to be run asynchronously, and the await keyword is the cooperative signal that your coroutine is willing to give up execution control.
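
As a small illustration of the two keywords, the sketch below defines one coroutine and awaits it. The names `greet` and `main` are made up for this example, and `asyncio.run()` assumes Python 3.7 or later.

```python
import asyncio


async def greet(name, delay):
    # "async def" makes this a coroutine function
    # "await" gives up control to the event loop while the sleep is pending
    await asyncio.sleep(delay)
    return f"hello {name}"


async def main():
    # Calling greet() only creates a coroutine object; awaiting it runs it
    return await greet("world", 0.01)


result = asyncio.run(main())
print(result)
```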

01:33 The biggest challenge with asyncio is that it’s still fairly new. To take advantage of it, the libraries you are using have to support it. As an example, the requests library doesn’t currently support asyncio.

01:45 The threaded web fetcher that I showed you in an earlier lesson needs to be rewritten using a different library to achieve the same thing in asyncio.

01:55 aiohttp is a library similar to requests, but it is asyncio aware. I’ll be using this library in the following examples. You’ll need to do a pip install to make it available to yourself.

02:08 It’s not part of the Python standard library.

02:12 This is the asyncio version of the web downloading program that I showed you earlier in both synchronous and threaded varieties. The concept’s the same.

02:22 The web page at Jython and our web page at Real Python are being downloaded 80 times each. Line 6 is the definition of download_site(). This is the part of the code that will be run concurrently.

02:34 The function takes two parameters: a session, which is an object from the aiohttp library, and the url to download. Notice the async keyword at the beginning of the function indicating that it is going to operate asynchronously. Line 7 is a context manager using the session object’s .get() method.

02:56 This is what actually fetches the web page. The context manager is also marked as async. Inside of the body, the response object is populated.

03:06 And like before, I’m printing out a "J" or an "R" depending on the url and printing something to the screen so you can see the downloads happening. Line 11 is the download_all_sites() method, similar to the one in the previous code. Like the download_site() mechanism, this is also marked as async, to indicate that asynchronous code is happening within it.

03:28 Line 12 is what creates the session object that is going to be passed into the download_site() function.

03:35 The asyncio library divides things that are being executed into a concept of tasks. You can examine the tasks as they are running, and so they’re collected and returned.

03:47 Line 13 creates a list where these tasks are going to be stored.

03:53 Line 15 is a future. This is similar to the concept of the pool. The download_site() method gets called inside of the asynchronous mechanism, and a task is returned. The download_site() method is the coroutine—this is where the execution happens.

04:08 The ensure_future() method makes the event loop aware that this code needs to be run. It returns immediately with a task, and eventually the event loop will run it.

04:20 The task itself can then be stored in the list, so at a future time you can check what is running, what isn’t running, and if anything fails. Line 18 is the gather() method. This is similar to the .join() method in the threading example.
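
The pattern described here can be sketched with the standard library alone. The coroutine `work` is a hypothetical stand-in for `download_site()`. It shows that `ensure_future()` returns a task immediately, before any of the coroutine's code has run, and that `gather()` then waits for all of the tasks.

```python
import asyncio


async def work(n):
    """Hypothetical stand-in for a coroutine such as download_site()."""
    await asyncio.sleep(0.01)
    return n * 2


async def main():
    tasks = []
    for n in range(3):
        # ensure_future() schedules the coroutine and returns a Task right away
        task = asyncio.ensure_future(work(n))
        tasks.append(task)

    # Control has not gone back to the event loop yet, so nothing is finished
    done_before = [t.done() for t in tasks]

    # gather() waits here until every task is complete
    results = await asyncio.gather(*tasks)
    return done_before, results


done_before, results = asyncio.run(main())
print(done_before, results)
```

On Python 3.7 and later, `asyncio.create_task()` is the more common way to schedule a coroutine, and it could be used in place of `ensure_future()` in this sketch.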

04:36 This tells asyncio to wait here until all of these tasks are complete. The list of tasks is passed in and return_exceptions changes the behavior of this method if something goes wrong while the task is being executed.

04:51 I’ll come back to this later. Now let me just scroll down so you can see the __main__ code.

04:58 The list at line 21 is the same as before.

05:01 Like before, I’m printing out a message and starting a timer. Lines 28 and 29 are the key parts of executing the event loop and calling the asynchronous methods.

05:14 Line 28 gets a handle to the event loop, and line 29 schedules the asynchronous download_all_sites() method and says that it should run this method until it completes.

05:27 Line 29 is where the futures get registered, and then where the gather() method is called. And once all of the futures have completed, the gathering finishes and this line returns. Lines 30 and 31 are similar to before: they stop the timer and then print out the result. Here it is in practice.
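
The program itself is not reproduced in this transcript, so the following is only a sketch reconstructed from the narration, and its line numbers will not match the ones mentioned above. It makes several assumptions: the aiohttp session is replaced by a `None` placeholder, the network fetch is simulated with `asyncio.sleep()` so the code runs without aiohttp or a network connection, the two URLs are placeholders, and `asyncio.new_event_loop()` is a guess at how the handle to the event loop is obtained.

```python
import asyncio
import time


async def download_site(session, url):
    # Stand-in for "async with session.get(url) as response:".
    # The sleep simulates network latency and is an assumption of this sketch.
    await asyncio.sleep(0.01)
    indicator = "J" if "jython" in url else "R"
    print(indicator, end="", flush=True)
    return indicator


async def download_all_sites(sites):
    session = None  # placeholder for an aiohttp session object (assumption)
    tasks = []
    for url in sites:
        # Register each download with the event loop and keep the task
        task = asyncio.ensure_future(download_site(session, url))
        tasks.append(task)
    # Wait until every task is complete
    return await asyncio.gather(*tasks, return_exceptions=True)


if __name__ == "__main__":
    # Placeholder URLs, each repeated 80 times
    sites = ["https://www.jython.org", "https://realpython.com"] * 80
    start_time = time.time()
    loop = asyncio.new_event_loop()  # get a handle to an event loop
    results = loop.run_until_complete(download_all_sites(sites))
    loop.close()
    duration = time.time() - start_time
    print(f"\nDownloaded {len(results)} sites in {duration:.2f} seconds")
```

Because every simulated download takes the same amount of time, this sketch prints a regular pattern of letters. Real network requests finish at varying times, which is what produces the uneven pattern described in the lesson. On Python 3.7 and later, `asyncio.run(download_all_sites(sites))` would replace the three event loop lines.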

05:51 Once again, concurrency is significantly speeding up the program. Notice the pattern of J and R. Like the threading library, you have an inconsistent back and forth between the downloads.

06:04 You don’t get the strict JRJR pattern of the synchronous version. The event loop determines when the futures happen and how much time gets allocated for each of the sections of coroutines.

06:16 Let’s drill down a little more on the call to gather(). Notice the return_exceptions parameter. In the code, I set this to True. The value of this parameter indicates what to do if something goes wrong inside of the coroutine. Your choices are return_exceptions=False, which means the exception will filter up and stop your program, similar to a synchronous execution. This is the default behavior.

06:42 Or you can have the exception collected instead of raised. return_exceptions=True means that an exception is treated like a successful result and comes back in the list of results that gather() returns. There are pros and cons to each of these mechanisms.

06:56 The default allows you to catch problems inside of your code, because the exceptions happen like you would expect them to. But what if the exception is being caused by just one execution of one of the coroutines? Do you want everything else to die?

07:10 And that’s why this is a choice. By selecting return_exceptions=True, all of the tasks can operate and then you can introspect them afterwards to see whether or not some succeeded or failed.
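
The two settings can be compared side by side in a short sketch. The coroutine `fetch` and its deliberate failure for `n == 1` are hypothetical and exist only to trigger an exception in one of three tasks.

```python
import asyncio


async def fetch(n):
    """Hypothetical coroutine that fails for one specific input."""
    await asyncio.sleep(0.01)
    if n == 1:
        raise ValueError(f"task {n} failed")
    return n


async def collect():
    # return_exceptions=True: failures come back as exception objects
    # in the results list, in the same position as their coroutine
    return await asyncio.gather(
        *(fetch(n) for n in range(3)), return_exceptions=True
    )


async def propagate():
    # Default (return_exceptions=False): the exception is raised at the await
    try:
        await asyncio.gather(*(fetch(n) for n in range(3)))
    except ValueError as exc:
        return f"caught: {exc}"


results = asyncio.run(collect())
message = asyncio.run(propagate())
print(results)
print(message)
```

With `return_exceptions=True`, the two healthy results are still available, and the failure can be found afterwards with a check such as `isinstance(item, Exception)`.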

07:23 If you’re creating a new asynchronous program, I would recommend leaving the default behavior on until you’ve debugged the code, then switch to return_exceptions=True while you’re running your actual code so that you aren’t interrupted if something goes wrong inside of only one of your coroutines. At any time during execution you can call the .result() method on the Task object.

07:47 If the task executed successfully, then you will get back whatever the coroutine returned.

07:54 If the task caused an exception, then that exception will be raised when you call the .result() method. You can also request that the event loop cancel running tasks.

08:05 If you’ve done that, then the Task.result() method will raise the CancelledError exception. If you call the method too early and the task hasn’t finished operating and hasn’t been canceled, you’ll get an InvalidStateError.
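
The four outcomes of calling `.result()` can be seen in one small sketch. The coroutines `ok` and `boom` are hypothetical, and the third task is just a long `asyncio.sleep()` that gets canceled.

```python
import asyncio


async def ok():
    await asyncio.sleep(0.01)
    return "done"


async def boom():
    await asyncio.sleep(0.01)
    raise RuntimeError("boom")


async def main():
    outcomes = {}
    t_ok = asyncio.ensure_future(ok())
    t_boom = asyncio.ensure_future(boom())
    t_cancel = asyncio.ensure_future(asyncio.sleep(10))

    # Too early: the task has not finished and has not been canceled
    try:
        t_ok.result()
    except asyncio.InvalidStateError:
        outcomes["early"] = "InvalidStateError"

    t_cancel.cancel()  # request cancellation of one task
    await asyncio.gather(t_ok, t_boom, t_cancel, return_exceptions=True)

    # Success: result() returns whatever the coroutine returned
    outcomes["ok"] = t_ok.result()
    # Failure: result() re-raises the exception from the coroutine
    try:
        t_boom.result()
    except RuntimeError as exc:
        outcomes["boom"] = str(exc)
    # Canceled: result() raises CancelledError
    try:
        t_cancel.result()
    except asyncio.CancelledError:
        outcomes["cancelled"] = "CancelledError"
    return outcomes


outcomes = asyncio.run(main())
print(outcomes)
```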

08:19 There are other methods on the Task object that allow you to see whether or not it’s been canceled, whether or not it’s finished, and to introspect that as you go along in your code.
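
As a brief sketch of that kind of introspection, the helper below reports a task's state using `.done()`, `.cancelled()`, and `.exception()`. The function name `describe` and its labels are made up for this example.

```python
import asyncio


def describe(task):
    """Hypothetical helper that summarizes the state of a Task."""
    if not task.done():
        return "pending"
    if task.cancelled():
        return "cancelled"
    if task.exception() is not None:
        return "failed"
    return "finished"


async def main():
    finished = asyncio.ensure_future(asyncio.sleep(0.01, result="ok"))
    stopped = asyncio.ensure_future(asyncio.sleep(10))
    before = describe(finished)  # checked before the task has had a turn

    stopped.cancel()
    await asyncio.gather(finished, stopped, return_exceptions=True)
    return before, describe(finished), describe(stopped)


states = asyncio.run(main())
print(states)
```

Checking `.cancelled()` before `.exception()` matters here, because calling `.exception()` on a canceled task raises CancelledError instead of returning a value.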

08:29 The Python threading library and the asyncio library solve similar problems—they’re both for concurrency in I/O-bound functions.

08:38 So, why would you choose one over the other? Because asyncio concurrency is managed by Python and it doesn’t have as much overhead as the threads in an operating system, it generally tends to outperform threads. That being said, as you’ve seen from the code, it’s a little more complicated to actually implement.

09:00 And as I mentioned earlier, asyncio is still new. Your favorite library might not support it yet. Changes are coming, and it’s being implemented slowly across a lot of libraries, but your mileage may vary.

09:12 If a lot of the coders that you’re working with are used to other languages besides Python, the threading library’s approach is probably more similar to what they’re accustomed to. Finally, threading is preemptive.

09:25 asyncio is cooperative. If you’re in complete control of the code, this probably doesn’t matter all that much, but it’s an implementation detail that you may want to pay attention to because it could affect the design of your code. Up until now, I’ve been showing you concurrency in I/O-bound situations that run on a single processor. In the next lesson, I’ll show you how to use multiple processes.
