Speed Up Python With Concurrency (Summary)

You now understand enough to decide which concurrency method to use for a given problem, or whether to use any at all! You’ve also gained a better understanding of some of the problems that can arise when you’re using concurrency.

In this course, you’ve learned how to:

  • Understand how latency between the CPU and components of your computer provides opportunities for concurrency
  • Use the threading library to write concurrent programs
  • Write code using async and await with the asyncio library
  • Get full use of all your CPUs with the multiprocessing library
  • Distinguish between I/O bound and CPU bound workloads

Here are resources for additional information about latency:

Here are resources about concurrency and the Python GIL:

Here are resources about PEP 554 and Subinterpreters:

Here are resources about threading, futures and asyncio:

Here are resources about multiprocessing and distributed programming:


00:00 In the previous lesson, I talked about the difference between I/O-bound and CPU-bound workloads. In this final topic, I’ll summarize the contents of the course and point you at some further reading.

00:12 Different programs use your computer in different ways. A lot of software is I/O-bound, spending most of its time waiting for the disk or network in comparison to the amount of cycles available to do computation. Some kinds of problems are CPU-bound, meaning they spend most of their time using the CPU to do work.

00:33 A large amount of latency is involved in getting off the CPU and going to memory, even more to disk, and even more to going to the network. Single processor computers are able to look like they’re doing multiple things at a time because they’re quickly switching back and forth between programs, taking advantage of this latency. This simultaneous work is called concurrency.

00:55 This course introduced you to a number of patterns to help you think about how the concurrency works. The first was pipes, starting with a producer responsible for feeding data into the computation, having a worker to work on that data and do the computation, and then having a consumer that consolidates the output of the worker.
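The pipe pattern can be sketched with threads and queues from the standard library. This is a minimal illustration, not the course's sample code: the function names, the `SENTINEL` marker, and the squaring "computation" are all assumptions made for the example.

```python
import queue
import threading

SENTINEL = object()  # hypothetical end-of-stream marker


def producer(numbers, work_queue):
    """Feed raw items into the pipeline, then signal completion."""
    for n in numbers:
        work_queue.put(n)
    work_queue.put(SENTINEL)


def worker(work_queue, result_queue):
    """Do the computation on each item and pass the result along."""
    while True:
        item = work_queue.get()
        if item is SENTINEL:
            result_queue.put(SENTINEL)  # forward the end-of-stream signal
            break
        result_queue.put(item * item)


def consumer(result_queue, results):
    """Consolidate the worker's output into a single list."""
    while True:
        item = result_queue.get()
        if item is SENTINEL:
            break
        results.append(item)


def run_pipeline(numbers):
    work_queue = queue.Queue()
    result_queue = queue.Queue()
    results = []
    stages = [
        threading.Thread(target=producer, args=(numbers, work_queue)),
        threading.Thread(target=worker, args=(work_queue, result_queue)),
        threading.Thread(target=consumer, args=(result_queue, results)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3, 4]))
```

With a single worker and FIFO queues, the results come out in the same order the producer sent them.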

01:16 The N-workers pattern uses a similar model, but multiplies the number of workers. This pattern is particularly useful in CPU-bound computing. Having one worker for each CPU can drastically speed up your program.

01:31 In the N-workers pattern, the producer breaks up the data and passes chunks on to each of the workers. The broadcast pattern is a variation on this, where the producer sends all of the data to each of the workers and the workers themselves decide what to work on.
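As a rough sketch of the N-workers pattern, the producer step below splits the data into chunks, a pool runs one worker per chunk, and the consumer step adds up the partial results. The helper names and the sum-of-squares workload are hypothetical, and the `executor_cls` parameter is an addition for this example so the same flow can be tried with a different executor. The broadcast variation is not shown.

```python
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(data, n_chunks):
    """Producer step: break the data into at most n_chunks contiguous pieces."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(chunk):
    """Worker step: a CPU-bound computation on one chunk."""
    return sum(x * x for x in chunk)


def n_workers_sum(data, n_workers, executor_cls=ProcessPoolExecutor):
    """Run one worker per chunk and consolidate the partial sums."""
    chunks = split_into_chunks(data, n_workers)
    with executor_cls(max_workers=n_workers) as pool:
        partial_sums = pool.map(sum_of_squares, chunks)
        return sum(partial_sums)  # consumer step
```

When using the default `ProcessPoolExecutor`, call `n_workers_sum()` from under an `if __name__ == "__main__":` guard, since some platforms start worker processes by re-importing the main module.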

01:48 Python includes three different modules to meet your concurrency needs in the standard library. The first is threading, which helps you do I/O-bound processing and is tied to the threads inside of your operating system. The second is asyncio, which is an event loop and coroutine mechanism that is similar to threading, but is completely contained inside of the Python interpreter and isn’t operating system-dependent.

02:13 And last is the multiprocessing library, which allows you to spin up multiple interpreters across the CPUs on your computer.
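For the threading option, one common entry point is the `ThreadPoolExecutor` in `concurrent.futures`. The sketch below assumes a made-up `fake_download()` function that sleeps instead of doing real network I/O, so the names and delays are illustrative only.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(name, delay=0.1):
    """Stand-in for a network call: the thread sleeps instead of waiting on a socket."""
    time.sleep(delay)
    return f"{name}: done"


def download_all(names, max_workers=5):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the inputs
        return list(pool.map(fake_download, names))


if __name__ == "__main__":
    start = time.perf_counter()
    print(download_all(["a", "b", "c", "d", "e"]))
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

Because the waits overlap, the elapsed time should be close to one delay rather than the sum of all five.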

02:21 When thinking about concurrency in your software, the first thing you need to do is decide whether or not you really need it. There’s additional overhead and more code necessary just to manage the concurrency, so make sure that you’re actually going to benefit before you write that code.

02:37 If you are going to use concurrency, determine whether or not your problem is an I/O-bound problem. If it is, then threading or asyncio would be your answer.

02:47 If it isn’t, then you need to use the multiprocessing library. In the case of an I/O-bound program, you should prefer asyncio over threading if you can. It tends to be more efficient and requires less overhead.
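A minimal asyncio sketch of an I/O-bound workload might look like the following. The `fake_fetch()` coroutine is a placeholder that awaits `asyncio.sleep()` rather than a real network library, so treat it as an assumption of the example.

```python
import asyncio


async def fake_fetch(name, delay=0.05):
    """Stand-in for an awaitable network call."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def fetch_all(names):
    # gather() runs the coroutines concurrently on one event loop
    # and returns their results in input order
    return await asyncio.gather(*(fake_fetch(n) for n in names))


if __name__ == "__main__":
    print(asyncio.run(fetch_all(["a", "b", "c"])))
```

Everything here runs in a single thread: each coroutine gives up control at its `await`, which is where the event loop switches to another one.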

03:00 Not all libraries support asyncio, so this decision may actually be made for you, depending on your third-party library needs. Finally, be careful with your concurrent program as to how you’re dealing with memory and the interactions between the parallel portions of your software. threading and asyncio use the same interpreter, so you have to be careful about race conditions messing up your results. In the multiprocessing situation, you don’t have this problem, but you need to do extra work to get the different processes talking to each other and sharing values.
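One common way to guard shared state between threads is a lock. The sketch below uses a hypothetical `Counter` class to show the idea: the lock makes the read-modify-write in `increment()` happen as one step from the point of view of other threads.

```python
import threading


class Counter:
    """Shared counter protected by a lock (illustrative example)."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads could read the same old value
        # and one of the updates would be lost.
        with self._lock:
            self.value += 1


def run_increments(n_threads=4, per_thread=10_000):
    counter = Counter()

    def work():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(run_increments())
```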

03:32 If you’d like to learn more about latency inside of software, these two articles are helpful. The second article is the original, and the first article was written by somebody else doing an update on the numbers.

03:44 This is where I got a lot of the data about component timing inside of the computer. The general purpose Concurrency article in Wikipedia gives you a high-level introduction to the topic and points you to different models and ways of thinking about it from a computer science perspective.

04:00 If you want to learn more about the GIL, there’s an article available on Real Python, or you can go to the Python Wiki to see some internals. If you’re interested in the subinterpreters, PEP 554 has the proposed changes, and this article on Medium discusses the pros and cons of the approach.

04:20 If you’re interested in threading without using the futures library, the Python docs are probably the best place to start, or you can read this introduction on Real Python. Generally, I wouldn’t recommend using the old school methods. Take advantage of concurrent.futures if you can.

04:37 More information on these can also be found in the documentation. If you want to dig into asyncio, here’s a good article introducing you to the concepts, and this conversation on Stack Overflow goes into great detail about how it actually works. This is the link to the multiprocessing library, and this is an excellent article that introduces you to the different concepts.

05:00 Finally, if you want to up your concurrency game, there’s nothing like making things concurrent across multiple computers. This is referred to as distributed computing.

05:09 This used to be something that was extremely difficult to do unless you had a rack full of servers available to yourself. Now with the advent of Amazon Web Services, Google Cloud Platform, Azure, and other services like them, you have access to someone else’s large warehouse filled with computers.

05:26 This page at the Python Wiki shows different tools that you can use for doing distributed programming. And then finally, Dask and Celery are two common Python libraries that you can use to attack these kinds of problems.

05:41 Thanks for your attention. I hope this course has been useful for you.

frankhofstede on Dec. 15, 2020

I think the celery link is broken.

Chris Bailey RP Team on Dec. 15, 2020

Hi @frankhofstede, it looks like that link is currently down; not sure why. But these links may work to get you more information on Celery and how to use it.

Lin Gao on Dec. 27, 2020

Nice course! Thanks to it, I finally understand the difference between threads/processes and when we should take advantage of multi-thread/multiprocessing. One suggestion based on my experience taking a high-performance-computing course before: show in depth how to tune the number of threads/processes using a systematic approach.

blackray on Dec. 28, 2020

Very nice write-up. This is one of those advanced Python topics that is a must-read for data engineers.

Christopher Trudeau RP Team on Dec. 28, 2020

Hi Lin,

Yeah, that’s a tough topic and is a little bit black-magical. Process-wise you typically don’t want to exceed the number of processors on your machine. Thread-wise it depends on how I/O-bound your computation is. You also have to factor in the extra complexity in your code and the overhead of communication between the concurrent parts.

This could make a good article topic on its own. …ct
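As a starting point for that kind of tuning, the sketch below caps the process count at the processor count and times the same batch of simulated I/O tasks with different thread counts. The function names, task counts, and delays are all invented for the example; real measurements would use your actual workload.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def default_process_count():
    """Rule of thumb: no more processes than processors."""
    return os.cpu_count() or 1  # cpu_count() can return None


def fake_io_task(delay):
    """Stand-in for an I/O-bound task."""
    time.sleep(delay)


def time_thread_counts(thread_counts, n_tasks=20, delay=0.01):
    """Return {thread_count: elapsed_seconds} for the same batch of tasks."""
    timings = {}
    for n_threads in thread_counts:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            list(pool.map(fake_io_task, [delay] * n_tasks))
        timings[n_threads] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    print("process count:", default_process_count())
    for n, elapsed in time_thread_counts([1, 2, 4, 8]).items():
        print(f"{n} threads: {elapsed:.3f}s")
```

Comparing the timings shows where adding threads stops paying off for a given workload.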

Pavlo Kurochka on Dec. 29, 2020

Excellent course. I finally got a comprehensive and current overview of the options and the reasoning behind choosing one over the other. Code samples are great too.

danP on Dec. 1, 2021

Awesome lesson, with the added benefit of finally understanding what the GIL actually does!

brunobutter on July 13, 2022

Excellent course

Christopher Trudeau RP Team on July 14, 2022

Glad you liked it @brunobutter!

Rakshit Patel on Jan. 4, 2023

Very good information. I always had confusion (don’t know why) about I/O-bound and CPU-bound programs. It is a simple concept, but somehow I never got my head around representing/implementing it in a program. This course has explained it in a very simple yet effective manner.

Christopher Trudeau RP Team on Jan. 4, 2023

Glad you found it useful Rakshit. Modern hardware and operating systems abstract so much away that it is easy to think of it as a big black box. I started back in the old command-line only days where it was a bit more obvious what the machine was doing. Although it might have been more obvious, today’s machine power is definitely the better situation. Happy coding!

grenait on Feb. 15, 2023

Great course!

How typical/untypical is a mix of threaded/asyncio and multiprocessing?

E.g.: At work, I’ve created a program which uses threads to receive data via the network and add this data to a queue. Another thread reads out this information and saves it when the buffer reaches a specific length. Would you rather keep that a thread, or even have a multi-process? Saving to disk is also an I/O-bound operation, but in this circumstance, it might improve the speed of the program/saving when it is done on a different core.

What do you think?

Christopher Trudeau RP Team on Feb. 15, 2023

Hi @grenait,

I don’t come across a mix all that often. It is hard to know in your case whether it would make a difference. You’ve got three things going on:

  1. Saturation of your network
  2. Saturation of your disk I/O
  3. Saturation of your processor

If your situation is causing the third case, then adding multi-processing will help. My gut though is that the first two will happen before the third.

Before creating the extra work, I’d do some profiling on your code. If you’re finding you’re pegging the processor then this is a worthy experiment. If the code is mostly waiting on system calls (disk and network) then adding another thread will be good enough.

Adding multi-processing will mean you have to change how inter-communication is done. A lot of this will depend on your setup. For example, if your queue is in a database, I might not even bother with multi-processing, just kick off a separate program that is fed off the queue. You end up in the same place without the added coding.

Good luck with it!
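One way to do the profiling suggested above is the standard library's `cProfile` module. In this sketch, `wait_on_io()`, `crunch_numbers()`, and `handle_batch()` are invented stand-ins for the real program; the report shows how much cumulative time is spent waiting versus computing.

```python
import cProfile
import io
import pstats
import time


def wait_on_io():
    """Stand-in for a blocking system call (disk or network)."""
    time.sleep(0.05)


def crunch_numbers():
    """Stand-in for CPU-bound work."""
    return sum(i * i for i in range(200_000))


def handle_batch():
    wait_on_io()
    return crunch_numbers()


def profile_report(func, top=10):
    """Profile one call to func and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_report(handle_batch))
```

If most of the cumulative time lands in calls like `time.sleep` (or, in a real program, socket and file operations), the code is mostly waiting and another thread is likely enough.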

Hans Geukens on Aug. 16, 2023

Thank you for the right balance of being brief on this versatile subject but sufficiently documented to make the first design choices for your own project case and start seeing the benefits. The links at the end help those who believe they need more input first > everybody happy :-)

My intention is to develop a client-server solution to keep updated on what different ‘alike’ test benches of our department in different locations/countries are doing in ‘real-time’ (once every 10s?). Are they running? What test are they performing? What’s the actual reason if they are not running, and how long did it take to restart?

Christopher Trudeau RP Team on Aug. 16, 2023

Hi Hans,

Glad you liked the course. You might not need “real” concurrency to solve the problem you’re tackling. One possible design for you:

Have a small script that checks the state of things (what is running, etc) and have it update a database (sqlite will be good enough).

Then have a separate script that uses either Flask or FastAPI to serve content out of the database.

This will run concurrently as your OS will be spawning the “add data” script in a separate process from your server. Doing it this way means you can also use “cron” to invoke your “add data” script, so you don’t have to write any code for scheduling, and the database will take care of any row-locking caused by concurrent access.

I like concurrent programming, but I try to always make it a last resort. Good luck with it. …ct
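The database half of that design could be sketched roughly as follows, using the standard library's `sqlite3` module. The table name, column names, and bench identifiers are all hypothetical, and the Flask or FastAPI server is left out; `latest_status()` only shows the kind of query such a server might run.

```python
import sqlite3
import time


def open_db(path="bench_status.db"):
    """Open the database and create the (hypothetical) status table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS bench_status (
               bench TEXT NOT NULL,
               state TEXT NOT NULL,
               checked_at REAL NOT NULL
           )"""
    )
    return conn


def record_status(conn, bench, state, checked_at=None):
    """What the cron-invoked "add data" script would do on each run."""
    if checked_at is None:
        checked_at = time.time()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO bench_status (bench, state, checked_at) VALUES (?, ?, ?)",
            (bench, state, checked_at),
        )


def latest_status(conn):
    """What the web server would query: the most recent state per bench."""
    rows = conn.execute(
        """SELECT bench, state FROM bench_status AS s
           WHERE checked_at = (
               SELECT MAX(checked_at) FROM bench_status WHERE bench = s.bench
           )
           ORDER BY bench"""
    )
    return dict(rows.fetchall())
```

The "add data" script would call `open_db()` and `record_status()` once per cron run, while the server process opens its own connection and calls `latest_status()` per request.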

Tony Ngok on Feb. 6, 2024

Can I say that frontend programming is basically I/O bound, while backend programming is basically CPU bound?

Bartosz Zaczyński RP Team on Feb. 6, 2024

@tonyn1999 The distinction between I/O-bound and CPU-bound types of work pertains to tasks carried out by a computer. Programming is done by people, on the other hand.

Both front-end and back-end programming involve the two types of tasks. For example, a microservice on the backend handles user requests or reads from a database, which are I/O-bound tasks, but also processes complex business logic, which is a CPU-bound task. Conversely, a front-end component might fetch data from an API, which is an I/O-bound task, while rendering complex visuals in a browser is a CPU-bound task.
