Computers and Latency

00:00 In the previous section, I gave an overview of the course. In this section, I’m going to talk about latency and how that leads into I/O-bound concurrency.

00:10 Consider the basic parts of a computer. This is an oversimplification, but good enough for the level of our discourse. There’s a CPU where the math is done. This is the brain of the computer.

00:20 This is where the actual computation happens. Memory stores what is being worked on, and the CPU works with memory frequently—not only to find out what instruction to run next, but also as the place where it stores the data it's working on. Memory is generally volatile and is gone once you turn the computer off, so for longer-term storage, there's usually a device like a hard drive.

00:41 And then, finally, there is some set of peripherals. Peripherals are usually used for input and output. This includes things like network cards, video cards, and external devices like keyboards and mice.

00:53 In order for a program to run, the CPU must first fetch it from storage. The CPU sends an instruction down to the hard drive, asks for some data, and gets it back.

01:04 That information is pulled into the CPU. The CPU then sends that off to memory. In modern computers, there are ways of skipping the CPU to do this, which speeds things up, but for the purposes of this conversation, I’m going to keep things simple.

01:18 Once the program’s been loaded in memory, the CPU needs to get the next instruction from the memory and run that instruction inside of the CPU. That instruction often impacts peripherals—for example, sending something out onto the network.

01:33 The CPU sends information down to the peripheral card, and then the peripheral card itself sends information to the outside world. Each one of these components runs at different speeds, and this is where latency comes into effect.

01:47 Consider the lowly nanosecond. That’s one billionth of a second. To give you a sense of pace, an Intel i7 can run about 100 instructions in 1 nanosecond.

01:59 This varies from computer to computer, but on the nanosecond scale, you’re talking about 10 to 100 instructions as pretty typical for a PC.

02:10 Now multiply that out by 100. That’s about how long it takes to talk to main memory. So every time the CPU needs to talk to memory, there’s a delay of about 100 nanoseconds. Again, modern computers have ways of speeding this up, like L2 caches, but for the purposes of what I’m talking about, let’s keep it simple.

02:31 Taking those 100 nanoseconds and grouping 10 of them together gives you a microsecond, or 1,000 nanoseconds. That’s about how long it takes to read 500 kilobytes from memory—roughly the size of a short program.

02:46 Multiplying that out by 100 and then by 10 again gives you a millisecond. 1 millisecond is 1,000 microseconds. It takes about 2 milliseconds for a disk to seek.

02:57 So when you ask the hard drive to look for something, if the read head is not already in that position, it takes about 2 milliseconds for the read head to be repositioned.

03:08 150 milliseconds is about the ping time from the East Coast of the United States to Europe, so that’s a packet going out across the Atlantic and coming back.

03:18 There’s a huge difference in scale between the instruction level, memory level, disk level, and peripheral level in your computer. There can be a factor of a thousand or more between different steps in this stack. To try and put this in perspective, let’s think about this like a distance. Think about a single CPU instruction as a meter, or about a yard. For the purposes of this analogy, they’re about the same. To help you visualize, that’s about the height of a doorknob off the ground on a regular door.

03:47 A single instruction runs in 0.01 nanoseconds. In 1 nanosecond, you can run 100 CPU instructions on that Intel i7 I mentioned earlier. That would be 100 meters, or about 100 yards—roughly the length of an American football field or a soccer pitch, which are about the same size, give or take a few meters.

04:08 So that memory reference, which takes 100 nanoseconds—that’s 10 kilometers, or 6 miles. That’s a quarter of a marathon. The best marathoners in the world can run that in about half an hour.

04:21 3 microseconds, which is about how long it takes to read 1 megabyte from memory, is 300 kilometers or 186 miles. That’s about one and a half times the length of the Suez Canal, so now you’re looking at large distances on the face of the Earth.

04:37 Going from memory to disk just makes that worse. Reading 1 megabyte from disk is 82,500 kilometers or 51,000 miles. That’s over twice the Earth’s circumference. That read time applies only if the disk’s head is already in the correct position and the megabyte is laid out in order on the disk—that is, it’s a sequential read. If the head needs to move around, there’s an additional cost for that.

05:03 It takes about 2 milliseconds to do a disk seek. That’s 200,000 kilometers, 125,000 miles, or about half the distance between the Earth and the Moon, on average. And that ping time to Europe, 150 milliseconds?

05:19 Well, that’s 15 million kilometers or about 9 million miles. That’s one tenth of the distance to the Sun. The difference between a single instruction and a simple network call is an astronomical amount.
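If you want to check the distance analogy yourself, here’s a minimal sketch in Python. It uses the ballpark latencies quoted in this section and the 100-instructions-per-nanosecond figure from earlier; real hardware will vary:

```python
# Scale the latency ladder to distances: 1 CPU instruction -> 1 meter.
# At ~100 instructions per nanosecond, 1 ns of latency maps to 100 meters.
INSTRUCTIONS_PER_NS = 100

latencies_ns = {
    "1 CPU instruction": 0.01,
    "main memory reference": 100,
    "read 1 MB from memory": 3_000,
    "disk seek": 2_000_000,
    "ping to Europe and back": 150_000_000,
}

for name, ns in latencies_ns.items():
    meters = ns * INSTRUCTIONS_PER_NS  # one meter per instruction's worth of time
    print(f"{name:25} {meters:>17,.0f} m  ({meters / 1000:>13,.1f} km)")
```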

05:35 These differences are huge and hard to wrap your head around. Let me try it another way to see if I can drive it home. Pretend that instead of an instruction taking a fraction of a nanosecond, it took a full second. That 100-nanosecond trip to main memory would take 2 hours and 47 minutes—you can run 10,000 instructions in that time. That pesky disk seek? 6 years and 4 months, or 200 million instructions.

06:01 A seek and reading a megabyte? 8 years, 11 months—just shy of 9 years. 285 million instructions. And that ping time to Europe? 475 years, 8 months—or 15 billion instructions.
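Here’s the same arithmetic as a short sketch, stretching each wait by the same factor so that one instruction takes a full second. Again, the latencies are the ballpark figures from this section:

```python
from datetime import timedelta

# Stretch time so that 1 CPU instruction takes a full second instead of 0.01 ns.
# The number of instructions that fit into a wait is also its length in "seconds".
INSTRUCTIONS_PER_NS = 100

latencies_ns = {
    "main memory reference": 100,
    "disk seek": 2_000_000,
    "ping to Europe and back": 150_000_000,
}

for name, ns in latencies_ns.items():
    instructions = ns * INSTRUCTIONS_PER_NS
    # timedelta reports long spans in days; divide by 365 for years.
    print(f"{name:24} {instructions:>15,} instructions  (~{timedelta(seconds=instructions)})")
```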

06:17 The gaps between the levels in the computing stack are phenomenally large. And just to make it that much more complicated, that Intel i7 that I said runs 100 instructions per nanosecond? Yeah, that’s an 8-year-old processor.

06:30 Modern ones are about three or four times faster than that. Unfortunately for these latency gaps, it’s easier from a physics standpoint to increase the speed of a CPU than it is to increase the speed of network traffic. As a result, processors are getting faster at a higher rate than networks are. As CPUs get better and better, the latency difference between performing an instruction and going out to the network is getting more extreme, not less.

07:02 This is why most programs are I/O-bound. If a program accesses RAM, thousands of instructions could run in the time it spends waiting. If it needs to access disk, that can be millions of instructions—or hundreds of millions if the disk has to seek—before the program is ready to run again. And if it has to access the network, it’s billions of instructions.

07:23 This pattern is very common. In all likelihood, your program is I/O-bound. It spends more time waiting than it does computing.
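As a quick illustration, here’s a minimal sketch that compares a CPU-only loop with a single network request. The URL and the loop size are arbitrary stand-ins, but on a typical machine and connection the request spends far more time waiting than computing:

```python
import time
import urllib.request

# CPU-bound work: pure computation, no waiting on any device.
start = time.perf_counter()
total = sum(i * i for i in range(1_000_000))
cpu_elapsed = time.perf_counter() - start

# I/O-bound work: a single network round trip. Almost all of this time is waiting.
start = time.perf_counter()
with urllib.request.urlopen("https://example.com") as response:
    response.read()
io_elapsed = time.perf_counter() - start

print(f"CPU-bound loop:  {cpu_elapsed:.4f} s")
print(f"Network request: {io_elapsed:.4f} s")
print(f"The request took roughly {io_elapsed / cpu_elapsed:.0f}x longer, mostly spent waiting.")
```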

07:33 Next up, I’ll talk about the types of concurrency and how to take advantage of this latency.
