Python mmap: Doing File I/O With Memory Mapping (Overview)
The Zen of Python has a lot of wisdom to offer. One especially useful idea is that “There should be one-- and preferably only one --obvious way to do it.” Yet there are multiple ways to do most things in Python, and often for good reason. For example, there are multiple ways to read a file in Python, including the rarely used mmap module.
Python’s mmap provides memory-mapped file input and output (I/O). It allows you to take advantage of lower-level operating system functionality to read files as if they were one large string or array. This can provide significant performance improvements in code that requires a lot of file I/O.
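As a rough illustration of reading a file "as if it were one large string or array," here is a minimal sketch. The temporary file and its contents are made up for the example; the point is only that a mapped file supports slicing, `len()`, and `find()` much like a `bytes` object.

```python
import mmap
import os
import tempfile

# Create a small throwaway file so the sketch is self-contained.
# (The file contents are an assumption made up for this example.)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Beautiful is better than ugly.\nExplicit is better than implicit.\n")
    path = tmp.name

try:
    with open(path, "rb") as f:
        # length=0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), length=0, access=mmap.ACCESS_READ) as mm:
            first_word = mm[:9]            # slicing returns a bytes object
            offset = mm.find(b"Explicit")  # search without calling read()
            total = len(mm)                # size of the mapping in bytes
finally:
    os.remove(path)

print(first_word, offset, total)
```

Nothing here calls `read()` on the file object: the slice and the search go straight to the mapped region.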
In this video course, you’ll learn:
- What kinds of computer memory exist
- What problems you can solve with mmap
- How to use memory mapping to read large files faster
- How to change a portion of a file without rewriting the entire file
- How to use mmap to share information between multiple processes
00:00 Welcome to Python mmap: Doing File I/O With Memory Mapping. My name is Christopher, and I will be your guide. This course is about the mmap library, a wrapper to a fairly low-level operating system call that maps the contents of a file into memory.
00:17 In this course, you will learn about how to map a file on disk into a memory block and why you might want to do that, reading and writing to and from said memory block, and how to use this same library for sharing memory between processes.
00:33 A quick note on versions: all code demonstrated here was tested with Python 3.10 on macOS. If you’ve taken one of my courses before, you’ll know that I don’t usually bother telling you about the operating system.
00:46 In this case, the mmap library has some operating system-specific variations. So my OS is a bit more important. Don’t worry. You’ll be able to follow along either way.
00:56 I mostly stick to the stuff that is common to all operating systems, and will point out the differences between Unix-like worlds and Windows worlds when they’re important. mmap has been around for a long time and pretty much mimics the underlying call in C that it is based upon.
01:12 So that whole Python 3.10 thing isn’t too important. I do use f-strings in some demo code, but otherwise, this could go back to the dawn of Python time.
01:24 mmap is based on a very low-level call to your operating system, which maps directly to a call in a C library for system memory management. Its purpose is to map the contents of a file on disk into a block of memory so that anything you do to that block of memory is reflected in the file.
01:43 That’s an oversimplification. There are multiple modes of doing work, but the primary purpose is typically to write to the block of memory and have that change reflected in the file.
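To make that primary purpose concrete, here is a minimal sketch of writing to a mapped block and then seeing the change in the file. The file name and contents are hypothetical, and this shows only the write-through mode (`ACCESS_WRITE`), not the other modes mentioned above.

```python
import mmap
import os
import tempfile

# A throwaway file with made-up contents for the demonstration.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Hello, world!\n")
    path = tmp.name

try:
    # "r+b" opens the file for reading and writing without truncating it.
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), length=0, access=mmap.ACCESS_WRITE) as mm:
            # Slice assignment must keep the same length;
            # the mapping cannot grow or shrink this way.
            mm[7:12] = b"there"
            mm.flush()  # ask the OS to write the changed pages back to the file

    # Read the file the ordinary way to confirm the change landed on disk.
    with open(path, "rb") as f:
        contents = f.read()
finally:
    os.remove(path)

print(contents)
```

Only the five bytes in the slice are touched; the rest of the file is never rewritten.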
01:53 Why would you do this? In the Python world, usually when you’re mucking about with a file, you’re loading it into some sort of Python object representation. This is typically either a string, a byte buffer, or something similar. Direct memory mapping is a lot closer to the underlying file.
02:10 There’s no intermediate representation. This often means you can get a performance boost. It usually means less memory because the Python object requires multiple copies of things going into memory. For example, the file might get buffered when it is read before being put into the Python object. And in many cases, you may also get a speed improvement. Since this is tied so closely to an OS call, the performance boost in memory usage and speed is tied directly to the OS.
02:40 That means the boost you get may be different than the boost I get—or worse, might be different between subsequent calls due to things like caching.
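If you want to see how this plays out on your own machine, a rough comparison might look like the sketch below. The helper names (`find_with_read`, `find_with_mmap`, `timed`), the file size, and the `NEEDLE` marker are all assumptions made up for this example. Treat the timings as a single noisy sample: they depend on your OS and on what is already cached, so no particular result is implied.

```python
import mmap
import os
import tempfile
import time

def find_with_read(path, needle):
    # Regular file I/O: copy the whole file into a bytes object, then search it.
    with open(path, "rb") as f:
        return f.read().find(needle)

def find_with_mmap(path, needle):
    # Memory-mapped I/O: search the mapping directly, with no bytes copy.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), length=0, access=mmap.ACCESS_READ) as mm:
            return mm.find(needle)

def timed(func, *args):
    # Return the function's result together with the elapsed wall-clock time.
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

# Build a roughly 1 MB test file with a marker at the very end.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 1_000_000 + b"NEEDLE")
    path = tmp.name

try:
    read_pos, read_secs = timed(find_with_read, path, b"NEEDLE")
    mmap_pos, mmap_secs = timed(find_with_mmap, path, b"NEEDLE")
finally:
    os.remove(path)

print(f"read(): offset {read_pos} in {read_secs:.6f}s")
print(f"mmap:   offset {mmap_pos} in {mmap_secs:.6f}s")
```

Both approaches find the marker at the same offset; only the time and memory cost differ, and by how much will vary from run to run.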
02:50 In addition to all of this, the mmap library can also be used to share memory between processes.
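One possible way to picture that sharing is sketched below: two processes map the same file, and what one writes becomes visible to the other. This is an illustration under assumptions, not necessarily the approach the course takes later. The `SIZE` constant, the child script in `CHILD_CODE`, and the message are all hypothetical.

```python
import mmap
import os
import subprocess
import sys
import tempfile

SIZE = 32  # hypothetical size of the shared block, in bytes

# Code for the child process: map the same file and write a message at the start.
CHILD_CODE = """
import mmap, sys
with open(sys.argv[1], "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
    msg = b"hello from child"
    mm[:len(msg)] = msg
    mm.flush()
"""

# An empty file cannot be mapped, so pre-fill it with zero bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * SIZE)
    path = tmp.name

try:
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        # Run the child while the parent's mapping stays open.
        subprocess.run([sys.executable, "-c", CHILD_CODE, path], check=True)
        # The parent never re-reads the file; it just looks at its own mapping.
        received = mm[:16]
finally:
    os.remove(path)

print(received)
```

The parent sees the child's bytes through its existing mapping because both mappings are backed by the same file. Unix-like systems and Windows also offer other ways to set up shared mappings, and those differ by platform.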
02:58 Next up, I’ll give a bit of background into the inner workings of your computer and how that affects memory and file I/O. If CPUs, memory, and files are your bread and butter, feel free to skip this one.