Python mmap: Doing File I/O With Memory Mapping (Summary)
Memory mapping is an alternative approach to file I/O that’s available to Python programs through the mmap module. Memory mapping uses lower-level operating system APIs to map a file’s contents directly into your program’s address space. This approach often results in improved I/O performance because it avoids many costly system calls and reduces expensive data buffer transfers.
In this video course, you learned:
- What the differences are between physical, virtual, and shared memory
- How to optimize memory use with memory mapping
- How to use Python’s mmap module to implement memory mapping in your code
The mmap API is similar to the regular file I/O API, so it’s fairly straightforward to test out. Give it a shot in your own code to see if your program can benefit from the performance improvements offered by memory mapping.
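As a minimal sketch of how close the two APIs are, the example below reads the same file once with regular file I/O and once through a memory map. The function names and the temporary demo file are made up for illustration, and the sketch assumes a non-empty file, since an empty file can't be mapped.

```python
import mmap
import os
import tempfile


def read_with_file_io(path):
    """Read the whole file with the regular file API."""
    with open(path, "rb") as file_obj:
        return file_obj.read()


def read_with_mmap(path):
    """Read the whole file through a memory map."""
    with open(path, "rb") as file_obj:
        # length=0 maps the entire file. ACCESS_READ is accepted on both
        # Unix and Windows, so this call doesn't depend on the OS.
        with mmap.mmap(file_obj.fileno(), length=0, access=mmap.ACCESS_READ) as mapped:
            return mapped.read()


# Hypothetical demo file, created only for this example.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")
    with open(demo_path, "wb") as file_obj:
        file_obj.write(b"hello memory mapping\n" * 3)

    same_content = read_with_mmap(demo_path) == read_with_file_io(demo_path)
```

Whether the mapped version is actually faster depends on your OS and file size, so it's worth timing both on your own data.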
To learn more about the concepts you covered in this course, check out:
- Python documentation: mmap — Memory-mapped file support
- Strings and Character Data in Python
- Speed Up Your Python Program With Concurrency
- Speed Up Python With Concurrency
- Learning Path: Python Concurrency & Parallel Programming
Congratulations, you made it to the end of the course! What’s your #1 takeaway or favorite thing you learned? How are you going to put your newfound skills to use? Leave a comment in the discussion section and let us know.
00:00 In the previous lesson, I showed you how to use mmap to create shared memory. In this final lesson, I’ll summarize the course and highlight some points of further investigation.
00:11 In this course, you learned all about using mmap to map the contents of a file into a block of memory. Doing this can potentially give you a performance boost, but how much varies with your choice of OS and how much data is getting mapped.
00:27 You use an mmap object to access the mapped memory block as a byte array, which means things with Unicode strings can get a little tricky. And don’t forget, depending on your OS, how you create that object might be different. Once you’ve got the block, you can search inside of it using .find(), .rfind(), and regular expressions,
00:48 or you can use file-like operations, such as .seek(), .tell(), .read(), .write(), and others.
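The operations mentioned here can be sketched roughly as follows. The sample text, the file name, and the explore_mapping() helper are assumptions made up for this example. Because the mapped block holds bytes, the search terms and the regular expression are bytes too, and text has to be decoded explicitly.

```python
import mmap
import os
import re
import tempfile

# Hypothetical sample content; "\u00e9" is the accented e, two bytes in UTF-8.
SAMPLE_TEXT = "first line\nsecond line with caf\u00e9\nthird line\n"


def explore_mapping(path):
    """Search and read a file through mmap, returning what was found."""
    results = {}
    with open(path, "r+b") as file_obj:
        with mmap.mmap(file_obj.fileno(), length=0, access=mmap.ACCESS_WRITE) as mapped:
            # Searching works on bytes, so the needle is a bytes literal.
            results["first"] = mapped.find(b"line")
            results["last"] = mapped.rfind(b"line")

            # Regular expressions need a bytes pattern as well. Decode the
            # match to get text back out of the mapped bytes.
            match = re.search(rb"caf\S+", mapped)
            results["word"] = match.group().decode("utf-8")

            # File-like operations track a position, as a file object does.
            mapped.seek(0)
            results["line"] = mapped.readline()
            results["position"] = mapped.tell()

            # Overwrite bytes in place. write() can't grow the mapping.
            mapped.seek(0)
            mapped.write(b"FIRST")
            mapped.flush()
    return results


with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt")
    with open(sample_path, "wb") as file_obj:
        file_obj.write(SAMPLE_TEXT.encode("utf-8"))

    results = explore_mapping(sample_path)

    # Read the file back normally to confirm the in-place write reached it.
    with open(sample_path, "rb") as file_obj:
        first_bytes = file_obj.read(10)
```

Note that the offsets returned by .find() and .rfind() count bytes, not characters, which is one place where non-ASCII text can get tricky.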
00:56 You can also use mmap to create a block of memory that can be shared between processes. In this case, it isn’t associated with a file on disk. The drawback to this method is you have to use the os.fork() call, as the mmap object isn’t compatible with higher-level libraries like the multiprocessing module. On the other hand, it doesn’t have the data restrictions that the multiprocessing module has.
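Here is a minimal sketch of that shared-memory pattern, assuming a Unix-like system where os.fork() exists. The share_message_with_child() helper and the message are hypothetical and not taken from the course code. The child process writes into an anonymous mapping and the parent reads the result after the child exits.

```python
import mmap
import os


def share_message_with_child(message):
    """Fork a child that writes message into shared memory; return what the parent reads."""
    if not hasattr(os, "fork"):
        raise OSError("os.fork() is not available on this platform")

    # fileno=-1 asks for anonymous memory with no file behind it. On Unix
    # the default flag is MAP_SHARED, so a forked child sees the same block.
    shared = mmap.mmap(-1, len(message))
    try:
        pid = os.fork()
        if pid == 0:
            # Child process: write into the shared block, then exit
            # immediately without running the parent's cleanup code.
            try:
                shared.write(message)
            finally:
                os._exit(0)

        # Parent process: wait for the child, then read from position 0.
        os.waitpid(pid, 0)
        return shared.read(len(message))
    finally:
        shared.close()


if hasattr(os, "fork"):
    reply = share_message_with_child(b"hello from the child")
```

Waiting for the child with os.waitpid() is the simplest way to make sure the write has happened before the parent reads. Two processes that run at the same time would need their own synchronization.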
01:21 So depending on your situation, it might be the right choice.
01:26 For more information on mmap, you can dig into the Python docs. If you’d like to learn more about strings versus bytes, this course might be of interest.
01:36 Or if you’d like to dig into multiprocessing, here is an article, a course based on that article, and a whole learning path devoted to the topic. There’s lots of content there if you want to get your hands dirty. That’s all for me.
01:52 I hope you found the course useful. Thanks for your attention.
Christopher Trudeau RP Team on June 24, 2022
Hi @FooledByCode,
All depends on your size and performance needs. I see mmap as an optimization – I’d always try to write some code first and see if I run into bottlenecks. If it turns out that I have some, then I go looking for ways of optimizing.
There are a lot of good data libraries out there for ML, so you may be better off using one of those rather than trying to write your own based on mmap. Of course, if you’re doing it to learn the underlying tech, that can always be fun too :)
FooledByCode on June 25, 2022
Thanks for the response.
FooledByCode on June 23, 2022
Thanks for this refresher. It reminded me of the good old days when I used to work heavily with C and RTOS. Would this be a good idea if I read a CSV file to train some machine learning model? What are your thoughts? What would be the pros and cons?