Python mmap: Doing File I/O With Memory Mapping (Summary)

Memory mapping is an alternative approach to file I/O that’s available to Python programs through the mmap module. Memory mapping uses lower-level operating system APIs to map a file’s contents directly into your program’s memory, so the operating system loads the data on demand. This approach often results in improved I/O performance because it avoids many costly system calls and reduces expensive data buffer transfers.

In this video course, you learned:

  • What the differences are between physical, virtual, and shared memory
  • How to optimize memory use with memory mapping
  • How to use Python’s mmap module to implement memory mapping in your code

The mmap API is similar to the regular file I/O API, so it’s fairly straightforward to test out. Give it a shot in your own code to see if your program can benefit from the performance improvements offered by memory mapping.
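
To see how close the two APIs are, here is a minimal sketch that reads the same file once with a regular file object and once through a memory map. The helper names read_regular() and read_mmap() and the temporary sample file are made up for illustration; they are not part of the course's sample code.

```python
import mmap
import os
import tempfile


def read_regular(path):
    """Read the whole file with ordinary file I/O."""
    with open(path, "rb") as file_obj:
        return file_obj.read()


def read_mmap(path):
    """Read the whole file through a read-only memory map."""
    with open(path, "rb") as file_obj:
        # length=0 maps the entire file; the access= argument works on
        # both Unix and Windows. An empty file can't be mapped this way.
        with mmap.mmap(file_obj.fileno(), length=0, access=mmap.ACCESS_READ) as mm:
            return mm.read()


# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"memory mapping demo\n" * 1000)
    sample_path = tmp.name

regular_bytes = read_regular(sample_path)
mapped_bytes = read_mmap(sample_path)
same_content = regular_bytes == mapped_bytes
os.remove(sample_path)

print(same_content)  # True
```

Both functions return the same bytes. Whether the mapped version is actually faster depends on your OS and file size, so it's worth timing both on your own data.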

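To learn more about the concepts you covered in this course, check out the Python documentation for mmap, along with resources on strings versus bytes and on multiprocessing.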

00:00 In the previous lesson, I showed you how to use mmap to create shared memory. In this final lesson, I’ll summarize the course and highlight some points of further investigation.

00:11 In this course, you learned all about using mmap to map the contents of a file into a block of memory. Doing this can potentially give you a performance boost, but how much varies depending on your choice of OS and how much data is getting mapped.

00:27 You use an mmap object to access the mapped memory block as a byte array, which means working with Unicode strings can get a little tricky. And don’t forget, depending on your OS, how you create that object might be different. Once you’ve got the block, you can search inside of it using .find(), .rfind(), and regular expressions,

00:48 or you can use file-like operations, such as .seek(), .tell(), .read(), .write(), and others.
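
As a rough sketch of those operations, the example below maps a small temporary file, searches it with .find(), .rfind(), and a bytes regular expression, then uses .seek(), .tell(), .read(), and .write(). The sample text and variable names are made up for illustration. Note how the byte offset differs from the string index because the accented characters take two bytes each in UTF-8.

```python
import mmap
import os
import re
import tempfile

# Hypothetical sample text containing non-ASCII characters.
text = "naïve café: the quick brown fox jumps over the lazy dog\n"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(text.encode("utf-8"))
    path = tmp.name

with open(path, "r+b") as file_obj:
    with mmap.mmap(file_obj.fileno(), length=0, access=mmap.ACCESS_WRITE) as mm:
        # Searching works on bytes, so offsets are byte offsets.
        first_the = mm.find(b"the")
        last_the = mm.rfind(b"the")

        # Regular expressions need a bytes pattern to search an mmap object.
        match = re.search(rb"q\w+", mm)
        word = match.group().decode("utf-8")

        # File-like operations: move the cursor, check it, and read.
        mm.seek(first_the)
        position = mm.tell()
        chunk = mm.read(3)

        # Overwrite in place. The replacement has the same length, so
        # the size of the mapping doesn't change.
        mm.seek(first_the)
        mm.write(b"THE")
        mm.flush()

        # Decode the bytes to get a str back.
        decoded = mm[:].decode("utf-8")

os.remove(path)

print(text.find("the"), first_the)  # 12 14
print(word)  # quick
```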

00:56 You can also use mmap to create a block of memory that can be shared between processes. In this case, it isn’t associated with a file on disk. The drawback to this method is you have to use the os.fork() call, as the mmap object isn’t compatible with higher-level libraries like the multiprocessing module. On the other hand, it doesn’t have the data restrictions that the multiprocessing module has.
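
Here is a minimal sketch of that idea, assuming a Unix-like system where os.fork() is available (it won't run on Windows). The function name share_with_child() and the message are made up for illustration. Passing -1 as the file descriptor creates an anonymous mapping that isn't backed by a file, and on Unix the mapping is shared by default, so the parent can see what the child wrote.

```python
import mmap
import os


def share_with_child(message: bytes) -> bytes:
    """Have a forked child write into shared memory, then read it back."""
    # -1 means an anonymous mapping with no file behind it. The message
    # must fit within one page for this sketch.
    with mmap.mmap(-1, length=mmap.PAGESIZE) as shared:
        pid = os.fork()
        if pid == 0:
            # Child process: write the message, then exit immediately
            # without running any of the parent's cleanup code.
            try:
                shared.seek(0)
                shared.write(message)
            finally:
                os._exit(0)

        # Parent process: wait for the child to finish, then read.
        os.waitpid(pid, 0)
        shared.seek(0)
        return shared.read(len(message))


result = share_with_child(b"hello from the child")
print(result)  # b'hello from the child'
```

Waiting for the child with os.waitpid() is the only synchronization here. If both processes write at the same time, you'd need your own coordination on top.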

01:21 So depending on your situation, it might be the right

01:26 choice. For more information on mmap, you can dig into the Python docs. If you’d like to learn more about strings versus bytes, this course might be of interest.

01:36 Or if you’d like to dig into multiprocessing, here is an article, a course based on that article, and a whole learning path devoted to the topic. There’s lots of content there if you want to get your hands dirty. That’s all from me.

01:52 I hope you found the course useful. Thanks for your attention.

FooledByCode on June 23, 2022

Thanks for this refresher. It reminded me of the good old days when I used to work heavily with C and RTOS. Would this be a good idea if I read a CSV file to train a machine learning model? What are your thoughts? What would be the pros and cons?

Christopher Trudeau RP Team on June 24, 2022

Hi @FooledByCode,

All depends on your size and performance needs. I see mmap as an optimization – I’d always try to write some code first and see if I run into bottlenecks. If it turns out that I have some, then I go looking for ways of optimizing.

There are a lot of good data libraries out there for ML. You may be better off using one of those rather than trying to write your own based on mmap. Of course, if you’re doing it to learn the underlying tech, that can always be fun too :)

FooledByCode on June 25, 2022

Thanks for the response.
