
Write pandas Objects Directly to Compressed Formats

Since pandas version 0.21.0, you can write your DataFrames directly to a compressed format to save disk space. Have a look at this short and sweet recipe for saving a DataFrame as gzip-compressed JSON:

# abalone is the DataFrame loaded in the settings lesson
abalone.to_json('df.json.gz', orient='records',
                lines=True, compression='gzip')
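As a sketch of the full round trip, the snippet below writes a small DataFrame to gzip-compressed JSON Lines and reads it back with `pd.read_json()`, passing the same `orient`, `lines`, and `compression` arguments. The abalone data isn't loaded here, so `df` and its columns are a hypothetical stand-in, and a temporary directory is used so no files are left behind.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the abalone DataFrame used in the lesson
df = pd.DataFrame({
    "sex": ["M", "F", "I"],
    "rings": [15, 7, 9],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "df.json.gz")

    # One JSON record per line, gzip-compressed as it is written
    df.to_json(path, orient="records", lines=True, compression="gzip")

    # Read it back using the same layout and codec
    restored = pd.read_json(path, orient="records", lines=True,
                            compression="gzip")

print(restored)
```

Passing the same arguments on both sides keeps the writer and reader in agreement about the layout; mismatching `lines` or `orient` is a common source of parse errors.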

Watch the video to learn more about it.

00:00 You’ve made it. In this last video, you’re going to learn how to write pandas objects directly to compressed formats. Sometimes the DataFrames you’re working with get very large, and saving them uncompressed takes up a lot of disk space.

00:14 pandas added support in version 0.21.0 for compressing these objects directly when you write them out. So let’s take the dataset from the settings video and see how this works in the terminal. I’m going to copy that, open up a terminal, and start the Python interpreter.

00:34 Now import pandas as pd, paste everything in, and there you go. So let’s say you’re doing some work on this dataset and you’re ready to save it.

00:46 You can take the DataFrame, and if you’re going to save it as a JSON file, call .to_json() and name the file 'df.json.gz',

01:01 orient it as 'records',

01:07 set lines=True, and for compression, you can actually put in 'gzip'. Before I run this, let me open up my project viewer.
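To see what `orient='records'` combined with `lines=True` actually produces inside the compressed file, you can decompress it with Python's standard `gzip` module. This sketch assumes a small made-up DataFrame in place of the abalone data; each row should come out as one standalone JSON object per line (the JSON Lines layout).

```python
import gzip
import json
import os
import tempfile

import pandas as pd

# Made-up stand-in data; any DataFrame behaves the same way
df = pd.DataFrame({"length": [0.455, 0.35], "rings": [15, 7]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "df.json.gz")
    df.to_json(path, orient="records", lines=True, compression="gzip")

    # gzip.open in text mode decompresses transparently
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        lines = [line for line in fh.read().splitlines() if line]

# Each non-empty line is a complete JSON object describing one row
records = [json.loads(line) for line in lines]
for record in records:
    print(record)
```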

01:22 And there you go. You can see the compressed version of that file has been saved. Let’s take this a step further to show why this matters. So import os.path, then take that DataFrame again, and this time save it as an uncompressed JSON file, so just 'df.json'.

01:44 Pass orient='records' again and set lines=True. And there you go. Now the uncompressed version is saved as well. With os.path, you can call the getsize() function, so if we call it on 'df.json' and then divide that by the size of the compressed version,
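The size comparison from the video can be reproduced with `os.path.getsize()`. Since the abalone data isn't loaded here, this sketch builds a hypothetical, deliberately repetitive DataFrame, so the exact ratio will differ from the roughly 10x seen in the video. The point is the pattern: write both files, then divide their sizes.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data: 10,000 rows with lots of repetition, which gzip handles well
n = 10_000
df = pd.DataFrame({
    "sex": ["M", "F", "I", "M"] * (n // 4),
    "rings": [i % 30 for i in range(n)],
})

with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "df.json")
    packed = os.path.join(tmp, "df.json.gz")

    # Same layout for both files; only the second one is compressed
    df.to_json(plain, orient="records", lines=True)
    df.to_json(packed, orient="records", lines=True, compression="gzip")

    # Ratio of uncompressed to compressed size on disk
    ratio = os.path.getsize(plain) / os.path.getsize(packed)

print(f"The uncompressed file is {ratio:.1f} times larger")
```

Real-world data is usually less repetitive than this toy example, so expect smaller ratios on your own DataFrames.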

02:13 you can see that the uncompressed version is almost 10 times larger than the compressed version. When you’re dealing with large datasets, this can make a huge difference, so consider using compression before you save your next pandas object.
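The `compression` argument isn't limited to gzip. In recent pandas versions it also accepts codecs such as `'bz2'` and `'xz'`, and its default value of `'infer'` picks the codec from the file extension. The sketch below, again on made-up data, compares three standard-library codecs and then relies on inference for a `.gz` path; the codecs and file names are just examples.

```python
import gzip
import os
import tempfile

import pandas as pd

# Made-up repetitive data for comparison purposes
df = pd.DataFrame({"rings": [i % 30 for i in range(5_000)]})

sizes = {}
with tempfile.TemporaryDirectory() as tmp:
    for codec in ["gzip", "bz2", "xz"]:
        path = os.path.join(tmp, f"df.json.{codec}")
        # An explicit codec overrides any inference from the extension
        df.to_json(path, orient="records", lines=True, compression=codec)
        sizes[codec] = os.path.getsize(path)

    # With the default compression="infer", the ".gz" suffix selects gzip
    inferred = os.path.join(tmp, "inferred.json.gz")
    df.to_json(inferred, orient="records", lines=True)
    with gzip.open(inferred, "rt", encoding="utf-8") as fh:
        first_line = fh.readline().strip()

for codec, size in sizes.items():
    print(f"{codec}: {size} bytes")
print(first_line)
```

Other writers, such as `to_csv()` and `to_pickle()`, accept the same `compression` argument, so the same approach carries over to other file formats.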

02:26 Thanks for watching.

andrewcheryl on April 12, 2019

Awesome course, full of really useful tips. Thank you!

Joe Tatusko RP Team on April 15, 2019

Glad you enjoyed it! Feel free to reach out if you have any questions :D

senatoduro8 on July 17, 2019

I love the clipboard trick. It’s my favorite so far, and it allows me to copy data from the “supporting material” page and start working with it without having to save it as a file first, since it’s a throwaway file anyway.

Thanks for the tutorial

Joe Tatusko RP Team on July 18, 2019

Yeah! Such a neat little feature that goes mostly unnoticed. Glad it could help speed up your workflow!

Pygator on Nov. 28, 2019

Finally finished! I had forgotten about this course, but more videos from you on core pandas data structures would be nice. Great tips. Also, you sound like the lead actor in Boyhood; I recently watched that movie.

Pakorn on Dec. 18, 2019

Great tips, Thanks!

Fahim on Aug. 14, 2020

At last completed it. Great content.

feygin on May 7, 2021

One of the most comprehensive and useful courses so far!
