00:00 Use chunks to iterate through files. Another way to deal with very large datasets is to split the data into smaller chunks and process one chunk at a time.
If you use
read_csv(), then you can specify the optional parameter
chunksize. It defaults to
None and can take on an integer value that indicates the number of rows in a single chunk. When it’s set to an integer,
read_csv() returns an iterable that you can use in a
for loop to get and process only a fragment of the dataset in each iteration.
In this example, the first iteration of the
for loop returns a
DataFrame with the first eight rows of the dataset only. The second iteration returns another
DataFrame with the next eight rows, and the third and last iteration returns the remaining four rows. In each iteration, you get and process the
DataFrame with a number of rows equal to
chunksize. It’s possible to have fewer rows than the value of
chunksize in the last iteration, as you’ve seen, and you can use this functionality to control the amount of memory required to process data and keep that amount reasonably small.
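The pattern described above can be sketched like this, assuming a hypothetical file named data.csv with 20 rows and a chunksize of 8, matching the iteration counts in the example:

```python
import pandas as pd

# Create a small CSV with 20 rows just for demonstration.
pd.DataFrame({"x": range(20)}).to_csv("data.csv", index=False)

chunk_sizes = []
# With chunksize set, read_csv() returns an iterable of DataFrames
# instead of one large DataFrame.
for chunk in pd.read_csv("data.csv", chunksize=8):
    # Each chunk is a regular DataFrame; process it here.
    chunk_sizes.append(len(chunk))

print(chunk_sizes)  # [8, 8, 4]
```

Only one chunk is held in memory at a time, which is what keeps the memory footprint small regardless of the total file size.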
01:55 Now that you’ve seen techniques for working with big data, let’s review what you’ve learned in this course.