chunksize defaults to None and can take an integer value that indicates the number of rows in a single chunk. When it's set to an integer, read_csv() returns an iterable object that you can use in a for loop to get and process only a fragment of the dataset in each iteration.
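As a minimal sketch of that pattern, the snippet below uses a small hypothetical CSV held in an io.StringIO buffer in place of a real file. The column names, the six rows, and chunksize=4 are illustrative assumptions. Each pass through the loop receives an ordinary DataFrame.

```python
import io

import pandas as pd

# Hypothetical six-row CSV kept in memory; in practice you would pass a file path
csv_data = io.StringIO("id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(6)))

chunk_types = []
for chunk in pd.read_csv(csv_data, chunksize=4):
    # Each iteration yields a regular DataFrame holding at most 4 rows
    chunk_types.append(type(chunk).__name__)
    print(chunk)
```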
The first iteration of the for loop returns a DataFrame with only the first eight rows of the dataset. The second iteration returns another DataFrame with the next eight rows, and the third and last iteration returns the remaining four rows. In each iteration, you get and process a DataFrame with a number of rows equal to chunksize.
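The 8, 8, 4 split can be reproduced with a sketch like the one below. It assumes a hypothetical 20-row dataset (consistent with the row counts above, but not the original file) and chunksize=8, and it records how many rows each chunk contains.

```python
import io

import pandas as pd

# Hypothetical 20-row dataset written to an in-memory CSV buffer
df = pd.DataFrame({"row": range(20), "squared": [i * i for i in range(20)]})
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)  # rewind so read_csv starts at the header

chunk_sizes = []
for chunk in pd.read_csv(buffer, chunksize=8):
    chunk_sizes.append(len(chunk))  # rows in this chunk

print(chunk_sizes)  # [8, 8, 4]
```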
As you've seen, the last iteration can have fewer rows than the value of chunksize. Because each iteration holds only one chunk as a DataFrame, you can use this functionality to control the amount of memory required to process data and keep that amount reasonably small.
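One common way to take advantage of this is to compute an aggregate chunk by chunk instead of loading the whole dataset at once. The sketch below assumes a hypothetical amount column with the values 1 to 100 and chunksize=30. The in-memory buffer only stands in for a large file on disk, so it does not itself save memory.

```python
import io

import pandas as pd

# Hypothetical data: a single "amount" column holding 1..100
df = pd.DataFrame({"amount": range(1, 101)})
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

total = 0
largest_chunk = 0
for chunk in pd.read_csv(buffer, chunksize=30):
    # Only the current chunk's rows are materialized as a DataFrame here
    total += int(chunk["amount"].sum())
    largest_chunk = max(largest_chunk, len(chunk))

print(total)          # 5050
print(largest_chunk)  # 30
```

The chunks here contain 30, 30, 30, and 10 rows, so no more than 30 rows are held in a DataFrame at any point while the running total is built up.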