Forcing Lower Precision
Force lower precision. If you’re okay with less precise data types, then you can potentially save a significant amount of memory. First, get the data types with .dtypes.
The columns with the floating-point numbers are 64-bit floats, so each number of this
float64 type consumes 64 bits, which is 8 bytes. Each column has 20 numbers, and therefore requires 160 bytes.
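As a minimal sketch of this step, here’s a 20-row DataFrame of random values. The column names 'POP' and 'AREA' are made up for illustration; 'GDP' is the column mentioned in the lesson:

```python
import numpy as np
import pandas as pd

# Hypothetical 20-row DataFrame; 'POP' and 'AREA' are made-up
# column names used only for this sketch.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "POP": rng.random(20),
    "AREA": rng.random(20),
    "GDP": rng.random(20),
})

print(df.dtypes)  # all three columns default to float64

# Each float64 value takes 8 bytes, so a 20-row column needs 160 bytes.
print(df["GDP"].dtype.itemsize * len(df))  # 160
```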
You can verify this with .memory_usage(), which returns an instance of Series with the memory usage of each column in bytes. You can conveniently combine it with .sum() to get the memory for a group of columns.
This example shows how you can combine the three numeric columns, including 'GDP', to get their total memory requirement. Passing index=False excludes the row labels from the resulting Series object. For these three columns, you’ll need 480 bytes.
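A minimal sketch of this check, assuming a 20-row DataFrame with three float64 columns ('POP' and 'AREA' are made-up names; 'GDP' comes from the lesson):

```python
import numpy as np
import pandas as pd

# Hypothetical data; 'POP' and 'AREA' are assumed column names.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "POP": rng.random(20),
    "AREA": rng.random(20),
    "GDP": rng.random(20),
})

# Per-column memory in bytes, excluding the row labels.
per_column = df.memory_usage(index=False)
print(per_column)  # 160 bytes for each of the three columns

# Total across the three numeric columns: 3 * 160 = 480 bytes.
total = df[["POP", "AREA", "GDP"]].memory_usage(index=False).sum()
print(total)  # 480
```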
You can also extract the data values in the form of a NumPy array with
.values. Then, use the
.nbytes attribute to get the total bytes consumed by the items of the array.
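The same total can be read off the underlying NumPy array. A sketch with the same assumed columns ('POP' and 'AREA' are made-up names):

```python
import numpy as np
import pandas as pd

# Hypothetical data; 'POP' and 'AREA' are assumed column names.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "POP": rng.random(20),
    "AREA": rng.random(20),
    "GDP": rng.random(20),
})

# .values yields the underlying NumPy array; .nbytes reports the
# total bytes consumed by its items: 3 columns * 20 rows * 8 bytes.
arr = df.values
print(arr.nbytes)  # 480
```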
The result is the same 480 bytes. So, how do you save memory? In this case, you can specify that your numeric columns, including 'GDP', should have the type float32. Use the optional parameter dtype to do this.
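A sketch of the conversion, again with the assumed column names 'POP' and 'AREA'. The DataFrame constructor accepts a dtype parameter, as do reader functions like pd.read_csv(), which also take a per-column mapping such as {"GDP": "float32"}:

```python
import numpy as np
import pandas as pd

# Hypothetical data; 'POP' and 'AREA' are assumed column names.
rng = np.random.default_rng(seed=0)
data = {
    "POP": rng.random(20),
    "AREA": rng.random(20),
    "GDP": rng.random(20),
}

# The dtype parameter forces every column to 32-bit floats here.
df = pd.DataFrame(data, dtype="float32")
print(df.dtypes)  # float32 for each column
```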
02:24 You can now verify that each numeric column needs 80 bytes, or 4 bytes per item.
Each value is a floating-point number of 32 bits, or 4 bytes. The three numeric columns contain 20 items each. In total, you’ll need 240 bytes of memory when you work with the type float32. This is half the size of the 480 bytes you’d need with
float64 numbers. In addition to saving memory, you can significantly reduce the time required to process data by using
float32 instead of
float64 in some cases.
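You can verify the savings with the same memory check as before. A sketch with the assumed columns 'POP' and 'AREA' alongside 'GDP':

```python
import numpy as np
import pandas as pd

# Hypothetical data; 'POP' and 'AREA' are assumed column names.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "POP": rng.random(20),
    "AREA": rng.random(20),
    "GDP": rng.random(20),
}, dtype="float32")

# Each float32 value takes 4 bytes: 20 rows -> 80 bytes per column.
per_column = df.memory_usage(index=False)
print(per_column)  # 80 for each column

# Total for three columns: 240 bytes, half the float64 total of 480.
print(per_column.sum())  # 240
```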
03:08 Next up, using chunks to deal with your data in segments.