Forcing Lower Precision
Force lower precision. If you’re okay with less precise data types, then you can potentially save a significant amount of memory. First, get the data types with .dtypes.
The columns with the floating-point numbers are 64-bit floats, so each number of this
float64 type consumes 64 bits, which is 8 bytes. Each column has 20 numbers, and therefore requires 160 bytes.
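As a minimal sketch of this step, here’s a 20-row DataFrame of random values. The column names 'POP' and 'AREA' are made up for illustration; 'GDP' is the column mentioned in the lesson:

```python
import numpy as np
import pandas as pd

# Hypothetical 20-row DataFrame; 'POP' and 'AREA' are made-up
# column names used only for this sketch.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "POP": rng.random(20),
    "AREA": rng.random(20),
    "GDP": rng.random(20),
})

print(df.dtypes)  # all three columns default to float64

# Each float64 value takes 8 bytes, so a 20-row column needs 160 bytes.
print(df["GDP"].dtype.itemsize * len(df))  # 160
```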
You can verify this with .memory_usage(), which returns an instance of Series with the memory usage of each column in bytes. You can conveniently combine it with .sum() to get the memory for a group of columns.
This example shows how you can combine the three numeric columns, including 'GDP', to get their total memory requirement. Passing index=False excludes the row labels from the resulting Series object. For these three columns, you’ll need 480 bytes.
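A minimal sketch of this check, assuming a 20-row DataFrame with three float64 columns ('POP' and 'AREA' are made-up names; 'GDP' comes from the lesson):

```python
import numpy as np
import pandas as pd

# Hypothetical data; 'POP' and 'AREA' are assumed column names.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "POP": rng.random(20),
    "AREA": rng.random(20),
    "GDP": rng.random(20),
})

# Per-column memory in bytes, excluding the row labels.
per_column = df.memory_usage(index=False)
print(per_column)  # 160 bytes for each of the three columns

# Total across the three numeric columns: 3 * 160 = 480 bytes.
total = df[["POP", "AREA", "GDP"]].memory_usage(index=False).sum()
print(total)  # 480
```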
You can also extract the data values in the form of a NumPy array with
.values. Then, use the
.nbytes attribute to get the total bytes consumed by the items of the array.
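The same total can be read off the underlying NumPy array. A sketch with the same assumed columns ('POP' and 'AREA' are made-up names):

```python
import numpy as np
import pandas as pd

# Hypothetical data; 'POP' and 'AREA' are assumed column names.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "POP": rng.random(20),
    "AREA": rng.random(20),
    "GDP": rng.random(20),
})

# .values yields the underlying NumPy array; .nbytes reports the
# total bytes consumed by its items: 3 columns * 20 rows * 8 bytes.
arr = df.values
print(arr.nbytes)  # 480
```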
The result is the same 480 bytes. So, how do you save memory? In this case, you can specify that your numeric columns, including 'GDP', should have the type float32. Use the optional parameter dtype to do this.
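A sketch of the conversion, again with the assumed column names 'POP' and 'AREA'. The DataFrame constructor accepts a dtype parameter, as do reader functions like pd.read_csv(), which also take a per-column mapping such as {"GDP": "float32"}:

```python
import numpy as np
import pandas as pd

# Hypothetical data; 'POP' and 'AREA' are assumed column names.
rng = np.random.default_rng(seed=0)
data = {
    "POP": rng.random(20),
    "AREA": rng.random(20),
    "GDP": rng.random(20),
}

# The dtype parameter forces every column to 32-bit floats here.
df = pd.DataFrame(data, dtype="float32")
print(df.dtypes)  # float32 for each column
```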
02:24 You can now verify that each numeric column needs 80 bytes, or 4 bytes per item.
Each value is a floating-point number of 32 bits, or 4 bytes. The three numeric columns contain 20 items each. In total, you’ll need 240 bytes of memory when you work with the type float32. This is half the size of the 480 bytes you’d need with
float64 numbers. In addition to saving memory, you can significantly reduce the time required to process data by using
float32 instead of
float64 in some cases.
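You can verify the savings with the same memory check as before. A sketch with the assumed columns 'POP' and 'AREA' alongside 'GDP':

```python
import numpy as np
import pandas as pd

# Hypothetical data; 'POP' and 'AREA' are assumed column names.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "POP": rng.random(20),
    "AREA": rng.random(20),
    "GDP": rng.random(20),
}, dtype="float32")

# Each float32 value takes 4 bytes: 20 rows -> 80 bytes per column.
per_column = df.memory_usage(index=False)
print(per_column)  # 80 for each column

# Total for three columns: 240 bytes, half the float64 total of 480.
print(per_column.sum())  # 240
```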
03:08 Next up, using chunks to deal with your data in segments.