Forcing Lower Precision
00:00
Force lower precision. If you’re okay with less precise data types, then you can potentially save a significant amount of memory. First, get the data types with .dtypes again.
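The DataFrame itself isn’t reproduced in this transcript, so here’s a minimal stand-in: a hypothetical df with 20 rows and the numeric columns 'POP', 'AREA', and 'GDP' used in this lesson, filled with placeholder values rather than the actual course data.

    import numpy as np
    import pandas as pd

    # Hypothetical stand-in for the lesson's 20-row DataFrame;
    # the values are placeholders, not the course's dataset.
    rng = np.random.default_rng(seed=0)
    df = pd.DataFrame({
        "COUNTRY": [f"Country {i}" for i in range(1, 21)],
        "POP": rng.uniform(1, 1_500, size=20),     # population, millions
        "AREA": rng.uniform(10, 17_000, size=20),  # area, thousands of km²
        "GDP": rng.uniform(10, 20_000, size=20),   # GDP, billions of USD
    })

    print(df.dtypes)
    # POP, AREA, and GDP come out as float64; COUNTRY is a non-numeric object column.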
00:28
The columns with the floating-point numbers are 64-bit floats, so each number of this float64 type consumes 64 bits, which is 8 bytes. Each column has 20 numbers, and therefore requires 160 bytes.
00:42
You can verify this with .memory_usage(). This method returns an instance of Series with the memory usage of each column in bytes.
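Continuing with the sketch DataFrame from above:

    # Per-column memory usage in bytes, returned as a Series.
    print(df.memory_usage())
    # Each float64 column reports 20 rows × 8 bytes = 160 bytes; the exact
    # figure shown for the Index depends on your pandas version.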
00:55
You can conveniently combine it with .loc[] and .sum() to get the memory for a group of columns.
01:05
This example shows how you can combine these calls for the numeric columns 'POP', 'AREA', and 'GDP' to get their total memory requirement.
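The on-screen code isn’t reproduced in the transcript; a sketch along these lines would produce the result discussed next:

    # Total memory for just the three numeric columns.
    total = df.memory_usage(index=False).loc[["POP", "AREA", "GDP"]].sum()
    print(total)  # 480 (3 columns × 20 rows × 8 bytes)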
01:14
The argument index=False excludes data for row labels from the resulting Series object. For these three columns, you’ll need 480 bytes.
01:26
You can also extract the data values in the form of a NumPy array with .to_numpy() or .values. Then, use the .nbytes attribute to get the total bytes consumed by the items of the array.
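For example, continuing the sketch:

    # Same total, computed from the underlying NumPy array.
    values = df.loc[:, ["POP", "AREA", "GDP"]].to_numpy()
    print(values.nbytes)  # 480: 60 float64 items × 8 bytes each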
01:40
The result is the same 480 bytes. So, how do you save memory? In this case, you can specify that your numeric columns 'POP', 'AREA', and 'GDP' should have the type float32.
01:54
Use the optional parameter dtype to do this.
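The course’s on-screen code isn’t included here; one way to pass dtype is through .astype(), which casts the three columns and returns a new DataFrame (you could also supply dtype when loading or creating the data). The name df_small is a hypothetical one for this sketch.

    # Cast only the numeric columns to 32-bit floats.
    df_small = df.astype(dtype={"POP": "float32", "AREA": "float32", "GDP": "float32"})
    print(df_small.dtypes)  # POP, AREA, and GDP are now float32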
02:24
You can now verify that each numeric column needs 80 bytes, or 4 bytes per item.
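For instance:

    # Each of POP, AREA, and GDP now reports 80 bytes (20 rows × 4 bytes).
    print(df_small.memory_usage(index=False).loc[["POP", "AREA", "GDP"]])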
02:37
Each value is a floating-point number of 32 bits, or 4 bytes. The three numeric columns contain 20 items each. In total, you’ll need 240 bytes of memory when you work with the type float32.
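Summing those columns in the sketch confirms the arithmetic:

    print(df_small.memory_usage(index=False).loc[["POP", "AREA", "GDP"]].sum())  # 240 with float32
    print(df.memory_usage(index=False).loc[["POP", "AREA", "GDP"]].sum())        # 480 with float64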
02:52
This is half the size of the 480 bytes you’d need with float64 numbers. In addition to saving memory, you can significantly reduce the time required to process data by using float32 instead of float64 in some cases.
03:08
Next up, using chunks to deal with your data in segments.