Omitting Rows
00:00 Omit rows. When you test an algorithm for data processing or machine learning, you often don’t need the entire dataset. It’s convenient to load only a subset of the data to speed up the process.
00:14 The pandas read_csv() and read_excel() functions have some optional parameters that allow you to select which rows you want to load.
00:23 skiprows: either the number of rows to skip at the beginning of the file if it's an integer, or the zero-based indices of the rows to skip if it's a list-like object.
skipfooter: the number of rows to skip at the end of the file.
nrows: the number of rows to read.
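As a rough sketch of these three parameters, the snippet below reads a small in-memory CSV instead of a real file. The CSV_TEXT contents and the variable names are assumptions made up for illustration. The Python engine is requested explicitly for skipfooter because the default C engine doesn't support that option.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file: one header row plus six data rows.
CSV_TEXT = "label,value\na,1\nb,2\nc,3\nd,4\ne,5\nf,6"

# skiprows as a list: zero-based line indices in the file.
# Index 0 is the header, so skipping 1 and 2 drops rows "a" and "b".
skipped_head = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0, skiprows=[1, 2])

# skipfooter: drop the last two lines of the file.
skipped_tail = pd.read_csv(
    io.StringIO(CSV_TEXT), index_col=0, skipfooter=2, engine="python"
)

# nrows: read only the first three data rows after the header.
first_three = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0, nrows=3)

print(skipped_head.index.tolist())
print(skipped_tail.index.tolist())
print(first_three.index.tolist())
```

Note that an integer skiprows counts lines from the very top of the file, so it would skip the header line as well.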
00:44 Here’s how you would skip rows with odd, zero-based indices, keeping the even ones.
00:57 In this example, skiprows is passed the range object with 1, 20, and 2 as the parameters, which corresponds to the odd values 1, 3, 5, et cetera, up to 19.
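The call being described would look roughly like the sketch below. It doesn't use the course's data.csv. Instead it builds a hypothetical in-memory file with a header and twenty data rows, and the labels r1 to r20 are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical file contents: header at index 0, data rows at indices 1..20.
lines = ["label,value"] + [f"r{i},{i * 10}" for i in range(1, 21)]
csv_text = "\n".join(lines)

# range(1, 20, 2) yields 1, 3, 5, ..., 19, so every odd-indexed line is skipped.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, skiprows=range(1, 20, 2))

# Only the even-indexed lines remain: the header (0) and data rows 2, 4, ..., 20.
print(df.index.tolist())
```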
01:13 The instances of the Python built-in class range behave like sequences. The first row of the file data.csv is the header row. It has the index 0, so pandas loads it in. The second row with index 1 corresponds to the label CHN, and pandas skips it.
01:31 The third row with the index 2 and the label IND is loaded, and so on. If you want to choose rows randomly, then skiprows could be a list or NumPy array with pseudorandom numbers obtained either with pure Python or with NumPy.
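One possible way to do the random selection with pure Python is sketched here. The in-memory data, the seed, and the choice to skip fifteen of twenty rows are all assumptions for illustration. The range of candidates starts at 1 so that the header row at index 0 is never skipped.

```python
import io
import random

import pandas as pd

# Hypothetical file contents: header at index 0, data rows at indices 1..20.
lines = ["label,value"] + [f"r{i},{i * 10}" for i in range(1, 21)]
csv_text = "\n".join(lines)

# Pick 15 distinct data-row indices to skip; a fixed seed keeps the run repeatable.
rng = random.Random(0)
rows_to_skip = sorted(rng.sample(range(1, 21), k=15))

# The 5 rows that weren't picked are the ones pandas loads.
sample_df = pd.read_csv(io.StringIO(csv_text), index_col=0, skiprows=rows_to_skip)

print(rows_to_skip)
print(sample_df)
```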
01:48 Next, you’ll look at reducing precision and the effect it has on memory use.