Omitting Rows
00:00 Omit rows. When you test an algorithm for data processing or machine learning, you often don’t need the entire dataset. It’s convenient to load only a subset of the data to speed up the process.
00:14
The pandas read_csv() and read_excel() functions have some optional parameters that allow you to select which rows you want to load.
00:23
skiprows: either the number of rows to skip at the beginning of the file if it’s an integer, or the zero-based indices of the rows to skip if it’s a list-like object. skipfooter: the number of rows to skip at the end of the file. nrows: the number of rows to read.
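As a rough sketch of how these three parameters behave, the snippet below reads a small in-memory CSV instead of a real file. The csv_text string and its ID and VALUE columns are made up for illustration. engine="python" is passed alongside skipfooter because the default C parser doesn’t support that option.

```python
import io

import pandas as pd

# Hypothetical six-row CSV held in memory, standing in for a file on disk.
csv_text = "ID,VALUE\na,1\nb,2\nc,3\nd,4\ne,5\nf,6"

# nrows: read only the first three data rows after the header.
first_three = pd.read_csv(io.StringIO(csv_text), nrows=3)

# skipfooter: drop the last two lines of the file.
# The C parser doesn't support skipfooter, so use the Python engine.
without_tail = pd.read_csv(io.StringIO(csv_text), skipfooter=2, engine="python")

# skiprows as a list: skip file lines 1 and 2 (rows "a" and "b").
# Line 0 is the header, so it's still loaded.
without_head = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2])

print(first_three)
print(without_tail)
print(without_head)
```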
00:44 Here’s how you would skip rows with odd, zero-based indices, keeping the even ones.
00:57
In this example, skiprows is passed the range object range(1, 20, 2), which yields the odd values 1, 3, 5, et cetera, up to 19.
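A minimal sketch of that call, assuming a generated stand-in for data.csv. The LABEL and VALUE column names and the row1 to row20 labels are placeholders, not the contents of the lesson’s file.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv: one header line plus 20 data rows.
# The row labelled "rowN" sits on zero-based file line N.
csv_text = "LABEL,VALUE\n" + "\n".join(f"row{i},{i}" for i in range(1, 21))

# range(1, 20, 2) yields 1, 3, 5, ..., 19, so every odd-numbered line is skipped.
# Line 0, the header, is even and therefore kept.
df = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,
    skiprows=range(1, 20, 2),
)

print(df.index.tolist())
```

Only the rows on even-numbered lines remain, so the resulting DataFrame has 10 of the 20 data rows.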
01:13
The instances of the Python built-in class range behave like sequences. The first row of the file data.csv is the header row. It has the index 0, so pandas loads it in. The second row with index 1 corresponds to the label CHN, and pandas skips it.
01:31
The third row with the index 2 and the label IND is loaded, and so on. If you want to choose rows randomly, then skiprows could be a list or NumPy array with pseudorandom numbers obtained either with pure Python or with NumPy.
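One way to sketch the random case in pure Python is with random.sample(). The data here is generated in memory, and the column names, the row count, and the choice to skip 15 rows are all assumptions made for illustration.

```python
import io
import random

import pandas as pd

# Hypothetical CSV: one header line plus 20 data rows.
# The row with VALUE N sits on zero-based file line N.
n_rows = 20
csv_text = "LABEL,VALUE\n" + "\n".join(f"row{i},{i}" for i in range(1, n_rows + 1))

random.seed(0)  # fixed seed so the selection is reproducible

# Pick 15 distinct line indices to skip. Sampling starts at 1,
# so the header on line 0 is never skipped.
skip = random.sample(range(1, n_rows + 1), k=15)

df = pd.read_csv(io.StringIO(csv_text), index_col=0, skiprows=skip)

print(sorted(skip))
print(df)
```

Sampling without replacement guarantees 15 distinct indices, so exactly 5 data rows are loaded.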
01:48 Next, you’ll look at reducing precision and the effect it has on memory use.
