For more information on the REPL used in these videos, you can check out bpython and the Real Python tutorial Discover bpython: A Python REPL With IDE-Like Features.
Getting Started With pandas Sort Methods
00:00 Getting Started With pandas Sort Methods. As a quick reminder, a DataFrame is a data structure with labeled axes for both rows and columns. You can sort a DataFrame by row or column value, as well as by row or column index. Both rows and columns have indices, which are numerical representations of where the data is in your DataFrame.
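If it helps to see those two kinds of sorting side by side, here is a minimal sketch. The tiny DataFrame is made up purely for illustration (the names and mpg values are not from the fuel economy dataset), and it only uses the standard `sort_values()` and `sort_index()` methods.

```python
import pandas as pd

# Hypothetical toy data, invented for this illustration only.
df = pd.DataFrame(
    {"make": ["Alpha", "Beta", "Gamma"], "mpg": [22, 30, 25]},
    index=[2, 0, 1],
)

# Sort rows by the values in a column.
by_value = df.sort_values("mpg")

# Sort rows by the row index.
by_row_index = df.sort_index()

# Sort columns by their labels (axis=1 targets the column index).
by_column_label = df.sort_index(axis=1, ascending=False)

print(by_value)
print(by_row_index)
print(by_column_label)
```

Each call returns a new DataFrame and leaves `df` unchanged, so you can compare the results freely.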
00:55 The EPA fuel economy dataset is great because it has many different types of information that you can sort on, from textual to numeric data types. The dataset contains eighty-three columns in total.
bpython offers a number of extra features compared to the standard Python REPL, including color-coding of syntax, which makes it easier to see what’s happening on-screen. However, every command you see will run exactly the same in the standard Python REPL, which you will typically access by typing
01:51 For analysis purposes, you’ll be looking at miles-per-gallon data on vehicles by make, model, year, and other vehicle attributes. You can specify which columns to read into a DataFrame, and in this course, you’ll only need a subset of the available columns.
On-screen, you’ll see the commands to read the relevant columns of the fuel economy dataset into a DataFrame and to display the first five rows. First, pandas is imported with the usual alias of
02:45 The next line creates a DataFrame by downloading the CSV data from the selected URL, limiting the size of the DataFrame to the first hundred rows. Note that the fuel economy dataset is around eighteen megabytes.
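As a rough sketch of that pattern, the example below reads a subset of columns and a limited number of rows with `pd.read_csv()`. To keep it runnable without downloading eighteen megabytes, it uses a small in-memory CSV as a stand-in. The column names and rows are assumptions made up for illustration, not the names used in the course.

```python
import io

import pandas as pd

# Stand-in for the downloaded CSV: a tiny in-memory file with made-up rows.
# The column names are assumptions, not confirmed names from the dataset.
csv_text = """make,model,year,mpg,notes
Alpha,One,2001,30,x
Beta,Two,2002,22,y
Gamma,Three,2003,25,z
"""

column_subset = ["make", "model", "year", "mpg"]

df = pd.read_csv(
    io.StringIO(csv_text),  # read_csv() also accepts a URL string here
    usecols=column_subset,  # read only these columns
    nrows=2,                # stop after this many data rows
)

# Display the first five rows (here there are only two).
print(df.head())
```

With the real dataset, you would pass the CSV's URL instead of the `StringIO` object and set `nrows=100` to get the first hundred rows.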
04:08 The row index of the DataFrame is outlined in blue on-screen. An index isn’t considered a column, and you typically have only a single row index. The row index can be thought of as the row numbers, which start from zero.
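You can check this yourself with a short sketch. The DataFrame below is a made-up example, not the course data. When you don't supply an index, pandas creates a default one that counts up from zero.

```python
import pandas as pd

# Hypothetical toy data, invented for this illustration only.
df = pd.DataFrame({"make": ["Alpha", "Beta", "Gamma"], "mpg": [22, 30, 25]})

# The default row index is a RangeIndex that starts at zero.
print(df.index)

row_labels = list(df.index)
column_labels = list(df.columns)

print(row_labels)
print(column_labels)
```

Notice that the row index doesn't appear among the column labels, which matches the point that an index isn't considered a column.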