Installing pandas and Preparing Data
00:00 Installing pandas and preparing data.
00:04 The code in this course is executed with Python 3.9 and pandas 1.2.5. It’s generally considered best practice to work in a virtual environment when you’re working with any new package. If you’re not sure how to do this, check out this Real Python course.
00:22 First things first, you’ll need the pandas library. You can install it with the line seen onscreen.
00:32 Once the installation process is complete, you should have pandas installed and ready to use with Python.
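The installation command itself only appears onscreen. One common option, assuming pip is available, is to run python -m pip install pandas from a terminal. To confirm that the installation worked, you can import pandas in the REPL and check its version. The course uses 1.2.5, but this check works with any version.

```python
# Confirm that pandas is installed and importable.
import pandas as pd

# pandas exposes its version string as pd.__version__
# (for example, "1.2.5" for the version used in this course).
print(pd.__version__)
```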
00:40 In this course, you’ll use data related to twenty countries. Here’s an overview of the data and sources you’ll be working with. Country is denoted by the country name.
00:49 Each country is in the top ten for population, area, or gross domestic product. The row labels for the dataset are the three-letter country codes defined in ISO 3166-1.
01:02 The column label for the dataset is COUNTRY. Population is expressed in millions. The data comes from a list of countries and dependencies by population on Wikipedia.
01:13 The column label for the dataset is POP. Area is expressed in thousands of square kilometers. The data comes from a list of countries and dependencies by area on Wikipedia, and the column label for the dataset is AREA. Gross domestic product is expressed in millions of U.S. dollars, according to United Nations data for 2017.
01:35 This data was also sourced from Wikipedia, and the column label for the dataset is GDP. Continent is either Africa, Asia, Oceania, Europe, North America, or South America.
01:48 This information is also from Wikipedia, and the column label for the dataset is CONT. Independence day is a date that commemorates a nation’s independence.
01:58 The data comes from the list of national independence days on Wikipedia, and the dates are shown in ISO 8601 format. The first four digits represent the year, the next two digits represent the month, and the last two represent the day of the month.
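To illustrate the format, here’s a minimal sketch using Python’s standard datetime module, which can parse an ISO 8601 calendar date string such as 1776-07-04 (the United States’ Independence Day, chosen only as an example). The course doesn’t perform this step here; the sketch just shows how the year, month, and day map onto the digits.

```python
from datetime import date

# An ISO 8601 calendar date: four-digit year, two-digit month, two-digit day.
ind_day = date.fromisoformat("1776-07-04")

print(ind_day.year, ind_day.month, ind_day.day)  # 1776 7 4
```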
02:12 The column label for the dataset is IND_DAY. This is how the data looks as a table. You may notice that some of the data is missing. For example, the continent for Russia is not specified because it spreads across both Europe and Asia.
02:29 There are also several missing independence days because the data source omits them. This data can be organized in Python using a nested dictionary. Each row of the table is written as an inner dictionary whose keys are the column names and values are the corresponding data.
02:46 These dictionaries are then collected as the values in the outer data dictionary. The corresponding keys for data are the three-letter country codes.
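As a rough sketch of that structure, the snippet below builds an abbreviated version of the dictionary. It’s an illustration under stated assumptions rather than the course’s data: only three countries appear, the numeric POP, AREA, and GDP columns are left out, and the exact entries in data.py may differ.

```python
# Abbreviated, illustrative version of the nested dictionary.
# Outer keys: three-letter country codes. Inner keys: column labels.
# POP, AREA, and GDP are deliberately omitted from this sketch.
data = {
    "IND": {"COUNTRY": "India", "CONT": "Asia", "IND_DAY": "1947-08-15"},
    "USA": {"COUNTRY": "United States", "CONT": "North America", "IND_DAY": "1776-07-04"},
    # Only COUNTRY is shown for Russia; its continent is missing in the course data.
    "RUS": {"COUNTRY": "Russia"},
}

print(data["IND"]["COUNTRY"])  # India
```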
02:56 While you could type the contents of this dictionary manually, it’s probably going to be easier and more accurate if you download the included course files, open up data.py in a text editor, copy the contents, and then paste them into the REPL as seen onscreen.
03:13 You can use this data to create an instance of a pandas DataFrame. First, you’ll need to import pandas, and here you can see it being imported using the conventional alias pd.
03:26 Now that pandas is imported, you can use the DataFrame constructor and data to create a DataFrame object.
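Here’s a minimal sketch of that constructor call, using a trimmed stand-in for the course’s data dictionary (the entries are illustrative only, with the numeric columns omitted). Because data is a dictionary of dictionaries, pandas uses the outer keys as column labels and the inner keys as row labels, and any label missing from an inner dictionary shows up as NaN.

```python
import pandas as pd

# Trimmed, illustrative stand-in for the course's data dictionary.
data = {
    "IND": {"COUNTRY": "India", "CONT": "Asia", "IND_DAY": "1947-08-15"},
    "USA": {"COUNTRY": "United States", "CONT": "North America", "IND_DAY": "1776-07-04"},
    "RUS": {"COUNTRY": "Russia"},
}

# Outer keys (country codes) become columns; inner keys become row labels.
# Russia has no CONT or IND_DAY entry here, so those cells are NaN.
df = pd.DataFrame(data=data)
print(df)
```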
03:33 The data is organized in such a way that the country codes correspond to columns, because pandas uses the outer dictionary’s keys as column labels. You can transpose a DataFrame, swapping its rows and columns, with the .T property, as seen at the end of the line onscreen.
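Using the same kind of trimmed, illustrative dictionary (not the full course data), the sketch below shows what .T does: the country codes move from the columns to the row labels, which matches the table layout described earlier, with one row per country.

```python
import pandas as pd

# Trimmed, illustrative stand-in for the course's data dictionary.
data = {
    "IND": {"COUNTRY": "India", "CONT": "Asia", "IND_DAY": "1947-08-15"},
    "USA": {"COUNTRY": "United States", "CONT": "North America", "IND_DAY": "1776-07-04"},
    "RUS": {"COUNTRY": "Russia"},  # no CONT, mirroring the missing continent
}

# Without .T, the country codes would be the columns.
# Appending .T transposes the result so each country becomes a row.
df = pd.DataFrame(data=data).T

print(df.index.tolist())  # ['IND', 'USA', 'RUS']
print(df.loc["RUS"])      # CONT and IND_DAY are NaN for Russia
```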
03:48 Now you have your DataFrame object populated with information about each country, so you’re ready to start working with files. The first format you’ll be looking at is CSV.
pnmcdos on April 8, 2022
Am I missing something? I don’t seem to see where the referenced included course files are accessible for download.
Chris Bailey RP Team on April 8, 2022
Hi @pnmcdos, you can find the course files in the “Supporting Material” drop-down just below the video and above these comments. You will see a link to the original article, a .PDF of the slides, and a .zip file with the code and data.