Saving Your Data Locally

00:00 Now that you’ve downloaded your data, you’ll want to save it locally so that you don’t have to rely on your internet connection and on Wikipedia every time you work on your projects.

00:10 Now, thankfully, pandas supports many different types of files. For example, CSV and JSON files, HTML files (and in fact, you just used the HTML capabilities of pandas), Parquet files, which are a very common format in the world of data, and many, many more.
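As a rough illustration of that format support, here is a minimal sketch that round-trips a tiny table through CSV and JSON text in memory. The table and its values are placeholders invented for this example, not the Wikipedia data. Parquet is left out because it needs an optional extra dependency.

```python
import io

import pandas as pd

# Placeholder table standing in for real data (values are made up).
table = pd.DataFrame(
    {"country": ["Country A", "Country B"], "population": [1000, 2000]}
)

# With no path argument, to_csv() and to_json() return the text
# instead of writing a file.
csv_text = table.to_csv(index=False)
json_text = table.to_json()

# Each writer has a matching reader.
from_csv = pd.read_csv(io.StringIO(csv_text))
from_json = pd.read_json(io.StringIO(json_text))

print(from_csv)
print(from_json)
```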

00:29 But for this lesson, you’re going to keep it simple, and you’re just going to use a CSV file, which is a file you can easily inspect by hand. So what you’ll want to do now is go ahead and open your notebook.

00:41 Once you have your notebook open, you want to make sure that you still have the variable data from the previous lesson. But maybe you closed your notebook in the meantime.

00:49 In that case, all you have to do is rerun all of your cells. So you run all of them, you check that the response code is 200, and then you have your data.

00:59 Now to take this and save it locally, all you have to do is type data.to_csv(). So this is a method on your data object, and you just pass it the name of the file in which you want to save the data.

01:15 For this example, you’re going with world_population.csv. You execute the cell, and there you go. Now, if you open your terminal, navigate to the folder you’re working in, and run the command ls -al, you will find two files. One is the IPYNB file, which here is called Wikipedia.ipynb. It’s the notebook you’re currently working in, and it’s where the code that produces the world_population.csv file lives.
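The saving step can be sketched as follows. This assumes a small placeholder DataFrame in place of the data variable from the previous lesson, and it checks for the file with pathlib rather than with ls -al.

```python
from pathlib import Path

import pandas as pd

# Placeholder rows standing in for the table fetched in the previous
# lesson (the values are made up for illustration).
data = pd.DataFrame(
    {"country": ["Country A", "Country B"], "population": [1000, 2000]}
)

# Write the table to a CSV file in the current working directory.
csv_path = Path("world_population.csv")
data.to_csv(csv_path)

# Confirm the file is there, similar to spotting it in the ls -al output.
print(csv_path.exists())
print(csv_path.stat().st_size, "bytes")
```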

01:45 Now, the CSV file is a file you just created with a local version of your data, which means that if you close the notebook you’re currently working in, or if you want to do your analysis in a separate notebook, you don’t need to fetch the data from the internet.

01:58 Again, you can just use your local version, which by the way, you can also download from the video course resources linked below.

02:08 Now, how do you take your CSV file and load it into pandas so that you can work with it?

02:15 Go to your notebook. You can actually separate the two steps. The notebook you’ve been working in could be your step that fetches the data, and now you’re going to create a new notebook to process your data.

02:27 And in it, you’re just going to import pandas as pd, the very, very common abbreviation, and you’ll assign the result of pd.read_csv() to data. You just pass in the file path, and pandas reads your data in.
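A minimal sketch of the loading step is shown here. So that it runs on its own, it first writes a placeholder world_population.csv with made-up rows. In the course, that file already exists from the earlier step.

```python
import pandas as pd

# Stand-in for the file created earlier (placeholder values only).
pd.DataFrame(
    {"country": ["Country A", "Country B"], "population": [1000, 2000]}
).to_csv("world_population.csv")

# Load the local CSV file; no internet connection is needed here.
data = pd.read_csv("world_population.csv")

# Show the first rows of the table that pandas read in.
print(data.head())
```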

02:47 So now you can look at your variable data, and you can see that it still contains the same table with the same countries, the same dependencies, and the same population values.

02:57 And in the next lesson, you’re going to understand what this object is, because in Python, everything’s an object, and every object has a type. But what’s the type of this object? Is this a list?

03:07 Is this a string? Well, it’s neither of those. It’s a DataFrame. But what is a DataFrame? And by the way, why do you suddenly have two columns with the same numbers?

03:17 Why is that column duplicated? So these are the questions you’ll see answered in the next lesson.
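As a preview, here is a hedged sketch of one likely cause of the duplicated numbers. It's offered as an assumption, not as the course's own explanation. By default, to_csv() also writes the row index, and read_csv() then reads that index back as an ordinary column. The table values are placeholders.

```python
import io

import pandas as pd

# Placeholder table (made-up values).
original = pd.DataFrame(
    {"country": ["Country A", "Country B"], "population": [1000, 2000]}
)

# By default, to_csv() writes the row index as an unnamed first column.
csv_text = original.to_csv()

# Reading it back turns that index into a regular column.
reloaded = pd.read_csv(io.StringIO(csv_text))
print(list(reloaded.columns))  # ['Unnamed: 0', 'country', 'population']

# Option 1: tell read_csv() that the first column is the index.
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(with_index.columns))  # ['country', 'population']

# Option 2: don't write the index in the first place.
without_index = pd.read_csv(io.StringIO(original.to_csv(index=False)))
print(list(without_index.columns))  # ['country', 'population']
```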
