
Exploring the Olympic Data

00:00 This lesson is an introduction to the first dataset that you’ll be cleaning, the Olympic data.

00:07 The first step you should take now that your project is set up is to do some initial exploration of the Olympic data. You'll find that the data is about the performance of countries in the Olympic Games.

00:18 The main thing about this data is that the headers need renaming because they aren't very descriptive. So without further ado, open your project in VS Code and get started with some initial data exploration.

00:32 Before writing any code or doing anything with pandas, the first step is just to inspect the data visually and see what we're dealing with. So open up the data-sets/ folder and open olympics.csv.
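If you'd rather peek at the raw file from Python than in the editor, a minimal sketch like the one below prints the first few lines exactly as they're stored, before any parsing. The helper name `peek_lines` and the sample contents are hypothetical stand-ins, and the sketch assumes the file is UTF-8 encoded; in your own project you'd point it at `data-sets/olympics.csv`.

```python
import itertools
import tempfile
from pathlib import Path


def peek_lines(path, n=5):
    """Return the first n raw lines of a text file without parsing them."""
    # Assumes UTF-8; adjust the encoding if the file uses something else
    with open(path, encoding="utf-8") as f:
        # islice stops after n lines, so a large file isn't read in full
        return [line.rstrip("\n") for line in itertools.islice(f, n)]


# Hypothetical stand-in for data-sets/olympics.csv, written to a temp file
sample = "0,1,2\n,? Summer,01 !\nCountry A,13,0\nCountry B,12,5\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "olympics.csv"
    path.write_text(sample, encoding="utf-8")
    first_lines = peek_lines(path, n=3)

for line in first_lines:
    print(line)
```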

00:44 Okay. So what are you looking at here? With CSV data, it often helps to turn off word wrap so that each row stays on a single line. You can do that with Alt + Z,

00:53 or you can press Control + Shift + P, search for Word Wrap, and select Toggle Word Wrap. Okay. At the top here, you can see a row of numbers, one per column, indicating how many columns there are.
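That row of numbers matters once you load the file with pandas, because by default `read_csv()` treats the first line as the header. The sketch below uses a small hypothetical sample that only mimics the layout described here (numbers row, then cryptic headers, then data); the values are invented. Passing `header=1` tells pandas to use the second line as the header and skip the line above it.

```python
import io

import pandas as pd

# Hypothetical sample mimicking the layout: a row of column numbers,
# then the real (cryptic) header row, then the data rows
RAW = """0,1,2,3
,? Summer,01 !,02 !
Country A,13,0,0
Country B,12,5,2
"""

# Default: the numbers row becomes the header, and the real headers
# end up as the first data row
naive = pd.read_csv(io.StringIO(RAW))
print(naive.columns.tolist())  # ['0', '1', '2', '3']

# header=1: use the second line as the header; the line above is skipped
df = pd.read_csv(io.StringIO(RAW), header=1)
print(df.columns.tolist())  # ['Unnamed: 0', '? Summer', '01 !', '02 !']
```

The empty first header cell comes through as `Unnamed: 0`, which is one more name you'd probably want to tidy up later.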

01:08 The next line looks like the headers, and as you can see, they're not very descriptive. They contain strange question marks and exclamation marks that don't really make much sense.
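Once the headers are loaded (for example from `df.columns`), a quick check can list the ones containing anything other than letters, digits, and spaces, which is a rough way to spot the question marks and exclamation marks. The header list below is a hypothetical stand-in, and treating punctuation as the signal for a "bad" header is just one simple heuristic.

```python
import re

# Hypothetical header names standing in for the ones in olympics.csv
headers = ["? Summer", "01 !", "02 !", "03 !", "Total"]

# Flag any header containing a character other than a letter, digit, or space
suspicious = [h for h in headers if re.search(r"[^A-Za-z0-9 ]", h)]
print(suspicious)  # ['? Summer', '01 !', '02 !', '03 !']
```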

01:19 The rest of the data, however, looks okay. There are countries, and there are numbers, which probably indicate how many medals each country has won in different Olympic Games. So that all looks good.
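To back up the visual impression that the numbers are clean, you could check the column types after loading. A column holding stray text would come through as non-numeric instead of as integers or floats. This is a sketch on the same kind of hypothetical sample as above, with `index_col=0` assumed so the country names become the row index.

```python
import io

import pandas as pd

# Hypothetical sample with the described layout: numbers row, cryptic headers, data
RAW = """0,1,2,3
,? Summer,01 !,02 !
Country A,13,0,0
Country B,12,5,2
"""

# header=1 uses the second line as headers; index_col=0 makes countries the index
df = pd.read_csv(io.StringIO(RAW), header=1, index_col=0)
print(df.dtypes)

# Any column pandas could not parse as numbers would be listed here
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)  # [] when every medal column parsed as numbers
```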

01:31 Let's go to the bottom by pressing Control + End. All this data looks reasonably well structured. The main thing you're going to need to do with this one (I'll press Control + Home to go back to the top) is take these headers here and rename them to something a bit more descriptive.
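Renaming headers in pandas is typically done with `DataFrame.rename()` and a dictionary that maps old names to new ones. In the sketch below, both the sample data and the target names are hypothetical: the new names are guesses about what the columns mean and should be confirmed against the data before you rely on them.

```python
import io

import pandas as pd

# Hypothetical sample with the described layout: numbers row, cryptic headers, data
RAW = """0,1,2,3
,? Summer,01 !,02 !
Country A,13,0,0
Country B,12,5,2
"""

df = pd.read_csv(io.StringIO(RAW), header=1, index_col=0)

# Hypothetical target names: guesses to confirm against the data's source
new_names = {
    "? Summer": "summer_games",
    "01 !": "summer_gold",
    "02 !": "summer_silver",
}

# errors="raise" fails loudly if a key doesn't match an existing column
df = df.rename(columns=new_names, errors="raise")
print(df.columns.tolist())  # ['summer_games', 'summer_gold', 'summer_silver']
```

By default, `rename()` silently ignores mapping keys that don't match any column, which can hide a typo in the mapping. Passing `errors="raise"` turns that case into a `KeyError` instead.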

01:51 That's it for exploring the Olympic data. You've seen that it's about the performance of countries in the Olympics, most likely the medals that each country has won.

02:00 And you’ve seen that the headers need a bit of cleanup. In the next lesson, you’ll be setting up some boilerplate for this specific cleaning script.
