Explore Your Dataset With pandas (Overview)

Do you have a large dataset that’s full of interesting insights, but you’re not sure where to start exploring it? Has your boss asked you to generate some statistics from it, but they’re not so easy to extract? These are precisely the use cases where pandas and Python can help you! With these tools, you’ll be able to slice a large dataset down into manageable parts and glean insight from that information.

In this course, you’ll learn how to:

  • Calculate metrics about your data
  • Perform basic queries and aggregations
  • Discover and handle incorrect data, inconsistencies, and missing values
  • Visualize your data with plots

You’ll also learn about the differences between the main data structures that pandas and Python use.
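As a rough preview of the first three goals listed above, here is a minimal sketch on a tiny made-up table. The column names and values are invented for illustration and are not taken from the course’s data.

```python
import pandas as pd

# Hypothetical data, invented for this sketch (not the course's dataset)
games = pd.DataFrame(
    {
        "team": ["A", "A", "B", "B"],
        "points": [100, 90, None, 110],  # None becomes a missing value (NaN)
    }
)

# Metric: mean() skips missing values by default
mean_points = games["points"].mean()

# Basic query: keep only rows with more than 95 points
high_scoring = games[games["points"] > 95]

# Aggregation: average points per team
points_per_team = games.groupby("team")["points"].mean()

# Missing values: count them, then drop the affected rows
missing_count = games["points"].isna().sum()
cleaned = games.dropna(subset=["points"])

print(mean_points, len(high_scoring), missing_count, len(cleaned))
```

Dropping rows is only one way to handle missing values; whether that is appropriate depends on the dataset.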

To follow along in this course, you’ll need just Python and pandas. Optionally, you can install Jupyter Notebook and Anaconda. If you don’t want to run the code on your local machine, you can find the course demos on Google Colab.

The first demo on Google Colab gets you started with some NBA data, which you’ll explore in the next lesson as you get to know pandas.


00:00 Hello and welcome! In this course, you will learn how to use the Python package pandas to discover interesting insights about data. My name is Douglas Starnes, and I’ll be your host as you slice and dice data to query it, clean it, and more.

00:17 Before getting started, there are a few prerequisites that you should have installed. First is Python 3. The current version of Python as of recording is Python 3.9, but earlier versions will work just fine.

00:31 I wouldn’t go back any further than 3.5 or 3.6, but 3.7 and up will be okay. If you don’t have Python on your machine, I strongly recommend getting it with the Anaconda distribution.
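If you already have Python installed, a quick way to compare it against this guidance is sketched below. The 3.7 minimum is an assumption based on the "3.7 and up will be okay" remark, and the function name is made up for illustration.

```python
import sys

# Assumption: 3.7 is used as the minimum, based on the guidance above
MINIMUM_VERSION = (3, 7)


def is_supported(version_info=sys.version_info, minimum=MINIMUM_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


print("Python", sys.version.split()[0], "supported:", is_supported())
```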

00:45 This is a one-stop shop for all of your data science and Python needs. You can download the free, open-source version of Anaconda from anaconda.com/products/individual.

01:00 Click the Download button and select an installer for your operating system.

01:08 After the installation is complete, you’ll be able to open a terminal or a command prompt, and everything will be ready to go. Anaconda preconfigures an environment that includes pandas.

01:20 But Anaconda also does more. A popular tool in the data science community is Jupyter Notebook. This is an interactive computing tool that basically stores the code and output of a Python session in a webpage. I’ll be using it for the demo in this course, and I strongly suggest you do too, as using Jupyter Notebook is a good habit to develop if you want to work in data science.

01:44 To start a Jupyter Notebook server, simply run the command jupyter notebook at the prompt.

01:52 The default web browser on the system will open to the Jupyter Notebook home page. If it doesn’t, simply copy the URL from the output of the Jupyter Notebook server and paste it into your browser.

02:03 Be sure to copy the entire token, as this is required for security purposes in case the server is exposed on the public web.

02:11 Click the New button and then select Python 3 under Notebook. This will create a new Jupyter notebook that understands Python 3. In the first cell, enter the Python code that imports the pandas package.

02:29 Press Shift + Enter to execute the cell. This cell won’t produce any output, but if you ask pandas for its version in the next cell, you’ll see which version of the package was installed.
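The transcript doesn’t reproduce the code shown on screen. A likely version of the two cells is sketched below, assuming the conventional pd alias (the source doesn’t confirm the exact code used).

```python
# Cell 1: import pandas under its conventional alias (assumed here)
import pandas as pd

# Cell 2: show the installed version; in a notebook, a bare
# pd.__version__ on the last line of a cell displays the same string
print(pd.__version__)
```

The version you see depends on your installation, so it may differ from the one in the video.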

02:43 Of course, you don’t have to use Anaconda. You could create a virtual environment using the python.org installation and then install pandas and Jupyter Notebook using pip. Basically, to follow along in this course, you’ll need just Python and pandas. Jupyter Notebook and Anaconda just make it easier. You could also run the demo on Google Colab, a free service from Google for hosting Jupyter notebooks.
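For the non-Anaconda route, one possible setup is sketched below using Python’s standard venv module rather than terminal commands. The environment folder name and the helper function names are hypothetical; notebook is the PyPI package that provides Jupyter Notebook.

```python
import os
import subprocess
import venv
from pathlib import Path


def env_python(env_dir):
    """Path to the interpreter inside a virtual environment."""
    env_dir = Path(env_dir)
    if os.name == "nt":  # Windows layout
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def pip_install_command(env_dir, packages):
    """Build the pip command that installs `packages` into the environment."""
    return [str(env_python(env_dir)), "-m", "pip", "install", *packages]


def create_course_env(env_dir="pandas-course-env"):
    """Create the environment, then install pandas and Jupyter Notebook."""
    venv.create(env_dir, with_pip=True)
    # "notebook" is the PyPI package that provides Jupyter Notebook
    subprocess.run(pip_install_command(env_dir, ["pandas", "notebook"]), check=True)
```

Calling create_course_env() once creates the folder and downloads the packages, so it needs network access. The same result is usually achieved from a terminal with python -m venv followed by pip install inside the activated environment.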

03:07 I’ll include a link to a Jupyter notebook on Google Colab that has all of the demo code ready to run. In the next lesson, you’ll see how to take advantage of pandas and Jupyter Notebook to load and explore a dataset.
