Combining Data in pandas with concat() and merge() (Overview)

The Series and DataFrame objects in pandas are powerful tools for exploring and analyzing data. Part of their power comes from a multifaceted approach to combining separate datasets. With pandas, you can merge and concatenate your datasets, allowing you to unify and better understand your data as you analyze it.

In this video course, you’ll learn how and when to combine your data in pandas with:

  • merge() for combining data on common columns or indices
  • concat() for combining DataFrames across rows or columns

If you have some experience using DataFrame and Series objects in pandas and you’re ready to learn how to combine them, then this video course will help you do exactly that. If you’re feeling a bit rusty, then you can watch a quick refresher on DataFrames before proceeding.

00:00 Welcome to this course on Combining Data in pandas Using concat() and merge(). If you’re working in data analysis, you’ll often have to combine two DataFrames.

00:09 Maybe you have two CSV files that you want to combine or some other types of data that you’re pulling from an API, but you want to combine them in order to gain some interesting insights. Now, how do you perform these operations using pandas, and what’s the right function to use? Should you use pd.concat() or should you use pd.merge()?

00:28 Do you want to concatenate DataFrames, or do you want to use database join operations? And what does all of this really mean? And finally, what’s this tomato got to do with all of that?

00:43 In this course, you’ll set up a working environment and create a quick dataset, and then you will learn about how to use pd.concat() and pd.merge() to combine your data using pandas.

00:55 You’ll start by creating a virtual environment and installing the necessary dependencies.

01:03 Then you will create a healthy dataset that consists of just two small DataFrames that you will work with throughout this whole course.

01:13 You will get to know the pd.concat() function in depth, and you will get to know a lot of the optional keyword arguments that you can pass to this function. On this slide, you can see all the keyword arguments that you will work with throughout the course.

01:28 You will start by concatenating two DataFrames along the row axis. Then you will concatenate the same DataFrames along the columns axis and notice how the results are quite different.
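As a rough illustration of how the two axes differ, here is a minimal sketch. The `fruits` and `veggies` DataFrames are hypothetical stand-ins and not the dataset built in the course.

```python
import pandas as pd

# Hypothetical sample data, not the course's actual dataset.
fruits = pd.DataFrame({"name": ["apple", "banana"], "color": ["red", "yellow"]})
veggies = pd.DataFrame({"name": ["carrot", "kale"], "color": ["orange", "green"]})

# axis=0 (the default) stacks the rows; the original index labels are kept,
# so the labels 0 and 1 each appear twice in the result.
by_rows = pd.concat([fruits, veggies], axis=0)

# axis=1 places the frames side by side, aligning rows on the index,
# so the column names "name" and "color" each appear twice.
by_columns = pd.concat([fruits, veggies], axis=1)

print(by_rows.shape)     # (4, 2)
print(by_columns.shape)  # (2, 4)
```

Same inputs, different shapes: the row-wise result is taller, the column-wise result is wider.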

01:38 You will learn to mark your DataFrames using keys to create a multi-index DataFrame. Then you’ll see how you can access data in such a multi-index DataFrame.
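A small sketch of what marking DataFrames with keys might look like. The data and the key labels `"fruits"` and `"veggies"` are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical sample data, not the course's actual dataset.
fruits = pd.DataFrame({"name": ["apple", "banana"], "color": ["red", "yellow"]})
veggies = pd.DataFrame({"name": ["carrot", "kale"], "color": ["orange", "green"]})

# keys= adds an outer index level that records which frame each row came from.
combined = pd.concat([fruits, veggies], keys=["fruits", "veggies"])
print(combined.index.nlevels)  # 2

# Select every row that came from one source frame via the outer level.
only_veggies = combined.loc["veggies"]

# Select a single row with a (key, original label) tuple.
first_fruit = combined.loc[("fruits", 0)]
print(first_fruit["name"])  # apple
```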

01:48 Alternatively, you’ll learn how you can re-create a new index after concatenation. And then you will start going into database join territory when you learn how to join the columns while concatenating on rows, as well as how to join the rows when you’re concatenating on columns.

Next, you will get to know the merge() function in pandas. Again, you can see that there are a lot of keyword arguments that you will learn about throughout this course. You will start by performing an inner join, continue with an outer join, and then explore other joins that you can do, specifically left outer and right outer joins. In another lesson, you will learn how you can specify the join columns explicitly using the on keyword argument and how you can join DataFrames using index columns or named columns from either of the two DataFrames or even combinations of both of those.
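The options mentioned here can be sketched roughly as follows. The DataFrames and their columns (`sweet`, `leafy`) are made up for this example, and the variable names are only illustrative.

```python
import pandas as pd

# Hypothetical sample data, not the course's actual dataset.
fruits = pd.DataFrame(
    {"name": ["apple", "tomato"], "color": ["red", "red"], "sweet": [True, False]}
)
veggies = pd.DataFrame(
    {"name": ["tomato", "kale"], "color": ["red", "green"], "leafy": [False, True]}
)

# ignore_index=True discards the old row labels and builds a fresh 0..n-1 index.
fresh_index = pd.concat([fruits, veggies], ignore_index=True)

# join="inner" keeps only the columns that both frames share ("name", "color").
shared_columns = pd.concat([fruits, veggies], join="inner", ignore_index=True)

# merge() with on= names the join columns explicitly; how= picks the join type.
inner = pd.merge(fruits, veggies, on=["name", "color"], how="inner")
outer = pd.merge(fruits, veggies, on=["name", "color"], how="outer")
left = pd.merge(fruits, veggies, on=["name", "color"], how="left")
right = pd.merge(fruits, veggies, on=["name", "color"], how="right")
print(len(inner), len(outer), len(left), len(right))  # 1 3 2 2

# A named column on the left can also be matched against the index on the right.
by_index = pd.merge(
    fruits, veggies.set_index("name"), left_on="name", right_index=True
)
```

Only the tomato row appears in both frames, which is why the inner join returns a single row while the outer join returns all three distinct items.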

02:43 You will perform a cross join to create a Cartesian product of both of the DataFrames, and you’ll learn how you can customize the column suffixes so that you can still keep track of where the data came from. Finally, you’ll get to practice your intuition about using database joins with pd.merge() and run into a couple of peculiarities and gotchas that are helpful to be aware of. And that sums up the content of this course.
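A minimal sketch of a cross join and of custom suffixes, again using hypothetical data. The suffix strings `"_fruit"` and `"_veggie"` are arbitrary choices for this example, and `how="cross"` assumes pandas 1.2 or newer.

```python
import pandas as pd

# Hypothetical sample data, not the course's actual dataset.
fruits = pd.DataFrame({"name": ["apple", "tomato"], "color": ["red", "red"]})
veggies = pd.DataFrame({"name": ["tomato", "kale"], "color": ["red", "green"]})

# how="cross" pairs every row on the left with every row on the right:
# 2 rows x 2 rows gives 4 rows, and all columns from both frames are kept.
pairs = pd.merge(fruits, veggies, how="cross")
print(pairs.shape)  # (4, 4)

# suffixes= replaces the default "_x"/"_y" on overlapping non-key columns,
# which makes it easier to see which frame a column came from.
labeled = pd.merge(fruits, veggies, on="name", suffixes=("_fruit", "_veggie"))
print(list(labeled.columns))  # ['name', 'color_fruit', 'color_veggie']
```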

03:11 I hope you’re excited to get started and that it will be useful for you to practice the skills of combining data using pandas. In the next lesson, you’ll get started getting started.
