Combining Data in pandas With concat() and merge() (Summary)
You’ve now learned two of the most important techniques for combining data in pandas:
- merge() for combining data on common columns or indices
- concat() for combining DataFrames across rows or columns
In addition to learning how to use these techniques, you also learned about set logic by experimenting with the different ways to join your datasets. Additionally, you learned about the most common parameters to each of the above techniques, and what arguments you can pass to customize their output.
To learn more about the concepts covered in this course, you can check out:
- Combining Data in Pandas With merge(), .join(), and concat()
- pandas GroupBy: Your Guide to Grouping Data in Python
Congratulations, you made it to the end of the course! What’s your #1 takeaway or favorite thing you learned? How are you going to put your newfound skills to use? Leave a comment in the discussion section and let us know.
00:01 With that, you’ve made it to the end of this course on Combining Data in pandas Using pd.concat() and pd.merge().
00:09 You started in the first section with getting set up by first creating your virtual environment. And here’s, again, the commands, with the one that’s different for Windows shown here in a comment.
00:22 Then next, you created the dataset, and it didn’t look quite as healthy as this one, but still pretty healthy. You created a fruits DataFrame and a vegetables DataFrame, which were the two DataFrames that you worked with throughout the course, barely changing them, but mostly just seeing the effects of concatenating and merging these two DataFrames with each other.
00:44 Then you got started using pd.concat(), which is a module-level function that allows you to stick two DataFrames together with optional set logic against the other axis.
00:56 Now you started by concatenating along the row axis, then concatenated along the column axis. You learned how you can mark your DataFrames, using keys to create multi-index DataFrames.
01:07 You learned how you can access data in such a multi-index DataFrame. You learned how you can re-create a new index after concatenation and drop the two index columns that you had in the source DataFrames.
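As a minimal sketch of the pd.concat() steps recapped above: the two small DataFrames here are hypothetical stand-ins, not the course's actual fruits and vegetables data.

```python
import pandas as pd

# Hypothetical stand-ins for the course's fruits and vegetables DataFrames.
fruits = pd.DataFrame({"name": ["apple", "orange"], "color": ["red", "orange"]})
veggies = pd.DataFrame({"name": ["carrot", "pea"], "color": ["orange", "green"]})

# Concatenate along the row axis (axis=0 is the default).
# The source indices are kept, so the labels 0 and 1 each appear twice.
rows = pd.concat([fruits, veggies])

# Concatenate along the column axis: the frames end up side by side.
cols = pd.concat([fruits, veggies], axis=1)

# Mark each source DataFrame with a key, which creates a MultiIndex.
keyed = pd.concat([fruits, veggies], keys=["fruits", "veggies"])

# Access data in the MultiIndex: a whole group, or a single cell.
veggie_rows = keyed.loc["veggies"]
second_fruit = keyed.loc[("fruits", 1), "name"]

# Drop the source indices and create a fresh 0..n-1 index instead.
fresh = pd.concat([fruits, veggies], ignore_index=True)
```

The keys argument is handy when you still need to know which source DataFrame a row came from, while ignore_index=True is the simpler choice when you don't.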
01:19 You learned how you can join columns when you were concatenating on rows, so that is the optional set logic applied on the other axis. And then you also learned how you can join rows when you’re concatenating on columns. While going through these different lessons, you step-by-step learned about different keyword arguments that you can pass to pd.concat().
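The set logic on the other axis can be sketched roughly like this. The data is made up for illustration: the two frames share only the name column, and their indices overlap only at label 1.

```python
import pandas as pd

# Hypothetical data: only "name" is shared, and only index label 1 overlaps.
fruits = pd.DataFrame({"name": ["apple", "orange"], "color": ["red", "orange"]})
veggies = pd.DataFrame(
    {"name": ["carrot", "pea"], "price": [0.5, 1.2]}, index=[1, 2]
)

# Concatenating on rows: join decides what happens to the columns.
# The default, join="outer", keeps the union of columns and fills gaps with NaN.
outer_rows = pd.concat([fruits, veggies], ignore_index=True)
# join="inner" keeps only the columns present in both DataFrames.
inner_rows = pd.concat([fruits, veggies], join="inner", ignore_index=True)

# Concatenating on columns: join decides what happens to the rows.
outer_cols = pd.concat([fruits, veggies], axis=1)                # index labels 0, 1, 2
inner_cols = pd.concat([fruits, veggies], axis=1, join="inner")  # only index label 1
```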
01:41 And you learned about all of the ones shown on the screen right now, but keep in mind that there are some other ones that you can explore by yourself. Then you moved on to combining data using pd.merge().
01:52 That’s another module-level function that you can use to combine data in pandas, but merge() does something different than concat(). It uses join operations on the data in the DataFrames.
02:05 You explored what that means by first performing an inner join using merge(), which is the default for a call to merge(). Then you changed that parameter and performed an outer join.
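A small sketch of the difference between the default inner join and an outer join, using invented data in which only "tomato" appears in both DataFrames:

```python
import pandas as pd

# Hypothetical data: "tomato" is the only name present in both frames.
fruits = pd.DataFrame(
    {"name": ["apple", "orange", "tomato"], "color": ["red", "orange", "red"]}
)
veggies = pd.DataFrame({"name": ["carrot", "tomato"], "price": [0.5, 0.8]})

# how="inner" is the default. With no on= argument, pandas joins on the
# columns the two frames have in common, which here is just "name".
inner = pd.merge(fruits, veggies)

# An outer join keeps every row from both sides and fills gaps with NaN.
outer = pd.merge(fruits, veggies, how="outer")
```

The inner result has a single row for the shared name, while the outer result has one row per distinct name across both frames.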
02:15 You tried some other types of joins: left joins and right joins. You learned how you can specify the join columns explicitly instead of letting pandas discover which columns are the intersections between the DataFrames. You learned how you can join two DataFrames using index columns and a mix of index columns and specific named columns in one of the DataFrames.
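These variations might look roughly as follows. Again, the data is a hypothetical stand-in, and the variable names are chosen only for this sketch.

```python
import pandas as pd

# Hypothetical data: "tomato" is the only name present in both frames.
fruits = pd.DataFrame(
    {"name": ["apple", "orange", "tomato"], "color": ["red", "orange", "red"]}
)
veggies = pd.DataFrame({"name": ["carrot", "tomato"], "price": [0.5, 0.8]})

# Left and right joins, with the join column named explicitly via on=.
left = pd.merge(fruits, veggies, how="left", on="name")    # all rows of fruits
right = pd.merge(fruits, veggies, how="right", on="name")  # all rows of veggies

# Join on the index of both DataFrames.
fruits_by_name = fruits.set_index("name")
veggies_by_name = veggies.set_index("name")
both_index = pd.merge(
    fruits_by_name, veggies_by_name, left_index=True, right_index=True
)

# Mix a named column on the left with the index on the right.
mixed = pd.merge(fruits, veggies_by_name, left_on="name", right_index=True)
```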
02:38 You also learned how you can customize column suffixes, and to give a good example of why that can be helpful sometimes, you also learned how you can perform a cross join and saw the resulting very large DataFrame. Finally, you also got to practice your intuition by stepping through a couple of join operations and finding some peculiarities and some gotchas that might be helpful to be aware of. While going through these lessons, you also got to know a lot of the keyword arguments to pd.merge(), aside from the necessary positional arguments left and right. And on this slide, you can see all the keyword arguments that you played around with in this course.
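Here is a brief sketch of custom suffixes and a cross join. The data and the suffix names are invented for illustration, and how="cross" assumes pandas 1.2 or newer.

```python
import pandas as pd

# Hypothetical data: both frames have "name" and "color" columns.
fruits = pd.DataFrame({"name": ["apple", "orange"], "color": ["red", "orange"]})
veggies = pd.DataFrame({"name": ["carrot", "pea"], "color": ["orange", "green"]})

# Joining on "color" leaves two clashing "name" columns. The default
# suffixes are "_x" and "_y"; custom ones make the result easier to read.
by_color = pd.merge(
    fruits, veggies, on="color", suffixes=("_fruit", "_veggie")
)

# A cross join pairs every row on the left with every row on the right,
# so the result has len(fruits) * len(veggies) rows.
cross = pd.merge(fruits, veggies, how="cross", suffixes=("_fruit", "_veggie"))
```

With only two rows per side, the cross join stays small, but the row count multiplies quickly on larger DataFrames.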
03:19 Again, keep in mind that there is more, and don’t be afraid to go explore a little more by using pd.merge() on your projects by yourself.
03:29 If you want to learn more about working with pandas, make sure to go to the Real Python search, type in pandas as a keyword, and you can either search for all resources or restrict it to articles or courses or both of those. And then see what you can find there.
03:43 There’s two articles that I specifically want to suggest to you as next steps. The first one is the source article for this video course that approaches the same topic, but quite differently. If you go through that tutorial, you’ll see that it works with a real-life dataset, and the data is much more complex than our fruits and vegetables that you got to know in this course.
04:06 I think it’s a great next step to go through this tutorial and work with the real-life dataset after you’ve gotten a good understanding of what the merge and concatenation functions actually do.
04:17 The tutorial also introduces you to the .join() method that you can use on a DataFrame. And that does the same thing as merge(), but works on a DataFrame object instead of on a module level.
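As a rough illustration of how the two relate: .join() defaults to a left join on the index, so the call below should match an explicit pd.merge() with the same settings. The DataFrames are hypothetical and indexed by name.

```python
import pandas as pd

# Hypothetical data, indexed by name rather than by a default integer index.
fruits = pd.DataFrame({"color": ["red", "orange"]}, index=["apple", "orange"])
prices = pd.DataFrame({"price": [0.4, 0.6]}, index=["orange", "kiwi"])

# .join() is a DataFrame method. By default it performs a left join
# on the index of both DataFrames.
joined = fruits.join(prices)

# The equivalent call with the module-level function spells this out.
merged = pd.merge(
    fruits, prices, how="left", left_index=True, right_index=True
)
```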
04:31 Once you get bored of just combining data, I would say that you move on to pandas GroupBy, where you learn how you can group data in Python using the pandas library.
04:42 That’s it for this course. I hope you had a good time and learned something, and go out there, grab an apple or an orange, and have a nice day.