Practice Your Intuition With Some Peculiarities
In the previous lesson, you’ve learned how you can use
suffixes parameter to make the column names in the output of your DataFrame a little more understandable. And in this lesson, you will play around a bit with joins using
pd.merge() and run into a couple of maybe unexpected results of your join operations so you can see what’s possible and maybe what you can look out for.
For these small play-around examples, I’m going to change the
fruits DataFrame a little bit and add another row to it. So currently,
fruits DataFrame looks like this, that you have an apple, a pear, tomato, and lemon.
The best way to do this is say
fruits = pd.concat(). And then I want to pass in an iterable that consists first of the current
fruits DataFrame, and then I also want to add a new DataFrame that just has a single row that contains the orange. Let me put this in here.
Now why did I add this orange? Just to show you how these combinations can work and how a piece of data is just a piece of data, and pandas doesn’t really have an understanding of what it means. Think back to your
You also have something in here that’s called an orange, but here it’s about the color of the carrot. Now you could perform a merge on these two DataFrames when specifying that you want to use the
name column of the
fruits DataFrame and the
color column on the
veggies DataFrame. So give it a try.
And when you execute that, you get a new DataFrame with a single row that has a combination of the row that matched here in the
fruits DataFrame, which has the index
4 and has an orange as the name, and then also the color in the
I just want to show you that these merge operations are very versatile, and you can do a lot of stuff with it. Now, what happens when you have these two DataFrames, and you use column names that exist on both of them, but don’t have a match? For example, if you define that you want to perform the merge on
fruits, veggies, left_on="image" and
right_on="name", so you’re explicitly specifying which columns you want to use from the two DataFrames, they exist in these DataFrames, and you perform this merge.
You can see that there is no match. There is no row in these two DataFrames where there is the same piece of data in the
image column of the left DataFrame and the
name column of the right DataFrame.
So your output is not going to be an error. It’s just going to be an empty DataFrame. So this is something to keep in mind. And for a final gotcha, I want you to perform another merge operation, where you put together
veggies not with a default inner join, but you want to do it
how="outer". And then pay attention to what happens in the
amount column. So if I run this, you can see the additional fields get filled with
NaN values, as you already have seen in previous lessons, but something also changed in the
You might wonder, why did this change in data type happen? And the reason simply is that there were a couple of
NaN values, and
NaN values in pandas are represented as instances of floats. And because pandas needs to keep a data type for the whole column, it changes the integer values that it can change into floating-type numbers.
because in this case, you don’t have any
NaN values, which means that the data types remain the same. And this is like an automatic conversion that pandas has to perform when there’s any
NaN values in a numeric column.
06:07 And this sums up a couple of gotchas. What you’ve seen in this lesson is that you can perform merge operations on pieces of data that look the same to the computer, even though they’re not the same for you as the viewer.
So it’s important that you know your dataset, and you understand why and how you want to join specific tables and what you want to join them on. You’ve seen that if you perform a join using
pd.merge() and there are no matching pieces of data, then you will end up with an empty DataFrame.
And you’ve also seen that if you perform a merge that produces
NaN values in a numeric column, this column gets automatically cast to float64, because this is how pandas internally represents
And with this, you are at the end of this course on combining data using pandas
concat() and pandas
merge(). In the next lesson, you’ll see a quick recap of all the content that you’ve covered and all the things that you’ve learned.
Become a Member to join the conversation.