Check for Correlation
00:03 Often, you want to see whether two columns of a dataset are connected. If you pick a major with higher median earnings, do you also have a lower chance of unemployment?
00:13 As a first step, let’s plot those two columns against each other as a scatter plot.
00:24 You should see quite a random-looking plot, as seen here. A quick glance at this figure shows there’s no great correlation between the earnings and unemployment rate. While a scatter plot is an excellent tool for getting a first impression about a possible correlation, it certainly isn’t definitive proof of a connection.
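As a minimal sketch of the scatter plot step, the code below builds a small DataFrame and plots one column against the other with pandas. The column names `Median` and `Unemployment_rate` and all of the values are assumptions made up for illustration; they are not the dataset used in the lesson, so the resulting figure will not match the one shown in the video.

```python
import pandas as pd

# Hypothetical sample data: the column names and values are illustrative
# assumptions, not the dataset from the lesson.
df = pd.DataFrame(
    {
        "Median": [110000, 75000, 73000, 70000, 65000, 60000, 54000, 40000],
        "Unemployment_rate": [0.018, 0.117, 0.024, 0.050, 0.061, 0.065, 0.092, 0.151],
    }
)

# Plot earnings on the x-axis against the unemployment rate on the y-axis.
# pandas delegates the drawing to Matplotlib and returns the Axes object.
ax = df.plot(x="Median", y="Unemployment_rate", kind="scatter")
```

In a plain script, you would typically follow this with `plt.show()` from `matplotlib.pyplot` to display the window. In a Jupyter notebook, the figure usually appears on its own.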
00:43 For an overview of the correlations between different columns, you can use the correlation method, .corr(). If you suspect a correlation between two values, then you have several tools at your disposal in this Real Python course to verify your hunch and measure how strong the correlation is.
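Here is a short sketch of how `.corr()` might be used, both on a whole DataFrame and on a single pair of columns. The data and the column names `Median` and `Unemployment_rate` are again hypothetical stand-ins, so the numbers it produces say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical sample data: the column names and values are illustrative
# assumptions, not the dataset from the lesson.
df = pd.DataFrame(
    {
        "Median": [110000, 75000, 73000, 70000, 65000, 60000, 54000, 40000],
        "Unemployment_rate": [0.018, 0.117, 0.024, 0.050, 0.061, 0.065, 0.092, 0.151],
    }
)

# DataFrame.corr() returns a matrix of pairwise correlation coefficients.
# The default method is Pearson, which measures linear correlation.
corr_matrix = df.corr()
print(corr_matrix)

# Series.corr() computes the coefficient for a single pair of columns.
r = df["Median"].corr(df["Unemployment_rate"])
print(f"Correlation between Median and Unemployment_rate: {r:.3f}")
```

A coefficient close to 1 or -1 points to a strong linear relationship, while a value near 0 points to a weak one. The diagonal of the matrix is always 1, because each column correlates perfectly with itself.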
01:01 Keep in mind, though, that even if a correlation exists between two values, it still doesn’t mean that a change in one would result in a change in the other. In other words, correlation does not imply causation. With that fundamental tenet of statistical analysis taken on board, it’s time to move forwards and look at how to analyze groups of data points.