New Statistics Functions
The statistics module was added to Python in version 3.4. This release has added three new functions: covariance(), correlation(), and linear_regression(). Note that although this module has lots of useful stuff in it, if you're really deep into the stats and math, you probably want to use one of the popular third-party libraries instead.
00:34 All three of the new methods help you evaluate the dependencies between two sets of data. To help illustrate this idea, assume you have two lists—one with the number of words in a series of articles and the second with the corresponding views on those same articles.
covariance() indicates how much a change in one variable influences the change in another variable. A positive result means that as the first variable gets bigger, so does the second variable. A negative result means the second variable gets smaller when the first gets bigger.
01:31 Next is correlation(). Correlation is a normalized covariance. It ranges between -1 and 1. The closer to 1, the more positive the correlation. The closer to -1, the more negative the correlation. The closer to 0, the weaker the correlation. For the data here, the resulting correlation is 0.45.
01:54 This means that there is some relationship between the two values but not an extreme one. Note that correlation does not indicate causation. Although these two values have some correspondence, they both might be caused by a third unknown factor. For example, maybe an author who writes longer articles is more popular.
02:14 In that case, the correlation between the size of the article and the views would have nothing to do with the length and would instead be due to the popularity of the author. Removing the popular author's data might cause the correlation to plummet.
02:43 Linear regression creates a line of best fit through the data, and using that line, you can plot values between the data points. This function returns an object with two values in it: the slope of the best fit line and the intercept point on the graph.
03:27 I take the 10,000 words, multiply that by the slope, and then add the intercept. I get a result of 3,528 and a bit. So based on the data given and the linear regression, an article with 10,000 words would probably get about 3,500 views.