Using Statistical Methods on DataFrames

00:00 pandas provides many statistical methods for DataFrames. For example, you can get some basic stats for the numerical columns of a DataFrame by using the .describe() method. So on our df DataFrame, if we call the .describe() method,

00:18 we’re going to get a new DataFrame. The rows are going to give us the stats for all of the columns that have a numerical value. We’re going to get a count, so the number of non-missing values in each column, and then we’re going to get the mean, standard deviation, the min, and then the 25th, 50th, and 75th percentiles, and then the maximum.
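As a rough sketch of the .describe() call described above: the transcript does not show the contents of df, so the small DataFrame below is a hypothetical stand-in with made-up values. Only the py-score column name comes from the lesson.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's df; the values are illustrative only.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cy", "Dee"],
        "py-score": [80.0, 70.0, 90.0, 60.0],
        "age": [28, 33, 41, 34],
    }
)

# .describe() returns a new DataFrame: one row per statistic,
# one column per numerical column of df ("name" is left out).
stats = df.describe()
print(stats)
```

With this data, the row labels of stats are count, mean, std, min, 25%, 50%, 75%, and max, and the columns are py-score and age.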

00:40 This provides a quick overview of some of the statistical information of your DataFrame. Now, if you wanted to get some of these stats for a particular column—so, for example, let’s say for the py-score column—

00:55 then we can call the .mean() method and that’ll give us the mean. So that corresponds to that 35.0. Or say we wanted the standard deviation,

01:07 we could use the .std() method. Now, we can also use these methods on the entire DataFrame. So if we just wanted, say, the mean of all of the numerical columns, we can call .mean() on df, and this is going to return a Series where the labels are the columns where a numerical mean can be computed.
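The column-level calls mentioned above might look like the following. This is a minimal sketch on a hypothetical DataFrame with made-up values, since the lesson's actual data is not shown in the transcript.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's df; the values are illustrative only.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cy", "Dee"],
        "py-score": [80.0, 70.0, 90.0, 60.0],
        "age": [28, 33, 41, 34],
    }
)

# Select one column (a Series), then compute a single statistic on it.
py_score = df["py-score"]
col_mean = py_score.mean()
col_std = py_score.std()  # sample standard deviation (ddof=1 by default)

print(col_mean)
print(col_std)
```

Note that .std() uses the sample standard deviation by default, which is the same value that .describe() reports in its std row.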

01:26 So, notice that when you apply these methods to a DataFrame, you get a pandas Series, and when you apply one of these methods to a Series, like standard deviation, you’re going to get a number. All right, that’s a quick lesson there on doing some basic stats on a DataFrame.
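To make the difference in return types concrete, here is a small sketch. It assumes a hypothetical DataFrame that contains only numerical columns, with made-up values, so that .mean() can be called on the whole frame without any extra arguments.

```python
import pandas as pd

# Hypothetical numeric-only frame; the values are illustrative only.
df = pd.DataFrame(
    {
        "py-score": [80.0, 70.0, 90.0, 60.0],
        "age": [28, 33, 41, 34],
    }
)

# A statistic on a DataFrame gives a Series labeled by column name.
frame_mean = df.mean()

# The same kind of statistic on a single column (a Series) gives one number.
series_std = df["py-score"].std()

print(type(frame_mean))
print(frame_mean)
print(series_std)
```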

01:45 In the next lesson, we’ll take a look at the methods that pandas provides for working with missing data in the DataFrame.

tonypy on March 23, 2023

Just an FYI.

When using df.mean():

FutureWarning: The default value of numeric_only in DataFrame.mean is deprecated. In a future version, it will default to False. In addition, specifying ‘numeric_only=None’ is deprecated. Select only valid columns or specify the value of numeric_only to silence this warning. df.mean()
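The warning text itself suggests two ways around this: pass numeric_only explicitly, or select only the valid columns first. A minimal sketch of both, on a hypothetical DataFrame with a text column and made-up values, might look like this:

```python
import pandas as pd

# Hypothetical frame with one non-numeric column; the values are illustrative only.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cy", "Dee"],
        "py-score": [80.0, 70.0, 90.0, 60.0],
        "age": [28, 33, 41, 34],
    }
)

# Option 1: state explicitly that only numeric columns should be used.
by_flag = df.mean(numeric_only=True)

# Option 2: select the numeric columns first, then take the mean.
by_selection = df.select_dtypes(include="number").mean()

print(by_flag)
print(by_selection)
```

Both options give the same Series here, and neither relies on the default value of numeric_only.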
