Using Statistical Methods on DataFrames

The pandas DataFrame: Working With Data Efficiently Cesar Aguilar 01:53

00:00 pandas provides many statistical methods for DataFrames. For example, you can get some basic stats for the numerical columns of a DataFrame by using the .describe() method. So on our df DataFrame, if we call the .describe() method,

00:18 we’re going to get a new DataFrame. The rows are going to give us the stats for all of the columns that have a numerical dtype. We’re going to get a count (so, the number of non-missing values in each column), and then we’re going to get the mean, standard deviation, the min, and then the 25th, 50th, and 75th percentiles, and then the maximum.
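To make the shape of that result concrete, here is a minimal sketch. The DataFrame below is a hypothetical stand-in for the lesson's df: the column names follow the lesson, but the rows and values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's df; the values are invented.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cy", "Dee"],
        "py-score": [80.0, 90.0, 70.0, 60.0],
        "js-score": [70.0, 65.0, 85.0, 80.0],
    }
)

# .describe() returns a new DataFrame: one row per statistic,
# one column per numerical column of df ("name" is left out).
stats = df.describe()
print(stats)

# Individual statistics can be looked up by row and column label.
print(stats.loc["mean", "py-score"])
```

Because the result is itself a DataFrame, the usual label-based accessors such as .loc work on it.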

00:40 This provides a quick overview of some of the statistical information of your DataFrame. Now, if you wanted to get some of these stats for a particular column—so, for example, let’s say for the py-score column—

00:55 then we can call the .mean() method and that’ll give us the mean. So that corresponds to that 35.0. Or say we wanted the standard deviation,

01:07 we could use the .std() method. Now, we can also use these methods on the entire DataFrame. So if we just wanted, say, the mean of all of the numerical columns, we can call .mean() on df itself, and this is going to return a Series where the labels are the columns for which a numerical mean can be computed.

01:26 So, notice that when you apply these methods to a DataFrame, you get a pandas Series, and when you apply one of these methods to a Series, like standard deviation, you’re going to get a number. All right, that’s a quick lesson there on doing some basic stats on a DataFrame.
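The difference in return types can be sketched as follows. The DataFrame is again a hypothetical stand-in with made-up values, and numeric_only=True is passed explicitly, which is an addition to what the lesson types, so that the call behaves the same way across pandas versions.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's df; the values are invented.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cy", "Dee"],
        "py-score": [80.0, 90.0, 70.0, 60.0],
        "js-score": [70.0, 65.0, 85.0, 80.0],
    }
)

# On a single column (a Series), each method returns one number.
col_mean = df["py-score"].mean()
col_std = df["py-score"].std()  # sample standard deviation (ddof=1)

# On the whole DataFrame, the same method returns a Series
# labeled by the columns it was computed for.
all_means = df.mean(numeric_only=True)

print(type(col_mean), col_mean)
print(type(all_means))
print(all_means)
```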

01:45 In the next lesson, we’ll take a look at the methods that pandas provides for working with missing data in the DataFrame.

tonypy on March 23, 2023

Just an FYI.

When using df.mean():

FutureWarning: The default value of numeric_only in DataFrame.mean is deprecated. In a future version, it will default to False. In addition, specifying 'numeric_only=None' is deprecated. Select only valid columns or specify the value of numeric_only to silence this warning.
  df.mean()
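The warning itself names two ways around it: pass numeric_only explicitly, or select only the valid columns first. Here is a minimal sketch of both, on a hypothetical DataFrame with made-up values. As far as I know, in pandas 2.0 and later the default is False, so a bare df.mean() on a frame that contains a string column is expected to raise an error rather than warn.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's df; the values are invented.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cy", "Dee"],
        "py-score": [80.0, 90.0, 70.0, 60.0],
        "js-score": [70.0, 65.0, 85.0, 80.0],
    }
)

# Option 1: state the intent explicitly.
means_flag = df.mean(numeric_only=True)

# Option 2: select only the numerical columns, then take the mean.
means_selected = df.select_dtypes(include="number").mean()

print(means_flag)
print(means_flag.equals(means_selected))
```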

Alex Hewson on March 14, 2026

I also ran into an issue here that had been carried forward from replacing the deprecated append method earlier in the Deleting and Inserting Rows video.

The newly created john pd.Series object from that lesson converts the py-score and age column values to object, rather than maintaining float64 and int64. So when you use the df.describe() method, only the django-score and js-score columns are returned.

To fix this, I went back to where the bug was introduced and used the astype method at the end of that exercise. The age column is dropped shortly after, so I didn’t bother changing that back.

df = df.astype(dtype={'py-score': np.float64})
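For anyone who wants to see the effect in isolation, here is a small sketch. It does not reproduce the earlier lesson's row insertion. It forces a column to object dtype by hand on a hypothetical two-column DataFrame with made-up values, then applies the astype fix shown above.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the values are invented.
df = pd.DataFrame({"py-score": [80.0, 90.0], "js-score": [70.0, 65.0]})

# Simulate the symptom: the column still holds numbers,
# but its dtype is now object.
df["py-score"] = df["py-score"].astype(object)
print(df.dtypes)

# .describe() skips object columns when numerical columns are present.
before = list(df.describe().columns)

# Convert the column back to float64, as in the fix above.
df = df.astype(dtype={"py-score": np.float64})
after = list(df.describe().columns)

print(before)
print(after)
```

Checking df.dtypes before calling .describe() is a quick way to spot this kind of silent dtype change.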
