Using Statistical Methods on DataFrames
pandas provides many statistical methods for DataFrames. For example, you can get some basic stats for the numerical columns of a DataFrame by using the .describe() method. So on our df DataFrame, if we call the .describe() method, we’re going to get a new DataFrame. The rows are going to give us the stats for all of the columns that have a numerical value. We’re going to get a count (so, the number of non-missing values in each column), and then we’re going to get the mean, standard deviation, the min, and then the 25th, 50th, and 75th percentiles, and then the maximum.
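Here is a minimal sketch of that call. The transcript doesn't show the lesson's actual df, so the DataFrame below and its column names (name, age, score) are made up for illustration.

```python
import pandas as pd

# Hypothetical data: the lesson's real df is not shown in the transcript.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Di"],
    "age": [23, 35, 41, 29],
    "score": [88.0, 92.5, 79.0, 85.5],
})

# .describe() returns a new DataFrame: one row per statistic,
# one column per numerical column of df (the text column is left out).
summary = df.describe()
print(summary)
```

In this sketch, the row labels of summary are count, mean, std, min, 25%, 50%, 75%, and max, and the only columns are age and score.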
This provides a quick overview of some of the statistical information of your DataFrame. Now, if you wanted to get some of these stats for a particular column, you could select that column and use, say, the .std() method to get its standard deviation. Now, we can also use these methods on the entire DataFrame. So if we just wanted, say, the mean of all of the numerical columns, we could call the .mean() method on df, and this is going to return a Series where the labels are going to be the columns where a numerical mean can be computed.
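To make the two cases concrete, here is a small sketch with a made-up DataFrame (the column names are hypothetical, not the ones from the lesson). It passes numeric_only=True to .mean() on the assumption that you're on a recent pandas version, where text columns are not skipped automatically.

```python
import pandas as pd

# Hypothetical data: the lesson's real df is not shown in the transcript.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Di"],
    "age": [23, 35, 41, 29],
    "score": [88.0, 92.5, 79.0, 85.5],
})

# One column is a Series, so .std() gives back a single number.
age_std = df["age"].std()

# On the whole DataFrame, .mean() gives back a Series
# labeled by the numerical columns.
col_means = df.mean(numeric_only=True)

print(type(age_std))
print(type(col_means))
print(col_means)
```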
So, notice that when you apply a method like .mean() or .std() to a DataFrame, you get a pandas Series, and when you apply one of these methods to a Series, like standard deviation, you’re going to get a number. All right, that’s a quick lesson there on doing some basic stats on a DataFrame.