For more examples of what you can do with data cleanup, check out Pythonic Data Cleaning With pandas and NumPy.
Cleaning Your Data
Most of the columns have 126,314 non-null values, and the DataFrame has 126,314 rows, so those columns have no null or missing values. However, the notes column has only 5,424 non-null values, which means that column is missing a value in most rows.
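The comparison of non-null counts against the row count can be sketched on a small stand-in table. This is a minimal sketch, not the course's code: the toy DataFrame, its values, and the pts column name are made up for illustration, and only the notes column name comes from the lesson.

```python
import pandas as pd

# Toy stand-in for the real dataset. All values are invented, and the
# "pts" column name is an assumption made for this illustration.
toy = pd.DataFrame({
    "pts": [66, 68, 63, 47],
    "notes": [None, None, "played at a neutral site", None],
})

rows = len(toy)             # total number of rows
non_null = toy.count()      # non-null values per column
missing = toy.isna().sum()  # missing values per column

print(rows)
print(non_null)
print(missing)
```

A column whose non-null count equals the row count has no missing values. Here that holds for pts but not for notes.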
One way to manage missing data is to remove it completely. You can simply omit any row that has a missing value in any column. The DataFrame only has missing values in the notes column, but the .dropna() method will search all columns in the DataFrame for nulls.
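As a rough illustration of dropping rows, the sketch below calls .dropna() with its defaults on a made-up table. The toy data and the pts column name are assumptions, not the lesson's dataset.

```python
import pandas as pd

# Toy stand-in for the real dataset; values and "pts" are invented.
toy = pd.DataFrame({
    "pts": [66, 68, 63, 47],
    "notes": [None, None, "played at a neutral site", None],
})

# With default arguments, .dropna() returns a new DataFrame that keeps
# only the rows with no missing value in any column.
rows_without_nulls = toy.dropna()

print(toy.shape)
print(rows_without_nulls.shape)
```

The original toy table is left unchanged, because .dropna() returns a new object unless told otherwise.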
The cleaned data has 5,424 rows, so more than 90% of the rows were dropped. For this DataFrame, the notes data is not that relevant, at least for this course, so it makes more sense to drop the column itself. Remember, the previous lesson discussed the axis keyword argument. The .dropna() method accepts the same argument and values. By default, it is 0, which means “Drop rows with missing values.” However, if you explicitly set it to 1, .dropna() will drop columns with missing values instead.
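To make the effect of the axis argument concrete, here is a minimal sketch on a made-up table. The data and the pts column name are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-in for the real dataset; values and "pts" are invented.
toy = pd.DataFrame({
    "pts": [66, 68, 63, 47],
    "notes": [None, None, "played at a neutral site", None],
})

# axis=1 drops every column that contains at least one missing value,
# so all rows survive and only the sparse column disappears.
columns_without_nulls = toy.dropna(axis=1)

print(columns_without_nulls.shape)
print(list(columns_without_nulls.columns))
```

Compared with the default axis=0, this keeps every row and loses only the column that has gaps.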
Apparently, there was at least one game in which a team scored no points. Did they even try? Let’s take a deeper look. If you look at the notes column, it says that this game was forfeited. Since there’s only one game with 0 points, it might not have much of an impact, but you can now either remove this row or handle it some other way.
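One possible way to locate and then remove such a row is sketched below. The table is invented, and the pts column name and the note text are assumptions rather than values from the real dataset.

```python
import pandas as pd

# Toy stand-in; the "pts" column name and all values are invented.
toy = pd.DataFrame({
    "pts": [66, 0, 63],
    "notes": [None, "forfeit (made-up note)", None],
})

lowest = toy["pts"].min()                 # smallest score in the table
zero_point_games = toy[toy["pts"] == 0]   # rows where no points were scored
print(zero_point_games)

# One way to handle the row: keep only games with a positive score.
without_zero_games = toy[toy["pts"] > 0]
print(len(without_zero_games))
```

Whether to drop the row or keep it depends on the analysis. The filter above is only one option.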
The first condition checks that the number of points scored by a team is greater than the number of opposition points. The second condition checks that the game was not a win. When joined with the AND operator (&), the query should yield no results. And it does, but there’s an easier way to see that. Simply get the .empty attribute of the result.
A query returns a DataFrame, and a DataFrame with no rows is empty. To check the opposite, reverse the first comparison to check for games in which the opponents had more points and make sure the game was not a loss. Again, this should be empty.
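The two consistency checks described here might look like the following sketch. The column names pts, opp_pts, and game_result, the "W" and "L" codes, and all values are assumptions made for this illustration. DataFrame.empty is a standard pandas attribute.

```python
import pandas as pd

# Toy stand-in; column names, result codes, and values are invented.
toy = pd.DataFrame({
    "pts": [66, 47, 80],
    "opp_pts": [68, 40, 75],
    "game_result": ["L", "W", "W"],
})

# Games where the team outscored the opponent but the result is not a win.
more_points_not_win = toy[
    (toy["pts"] > toy["opp_pts"]) & (toy["game_result"] != "W")
]

# The opposite check: the opponent scored more but the result is not a loss.
fewer_points_not_loss = toy[
    (toy["pts"] < toy["opp_pts"]) & (toy["game_result"] != "L")
]

# .empty is True when a DataFrame has no rows.
print(more_points_not_win.empty)
print(fewer_points_not_loss.empty)
```

Each comparison needs its own parentheses, because & binds more tightly than > and != in Python.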