Slicing DataFrames Using Datetime Indices
00:00
When your DataFrame has a DatetimeIndex, you can use slicing to get just the part of the DataFrame that falls within a specified datetime range. So for example, if we wanted to extract from 5:00 in the morning
00:17
to, say, some other time in the same day… So for example, let’s put… And this should be 27 as well. From 5:00 in the morning until maybe 3:00 in the afternoon, so we would pass in '15' for 3:00 PM, and then this would extract only the rows of the DataFrame from 5:00 in the morning until 3:00 PM.
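As a rough sketch of the slicing described here, the snippet below builds a small stand-in DataFrame and slices it with datetime strings. The name `temp`, the column `temp_c`, the date 2019-10-27, and the temperature values are all made up for illustration; they are not the data used in the lesson.

```python
import pandas as pd

# Hypothetical data: one made-up temperature reading per hour for a single day.
index = pd.date_range("2019-10-27 00:00", periods=24, freq="h")
temp = pd.DataFrame({"temp_c": [5.0 + 0.5 * h for h in range(24)]}, index=index)

# Label-based slicing on a DatetimeIndex accepts datetime strings.
# Unlike positional slicing, both endpoints are included in the result.
morning_to_afternoon = temp.loc["2019-10-27 05:00":"2019-10-27 15:00"]
print(morning_to_afternoon)
```

Because both endpoints are included, the result runs from the 5:00 row through the 15:00 row.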
00:41 Now, I think it’s pretty cool that you can provide strings for the datetimes. Because your rows are labeled with a DatetimeIndex, pandas interprets those strings as datetimes, here written in the ISO 8601 format. All right, let’s now take a look at a couple of methods that we can apply when we’ve got time-series data. One common thing you may want to do when you’ve got time-series data is to resample. The idea of resampling is to change the frequency or the period that exists in your time series.
01:15 So for example, instead of going on an hourly interval, we could, say, go on a four-hour interval or a six-hour interval. So we basically want to do a conversion of our time index.
01:29
Now, the method for this is called .resample(), and we have to pass in one argument, which is called rule. Here, if we wanted to convert our intervals into six-hour intervals, for example, we would pass '6h'. In this case, what this would do is create four groups.
01:50 Each group has a time index that is covering a six-hour interval. Now, in order for us to do the conversion, we need to decide in those six-hour interval groups, what temperature reading are we going to use to represent the temperature over each interval? What we could do is, say, for example, take the mean.
02:13
What we get is a new DataFrame. Again, it’s still a DataFrame where the index is a DatetimeIndex, but instead of one hour between readings, it’s six hours.
02:24 And what we did was, because we chose the mean, in the first six hours we take the mean of those six hours and we get this temperature, and then for the following six hours from 6:00 in the morning until noon, the mean of that is 11.01 and so on.
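Here is a minimal sketch of that six-hour resampling with the mean. The DataFrame is a made-up stand-in (hypothetical name `temp`, column `temp_c`, and values), so the numbers will not match the ones shown in the lesson.

```python
import pandas as pd

# Hypothetical data: one made-up temperature reading per hour for a single day.
index = pd.date_range("2019-10-27 00:00", periods=24, freq="h")
temp = pd.DataFrame({"temp_c": [5.0 + 0.5 * h for h in range(24)]}, index=index)

# Group the 24 hourly rows into six-hour bins, then represent each bin
# by the mean of its six readings.
six_hourly_mean = temp.resample(rule="6h").mean()
print(six_hourly_mean)
```

The result has four rows, labeled with the start of each six-hour bin (00:00, 06:00, 12:00, and 18:00).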
02:41
So if you’re familiar with the idea of grouping data, or the group by operation in general on databases, that’s essentially what the .resample() method returns, and then this is the aggregate function that we’re applying to each of the groups. Now, instead of the mean, what we could do is, say, take the minimum temperature in each six-hour interval.
03:03 In this case, this changes from the values that we had before to what the minimums were.
03:10
Now, if I go ahead and show you the DataFrame, in the first six hours (from 12:00 in the morning to 5:00 in the morning, so those are six periods) the minimum there was 5.4, and so that’s why we get the 5.4. And then in the next time periods, from 6:00 in the morning to 11:00 in the morning, the minimum there looks like it’s 4.8, and that’s what we get here.
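To make the grouping idea concrete, the sketch below loops over the groups that .resample() produces and then applies the minimum to each one. As before, `temp`, `temp_c`, and the values are hypothetical stand-ins, so the minimums differ from the 5.4 and 4.8 seen in the lesson.

```python
import pandas as pd

# Hypothetical data: one made-up temperature reading per hour for a single day.
index = pd.date_range("2019-10-27 00:00", periods=24, freq="h")
temp = pd.DataFrame({"temp_c": [5.0 + 0.5 * h for h in range(24)]}, index=index)

# .resample() returns a grouped object; nothing is aggregated yet.
resampler = temp.resample("6h")

# Iterating yields the start of each bin and the hourly rows that fall in it.
group_sizes = []
for start, group in resampler:
    group_sizes.append(len(group))
    print(start, "->", len(group), "hourly readings")

# Apply the aggregate function: the minimum reading within each six-hour bin.
six_hourly_min = resampler.min()
print(six_hourly_min)
```

Swapping .min() for another aggregate such as .max() or .mean() changes only how each group is summarized, not how the rows are grouped.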
03:38 So, the idea of resampling is essentially to convert from one period to another period. In this case, because we’re aggregating several readings into one, we have to choose a period that’s longer than the initial period.
03:51 All right, so in this lesson, we saw that when we’re working with a pandas DataFrame whose row index is a DatetimeIndex, we can use slicing to extract part of the DataFrame.
04:03 We also saw the resampling method that allows us to create a new DataFrame by resampling the datetime row indices to create a new time interval in between the individual index values.
04:18 Coming up next, we’ll see one more operation that we can do on a pandas DataFrame that contains a datetime row index, and that is rolling-window analysis.