Working With Time Series
If you’re following along with this lesson and not using the provided Jupyter Notebook from this course’s supporting materials, you can copy and paste the following temp_c list:
temp_c = [ 8.0, 7.1, 6.8, 6.4, 6.0, 5.4, 4.8, 5.0,
9.1, 12.8, 15.3, 19.1, 21.2, 22.1, 22.4, 23.1,
21.0, 17.9, 15.5, 14.4, 11.9, 11.0, 10.2, 9.1]
00:00
One of the main motivations for creating the pandas library was to work with time-series data. To showcase some of the ways that you can work with time-series data in pandas, we’re going to create a pandas DataFrame using the hourly temperature data from a single day. To do this, let’s define a list, which we’ll call temp_c, and the data that I’m going to paste will be included in the video description.
00:27 This data contains temperature measurements taken at one-hour intervals over a 24-hour period in Celsius.
00:37
So, go ahead and run that. The DataFrame that we’re going to create is going to have a datetime index as its row index. And to do this, we’re going to use the date_range() function.
00:49
date_range() takes several keyword arguments. One of them is the start keyword argument, which can be used to specify the left bound of the date range.
01:01 Let’s go ahead and put the year 2019 and it’s going to be, say, October 27th. We’re going to be using the ISO 8601 datetime format: year, month, and then the day. And then for the time, we’re going to start at midnight, 12:00 AM.
01:22 This format for the time is going to be the hours, the minutes, and the seconds, and all of this uses the 24-hour clock. Then we’re going to want this datetime range to cover 24 periods, and the frequency is going to be in hours, so we get one timestamp per hour.
01:41
So this date range is going to serve as our index for the DataFrame that we’re going to construct. So why don’t we save this, say, in a variable dt.
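The call described above might look like the following minimal sketch. The transcript doesn't show the exact spelling of the frequency argument, so `freq="h"` is an assumption here: recent pandas versions prefer the lowercase alias, while older releases documented the uppercase `"H"`.

```python
import pandas as pd

# start: left bound of the range in ISO 8601 format (year-month-day, then time)
# periods: how many timestamps to generate
# freq: spacing between timestamps ("h" = hourly; assumed spelling, see note above)
dt = pd.date_range(start="2019-10-27 00:00:00", periods=24, freq="h")

print(len(dt))  # number of timestamps generated
print(dt[0])    # first timestamp in the range
print(dt[-1])   # last timestamp in the range
```

With 24 hourly periods starting at midnight, the last timestamp falls at 23:00 on the same day.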
01:49
Let’s go ahead and run that, and let’s take a look at the type of this date range object, and we see that what we get is a DatetimeIndex. This is one of the Index types that exists in pandas, similar, for example, to the RangeIndex that we’ve seen before. All right, so now that we have the index as a DatetimeIndex object and we also have the data, we can go ahead and create our pandas DataFrame.
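As a quick check of the type, something like this sketch could be run in a notebook cell. It rebuilds dt so that it stands on its own, again assuming the lowercase `"h"` frequency alias.

```python
import pandas as pd

# Rebuild the hourly range so this cell runs by itself
dt = pd.date_range(start="2019-10-27 00:00:00", periods=24, freq="h")

# date_range() returns a DatetimeIndex, which is one of the pandas Index types
print(type(dt))
print(isinstance(dt, pd.DatetimeIndex))
print(isinstance(dt, pd.Index))
```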
02:16
We’re going to call this pandas DataFrame temp. Let’s call the constructor. The data is going to be this temperature data.
02:25
The name of the column will be, say, 'temp_c', and the data is in this list that we created. And then, lastly, the index is this DatetimeIndex that we just created.
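Put together, the construction described here could be sketched as follows. Passing the data as a dictionary that maps the column name to the list is one common way to name the column, and it is an assumption about how the lesson's notebook does it. The `"h"` frequency alias is also assumed.

```python
import pandas as pd

# Hourly temperatures in Celsius for one day (copied from the lesson)
temp_c = [8.0, 7.1, 6.8, 6.4, 6.0, 5.4, 4.8, 5.0,
          9.1, 12.8, 15.3, 19.1, 21.2, 22.1, 22.4, 23.1,
          21.0, 17.9, 15.5, 14.4, 11.9, 11.0, 10.2, 9.1]

# One timestamp per hour, starting at midnight on October 27th, 2019
dt = pd.date_range(start="2019-10-27 00:00:00", periods=24, freq="h")

# The dictionary key becomes the column name; dt becomes the row index
temp = pd.DataFrame(data={"temp_c": temp_c}, index=dt)

print(temp.head())  # first five rows
print(temp.shape)   # (rows, columns)
```

The list and the index must have the same length, 24 in this case, or the constructor raises an error.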
02:37
Let’s go ahead and run that. Let’s take a look at this temperature DataFrame. We’ve got our column of temperatures in Celsius. And then again, the index is this DatetimeIndex. It starts off at 12:00 AM on October 27th, 2019, and runs all the way up to 11:00 PM on the same day.
03:01 And that’s all there is to creating a DataFrame with time-series data and a datetime row index. In the next lesson, you’re going to see how you can conveniently apply slicing techniques to get just part of the information in a pandas DataFrame when you’re working with time-series data, and we’ll also take a look at the resampling method.
Lahiru Ramesh on Nov. 13, 2024
FutureWarning: 'H' is deprecated and will be removed in a future version, please use 'h' instead. dt = pd.date_range(start='2019-10-25 00:00:00', periods=24, freq='H')