Create a DatetimeIndex From Component Columns
In this lesson you’ll learn how to create a DatetimeIndex from multiple component columns that together form a date or datetime.
00:00
In this video, you’re going to learn how to take a set of columns and create a DatetimeIndex out of them. A cool feature in Pandas is that you can build a DatetimeIndex from a set of component columns in a DataFrame.
00:14 These component columns each hold one component of a date, such as the year or the month. Pandas has a function, pd.to_datetime(), that can combine these components into one timestamp per row, which you can then use as a new index.
00:24
So, let’s see what this looks like. Open up your terminal and start the Python interpreter. Type import pandas as pd, import numpy as np, and, to help make some data, from itertools import product.
00:50
Now define a list of column names called datecols and set it equal to ['year', 'month', 'day'].
01:04
Now you can create the DataFrame. Set df = pd.DataFrame(list()), and inside list(), call that product() function and pass in, let’s say, a list of two years.
01:19
I’ll just do two months to save some time, and maybe three days. product() then iterates through these lists like nested for loops and yields one tuple per combination, so each combination becomes a row. It’s a quick way to make some sample data.
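As a quick check of that claim, the following sketch compares product() with hand-written nested loops. The years, months, and days are hypothetical placeholders, since the lesson doesn’t say which values it uses.

```python
from itertools import product

# Hypothetical values for illustration: two years, two months, three days.
years = [2017, 2018]
months = [1, 2]
days = [1, 2, 3]

# product() yields one tuple per combination, varying the last list fastest.
combos = list(product(years, months, days))

# The equivalent nested for loops build the same tuples in the same order.
nested = []
for y in years:
    for m in months:
        for d in days:
            nested.append((y, m, d))

print(len(combos))   # 2 * 2 * 3 combinations
print(combos[:4])
print(combos == nested)
```

Each tuple later becomes one row of the DataFrame, which is why the row count is simply the product of the list lengths.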
01:38
And for column names, just pass in columns=datecols,
01:47
just like that. Because you’re going to turn these date columns into the index and then drop them, you should also make a column of actual 'data'.
01:57
So make a new column, df['data'], and set it equal to np.random.randn(len(df)), which generates one random value for each row of the DataFrame. All right, let’s see what that looks like.
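Put together, the steps so far might look like the following sketch. The specific years, months, and days are hypothetical; the lesson only calls for two years, two months, and three days.

```python
from itertools import product

import numpy as np
import pandas as pd

datecols = ['year', 'month', 'day']

# Hypothetical component values: 2 years x 2 months x 3 days = 12 rows.
df = pd.DataFrame(list(product([2017, 2018], [1, 2], [1, 2, 3])), columns=datecols)

# One standard-normal random value per row.
df['data'] = np.random.randn(len(df))

print(df.head())
```

At this point the DataFrame still has the default integer index (0 through 11).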
02:11
So here you can see that product() did its job. It looped through everything and generated every year, month, and day combination, one per row. Your data is over here. And if you look at the left side, there’s just a default integer index.
02:24
To replace that index with a DatetimeIndex, set df.index = pd.to_datetime(df[datecols]), passing in just the date columns by selecting them with datecols.
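A minimal sketch of that assignment, again assuming the hypothetical sample values (years 2017 and 2018, months 1 and 2, days 1 to 3):

```python
from itertools import product

import numpy as np
import pandas as pd

datecols = ['year', 'month', 'day']
df = pd.DataFrame(list(product([2017, 2018], [1, 2], [1, 2, 3])), columns=datecols)
df['data'] = np.random.randn(len(df))

# Select only the component columns; pd.to_datetime() assembles one
# timestamp per row from the 'year', 'month', and 'day' columns.
df.index = pd.to_datetime(df[datecols])

print(df.head())
print(type(df.index).__name__)
```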
02:40 Now let’s see what that looks like. Here, you’ve still got all the same columns, both your date columns and your actual data, but your index is now a DatetimeIndex, which lets you use functions and methods specific to datetimes in Pandas.
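For example, once the index is a DatetimeIndex you can read date parts directly off it and select rows with partial date strings. This sketch reuses the hypothetical sample data; df.index.month and partial-string indexing with .loc are standard pandas features, but the particular selection is only illustrative.

```python
from itertools import product

import numpy as np
import pandas as pd

datecols = ['year', 'month', 'day']
# Hypothetical sample data, as in the lesson's setup.
df = pd.DataFrame(list(product([2017, 2018], [1, 2], [1, 2, 3])), columns=datecols)
df['data'] = np.random.randn(len(df))
df.index = pd.to_datetime(df[datecols])

# Date components are exposed as attributes of the DatetimeIndex.
print(df.index.month)

# Partial-string indexing: select every row that falls in February 2018.
feb_2018 = df.loc['2018-02']
print(feb_2018)
```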
02:57
Because the dates now live in the index, you can get rid of these columns. So do something like df = df.drop(datecols, axis=1). Because you’re dropping columns instead of rows, make sure you set axis=1.
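Here is a small, self-contained sketch of that drop() call. The two-row DataFrame is hypothetical sample data. It also shows the columns= keyword, which pandas accepts as an equivalent to axis=1.

```python
import pandas as pd

datecols = ['year', 'month', 'day']
# Hypothetical sample data with the dates already moved into the index.
df = pd.DataFrame({'year': [2017, 2018], 'month': [1, 2], 'day': [3, 4],
                   'data': [0.5, -1.2]})
df.index = pd.to_datetime(df[datecols])

# axis=1 tells drop() to look for these labels among the columns, not the rows.
trimmed = df.drop(datecols, axis=1)

# The columns= keyword is an equivalent, more explicit spelling.
same = df.drop(columns=datecols)

print(trimmed)
print(trimmed.equals(same))
```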
03:12
And because you’ll just have that one column of data left, you can chain .squeeze() onto the drop() call to convert the result into a Series as opposed to a DataFrame. So now if you look at df, which is now a Series, you just have your index, your values, and then your Series information down here.
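The sketch below chains drop() and .squeeze() on a hypothetical two-row DataFrame. It also illustrates one caveat: .squeeze() with no arguments collapses every length-1 axis, so a one-row, one-column frame becomes a scalar, while passing axis='columns' restricts it to the column axis.

```python
import pandas as pd

datecols = ['year', 'month', 'day']
# Hypothetical sample data.
df = pd.DataFrame({'year': [2017, 2018], 'month': [1, 2], 'day': [3, 4],
                   'data': [0.5, -1.2]})
df.index = pd.to_datetime(df[datecols])

# Drop the date columns, then squeeze the single remaining column into a Series.
s = df.drop(datecols, axis=1).squeeze()
print(type(s).__name__)
print(s)

# Caveat: with only one row left, both axes have length 1.
one_row = df.iloc[:1].drop(datecols, axis=1)
print(one_row.squeeze())                 # collapses to a scalar
print(one_row.squeeze(axis='columns'))   # squeezes only the columns: a Series
```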
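03:29 So, the reason you can pass df[datecols] to pd.to_datetime() at all is that Pandas treats a DataFrame somewhat like a dictionary: each column name is a key, and the values in that column are its values. pd.to_datetime() uses those keys to work out which column holds the year, the month, and the day, so the columns need recognizable names like 'year', 'month', and 'day'.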
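Because pd.to_datetime() only needs the unit names as keys, it also accepts a plain dict of lists, and it rejects keys it doesn’t recognize as date units. The data here is hypothetical, and 'yr' is a deliberately unrecognized key chosen for illustration.

```python
import pandas as pd

# A plain dict works too: each key names a unit, each value holds the components.
components = {'year': [2017, 2018], 'month': [1, 2], 'day': [3, 4]}
stamps = pd.to_datetime(components)
print(stamps)

# 'yr' is not a unit name pandas recognizes, so the 'year' component is missing.
try:
    pd.to_datetime({'yr': [2017], 'month': [1], 'day': [1]})
    unrecognized_rejected = False
except ValueError as exc:
    unrecognized_rejected = True
    print('ValueError:', exc)
```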
03:39
If you’ve ever used the datetime library and called something like datetime.datetime() to construct a date from its individual components,
03:54
you can imagine that Pandas is doing something similar to that function call, except for every row at once. And that’s it! You should now feel comfortable setting a new DatetimeIndex using Pandas’ to_datetime() function and passing in a set of component columns. Thanks for watching.
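To make the datetime.datetime() comparison concrete, this sketch builds the same dates both ways on hypothetical sample data. It only checks that the results agree; pandas doesn’t literally call datetime.datetime() once per row, it works on whole columns at once.

```python
import datetime

import pandas as pd

datecols = ['year', 'month', 'day']
# Hypothetical component data.
df = pd.DataFrame({'year': [2017, 2018], 'month': [1, 2], 'day': [3, 4]})

# Standard-library approach: build one datetime object per row.
manual = [datetime.datetime(y, m, d)
          for y, m, d in df[datecols].itertuples(index=False)]

# pandas approach: assemble every row in a single call.
vectorized = pd.to_datetime(df[datecols])

print(manual)
print(vectorized.tolist())
print(vectorized.tolist() == manual)
```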