In this lesson you’ll get an introduction to Pandas’ basic data structures: Series and DataFrame. This video focuses on the Pandas Series data structure.
Basic Pandas Data Structures
Let’s start by importing a few of the data structures that Pandas includes. So,
from pandas import DataFrame, Series. We’re going to begin with a
Series, which is pretty awesome.
What you need to do is pass it some values, so these would be your apples. Okay, so let’s say you had
[1, 2, 3, 4], and then you could also pass an optional
index, which would be something like this: a list of labels, one per value, which you’d use to look the values up.
The values have a dtype as well. I believe they’d be
float64 if you passed floats; plain integers like the ones above come out as int64. So, another thing you can do here is you can go
s.index, and that’ll give you the index labels. As you can see, they’re objects: they’re strings saying what the index is.
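Here’s a minimal sketch of what was just described. The labels "a" through "d" and the variable names are made up for illustration; they aren’t the ones from the video.

```python
from pandas import Series

# Values from the lesson; the string labels are hypothetical.
s = Series([1, 2, 3, 4], index=["a", "b", "c", "d"])

print(s)        # values shown next to their labels
print(s.index)  # the labels themselves
print(s.dtype)  # int64, because the values are plain integers
print(s["b"])   # look up a single value by its label

# The same values as floats give a float64 Series instead.
s_float = Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"])
print(s_float.dtype)
```

If you leave out index, Pandas falls back to a default integer index starting at 0.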
And there are a bunch of other things that you can do with
Series data in Pandas. Next, I’m going to show you an example with some timestamps and other things you can do with
We’re then going to provide an
index, which will be a
DatetimeIndex. It starts on January 1st, 2013. The
periods will be equal to the length of
data; that’s the number of samples we take. And how frequently they’re sampled is given by
freq. So, this is a minutely frequency, so when we look at our object here, what we should see is the first minute of January 1st, 2013, the second minute of January 1st, 2013, and so on and so forth.
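A rough sketch of that setup follows. The data here is a placeholder (10,000 consecutive integers), since the lesson’s actual values aren’t shown in the transcript. It also assumes a recent Pandas, where you’d build the DatetimeIndex with pd.date_range rather than by calling the DatetimeIndex constructor with start, periods, and freq.

```python
import numpy as np
import pandas as pd

# Placeholder data: 10,000 integers standing in for the lesson's `data`.
data = np.arange(10_000, dtype="int64")

# One timestamp per value, one minute apart, starting January 1st, 2013.
index = pd.date_range(start="2013-01-01", periods=len(data), freq="min")

s = pd.Series(data, index=index)

print(s.head(10))  # the first ten minutes of January 1st, 2013
print(s.tail())    # the last five entries (five is the default)
```

Tying periods to len(data) keeps the index and the values the same length, which the Series constructor requires.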
So as you can see here, we have 10,000 entries. The frequency is minutely, and the values are of type
int64. So, that’s how the
Series objects look. You can do a bunch of things like
.tail() once you’re dealing with a lot of data.
You can look at the last few entries. By default it shows five, but you can provide a number here.
.head() is the reverse: it gives you the first entries, ten in this case. The really cool thing that you can do, though, is seeing that we have
s now here… We’ll call that
s, we’ll evaluate that out.
All right. I believe that’s the case. Next, we’ll go
s_daily = s.resample(), resampling that at a daily frequency. What that ends up doing is grouping all the values in your
Series by day, and it gives you one entry for each of the days that the data spans.
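As a sketch of that resampling step, again on placeholder data: this assumes a recent Pandas, where you pass the target frequency to .resample() and then chain the aggregation you want. The mean is used here.

```python
import numpy as np
import pandas as pd

# Placeholder minutely data, as in the earlier setup (not the lesson's values).
data = np.arange(10_000, dtype="int64")
index = pd.date_range(start="2013-01-01", periods=len(data), freq="min")
s = pd.Series(data, index=index)

# Group the minutely values by calendar day and average each day.
s_daily = s.resample("D").mean()

print(s_daily)
```

Ten thousand minutes cover just under seven days, so s_daily ends up with seven entries, one per day touched by the data.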
So that gives you an easy way of going from a very high frequency to a much lower one. You can also fill forward or fill back: if you go from a low frequency to a much higher frequency, you can carry values forward. That’s generally how you’d use
Series objects, and that’s really where their power lies. Next, let’s go into DataFrames.
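Before moving on, the fill-forward idea mentioned a moment ago can be sketched like this. The three daily values are made up, and the 12-hour target frequency is just an assumption chosen to keep the output short.

```python
import pandas as pd

# Three made-up daily values.
daily = pd.Series(
    [10, 20, 30],
    index=pd.date_range("2013-01-01", periods=3, freq="D"),
)

# Upsample to every 12 hours; each new slot carries the last known value forward.
half_daily = daily.resample("12h").ffill()

print(half_daily)
```

Swapping .ffill() for .bfill() fills each new slot from the next known value instead.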
In the previous example, I said this was calculating the sums. This is incorrect. It is actually calculating the means. In order to calculate the sums, you need to pass a
how argument to
.resample(), which will then resample and sum the daily values.
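To make the difference concrete, here’s a small sketch on placeholder data where every minute has the value 1. It assumes a recent Pandas: as far as I know, the how argument belongs to older releases and has since been removed, so the aggregation is chained as a method instead.

```python
import numpy as np
import pandas as pd

# Placeholder data: 10,000 ones, one per minute.
data = np.ones(10_000, dtype="int64")
index = pd.date_range(start="2013-01-01", periods=len(data), freq="min")
s = pd.Series(data, index=index)

# Daily totals: a full day has 1,440 minutes, so a full day sums to 1440.
daily_sum = s.resample("D").sum()

# Daily averages: every value is 1, so each day's mean is 1.0.
daily_mean = s.resample("D").mean()

print(daily_sum)
print(daily_mean)
```

The last day is only partly covered by the data, which is why its sum is smaller than the others while its mean stays the same.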