Setting Up the pandas DataFrame
00:00
This is conjunctions.py.
00:03
The first import here is from Python’s Standard Library. It’s the warnings module. You can use this to control and filter warnings issued by the code that runs.
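As a minimal sketch of what controlling and filtering warnings can look like, the snippet below uses only the standard library. The noisy() function is a hypothetical stand-in for any code that issues a warning; it is not part of conjunctions.py, and the transcript does not say which filter the script actually applies.

```python
import warnings


def noisy():
    # Hypothetical function standing in for library code that warns
    warnings.warn("something looks off", UserWarning)
    return 42


# Record every warning raised inside the block
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    noisy()
shown = len(caught)

# Same call, but this time UserWarning is filtered out
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("ignore", UserWarning)
    result = noisy()
hidden = len(caught)
```

The catch_warnings() context manager restores the previous filter settings when the block exits, so the filter only affects the code inside it.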
00:14
There are several different calls I need from the coordinates module, so I’m importing the whole thing, and then I import the pandas module.
00:22
It is common practice when doing so to alias it as pd. I’m not actually a huge fan of aliasing. Too many short forms make it hard to track what’s going on in your code, but in the case of pandas, everybody does it.
00:35 So I try to be consistent with everybody.
00:38
In the slides I mentioned tabulate. This is our data frame pretty printer,
00:44
and here is where I’m grabbing my location from the comp file. That’s an EarthLocation object, the one that I just showed you being constructed.
00:53 The goal of this script is to loop through a number of days checking for conjunctions with Mercury. I’ve defined two constants: the number of days between checks and how many iterations I want to do.
01:06 These two combined give me just under two years of data, checking once a week. I’ve declared the reference planet and the other planets to loop through up at the top, kind of like constants.
01:16 So if you wanted to search using a different reference or a subset, you could just modify these values here.
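To make the arithmetic concrete, here is a rough sketch of how such constants could be laid out. Every name and value below is an assumption for illustration: the transcript doesn't show the actual identifiers or numbers, only that the checks are weekly and cover just under two years. The fixed start date is also made up so the example is repeatable.

```python
from datetime import date, timedelta

# Hypothetical names and values, chosen to match "once a week, just under two years"
DAYS_BETWEEN_CHECKS = 7
NUM_ITERATIONS = 100

# Hypothetical reference planet and the other planets to compare against it
REFERENCE_PLANET = "mercury"
OTHER_PLANETS = ["venus", "mars", "jupiter", "saturn", "uranus", "neptune"]

# 7 days * 100 iterations = 700 days, a little less than two years (730 days)
total_days = DAYS_BETWEEN_CHECKS * NUM_ITERATIONS

# Made-up start date; one entry per check
start = date(2024, 1, 1)
check_dates = [
    start + timedelta(days=i * DAYS_BETWEEN_CHECKS) for i in range(NUM_ITERATIONS)
]
```

Searching with a different reference or a subset of planets would then only mean editing REFERENCE_PLANET or OTHER_PLANETS.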
01:23 Let me scroll down a little bit.
01:28
I want to use a pandas data frame so that I can track my results as a table. I’m actually going to use two data frames to do this work. This is the first one and it’s going to contain the data.
01:40 The second one will be a temporary object containing the single row I want to add to this table. So this line is declaring the first table, starting it out empty.
01:52 When you build the data frame directly, the first argument is row data. Since I’m starting out empty, I’m passing in an empty list, as I have no row information to pass in yet. Without rows, the data frame won’t know the shape of the table, so I’m using the columns argument to specify just what columns I want.
02:11 I’m gonna start out with a column for my date, and then I’m gonna have one column for each of my six planets.
02:18
I’m naming the date ts for timestamp. Technically, a date isn’t really a timestamp, but this short form is common and it keeps the width of my column title small.
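A minimal sketch of an empty data frame declared this way might look like the following. The planet column names are assumptions; the transcript only says there is a ts column plus one column for each of six planets.

```python
import pandas as pd

# Assumed column names: one per planet being compared
PLANETS = ["venus", "mars", "jupiter", "saturn", "uranus", "neptune"]

# Empty list for the row data, explicit columns so the table has a shape
results = pd.DataFrame([], columns=["ts"] + PLANETS)
```

The result has zero rows and seven columns, so later rows can be added against a known set of column names.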
02:28 Remember, every row in a data frame gets an index, which you can use to reference it. By default, this is an auto-generated integer.
02:37
When dealing with time series data, that is, data where everything has some sort of timestamp, it’s best practice to use the timestamp as the index instead. I’m changing the index to be based on the ts column by calling .set_index().
02:51
For this program, I won’t really be taking advantage of the time series features, but pandas has tools for interpolating between rows when time is involved.
02:59 So using the timestamp as your index is a good thing, and in this case, it also removes an extra column. I’m going to need the date column. I don’t need an auto-generated index.
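The effect of switching the index can be sketched with a tiny table. The column names and numbers here are made up for illustration and are not the script's data.

```python
import pandas as pd

# Two made-up rows with a timestamp and two value columns
df = pd.DataFrame(
    [
        [pd.Timestamp("2024-01-01"), 12.5, 80.1],
        [pd.Timestamp("2024-01-08"), 10.2, 79.4],
    ],
    columns=["ts", "venus", "mars"],
)

# Before: pandas auto-generates an integer index
default_index = list(df.index)

# After: ts becomes the index and is no longer a regular column
df = df.set_index("ts")

# Rows can now be looked up by their timestamp
row = df.loc[pd.Timestamp("2024-01-08")]
```

After the call, df.columns holds only venus and mars, which is the "removes an extra column" effect described above.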
03:10 Most operations on a data frame create a copy of a data frame and return it. I actually don’t like this default. It means using a lot of memory and you often end up reassigning the same variable name to the new thing and throwing the old one out.
03:23
If you don’t like this behavior, which like I just said, I don’t, many of the data frame methods take an argument called inplace, which when true operates on the existing data frame instead.
03:34
You have to be careful with this. I think I had to squish at least three bugs in this course because I forgot to use inplace, but expected the change to operate on the existing data frame and had to hunt down why it looked like the change hadn’t happened at all.
03:47 So it might be better practice just to get used to reassignment, but it makes me uncomfortable. Anyhow, the next bit here is what does the actual calculations.
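The difference between the two styles, and the kind of bug described here, can be sketched as follows. The make_table() helper and its column names are hypothetical, used only to get three identical starting tables.

```python
import pandas as pd


def make_table():
    # Hypothetical helper: a fresh empty table for each case
    return pd.DataFrame([], columns=["ts", "venus", "mars"])


# Case 1: the bug. The new data frame is returned and thrown away,
# so the original still has its default index and its ts column.
forgot = make_table()
forgot.set_index("ts")

# Case 2: reassignment. Keep the returned copy under the same name.
reassigned = make_table()
reassigned = reassigned.set_index("ts")

# Case 3: inplace=True. The existing object is modified and None is returned.
in_place = make_table()
returned = in_place.set_index("ts", inplace=True)
```

Note that mixing the two styles, such as assigning the result of a call that uses inplace=True, replaces the variable with None, which is another easy mistake to make.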
03:57
I start the conjunction check based on today. That’s actually an oversimplification, but I’ll get back to it in a moment. The today() function on the date class in the datetime module returns...