Graphing HR Diagrams

Exploring Astrophysics in Python With pandas and Matplotlib Christopher Trudeau 05:13

00:00 Remember when I kvetched and moaned about the context-based graphing mechanism in Matplotlib? Well, this time around, I’m not going to use it. Well, sort of. The subplots() function on the pyplot module creates a graph object to manipulate.

00:16 It then gives you two things related to that graph: the figure, which is the container that everything’s in, and the axes on the graph. Personally, I find the fact that Matplotlib calls this the axes confusing.

00:29 I think of an axis as being the X or Y of a graph. And I guess when you pluralize it like this, you’re getting both back? But it definitely wouldn’t have been how I structured or named things.

00:40 I think I’m grumbling about Matplotlib again. The suptitle() method on figure is the super title, something for the overall figure.

00:49 In this case, I’m titling it “HR Diagram”.
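A minimal sketch of the two calls described so far. The variable names fig and ax are the conventional ones and are an assumption here, since the transcript does not spell them out:

```python
import matplotlib.pyplot as plt

# With no arguments, subplots() returns one Figure and one Axes object
fig, ax = plt.subplots()

# suptitle() titles the whole figure rather than a single set of axes
fig.suptitle("HR Diagram")
```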

00:52 And remember when I said instead of doing context-based, I’d be using graph objects? And that was kind of why I hesitated. You kind of are and you kind of aren’t. Although you’ve got a reference to the figure and the axes here, you still call scatter() directly on the pyplot module.

01:09 scatter() is what plots a scatter chart. The first argument is the data for the x axis, and the second is the data for the y axis. For the HR Diagram, these are the temperature and luminosity columns from our DataFrame respectively.

01:24 The s argument stands for size and is what specifies the size of the dot in the scatter plot. In this case, I’m passing in our radius_sol column, so the dots will be of different sizes depending on the star.

01:38 The color argument allows you to specify the color of the dot, and the edgecolors argument is for the color of the dot’s outline. This is why I calculated those new columns before.
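The scatter() call described above might look roughly like this. The DataFrame below is a hypothetical stand-in with made-up illustrative rows, and the column names color and edge_color are assumptions; only temperature, luminosity, and radius_sol are named in the lesson:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the course's DataFrame (illustrative values only)
stars = pd.DataFrame({
    "temperature": [3000, 4500, 5772, 10000, 25000],
    "luminosity": [0.001, 0.2, 1.0, 50.0, 20000.0],
    "radius_sol": [0.2, 0.7, 1.0, 2.5, 8.0],
    "color": ["red", "orange", "yellow", "white", "lightblue"],
    "edge_color": ["darkred", "brown", "orange", "gray", "blue"],
})

fig, ax = plt.subplots()

# plt.scatter() draws on the current axes, which is the one just created
points = plt.scatter(
    stars["temperature"],            # x data
    stars["luminosity"],             # y data
    s=stars["radius_sol"],           # marker size per star
    color=stars["color"],            # fill color per star
    edgecolors=stars["edge_color"],  # outline color per star
)
```

Because scatter() is called on the pyplot module, it relies on the axes created by subplots() being the current one; calling ax.scatter() with the same arguments would make that link explicit.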

01:51 Once I’ve got the data plotted, I want to add some info to the axes. First, for the y axis, I’m setting a label and telling it what the range on the axis is to be.

02:02 Matplotlib will default the range to something that fits, but that can mean some weird start and end values, and because I want a log scale, I’m being explicit about these being powers of 10.

02:14 Speaking of the log scale, the set_yscale() call allows you to determine how the scale of the axis gets displayed. Passing in log as a string says to use the base-10 log scale. Logs are tricky in coding.
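As a sketch of the y axis setup, assuming an axes object named ax. The label text and the limits are illustrative assumptions, not the values from the lesson:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

ax.set_ylabel("Luminosity (solar units)")  # assumed label text
ax.set_ylim(1e-4, 1e6)                     # explicit powers of 10 (illustrative)
ax.set_yscale("log")                       # base-10 logarithmic scale
```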

02:28 Different libraries will use a different base when you write log without a number. Whenever you are using these kinds of things, make sure to look it up in the docs or you might end up using log base 2 when you mean 10 or e or whatever.

02:41 It can get confusing.
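To illustrate the point about bases: in Python's math module and in NumPy, a bare log means the natural log, while base 10 and base 2 have their own functions. A short check, assuming NumPy is installed:

```python
import math

import numpy as np

x = 100.0

natural = math.log(x)   # base e, about 4.605
np_natural = np.log(x)  # NumPy's log is also base e
base10 = math.log10(x)  # 2.0
base2 = math.log2(x)    # about 6.644
```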

02:43 Next, I do the same thing for the x axis, setting a label, a limit, and changing it to log scale as well.

02:51 If you’d stopped there, you’d find that the x axis was actually kind of unreadable. The values of the temperatures would be displayed in scientific notation and they’d overlap each other on the screen.

03:02 To get around this, I want to use a custom formatter for my tick labels that displays values in thousands. To use a custom formatter, you first need a lambda that returns a string.

03:14 This lambda will be passed the x value and its position. I’m actually only concerned about the x value in this case, so I convert it to an integer, divide it by a thousand, and then tack on k for kilo on the end. Matplotlib comes with a bunch of tick formatters out of the box.

03:31 The one I’m using is the FuncFormatter, and as the name implies, it uses a function to format the tick label. That’ll be our lambda.

03:40 A graph in Matplotlib has both major and minor ticks. These would be big ticks and smaller ones between them. Here, for reference, I’m turning the minor ticks on.

03:51 And then finally, this is the call that actually sets the FuncFormatter. set_major_formatter() uses my custom lambda and will use that to produce the actual tick labels.
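The formatter steps above could be sketched as follows. The name to_thousands, the label text, the use of integer division, and minorticks_on() as the way to turn minor ticks on are all assumptions; the lesson only describes the behavior:

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

fig, ax = plt.subplots()
ax.set_xlabel("Temperature (K)")  # assumed label text
ax.set_xscale("log")

# Called with the tick value and its position; only the value is used
to_thousands = lambda x, pos: f"{int(x) // 1000}k"

ax.minorticks_on()  # assumed call for enabling minor ticks
ax.xaxis.set_major_formatter(FuncFormatter(to_thousands))
```

With this in place, a major tick at 5000 is labeled 5k instead of being shown in scientific notation.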

04:05 That’s our primary graph. Now I want to augment it by adding the absolute magnitude information as an axis on the right. The twinx() call on an axis creates a corresponding second axis. Somewhat counterintuitively, calling twinx() creates a new set of axes, plural.

04:23 You get both of them, a top and a right in our case, but I’ve had challenges getting them to behave when you do it this way. So I twin the x and then I set the y inside the twinned set.

04:36 These three calls set the label, position, and the limit of the axes to display our magnitudes. The magic numbers for the limits here were created by me looking up the range of luminosities and calculating the corresponding lower and upper absolute magnitudes.
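One way to arrive at limits like these is the standard relation between luminosity and absolute magnitude, M = M_sun - 2.5 * log10(L / L_sun). The sketch below assumes a solar absolute bolometric magnitude of 4.74 and an illustrative luminosity range; neither number comes from the lesson, and the helper absolute_magnitude() is hypothetical:

```python
import math

import matplotlib.pyplot as plt

M_SUN = 4.74  # assumed absolute bolometric magnitude of the Sun


def absolute_magnitude(luminosity_sol):
    """Convert a luminosity in solar units to an absolute magnitude."""
    return M_SUN - 2.5 * math.log10(luminosity_sol)


fig, ax = plt.subplots()
ax.set_ylim(1e-4, 1e6)  # illustrative luminosity range
ax.set_yscale("log")

# twinx() returns a second axes that shares x and has its own y on the right
mag_ax = ax.twinx()
mag_ax.set_ylabel("Absolute magnitude")   # assumed label text
mag_ax.yaxis.set_label_position("right")
# Brighter stars have lower magnitudes, so this axis runs from high to low
mag_ax.set_ylim(absolute_magnitude(1e-4), absolute_magnitude(1e6))
```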

04:51 And finally, now I do the same thing for the B-V color index on the top. And as I said, although twinx() creates two axes, I couldn’t get them to behave.

05:01 So now I twiny() to get my new top axes. Then I set a label on it and the limits and I’m done. The last line there pops up the resulting graph.
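The top axis follows the same pattern with twiny(). In this sketch the name bv_ax, the label text, and the limits are illustrative assumptions rather than the values used in the lesson:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# twiny() returns a second axes that shares y and has its own x on the top
bv_ax = ax.twiny()
bv_ax.set_xlabel("B-V color index")  # assumed label text
bv_ax.set_xlim(-0.4, 2.0)            # illustrative limits

# plt.show()  # uncomment to pop up the resulting graph
```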
