Understanding HR Diagrams

00:00 Overwhelmed yet? Yeah, it’s a fair amount of information. If only there was a way of visualizing it all, and I’m sure you’re ahead of me. There is! The Hertzsprung-Russell Diagram plots the relationship between a star’s luminosity and its temperature, and because the absolute magnitude and color index are related to the temperature and luminosity, you can augment the graph with other axes to show this information as well.

00:24 The name of the graph is a bit of a mouthful because it was invented independently twice, so Hertzsprung and Russell get credit together. By looking at a large number of stars on graphs like this, astronomers began to understand the life cycle of stars and their evolution. To tackle the HR diagram (yeah, I’m not going to be saying Hertzsprung-Russell over and over again), you’ll need some more Python.

00:51 First off is the pandas library. This is a third-party library that does data processing and treats data in memory similar to how a spreadsheet works, using tables with rows and columns.

01:03 The pandas object that stores all this is known as a DataFrame, and if you’re careful with your column names, you can reference them as attributes on the DataFrame.

01:13 pandas provides a read_csv() function that allows you to read in a comma-separated values file and create a DataFrame from it. See what I mean about spreadsheet-like? You can export any spreadsheet as a CSV, then create a DataFrame from it.
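
As a rough sketch of what that looks like, the snippet below reads CSV text into a DataFrame and then reads a column as an attribute. The column names and the star values are illustrative assumptions, not the lesson's actual data file, and the CSV is held in a string so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical sample data with approximate values; the lesson's real
# CSV file and its column names may differ.
csv_text = """name,temperature,luminosity
Sun,5772,1.0
Sirius A,9940,25.4
Betelgeuse,3500,126000
"""

# read_csv() accepts a file path or any file-like object.
stars = pd.read_csv(io.StringIO(csv_text))

# Column names that are valid Python identifiers can be read as attributes.
hottest = stars.temperature.max()

# Square brackets with the column name work for any column name.
luminosities = stars["luminosity"].tolist()
```

With a file on disk, you would pass its path to read_csv() instead of the StringIO object.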

01:28 The assign() method on a DataFrame is used to add columns, and you can do this based on calculations of other columns. This is similar to what I did with arrays in NumPy that you saw in the earlier lesson.
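
Here is a minimal sketch of assign() adding a calculated column. It assumes a luminosity column in solar units and uses the standard relation between luminosity and absolute bolometric magnitude, taking 4.74 as the Sun's value. The column names are hypothetical, and the lesson's own calculation may be different.

```python
import numpy as np
import pandas as pd

# Hypothetical data: luminosity in solar units (approximate values).
stars = pd.DataFrame(
    {
        "name": ["Sun", "Sirius A", "Betelgeuse"],
        "luminosity": [1.0, 25.4, 126000.0],
    }
)

# assign() returns a new DataFrame with the extra column;
# the original DataFrame is left unchanged.
# Assumed relation: M = 4.74 - 2.5 * log10(L / L_sun)
with_magnitude = stars.assign(
    abs_magnitude=lambda df: 4.74 - 2.5 * np.log10(df.luminosity)
)
```

Because the calculation works on the whole column at once, it reads much like the NumPy array math from the earlier lesson.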

01:40 A DataFrame also has a decidedly un-Pythonic access mechanism known as .loc. You use .loc and square brackets to access subsets of the DataFrame, including being able to put in conditions.

01:53 I say it’s un-Pythonic because if you’re used to using square brackets to get at the index values of a list, this is not that. It’s closer to the idea of a dictionary, but not quite that either.

02:05 This can cause some confusion if you’re new to pandas. In fact, there is also an .iloc, which allows you to use the Python-style numeric indices, but I won’t be using it here.
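
To make the difference concrete, here is a small sketch comparing .loc with .iloc. The DataFrame, its row labels, and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical data, indexed by star name (approximate values).
stars = pd.DataFrame(
    {
        "temperature": [5772, 9940, 3500],
        "luminosity": [1.0, 25.4, 126000.0],
    },
    index=["Sun", "Sirius A", "Betelgeuse"],
)

# .loc looks things up by label: a row label, then a column label.
sun_temperature = stars.loc["Sun", "temperature"]

# .loc also accepts a condition, which selects the matching rows.
hot_stars = stars.loc[stars.temperature > 5000]

# .iloc uses numeric positions, the way list indexing does.
first_row = stars.iloc[0]
```

Note that the square brackets after .loc take labels and conditions rather than positions, which is the dictionary-like behavior described above.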

02:15 Since pandas is a third-party library, you’ll need to pip install that fuzzy-wuzzy, little monochromatic bamboo-muncher. You’ve already seen some Matplotlib, but I’m going to be using some more features this time.

02:28 The graph I’ll be doing is a scatter plot. You haven’t seen one of these before. It takes x and y coordinates and puts a dot for each pair that you have.

02:37 You can control the size and color of the dots, and seeing as our dots will be stars, I’m going to use this feature to represent the size and color of the stars in our data. Since I’ll be dealing with a large range of data values, the axes in this plot will be log-based.

02:52 I’ll show you how to do that in the library as well.
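
Before the real code, here is a rough sketch of a scatter plot with per-dot sizes and colors on log-scaled axes. The data values, dot sizes, and color names are illustrative assumptions. Flipping the x-axis so hot stars sit on the left is the usual HR diagram convention rather than something stated in this lesson so far.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical data (approximate values, luminosity in solar units).
temperature = [3500, 5772, 9940]
luminosity = [126000.0, 1.0, 25.4]
sizes = [300, 40, 60]  # marker areas in points squared
colors = ["orangered", "gold", "lightblue"]

fig, ax = plt.subplots()

# s sets the size of each dot and c sets its color.
ax.scatter(temperature, luminosity, s=sizes, c=colors)

# Log scales keep a wide range of values readable.
ax.set_xscale("log")
ax.set_yscale("log")

# Assumed convention: hot stars on the left.
ax.invert_xaxis()

ax.set_xlabel("Temperature (K)")
ax.set_ylabel("Luminosity (solar units)")
```

To see the result, call fig.savefig() with a filename, or drop the Agg line and call plt.show().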

02:56 Remember, our plot is luminosity versus temperature, but because magnitude and color index are related to those, I’m also going to show you how to add extra axes to the top and right-hand side of the graph, displaying that information.
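
One possible way to add the right-hand axis is sketched below using twinx(). This is an assumption about technique, not necessarily how the lesson's code does it. It relies on absolute magnitude being linear in the logarithm of luminosity (again taking 4.74 for the Sun), so a linear magnitude axis lines up with a log luminosity axis when the limits are converted.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data (approximate values, luminosity in solar units).
temperature = [3500, 5772, 9940]
luminosity = [126000.0, 1.0, 25.4]


def luminosity_to_magnitude(lum):
    """Assumed relation: M = 4.74 - 2.5 * log10(L / L_sun)."""
    return 4.74 - 2.5 * np.log10(lum)


fig, ax = plt.subplots()
ax.scatter(temperature, luminosity)
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_ylabel("Luminosity (solar units)")

# twinx() creates a second y-axis on the right that shares the x-axis.
magnitude_ax = ax.twinx()

# Convert the luminosity limits so both axes describe the same range.
# Brighter stars have smaller magnitudes, so this axis runs downward.
lum_low, lum_high = ax.get_ylim()
magnitude_ax.set_ylim(
    luminosity_to_magnitude(lum_low), luminosity_to_magnitude(lum_high)
)
magnitude_ax.set_ylabel("Absolute magnitude")
```

The converted limits are computed once, so they would need recalculating if the luminosity limits change afterward. A top axis could be built with twiny() along the same lines, though the conversion from temperature to color index is not attempted in this sketch.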

03:09 Alright, let’s go look at the actual code.
