Graphing the Quasar Spectrum

Resource mentioned in this lesson: Astro Data Lab

“Serena” Quasar’s sparcl ID: 719e7ea6-8b79-11ef-be93-525400f334e1

00:00 In the previous lesson, I explained redshift and introduced you to the massive stellar objects known as quasars. In this lesson, I’ll show you how to take a quasar’s spectral data and graph it in a marimo notebook.

00:13 You’ve spent two lessons on science stuff, so how about a quick reminder on our goal? The intent is to create a dashboard where you can interactively overlay spectral lines and change their redshift to match a quasar’s data.

00:26 The data I’m using comes from the Astro Data Lab, which can be found here. The Astro Data Lab has several different databases. The one I’m interested in is known as Sparcl, and has its own Python client for accessing the data.

00:42 To build the dashboard, you’ll need to find a quasar to visualize and fetch its corresponding spectral data. Once you’ve got that, you’ll create a marimo notebook to graph the data. You’ll also need spectral line data.

00:55 You take that, overlay it on the graph, and once all that’s done, you’ll add the interactive parts to the notebook, allowing you to adjust the redshift values for the spectral lines.

01:06 All that is a lot. It’ll take us several lessons to get there.

01:11 This lesson focuses on graphing the spectral data. As such, I’m going to use a CSV file with the data in it rather than fetching it from scratch. The CSV in question is called flux_data.csv and is available in the supporting materials dropdown.

01:27 I’ll use Polars to read the CSV file and then Matplotlib to graph it. All of this will go in a marimo notebook for visualization. In the overview lesson, I told you what libraries are needed for this course.

01:39 For this lesson, you’ll need to pip install Matplotlib, marimo, and Polars. As always, you should use a virtual environment for package installation.

01:48 For ease of reading, on the screen here, I’ve split the bash command up into multiple lines. If you’re typing it all on one line, you won’t need the backslashes.

01:56 The backslash is the bash shell’s line-continuation character. Now off to the Python REPL for a quick intro to Polars, in case you haven’t used it before.

02:07 Polars is a DataFrame library, which means it is kind of like a spreadsheet with rows and columns. In case you need a review on how it works, let me show you the code you’ll need in the dashboard.

02:17 You start by importing it. Convention is to alias the import as pl, similar to how pd is used for pandas and np for NumPy. Data science folks like their abbreviations.

02:29 The read_csv() call reads data from a CSV file into a DataFrame.

02:37 The resulting DataFrame, that’s df for short (I told you abbreviations), contains our quasar’s spectrum. Let’s look at the data. When you evaluate a DataFrame in the REPL, Polars shows you a bit of information.

02:51 The shape is the size of it. Our DataFrame has 7,781 rows and two columns. The first column is named wavelength and the second, flux. The f64 is short for 64-bit float, which is the same double-precision type as Python’s built-in float.

03:09 The summary here shows the first five and the last five rows of our data. You can reference a single column through square bracket operators.

03:21 The graphing call that I’ll be using in a minute expects two columns rather than a single DataFrame, so this kind of notation is how I’ll pass the columns to Matplotlib.

03:33 Speaking of which, this is some Matplotlib code. This is the exact code I’ll be putting in the marimo dashboard, but I put it in a file for now so that I could show it to you.

03:43 To start with, I need Polars and convention is to alias the graphing part of Matplotlib as plt, short for plot. Yep. Another abbreviation. Here, I’m reading the CSV file just like I showed you before in the REPL, and this is the start of the code that creates the graph.

04:03 Matplotlib is a little weird. It just assumes you’ll only create one graph, so you just start using plt functions and it automatically creates a graphing object behind the scenes.

04:13 I don’t particularly like this approach and there are other ways of doing it, but this is the most common way, so I’m sticking with convention. This first call specifies the creation of a figure to put the graph inside of. The tuple specifies the size.

04:29 My American friends will be happy to know this size is in inches, although it doesn’t really matter as marimo will resize it to the browser’s width.

04:38 This call sets the title of the graph. This long ugly value is the ID of the quasar in the Sparcl database. I think I’ll take a page out of the data science book and shorten the name a little bit.

04:50 I’m gonna call it Serena. This sets the X-axis label. The paired dollar signs tell Matplotlib to use LaTeX math format. The double slash A is the sign for angstroms.

05:04 This is the Y-axis label. The units of flux are rather messy, and this is another LaTeX math value.

05:12 The xlim() and ylim() calls specify the data boundaries of the graph, and finally, this is where the actual graphing of Serena happens.

05:22 The plot() call creates a line graph. It takes two arguments: the set of X values and the set of Y values. I’m getting those from the individual columns of the Polars DataFrame.

05:35 If you’re running this file as a script, the show() call will launch a viewer so you can see your graph. Inside marimo, this bit won’t be necessary.

05:44 Alright, now you’ve seen the code. Let’s create a marimo notebook and paste this into it.

05:50 When you pip install marimo, you got the marimo command with it. You use this command to create and edit notebooks. Let’s do that now. Creating one called dashboard.py.

06:06 Two quick things. First, note the name of our dashboard is a Python file. This is one of the reasons I prefer marimo to other notebook tools. The notebook file itself is a valid Python program.

06:18 This solves a lot of the problems found in other tools that store notebooks in something like JSON, which can be problematic to merge. The marimo edit command starts a web server and shows you its URL here.

06:33 That URL is where you interact with the notebook. See the --headless argument? Well, you might want to leave that out. With it, you have to copy the URL manually into a browser. Without it, marimo will launch your default browser with this URL for you.

06:50 As I do a lot of web dev, I tend to have several browsers open at a time, so I use headless mode so I can make sure I open it in the browser of my choosing.

07:00 Speaking of which, let me go launch that browser.

07:04 Here I am in my browser. Let me paste that URL, and this is the notebook. Most things require the marimo module imported, and the first cell here shows that as default, so let me accept that.

07:20 And run the cell by pushing the play button.

07:25 Notebook cells can be Python or Markdown. Let’s start with a title. So I’m going to use a Markdown cell. Click the Markdown button,

07:38 and there you go, a pretty title. Now I’ll add a Python cell, and inside here I’m going to paste the graphing code I showed you earlier. Click the play button, takes it a second, and there you go.

07:53 You’ve got a graph. If you’re coding along with me, feel free to play around in here a bit. Make some edits to the graph. For example, change the boundaries or the titles.

08:04 Remember, once you’ve made a change, you need to hit the play button again to refresh the output. Walk around a little, get comfortable with it. You’re going to be playing with this dashboard for the rest of this course.

08:16 The next two lessons are about how I got the CSV file you just graphed. Let me show you how to look things up in space.
