Graphing the Quasar Spectrum

Resource mentioned in this lesson: Astro Data Lab

“Serena” Quasar’s sparcl ID: 719e7ea6-8b79-11ef-be93-525400f334e1

00:00 In the previous lesson, I explained redshift and introduced you to the massive quasi-stellar objects known as quasars. In this lesson, I’ll show you how to take a quasar’s spectral data and graph it in a marimo notebook.

00:13 You’ve spent two lessons on science stuff, so how about a quick reminder on our goal? The intent is to create a dashboard where you can interactively overlay spectral lines and change their redshift to match a quasar’s data.

00:26 The data I’m using comes from the Astro Data Lab, which can be found here. The Astro Data Lab has several different databases. The one I’m interested in is known as Sparcl, and has its own Python client for accessing the data.

00:42 To build the dashboard, you’ll need to find a quasar to visualize and fetch its corresponding spectral data. Once you’ve got that, you’ll create a marimo notebook to graph the data, and you’ll also need spectral line data.

00:55 You take that, overlay it on the graph, and once all that’s done, you’ll add the interactive parts to the notebook, allowing you to adjust the redshift values for the spectral lines.

01:06 All that is a lot. It’ll take us several lessons to get there.

01:11 This lesson focuses on graphing the spectral data. As such, I’m going to use a CSV file with the data in it rather than fetching it from scratch. The CSV in question is called flux_data.csv and is available in the supporting materials dropdown.

01:27 I’ll use Polars to read the CSV file and then Matplotlib to graph it. All of this will go in a marimo notebook for visualization. In the overview lesson, I told you what libraries are needed for this course.

01:39 For this lesson, you’ll need to pip install Matplotlib, marimo, and Polars. As always, you should use a virtual environment for package installation.

01:48 For ease of reading, on the screen here, I’ve split the bash command up into multiple lines. If you’re typing it all on one line, you won’t need the backslashes.
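
In case the on-screen command is hard to read, it looks something like this (the package names on PyPI are all lowercase, and the split across lines is purely cosmetic):

    python -m pip install \
        matplotlib \
        marimo \
        polars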

01:56 Those backslashes are the bash shell’s line-continuation characters. Now off to the Python REPL for a quick intro to Polars, in case you haven’t used it before.

02:07 Polars is a DataFrame library, which means it is kind of like a spreadsheet with rows and columns. In case you need a review on how it works, let me show you the code you’ll need in the dashboard.

02:17 You start by importing it. Convention is to alias the import as pl, similar to how pd is used for pandas and np for NumPy. Data science folks like their abbreviations.

02:29 The read_csv() call reads data from a CSV file into a DataFrame.

02:37 The resulting DataFrame, that’s df for short (I told you abbreviations), contains our quasar’s spectrum. Let’s look at the data. When you evaluate a DataFrame in the REPL, Polars shows you a bit of information.

02:51 The shape is the size of it. Our DataFrame has 7,781 rows and two columns. The first column is named wavelength and the second, flux. The f64 is short for 64-bit float, which corresponds to Python’s float type.

03:09 The summary here shows the first five and the last five rows of our data. You can reference a single column through square bracket operators.
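
If you’re following along, the REPL session so far looks roughly like this (the file and column names are the ones from flux_data.csv described above):

    >>> import polars as pl
    >>> df = pl.read_csv("flux_data.csv")
    >>> df.shape
    (7781, 2)
    >>> df["wavelength"]  # square brackets pull out a single column as a Series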

03:21 The graphing call that I’ll be using in a minute expects two columns rather than a single DataFrame, so this kind of notation is how I’ll pass the columns to Matplotlib.

03:33 Speaking of which, this is some Matplotlib code. This is the exact code I’ll be putting in the marimo dashboard, but I put it in a file for now so that I could show it to you.

03:43 To start with, I need Polars, and the convention is to alias the graphing part of Matplotlib as plt, short for plot. Yep. Another abbreviation. Here, I’m reading the CSV file just like I showed you before in the REPL, and this is the start of the code that creates the graph.

04:03 Matplotlib is a little weird. It just assumes you’ll only create one graph, so you just start using plt functions and it automatically creates a graphing object behind the scenes.

04:13 I don’t particularly like this approach and there are other ways of doing it, but this is the most common way, so I’m sticking with convention. This first call specifies the creation of a figure to put the graph inside of. The tuple specifies the size.

04:29 My American friends will be happy to know this size is in inches, although it doesn’t really matter as marimo will resize it to the browser’s width.
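
As an aside, one of those “other ways” mentioned a moment ago is Matplotlib’s object-oriented interface, where you create the figure and axes objects explicitly instead of relying on the implicit global figure. A minimal sketch, with a made-up size and dummy data:

    import matplotlib.pyplot as plt

    # Create the figure and axes explicitly rather than behind the scenes
    fig, ax = plt.subplots(figsize=(12, 6))
    ax.set_title("Serena")
    ax.plot([1, 2, 3], [3, 1, 2])  # dummy data, just to show the call pattern
    plt.show()

The rest of this lesson sticks with the plt functions, since that’s the convention you’ll see most often.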

04:38 This call sets the title of the graph. This long ugly value is the ID of the quasar in the Sparcl database. I think I’ll take a page out of the data science book and shorten the name a little bit.

04:50 I’m gonna call it Serena. This sets the X-axis label. The paired dollar signs tell Matplotlib to use LaTeX math format. The backslash double-A, \AA, is the LaTeX command for angstroms.

05:04 This is the Y-axis label. The units of flux are rather messy, and this is another LaTeX math value.

05:12 The xlim() and ylim() calls specify the data boundaries of the graph, and finally, this is where the actual graphing of Serena happens.

05:22 The plot() call creates a line graph. It takes two arguments: the set of X values and the set of Y values. I’m getting those from the individual columns of the Polars DataFrame.

05:35 If you’re running this file as a script, the show() call will launch a viewer so you can see your graph. Inside marimo, this bit won’t be necessary.
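
Putting all of those calls together, the script looks roughly like this. The figure size, axis limits, and exact label strings are placeholders, since the video doesn’t spell out those values, but the structure matches the walkthrough above:

    import polars as pl
    import matplotlib.pyplot as plt

    # Read the spectrum into a DataFrame, just like in the REPL
    df = pl.read_csv("flux_data.csv")

    # Create a figure to hold the graph; the size is in inches
    plt.figure(figsize=(12, 6))

    # Title and axis labels; paired dollar signs switch on LaTeX math mode,
    # and \AA is the LaTeX command for angstroms
    plt.title("Serena")
    plt.xlabel(r"Wavelength ($\AA$)")
    plt.ylabel(r"Flux ($10^{-17} \, \mathrm{erg} \, \mathrm{cm}^{-2} \, \mathrm{s}^{-1} \, \AA^{-1}$)")  # units are a stand-in

    # Data boundaries of the graph (placeholder values)
    plt.xlim(3500, 10500)
    plt.ylim(0, 30)

    # The line graph itself: X values from wavelength, Y values from flux
    plt.plot(df["wavelength"], df["flux"])

    # Only needed when running this as a script; marimo renders the figure itself
    plt.show()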

05:44 Alright, now you’ve seen the code. Let’s create a marimo notebook and paste this into it.

05:50 When you pip install marimo, you get the marimo command along with it. You use this command to create and edit notebooks. Let’s do that now, creating one called dashboard.py.

06:06 Two quick things. First, note that our dashboard is a Python file. This is one of the reasons I prefer marimo to other notebook tools. The notebook file is a valid Python program.

06:18 This solves a lot of the problems found in other tools, which use something like JSON that can be problematic to merge with other files. The marimo edit command starts a web server and shows you its URL here.

06:33 That URL is where you interact with the notebook. See the --headless argument? Well, you might want to leave that out. With it, you have to copy the URL manually into a browser. Without it, marimo will launch your default browser with this URL for you.

06:50 As I do a lot of web dev, I tend to have several browsers open at a time, so I use headless mode to make sure I open it in the browser of my choosing.
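
Pieced together from the description above, the command is along these lines (leave out --headless if you’d rather marimo open your default browser for you):

    marimo edit --headless dashboard.py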

07:00 Speaking of which, let me go launch that browser.

07:04 Here I am in my browser. Let me paste that URL, and this is the notebook. Most things require the marimo module to be imported, and the first cell here includes that by default, so let me accept that.

07:20 And run the cell by pushing the play button.

07:25 Notebook cells can be Python or Markdown. Let’s start with a title. So I’m going to use a Markdown cell. Click the Markdown button,

07:38 and there you go, a pretty title. Now I’ll add a Python cell, and inside here I’m going to paste the graphing code I showed you earlier. Click the play button, takes it a second, and there you go.

07:53 You’ve got a graph. If you’re coding along with me, feel free to play around in here a bit. Make some edits to the graph. For example, change the boundaries or the titles.

08:04 Remember, once you’ve made a change, you need to hit the play button again to refresh the output. Poke around a little, get comfortable with it. You’re going to be playing with this dashboard for the rest of this course.
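
For example, zooming in on part of the spectrum is just a matter of changing the limit calls inside the cell and hitting play again (the values here are arbitrary):

    plt.xlim(4000, 7000)  # look at a narrower slice of the spectrum
    plt.ylim(0, 15)       # and clamp the flux range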

08:16 The next two lessons are about how I got the CSV file you just graphed. Let me show you how to look things up in space.

leawood4 on Oct. 25, 2025

I am using a Conda environment on Windows 11, and I am having difficulty installing the Sparcl client; see below for the errors.

Collecting numpy<1.26.4,>=1.23.5 (from sparclclient)
  Using cached numpy-1.26.3.tar.gz (15.7 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [21 lines of output]
      + C:\Users\leawo\anaconda3\envs\real_python\python.exe C:\Users\leawo\AppData\Local\Temp\pip-install-bwier1ua\numpy_3228e667f5974b1db5c7924907e014f2\vendored-meson\meson\meson.py setup C:\Users\leawo\AppData\Local\Temp\pip-install-bwier1ua\numpy_3228e667f5974b1db5c7924907e014f2 C:\Users\leawo\AppData\Local\Temp\pip-install-bwier1ua\numpy_3228e667f5974b1db5c7924907e014f2\.mesonpy-0ytv7iu9\build -Dbuildtype=release -Db_ndebug=if-release -Db_vscrt=md --native-file=C:\Users\leawo\AppData\Local\Temp\pip-install-bwier1ua\numpy_3228e667f5974b1db5c7924907e014f2\.mesonpy-0ytv7iu9\build\meson-python-native-file.ini
      The Meson build system
      Version: 1.2.99
      Source dir: C:\Users\leawo\AppData\Local\Temp\pip-install-bwier1ua\numpy_3228e667f5974b1db5c7924907e014f2
      Build dir: C:\Users\leawo\AppData\Local\Temp\pip-install-bwier1ua\numpy_3228e667f5974b1db5c7924907e014f2\.mesonpy-0ytv7iu9\build
      Build type: native build
      Project name: NumPy
      Project version: 1.26.3
      WARNING: Failed to activate VS environment: Could not parse vswhere.exe output

      ..\..\meson.build:1:0: ERROR: Unknown compiler(s): [['icl'], ['cl'], ['cc'], ['gcc'], ['clang'], ['clang-cl'], ['pgcc']]
      The following exception(s) were encountered:
      Running `icl ""` gave "[WinError 2] The system cannot find the file specified"
      Running `cl /?` gave "[WinError 2] The system cannot find the file specified"
      Running `cc --version` gave "[WinError 2] The system cannot find the file specified"
      Running `gcc --version` gave "[WinError 2] The system cannot find the file specified"
      Running `clang --version` gave "[WinError 2] The system cannot find the file specified"
      Running `clang-cl /?` gave "[WinError 2] The system cannot find the file specified"
      Running `pgcc --version` gave "[WinError 2] The system cannot find the file specified"

      A full log can be found at C:\Users\leawo\AppData\Local\Temp\pip-install-bwier1ua\numpy_3228e667f5974b1db5c7924907e014f2\.mesonpy-0ytv7iu9\build\meson-logs\meson-log.txt
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
(real_python) PS C:\Users\leawo>

Chris Copeland

Christopher Trudeau RP Team on Oct. 25, 2025

Hi Chris,

I’m neither a Conda nor Windows guy, so I’m just taking a best guess. A similar error is posted on StackOverflow indicating a missing C++ compiler:

stackoverflow.com/questions/77606062/could-not-parse-vswhere-exe-output-what-am-i-doing-wrong

That said, the version of NumPy that Sparcl wants is very old, so it is possible that it won’t work well with Conda. This is why we included the CSV files: scientific libraries like this are notorious for being “runs on my machine”.

My colleague did install it fine on their own Windows box, but they weren’t using Conda so they had full control over the version installed.

I don’t know Conda at all, but is there a way of getting a pre-built older package? The error appears to be in their installer attempting to build it.

Hope my guessing points you in the right direction. If you’re still stuck, reply back and I’ll post internally to see if someone here knows Conda.

leawood4 on Oct. 26, 2025

Chris,

Thanks for the advice. I was leaning towards the wrong version of numpy. I uninstalled it, and then sparcl installed correctly using a much older version. I appreciate your prompt response to this query.

Chris Copeland
