Graphing Hierarchical Data

00:00 In the previous lesson, I showed you how to populate an array based on a set of CSV files containing hierarchical data. In this lesson, I’ll show you how to use Matplotlib to graph that data.

00:11 Matplotlib is probably the most popular Python library for creating graphs. It integrates nicely with NumPy, in that you can use NumPy arrays as the source of data in charts.

00:23 This goes back to the rows and columns in an array being like spreadsheet data and just like spreadsheet data, you can graph it. Let me show you how to do that with just a few lines of Python.

00:34 I’m in the same REPL session as the previous lesson where I created the portfolio array. The first thing I want to do is peel out just the tech stocks. You’ve seen me slice this data before, but this time instead, I’m going to create a mask.

00:53 In earlier lessons, I ran this kind of filter inside of square brackets to get the data out. The filter on its own creates a mask array, and instead of referencing it inside square brackets, I can store it.

01:11 And then like before, I used the mask to slice. In previous lessons, I did this all in one step, but this time I took the mask first and then applied it so you could better see how this works.

01:24 The mask itself is an array of Booleans that is True where the value for sector is technology, while the sliced array based on the mask contains the two company names that belong in that sector.
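The mask-then-apply steps described here can be sketched roughly as follows. The array below is only a hypothetical stand-in for the portfolio array from the previous lesson: the company names, the field names, and the prices are made up for illustration and are not the course's data.

```python
import numpy as np

# Hypothetical stand-in for the portfolio array; names and values are invented.
portfolio = np.array(
    [
        ("Company_A", "technology", 113.0),
        ("Company_B", "finance", 50.0),
        ("Company_C", "technology", 98.0),
    ],
    dtype=[("company", "U24"), ("sector", "U24"), ("price", "f8")],
)

# The comparison on its own yields a Boolean mask with one entry per row.
tech_mask = portfolio["sector"] == "technology"
print(tech_mask)  # [ True False  True]

# Applying the stored mask inside square brackets keeps only the True rows.
tech_sector = portfolio[tech_mask]["company"]
print(tech_sector)  # ['Company_A' 'Company_C']
```

Storing the mask in a variable gives the same result as writing the comparison directly inside the square brackets; it just makes the intermediate Boolean array visible.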

01:37 Like in the previous lesson, I can get other data out using the mask.

01:49 tech_value is the value of the two tech stocks where I have 250 shares in each. Now let’s graph the value of the two tech stocks.
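As a rough sketch of that calculation, the same mask can pull out a price column, which is then multiplied by the number of shares. The `price` field and its values are assumptions chosen for illustration; only the figure of 250 shares comes from the lesson.

```python
import numpy as np

# Hypothetical stand-in for the portfolio array; names and prices are invented.
portfolio = np.array(
    [
        ("Company_A", "technology", 113.0),
        ("Company_B", "finance", 50.0),
        ("Company_C", "technology", 98.0),
    ],
    dtype=[("company", "U24"), ("sector", "U24"), ("price", "f8")],
)

SHARES_HELD = 250  # 250 shares of each tech stock, as stated in the lesson

tech_mask = portfolio["sector"] == "technology"

# Multiplying the masked price column by a scalar is applied element by element.
tech_value = portfolio[tech_mask]["price"] * SHARES_HELD
print(tech_value)  # [28250. 24500.]
```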

02:02 Like with NumPy, it’s standard practice to alias Matplotlib. Matplotlib is a big library, and the pyplot module is the one with the graphing stuff in it that I want.

02:13 The usual alias for this module is plt. Matplotlib is a little bit of a funky library. It has two modes, one where you can create graph objects and one where you just start plotting stuff.

02:26 In the second case, each call to the pyplot module simply builds on top of a default graph object built into the module. I find this a very bizarre design decision, but that’s the way it is and that’s the way most programmers use it, so I’m going to hold my nose and use this approach.
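To make the contrast between the two modes concrete, here is a minimal sketch using the bar() call from this lesson. The company names and values are hypothetical, and the explicit version with `plt.subplots()` is shown only for comparison; the lesson itself sticks with the implicit style.

```python
import matplotlib.pyplot as plt

# Hypothetical data, for illustration only.
names = ["Company_A", "Company_C"]
values = [28250.0, 24500.0]

# Implicit style: pyplot keeps track of a "current" figure and axes,
# and each plt.* call builds on top of it.
plt.bar(x=names, height=values)
implicit_axes = plt.gca()  # the axes that pyplot created behind the scenes

# Explicit style: create the figure and axes objects yourself,
# then call methods on them directly.
fig, ax = plt.subplots()
ax.bar(x=names, height=values)
```

Both versions draw the same two bars; the difference is whether the graph object is managed by the module or held in your own variables.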

02:50 The bar() function creates a bar graph. The x argument is for the x-axis data, and the height argument is for the y-axis data.

02:59 The bar() function returns a tuple-like container of bars, and I want to chain it so I’m accessing the first bar with index zero on the end here. And then I chain the set_color() function, which changes the color of the first column to green.
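A minimal sketch of the chained call might look like this. The two arrays are hypothetical stand-ins for the company names and values computed earlier, and the exact color string used in the video is not shown in the transcript, so `"green"` is an assumption.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins for the masked company names and their values.
tech_sector = np.array(["Company_A", "Company_C"])
tech_value = np.array([28250.0, 24500.0])

# bar() returns a container of bars; index 0 is the first bar,
# and set_color() changes just that one.
plt.bar(x=tech_sector, height=tech_value)[0].set_color("green")

# Fetch the first bar again from the current axes to inspect it.
first_bar = plt.gca().patches[0]
print(first_bar.get_facecolor())
```

The second bar keeps Matplotlib's default color because only the bar at index zero was changed.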

03:14 At this point, I could show the graph, but let me just add some niceties.

03:22 That’s a label for the x-axis. You can ignore what it returns on the REPL. It does set the label on the default graph.

03:31 A y-label,

03:36 and a title. Now I’ll call show() to see the result. Magic! When you call show(), a new window pops up with the resulting graph inside of it.
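The labeling steps can be sketched as below. The label and title strings are assumptions, since the transcript does not record the exact text typed in the video, and the bar data is again hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical data, for illustration only.
plt.bar(x=["Company_A", "Company_C"], height=[28250.0, 24500.0])

# Each call returns a Text object, which the REPL echoes and you can ignore.
# The label is still applied to the default graph.
plt.xlabel("Tech Companies")  # assumed label text
plt.ylabel("Value ($)")  # assumed label text
plt.title("Tech Stock Valuation")  # assumed title text
```

Calling `plt.show()` after these lines is what opens the window with the finished chart.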

03:48 Here you can see the values of our two stocks. Although there is over a $3,000 difference between them, with the values being around 28,000 and 24,000 respectively, the heights look pretty close to each other at this scale.

04:02 And that’s it for our third practical NumPy example. Next up, I’ll start the fourth: how to write a custom vectorization function.
