Analyzing Hierarchical Data

00:00 In the previous lesson, I completed the example on structured arrays by showing you how to deal with duplicates. This lesson is the beginning of the third example, which examines the use of hierarchical data and how to chart it.

00:13 As the name implies, hierarchical data consists of levels. A common example of this kind of data occurs in the financial sector. A total revenue line in an account system might be made up of multiple currencies from multiple countries in multiple sectors.

00:28 Being able to drill down the data in different ways can be useful. In fact, there are specialized databases called OLAP databases that are for exactly this kind of work.

00:38 You can use structured arrays in NumPy to handle hierarchical data. I’m going to ignore my “Don’t use floats for money” advice once more and consider a stock portfolio. Inside the portfolio, I’ll have six companies across three categories and price data for each company for the five business days of the week, with all of that data spread across different CSV files. To collate it all together, I’ll start with a zeroed array similar to our first example and then populate the data over top of it.

01:10 Like before, the data shown on the screen here is available in the supplementary materials dropdown. The base CSV file with the company information is called companies.csv and there are five additional price files, one for each of the five business days in the week.

01:26 Let’s go to the REPL and play with this data. I’ve imported NumPy and the Path object and now I’ve declared an array containing the days of the week for our data.

01:46 This comprehension creates a data type list pairing each of the days of the week with the f8 specifier, meaning a 64-bit float. And now for the data type list for the company data.
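The on-screen code isn't part of this transcript, so here is a minimal sketch of what the comprehension might look like. The variable names and the day abbreviations are assumptions.

```python
# Day labels are assumed; the transcript does not show the exact strings used.
days = ["mon", "tue", "wed", "thu", "fri"]

# Pair each day with "f8", NumPy's specifier for a 64-bit float.
days_dtype = [(day, "f8") for day in days]

print(days_dtype)
```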

02:06 Just two columns here, a company name and its corresponding sector. The company data and the price data get combined into a portfolio array.

02:23 This code creates a NumPy dtype object. Up until now, you’ve seen this being created by passing a list of specifiers into the array constructor, but internally NumPy is actually using that list to create a dtype object.

02:37 Here, I’ve created one directly.
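As a rough sketch of creating the dtype object directly: the field names and the "U20" string width below are assumptions, since the transcript only says there is a company name column and a sector column.

```python
import numpy as np

# Assumed day labels and field names, for illustration only.
days = ["mon", "tue", "wed", "thu", "fri"]
days_dtype = [(day, "f8") for day in days]

# Two text columns; "U20" (Unicode string, up to 20 characters) is an assumed width.
company_dtype = [("company", "U20"), ("sector", "U20")]

# Concatenate the two specifier lists and build the dtype object explicitly.
portfolio_dtype = np.dtype(company_dtype + days_dtype)

print(portfolio_dtype.names)
```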

02:43 The result looks just like you’d expect from before. Now I’ll create a portfolio array that’s been zeroed out.

02:56 And here’s the result, and there you go, a structured array filled with empty strings and zeros. Now I’ll load the company CSV information.
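A zeroed structured array along these lines can be sketched with np.zeros(). The field names are assumptions; the six rows match the six companies mentioned earlier.

```python
import numpy as np

# Assumed field names, for illustration only.
days = ["mon", "tue", "wed", "thu", "fri"]
portfolio_dtype = np.dtype(
    [("company", "U20"), ("sector", "U20")] + [(day, "f8") for day in days]
)

# One row per company. String fields start as "", float fields as 0.0.
portfolio = np.zeros((6,), dtype=portfolio_dtype)

print(portfolio)
```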

03:20 Normally, the loadtxt() call with this kind of CSV file would create a two-dimensional array, but as the company-sector pairs are going to go inside the rows of our existing portfolio, you want the data to be in a different shape.

03:33 The reshape() call changes the form. Rather than a list-of-lists style mechanism like you’ve seen before, the reshape() call here gives you a simple list of tuples.

03:46 Now I can stick this information in the portfolio. I did that by slicing the portfolio for the company name and sector columns and then assigning the data.
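Here is one way the load-and-assign step could look. It is a sketch rather than the lesson's exact code: it reads into a structured dtype instead of reshaping, it uses an in-memory stand-in for companies.csv, and the company names and sectors are invented.

```python
import io

import numpy as np

# Assumed field names, for illustration only.
days = ["mon", "tue", "wed", "thu", "fri"]
company_dtype = [("company", "U20"), ("sector", "U20")]
portfolio = np.zeros(
    (6,), dtype=np.dtype(company_dtype + [(day, "f8") for day in days])
)

# Stand-in for companies.csv; every name and sector here is made up.
companies_csv = io.StringIO(
    "company,sector\n"
    "Alpha,technology\n"
    "Beta,technology\n"
    "Gamma,finance\n"
    "Delta,finance\n"
    "Epsilon,energy\n"
    "Zeta,energy\n"
)

# Reading with a structured dtype yields a 1-D array of (company, sector) records.
companies = np.loadtxt(
    companies_csv, delimiter=",", dtype=company_dtype, skiprows=1
)

# Index with a list of field names to select both columns, then assign.
portfolio[["company", "sector"]] = companies

print(portfolio[["company", "sector"]])
```

The price columns are untouched by this assignment, so they are still all zeros at this point.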

04:02 And there’s our updated portfolio. The next step is to load the price data from each of the weekday files.

04:16 This is the dtype for the price information,

04:27 and this is me getting ready to load each of the price files. The built-in zip() function in Python takes two lists and pairs the first items, the second items, and so on from each of the two lists together, kind of like teeth in a zipper.

04:41 Here I’ve taken the days of the week and zipped them with the five price files, which I’ve grabbed with a glob pattern. The end result is a zip object, which is iterable.

04:53 I’ll give you a peek inside. Converting the zip object into a list iterates over it, and iterating over a zip object consumes it. This is because zip works like a generator, so it’s a one-time deal, which means now I have to create it again so I can actually use it. A little repetitive for sure, but I wanted you to see what was inside.
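The one-time nature of a zip object is easy to see in a small sketch. The file names below are hypothetical placeholders, not the names used in the lesson.

```python
# Assumed day labels and hypothetical file names, for illustration only.
days = ["mon", "tue", "wed", "thu", "fri"]
price_files = [f"prices-{number}.csv" for number in range(1, 6)]

pairs = zip(days, price_files)

# The first pass walks the iterator and uses it up.
first_pass = list(pairs)

# The second pass finds nothing left, so the result is an empty list.
second_pass = list(pairs)

print(first_pass)
print(second_pass)
```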

05:16 Let’s use this in a for loop and load the corresponding data in the portfolio array.

05:26 I’m iterating over the zip object, getting each day and CSV path object out,

05:44 and then I replace the column with the data from the file. Note the use of the second set of square brackets on the end. The price data has both a company name and a price in it.
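A sketch of the loading loop follows, under several assumptions: the file names, field names, and prices are invented, only three companies are used to keep it short, and the files are written to a temporary folder so the example runs on its own. The glob results are sorted because glob order isn't guaranteed, and the loop relies on every file listing the companies in the same order.

```python
import tempfile
from pathlib import Path

import numpy as np

# Assumed labels and invented companies, for illustration only.
days = ["mon", "tue", "wed", "thu", "fri"]
companies = ["Alpha", "Beta", "Gamma"]
price_dtype = [("company", "U20"), ("price", "f8")]

portfolio = np.zeros(
    (3,), dtype=[("company", "U20")] + [(day, "f8") for day in days]
)
portfolio["company"] = companies

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)

    # Write five hypothetical price files, one per business day.
    for number in range(1, len(days) + 1):
        rows = [
            f"{name},{10 * number + offset}.0"
            for offset, name in enumerate(companies)
        ]
        (folder / f"prices-{number}.csv").write_text(
            "company,price\n" + "\n".join(rows) + "\n"
        )

    # Pair each day with its file, then keep only the price column.
    for day, csv_path in zip(days, sorted(folder.glob("prices-?.csv"))):
        portfolio[day] = np.loadtxt(
            csv_path, delimiter=",", dtype=price_dtype, skiprows=1
        )["price"]

print(portfolio)
```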

05:55 I only want the price column. Alternatively, this could have been done with rec_join() on the company name. There are many ways of skinning cats in NumPy.

06:03 Horrible idiom, that. Anyhow, now the data is loaded. Let’s look at it. And there you go, a full portfolio. You’ll recall from the structured array examples, I can slice and dice information out of the array.

06:27 This is just the prices for companies in the tech sector on Friday. Let’s say I have 250 shares of each stock in the tech sector. I could determine the value of the tech stocks in my portfolio with a little math.
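The tech-sector selection could be written with a Boolean mask, as in this sketch. The rows, the "technology" label, and the field names are invented for illustration.

```python
import numpy as np

# A cut-down portfolio with made-up values and only the Friday column.
portfolio = np.array(
    [
        ("Alpha", "technology", 100.0),
        ("Beta", "finance", 50.0),
        ("Gamma", "technology", 20.0),
    ],
    dtype=[("company", "U20"), ("sector", "U20"), ("fri", "f8")],
)

# True for each row whose sector matches.
tech_mask = portfolio["sector"] == "technology"

# Filter the rows first, then pull out the Friday prices.
tech_friday = portfolio[tech_mask]["fri"]

print(tech_friday)
```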

06:49 This is one of the wonders of NumPy. Do math on an array and you get a new array containing the result of the same operation applied to each item in the original array.

07:00 You can then get the total value of tech in your portfolio with the sum() function.

07:13 When you pass a NumPy array to sum(), it acts like a list, handing each value in the array to sum() to get a total. Note that once again, you’re getting NumPy data types, which could be converted to Python native ones if you needed.
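Put together, the math and the total might look like the sketch below. The prices are invented; the 250 shares come from the example above.

```python
import numpy as np

# Made-up Friday prices for the tech stocks.
tech_friday = np.array([100.0, 20.0])
shares = 250

# Multiplying by a scalar applies the operation to every element.
tech_values = shares * tech_friday

# The built-in sum() iterates over the array and returns a NumPy float.
total = sum(tech_values)

# Convert to a native Python float if you need one.
total_native = float(total)

print(tech_values, total, total_native)
```

NumPy's own np.sum() or the array's .sum() method would give the same total and is typically faster on large arrays.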

07:28 If you’re coding along with me, don’t close your REPL. In the next lesson, I’m going to use this same data to show you how little code it takes to create a graph from NumPy data.
