Reviewing NumPy Arrays
00:00 In the previous lesson, I gave an overview of the course. In this lesson, I’ll do a quick intro to NumPy arrays in case you need some review. The code in this course requires three different third-party libraries, and as such, you should be using a virtual environment.
00:14 If you’re new to virtual environments, I recommend you take a read of this tutorial to get you started before proceeding. The three libraries used in this course are NumPy, which is where I’ll be spending most of my time, then Matplotlib to do some charting, and one single but helpful use of the natsort library, which has an alternate way of sorting things like file names.
00:35 You can install all three using a single pip install command inside of your activated virtual environment.
00:44 NumPy has its own data types that are in addition to the regular Python data types. The core concept that you’ll be using over and over in this course is the NumPy array.
00:53 This is a list-like structure, which is fixed in size. Once you’ve created it, that’s how big it’s going to be. There are ways of pretending to insert something, but what is really happening is a new version is being created and the old version is being copied into it.
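A minimal sketch of that copy-on-insert behavior, assuming np.append() as the way of "pretending to insert" (the lesson does not name a specific function, and the variable names here are illustrative only):

```python
import numpy as np

original = np.array([1, 2, 3])

# np.append() does not grow the array in place.
# It builds a new, larger array and copies the old values into it.
extended = np.append(original, 4)

print(original.shape)  # (3,)
print(extended.shape)  # (4,)

# The two arrays do not share a memory buffer, so a copy was made.
print(np.shares_memory(original, extended))  # False
```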
01:09 For small items, this doesn’t matter, but for large amounts of data, there are speed and memory considerations to think about when doing this. Now, of course, that sounds like a drawback, but it comes with a really, really big upside.
01:21 NumPy is screamingly fast. If you need to crunch a lot of numbers, a NumPy array is going to outperform any equivalent list pretty much all the time. NumPy is built for doing number crunching, and as such, its array concept is homogeneous.
01:37 That means it contains the same kind of data. This isn’t as restrictive as it might sound. With NumPy’s structured arrays, you can have multiple columns, and it is each column that has to be homogeneous, so you can kind of think of it like a spreadsheet where each column in the sheet can only contain a single type.
01:52 This restriction is another reason for NumPy’s speed. Essentially, you’re trading off the greater flexibility of the Python list for high-grade performance.
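One way to see the single-type rule in action is to look at an array's dtype attribute. This is only a sketch with made-up values: for a plain array built from a list, NumPy picks one type that can hold every element.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])  # one float forces every element to be a float

print(ints.dtype)      # an integer dtype, typically int64
print(mixed.dtype)     # float64
print(mixed.tolist())  # [1.0, 2.5, 3.0]
```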
02:01 Let’s go into the REPL and I’ll show you some NumPy arrays.
02:06 NumPy has a lot of stuff in it, and rather than import the whole thing into the namespace, the convention is to reference it directly. And as programmers are lazy and don’t want to keep typing numpy, the import typically aliases it as np, which I’ve done here.
02:23 NumPy’s array() is a factory function that returns an instance of the ndarray class, and you can call it directly.
02:32 By passing a list into array() like I’ve done here, I’ve created a one-dimensional array with the values one, two, and three inside of it. Let’s take a look at it.
02:44 NumPy displays an array similar to how you construct it. Like with a list, you can use square brackets to access an item in an array.
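The REPL steps so far can be sketched as a short script. The array name one matches the name used later in the lesson, and repr() stands in for what the REPL echoes back:

```python
import numpy as np  # np is the conventional alias

# Build a one-dimensional array from a plain Python list
one = np.array([1, 2, 3])

print(repr(one))  # array([1, 2, 3])
print(one.ndim)   # 1
```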
02:52 Also like lists, the index is zero-based, so I’ve accessed the second item in the array here. Note that it shows a 64-bit integer with a value of 2.
03:03 This is a NumPy type rather than a regular Python integer. NumPy numbers play fairly well with Python numbers. If you add a Python integer to a NumPy int, you’ll get the correct value as long as it fits inside the 64 bits used to store it.
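A short sketch of indexing and mixing types, assuming the same array as before. The exact integer width depends on the platform and NumPy version, so the check below uses the general np.integer type rather than int64 specifically:

```python
import numpy as np

one = np.array([1, 2, 3])

second = one[1]  # zero-based, so index 1 is the second item
print(second)    # 2

# The item is a NumPy integer, not a built-in Python int
print(isinstance(second, np.integer))  # True
print(isinstance(second, int))         # False

# Adding a Python integer to a NumPy integer works as expected
print(second + 10)  # 12
```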
03:18 Arrays can be multidimensional. Let’s create a two-dimensional structure.
03:24 Like before, I used the constructor, but this time I’m using a list of lists.
03:30 This is how you get the two dimensions.
03:34 So now I have two rows and three columns. This is what it looks like. Once more, NumPy uses the list-like notation to show you what is inside. I mentioned before that arrays have a fixed size.
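A sketch of the two-dimensional case. The lesson does not state which values were used on screen, so the numbers below are placeholders. Only the name two and the two-by-three layout come from the lesson:

```python
import numpy as np

# A list of lists: each inner list becomes one row
two = np.array([[1, 2, 3], [4, 5, 6]])  # placeholder values

print(repr(two))    # shown in the same nested, list-like notation
print(two.ndim)     # 2
print(len(two))     # 2 rows
print(len(two[0]))  # 3 columns in the first row
```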
03:48 If you need to incrementally add data, one approach is to start with an array with the correct dimensions filled with zeros, then overwrite the values that you need.
03:57 This is so common that NumPy provides an array factory called zeros. The argument to zeros is a shape value. This is a common mechanism in NumPy that indicates the size of dimensions.
04:11 A single integer like here means one-dimensional with that size, so in this case, three values in a single row. You can see the shape of an array object by accessing its .shape attribute.
04:23 Our array named one is one-dimensional with three values inside of it, and our array named two is two-dimensional with two rows and three items in each row.
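The zeros factory and the .shape attribute can be sketched like this. The name blank and the values in two are assumptions for illustration:

```python
import numpy as np

one = np.array([1, 2, 3])
two = np.array([[1, 2, 3], [4, 5, 6]])  # placeholder values

# A single integer asks for a one-dimensional array of that length
blank = np.zeros(3)
print(blank)        # [0. 0. 0.]
print(blank.shape)  # (3,)

# .shape is always a tuple with one entry per dimension
print(one.shape)  # (3,)
print(two.shape)  # (2, 3)
```

Note that zeros produces floating-point values by default, which is why each item prints as 0. rather than 0.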
04:35 You can pass a shape tuple like this to the zeros factory to create multi-dimensional arrays.
04:43 This has the same shape as our array named two, but has a value of zero for each item. For more dimensions, simply add more items to the tuple.
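As a sketch, passing the tuple (2, 3) gives a zero-filled array you can then overwrite piece by piece. The name grid, the values in two, and the assigned numbers are hypothetical:

```python
import numpy as np

two = np.array([[1, 2, 3], [4, 5, 6]])  # placeholder values

# A shape tuple of (rows, columns) creates a two-dimensional array
grid = np.zeros((2, 3))
print(grid.shape == two.shape)  # True

# Overwrite values in place instead of growing the array
grid[0] = [7, 8, 9]
print(grid.tolist())  # [[7.0, 8.0, 9.0], [0.0, 0.0, 0.0]]
```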
04:55 This is a pre-populated three-dimensional array with lengths 3, 2, and 3, respectively.
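The three-dimensional case follows the same pattern. In this sketch the name cube is illustrative only:

```python
import numpy as np

# Three entries in the shape tuple give three dimensions
cube = np.zeros((3, 2, 3))

print(cube.ndim)   # 3
print(cube.shape)  # (3, 2, 3)
print(cube.size)   # 18 items in total (3 * 2 * 3)
```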
05:03 Now that you’ve reviewed NumPy arrays, next up I’ll dive into the actual example: Populating multidimensional arrays from multiple CSV files.