Changing Array Sizes
00:00 In the previous lesson, I showed you how to construct a three-dimensional array from three separate CSV files. In this lesson, I’ll continue on in that theme, but this time make modifications to the array structure.
00:12 If you think back to our three CSV files in the previous lesson, what would have happened if one of those files was missing a row? Since you used the zeros factory to create a default array populated with zeros, any missing data would simply be zeros.
00:27 Now, in some cases, that can be problematic as zero is also a number and it can mess with your calculations, but in a lot of cases, it’s better to have something than nothing, as nothing often causes your code to fall over.
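To make the zero-default trade-off concrete, here is a minimal sketch. The 2-by-3 shape and the sample values are assumptions for illustration, not the data from the lesson's CSV files.

```python
import numpy as np

# Hypothetical shape: 2 rows by 3 columns, pre-filled with zeros.
grid = np.zeros((2, 3))

# Pretend the file only supplied the first row; the second row was missing.
grid[0] = [1, 2, 3]

# Nothing falls over, but the missing row silently counts as real zeros
# and drags every column mean down.
column_means = grid.mean(axis=0)
print(column_means)
```

The code runs without complaint, which is the upside. The downside is that the means are half of what the one real row would suggest.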
00:40 NumPy arrays are fixed in size, but there are helper functions that copy arrays, making it appear as though they can be changed. The insert() function allows you to modify the shape of an array, except that’s a lie.
00:53 It isn’t really modifying it. It’s creating a brand new array with the new dimensions and using the old one to populate it. As long as you ignore the computation overhead of that operation, you can think of it as a subtle difference.
01:05 With large amounts of data, you’re creating two copies though, and it’s something you want to keep in mind. Let’s take our three existing array files from the previous lesson and add some content.
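A small sketch of that copy behavior, using a made-up 2-by-3 array rather than the lesson's data. The names original and expanded are hypothetical.

```python
import numpy as np

original = np.zeros((2, 3))

# np.insert() does not resize `original`; it builds and returns a new array.
expanded = np.insert(arr=original, obj=2, values=0, axis=0)

print(original.shape)  # (2, 3)
print(expanded.shape)  # (3, 3)

# The two arrays do not share a data buffer, so both occupy memory
# for as long as both names are kept around.
print(np.shares_memory(original, expanded))  # False
```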
01:18 I’ve got a new file called long_file.csv, which has one more row than our other files. To incorporate this new data, you need to change the shape of the array and read in the new values.
01:29 Note that the shape change means the existing slices being modified need default values for their new cells. Off to the REPL to play with long_file.csv.
01:43 Bit of deja, and some vu to go with it.
01:50 And like before, I’m starting out with a zeroed array, but this time the shape is different. Note the id of the result.
02:04 And this code reads the three original CSV files in.
02:17 And this is the result. Note what I’ve done here. I’m already ready for a fourth slice, but I’m not ready for the fact that the size of the slices will be different.
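The REPL code isn't captured in the transcript, so the following is only a sketch of what this step might look like. The file names, the 2-by-3 slice size, and the sample values are all assumptions; temporary files stand in for the three original CSVs.

```python
import glob
import os
import tempfile

import numpy as np

# Hypothetical stand-ins for the three CSV files from the previous lesson.
tmp_dir = tempfile.mkdtemp()
for n in range(3):
    rows = np.arange(6).reshape(2, 3) + 10 * n
    np.savetxt(os.path.join(tmp_dir, f"file{n}.csv"), rows, delimiter=",")

# Four slices are allocated even though only three files exist so far.
array = np.zeros((4, 2, 3))
print(id(array))

# sorted() is used here only to keep the sketch deterministic.
pattern = os.path.join(tmp_dir, "file*.csv")
for position, path in enumerate(sorted(glob.glob(pattern))):
    array[position] = np.loadtxt(path, delimiter=",")

print(array.shape)  # (4, 2, 3)
print(array[3])     # the fourth slice is still all zeros
```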
02:27 That comes next. To prep for the new content, I’ll use np.insert().
02:39 The arr argument here is short for array, it’s not a pirate thing. The obj argument gives the index before which the new values are inserted along the chosen axis. Using 2 means each of our four slices will have an additional row.
02:52 The values argument says what to use as the new default, and the axis argument states which axis to insert the values along. And there’s the result.
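Here is a sketch of the call with all four arguments spelled out. The starting shape of (4, 2, 3), the sample values, and axis=1 as the row axis are assumptions, chosen so that index 2 sits just past the last existing row of each slice.

```python
import numpy as np

# Hypothetical data: four slices, each with 2 rows and 3 columns.
array = np.arange(24, dtype=float).reshape(4, 2, 3)

# Insert zeros before index 2 along axis 1 (the rows of each slice).
# With only rows 0 and 1 present, this adds a new last row to every slice.
array = np.insert(arr=array, obj=2, values=0, axis=1)

print(array.shape)  # (4, 3, 3)
print(array[0])
```

A smaller obj, such as 0 or 1, would place the zero row at the top or in the middle of each slice instead of at the end.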
03:04 Each slice now has an additional row filled with zeros. Now, I can read in long_file.csv.
03:18 I’ve read that into index three, which is of course the fourth slice, and this is the result. Note that this result, although still in our variable named array, has a different id than before.
03:34 The insert() function returned a brand new array, which I then stored over our existing one.
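The id change and the final load can be sketched together. The contents of long_file.csv aren't shown in the transcript, so an in-memory string with made-up values stands in for it, and the (4, 2, 3) starting shape is an assumption.

```python
import io

import numpy as np

array = np.zeros((4, 2, 3))
id_before = id(array)

# Rebinding the name to the array returned by np.insert() swaps the object.
array = np.insert(arr=array, obj=2, values=0, axis=1)
id_after = id(array)
print(id_before != id_after)  # True

# Hypothetical three-row replacement for long_file.csv.
long_file = io.StringIO("1,2,3\n4,5,6\n7,8,9\n")

# Index 3 is the fourth slice, which now has room for three rows.
array[3] = np.loadtxt(long_file, delimiter=",")
print(array[3])
```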
03:42 The way I used the glob call to load the files made some assumptions about the order of files. That can be problematic, so in the next lesson, I’ll show you what to do about that.