Data Types

Using NumPy's np.arange() Effectively (Liam Pulsifer, 04:59)

Transcript

00:00 In this lesson, I'm going to show you how to use different data types with arange(). Let's move into the terminal. Remember that you need to import NumPy in order to use arange(). Importing it as np isn't required, but it's the usual convention.

00:15 As I mentioned in the last video, arange() takes several parameters: start, stop, and step. It also takes one called dtype.
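A minimal sketch of those parameters in use. The specific values are illustrative, not taken from the video; start, stop, and step are passed positionally and dtype by keyword.

```python
import numpy as np

# start=2, stop=10 (exclusive), step=2, with the dtype inferred from the arguments
evens = np.arange(2, 10, 2)

# Same range, but dtype forces floating-point elements
evens_float = np.arange(2, 10, 2, dtype=np.float64)

print(evens)        # [2 4 6 8]
print(evens_float)  # [2. 4. 6. 8.]
```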

00:25 Let's start with the simplest possible call, arange(10), and assign the result to the variable x to make it easier to work with.

00:36 You can look at the .dtype attribute of the x array, and you can see here that it's 'int64'. That means the array holds integers, which makes sense: I only passed an integer as the stopping value, so there's no reason to assume I need floating-point values, and every value can be represented exactly as an integer.
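A quick sketch of the inference described above. The exact integer type is platform-dependent: 'int64' is typical on 64-bit Linux and macOS, but NumPy versions before 2.0 on Windows default to 'int32', so this sketch only relies on the dtype being some integer type.

```python
import numpy as np

x = np.arange(10)

# Only an integer stop was given, so NumPy infers an integer dtype
print(x)                                   # [0 1 2 3 4 5 6 7 8 9]
print(x.dtype)                             # int64 on most 64-bit Linux/macOS builds
print(np.issubdtype(x.dtype, np.integer))  # True
```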

01:00 The 64 means that each element has a size of 64 bits. To check that, I need x.itemsize, not x.size; x.size is the number of elements. Now you might say, “That's just 8. Why is it 8 when the type says 64?” Well, .itemsize is in bytes, while the type name is in bits.

01:21 There are 8 bits in a byte, so 8 bytes is 8 * 8 = 64 bits. No big deal. As you can see, this array has a data type of 'int64', which is different from a regular Python int.
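The difference between .size, .itemsize, and total memory can be checked directly. This sketch requests int64 explicitly so the numbers don't depend on the platform's default integer type; .nbytes is the total number of bytes used by the elements.

```python
import numpy as np

# Explicit int64 so the results are the same on every platform
x = np.arange(10, dtype=np.int64)

print(x.size)          # 10 -> number of elements
print(x.itemsize)      # 8  -> bytes per element
print(x.itemsize * 8)  # 64 -> bits per element, the "64" in "int64"
print(x.nbytes)        # 80 -> size * itemsize, total bytes of element data
```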

01:34 That's because NumPy has many different types. Let me move over and show them to you. There are several integer types that differ in size, and there are floating-point types in several sizes too.
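To see those sizes without the slide, you can ask NumPy for each type's width and limits. This is a sketch that covers a selection of common types (including the unsigned uint8), not an exhaustive list; np.iinfo reports integer limits and np.dtype(...).itemsize reports bytes.

```python
import numpy as np

# Signed and unsigned integer types, smallest to largest
int_types = [np.int8, np.uint8, np.int16, np.int32, np.int64]
# Floating-point types also come in several sizes
float_types = [np.float16, np.float32, np.float64]

bits = {}
for t in int_types + float_types:
    dt = np.dtype(t)
    bits[dt.name] = dt.itemsize * 8  # itemsize is in bytes, so multiply by 8

for t in int_types:
    name = np.dtype(t).name
    info = np.iinfo(t)  # smallest and largest representable integers
    print(f"{name}: {bits[name]} bits, {info.min} to {info.max}")

for t in float_types:
    name = np.dtype(t).name
    print(f"{name}: {bits[name]} bits")
```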

01:49 So there are many cases where you might want to use fewer than the maximum number of bits to represent a number. For example, if you're working with images, each red, green, or blue value usually ranges from 0 to 255, and those 256 possible values per channel are all the RGB color system needs.

02:10 You can represent all of the colors in, say, a JPEG image with 8-bit integers for each color channel of each pixel, rather than 64 bits per value that you're never going to use, so it saves space. Many libraries, like TensorFlow, a popular Python machine learning library you might have heard of, commonly use 32-bit integers and floats to save space, because they work with such massive amounts of data.
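The space savings are easy to quantify. This sketch assumes a hypothetical 640x480 RGB image with one value per color channel per pixel, and compares storing it as unsigned 8-bit integers versus 64-bit integers.

```python
import numpy as np

# Hypothetical 640x480 RGB image: one value per color channel per pixel
height, width, channels = 480, 640, 3

img_uint8 = np.zeros((height, width, channels), dtype=np.uint8)
img_int64 = np.zeros((height, width, channels), dtype=np.int64)

print(img_uint8.nbytes)  # 921600 bytes (1 byte per channel value)
print(img_int64.nbytes)  # 7372800 bytes (8 bytes per channel value)
```

The 64-bit version takes eight times the memory while holding exactly the same 0 to 255 values.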

02:35 That's convenient to know. When you call arange() without specifying a data type, NumPy tries to infer an appropriate type from the arguments you give it.

02:49 I'm just choosing some arbitrary numbers. Notice that my start and my step are both floats, so it's not hard to guess what the .dtype of the result is.

03:02 It's a float, specifically 'float64', because we didn't ask for anything smaller, so NumPy assumes we need the full precision available.
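The transcript doesn't show the exact numbers typed in the terminal, so this sketch uses hypothetical values with a float start and step. It also shows that passing dtype explicitly overrides the inferred 'float64'.

```python
import numpy as np

# Hypothetical values: float start and step, integer stop
y = np.arange(0.5, 5, 1.5)
print(y.tolist())  # [0.5, 2.0, 3.5]
print(y.dtype)     # float64

# Requesting a smaller float type explicitly
z = np.arange(0.5, 5, 1.5, dtype=np.float32)
print(z.dtype)     # float32
```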

03:11 But say you want something different, like generating a range of color values. Then you might want a smaller type such as int8. (For real color values from 0 to 255, the unsigned np.uint8 is the better fit, since int8 only goes up to 127.)

03:19 You could do 1 to 10 with a dtype of np.int8.

03:27 So these are only 8-bit integers, but in this case it works perfectly.
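The int8 call from the video, as a sketch. Note that stop is exclusive, so "1 to 10" produces the values 1 through 9, all of which fit comfortably in int8's range of -128 to 127.

```python
import numpy as np

small = np.arange(1, 10, dtype=np.int8)  # stop is exclusive, so 1..9
print(small)           # [1 2 3 4 5 6 7 8 9]
print(small.dtype)     # int8
print(small.itemsize)  # 1 byte per element
```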

03:34 The issue is that if you're not careful, you might ask for a range from 1000 to 1010 and try to use an int8.

03:42 You'll notice the result isn't anything close to 1000 or 1010. Instead you get -24 to -15. That's because an 8-bit integer can only hold values from -128 to 127, so when you try to store something much bigger, the value essentially wraps around the number system in a loop.
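The direct call np.arange(1000, 1010, dtype=np.int8) is version-dependent: newer NumPy releases may warn or raise an OverflowError instead of silently wrapping. This sketch reproduces the wraparound from the video by building the range as int64 and then casting with .astype(), which in practice wraps without complaint. The helper wrap_int8 is hypothetical and spells out the arithmetic: keep the low 8 bits and read them as a signed value.

```python
import numpy as np

def wrap_int8(value: int) -> int:
    """Hypothetical helper: map an integer into int8's range -128..127 the way
    two's-complement wraparound does (modulo 256, then shift to signed)."""
    return (value + 128) % 256 - 128

# Build the range with a wide type, then force it into 8 bits
wrapped = np.arange(1000, 1010, dtype=np.int64).astype(np.int8)
print(wrapped.tolist())  # [-24, -23, -22, -21, -20, -19, -18, -17, -16, -15]

# The helper predicts the same values
predicted = [wrap_int8(v) for v in range(1000, 1010)]
print(predicted == wrapped.tolist())  # True
```

For example, 1000 - 4 * 256 = -24, which is why the wrapped range starts there.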

03:59 To understand exactly why these particular numbers come out, you'll need to read a bit about how binary number representation works. But trust me when I say you should only specify a size smaller than the default when you're sure that's what you actually need. What I just did was a very silly thing, because my start and my stop are both far bigger than an int8 can represent. If I used int32 here instead, I wouldn't have any issues. So, not a big deal.
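One way to follow that advice is to check the bounds before picking a small dtype. The helper fits_dtype below is hypothetical and assumes a positive step (it only checks the first and last values of range(start, stop)); np.iinfo supplies each type's limits.

```python
import numpy as np

def fits_dtype(start: int, stop: int, dtype) -> bool:
    """Hypothetical helper (positive step only): True if every value in
    range(start, stop) lies within the integer limits of dtype."""
    if stop <= start:
        return True  # empty range, nothing to store
    info = np.iinfo(dtype)
    return info.min <= start and stop - 1 <= info.max

print(fits_dtype(1000, 1010, np.int8))   # False
print(fits_dtype(1000, 1010, np.int32))  # True

safe = np.arange(1000, 1010, dtype=np.int32)
print(safe[0], safe[-1])                 # 1000 1009
```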

04:34 And certainly, if I didn’t specify a dtype at all, my 64-bit integers, as you’ll remember from this slide, can represent all the way up to 2**63-1, which is a gargantuan number.
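A sketch confirming the int64 limit with np.iinfo. It also shows that even int64 isn't immune: integer array arithmetic in NumPy doesn't check for overflow, so going one past the maximum wraps to the minimum.

```python
import numpy as np

int64_max = np.iinfo(np.int64).max
print(int64_max == 2**63 - 1)  # True
print(int64_max)               # 9223372036854775807

# Array arithmetic at the limit wraps around silently instead of raising
edge = np.array([int64_max], dtype=np.int64)
print((edge + 1).tolist())     # [-9223372036854775808]
```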

04:47 So you're unlikely to overflow that in most applications. That covers data types in the numpy.arange() function. I hope you found this useful.

Chris James on May 24, 2020

A quick mention that integers in Python are different from those in other computer languages: there is no upper bound other than the memory on your computer. It's the mathematical idea of what 'integer' means, rather than the C approximation. I mention it mostly because it's a really cool feature of Python that's easy to miss.
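A short sketch of the contrast the comment describes: a Python int grows as needed, while a fixed-width NumPy integer such as np.int64 raises OverflowError when asked to hold a value beyond its 64 bits.

```python
import numpy as np

big = 2**100  # Python ints have no fixed upper bound
print(big)    # 1267650600228229401496703205376

overflowed = False
try:
    np.int64(big)  # a fixed-width 64-bit integer cannot hold this value
except OverflowError:
    overflowed = True
print(overflowed)  # True
```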
