NumPy and Pandas

00:00 In the previous lesson, I showed you some common coding cases of len(). In this lesson, I’ll show you how to use two third-party libraries and how they use len().

00:11 NumPy is a popular scientific calculation library for Python. It is written using C extensions, meaning the code is quite performant. This library does all sorts of mathy collection stuff, including multi-dimensional arrays and vectors.

00:25 It can also help you with your calculations, having features for linear algebra, Fourier transforms, and many of the other things that still haunt my nightmares from engineering school. NumPy is a third-party library, and so you’ll need to use pip to install it. As always with this kind of stuff, it’s best practice to use a virtualenv to do so. Let’s take a look at NumPy and the len() function. First off, I’ll start with a single-dimensional array.

00:52 That’s a very list-like thing. I’ll import NumPy as np, as that’s shorter to type. Then I’ll create a NumPy array from a list.

01:12 There it is. And to be more specific, you can see its type. It’s a NumPy array. And what would this course on len() be without … and that’s kind of expected.
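A minimal sketch of what is being typed on screen at this point. The transcript does not show the actual values, so the list contents and the names `numbers` and `one_d` are assumptions:

```python
import numpy as np

# Hypothetical values; the lesson's actual list is not in the transcript.
numbers = [1, 2, 3, 4, 5]
one_d = np.array(numbers)

print(type(one_d))  # <class 'numpy.ndarray'>
print(len(one_d))   # 5, the same answer len() gives for the original list
```

For a one-dimensional array, len() behaves just as it does for a list, which is the "kind of expected" part.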

01:29 Let’s take it up another notch and add a second dimension.

01:42 Note the list of lists here. Creating the 2D array from it …

01:51 and here it is … and then the length. You might be thinking, “Great! That’s the dimension size.” But it isn’t. And I’ll show you in a second when I add a third dimension.

02:04 NumPy arrays have a property called .shape that shows you the lengths of the things inside of them. To get the number of dimensions, you do len() on the .shape property.

02:19 .shape is like length for each dimension, and since it returns a tuple, the length of that tuple is the number of dimensions. You can get at the same thing through the .ndim property.
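The relationship between len(), .shape, and .ndim described above can be sketched as follows. The array values and the name `two_d` are assumptions, chosen so the first dimension is two, as in the lesson:

```python
import numpy as np

# Hypothetical list of lists: 2 rows, 5 columns.
two_d = np.array([
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 0],
])

print(len(two_d))        # 2, the size of the first dimension (rows)
print(two_d.shape)       # (2, 5), one length per dimension
print(len(two_d.shape))  # 2, the number of dimensions
print(two_d.ndim)        # 2, the same value without going through .shape
```

Here len(two_d) and two_d.ndim both happen to be 2, but only by coincidence: the first counts rows, the second counts dimensions.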

02:33 I believe I promised a third dimension. Put on your red and blue glasses, and get ready for a shark. Wow. Is that a dated reference? You see, back in my day—You know what?

02:44 Never mind. Google “3D Jaws” and figure it out for yourself. Where was I? Oh, right. Three dimensions.

03:02 List of lists of lists this time … and the NumPy array …

03:14 and with len(), the same result as with the 2D array. What this is doing is returning the length of the first dimension, which in both the 2D and 3D examples was two. Using .shape again, you can see the three dimensions, and .ndim, or the length of the shape, gives you how many D your 3D is.
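A rough sketch of the three-dimensional case. The values and the name `three_d` are made up for illustration; only the first dimension being two comes from the lesson:

```python
import numpy as np

# Hypothetical list of lists of lists: 2 blocks, each with 2 rows of 3 values.
three_d = np.array([
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
])

print(len(three_d))        # 2, still just the length of the first dimension
print(three_d.shape)       # (2, 2, 3)
print(three_d.ndim)        # 3
print(len(three_d.shape))  # 3, the same as .ndim
```

This is the case that separates the two ideas: len() stays at 2 while the number of dimensions goes up to 3.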

03:48 Another very common third-party library is Pandas. This one is for doing data crunching. It’s built on top of NumPy, so it is also quite speedy. Its key component is the DataFrame object, which is a dictionary on steroids.

04:02 Just a quick pip install into your virtualenv, and you’re ready to go. Let’s go back into the REPL. Importing the fuzzy bear as the shorter pd … creating the dictionary to populate a DataFrame …

04:29 creating the frame …

04:41 and there it is. Each list in the dictionary becomes a column of the DataFrame, so each item in a list is a value in one of the rows. The index argument specifies the names of the rows. Looking at the data itself, you can see Neo does everything well, and Cypher needs to stay after school because his loyalty grade … has got some work to do. And here’s what you came for.

05:05 Running len() on a frame returns the number of rows. In this case, three: Hacking, Kungfu, and Loyalty. Like NumPy, Pandas’ DataFrame has a .shape property.

05:17 It shows the number of rows and columns as a tuple. Next up, you’ll see how the writers of NumPy and Pandas used the special __len__() method to get len() to work with their classes.
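The DataFrame walkthrough above can be sketched roughly like this. The row names come from the lesson, but the grades, the variable names, and every column other than Neo and Cypher are assumptions, since the transcript does not show the actual data:

```python
import pandas as pd

# Hypothetical grades; each key becomes a column, each list fills that column.
grades = {
    "Neo": [95, 98, 99],
    "Trinity": [92, 97, 98],
    "Morpheus": [88, 94, 100],
    "Cypher": [85, 80, 12],
}
frame = pd.DataFrame(grades, index=["Hacking", "Kungfu", "Loyalty"])

print(frame)
print(len(frame))   # 3, the number of rows
print(frame.shape)  # (3, 4), rows first, then columns
```

Four columns are used here only so that the two numbers in .shape are different and easy to tell apart.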

Dean Marsh on Nov. 24, 2024

Just curious about NumPy arrays when printed. Any reason why my output would lose the commas between the values? And why doesn’t it respond with the type being <class 'numpy.ndarray'>?

import numpy as np
numbers2 = [
    [1,2,3,4,5],
    [6,7,8,9,0]
]
two_d = np.array(numbers2)
print(f"Two d: {two_d} is of type {two_d} with length {len(two_d)}\n")

Output:

Two d: [[1 2 3 4 5]
 [6 7 8 9 0]] is of type [[1 2 3 4 5]
 [6 7 8 9 0]] with length 2

Christopher Trudeau RP Team on Nov. 24, 2024

Hi Dean,

Two things are going on here. First, NumPy’s array type has a custom __str__ method, which, rightly or wrongly, they’ve decided to format so that when you do print(two_d) it shows the array as a readable, lined-up thing. This means that when you put it inside of an f-string like you have, it looks weird.

>>> import numpy as np
>>> a = [ [1,2,3,4,5], [6,7,8,9,0] ]
>>> d = np.array(a)
>>> d
array([[1, 2, 3, 4, 5],
       [6, 7, 8, 9, 0]])
>>> str(d)
'[[1 2 3 4 5]\n [6 7 8 9 0]]'
>>> print(d)
[[1 2 3 4 5]
 [6 7 8 9 0]]

The second thing is an error in your f-string. You’ve done “type {two_d}” when you meant to do “type {type(two_d)}”. The type of an array is exactly what you had expected. Continued from the same REPL session above:

>>> type(d)
<class 'numpy.ndarray'>
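Putting both points together, one possible version of the original f-string might look like the sketch below. Using the `!r` conversion to get the comma-separated repr form is my suggestion rather than something from the thread, and the name `message` is made up:

```python
import numpy as np

numbers2 = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 0],
]
two_d = np.array(numbers2)

# {two_d!r} asks for repr() instead of str(), which keeps the commas.
# {type(two_d)} shows the class rather than the array's contents.
message = f"Two d: {two_d!r} is of type {type(two_d)} with length {len(two_d)}"
print(message)
```

Plain `{two_d}` in an f-string goes through str(), which is why the commas disappeared in the original output.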
