Searching the Parse Tree

00:00 In this lesson, you’ll learn how you can search the parse tree by using some convenient methods on a Beautiful Soup object. The most commonly used methods are .find() and .find_all().

00:12 Let’s try out how they work. I can say soup.find() and pass it a tag name. Let’s keep working with the images. So if I say soup.find() and pass it the name of the image tag, "img", then I’ll get back the first image element in the parse tree.

00:30 Okay. But again, this is no different from just writing soup.img, which is the more convenient syntax. So in this case, .find() doesn’t help you much. But what if you want to get the second image element?
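Here’s a minimal sketch of the two equivalent lookups. It assumes Beautiful Soup 4 is installed (`pip install beautifulsoup4`) and uses a small, hypothetical HTML snippet as a stand-in for the profile page from the lesson:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the profile page used in the lesson.
html = """
<html><body>
<img src="/static/dionysus.jpg"/>
<h2>Name: Dionysus</h2>
<img src="/static/grapes.png"/>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# .find() returns the first matching Tag, or None if nothing matches.
first_img = soup.find("img")

# Attribute-style access is shorthand for the same search.
shortcut_img = soup.img

print(first_img["src"])           # /static/dionysus.jpg
print(first_img is shortcut_img)  # True
```

Both expressions return the very same Tag object, which is why soup.img is usually the nicer spelling when you only need the first match.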

00:43 For this, you can use .find_all(). You can say soup.find_all() and again just pass the name of the tag. This gives you back a list of all the image elements in the parse tree.

00:56 In this case, the one you’re interested in is number two, the grapes, and you can access it by indexing into the list.

01:05 Because list indices start at zero, the second element is at index one.

01:11 And this is a way that you could navigate to the second image element in your parse tree.
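A short sketch of that list-plus-index approach, again using hypothetical sample markup with two images (assuming Beautiful Soup 4 is installed):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with two images.
html = """
<html><body>
<img src="/static/dionysus.jpg"/>
<h2>Name: Dionysus</h2>
<img src="/static/grapes.png"/>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# .find_all() returns a list-like ResultSet of every matching Tag.
images = soup.find_all("img")
print(len(images))    # 2

# Lists are zero-indexed, so the second image is at index 1.
grapes = images[1]
print(grapes["src"])  # /static/grapes.png
```

This works, but it depends on the position of the element, so it can break silently if the page gains another image earlier in the markup.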

01:18 Now, there are more convenient ways of doing that as well. You could filter for a specific attribute. So let’s stick with soup.find(), which returns one element, right?

01:29 But you don’t want the first one, so you pass it some more information. You say: I want the element whose src attribute has the value /static/grapes.png.

01:43 So I’m going to paste that in here. Now it picks the first image element whose src attribute has the value /static/grapes.png. In this case there’s only one, so it gives us back that second image element. You can do the same with .find_all() as well.
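One way to express that attribute filter is as a keyword argument to .find(). This is a sketch against hypothetical sample markup; the exact call typed in the video isn’t shown in the transcript, so treat the `src=` form as an assumption (it is a standard Beautiful Soup filter):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with two images.
html = """
<html><body>
<img src="/static/dionysus.jpg"/>
<h2>Name: Dionysus</h2>
<img src="/static/grapes.png"/>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Extra keyword arguments are treated as attribute filters.
grapes = soup.find("img", src="/static/grapes.png")
print(grapes["src"])  # /static/grapes.png

# When nothing matches, .find() returns None instead of raising.
missing = soup.find("img", src="/static/missing.png")
print(missing)        # None
```

Because a miss gives you None, it’s worth checking the result before indexing into it, for example with `if grapes is not None:`.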

02:05 You can also filter in that same way, and in this case it just gives us back a list containing one element. But if there were more images with that same value for the src attribute, you’d get all of them back when you use .find_all(). If you’re familiar with CSS selectors, there’s another option you can use for searching the parse tree: the .select() method.
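To make the .find_all() point concrete, here’s a sketch with hypothetical markup in which two images share the same src value, so the filter returns more than one element:

```python
from bs4 import BeautifulSoup

# Hypothetical markup where two images share the same src value.
html = """
<html><body>
<img src="/static/dionysus.jpg"/>
<img src="/static/grapes.png"/>
<p>More grapes:</p>
<img src="/static/grapes.png"/>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# The same attribute filter, but .find_all() collects every match.
grape_images = soup.find_all("img", src="/static/grapes.png")
print(len(grape_images))  # 2

# An attrs dictionary is an equivalent way to write the filter.
via_attrs = soup.find_all("img", attrs={"src": "/static/grapes.png"})
print(len(via_attrs))     # 2
```

The attrs form is handy when an attribute name isn’t a valid Python keyword argument, such as `class` or `data-id`.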

02:30 So you can say soup.select().

02:33 Let’s start off by just passing it a tag name again. In this case, you can see it gives you back a list of all the image elements. So while .find() gives you only one, .select() gives you all of them, just like .find_all() does.

02:47 But it also has a companion method, soup.select_one(), which again gives you back only the first match if you don’t do any more filtering.

02:58 So here you get Dionysus’ profile picture back again.
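Here’s a sketch of .select_one() on the same kind of hypothetical markup, including what happens when the selector matches nothing:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with two images.
html = """
<html><body>
<img src="/static/dionysus.jpg"/>
<h2>Name: Dionysus</h2>
<img src="/static/grapes.png"/>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# .select_one() returns only the first element that matches the selector.
first = soup.select_one("img")
print(first["src"])              # /static/dionysus.jpg

# Like .find(), it returns None when there is no match.
print(soup.select_one("video"))  # None
```

So .select_one() relates to .select() the same way .find() relates to .find_all().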

03:02 With .select() and .select_one(), you can pass CSS selectors. So if you’re familiar with them, you can specify quite precisely which element you’re looking for.
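For example, CSS attribute selectors can express the same src filter you used with .find(). This sketch runs against hypothetical sample markup:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with two images.
html = """
<html><body>
<img src="/static/dionysus.jpg"/>
<h2>Name: Dionysus</h2>
<img src="/static/grapes.png"/>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Exact-match attribute selector, equivalent to find("img", src=...).
grapes = soup.select_one('img[src="/static/grapes.png"]')
print(grapes["src"])  # /static/grapes.png

# "Ends with" attribute selector: every img whose src ends in ".jpg".
jpgs = soup.select('img[src$=".jpg"]')
print([img["src"] for img in jpgs])  # ['/static/dionysus.jpg']
```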

03:12 You can say .select_one() and pass in img:nth-of-type(2), because you want to get the second image element.

03:27 And you can see that this directly gives you back the grapes image. There’s a lot more you can do with CSS selectors, but the key point is that .select() and .select_one() accept them, so you can select exactly the elements you’re interested in.
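A sketch of the :nth-of-type() lookup on hypothetical markup where an h2 sits between the two images. It also contrasts :nth-child(), which counts every element sibling rather than only img elements:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the two images are siblings, separated by an h2.
html = """
<html><body>
<img src="/static/dionysus.jpg"/>
<h2>Name: Dionysus</h2>
<img src="/static/grapes.png"/>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# :nth-of-type(2) counts only <img> siblings, so it picks the second image.
grapes = soup.select_one("img:nth-of-type(2)")
print(grapes["src"])  # /static/grapes.png

# :nth-child(2) counts all element siblings; the second child of <body>
# is the <h2>, so no <img> matches.
print(len(soup.select("img:nth-child(2)")))  # 0
```

Note that :nth-of-type() counts positions among siblings under the same parent, so in markup where the images live in different containers it can match more than one element.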

03:44 Alright, so this is how you can search the parse tree. Now that you know how to search it, let’s take a look in the next lesson at how you can modify the parse tree.