Searching the Parse Tree
00:00
In this lesson, you’ll learn how you can search the parse tree by using some convenient methods on a Beautiful Soup object. The most commonly used methods are .find() and .find_all().
00:12
Let’s try out how they work. I can say soup.find() and pass it a tag name. Let’s keep working with the images. So if I say soup.find() and pass in the tag name img, then I’ll get back the first image HTML element in the parse tree.
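As a minimal sketch of that call, the snippet below parses a small stand-in page. The HTML string and the dionysus.jpg file name are assumptions made for illustration, not the lesson's actual page.

```python
from bs4 import BeautifulSoup

# Stand-in HTML (an assumption, not the page used in the lesson).
html = """
<html><body>
<img src="/static/dionysus.jpg">
<h2>Name: Dionysus</h2>
<img src="/static/grapes.png">
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# .find() returns the first matching element in the parse tree.
first_image = soup.find("img")

# The attribute-style shortcut runs the same search.
shortcut_image = soup.img

print(first_image)
```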
00:30
Okay. But again, this is no different from just doing soup.img, which is the more convenient syntax. So in this case, .find() doesn’t help you much. But what if you want to get the second image element?
00:43
And for this you can use .find_all(). You can say soup.find_all() and then again, just pass the name of the tag. And this gives you back a list of all of the image elements that are in the parse tree.
00:56 In this case, the one that you’re interested in is number two, the grapes. And you could access it by just using
01:05 the indexing on the list. So you want to get the second element at index one.
01:11 And this is a way that you could navigate to the second image element in your parse tree.
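Here is a rough sketch of that indexing step. The HTML string is again an assumed stand-in for the lesson's page.

```python
from bs4 import BeautifulSoup

# Stand-in HTML (an assumption, not the page used in the lesson).
html = """
<html><body>
<img src="/static/dionysus.jpg">
<h2>Name: Dionysus</h2>
<img src="/static/grapes.png">
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# .find_all() returns a list-like collection of every <img> element.
images = soup.find_all("img")

# Indexing is zero-based, so the second image sits at index 1.
second_image = images[1]

print(second_image)
```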
01:18
Now, there are more convenient ways of doing that as well. You could filter for a specific attribute. So let’s stick with soup.find(), which returns one element, right?
01:29
But you don’t want to get the first one, so you pass it some more information. You say, I want to get the element that has /static/grapes.png as the value of its src attribute.
01:43
So I’m going to paste that in here. And now it’s going to pick the first image element that has /static/grapes.png as the value of its src attribute. In this case there’s only one, so it’ll give us back that second image element. You can do the same for .find_all() as well.
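The attribute filter could look something like this. It is a sketch that passes src as a keyword argument, and the HTML string is an assumed stand-in for the lesson's page.

```python
from bs4 import BeautifulSoup

# Stand-in HTML (an assumption, not the page used in the lesson).
html = """
<html><body>
<img src="/static/dionysus.jpg">
<h2>Name: Dionysus</h2>
<img src="/static/grapes.png">
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# Keyword arguments filter on attribute values: the first <img>
# whose src attribute equals the given string.
grapes = soup.find("img", src="/static/grapes.png")

# The same filter with .find_all() returns every match in a list.
all_grapes = soup.find_all("img", src="/static/grapes.png")

print(grapes)
print(len(all_grapes))
```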
02:05
You can also filter in that same way, and in this case it’ll just give us back a list containing one element. But if there were more images that have that same value for the src attribute, then you’d get back all of them when you use .find_all(). If you’re familiar with CSS selectors, there’s another option that you can use for searching the parse tree, and that’s the .select() method.
02:30
So you can say soup.select(),
02:33
and let’s start off by just passing it a tag name again. In this case, you can see it gives you back a list of all the image elements. So while .find() gives you only one, .select() gives you all of them, right?
02:47
But it also has a companion method, soup.select_one(), that’ll again give you back the first one if you don’t do any more filtering.
02:58 So here you get Dionysus’ profile picture back again.
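To compare the two methods side by side, here is a small sketch. The HTML string and the dionysus.jpg file name are assumptions standing in for the lesson's page.

```python
from bs4 import BeautifulSoup

# Stand-in HTML (an assumption, not the page used in the lesson).
html = """
<html><body>
<img src="/static/dionysus.jpg">
<h2>Name: Dionysus</h2>
<img src="/static/grapes.png">
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# .select() returns every element that matches the CSS selector.
all_images = soup.select("img")

# .select_one() returns only the first match.
first_image = soup.select_one("img")

print(len(all_images))
print(first_image)
```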
03:02
Now for .select() and .select_one(), you can pass CSS selectors. So if you’re familiar with them, then you can specify quite precisely which element you’re looking for.
03:12
You can say .select_one() and pass it img:nth-of-type(2), with the number two because you want to get the second image element.
03:27
And you can see that this directly gives you back the grapes image. There’s more that you can do with CSS selectors, but the key point is that with .select() and .select_one(), you can pass in CSS selectors to very precisely select the elements that you’re interested in.
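As a sketch of the nth-of-type selector in action, the snippet below uses an assumed stand-in HTML string in which both images share the same parent element.

```python
from bs4 import BeautifulSoup

# Stand-in HTML (an assumption, not the page used in the lesson).
html = """
<html><body>
<img src="/static/dionysus.jpg">
<h2>Name: Dionysus</h2>
<img src="/static/grapes.png">
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# :nth-of-type(2) matches an <img> that is the second <img>
# among the children of its parent element.
second_image = soup.select_one("img:nth-of-type(2)")

print(second_image)
```

Note that nth-of-type counts position among siblings of the same tag, so this works here because both images sit under the same parent. On a page where the images live in different containers, the selector would need to be adapted.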
03:44 Alright, so this is what you can use to search the parse tree. And now that you know how to search it, let’s take a look in the next lesson at how you can modify the parse tree.