Find Elements by HTML Class Name
00:00 You’ve identified the column that contains all the information, and now you want to drill down a bit more by actually finding the elements for the boxes that contain all of the job descriptions. We’re going to do that by HTML class name.
Yeah, those seem to actually be repeating.
"unifiedRow row result clickcard". These seem to be the classes that each of those job search results has. That means we can create a list of all of those elements and just pick them out of the
"resultsCol" that we have from before, instead of keeping all of its content, some of which we’re not interested in. So, what you can do is pass in just one class name, for example. You can also pass more, but then they have to be in the exact order, with spaces in between.
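As a rough sketch of that matching rule, the snippet below uses a small made-up HTML string (not the real job page) and the built-in "html.parser". The class names come from the lesson; the sidebar element and the job texts are assumptions added for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page, just to show how class matching behaves.
html = """
<div id="resultsCol">
  <div class="unifiedRow row result clickcard">Job A</div>
  <div class="unifiedRow row result clickcard">Job B</div>
  <div class="sidebar">Not a job</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# One class name matches any element that has that class among its classes.
single = soup.find_all(class_="result")

# Several class names must match the full class attribute string, in the same order.
exact = soup.find_all(class_="unifiedRow row result clickcard")

# The same classes in a different order do not match.
reordered = soup.find_all(class_="result row unifiedRow clickcard")

print(len(single), len(exact), len(reordered))  # 2 2 0
```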
Now, in this case, I know they all have the
'result' in here as one of the class names. For me, that’s a pretty understandable one, because I’m looking for results, so this is the one that I pick here. I’m going to say
.find_all(). So, this is a bit different than the
.find() one up here, which is going to only return one element.
.find_all() is going to return a list of elements, and I’m giving them the
class_='result'. I’m also doing something additional here, because a class in HTML can be applied to all sorts of elements, while an ID is unique and can only be applied to one element.
So, here, this
.find() makes a lot of sense. A class, though, could match several different elements. And I don’t really have a complete understanding of this whole HTML in here, so I don’t know whether there is maybe some other HTML element somewhere down there that also has the class
'result'. That’s always possible.
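To make that concern concrete, here is a minimal sketch on invented HTML. The paragraph that reuses the 'result' class is an assumption for the sake of the example; whether the real page contains one is unknown.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a <p> happens to share the class with the job <div>s.
html = """
<div id="resultsCol">
  <div class="result">Job A</div>
  <div class="result">Job B</div>
  <p class="result">A paragraph that reuses the class</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# An ID should be unique, so .find() returning a single element fits well.
column = soup.find(id="resultsCol")

# A class can sit on any element, so .find_all() may pick up more than expected.
matches = soup.find_all(class_="result")

print(column.name)                    # div
print([tag.name for tag in matches])  # ['div', 'div', 'p']
```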
So, I want to restrict it some more and say that I want to find only
<div> HTML elements that have the class
'result'. So if there’s a paragraph (
<p>) somewhere, or an emphasis tag (
<em>), or something like that with the same class name, it’s not going to end up in this list, only
<div> elements with that specific class,
which seem to me to be the things that we’re looking for. Now, another thing to note here is that you can see I’m calling the
.find_all() method on
results. One of the very, very useful things about Beautiful Soup is that each of the things returned from a call like this is another Beautiful Soup object, which means that you can call all of the useful methods on it again.
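The restricted search and the chaining might look roughly like this. The HTML is a made-up miniature of the page, and the names job_elems and first, as well as the <h2> titles, are assumptions for illustration rather than the lesson's exact code.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical miniature of the results column.
html = """
<div id="resultsCol">
  <div class="result"><h2>Python Developer</h2></div>
  <div class="result"><h2>Data Engineer</h2></div>
  <p class="result">Same class, different element</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# .find() returns a single element, which has the search methods itself.
results = soup.find(id="resultsCol")

# Search only inside that element, and only for <div> tags with the class.
job_elems = results.find_all("div", class_="result")
print(len(job_elems))  # 2

# Each item is again a Beautiful Soup Tag, so you can keep searching inside it.
first = job_elems[0]
print(isinstance(first, Tag))       # True
print(first.find("h2").get_text())  # Python Developer
```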
03:22 12, 13, 14, 15. Okay, perfect. So there are 15 results on the page, and that seems to be what we collected with this. Let’s look at one of them. So, here’s the first one, and you see, this is still a lot of HTML. It’s still not really very clear what’s going on here, but it seems like we correctly identified one of those job cards over here. As before, you can think of this as whatever is nested in here, in your HTML structure, getting returned. And because it’s again a Beautiful Soup object, you can keep digging deeper and accessing different parts of this with the intuitive syntax that it has. Cool! Okay.
So, we have a list of Beautiful Soup elements that each contain one of those cards, and each card contains the information that we’re looking for. So, we’re moving ahead, and we’ve learned how to use
class_ to filter for them.
One note here is also this underscore (
_) at the end. Because
class is a reserved Python keyword, if you want to search by class, you have to write it as
class_ with a trailing underscore. It’s just the syntax that Beautiful Soup uses. Okay! In the next lesson, we’re going to see how to get from this result, which is still a lot of garbled HTML text, to the specific text that we’re interested in: something that we can read and save.
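A quick way to see why the underscore is needed, sketched on invented HTML: Python's standard keyword module confirms that class is reserved, and Beautiful Soup also accepts the attribute through an attrs dictionary, which should find the same elements here.

```python
import keyword

from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
html = """
<div class="result">Job A</div>
<div class="result">Job B</div>
"""
soup = BeautifulSoup(html, "html.parser")

# "class" is reserved in Python, so it can't be used as a keyword argument name.
print(keyword.iskeyword("class"))  # True

# Beautiful Soup's workaround: a trailing underscore.
with_underscore = soup.find_all("div", class_="result")

# An alternative spelling: pass the attribute in a dictionary.
with_attrs = soup.find_all("div", attrs={"class": "result"})

print([tag.get_text() for tag in with_underscore])  # ['Job A', 'Job B']
print([tag.get_text() for tag in with_attrs])       # ['Job A', 'Job B']
```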