Find Elements by HTML Class Name

00:00 You’ve identified the column that contains all the information, and now you want to drill down a bit more by actually finding the elements for the boxes that contain all of the job descriptions. We’re going to do that by HTML class name.

00:14 Heading back over to the site, I’m going to do a bit more inspecting. So, let’s see. When I click on this,

00:23 I see that each of those cards, it looks like that. We have a class="jobsearch-" […] blah, blah, blah. Okay. So, this has a couple of classes here, right?

00:33 And seems like there’s also—let me see that.

00:39 Yeah, those seem to actually be repeating. "jobsearch-" […] "unifiedRow row result clickcard". These seem to be the classes that each of those job search results has. That means we can create a list of all of those elements and pick them out of the "resultsCol" that we have from before, instead of keeping the whole column, which also contains content we’re not interested in. So, what you can do is pass in just one class name, for example. You can also pass more, but then they have to be in the exact order, with spaces in between.
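As a rough illustration of that last point, here is a minimal sketch of how a single class name behaves compared to a full class string. The markup is a hypothetical, simplified stand-in for the real page (the elided "jobsearch-" class is left out on purpose), and the variable names are made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: the real page has more classes and content.
html = """
<td id="resultsCol">
  <div class="unifiedRow row result clickcard">Job A</div>
  <div class="unifiedRow row result clickcard">Job B</div>
  <div class="pagination">Next page</div>
</td>
"""
soup = BeautifulSoup(html, "html.parser")

# A single class name matches any element that carries that class.
by_one_class = soup.find_all(class_="result")

# Several class names are compared against the exact attribute string.
by_exact_string = soup.find_all(class_="unifiedRow row result clickcard")

# The same classes in a different order do not match.
by_reordered = soup.find_all(class_="row result unifiedRow clickcard")

print(len(by_one_class), len(by_exact_string), len(by_reordered))
```

Searching for one class is usually the more forgiving choice, since it keeps working if the site adds or reorders the other classes.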

01:17 Now, in this case, I know they all have 'result' as one of their class names. For me, that’s a pretty understandable one because I’m looking for results, so this is the one that I pick here. I’m going to say .find_all(). This is a bit different from the .find() up here, which only returns one element.

01:37 .find_all() returns a list of elements, and I’m passing it class_='result'. I’m also adding one more restriction, because classes in HTML can be applied to all sorts of elements, while an ID is unique and can only be applied to one element.

01:53 So for the ID lookup, .find() makes a lot of sense. A class, though, could match several kinds of elements. And I don’t have a complete understanding of this whole HTML, so I don’t know whether there’s some other HTML element somewhere down there that also has the class 'result'. That’s always possible.

02:11 So, I want to restrict it some more and say that I only want to find <div> HTML elements that have the class 'result'. So if there’s a paragraph (<p>) somewhere, or an emphasis tag (<em>), or something like that with the same class name, it’s not going to end up in this list. Only <div> elements with that specific class will.
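To make the difference concrete, here is a small sketch that compares a class-only search with one that is also restricted to <div> elements. The HTML is invented for the example; only the "resultsCol" ID and the 'result' class come from the lesson.

```python
from bs4 import BeautifulSoup

# Hypothetical markup in which several kinds of elements share the class.
html = """
<td id="resultsCol">
  <p class="result">15 jobs found</p>
  <div class="row result">Job A</div>
  <div class="row result">Job B <em class="result">new</em></div>
</td>
"""
soup = BeautifulSoup(html, "html.parser")

# .find() returns a single element, which suits a unique ID.
results = soup.find(id="resultsCol")

# Class only: every element with the class, whatever its tag name.
any_tag = results.find_all(class_="result")

# Tag name plus class: only the <div> elements.
divs_only = results.find_all("div", class_="result")

print([tag.name for tag in any_tag])
print([tag.name for tag in divs_only])
```

The first list contains the <p> and <em> elements as well, while the second contains only the two <div> cards.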

02:33 Those seem to me to be the things that we’re looking for. Another thing to note here is that I’m calling the .find_all() method on results. One of the very useful things about Beautiful Soup is that each of the things that gets returned from a call like this is another Beautiful Soup object, which means that you can call all of the useful methods on it again.

02:59 So, this gives you an easy way to keep digging deeper into the HTML structure, step by step. Okay. And in this case, I’m saving all of the results, all of the card elements, into jobs.

03:11 Let’s see how many there are. So, there seem to be 15 on this page. That could be right. 1, 2, 3, 4…

03:22 12, 13, 14, 15. Okay, perfect. So there are 15 results on the page, and that seems to be what we collected with this. Let’s look at one of them. So, the first one. You can see this is still a lot of HTML, and it’s still not very clear what’s going on here, but it seems like we correctly identified one of those job cards. As before, you can think of this as whatever is nested in that element in your HTML structure getting returned. And because it’s again a Beautiful Soup object, you can keep digging deeper and accessing different parts of it with the intuitive syntax that it has. Cool! Okay.
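The step-by-step digging described here could look roughly like the sketch below. It assumes a tiny, made-up page with three cards instead of the 15 on the real site, and the <h2 class="title"> element inside each card is a hypothetical detail added only to have something to drill into.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical markup: three cards, each with an invented title element.
html = """
<td id="resultsCol">
  <div class="row result"><h2 class="title">Python Developer</h2></div>
  <div class="row result"><h2 class="title">Data Engineer</h2></div>
  <div class="row result"><h2 class="title">Backend Engineer</h2></div>
</td>
"""
soup = BeautifulSoup(html, "html.parser")

# Step 1: narrow down to the column, then collect the cards inside it.
results = soup.find(id="resultsCol")
jobs = results.find_all("div", class_="result")

# Step 2: count the cards and confirm that each one is a Tag object.
print(len(jobs))
print(isinstance(jobs[0], Tag))

# Step 3: look at the first card; prettify() indents the nested HTML.
first_job = jobs[0]
print(first_job.prettify())

# Step 4: because the card is a Tag, the same search methods work on it.
title = first_job.find("h2", class_="title")
print(title)
```

Each call returns elements that support the same search methods, which is what makes this kind of chaining possible.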

04:09 So, we have a list of Beautiful Soup elements, each of which contains one of those cards with the information that we’re looking for. So, we’re moving ahead, and you learned how to use the class_ argument to filter by class name.

04:24 One note here is the underscore (_) at the end. Because class is a reserved keyword in Python, you have to write class_ with a trailing underscore if you want to search by class. It’s just the syntax that Beautiful Soup uses. Okay! In the next lesson, we’re going to see how to get from this result, which is still a lot of garbled HTML text, to the specific text that we’re interested in, something that we can read and save.
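For reference, the sketch below checks that class really is a Python keyword and shows the attrs dictionary as an alternative spelling of the same search. The attrs form isn't used in this lesson, and the markup is invented for the example.

```python
import keyword

from bs4 import BeautifulSoup

# Hypothetical markup with two cards and one unrelated element.
html = """
<div class="row result">Job A</div>
<div class="row result">Job B</div>
<div class="footer">About</div>
"""
soup = BeautifulSoup(html, "html.parser")

# class is reserved, so it can't be used as a keyword argument name.
print(keyword.iskeyword("class"))

# The trailing underscore avoids the clash with the keyword.
with_keyword = soup.find_all("div", class_="result")

# The attrs dictionary expresses the same filter without the underscore.
with_attrs = soup.find_all("div", attrs={"class": "result"})

print(with_keyword == with_attrs)
```

Both calls should return the same two <div> elements, so which one you use is mostly a matter of taste.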
