Make a Soup

00:00 Let’s get set up in the code and do a bit of orientation and then find the first element by ID.

00:09 Over here in Part 3, the third Notebook, called 03_parse, covers this step in our web scraping process. I’ll click this away so we have more screen space.

00:21 Here’s another overview of the different topics that we’re going to talk about, and we need to start off by, again, scraping the site. So this is what you learned in Part 2, which is just using requests to get that specific query result and save it to the response object. I’m going to execute this first cell here—this was our scraping step in this case—and now we’re going to start to parse the results that we got back from there.
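As a rough sketch of that scraping step, the cell might look something like the following. The URL is a placeholder, since the transcript doesn't name the address being queried, and the `scrape()` helper and its `timeout` parameter are hypothetical names used only for illustration.

```python
import requests

# Placeholder URL: the lesson does not state the address being queried.
URL = "https://example.com/search?q=python"


def scrape(url: str, timeout: float = 10.0) -> requests.Response:
    """Fetch a page with requests and return the response object."""
    response = requests.get(url, timeout=timeout)
    # Raise an exception on 4xx/5xx answers instead of parsing an error page.
    response.raise_for_status()
    return response
```

In the notebook, this would be called as `response = scrape(URL)`, and `response.content` would then hold the raw HTML that gets parsed in the next step.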

00:50 So for this, we’re using a library called Beautiful Soup, which is a standard for doing web scraping with Python. It’s very powerful and pretty intuitive, so it’s definitely a good library to know. There are other libraries out there as well, but Beautiful Soup is the de facto standard for web scraping.

01:09 So, I’ll go ahead and import this. I also have this installed in the virtual environment. And then you’re ready to create a soup, which is Beautiful Soup’s way of parsing the HTML content so that it’s accessible through intuitive methods and attributes on that object.

01:28 We’re going to look at this more, but for now, the first step is always that you want to pass in the content from your scraping into this constructor and create a BeautifulSoup object.

01:39 And then you’re saving it into some variable name, and by convention this is just going to be soup. So I do this, and now we’ve parsed the content and it’s accessible here.
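A minimal sketch of creating the soup could look like this. The `html` bytes are a made-up stand-in for `response.content` so the example runs without a network request, and passing `"html.parser"` explicitly is an assumption, since the transcript doesn't say which parser the lesson uses.

```python
from bs4 import BeautifulSoup

# Stand-in for response.content (hypothetical markup, not the lesson's page).
html = b"""<html><head><title>Search results</title></head>
<body><div id="results"><p>First result</p></div></body></html>"""

# Pass the scraped content into the constructor and keep the result in a
# variable that is, by convention, called soup.
soup = BeautifulSoup(html, "html.parser")

print(type(soup).__name__)  # BeautifulSoup
print(soup.title.string)    # Search results
```

Naming the parser explicitly keeps the result consistent across machines, because Beautiful Soup otherwise picks whichever supported parser happens to be installed.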

01:52 You already see that that’s going to be pretty long. I’m going to show you the content of this.

01:57 And you see here that this—it’s a bit better formatted than the stuff that we saw before, but there’s still a lot going on, right? So, this is all of the page content.
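One way to compare the raw and parsed views side by side is sketched below. The markup is again a hypothetical stand-in for the scraped page, and `prettify()` goes a step beyond what the lesson shows on screen by adding indentation.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the bytes that requests returned.
html = b"<html><body><div id='results'><p>First result</p></div></body></html>"

soup = BeautifulSoup(html, "html.parser")

print(html)             # raw bytes, shown with the b'...' wrapper
print(soup)             # the parsed document rendered back as text
print(soup.prettify())  # same document, one tag per line and indented
```

All three show the same page content; only the presentation differs.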

02:09 Another way that you can see this exact response is if you head over to the site and then say View Page Source.

02:20 So, this is going to show you exactly the same code. This is what requests scrapes from the web, and Beautiful Soup, once you parse it, can also represent it in a slightly more nicely formatted way. But otherwise, you’re just looking at the same content that requests scraped earlier. However, now that it’s a soup object, it has a bunch of very useful methods and ways of interacting with it to pick out the information. We’re going to look at those next.

02:52 So, let’s stop this video here and we’re going to look at how to actually address a specific element by ID in the next lesson.
