00:27 And the reason for that is that it’s just quicker to send a bit of code that later generates the data inside of your browser, rather than sending you all the information of the page. This essentially offloads some of the computing work onto the browser of the user who actually wants to access the data, and takes it away from the server of the web application that you’re interacting with. This is important if you want to scale to millions and millions of users, because it keeps your infrastructure smaller and makes the people who want to access your page, essentially, do the work in order to get the data.
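To make that split between server and browser a bit more concrete, here is a minimal sketch of the two approaches. Everything in it is made up for illustration: the `POSTS` data, the two function names, and the `/static/app.js` path are assumptions, not anything a real site is known to use.

```python
import html

# Hypothetical data standing in for whatever the page displays.
POSTS = ["First post", "Second post"]


def render_on_server(posts):
    """Static approach: the server builds the full HTML, data included."""
    items = "\n".join(f"    <li>{html.escape(post)}</li>" for post in posts)
    return f"<html><body>\n  <ul>\n{items}\n  </ul>\n</body></html>"


def render_shell_for_browser():
    """Dynamic approach: the server sends an empty container plus a script.

    The script (a hypothetical /static/app.js) would fetch the posts and
    build the list inside the browser after the page has loaded.
    """
    return (
        "<html><body>\n"
        '  <ul id="posts"></ul>\n'
        '  <script src="/static/app.js"></script>\n'
        "</body></html>"
    )


static_page = render_on_server(POSTS)
dynamic_page = render_shell_for_browser()

print("First post" in static_page)   # True: the data is in the markup
print("First post" in dynamic_page)  # False: only the empty shell was sent
```

A scraper that only downloads the HTML sees what `render_shell_for_browser()` returns: the container and the script tag, but none of the posts.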
01:05 So, this is a smart move from the company’s perspective because it puts the work that needs to be done onto your browser rather than their server, but it makes scraping the page more difficult. And why is that?
So, it looks like this could be similar to how it was with indeed.com, but it is not. We do have the query parameter up here that tells us we’re searching for realpython on the Twitter domain, but now if I run this, it seems all fine. There is no problem, and I can check the status code. Here we also have a 200, which means it’s a success, so there’s no authorization problem here. We’re getting everything, so you might expect that it’s fine and that you have all the data, all the tweets that you’re interested in. However, when you run this, you can see there’s a lot of code. I’m printing out all of it right now.
So you can’t simply requests.get() the information, because the response that you get, even though the status is fine, is not going to contain the content and information that you’re looking for.
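One way to see this for yourself is to check whether a phrase you can see in the browser actually shows up in the visible text of the response. The sketch below uses only Python’s standard library rather than the parsing tools from the course, and the class name, function names, and the two sample HTML strings are made up for illustration.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script and style tags.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(page_source):
    """Return the text a reader would see, without any JavaScript or CSS."""
    parser = VisibleTextParser()
    parser.feed(page_source)
    parser.close()
    return " ".join(parser.chunks)


def has_visible_text(page_source, phrase):
    """Check whether a phrase appears in the visible text of the page."""
    return phrase.lower() in visible_text(page_source).lower()


# Hypothetical responses: one static page, one JavaScript-driven shell.
static_html = "<html><body><p>Real Python tutorials</p></body></html>"
dynamic_html = (
    '<html><body><div id="root"></div>'
    '<script>render("Real Python tutorials")</script></body></html>'
)

print(has_visible_text(static_html, "Real Python"))   # True
print(has_visible_text(dynamic_html, "Real Python"))  # False
```

With a live page, you could pass in the `.text` of the response you got back from `requests.get()`. A status code of 200 combined with a `False` result here is a hint, not proof, that the content is generated in the browser.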
So, scraping dynamic websites is a bit more advanced, but there are ways of doing this, and I’ve added some links here. You can check out requests-html, which is from the same team that created the requests library but also allows you to do scraping of dynamic websites and parsing right away.
03:51 And then a very commonly-used tool for scraping dynamic websites is Selenium. There’s also a tutorial that you can check out on Real Python about working with Selenium for scraping dynamic content, but we are not going to go into this in this course.
04:07 So that’s out of the scope of this course, but I wanted to make you aware of what the problem here is, and that you might run into it if you go off on your own and you have this specific website that you want to scrape and you try applying the techniques that you’re learning here, but then you run into a problem like this—that the response that you’re getting actually does not contain the information that you want. The reason is very likely that this is a dynamic website.
04:50 You want to get very used to this process, understanding the HTML in there and how you can interact with it. Once you are familiar and comfortable with scraping static websites, you can move on to exploring some of these tools and the links that I have here in the Notebook for you on how to also scrape dynamic content.