The Challenge of Durability

Martin Breuss

Web Scraping With Beautiful Soup and Python Martin Breuss 01:55

00:00 Let’s next look at another challenge of web scraping, which is the durability of your scrapers. So, say you build a really nice scraper that does exactly what you want for your job board, and then you add the spice of some time passing and suddenly it doesn’t work anymore. That’s a very, very common scenario when you’re building scrapers for any site out there, and the reason is that websites change.

00:25 Their structure doesn’t stay the same, because people keep working on them, improving them, changing something here, changing something there. And if the website that your scraper is customized to changes, then your scraper is going to break.

00:38 So this is something that you have to be aware of when you work with web scraping: the work never really ends, because the work on the websites never really ends. So if there is a change on the website, you are going to have to adapt your scraper accordingly. In terms of durability, this means that your scraper plus time always means a bit of worrying about it and a little bit of work getting it back up and running.
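
One way to notice such a change early is to check whether the page still contains the structure your scraper depends on before you try to extract anything. The following is only a minimal sketch under a few assumptions: the class names (`card-content`, `title`, `company`) and the two HTML snippets are hypothetical stand-ins for a job board, and the check uses the standard library's `html.parser` instead of Beautiful Soup so that it runs without installing anything.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count how many start tags carry each CSS class."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Attributes without a value arrive as None, so skip those.
            if name == "class" and value:
                for css_class in value.split():
                    self.counts[css_class] = self.counts.get(css_class, 0) + 1


def missing_classes(html, expected_classes):
    """Return the expected CSS classes that do not appear in the HTML."""
    parser = ClassCounter()
    parser.feed(html)
    parser.close()
    return [c for c in expected_classes if parser.counts.get(c, 0) == 0]


# Hypothetical class names that a job-board scraper might rely on.
EXPECTED = ["card-content", "title", "company"]

# Hypothetical page as it looked when the scraper was written.
old_page = (
    '<div class="card-content">'
    '<h2 class="title">Dev</h2><h3 class="company">Acme</h3>'
    "</div>"
)

# Hypothetical page after a redesign renamed two of the classes.
new_page = (
    '<div class="job-card">'
    '<h2 class="title">Dev</h2><h3 class="employer">Acme</h3>'
    "</div>"
)

print(missing_classes(old_page, EXPECTED))  # → []
print(missing_classes(new_page, EXPECTED))  # → ['card-content', 'company']
```

A non-empty result does not fix anything by itself, but it turns a silent failure into a clear signal about which part of the scraper probably needs adapting.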

01:01 Usually, if you have the scraper built in the first place, the website doesn’t change completely, so it’s just going to be a small fix, but you have to stay vigilant in order to make sure that your scraper is still working. So to sum up again, the two main challenges of web scraping are, first, variety: every website is different, so you have to customize your scraper to the individual structure of a website. And second, durability, meaning that websites change over time, which means that your scraper’s going to break and you’re going to have to keep it up to date. Now, in the next lesson, we’re going to talk about APIs, which represent an alternative to web scraping. They don’t necessarily do away with these problems of variety and durability, but they can alleviate them a little bit and make the process of gathering information from the web a little bit easier.

01:50 Let’s talk about what APIs are in the next lesson.
