The Challenge of Variety
00:00 Let’s talk about the challenges that web scraping presents. First of all, I want to talk about variety. When you think about the web as a collection of websites, you can think of each one as a little snowflake, essentially.
00:14 So, each of these pages has its own structure, one that is unique to that specific page. It might have a different size or a different layout, and you can’t take the structure of one page and assume it carries over to another. So if you write a scraper for one page, it’s not going to work for a different page, because every page is special.
00:34 You’re going to have to get to know each page individually and write a scraper for that specific page in order to be able to scrape the information from it. Practically, for us, that means if you’re looking at a job board and you write code to scrape the information from it, you’re not going to be able to apply that same code to a different job board, because every site is unique and has its own structure.
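To make that concrete, here is a minimal sketch of what this can look like in code. The two HTML snippets, the tag and class names, and the helper `scrape_titles()` are all made up for illustration and do not come from any real job board. The sketch uses only the `html.parser` module from Python's standard library.

```python
from html.parser import HTMLParser

# Two hypothetical job boards listing the same job with different markup.
BOARD_A = '<div class="job"><h2 class="title">Python Developer</h2></div>'
BOARD_B = '<li class="posting"><a class="role">Python Developer</a></li>'


class TitleScraper(HTMLParser):
    """Collect the text inside every <tag class="css_class"> element."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.titles = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == self.tag and dict(attrs).get("class") == self.css_class:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == self.tag:
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.titles.append(data.strip())


def scrape_titles(html, tag, css_class):
    """Hypothetical helper: return the job titles found in an HTML string."""
    scraper = TitleScraper(tag, css_class)
    scraper.feed(html)
    scraper.close()
    return scraper.titles


# Selectors written for board A work on board A...
print(scrape_titles(BOARD_A, "h2", "title"))  # ['Python Developer']
# ...but find nothing on board B, even though the job is the same.
print(scrape_titles(BOARD_B, "h2", "title"))  # []
# Board B needs selectors written for its own structure.
print(scrape_titles(BOARD_B, "a", "role"))    # ['Python Developer']
```

The parsing logic stays the same across both boards, but the knowledge of where the title lives has to be worked out separately for each site.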
00:59 This is one of the challenges of web scraping that we want to subsume under the term variety. In the next lesson, we’re going to talk about another problem that you can run into with web scraping, which I call the durability problem. See you there.