Scraping Ethically and Following Best Practices
00:00 Finally, let’s take a quick look at some ethical implications of web scraping because you are accessing data that’s out on the internet and you’re using automated means to access it.
00:11 So you want to make sure that you’re scraping from a site that is okay with what you’re doing. You don’t just want to overwhelm the website or work against its usage policies.
00:22 And there’s a file that a lot of websites are going to include that’s called robots.txt, that gives instructions to web scrapers. So it’s a file that contains instructions that your scraper should read, and generally it gives you information about which of the pages of the website you can scrape and which of the ones you should just leave alone.
00:41 So make sure to respect that robots.txt if it’s present on a website. You also want to make sure to respect user privacy and data policies of a specific site, so don’t go ahead and just get any sort of information that you can, but instead just be nice.
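As a rough sketch of what reading robots.txt can look like in Python, the standard library's `urllib.robotparser` module can parse the rules and answer whether a given URL may be fetched. The robots.txt content, the `my-scraper` user agent name, and the `example.com` URLs below are made up for illustration and are not taken from the course.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

# Hypothetical user agent name for your scraper.
USER_AGENT = "my-scraper"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Ask whether specific pages may be scraped under these rules.
allowed_public = parser.can_fetch(USER_AGENT, "https://example.com/articles/")
allowed_private = parser.can_fetch(USER_AGENT, "https://example.com/private/data")

# Some sites also suggest a delay between requests (None if not specified).
suggested_delay = parser.crawl_delay(USER_AGENT)

print(allowed_public, allowed_private, suggested_delay)
```

For a live site, you would typically call `parser.set_url(...)` with the site's robots.txt address and then `parser.read()` instead of parsing an inline string.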
00:57 Finally, you also may want to implement delays between your requests if you send multiple requests, because that way you can put less pressure on a website, which is going to avoid overwhelming the servers that host that site, which is nice from the perspective of the person hosting the website.
01:13 And then from your perspective, it can help you avoid running into the rate limiting and throttling that the website may implement to prevent exactly this type of overloading if there are too many automated requests coming to its servers.
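One simple way to add delays between requests is to pause with `time.sleep()` before every request except the first. This is only a minimal sketch: the `fetch_politely()` helper, its `fetch` callback, and the one-second default are assumptions made for illustration, and a suitable delay depends on the site you're scraping.

```python
import time
from typing import Callable, Iterable, List


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 1.0,
) -> List[str]:
    """Call fetch() on each URL, pausing between consecutive requests.

    `fetch` is whatever function performs the request, passed in so the
    pacing logic stays independent of any particular HTTP library.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            time.sleep(delay_seconds)
        results.append(fetch(url))
    return results


def fake_fetch(url: str) -> str:
    """Stand-in for a real request, so the example runs offline."""
    return f"content of {url}"


# Hypothetical URLs and a short delay to keep the demo quick.
pages = fetch_politely(
    ["https://example.com/a", "https://example.com/b"],
    fake_fetch,
    delay_seconds=0.05,
)
print(pages)
```

In a real scraper, you'd pass in a function that actually performs the request in place of `fake_fetch()`.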
01:28 So these are some best practices and just good ideas to keep in mind when you’re working on web scraping projects.
01:35 And that concludes the course. In the next lesson, we’ll sum up what you’ve learned, and I’ll also give you a couple of additional resources that you can use to continue learning about web scraping with Python.