Introduction to Web Scraping With Python (Summary)
Although it’s possible to parse data from the Web using tools in Python’s standard library, there are many tools on PyPI that can help simplify the process.
In this video course, you learned how to:
- Request a web page using Python’s built-in `urllib` module
- Parse HTML using Beautiful Soup
- Interact with web forms using MechanicalSoup
- Repeatedly request data from a website to check for updates
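
As a minimal sketch of the first item in the list above, the following requests a page with `urllib.request` and decodes the response body. The helper name `fetch_html` is hypothetical, and the `data:` URL is an assumption made so that the example runs without network access. In practice you would pass the address of the page you want to scrape.

```python
from urllib.request import urlopen


def fetch_html(url: str) -> str:
    """Return the decoded body of the page at `url`."""
    with urlopen(url) as response:
        raw_bytes = response.read()
        # Fall back to UTF-8 when the response does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw_bytes.decode(charset)


# Stand-in for a real web address (assumption, keeps the demo offline).
demo_url = "data:text/html;charset=utf-8,<html><title>Demo</title></html>"
html = fetch_html(demo_url)
print(html)
```

The returned string is what you would then hand to a parser such as Beautiful Soup.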
Writing automated web scraping programs is fun, and the Internet has no shortage of content that can lead to all sorts of exciting projects.
Just remember, not everyone wants you pulling data from their web servers. Always check a website’s Terms of Use before you start scraping, and be respectful about how you time your web requests so that you don’t flood a server with traffic.
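
One simple way to space out requests is to pause between them. The sketch below is an illustration under stated assumptions, not a prescribed method: `fetch_politely`, its parameters, and the one-second default delay are all hypothetical, and the `fetch` argument stands in for whatever function you use to download a page.

```python
import time
from typing import Callable, Iterable, List


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL in turn, pausing between consecutive requests."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        pages.append(fetch(url))
    return pages


# Demo with a stand-in fetch function so that no real server is contacted.
pages = fetch_politely(["page-1", "page-2"], fetch=str.upper, delay_seconds=0.1)
print(pages)
```

Passing `sleep` as a parameter is a design choice that makes the delay easy to replace in tests. A suitable delay depends on the site, so check its Terms of Use and any rate limits it publishes.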
Congratulations, you made it to the end of the course! What’s your #1 takeaway or favorite thing you learned? How are you going to put your newfound skills to use? Leave a comment in the discussion section and let us know.
00:00 So in this course, you covered a couple of topics about web scraping. It’s meant as a practical introduction. So what you did is you first understood some web scraping basics and then dove right into setting up and using Beautiful Soup, and you learned how to navigate, search, and modify the parse tree, so the HTML that you got back when you scraped a site from the internet.
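
As a brief refresher, navigating, searching, and modifying the parse tree might look like the sketch below. The HTML snippet is made up for illustration, and the example assumes that the third-party `beautifulsoup4` package is installed.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Made-up page content used only for this demo.
html = """
<html>
  <head><title>Soup of the Day</title></head>
  <body>
    <h1>Menu</h1>
    <a href="/tomato">Tomato</a>
    <a href="/lentil">Lentil</a>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# Navigate: step through the tree by tag name.
page_title = soup.title.string

# Search: collect the target of every link on the page.
links = [a["href"] for a in soup.find_all("a")]

# Modify: replace the text inside the first <h1> element.
soup.h1.string = "Today's Menu"

print(page_title)
print(links)
print(soup.h1)
```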
00:22 Finally, you also discussed how you can handle some challenges such as big projects or dynamic sites, and also what ethical practices to consider when performing some web scraping tasks.
00:34 Keep in mind that this course is associated with a tutorial and the tutorial covers a couple more topics, so you can head to that tutorial if you want to learn how to use string methods or regular expressions for text extraction.
00:47 So essentially, that’s what you did using Beautiful Soup, but with, I’d say, lower-level approaches that are going to be more strenuous. It may still be interesting to find out how easy Beautiful Soup makes some of these tasks that would be pretty hard otherwise.
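
To give a rough idea of what those lower-level approaches involve, here is a minimal sketch that extracts a page title twice, once with string methods and once with a regular expression. The HTML string and variable names are made up for illustration and are not taken from the tutorial.

```python
import re

# Made-up input with a stray space inside the opening tag.
html = "<html><head><title >Profile: Aphrodite</title></head></html>"

# String methods: locate the tags by position and slice out the text between them.
start = html.find("<title")
start = html.find(">", start) + 1
end = html.find("</title>")
title_from_find = html[start:end]

# Regular expression: tolerate whitespace or attributes inside the opening tag.
match = re.search(r"<title[^>]*>(.*?)</title\s*>", html, re.IGNORECASE | re.DOTALL)
title_from_regex = match.group(1) if match else None

print(title_from_find)
print(title_from_regex)
```

Both approaches need manual handling for each variation in the markup, which is the kind of work a parser takes off your hands.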
01:00 And towards the end of the tutorial, you’ll also learn how you can interact with HTML forms using another third party library that builds on top of Beautiful Soup called MechanicalSoup.
01:11 And you’ll also learn how you can interact with websites in real time using that same library. So these are some interesting next steps that you can consider, and you can read about them in the associated tutorial.
01:21 And the name of that tutorial is “A Practical Introduction to Web Scraping in Python.” And here’s a couple of other resources that you can also take a look at.
01:29 So we’ve used `urllib.request` to actually get the HTML page from the internet, so if you want to learn more about how to use that, we have a tutorial and a course on it.
01:39 And then if you want to take the next step, you can work through the tutorial on Beautiful Soup and how to build a web scraper with Python, which discusses how to use Beautiful Soup for web scraping more in depth by building out a project.
01:51 Or instead, you could jump right in and build out your own project idea. Did you say something about scraping beautiful ingredients from the internet to make recipes for beautiful soups?
02:01 Or maybe that’s just my dream project. Anyways, that’s it! Thanks for watching this course, and again, you can read a tutorial for more info. I hope you learned something and see you around at Real Python.