Exercises Course: Introduction to Web Scraping With Python (Summary)
Although it’s possible to parse data from the Web using tools in Python’s standard library, there are many tools on PyPI that can help simplify the process.
In this course, you learned how to:
- Request a web page using Python’s built-in urllib module
- Parse HTML using Beautiful Soup
- Interact with web forms using MechanicalSoup
- Repeatedly request data from a website to check for updates
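As a reminder of the first item, here’s a minimal sketch of requesting a page with `urllib.request.urlopen()` and turning the bytes it returns into text. The function name `fetch_html()` and the UTF-8 default are assumptions for illustration, not code from the course.

```python
from urllib.request import urlopen


def fetch_html(url: str, encoding: str = "utf-8") -> str:
    """Request a page with the standard library and return its HTML as text."""
    with urlopen(url) as response:
        raw_bytes = response.read()  # urlopen() gives you bytes, not str
    # Assumption: the page is UTF-8 encoded unless you pass another encoding
    return raw_bytes.decode(encoding)
```

You’d call it as `fetch_html("https://example.com")`. Keep in mind that every call sends a real request to that server.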
Writing automated web scraping programs is fun, and the Internet has no shortage of content that can lead to all sorts of exciting projects.
Just remember, not everyone wants you pulling data from their web servers. Always check a website’s Terms of Use before you start scraping, and be respectful about how you time your web requests so that you don’t flood a server with traffic.
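Here’s a minimal sketch of checking a page for updates while spacing out the requests. The `wait_for_change()` helper and its parameters are hypothetical. `fetch` is any zero-argument function that returns the page content, and the default 60-second delay is an arbitrary placeholder, not a recommendation from the course. Passing `sleep` in as a parameter lets you test the loop without actually waiting.

```python
import time
from typing import Callable, Optional


def wait_for_change(
    fetch: Callable[[], str],
    max_checks: int = 10,
    delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll fetch() until it returns something different from the first result.

    Returns the new content, or None if nothing changed within max_checks polls.
    """
    baseline = fetch()
    for _ in range(max_checks):
        sleep(delay_seconds)  # wait before every follow-up request
        current = fetch()
        if current != baseline:
            return current
    return None
```

In a real script, `fetch` could wrap `urllib.request.urlopen()`, and the delay should be long enough that the site’s server never sees a burst of traffic from you.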
Congratulations, you made it to the end of the course! What’s your #1 takeaway or favorite thing you learned? How are you going to put your newfound skills to use? Leave a comment in the discussion section and let us know.
00:00 Congratulations on making it to the end of this exercises course. In this course, you practiced scraping a website using only urllib from the standard library.
00:10 You practiced parsing the HTML using only string methods, which turned out to be somewhat tricky. Then you practiced parsing the HTML with a third-party library called Beautiful Soup, and you interacted with an HTML form, filling it out and submitting it, using another third-party library called MechanicalSoup.
00:29 Both of these libraries made it more straightforward to get information from the web with your Python program than it was with the raw standard library approach.
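To see why the string-method approach is fragile, here’s a minimal sketch that pulls the page title out with `str.find()` and slicing. The function name `get_title()` is hypothetical, and the code assumes the tag is written exactly as lowercase `<title>`. Anything else, such as `<TITLE>` or `<title class="x">`, makes it return an empty string, which is exactly the kind of edge case a real HTML parser handles for you.

```python
def get_title(html: str) -> str:
    """Return the text between <title> and </title>, or "" if not found."""
    start_tag, end_tag = "<title>", "</title>"
    start_index = html.find(start_tag)
    if start_index == -1:
        return ""
    start_index += len(start_tag)  # skip past the opening tag itself
    end_index = html.find(end_tag, start_index)
    if end_index == -1:
        return ""
    return html[start_index:end_index].strip()
```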
00:40 You also picked up a few helpful tips along the way. Use code comments to get organized and to write down your tasks and thoughts.
00:49 Break the exercise into smaller tasks so that you have intermediate steps you can tackle one by one. Use descriptive variable names. Experiment and try out code snippets in your Python REPL before making them part of your final script. And test repeatedly to check whether the code actually does what you expect it to do.
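To make these tips concrete, here’s a small sketch that splits “collect every link target from a page” into two steps with descriptive names, using only string methods, followed by quick checks you can rerun after each change. The helper names and the assumption that links are written as `href="..."` with double quotes are illustrative, not taken from the course.

```python
HREF_MARKER = 'href="'  # assumption: attributes use double quotes


def extract_quoted_value(html: str, value_start: int) -> str:
    """Step 1: return the text from value_start up to the next double quote."""
    value_end = html.find('"', value_start)
    if value_end == -1:  # no closing quote: take the rest of the string
        value_end = len(html)
    return html[value_start:value_end]


def find_all_links(html: str) -> list[str]:
    """Step 2: find every href=" marker and collect the value that follows it."""
    links = []
    marker_index = html.find(HREF_MARKER)
    while marker_index != -1:
        value_start = marker_index + len(HREF_MARKER)
        links.append(extract_quoted_value(html, value_start))
        marker_index = html.find(HREF_MARKER, value_start)
    return links


# Quick checks to rerun every time you change one of the steps
assert extract_quoted_value('href="/a">', 6) == "/a"
assert find_all_links('<a href="/one">1</a><a href="/two">2</a>') == ["/one", "/two"]
assert find_all_links("<p>no links here</p>") == []
```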
01:11 Finally, I have a few more resources for you in case you want to continue your journey into web scraping. First, there’s the associated tutorial, “A Practical Introduction to Web Scraping in Python,” which you can definitely check out if you haven’t read it yet.
01:28 There’s also a tutorial and a video course on “Python urllib.request for Making HTTP Requests” that goes much deeper into how you can use urllib and what you can do with it.
01:41 We also have a tutorial and a video course specifically about building a web scraper with Python and Beautiful Soup, where you scrape the data and then use Beautiful Soup to parse it.
01:54 Among the related resources I mentioned, there’s a guide that goes deeper into character encodings: “Unicode and Character Encodings in Python: A Painless Guide.” It covers UTF-8 and explains why and how you should decode the information you get back from the web.
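Here’s a minimal sketch of that decoding step. It assumes a hypothetical `decode_body()` helper that uses the charset the server declared (with `urlopen()`, you can read it from `response.headers.get_content_charset()`) and falls back to UTF-8 when none is given. Decoding with the wrong charset doesn’t necessarily raise an error; it can silently produce garbled text:

```python
from typing import Optional


def decode_body(raw: bytes, declared_charset: Optional[str] = None) -> str:
    """Turn response bytes into text, trusting the declared charset if present."""
    return raw.decode(declared_charset or "utf-8")


raw = "café".encode("utf-8")  # b'caf\xc3\xa9': "é" takes two bytes in UTF-8
print(decode_body(raw))  # café
print(decode_body(raw, "latin-1"))  # cafÃ© (wrong charset, garbled text)
```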
02:10 The character encodings guide exists as both a tutorial and a course. We also have resources on strings and character data in Python that talk about string slicing,
02:20 in case you want to refresh your knowledge about working with text like that.
02:25 I also mentioned that we have resources on Python virtual environments, so if you want to read up on that, you can check out the article or the video course.
02:33 And finally, if you’re curious to learn more about HTML, then you can check out the guide on “HTML and CSS for Python Developers”.
02:42 Alright, that’s all I have for you. Congratulations, thanks for joining, and I hope to see you around at realpython.com.