Exercises Course: Introduction to Web Scraping With Python (Overview)
Web scraping is the process of collecting and parsing raw data from the Web, and the Python community has come up with some pretty powerful web scraping tools.
The Internet hosts perhaps the greatest source of information on the planet. Many disciplines, such as data science, business intelligence, and investigative reporting, can benefit enormously from collecting and analyzing data from websites.
In this course, you’ll practice:
- Parsing website data using string methods and regular expressions
- Parsing website data using an HTML parser
- Interacting with forms and other website components
00:00 Welcome to this Real Python Exercises Course where you’ll practice scraping and parsing data from the Internet. Our exercises courses are all about training.
00:10 You’ll train the process of writing code by solving carefully selected exercises. You’ll also train reading other people’s code and communicating your thought process.
00:19 Doing all that, you’ll practice the concepts that you’ve learned about in an associated course or tutorial and help make them stick. In the upcoming lessons, I’ll introduce you to tasks, give you an opportunity to solve them yourself, and then show you step by step how I solved each of them.
00:34 You’ll go through three steps for each task. You’ll learn about the exercise, you’ll code your own solution, and then you’ll compare your solution and the process that got you there to mine.
00:44 When I walk you through a task, I’ll explain what I do and also why I do it like that. That’ll give you a chance to compare not just our final solution, but also how we got there.
00:53 Maybe you’ll gain some insight on the process of getting from a task description to a working solution in code. You’ll tackle three exercises in this course.
01:03 The first exercise asks for pure standard library Python. You should use urllib to scrape and parse text from a website, and then string methods to extract information from it.
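As a rough idea of what that first approach can look like, here is a minimal sketch that uses only the standard library. The helper names `fetch_html` and `extract_between` and the sample HTML string are made up for illustration, and the UTF-8 decoding is an assumption. The sample string stands in for a downloaded page so the sketch runs without a network connection.

```python
from urllib.request import urlopen


def fetch_html(url):
    """Download a page and decode it, assuming it's UTF-8 encoded."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def extract_between(text, start_marker, end_marker):
    """Return the text between two markers, or None if either is missing."""
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = text.find(end_marker, start)
    if end == -1:
        return None
    return text[start:end].strip()


# Hypothetical stand-in for a downloaded page (not the real site's markup).
sample_html = (
    "<html><head><title>Profile: Aphrodite</title></head>"
    "<body></body></html>"
)

title = extract_between(sample_html, "<title>", "</title>")
print(title)
```

With a network connection, you could pass the result of `fetch_html()` to `extract_between()` instead of the sample string. Searching for exact markers like this is brittle, because small changes in the HTML break it.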
01:13 Then you’ll add abstraction layers on top. You’ll see that the first exercise is quite challenging, and that it gets easier once you add libraries that are designed for web scraping and parsing.
01:28 In the second task, you use an HTML parser for web scraping, and specifically that’ll be BeautifulSoup. In the third task, you learn how you can interact with HTML forms using the MechanicalSoup library.
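To give a feel for the HTML parser approach, here is a small sketch that assumes the third-party `beautifulsoup4` package is installed. The sample HTML and its link paths are invented for illustration and aren't copied from the real site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page.
sample_html = """
<html>
<head><title>Profiles</title></head>
<body>
<h2><a href="/profiles/aphrodite">Aphrodite</a></h2>
<h2><a href="/profiles/poseidon">Poseidon</a></h2>
</body>
</html>
"""

# "html.parser" is Python's built-in parser, so no extra parser is needed.
soup = BeautifulSoup(sample_html, "html.parser")

# Read the page title and collect the text and target of every link.
page_title = soup.title.string
links = [(a.get_text(), a["href"]) for a in soup.find_all("a")]

print(page_title)
for name, href in links:
    print(name, href)
```

The parser handles the tag structure for you, so you don't need to search for exact substrings.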
01:41 Before starting this course, you should have read through the introductory tutorial on web scraping with Python. If you went through that tutorial, then you’re well-equipped to use the tasks that I’ll throw at you as training sessions.
01:53 You can also explore the site that you’ll work with under the URL olympus.realpython.org/profiles, and then you can click around a bit. For example, if you click on Aphrodite’s profile, then you’ll see the page that you can see at the moment on the slide.
02:10 The concepts that you’ll practice in this course are making web requests using urllib, using string methods to extract information from text data, parsing HTML using BeautifulSoup, and filling and submitting forms using MechanicalSoup.
02:26 If you’re somewhat familiar with these concepts and you want to fortify your knowledge with practical programming tasks, then this course is exactly right for you.
02:36 If you’re feeling playful like Poseidon and this little fish and you’re ready to do hands-on programming, then keep watching. I’ll see you in the next lesson where I’ll introduce the first exercise.