Web Scraping With Beautiful Soup and Python (Overview)
The incredible amount of data on the Internet is a rich resource for any field of research or personal interest. To effectively harvest that data, you’ll need to become skilled at web scraping. The Python libraries requests and Beautiful Soup are powerful tools for the job. If you like to learn with hands-on examples and you have a basic understanding of Python and HTML, then this course is for you.
In this course, you’ll learn how to:
- Use requests and Beautiful Soup for scraping and parsing data from the Web
- Walk through a web scraping pipeline from start to finish
- Build a script that fetches job offers from the Web and displays relevant information in your console
We’re going to talk about the different tools that you use, with the main focus on the browser, the requests library, and the Beautiful Soup library. Step by step, you’ll start off with an intro to web scraping: what web scraping is in the first place, why you would want to use it, some problems and difficulties with it, and also some possible alternatives.
00:31 Then, we’re going to go into the web scraping process, which I like to think about as three parts. The first is inspecting your data source, which is an important step because you need to understand the data you’re working with. For this, we’re going to work with a couple of different tools, among them the browser developer tools, which help you get a good understanding of how your website is structured. This is going to make it much easier for you to then scrape the website and parse the information that you want.
Which brings us to Part 2, which is going to be scraping the HTML content from a page. Here, we’re going to use the requests library and write some Python code to do so. You’ll see that in a basic example, you don’t necessarily need a lot of code for that, but there are more complex scenarios that we will touch on, and I will give you some pointers for working with more difficult-to-scrape websites. In Part 3, we’re finally going to talk about parsing HTML code using the Beautiful Soup library. This is the part where you pick out the information you want from the page content that you scraped before. You also use your browser to investigate which pieces you need, and then write the code to actually extract them. This is going to be an iterative process, where it’s important that you use your browser to inspect the information so that you have a good understanding of what’s going on and know which code to write to pick out that specific information. So here, we’re going to talk a bit about Part 1 again, inspecting the website, and make sure you understand that this is an iterative process where you write code, check the information on the site, and then write some more code.
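To give a rough feel for Parts 2 and 3, here is a minimal sketch, not the code from the course itself. The helper names `fetch_html` and `parse_job_titles`, the sample HTML, and the `h2` tag with the `title` class are all assumptions made up for illustration. On a real site, you would find the actual tags and classes with your browser's developer tools, as described in Part 1.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Part 2: download the raw HTML of a page (hypothetical helper)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text


def parse_job_titles(html: str) -> list[str]:
    """Part 3: pick out the pieces of information you want."""
    soup = BeautifulSoup(html, "html.parser")
    # The tag and class names below are assumptions for this sketch;
    # inspect the real page to find the ones that apply.
    headings = soup.find_all("h2", class_="title")
    return [heading.get_text(strip=True) for heading in headings]


# Stand-in for a scraped page, so the parsing step runs without a network call.
SAMPLE_HTML = """
<div class="card"><h2 class="title"> Python Developer </h2></div>
<div class="card"><h2 class="title">Data Engineer</h2></div>
"""

titles = parse_job_titles(SAMPLE_HTML)
print(titles)
```

With a live page, you would pass the result of `fetch_html()` to `parse_job_titles()` instead of the sample string, then go back to the browser, check what you got, and adjust the selectors.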
02:07 In the final part, I will present you with a Jupyter Notebook that has a couple of questions and steps prepared for you to work on, so you can practice the skills that we talked about in this course. You’ll build out a pipeline that scrapes a couple of pages in one go and picks out specific pieces of information from a job board. You can then customize it and make it your own project, building a job search tool that might be interesting and helpful for your own job search. Okay.