Your Turn: Build a Pipeline
00:00 Congratulations on making it this far in the course. You’ve now gone over the basics of the web scraping process: you start off by inspecting your page, then you scrape the content, and then you parse that content, picking out the information that you want.
This is a process that you’ll repeat for every website that you want to scrape. So, keep in mind that these are the high-level steps that you need to take, and that the specific code you write, and of course the inspecting, is going to be individual for each website that you’re working with. Now, to give you some practice, you’re going to look some more at the indeed.com website, and I’ve assembled a couple of tasks for you that will help you practice your web scraping skills. In this part, I want you to combine your knowledge about the site, the requests library, and Beautiful Soup that you’ve gone over in the past parts of this video course, and then tackle a couple of tasks.
01:00 So, the idea is that I want you to automate the scraping process across multiple results pages. We’re going to look at this in a second, but so far you’ve only looked at the first 10 or 15 results of the search, and there are more than that.
01:24 You also want your code to be able to search not only for Python in New York, but maybe also for Go in Berlin, or whatever: just different locations and different search terms.
01:37 So, you want to write some functions to generalize the code and allow for different inputs. And then finally, in the parsing part, I want you to be able to target specific pieces of information (I have some suggestions for which ones those could be) and then save that information out to a file so that you can use it and maybe keep working with it.
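One way to generalize the search for different inputs is to build the URL in a small function. The sketch below is only an illustration: the base URL, the parameter names `q`, `l`, and `start`, and the function name `build_search_url` are assumptions, so check them against what you actually see in your browser’s address bar when you run a search and page through the results.

```python
from urllib.parse import urlencode

# Assumed base URL and parameter names. Confirm them by running a search
# on the site and reading the address bar.
BASE_URL = "https://www.indeed.com/jobs"


def build_search_url(query, location, start=0):
    """Return a search URL for a search term, a location, and a result offset."""
    params = {"q": query, "l": location}
    if start:
        # Only add the offset for pages after the first one.
        params["start"] = start
    # urlencode() escapes spaces and special characters for you.
    return f"{BASE_URL}?{urlencode(params)}"


print(build_search_url("python", "new york"))
print(build_search_url("go", "berlin", start=10))
```

Keeping the URL logic in one function means that the rest of your pipeline never needs to know how the site encodes a search.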
02:03 And here we are in the Notebook. So again, you have the high-level tasks written out here, and then more specifically, the tasks that I’m asking you to do. Now, I want you to scrape the first 100 available search results. Let’s look at this some more. Here are the search results. I think we counted them at some point before, and there are 15 of them on this page, but that’s it. Then there’s another page, so you can click forward and get more search results, because there weren’t just 15 results, but 3,414. Right?
02:41 You’re going to have to write some code to be able to do that. Down here in the Inspect part, I have a couple of hints for what you can look at and some questions that can help you get on the right track.
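Collecting the first 100 results comes down to a loop that keeps requesting pages until it has enough. Here is a minimal sketch of that loop, assuming the site pages through results by offset. Both `collect_results` and `fake_fetch_page` are hypothetical names, and the fake fetcher only stands in for your real scraping code so that you can try the loop offline.

```python
def collect_results(fetch_page, limit=100):
    """Call fetch_page(offset) until `limit` results have been collected.

    fetch_page is any callable that takes a result offset and returns a
    list of results for that page (an empty list when nothing is left).
    """
    results = []
    while len(results) < limit:
        page = fetch_page(len(results))
        if not page:
            break  # The site ran out of results before the limit.
        results.extend(page)
    return results[:limit]


def fake_fetch_page(offset, total=40, per_page=15):
    """Stand-in fetcher: pretends there are 40 results, 15 per page."""
    return [f"job-{i}" for i in range(offset, min(offset + per_page, total))]


jobs = collect_results(fake_fetch_page, limit=100)
print(len(jobs))
```

This version uses the number of results collected so far as the next offset. If the site steps its offset differently from the number of results it shows per page, adjust that line after inspecting the URLs.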
03:07 Then, I want you to pick out specific information, which is the URL for applying to the job, the job title, and the job location. Finally, save the results of your search to a file. Now, this Notebook gives you a start for that.
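For the saving step, a CSV file is one simple option. The sketch below uses Python’s built-in `csv` module. The `save_jobs` name, the field names, the file name, and the two example rows are all made up for illustration, and your own results will come from your parsing code.

```python
import csv

FIELDNAMES = ["title", "location", "apply_url"]


def save_jobs(jobs, path):
    """Write a list of job dictionaries to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(jobs)


# Made-up example rows, not real search results.
jobs = [
    {"title": "Python Developer", "location": "New York, NY",
     "apply_url": "https://example.com/apply/1"},
    {"title": "Data Engineer", "location": "Berlin",
     "apply_url": "https://example.com/apply/2"},
]

save_jobs(jobs, "jobs.csv")
```

You can open the resulting file in a spreadsheet, or read it back later with `csv.DictReader`.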
So, what changes in the URL when you click forward to the next page of results? Inspect. Use your developer tools to figure out which specific elements you want. What is the id, for example, or maybe a class name that defines where the location is noted on the page?
Et cetera. So, just keep in mind that you have these tools in your tool belt now. You know how to inspect the page, you know how to use requests to scrape it, and you know some of the methods that Beautiful Soup provides to pick out specific pieces of information. Go through this process, keeping these specific tasks in mind, and try to tackle them to get some additional training in web scraping and to create a script that actually collects information that might be relevant for you, or maybe for one of your friends who might be looking for a job. Okay.
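As a rough sketch of the parsing step, here is how Beautiful Soup’s `find_all()`, `find()`, and `get_text()` can pull a title, a location, and a link out of each result. The HTML below is made up, and the `job-card`, `title`, and `location` class names are placeholders: the real page uses its own tags and class names, which you need to find with your developer tools. `parse_jobs` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a results page. The structure and class
# names are placeholders, not the real site's markup.
SAMPLE_HTML = """
<div class="job-card">
  <h2 class="title"><a href="/apply/1">Python Developer</a></h2>
  <span class="location">New York, NY</span>
</div>
<div class="job-card">
  <h2 class="title"><a href="/apply/2">Data Engineer</a></h2>
  <span class="location">Berlin</span>
</div>
"""


def parse_jobs(html):
    """Return one dictionary per job card found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    for card in soup.find_all("div", class_="job-card"):
        link = card.find("h2", class_="title").find("a")
        location = card.find("span", class_="location")
        jobs.append({
            "title": link.get_text(strip=True),
            "location": location.get_text(strip=True),
            "apply_url": link["href"],
        })
    return jobs


for job in parse_jobs(SAMPLE_HTML):
    print(job)
```

In your own script, the HTML would come from the scraping step, for example from the `.text` attribute of the response that `requests.get()` returns, instead of a hard-coded string.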
04:35 So, there is a solution document for these tasks, but I suggest that you go for it by yourself first: try it out and see what you can do. You can always compare with the solution document, but the process of learning really is about trying to figure stuff out by yourself.
Make sure that you check out the Beautiful Soup documentation and the requests documentation. Getting familiar with reading the docs will also help you down the line. And make this project your own! Look at what would be interesting for you to scrape from this job page and really target the pipeline that you build to that specific wish of yours. Now, before I let you go, let’s go to the final video in this course, where we’ll do a full course recap and summary. It’s all about the iterative nature of working on things, so let’s do a quick brush over everything that you went over and everything that you learned in this course. See you over there, in the final video.