Explore the Website
00:00 Okay! So, I’m a normal user and I just want to look for some jobs on the website, so I’m going to head over there on my browser and just take a look at it.
00:10 You can see inside of this Jupyter Notebook—that you also have access to in the course materials—we have a link to indeed.com down here, under this heading Explore the Website, so that’s what I’m going to follow. I just Command + click it to move over, open it up in a new tab, and then take a look at it!
00:28 You can see that this one links to the /worldwide Indeed page, which is just a distinction because there’s country-specific ones that you could click on, so feel free to use your own country if you’re interested in searching in your own country, in your own language.
00:45 I believe that all of the structures of the websites are the same, so the scraping should work anywhere. They just have somewhat different URL endpoints. If you use this /worldwide one, it’s going to search in the US. So, I’m going to start off by typing my search here.
00:59 I want to look for a Python job and I want to look for it in New York. And then I just go ahead and click Find Jobs…
01:08 and here are the results! So, this is what I would do just as a normal user. I’m looking for a Python job in New York, I type in my search queries, and then I get the results here.
01:17 So, what I can see is that there are these kinds of blocks here on the side that contain some information, and it looks like the first one is highlighted, which I assume means it’s related to what’s shown on the right. Here on the right, I can see some information that relates to this one, and it fits with the heading here as well. So if I click on this one, I expect that this information here is going to change to the new job, and it does indeed. I get some information about the company here.
01:48 I’m also going to dismiss this down there so we have a bit more screen space. And I can keep scrolling and clicking on these and just get information about the jobs, and then I also have some other options here. It looks like I can apply, which is relevant for me if I’m looking for a job. If I click this, I expect it’s going to take me to a different page related to that specific company. And yeah, it does.
02:13 So this takes me away from indeed.com. Their job is done. They gave you the connection to this company that is looking for someone, and then here, I can directly apply.
02:24 So most of those cards are going to have some possibility to apply. That’s something interesting. And then there’s also this option—it looks like I can star them or heart them. For this, I probably need some kind of account on Indeed—I need to sign up or create an account, and then I can probably save these. Yeah. Okay.
02:47 So, as you see, I’m just clicking around and exploring to find out how this page is structured and what information I want to get from it.
02:56 And for now, it looks like we’re going to focus on this part here—this column, essentially, that has all the search results. That has some title of the job, it has a location, and it has a bit of information, plus also it’s going to contain in here a link to even more information that opens up here on the side. If I can collect all of this information, that’s pretty cool, and that’s what I’m going to focus on right here.
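To make that target concrete, here is a minimal sketch of how one search result could be represented once it has been collected. It is only an illustration under assumptions: the class name JobCard and the field names title, location, summary, and detail_url are hypothetical, chosen to mirror the pieces listed above, and the values are placeholders rather than real scraped data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobCard:
    """One result card from the search column (hypothetical field names)."""

    title: str                        # job title shown on the card
    location: str                     # location shown on the card
    summary: str                      # the short bit of information on the card
    detail_url: Optional[str] = None  # link to the fuller description, if one is found


# Placeholder values for illustration only, not real data from the site.
example = JobCard(
    title="Python Developer",
    location="New York, NY",
    summary="Short blurb shown on the card.",
)

print(example.title, "|", example.location)
```

Writing down the fields like this before scraping can help you decide which parts of the page you actually need to locate.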
03:22 So, feel free to explore the site some more, just in the way that I did now: scroll around, click around, think what you expect, and see what clicking does. The better you understand your site, the easier it’s going to be to scrape it eventually. And also remember, this is just step one. In the next lesson, we’re going to look at the information that you can get out of the URLs up here.