Inspecting the Site With Your Browser
00:00 Again, I start by taking a look at the site that I want to scrape in my browser. You’ve already seen it for a second, but this is the All Profiles page and it has three links on it that point to the individual profile pages.
00:14 So I can go there and see Aphrodite’s profile, or Poseidon’s, or Dionysus’s. So there are three on there. Okay, but I just want to scrape this page and get out the URLs in here.
00:25 So again, I’m going to right-click and View Page Source to see the HTML that makes up the site. And you can see there are three link tags in here. Those are the a tags and they have an attribute called href that points to what you could consider a string here.
00:42 And this only has the relative URLs on here. So what you can see already here is that you will need to combine this relative URL together with the base URL of the All Profiles page so that you actually get the full URL that points to the specific profiles of these three Greek gods.
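As a rough illustration of combining a relative URL with a base URL, here is a minimal sketch using urljoin from Python's standard library. The base URL and the relative path are placeholders, since the actual addresses of the site are not spelled out in this lesson.

```python
from urllib.parse import urljoin

# Hypothetical values: the real address of the All Profiles page and the
# exact href values are assumptions made for this sketch.
base_url = "http://example.com/profiles"
relative_href = "/profiles/aphrodite"

# urljoin resolves the relative URL against the base URL.
full_url = urljoin(base_url, relative_href)
print(full_url)
```

Plain string concatenation can also work when you control both parts, but urljoin takes care of details such as leading slashes.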
01:03 All right, so I have a rough idea of what I need to do. I need to scrape that HTML, I need to somehow keep track of the base URL, and then I need to stick it together with these individual values from the href attribute, from the link tags.
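The plan just described could be sketched roughly as follows. This is only one possible approach, using Python's built-in html.parser module so that it runs without extra installs; the code written in the lesson may use a different library. The HTML snippet, the base URL, and the LinkCollector class name are all made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper that collects the href value of every a tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


# Assumed stand-in for the page source; the real page's markup may differ.
html = """
<html><body>
<a href="/profiles/aphrodite">Aphrodite</a>
<a href="/profiles/poseidon">Poseidon</a>
<a href="/profiles/dionysus">Dionysus</a>
</body></html>
"""

# Assumed base URL, kept in one place so every link is resolved against it.
base_url = "http://example.com/profiles"

collector = LinkCollector()
collector.feed(html)

# Stick the base URL together with each relative href.
full_urls = [urljoin(base_url, href) for href in collector.hrefs]
for url in full_urls:
    print(url)
```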
01:17 With that rough idea in mind, I’m going to head over to VS Code and start coding.