Using an HTML Parser Exercise
00:00 Fortunately, this next task releases us from the need to slice through a large HTML string, and we can actually use Beautiful Soup for parsing, which makes it a lot more intuitive.
00:11 And this task is to write a program that grabs the full HTML from the page at the URL olympus.realpython.org/profiles. So that’s one of the subpages of this page that you’re scraping for this course.
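As a rough sketch of that first step, fetching a page's HTML with only the standard library could look like the following. The function name fetch_html, the UTF-8 default, and the http:// scheme in the usage comment are assumptions made for illustration; the task itself only names the host and path.

```python
from urllib.request import urlopen


def fetch_html(url, encoding="utf-8"):
    """Return the body of the page at `url` as a string.

    `encoding` is an assumed default; a page may declare a different one.
    """
    with urlopen(url) as response:
        html_bytes = response.read()
    return html_bytes.decode(encoding)


# Possible usage (needs network access; the scheme is an assumption):
# html = fetch_html("http://olympus.realpython.org/profiles")
# print(html)
```

The call is left in a comment so that nothing touches the network until you decide to run it.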
00:26 And then you can use Beautiful Soup to print out a list of all the links on the page by looking for HTML tags with the name a, those are the link tags, and then retrieving the value that the href attribute of each of those tags has.
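To make that idea concrete without giving away the exercise, here is a small sketch that runs Beautiful Soup on a made-up HTML snippet rather than on the real profiles page. The snippet, its link paths, and the helper name extract_hrefs are all invented for illustration, and it assumes Beautiful Soup is installed (for example with python -m pip install beautifulsoup4).

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration only; this is not the real profiles page.
sample_html = """
<html>
<head><title>Sample Page</title></head>
<body>
<h1>Sample Page</h1>
<a href="/example/first">First</a>
<a href="/example/second">Second</a>
<a>No href on this one</a>
</body>
</html>
"""


def extract_hrefs(html):
    """Return the href value of every <a> tag that has one."""
    # "html.parser" is the parser that ships with Python.
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips <a> tags that have no href attribute at all.
    return [tag["href"] for tag in soup.find_all("a", href=True)]


print(extract_hrefs(sample_html))
# ['/example/first', '/example/second']
```

Indexing a tag like a dictionary, as in tag["href"], is how Beautiful Soup exposes attribute values.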
00:39 Now, I can click on the page here to take a look at it. So I can see it’s a bare-bones page that has the title All Profiles, and then Aphrodite, Poseidon, Dionysus on there.
00:50 So these are the three profiles we have, and I should get those three links. Okay, we’ll inspect it in just a moment.
00:57 The final output should look like this: you really just have the URLs, one pointing to Aphrodite’s profile, the next one to Poseidon’s, and then the next one to Dionysus’.
01:10 And you should make sure that there’s only one slash between the base URL and the relative URL. So this is going to be the base URL, and then the relative URL points to the specific profiles.
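One way to guarantee that single slash is to strip any slashes at the seam before joining, as in the sketch below. The helper name join_url, the http:// scheme, and the example relative path are assumptions; the real relative URLs are whatever the href attributes on the page contain.

```python
def join_url(base_url, relative_url):
    """Join two URL parts with exactly one slash between them."""
    # Remove trailing slashes from the base and leading slashes from the
    # relative part, then put a single slash back in between.
    return base_url.rstrip("/") + "/" + relative_url.lstrip("/")


# Assumed values for illustration only.
base_url = "http://olympus.realpython.org"

print(join_url(base_url, "/profiles/aphrodite"))
print(join_url(base_url + "/", "/profiles/aphrodite"))
print(join_url(base_url, "profiles/aphrodite"))
# All three print: http://olympus.realpython.org/profiles/aphrodite
```

The standard library's urllib.parse.urljoin is another option for combining a base URL with a relative one.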
01:22 Okay, cool. So that’s the task. If you’ve worked through the tutorial, then you’ve used Beautiful Soup before, or maybe you’ve done another course on it. If you’re not familiar with it, you should first try out Beautiful Soup and make sure you know how to install it, and then use that excellent library for parsing the data you scraped from the Web to get these three links out of that profiles page.
01:43 Okay. Go ahead and tackle this task. Once you’re done, move on and you can watch me solve the challenge as well.