Using an HTML Parser Exercise

Exercises Course: Introduction to Web Scraping With Python Martin Breuss 01:51

00:00 Fortunately, this next task releases us from the need to slice through a large HTML string, and we can actually use Beautiful Soup for parsing, which makes it a lot more intuitive.

00:11 And this task is to write a program that grabs the full HTML from the page at the URL olympus.realpython.org/profiles. So that’s one of the subpages of the site that you’re scraping for this course.

00:26 And then you can use Beautiful Soup to print out a list of all the links on the page by looking for HTML tags with the name a (those are the link tags) and then retrieving the value of each tag’s href attribute.
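
As a rough illustration of the tag lookup described here, the following sketch parses a small stand-in HTML string rather than fetching the real page. The markup in SAMPLE_HTML, the helper name extract_hrefs, and the three relative paths are assumptions made for the example, not the actual page source.

```python
from bs4 import BeautifulSoup

# Stand-in markup: an assumption for illustration, not the real page source.
SAMPLE_HTML = """
<html>
<head><title>All Profiles</title></head>
<body>
<h1>All Profiles</h1>
<a href="/profiles/aphrodite">Aphrodite</a>
<a href="/profiles/poseidon">Poseidon</a>
<a href="/profiles/dionysus">Dionysus</a>
</body>
</html>
"""


def extract_hrefs(html):
    """Return the href value of every <a> tag that has one (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips any <a> tags that lack an href attribute.
    return [tag["href"] for tag in soup.find_all("a", href=True)]


hrefs = extract_hrefs(SAMPLE_HTML)
for href in hrefs:
    print(href)
```

In the actual exercise, the HTML string would come from the page itself instead of a hard-coded sample.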

00:39 Now, I can click on the page here to take a look at it. So I can see it’s a bare-bones page that has the title All Profiles, and then Aphrodite, Poseidon, and Dionysus on there.

00:50 So these are the three profiles we have, and I should get those three links. Okay, we’ll inspect it in just a moment.

00:57 The final output should look like this: you really just have the URLs, one pointing to Aphrodite’s profile, the next one to Poseidon’s, and then the next one to Dionysus’.

01:10 And you should make sure that there’s only one slash between the base URL and the relative URL. So this is going to be the base URL, and then the relative URL points to the specific profiles.
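
One possible way to guarantee a single slash at the join is sketched below. The helper name join_url, the http scheme on the base URL, and the example relative paths are assumptions for illustration only.

```python
def join_url(base_url, relative_url):
    """Join two URL parts with exactly one slash between them (hypothetical helper)."""
    # Drop any trailing slash on the base and any leading slash on the
    # relative part, then put a single slash back in between.
    return base_url.rstrip("/") + "/" + relative_url.lstrip("/")


# Assumed values: the scheme and the relative paths are not given in the lesson.
BASE_URL = "http://olympus.realpython.org"

for relative in ["/profiles/aphrodite", "profiles/poseidon"]:
    print(join_url(BASE_URL, relative))
```

Stripping both sides before joining means it doesn't matter whether the base URL ends with a slash or the relative URL starts with one. The standard library's urllib.parse.urljoin is another option for this kind of join.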

01:43 Okay. Go ahead and tackle this task. Once you’re done, move on, and you can watch me solve the challenge as well.
