Making a Plan and Reusing Code

Exercises Course: Introduction to Web Scraping With Python Martin Breuss 01:51

00:00 I will start a new file that I’ll call get_links.py. That’s what I’m planning to do here. And let’s just take a quick note of the tasks that I figured out before. First, I want to get the HTML content.

00:17 Then I want to keep track of the base URL,

00:23 find the link tags, which are <a> tags, and then extract the href values

00:34 and then combine those with the base URL. This is basically my plan of attack, what I think I need to do. For the first one, getting the HTML content, I will just go ahead and copy the code over from my previous file because that’ll be the same except for the URL.
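The plan can be written down as a starting skeleton for the new file. This is only a sketch: the transcript does not show the previous file, so the use of urllib's urlopen(), the get_html() name, and the UTF-8 decoding are all assumptions made for illustration.

```python
from urllib.request import urlopen


def get_html(url):
    """Task 1: fetch a page and return its HTML as a string."""
    # Assumption: the page is fetched with urlopen() and is UTF-8 encoded.
    with urlopen(url) as response:
        return response.read().decode("utf-8")


# Remaining tasks from the plan, still to be written:
# 2. Keep track of the base URL.
# 3. Find the link tags, which are <a> tags.
# 4. Extract the href values.
# 5. Combine those with the base URL.
```

Keeping the later tasks as comments makes it easy to tick them off one at a time as the script grows.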

00:54 So I need to change the URL by removing the Dionysus part and just sticking with /profiles. And that should already tackle the first task of getting the HTML.
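In the file itself this is just a manual edit to the URL string. The short sketch below shows the same change in code. The host example.com and the exact shape of the previous URL are hypothetical stand-ins, because the transcript only mentions the Dionysus part and the /profiles path.

```python
# Hypothetical URL from the previous file; example.com is a placeholder host.
previous_url = "http://example.com/profiles/dionysus"

# Drop the last path segment so the URL ends with /profiles.
url = previous_url.rsplit("/", 1)[0]
print(url)
```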

01:05 Let’s take a look at whether I get the HTML text that I just saw when inspecting the site before.

01:13 So I will go ahead and say python get_links.py in my terminal to run it and take a quick look at the output. Looks good. We’re getting the HTML of the All Profiles page, and again, I can see the three links here with the partial URLs that I want to extract from that text.

01:37 But now I do not want to extract it again using just string methods. So as a next step, I’m going to set up a virtual environment and install Beautiful Soup so that I can then use it for parsing this text.
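To make the remaining tasks concrete without string methods, here is a rough sketch that uses only Python's standard library (html.parser and urllib.parse.urljoin). It is a stand-in, not the Beautiful Soup approach that the lesson moves on to. The base URL, the sample HTML, and the LinkCollector class are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical stand-ins: the real base URL and page HTML are not in the transcript.
BASE_URL = "http://example.com"
SAMPLE_HTML = """
<h1>All Profiles</h1>
<a href="/profiles/one">One</a>
<a href="/profiles/two">Two</a>
<a href="/profiles/three">Three</a>
"""


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


collector = LinkCollector()
collector.feed(SAMPLE_HTML)

# Combine each partial URL with the base URL.
links = [urljoin(BASE_URL, href) for href in collector.hrefs]
for link in links:
    print(link)
```

A dedicated parsing library does the same job with less code, which is the motivation for installing Beautiful Soup next.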
