Making a Plan and Reusing Code
00:00
I will start a new file that I’ll call get_links.py. That’s what I’m planning to do here. And let’s just take a quick note of the tasks that I figured out before. I want to get HTML content.
00:17 Then I want to keep track of the base URL,
00:23
find the link tags, which are <a> tags, and then extract the href values
00:34 and then combine those with the base URL. This is my plan of attack, basically, of what I think I need to do. For the first task, getting the HTML content, I will just go ahead and copy the code over from my previous file because it’ll be the same except for the URL.
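The remaining tasks in the plan can be sketched roughly as follows. This is only an illustration under stated assumptions: `BASE_URL` and `SAMPLE_HTML` are hypothetical stand-ins for the real site, `LinkCollector` and `get_full_links` are made-up names, and the standard library's `html.parser` is used here only so the sketch runs without installing anything. It is not the approach the course itself goes on to use.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical stand-ins: the real base URL and HTML come from the course site.
BASE_URL = "http://example.com"
SAMPLE_HTML = """
<html><body>
<a href="/profiles/first">First</a>
<a href="/profiles/second">Second</a>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Task: find the link tags and extract their href values.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def get_full_links(html, base_url):
    """Return every link in `html`, combined with `base_url`."""
    collector = LinkCollector()
    collector.feed(html)
    # Task: combine each partial URL with the base URL.
    return [urljoin(base_url, href) for href in collector.hrefs]


full_links = get_full_links(SAMPLE_HTML, BASE_URL)
print(full_links)
```

Keeping the base URL in its own variable is what makes the last step possible: each partial href is joined onto it to form a full link.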
00:54
So I need to change the URL by removing the Dionysus part and just sticking with /profiles. And that should already tackle the first task of getting the HTML.
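A minimal sketch of that first task might look like this, assuming the previous file fetched the page with `urllib.request.urlopen` and decoded it as UTF-8. The `URL` value is a placeholder rather than the course's actual address, and `get_html` is a hypothetical helper name.

```python
from urllib.request import urlopen

# Assumed placeholder: swap in the real URL ending in /profiles.
URL = "http://example.com/profiles"


def get_html(url):
    """Return the content at `url` decoded as UTF-8 text."""
    with urlopen(url) as page:
        return page.read().decode("utf-8")
```

Calling `print(get_html(URL))` in the script would then print the page's HTML to the terminal when the file is run.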
01:05 Let’s check whether I get the HTML text that I just saw when inspecting the site before.
01:13
So I will go ahead and say python get_links.py in my terminal to run it and take a quick look at the output. Looks good. We’re getting the HTML of the All Profiles page and again, I can see the three links here with the partial URLs that I want to extract from that text.
01:37 But now I do not want to extract it again using just string methods. So as a next step, I’m going to set up a virtual environment and install Beautiful Soup so that I can then use it for parsing this text.