Setting Up a venv and Installing Beautiful Soup
00:00
Fortunately, I can use BeautifulSoup for parsing the HTML for this task. So I’m going to go ahead and make a virtual environment and then install BeautifulSoup into that environment.
00:12
So I make a virtual environment by running python -m venv venv. The second venv is the name of the environment. I’m just going to name it venv.
00:22
And I have links for you if you want to learn more about using virtual environments in Python. And then I will go ahead and run source venv/bin/activate to activate the virtual environment, and now I’m ready to install BeautifulSoup.
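If you want to confirm the activation from Python itself rather than from the prompt indicator, here is a minimal sketch. The helper name in_virtualenv is hypothetical and not part of the lesson; it assumes an environment created with the standard venv module, where sys.prefix and sys.base_prefix differ.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print(in_virtualenv())
print(sys.prefix)  # path of the active environment (or the base install)
```

Running this with the environment activated should report True and a path that ends in the environment's directory name.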
00:36
You can see an indicator that the virtual environment is activated, and I will now run
00:43
python -m pip install beautifulsoup4.
00:49
All right, that’s installed. Now I can go ahead and check whether it actually works by starting a Python interpreter inside of my virtual environment. And I should be able to import bs4 without a problem.
01:03
So that worked. I didn’t get an error, which means that BeautifulSoup was successfully installed in my virtual environment and I can now use it in my script.
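The same check can be done without triggering an ImportError. This is only a sketch: the helper name is_installed is made up for illustration, and it relies on importlib.util.find_spec returning None for a top-level module that cannot be found.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Report whether a top-level module can be found by the import system."""
    # find_spec returns None when no module of that name is importable
    # from the current interpreter's search path.
    return importlib.util.find_spec(module_name) is not None


# True only if beautifulsoup4 was installed into the active environment.
print(is_installed("bs4"))
```

If this reports False inside the activated environment, the install most likely went into a different interpreter.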
01:14
Okay! And to do that, I will have to import it. So up here in my imports in line three, I’m going to say from bs4 import BeautifulSoup.
01:28
So I want to use that BeautifulSoup class from the bs4 library that I just installed. What do I want to do with it? I want to go ahead and use it for parsing.
01:40
I will say soup = BeautifulSoup()
01:46
and in there, I’m going to pass the HTML text and specify that BeautifulSoup should use the HTML parser to parse it. And now if I print that out, it’s not going to be as interesting as before because it’ll just tell me it’s a BeautifulSoup object.
02:04
But that should work. If I run my Python script with python get_links,
02:12
I actually do get the printout of the text as well. But let’s double check that it is a BeautifulSoup object
02:22
by wrapping soup in a call to type(). Then run it again. And you can see I am now working with a bs4.BeautifulSoup object, and it contains the HTML in the same way as before.
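Here is a small sketch of the steps so far. The html string is a hypothetical stand-in for the HTML text that the script fetched in the previous task, and "html.parser" is Python's built-in parser that BeautifulSoup can use without extra installs.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML text fetched earlier in the script.
html = "<html><head><title>Demo</title></head><body><a href='/about'>About</a></body></html>"

# Pass the HTML text and name the parser that BeautifulSoup should use.
soup = BeautifulSoup(html, "html.parser")

print(type(html))  # <class 'str'>
print(type(soup))  # <class 'bs4.BeautifulSoup'>
print(soup)        # printing the object still displays the HTML markup
```

Printing soup directly shows markup, which is why the type() call is the clearer way to confirm what kind of object it is.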
02:38
Okay, that isn’t much of a change, you may think, since if I print it out, I still get the HTML displayed. But it makes a huge difference for the ease of parsing the content of that text because to Python, what we had before in the previous task was just a long text, like a collection of characters.
02:56
But BeautifulSoup
has some understanding of HTML structure, so you can actually access different pieces of that text because it’s not a text, it’s a BeautifulSoup
object that has these additional capacities encoded in it.
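To make that difference concrete, here is a brief sketch of what the object allows that a plain string does not. The sample HTML, the variable names, and the choice of the title and link tags are assumptions for illustration, not the page used in this course.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; the real script works with fetched HTML.
html = """
<html>
  <head><title>Demo page</title></head>
  <body>
    <a href="/first">First</a>
    <a href="/second">Second</a>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# Attribute access walks the parsed structure to the first <title> tag.
page_title = soup.title.string

# find_all collects every <a> tag; indexing a tag reads one of its attributes.
hrefs = [a["href"] for a in soup.find_all("a")]

print(page_title)  # Demo page
print(hrefs)       # ['/first', '/second']
```

With only the raw string, getting the same values would mean searching through characters by hand.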
03:10
As you can see, I’m excited to parse it. So let’s go ahead and do that in the next lesson.