Setting Up a venv and Installing Beautiful Soup
00:00
Fortunately, I can use BeautifulSoup for parsing the HTML for this task. So I’m going to go ahead and make a virtual environment and then install BeautifulSoup into that environment.
00:12
So I make a virtual environment by running python -m venv venv. The second venv is the name of the environment. I’m just going to name it venv.
00:22
And I have links for you if you want to learn more about using virtual environments in Python. And then I will go ahead and source venv/bin/activate, activate the virtual environment, and now I’m ready to install BeautifulSoup.
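If you want to confirm the activation from inside Python rather than from the shell prompt, a minimal sketch could look like the following. The helper name in_virtual_environment is hypothetical; the check relies on sys.prefix and sys.base_prefix differing when the interpreter runs inside an environment created with venv.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True if this interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtual_environment())
print("Interpreter prefix:", sys.prefix)
```

Run with the environment activated, the first line should report True; run with the system interpreter, it should report False.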
00:36
You can see an indicator that the virtual environment is activated, and I will now run python
00:43
-m pip install beautifulsoup4.
00:49
All right, that’s installed. Now I can go ahead and check whether it actually works by starting a Python interpreter inside of my virtual environment. And I should be able to import bs4 without a problem.
01:03
So that worked. I didn’t get an error, which means that BeautifulSoup was successfully installed in my virtual environment and I can now use it in my script.
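The same check can be scripted instead of typed into the interpreter. This is only a sketch: describe_install is a hypothetical helper, and it uses the standard library to look for the bs4 module and the beautifulsoup4 distribution without raising an error when they are missing.

```python
from importlib import metadata, util


def describe_install(module_name: str = "bs4", dist_name: str = "beautifulsoup4") -> str:
    """Report whether a module is importable and which version is installed."""
    # find_spec returns None when a top-level module cannot be found.
    if util.find_spec(module_name) is None:
        return f"{module_name} is not importable in this environment"
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The module exists but no matching distribution metadata was found.
        version = "unknown"
    return f"{module_name} is importable (distribution {dist_name}, version {version})"


print(describe_install())
```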
01:14
Okay! And to do that, I will have to import it. So up here in my imports in line three, I’m going to say from bs4 import BeautifulSoup.
01:28
So I want to use that BeautifulSoup class from the bs4 library that I just installed. What do I want to do with it? I want to go ahead and use it for parsing.
01:40
I will say soup = BeautifulSoup()
01:46
and in there, I’m going to pass the HTML text and specify that BeautifulSoup should use the HTML parser to parse it. And now if I print that out, it’s not going to be as interesting as before because it’ll just tell me it’s a BeautifulSoup object.
02:04
But that should work. If I run my Python script with python get_links,
02:12
I actually do get the printout of the text as well. But let’s double check that it is a BeautifulSoup object
02:22
by wrapping the soup into a call to type(). Then run it again. And you can see I am now working with a bs4.BeautifulSoup object, and it contains the HTML in the same way as before.
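The steps above can be sketched roughly as follows. The html_text string here is a made-up stand-in for whatever HTML the script fetched earlier, and "html.parser" is assumed to be the parser meant by "the HTML parser", since that is the name of the parser that ships with Python.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML text the script retrieved earlier.
html_text = (
    "<html><head><title>Example</title></head>"
    "<body><a href='/about'>About</a></body></html>"
)

# Parse the text with Python's built-in HTML parser.
soup = BeautifulSoup(html_text, "html.parser")

print(soup)        # prints the HTML markup, much like printing the raw string
print(type(soup))  # <class 'bs4.BeautifulSoup'>
```

Printing the object shows markup, which is why the output looks so similar to before; only the call to type() makes the difference visible.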
02:38
Okay, that isn’t much of a change, you may think, since if I print it out, I still get the HTML displayed. But it makes a huge difference for the ease of parsing the content of that text, because to Python, what we had in the previous task was just a long text, a collection of characters.
02:56
But BeautifulSoup has some understanding of HTML structure, so you can actually access different pieces of that text, because it’s no longer just text. It’s a BeautifulSoup object that has these additional capabilities built into it.
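As a small preview of what that structure buys you, here is a hedged sketch. The HTML snippet is invented for illustration and is not the page used in the lesson; the point is only that the parsed object lets you ask for the title or for all link targets, which a plain string cannot do without manual searching.

```python
from bs4 import BeautifulSoup

# Invented sample markup, used only for illustration.
html_text = """<html><head><title>Example page</title></head>
<body><a href="/first">First</a><a href="/second">Second</a></body></html>"""

soup = BeautifulSoup(html_text, "html.parser")

# Navigate to the <title> element and read its text.
title = soup.title.string

# Collect the href attribute of every <a> element in the document.
hrefs = [a.get("href") for a in soup.find_all("a")]

print(title)  # Example page
print(hrefs)  # ['/first', '/second']
```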
03:10
As you can see, I’m excited to parse it. So let’s go ahead and do that in the next lesson.
