Setting Up a venv and Installing Beautiful Soup

00:00 Fortunately, I can use Beautiful Soup to parse the HTML for this task. So I’m going to go ahead and make a virtual environment and then install Beautiful Soup into that environment.

00:12 So I make a virtual environment by running python -m venv venv. The second venv is the name of the environment. I’m just going to name it venv.

00:22 And I have links for you if you want to learn more about using virtual environments in Python. And then I will go ahead and run source venv/bin/activate to activate the virtual environment, and now I’m ready to install BeautifulSoup.
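Besides the prompt indicator, the activation can also be checked from inside Python. This is a minimal sketch, assuming the environment was created with the venv module as above; the function name in_virtual_environment is made up for illustration.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside an environment created with venv, sys.prefix points at the
    # environment folder, while sys.base_prefix still points at the base
    # Python installation. Outside of one, the two values are equal.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtual_environment())
print("Interpreter in use:", sys.executable)
```

When the environment is active, the interpreter path printed on the second line should sit inside the venv folder.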

00:36 You can see an indicator that the virtual environment is activated, and I will now run python

00:43 -m pip install beautifulsoup4.

00:49 All right, that’s installed. Now I can go ahead and check whether it actually works by starting a Python interpreter inside of my virtual environment. And I should be able to import bs4 without a problem.

01:03 So that worked. I didn’t get an error, which means that BeautifulSoup was successfully installed in my virtual environment and I can now use it in my script.
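The same check can be written as a few lines of Python instead of typing the import into the interpreter. This is only a sketch: the helper name is_installed is an assumption for illustration, and it uses the standard library's importlib.util.find_spec to look for a top-level module without importing it.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be found."""
    # find_spec returns None when no such top-level module is available
    # to the interpreter that runs this code.
    return importlib.util.find_spec(module_name) is not None


# bs4 is the import name of the package installed as beautifulsoup4.
print("bs4 importable:", is_installed("bs4"))
```

If this prints False while the virtual environment is active, the install step probably ran against a different interpreter.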

01:14 Okay! And to do that, I will have to import it. So up here in my imports in line three, I’m going to say from bs4 import BeautifulSoup.

01:28 So I want to use that BeautifulSoup class from the bs4 library that I just installed. What do I want to do with it? I want to go ahead and use it for parsing.

01:40 I will say soup = BeautifulSoup()

01:46 and in there, I’m going to pass the HTML text and specify that BeautifulSoup should use the HTML parser to parse it. And now if I print that out, it’s not going to be as interesting as before because it’ll just tell me it’s a BeautifulSoup object.

02:04 But that should work. If I run my Python script with python get_links,

02:12 I actually do get the printout of the text as well. But let’s double check that it is a BeautifulSoup object

02:22 by wrapping the soup in a call to type(). Then run it again. And you can see I am now working with a bs4.BeautifulSoup object, and it contains the HTML in the same way as before.
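The parsing step and the type check described above can be sketched as follows. The html_text string here is a made-up stand-in for the HTML that the script from the previous task fetched, and "html.parser" is an assumption that the HTML parser mentioned in the lesson is the one built into Python. The try/except only keeps the sketch from crashing where bs4 is not installed.

```python
try:
    from bs4 import BeautifulSoup  # third-party, installed as beautifulsoup4
except ImportError:
    BeautifulSoup = None  # bs4 is not available in this environment

# Hypothetical stand-in for the HTML text fetched in the previous task.
html_text = (
    "<html><head><title>Sample page</title></head>"
    '<body><a href="/first">First</a></body></html>'
)

if BeautifulSoup is None:
    print("bs4 is missing; run: python -m pip install beautifulsoup4")
else:
    # First argument: the markup to parse. Second argument: the parser name.
    soup = BeautifulSoup(html_text, "html.parser")
    print(soup)        # printing the object still shows the HTML
    print(type(soup))  # the type reveals that it is no longer a plain string
```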

02:38 Okay, that isn’t much of a change, you may think, since if I print it out, I still get the HTML displayed. But it makes a huge difference for the ease of parsing the content of that text, because to Python, what we had before in the previous task was just a long text, like a collection of characters.

02:56 But BeautifulSoup has some understanding of HTML structure, so you can actually access different pieces of that text because it’s not just text, it’s a BeautifulSoup object that has these additional capabilities built into it.
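To make that difference concrete, here is a small sketch that contrasts the plain string with the parsed object. The sample HTML is invented for illustration, "html.parser" is again an assumption about which parser is meant, and the specific lookups (the title and the link targets) are just examples of structured access, not necessarily what the next lesson uses.

```python
try:
    from bs4 import BeautifulSoup  # third-party, installed as beautifulsoup4
except ImportError:
    BeautifulSoup = None  # bs4 is not available in this environment

# Hypothetical sample document, standing in for the fetched page.
html_text = (
    "<html><head><title>Sample page</title></head>"
    '<body><a href="/first">First</a><a href="/second">Second</a></body></html>'
)

# As a plain string, Python only sees characters, so the best it can do
# is report the position where some substring starts.
print(html_text.find("<title>"))

if BeautifulSoup is not None:
    soup = BeautifulSoup(html_text, "html.parser")
    # The parsed object knows about tags, so elements can be reached by name.
    page_title = str(soup.title.string)
    link_targets = [link["href"] for link in soup.find_all("a")]
    print(page_title)
    print(link_targets)
```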

03:10 As you can see, I’m excited to parse it. So let’s go ahead and do that in the next lesson.
