Modifying the Parse Tree
00:00 In this lesson, you’ll take a look at how you can modify the parse tree that you’ve been working with using Beautiful Soup. And let’s mess a little bit with Dionysus’ profile by changing his profile picture.
00:12 We know that we can find the profile image by using soup.find()
00:20 and then just passing it "img", the tag name. So the result now points to the profile image, /static/dionysus.jpg. Alright, so you can go ahead and change the value of the src attribute the same way you would change the value of a dictionary key.
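As a rough sketch of the lookup described here: the HTML string below is a hypothetical stand-in for the profile page (the real page's markup may differ), and the name profile_image is just the variable used for the result. It assumes Beautiful Soup is installed and uses Python's built-in "html.parser".

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the profile page; the real markup may differ.
html = """
<html>
<head><title>Profile: Dionysus</title></head>
<body>
<img src="/static/dionysus.jpg" />
<h2>Name: Dionysus</h2>
</body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag, or None if there is no match.
profile_image = soup.find("img")

# Attributes are read with dictionary-style access.
print(profile_image["src"])
```

If the page had no image tag at all, soup.find("img") would return None, so it can be worth checking for that before accessing an attribute.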
00:39 So I can say profile_image and then access the src attribute, right? As we saw earlier, this is a way that you can access the value of an HTML attribute, and it’s also a way that you can change it.
00:53 So instead, let’s say we’re going to point it to
00:58 Poseidon’s profile image.
01:04 And now if you look at profile_image, you can see that it doesn’t point to dionysus anymore, but instead it points to the new value that you added in here.
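The assignment could look something like the sketch below. The one-line HTML string is again a made-up stand-in for the profile page, and /static/poseidon.jpg is the new value mentioned in the lesson.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the profile page; the real markup may differ.
html = (
    '<html><body><img src="/static/dionysus.jpg" />'
    "<h2>Name: Dionysus</h2></body></html>"
)

soup = BeautifulSoup(html, "html.parser")
profile_image = soup.find("img")

# Assign to the attribute just like you would to a dictionary key.
profile_image["src"] = "/static/poseidon.jpg"

# The tag is part of the parse tree, so the tree reflects the change too.
print(profile_image["src"])
print("dionysus" in str(soup))
```

Because profile_image is the tag inside the tree rather than a copy, changing it changes what soup contains, but nothing has been written to disk at this point.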
01:14 This only changed it inside of the parse tree that you’re currently working with, so if you want to create a new HTML page with that changed content, you’ll have to write it back to a file.
01:24 And you can do that, for example, by saying with open(), passing it a name for the new file, I’ll call it output.html, in write mode.
01:36 And you also want to, again, define the encoding as "utf-8"
01:44 and we’ll work with that as file.
01:47 And now inside of this context manager, I’m going to say file.write()
01:52 and then you could just pass the soup object to the str() constructor. That works as is, and you don’t actually need to do anything else, but I will use soup.prettify(), which just spaces it more nicely, because we’re going to take a look at this HTML document afterward to confirm that it actually got changed. Alright, so that’s it.
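Put together, the write step might look like this sketch. The HTML string is a hypothetical stand-in for the profile page, and the code writes output.html into the current working directory, replacing any existing file with that name.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the profile page; the real markup may differ.
html = (
    '<html><body><img src="/static/dionysus.jpg" />'
    "<h2>Name: Dionysus</h2></body></html>"
)

soup = BeautifulSoup(html, "html.parser")
soup.find("img")["src"] = "/static/poseidon.jpg"

# str(soup) gives the markup compactly; prettify() adds line breaks
# and indentation, which is easier to read when you inspect the file.
compact = str(soup)
pretty = soup.prettify()

# Write the modified tree to a new file with an explicit encoding.
with open("output.html", mode="w", encoding="utf-8") as file:
    file.write(pretty)
```

Either string would work as the argument to file.write(). The prettified version is only a convenience for reading the result by eye.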
02:16 I’ll exit the Python interpreter and then take a look at output.html. You can see it’s prettily written. That’s because we used prettify(). And here, if we look at the profile image, it points to /static/poseidon.jpg instead of Dionysus’ profile image.
02:36 And you can see the rest of the document is still the same as before. So this is how you can edit an HTML document using Beautiful Soup. Okay, and this wraps up our quick exploration of what you can do with Beautiful Soup.
02:49 In the next lesson, I want to discuss a couple of web scraping challenges that you might run into and point you in the direction of how you can solve them.