Brainstorming a Solution in the Python REPL
00:00 I’m ready to write what may be the trickier part of this task, because now I need to find specific pieces of information in that big blob of text that I got here.
00:10 So let me show you that again. I can run this get_name_color(), and I get the whole HTML text returned. But now I want to pick out specific pieces of information.
00:23 I want to get just this part where it says Dionysus, and I want to get just Wine out of there, and I should use the .find() method for it.
00:33 Okay, so I’m going to take some quick notes because otherwise I’m going to forget everything that I need to do. I want to find the information after Name and I want to find the information after Favorite Color.
00:49 That’s where I want to get these two pieces of information, and I want to get them by using str.find().
00:58 Let’s spin up a little Python interpreter to play around with this and figure out how I can best get this information out of this giant chunk of text that I have.
01:08 So I’ll type Python in my terminal to start a Python interpreter, and then I will define some variables. So let me get an example
01:19 by copying this H2 that contains the Name. That’ll be my HTML for now.
01:26 So, oh, of course I need to wrap it in quotes. So what I’m trying to do here is just to give me a small example to play around with so that I figure out how to get the information so that then I can put that into my script.
01:41 I’m just making my HTML a little smaller and then playing around with this. So the string I’m going to want to search for is Name, colon, whitespace. So I will also save that as ss
01:57 and I will name those better when I put them into my script, but now I’m just experimenting, right? I can use html.find() and give it that string, and Python is going to return to me the index where it finds the first occurrence of that string, which is at index four of this smaller string.
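The experiment described here can be sketched as follows. The transcript doesn't show the exact markup that was copied, so the `<h2>` string below is an assumption chosen to match the numbers mentioned in the lesson.

```python
# Assumed example string: the transcript only says it is the h2 that contains the name.
html = "<h2>Name: Dionysus</h2>"

# The search string: "Name", a colon, and one space.
ss = "Name: "

# .find() returns the index of the first character of the first match.
start = html.find(ss)
print(start)  # 4
```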
02:15 I’m going to do a short side note here because I don’t love to use find() for this type of task, because I think there’s another string method that is a little more descriptive, which is called index().
02:25 As you can see, .find() returns the index position where the substring starts, so I don’t know why it’s called find. The thing is that if you use html.find() and pass it a substring that isn’t present in your string, then it gives you minus one as an indicator that it’s not in there, which I find pretty unintuitive.
02:44 So I prefer to use .index(), which does the same when it finds the substring. So it gives me the starting index, but it raises an error if it doesn’t find the substring.
02:56 So I find it a little more descriptive. It just tells me ValueError, the substring wasn’t found.
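A small sketch of the difference between the two methods, using the same assumed example string. The substring "Age: " is a made-up value that is not in the string, used only to trigger the "not found" case.

```python
html = "<h2>Name: Dionysus</h2>"  # assumed example string

# When the substring is present, both methods return the same starting index.
found_with_find = html.find("Name: ")
found_with_index = html.index("Name: ")
print(found_with_find, found_with_index)  # 4 4

# Hypothetical substring that does not occur in the string.
missing = "Age: "

# .find() signals "not found" with -1.
missing_with_find = html.find(missing)
print(missing_with_find)  # -1

# .index() signals "not found" by raising a ValueError.
try:
    html.index(missing)
    index_raised = False
except ValueError:
    index_raised = True
print(index_raised)  # True
```

One practical consequence: -1 is also a valid index (the last character), so a forgotten check after .find() can silently produce a wrong slice, while .index() stops the program at the point of failure.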
03:01 Okay, but that’s the end of my little rant against find(). I’m still going to use it because that’s what the task asks me to do, right?
03:10 A little more space here. Okay, so I have this html and I can find where the search string starts. Now I also need to find where the string that I found ends, so that then I can start getting the information that comes after it.
03:24 I can do that by working with the length of the search string, the substring, so this is six characters long and it starts at index four. So I could do four plus six, right?
03:39 I could go to index 10 and then start looking there. So let me see: if I slice my example HTML from index 10 onward, I should get Dionysus and then whatever is the rest of it.
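The arithmetic above (four plus six) can be sketched like this, again assuming the example string is the `<h2>` line with the name in it. The variable names `value_start` and `rest` are illustrative and not from the lesson.

```python
html = "<h2>Name: Dionysus</h2>"  # assumed example string
ss = "Name: "

start = html.find(ss)          # 4: where the search string begins
value_start = start + len(ss)  # 4 + 6 = 10: first character after the search string

# Slicing from value_start onward gives the value plus the rest of the markup.
rest = html[value_start:]
print(value_start)  # 10
print(rest)         # Dionysus</h2>
```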
03:55 So that’s great. I can get the start like this, and now I also need to figure out how to get to the end. And hmm, one way I could do this is that since I’m working with HTML, I can expect that there’s going to be an opening angle bracket at the end of the piece of information that I want.
04:14 So I could do something like html, slice it to get to the information that I want, and then from there I could again just use find to get the index of the next opening bracket, which is going to be eight.
04:30 So eight is the length of the piece of information in here, which means that if I slice my html from 10 up to 10 plus eight, so 18, then I should actually just get out the Dionysus. That works, and that’s an approach that I could follow to figure this out, because it would also work for other pieces of content that are of a different length if I don’t hardcode these numbers but use the different methods that I used to get to them, right?
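Put together without hardcoded numbers, the exploration could look roughly like the sketch below. The helper name `extract_after` and both example strings are assumptions made for illustration; the lesson itself only works through the steps interactively. The sketch also assumes that the label and a following "<" are actually present in the string.

```python
def extract_after(html, label):
    """Return the text between `label` and the next "<" (hypothetical helper).

    Assumes `label` occurs in `html` and is followed somewhere by "<".
    """
    # First character after the label, for example 4 + 6 = 10 for "Name: ".
    start = html.find(label) + len(label)
    # Position of the next opening angle bracket, counted from `start`.
    length = html[start:].find("<")
    return html[start:start + length]


# Assumed example strings; the real page markup may differ.
name = extract_after("<h2>Name: Dionysus</h2>", "Name: ")
color = extract_after("<h2>Favorite Color: Wine</h2>", "Favorite Color: ")
print(name)   # Dionysus
print(color)  # Wine
```

Because both positions come from .find() and len(), the same steps handle values of different lengths, which is the point made at the end of the exploration.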
05:04 I think that’s probably an approach that I can roll with in this case. So in the next lesson, I’m going to take some more structured notes of this little exploration that I just did and then start working on it.
alphafox28js on July 17, 2024
how do you have your VS Code terminal setup? for some reason I do not get the same as you when indexing in the example during ‘brainstorming’
Martin Breuss RP Team on July 17, 2024
@alphafox28js there’s no specific setup on my VS Code terminal, it just uses the default zsh that comes with macOS.
Either way, the terminal setup won’t have any impact on what you get as your results when indexing/slicing the string.
Did you use the small example string that I copied? If you share some more info on what you got instead, then I can better help you figure out what might have happened.