Removing the Text Before the Info
00:00
The first task is to get the starting index of the search string. So I will again say search_string is, in this case, name colon space. I’ll save that as a variable so I can work with it later on, because I’ll need it more often.
00:15
And I want to get the starting index of that search string. So I’m going to say start_index is going to be html_text.find(start_index).
00:33 Okay? That should give me a number. It’ll be a different number than in my experiments down here because there I had a smaller HTML text. But let’s print it out to see if that works so far.
00:50
So I’ll run the file. I won’t print the HTML text because it gets too big, so I’ll run it again. Okay. So name start_index is not defined, and that’s because I made a mistake here in the call to html_text.find(), and VS Code told me about it anyway. I don’t want to find start_index, of course, I want to find the search string.
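The mistake described here can be reproduced in a minimal sketch. The html_text below is a small hypothetical stand-in for the real page, the exact spelling "Name: " is an assumption about what "name colon space" looks like in the HTML, and caught is only a helper name for illustration.

```python
# Hypothetical stand-in for the downloaded page; the real HTML is much larger.
html_text = "<html><body><h2>Name: Dionysus</h2></body></html>"
search_string = "Name: "  # assumed spelling of "name colon space"

caught = None  # illustrative helper to keep the error around
try:
    # The mistake: start_index is passed as the argument before it exists.
    start_index = html_text.find(start_index)
except NameError as error:
    caught = error
    print(error)
```

Python evaluates the argument before it assigns the result, so the name on the right-hand side does not exist yet when .find() is called.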
01:17
All right, save and try again. get_name_color, so 141. So this is the index in the HTML I got from the site where the first occurrence of name colon space starts. Great.
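As a rough sketch of the corrected call, assuming a short hypothetical html_text instead of the page from the site (so the index will not be 141) and assuming the search string is spelled "Name: ":

```python
# Hypothetical stand-in for the downloaded page; the real HTML is much larger.
html_text = "<html><body><h2>Name: Dionysus</h2></body></html>"
search_string = "Name: "  # assumed spelling of "name colon space"

# .find() returns the index of the first occurrence, or -1 if there is none.
start_index = html_text.find(search_string)
print(start_index)
```

Because .find() returns -1 rather than raising an error when nothing matches, printing the result is a quick way to confirm the search string is really in the text.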
01:32
Now I want the length of the search string. So that should go by len(search_string).
01:39
And that should be the same as before, so that’s six characters long. Okay, let’s assign it to a variable. I will call that one len_search_string equals len(search_string).
01:58 Maybe I won’t need this intermediate variable, but for now it doesn’t hurt to be verbose. Okay, so now I need to find the end of the search string inside of the bigger HTML.
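A small sketch of this step, again assuming the search string is spelled "Name: " (four letters, a colon, and a space):

```python
search_string = "Name: "  # assumed spelling of "name colon space"

# The intermediate variable from the lesson; len() counts the space too.
len_search_string = len(search_string)
print(len_search_string)
```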
02:09
So from which index to which index does it go? That’ll be start_index plus the length of the search string. So that’ll give me the index in the HTML text where the search string ends.
02:21
And that means start_index plus the length of the search string.
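The arithmetic can be sketched like this. The html_text is a short hypothetical stand-in for the real page, "Name: " is an assumed spelling of the search string, and end_index is a name chosen here for illustration, not one taken from the lesson.

```python
# Hypothetical stand-in for the downloaded page; the real HTML is much larger.
html_text = "<html><body><h2>Name: Dionysus</h2></body></html>"
search_string = "Name: "  # assumed spelling of "name colon space"

start_index = html_text.find(search_string)
len_search_string = len(search_string)

# First index after the search string, which is where the wanted text begins.
end_index = start_index + len_search_string  # illustrative variable name
print(start_index, len_search_string, end_index)
```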
02:30
Let’s try that out just to see that we’re on the right path. I’m going to slice html_text
02:39 starting from here to the end and it should start with Dionysus basically, and then have a bunch of other things afterwards. But let me print that out
02:50 because if it starts with Dionysus, then it means we did the first couple of things right and got to a point, yep, where we then just need to get rid of the rest. Okay, so what I’ve successfully found now is the start of the piece of information that I’m interested in, right?
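Here is a minimal sketch of that check, under the same assumptions as before: a short hypothetical html_text, the search string spelled "Name: ", and illustrative names end_index and rest that do not come from the lesson.

```python
# Hypothetical stand-in for the downloaded page; the real HTML is much larger.
html_text = "<html><body><h2>Name: Dionysus</h2></body></html>"
search_string = "Name: "  # assumed spelling of "name colon space"

start_index = html_text.find(search_string)
end_index = start_index + len(search_string)  # illustrative variable name

# Leaving out the stop value slices from end_index to the end of the string.
rest = html_text[end_index:]
print(rest)
print(rest.startswith("Dionysus"))
```

The slice still carries the trailing HTML, which matches what the lesson describes: the start is right, and the rest remains to be cut off.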
03:07 I also need to find its end, and its end I can identify by going to the next opening HTML tag, which is indicated with this angle bracket. Okay? But I’ll call it a lesson here.
03:21 I did half of what I want to do. I got the start of the information and in the next lesson let’s figure out how to separate it from the rest of the HTML by identifying the end of it.