Generalizing and Cleaning Up the Code
00:00 So I’ve gotten to the point where I extracted the name from the HTML that I scraped from the site, but now I also want to get the favorite color. You could go ahead and copy the code and just change the value for name, but of course that’s not a super great approach.
00:16 So instead, what I want to do is use a loop here, which makes it more extensible. Then you could also search for other substrings and, you know, get that information if that would be interesting.
00:27 Let’s see if it works. I will go ahead and say for search_string
00:36 in, and then make a list here where I will put in two strings, Name and Favorite Color, because these are the two that the task asks me to find in here. So I want to do this both for Name as well as for Favorite Color.
00:53 Then I’ll remove that search string and I will have to indent all of this.
00:58 Oops, not the for loop. So there you go: for search_string in, first for Name, and then do the same thing for Favorite Color. I want to use the same approach that I just generalized before, and I should get both pieces of information out of there if it works with the structure of the HTML.
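As a rough sketch of the loop described here, the snippet below runs the same find-and-slice steps once per search string. The HTML is a hypothetical stand-in for the scraped page, and the exact labels ("Name:" and "Favorite Color:" with colons) and variable names are assumptions, not necessarily what is on screen in the video.

```python
# Hypothetical stand-in for the scraped page; the real HTML may differ.
html_text = """<html>
<head><title>Profile: Dionysus</title></head>
<body>
<h2>Name: Dionysus</h2>
Hometown: Mount Olympus
<br><br>
Favorite Color: Wine
</body>
</html>"""

results = []
for search_string in ["Name:", "Favorite Color:"]:
    # Index where the label starts, then skip past the label itself.
    start = html_text.find(search_string) + len(search_string)
    # Index of the next opening angle bracket after the label.
    end = html_text.find("<", start)
    # Slice out the text in between and trim surrounding whitespace.
    info = html_text[start:end].strip()
    results.append(info)
    print(info)
```

Adding another label to the list is all it takes to extract a further field, as long as the page follows the same pattern.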
01:13 Let’s try that out. Got Name, Color, okay, so it just prints Dionysus, that strip and then it doesn’t
01:26 because it looks like it didn’t find it, right? That is what I would assume.
01:30 But let’s go ahead and put a breakpoint in here to debug what’s going on.
01:37 So I run this again. The first time, it should print Dionysus. So currently info should hold Dionysus, so I’ll continue to the next time, and now let’s look at what info holds. It looks pretty good.
01:49 So it holds Wine the second time around, so I’m not sure why it didn’t print it out a second time. Let’s see what happens if I continue. Now it prints it.
01:59 Oh, that’s confusing. Let’s hide that and run it again: python get_name_color. Alright, so this time it did print both of those.
02:15 I probably just forgot to save. That must be what happened and why it didn’t show the first time. I don’t have autosave on in VS Code, so I must have forgotten to press Save after editing the for loop into this, right?
02:27 Because yeah, now it seems to be working alright and I get both of these pieces of information out.
02:33 But anyways, what I do if something happens that is different from what I expect is throw in a little breakpoint here, and then I get an interactive way to explore what’s happening inside of the code or what might be going wrong.
02:46 Okay, well, I think that looks like it solved the task. I’m going to get rid of all those slightly messy comments that I put in there while I was developing this.
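A minimal sketch of that debugging habit, using Python’s built-in breakpoint() function. The extract_info() helper and its debug flag are hypothetical names added for illustration; in the video the breakpoint is simply placed inside the loop.

```python
def extract_info(html_text, search_string, debug=False):
    """Return the text that follows search_string, up to the next tag."""
    start = html_text.find(search_string) + len(search_string)
    end = html_text.find("<", start)
    info = html_text[start:end].strip()
    if debug:
        # Pauses here and opens the debugger (pdb by default), where you
        # can inspect start, end, and info, then type "c" to continue.
        breakpoint()
    return info


sample = "<h2>Name: Dionysus</h2>"
print(extract_info(sample, "Name:"))
```

Calling extract_info(sample, "Name:", debug=True) from a terminal session would drop into the interactive prompt before the value is returned.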
03:02 And with that, I have relatively okay-looking code that does fulfill the task. It uses urllib, so only standard library functionality, to scrape all the HTML from the given URL.
03:17 Then I decode it using utf-8 and save it as a string. And then I have this big string, and here comes the slightly tricky part of going through that string and finding the specific indexes, because the task wants me to use .find() for that.
03:32 And then slicing the string in a way that you get out just the information that you actually need. And then I just do it for both of those. But you can see it’s relatively flexible.
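The fetch-and-decode step mentioned above could look roughly like this. The fetch_html() name is hypothetical, and no URL is filled in because the transcript does not state one; the short demonstration at the bottom uses a made-up bytes value so that it runs without network access.

```python
from urllib.request import urlopen


def fetch_html(url):
    """Download a page and return its content as a str."""
    with urlopen(url) as response:
        raw_bytes = response.read()  # the response body arrives as bytes
    return raw_bytes.decode("utf-8")  # decode the bytes into a str


# Offline illustration of the decode step with made-up content.
raw_bytes = "<h2>Name: Dionysus</h2>".encode("utf-8")
html_text = raw_bytes.decode("utf-8")
print(type(raw_bytes).__name__, type(html_text).__name__)
```

Only after decoding can the string methods .find() and slicing be applied with string arguments, which is why this step comes before the search.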
03:42 I can use it for Name, I can use it for Favorite Color. I could throw something else in there and it should work as well. So it’s extensible here through that list up here.
03:51 And it uses the same approach to get the information. What it does rely on is that the HTML is somewhat well-formed, right? It only works if the information that I want is followed by an HTML tag that starts with an opening angle bracket.
04:07 And I also want to say, this is not super intuitive. Slicing through a giant string in Python isn’t super fun or logical, in my opinion. You need to think about starting indexes and how to add them up and what’s the next thing that you need to look at, right?
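To make that reliance concrete, here is a small sketch of what can go wrong when no opening angle bracket follows the value. It assumes the end index comes from .find("<", start), which may not match the exact code in the video; extract_info_safe() is a hypothetical helper showing one way to guard against the problem.

```python
def extract_info_safe(html_text, search_string):
    """Like the loop body, but guards against .find() returning -1."""
    label_idx = html_text.find(search_string)
    if label_idx == -1:
        return None  # the label is not in the text at all
    start = label_idx + len(search_string)
    end = html_text.find("<", start)
    if end == -1:
        end = len(html_text)  # no closing tag: read to the end of the text
    return html_text[start:end].strip()


# Made-up input where the value is NOT followed by a tag.
text = "Favorite Color: Wine"
start = text.find("Favorite Color:") + len("Favorite Color:")

# .find() returns -1 here, so the slice silently drops the last character.
naive = text[start:text.find("<", start)].strip()
print(repr(naive))

print(extract_info_safe(text, "Favorite Color:"))
```

The unguarded version fails quietly instead of raising an error, which is part of why this kind of string slicing is easy to get wrong.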
04:22 So while this works and solves the task, I’m happy that the next exercise is going to let us use some external libraries to make this whole parsing step, which is pretty complex like that, a bit easier. Before I move on, I just saw a small opportunity to make this code a little less verbose, so I’m just going to get rid of the search_string variable and move the call to the len() function right here, because this doesn’t seem very necessary or very descriptive.
04:54 Alright, so first, celebrate, you did it. This is the way that I did it. And once you’re done celebrating, let’s move on to the next exercise.