Scrape HTML Content From a Page (Introduction)
After inspecting your page in Part 1, you’re now ready for Part 2, which is actually scraping the content from the website onto your computer using Python, some coding, and the
00:14 This whole course focuses on static websites, so keep that in mind. There are other situations on the web, and we will talk a bit about them in this part of the course, but what you’re learning to do here is scraping static websites, and the process is a little bit different and a little bit more complex for other situations. So, I’m going to walk you through how to do this with the Indeed site.
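To make the static case concrete, here is a minimal sketch of asking a server for a page and getting its HTML back as text. It uses only Python's standard library (`urllib.request`) as a stand-in, which may not be the tool the course itself uses; the function name `fetch_html` and the example URL are made up for illustration.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Return the body of `url` as text, as sent by the server (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        # Use the charset the server declares; fall back to UTF-8 if none is given.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Hypothetical usage; substitute the URL of the page you inspected:
# html = fetch_html("https://example.com/jobs")
# print(html[:500])
```

For a static website, the text this returns should already contain the content you saw when inspecting the page, which is why a single request is enough.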
00:36 We’re going to use Jupyter Notebooks for that. And then we are also going to talk a bit about how you would do that with a hidden website, which is just the term I’m using here for websites that are password-protected. If you have to log in to access the information that you’re interested in, then your process is a little bit different. We will take a look at that.
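One common way the process differs for a password-protected site is that you log in first and then reuse the session cookie for later requests. The sketch below shows that idea with the standard library only. Everything specific in it is an assumption: the URLs, the form field names `username` and `password`, and the helper names are hypothetical, and real sites often need extra steps such as hidden form tokens.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener


def build_login_request(login_url, username, password):
    """Build a POST request carrying the login form fields (hypothetical helper)."""
    # "username" and "password" are assumed field names; check the real login form.
    form = urlencode({"username": username, "password": password}).encode("utf-8")
    # Passing `data` makes urllib send this request as a POST.
    return Request(login_url, data=form)


def fetch_protected_page(login_url, page_url, username, password):
    """Log in, then fetch a page using the same cookie-aware opener."""
    # The cookie jar stores any session cookie the server sets after login.
    opener = build_opener(HTTPCookieProcessor(CookieJar()))
    with opener.open(build_login_request(login_url, username, password), timeout=10) as resp:
        resp.read()
    with opener.open(page_url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


# Hypothetical usage:
# html = fetch_protected_page(
#     "https://example.com/login", "https://example.com/members", "alice", "s3cret"
# )
```

The point of the sketch is only the extra step: without the login request, the second request would typically return a login page instead of the content you are after.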
01:12 A dynamic website is a bit more difficult to scrape: you can’t just ask the server and get the information you want, because you need to run some code first to generate that information.
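A small offline comparison can show why that is. The two HTML strings below are invented for illustration: one is what a static page might send, the other is what a dynamic page might send before any JavaScript has run. The `VisibleTextParser` class and `visible_text` function are hypothetical helpers built on the standard library's `html.parser`.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def visible_text(html):
    """Return the non-empty text pieces a reader would see in `html`."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Made-up static response: the job titles are already in the HTML.
STATIC_HTML = """
<html><body>
  <div id="results">
    <h2>Python Developer</h2>
    <h2>Data Engineer</h2>
  </div>
</body></html>
"""

# Made-up dynamic response: an empty container plus code that would fill it in.
DYNAMIC_HTML = """
<html><body>
  <div id="results"></div>
  <script>
    fetch("/api/jobs").then(r => r.json()).then(render);
  </script>
</body></html>
"""

print(visible_text(STATIC_HTML))   # ['Python Developer', 'Data Engineer']
print(visible_text(DYNAMIC_HTML))  # []
```

In the dynamic case the response holds the code that would produce the job titles, not the titles themselves, so parsing the raw HTML finds nothing until that code has been run.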
01:23 We’re not going to go into depth on password-protected sites or dynamic websites, since those are different topics, but going over the process of scraping static websites, which is what you’re learning in this course, will lay a great foundation for learning more about those as well. Okay! So as I mentioned, static websites are our focus, and we’re going to start off by scraping the Indeed site that you inspected in the previous part. Then, we will talk a little bit about these other ways of scraping and the other procedures that you need for them.
@Feras Morsi no, dynamic website content is about content that gets generated for you through running code. That can happen on the server or on the client, but the important part is that you won’t see the content of the webpage directly when you click on Inspect or View Source.
Instead you’ll see a
Does that make sense?
@Martin Breuss yes many thanks.
Feras Morsi on Sept. 20, 2022
When we say dynamic website, do we mean scraping the website using an API?