Scrape HTML Content From a Page (Introduction)
00:00 After inspecting your page in Part 1, you’re now ready for Part 2: actually scraping the content from the website onto your computer using Python and the requests library.
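As a rough preview of what scraping with requests can look like, here is a minimal sketch. It assumes the third-party requests package is installed. To stay runnable offline, it fetches a made-up page from a small local server rather than the Indeed site; `PAGE`, `DemoHandler`, and `fetch_html` are illustrative names, not part of the course code.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

# Made-up stand-in for a static page (assumption, not the real site).
PAGE = b"<html><body><h1>Job listings</h1></body></html>"


class DemoHandler(BaseHTTPRequestHandler):
    """Serves the same static HTML for every GET request."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_html(url, timeout=10):
    """Return the HTML text of a static page, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/"
html = fetch_html(url)
print(html)

server.shutdown()
server.server_close()
```

For a static site, the text that comes back is the same HTML you would see with View Source in the browser, so swapping the local URL for a real one is the only change the fetch step needs.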
00:14 This whole course is focused on static websites, and that’s something to keep in mind. There are other situations on the web, and we will talk a bit about them in this part of the course, but just keep in mind that what you’re learning to do here is scraping static websites, and it is a little bit different and a little bit more complex for other situations. So, I’m going to walk you through how to do this with the Indeed site.
00:36 We’re going to use Jupyter Notebooks for that. And then we are also going to talk a bit about how you would do that with a hidden website, which is just the term I’m using here for websites that are password-protected. If you have to log in to access the information that you’re interested in, then your process is a little bit different. We will take a look at that.
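To illustrate why a login changes the process, here is a hedged sketch of one common pattern: log in once with a `requests.Session`, which then carries the cookie on later requests. Everything about the site is hypothetical (the `/login` path, the form field names, the `session=demo-token` cookie), and real login flows often need more, such as hidden form tokens. It assumes the requests package is installed and uses a local demo server so it runs offline.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class LoginDemoHandler(BaseHTTPRequestHandler):
    """Hypothetical site: POST /login sets a cookie, GET / requires it."""

    def _send(self, status, body, extra_headers=()):
        self.send_response(status)
        for name, value in extra_headers:
            self.send_header(name, value)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        # Read the form body; this demo accepts any credentials.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        cookie = ("Set-Cookie", "session=demo-token; Path=/")
        self._send(200, b"<p>Logged in</p>", [cookie])

    def do_GET(self):
        if "session=demo-token" in self.headers.get("Cookie", ""):
            self._send(200, b"<p>Members-only content</p>")
        else:
            self._send(403, b"<p>Please log in</p>")

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), LoginDemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_port}"

# Without logging in, the server refuses to send the content.
anonymous = requests.get(base_url + "/", timeout=10)

# A Session stores the cookie from the login response and sends it
# automatically with the requests that follow.
with requests.Session() as session:
    credentials = {"username": "demo", "password": "demo"}  # placeholders
    session.post(base_url + "/login", data=credentials, timeout=10)
    members = session.get(base_url + "/", timeout=10)

print(anonymous.status_code)
print(members.status_code, members.text)

server.shutdown()
server.server_close()
```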
00:56 And then finally, there are also dynamic websites, which means that the content that you’re actually interested in doesn’t get sent back directly by the server. Instead, the server sends some JavaScript code that your browser executes to generate the content that you’re interested in.
01:12 This is a bit more difficult to scrape because you can’t just ask the server for the information and get the information you want, but you need to run some code first to generate the information.
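The difference can be seen without a browser. The sketch below uses only Python's standard library and a made-up HTML response (`RAW_HTML` and `VisibleTextParser` are illustrative names): the job title exists only inside the JavaScript source, so a scraper that reads the raw HTML finds a script tag and an empty container, but no visible text.

```python
from html.parser import HTMLParser

# Made-up example of what a server might send for a dynamic page:
# an empty container plus a script that would fill it in the browser.
RAW_HTML = """
<html>
  <body>
    <div id="results"></div>
    <script>
      document.getElementById("results").textContent = "Python Developer";
    </script>
  </body>
</html>
"""


class VisibleTextParser(HTMLParser):
    """Collects text outside <script> tags and counts the scripts."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.script_count = 0
        self.visible_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.script_count += 1

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Ignore script source and whitespace-only text nodes.
        if not self.in_script and data.strip():
            self.visible_text.append(data.strip())


parser = VisibleTextParser()
parser.feed(RAW_HTML)
parser.close()

print("script tags:", parser.script_count)
print("visible text:", parser.visible_text)
```

Nothing here executes the JavaScript, which is why the container stays empty. Getting the generated content requires a tool that actually runs the script.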
01:23 We’re not going to go into depth on password-protected sites or dynamic websites, since those are different topics, but working through the process of scraping static websites, which is what you’re learning in this course, will give you a solid foundation for learning more about them. Okay! So as I mentioned, static websites are our focus, and we’re going to start off by scraping the Indeed site that you inspected in the previous part. Then, we will talk a little bit about these other options and the different procedures they require.
Martin Breuss RP Team on Sept. 20, 2022
@Feras Morsi no, dynamic website content is about content that gets generated for you through running code. That can happen on the server or on the client, but the important part is that you won’t see the content of the webpage directly when you click on Inspect or View Source.
Instead you’ll see a <script> tag (or often many!) that contains some JavaScript that will run and create the content for the page you want to view dynamically.
Does that make sense?
Feras Morsi on Sept. 21, 2022
@Martin Breuss yes many thanks.
Feras Morsi on Sept. 20, 2022
When we say dynamic website, do we mean scraping the website using an API?