00:13 So, back over in the Jupyter Notebook, we have a little section about that, which demonstrates the problem. You want to extract some information, but it's not accessible the way it was with the Indeed page that you looked at before. You need to log in to access it.
So, there are ways of doing this using `requests`, and they're actually pretty straightforward. You just need to provide your login credentials, and `requests` has great ways of handling that. We also have a tutorial on how to do it.
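As a minimal sketch, assuming the protected page uses HTTP Basic authentication, `requests` accepts credentials through its `auth` argument. Many sites instead use a login form with session cookies, which needs a different approach. The URL and credentials below are placeholders, and `build_basic_auth_request` and `fetch_protected_page` are hypothetical helper names.

```python
import requests
from requests.auth import HTTPBasicAuth


def build_basic_auth_request(url: str, username: str, password: str) -> requests.PreparedRequest:
    """Prepare (but don't send) a GET request that carries HTTP Basic credentials."""
    request = requests.Request("GET", url, auth=HTTPBasicAuth(username, password))
    # .prepare() encodes "username:password" in Base64 into the Authorization header
    return request.prepare()


def fetch_protected_page(url: str, username: str, password: str) -> requests.Response:
    """Send a GET request with HTTP Basic authentication and return the response."""
    # A plain (username, password) tuple is shorthand for HTTPBasicAuth
    return requests.get(url, auth=(username, password), timeout=10)


if __name__ == "__main__":
    # Placeholder URL and credentials; nothing is sent over the network here
    prepared = build_basic_auth_request("https://example.com/private", "alice", "s3cret")
    print(prepared.headers["Authorization"])
```

Calling `fetch_protected_page()` sends the same header over the network. Inspecting the prepared request first is a cheap way to confirm what credentials you are about to send.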
00:40 So, if the information that you're looking to scrape is behind password protection, make sure to check out that tutorial. Here, I just want to show you the problem that you might run into.
00:51 So, GitHub has an API where you can get information about the different repositories on a user's account. For example, here is my account, and I could get the information about which repositories are on it, but you need to authenticate as that user to get it.
The response still has a `.content` attribute, just like the one above, but that `.content` is a message telling me that the request requires authentication. It also includes a helpful link explaining how to authenticate, so make sure to check out the `requests` guide that covers this if you're interested in scraping this kind of data.
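To reproduce the failing request, here is a sketch that calls GitHub's `/user/repos` endpoint, which lists the repositories of the *authenticated* user, without sending any credentials. It assumes the error body is JSON with `message` and `documentation_url` fields, which matches what GitHub returned at the time of writing; `auth_error_details` and `fetch_without_auth` are hypothetical helper names.

```python
from typing import Dict, Optional

import requests

# Endpoint that lists repositories for the authenticated user
REPOS_URL = "https://api.github.com/user/repos"


def fetch_without_auth() -> requests.Response:
    """Request the endpoint with no credentials; GitHub is expected to reject it."""
    return requests.get(REPOS_URL, timeout=10)


def auth_error_details(response: requests.Response) -> Optional[Dict[str, Optional[str]]]:
    """Return the error message and docs link if the server answered 401 Unauthorized."""
    if response.status_code != 401:
        return None
    # .content holds the raw bytes; .json() parses them into a dict
    payload = response.json()
    return {
        "message": payload.get("message"),
        "documentation_url": payload.get("documentation_url"),
    }
```

Running `auth_error_details(fetch_without_auth())` should surface the same "requires authentication" message and link you see when printing `.content`, but as structured fields you can check in code.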
However, it is possible to use `requests` for authentication, and as I mentioned, it's actually pretty easy. Make sure to head over to this guide, which explains in much more detail how you can solve problems like this and still scrape the information that you're interested in from the web.
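One way to authenticate against the GitHub API with `requests` is to send a personal access token in the `Authorization` header, sketched below. GitHub's REST API no longer accepts account passwords, so the token takes the place of a password here. The `GITHUB_TOKEN` environment variable name and the helper names `github_headers` and `list_repo_names` are assumptions for illustration.

```python
import os
from typing import Dict, List

import requests

# Endpoint that lists repositories for the authenticated user
REPOS_URL = "https://api.github.com/user/repos"


def github_headers(token: str) -> Dict[str, str]:
    """Build request headers that authenticate with a personal access token."""
    return {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    }


def list_repo_names(token: str) -> List[str]:
    """Return repository names from the first page of results (pagination is not handled)."""
    response = requests.get(REPOS_URL, headers=github_headers(token), timeout=10)
    # Raises requests.HTTPError for 4xx/5xx responses, e.g. 401 for a bad token
    response.raise_for_status()
    return [repo["name"] for repo in response.json()]


if __name__ == "__main__":
    # Hypothetical environment variable holding your token
    token = os.environ.get("GITHUB_TOKEN")
    if token:
        print(list_repo_names(token))
    else:
        print("Set GITHUB_TOKEN to run this example.")
```

Reading the token from the environment rather than hard-coding it keeps credentials out of your notebook and out of version control.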