APIs: An Alternative to Web Scraping
00:00 Let’s talk about APIs, which represent an alternative to web scraping.
00:05 First of all, API is a term that you’ve probably heard before. It’s one of those things that are everywhere, but what does it actually mean? API stands for application programming interface, which essentially means that it provides data designed to be consumed by programs instead of by humans. What does that mean?
00:23 Let’s look at an example. Say you’re looking for a specific link on a website and you want to gather its URL and title. If you make an API call, the response that you get back might look like this one.
00:37 It’s nicely structured and focused on what the URL is and what the title of the link is. This is just an example, obviously, so a real response is going to look a bit different, but the idea is that it’s relatively focused and clean, which makes it easy for your program to pick up the information that you want and process it further. Now, if you get the same information through web scraping (you can already see this here in the background), the code that you get looks a bit messier.
01:03 There’s a lot going on here. You get HTML code returned, and yes, the URL as well as the title of this link are somewhere in there, but you’re going to have to parse it in a much more involved way in order to get to that information. Now, the two formats that you see here are, on the one hand, JSON, which is a common format that APIs return to you.
01:24 There are also others, such as XML, but JSON is a good standard and provides the information in an easily readable format that is also easy for programs to process. HTML, on the other hand, looks different because it isn’t designed to be consumed by programs. Instead, it’s designed to be consumed by humans looking at a website.
01:44 This is what your browser renders to give you this graphical display of a website that you’re used to when you’re browsing the web.
01:51 JSON isn’t meant to be rendered like this. It carries much more focused and specific information. So, to sum up: The responses from an API are designed to be consumed by a program.
02:02 The responses in HTML that you would get from scraping the web are designed to be viewed by human eyes, and the step of getting it into a format to be consumed by programs is thus a little bit more involved.
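To make the difference concrete, here is a minimal sketch in Python that pulls the same URL and title out of a JSON response and out of an HTML response. Both responses are made up for illustration (the example.com addresses, the field names `url` and `title`, and the `LinkParser` helper are assumptions, not anything shown in the lesson), and it only uses the standard library rather than any particular scraping package.

```python
import json
from html.parser import HTMLParser

# Hypothetical API response: only the fields a program needs.
api_response = '{"url": "https://example.com/jobs/42", "title": "Python Developer"}'

# Hypothetical HTML for the same link, wrapped in markup meant for display.
html_response = """
<html>
  <body>
    <div class="card">
      <h2 class="title">
        <a href="https://example.com/jobs/42">Python Developer</a>
      </h2>
      <p class="company">Example Corp</p>
    </div>
  </body>
</html>
"""

# JSON: one call turns the text into a dictionary you can index directly.
data = json.loads(api_response)
json_result = (data["url"], data["title"])


class LinkParser(HTMLParser):
    """Collect (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# HTML: you have to walk the markup and decide which parts matter.
parser = LinkParser()
parser.feed(html_response)
html_result = parser.links[0]

print(json_result)
print(html_result)
```

Both paths end up with the same pair of values, but the HTML path needs a small parser class and knowledge of where the link sits in the page, while the JSON path is a single `json.loads()` call.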
02:15 Now, you might ask the question, “Why would you want to scrape the web at all instead of just using an API?” And the quick answer is that many websites simply don’t have an API, so all that you’re left with is using the front end (the HTML page), scraping it, and then sifting through the data in order to get the information that you’re looking for. An API is an extra product that a company needs to build, maintain, and offer to you, and far from every website out there provides this service, simply because many are focused on presenting information for the human eye rather than on letting you pick it apart with the scripts and programs that you’re going to write.
02:56 The web scraping process that you’re going to learn about in this course, which starts with Part 1 (inspect) and continues with scrape and then parse, is very similar with APIs, however.
03:06 You can think of it this way: instead of inspecting the page with developer tools, as you will learn in the next part of this course, for an API you would go and read the API documentation.
03:17 You still have to learn the individual structure of each API because every API is probably going to work differently from the next one, so the issue of variability is still present with APIs. However, it’s more structured, so it’s probably going to be easier, unless the API documentation really isn’t good.
03:35 The next step, scraping the data, corresponds to making an API request. This is relatively similar to the web scraping step, only that the request that you make is not for the human-viewable HTML page, but instead goes to a specific API endpoint where you get the JSON or XML data that you can then process further. And finally, the step of parsing the information is going to be a lot easier with the API because, as you saw before, you have a much more focused and structured format, which is easier to pick apart with your programs than the HTML is going to be.
04:10 So, the parse step here is picking the information from this structured response that you receive. Now, to sum this up: If you can, if there is one, then use the API. It’s probably going to make your life easier.
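The three steps as they apply to an API could be sketched roughly like this in Python. Everything specific here is an assumption made for illustration: the endpoint `https://api.example.com/v1/jobs`, the query parameters `q` and `location`, and the `results` list in the response are all hypothetical, and a real API’s documentation would tell you the actual names. The `fetch()` function is defined but not called, so the example runs offline against a canned response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint; a real one comes from the API documentation.
API_ENDPOINT = "https://api.example.com/v1/jobs"


def build_url(endpoint, **params):
    """Step 1 (inspect): the docs tell you which query parameters exist."""
    return f"{endpoint}?{urlencode(params)}"


def fetch(url):
    """Step 2 (request): call the endpoint and return the response body."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def parse(body):
    """Step 3 (parse): pick the fields you need from the structured data."""
    payload = json.loads(body)
    return [(job["title"], job["url"]) for job in payload["results"]]


# A canned body in the assumed response shape, so no network call is needed.
sample_body = json.dumps(
    {"results": [{"title": "Python Developer", "url": "https://example.com/jobs/42"}]}
)

request_url = build_url(API_ENDPOINT, q="python", location="remote")
jobs = parse(sample_body)

print(request_url)
print(jobs)
```

Against a real API you would pass the result of `fetch(request_url)` to `parse()` instead of `sample_body`, and adjust the keys to whatever the documentation describes.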
04:22 And for everything else, there’s web scraping. And with this, I want to sum up this first section of the course where I gave you a quick intro to web scraping—what it is, why you would want to use it, what some problems are with web scraping, and finally, also talked a little bit about APIs. In the next part, we’re going to move on to actually starting the web scraping process on the example job board indeed.com.
04:47 I’ll see you in the next lesson where we start off with Part 1, inspecting your data source.