Download link mentioned in this lesson: https://databank.worldbank.org/data/download/WDI_CSV.zip
Streaming With Requests
00:00
To start streaming, you need to change how you make the initial request. By default, requests downloads the entire response body immediately. You can prevent this by passing stream=True to the .get() method.
00:13
This tells the library to only download the headers and keep the connection open, waiting for you to consume the content. In this code, you add stream=True to the request call, then inside the with block, you use a loop to iterate over response.iter_content().
00:28
This method yields data in small pieces defined by chunk_size rather than giving you one giant block of data. Inside the loop, you write each chunk directly to the file.
00:40
This way, only one chunk is held in memory at a time, so memory usage stays at roughly the size of a single chunk, even if the file is several gigabytes.
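To make the chunking idea concrete without touching the network, here is a minimal sketch. The iter_chunks() helper and the in-memory body are stand-ins invented for illustration, not part of requests; they only mimic the way iter_content() hands back one piece at a time.

```python
import io


def iter_chunks(stream, chunk_size):
    """Yield successive pieces of at most chunk_size bytes from a binary stream."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Stand-in for a response body: 1,000,000 zero bytes held in memory.
# A real streamed body would arrive over the network instead.
body = io.BytesIO(bytes(1_000_000))

# Record the size of every chunk the loop sees.
sizes = [len(chunk) for chunk in iter_chunks(body, 102400)]
print(len(sizes), max(sizes), sizes[-1])  # 10 102400 78400
```

No chunk is larger than chunk_size, and only the last one is smaller, which is why the loop needs to hold just one chunk at a time.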
00:49
Please note that we’re using a different URL here because now we want to download a larger file. This URL points to a ZIP file that is around 280 megabytes.
00:59
It contains the World Development Indicators from the World Bank Open Data Platform. To use streaming to download files, you just add stream=True in the request call.
01:11
So I’ll assign the result of requests.get() to response. The first argument is the URL itself, and we pass True to the keyword argument called stream.
01:26
Now you can check if the request was successful by doing if response.ok, and if response.ok is True, we’ll open a new file and write chunks to that file. So we’ll do with open().
01:42
Let’s call our file largefile.zip, and the open mode is "wb", that is, write binary, with the file object named file. And let’s run a loop for each chunk.
02:01
For this example, I’m setting the chunk size to 100 kilobytes. One kilobyte is 1024 bytes.
02:09
100 times that gives us the number 102400, and that’s what I’ll use. If you’re wondering how to choose the perfect chunk size for your own projects, don’t worry, we’ll cover exactly how to make that decision in the next lesson.
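The arithmetic behind that number can be checked quickly. This is a small sketch; the file size used below is an assumption based on the "around 280 megabytes" mentioned earlier, so the chunk count is only a rough estimate.

```python
import math

KIB = 1024              # bytes in one kilobyte, as counted in the lesson
CHUNK_SIZE = 100 * KIB  # 100 kilobytes expressed in bytes

# Assumed size for illustration only: "around 280 megabytes".
approx_file_size = 280 * KIB * KIB

# Number of loop iterations needed, counting the final partial chunk.
chunks_needed = math.ceil(approx_file_size / CHUNK_SIZE)
print(CHUNK_SIZE, chunks_needed)  # 102400 2868
```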
02:22
For now, let’s focus on this loop where you simply write every incoming chunk directly to the file you just opened.
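Put together, the steps described so far might look like the sketch below. The save_stream() and main() names are hypothetical helpers introduced here, not part of requests; the URL, the file name, and the chunk size are the ones used in this lesson. Closing the response with a with block is a choice made in this sketch.

```python
def save_stream(response, filename, chunk_size=102400):
    """Write a streamed response body to filename and return the bytes written.

    Hypothetical helper: response is expected to behave like the object
    returned by requests.get(url, stream=True).
    """
    if not response.ok:
        # The request failed, so no file is created.
        return 0
    written = 0
    with open(filename, "wb") as file:
        for chunk in response.iter_content(chunk_size=chunk_size):
            # write() returns the number of bytes it wrote.
            written += file.write(chunk)
    return written


def main():
    import requests  # third-party package, assumed to be installed

    url = "https://databank.worldbank.org/data/download/WDI_CSV.zip"
    # stream=True defers downloading the body until iter_content() is consumed.
    with requests.get(url, stream=True) as response:
        total = save_stream(response, "largefile.zip")
    print(total, "bytes written")
```

Calling main() would start the real download of roughly 280 megabytes; defining the functions does not touch the network.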
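02:39
Now you’ll continuously see the chunk size, that is 102400 bytes or 100 KB, being printed. That’s because file.write() returns the number of bytes written, and the REPL echoes that value on every iteration. So every time a chunk is downloaded and written, it prints its size. This will continue till the last chunk, which can be smaller than the rest, and your file will be saved to your local machine.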
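The numbers scrolling past in the REPL are the return values of file.write(). A minimal sketch, using a temporary file and a block of zero bytes as a stand-in for one downloaded chunk:

```python
import os
import tempfile

chunk = bytes(102400)  # stand-in for one downloaded chunk of 100 KB

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.bin")
    with open(path, "wb") as file:
        # In binary mode, write() returns the number of bytes written.
        written = file.write(chunk)
    size_on_disk = os.path.getsize(path)

print(written, size_on_disk)  # 102400 102400
```

In a script, as opposed to the REPL, that return value is silently discarded unless you print it yourself.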
02:56
After the loop has ended, let’s exit the REPL and list the files in our directory to see if the file is saved successfully. And you’ll notice that your largefile.zip has been saved on your local machine.
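If you would rather check the result from Python than from the shell, something like the following sketch could work. The describe_download() helper is hypothetical, and the ZIP check only confirms that the file looks like a ZIP archive, not that every byte arrived intact.

```python
import zipfile
from pathlib import Path


def describe_download(path):
    """Return (exists, size_in_bytes, looks_like_zip) for path. Hypothetical helper."""
    path = Path(path)
    if not path.is_file():
        return (False, 0, False)
    return (True, path.stat().st_size, zipfile.is_zipfile(path))


# Reports (False, 0, False) if largefile.zip is not in the current directory.
print(describe_download("largefile.zip"))
```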
