urllib
The Python urllib package is a collection of modules for working with URLs. It allows you to fetch data across the web, parse URLs, and handle various internet protocols. The urllib package is a staple for networking tasks in Python.
Here’s a quick example:
>>> import urllib.request
>>> response = urllib.request.urlopen("http://www.example.com")
>>> html = response.read()
>>> html[:60]
b'<!doctype html>\n<html>\n<head>\n <title>Example Domain</title>'
Key Features
- Opens and reads URLs
- Parses and constructs URLs
- Handles URL encoding and decoding
- Manages cookies and HTTP headers
- Supports multiple protocols (HTTP, HTTPS, FTP)
- Provides robust error handling for network operations
- Allows customization of HTTP requests (headers, methods), as sketched after this list
- Handles redirects and authentication
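For instance, here's a minimal sketch of customizing a request with your own headers. The User-Agent string and the example.com URL are placeholders for illustration:
>>> import urllib.request
>>> # Wrap the URL in a Request object to attach custom headers
>>> request = urllib.request.Request(
...     "http://www.example.com",
...     headers={"User-Agent": "my-app/1.0"},
... )
>>> with urllib.request.urlopen(request) as response:
...     response.status
...
200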
Frequently Used Classes and Functions
| Object | Type | Description |
|---|---|---|
| urllib.request.urlopen() | Function | Opens a URL and retrieves data |
| urllib.parse.urlparse() | Function | Parses a URL into components |
| urllib.parse.urlencode() | Function | Converts a dictionary into a URL-encoded query string |
| urllib.error.HTTPError | Class | Exception raised for HTTP-related errors |
| urllib.request.Request | Class | Represents an HTTP request object |
| urllib.request.urlretrieve() | Function | Downloads a file from a URL and saves it locally |
Examples
Open and read a URL:
>>> import urllib.request
>>> with urllib.request.urlopen("http://www.example.com") as response:
... html = response.read()
...
Parse a URL into its components:
>>> from urllib.parse import urlparse
>>> url = urlparse("http://www.example.com/index.html;params?query=arg#frag")
>>> url.scheme, url.netloc, url.path
('http', 'www.example.com', '/index.html')
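You can also build query strings from a dictionary with urllib.parse.urlencode(), listed in the table above. A quick sketch with made-up parameters:
>>> from urllib.parse import urlencode
>>> # Turn a dictionary of parameters into a URL-encoded query string
>>> urlencode({"q": "python urllib", "page": 2})
'q=python+urllib&page=2'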
Common Use Cases
- Fetching data from web pages
- Parsing and manipulating URLs
- Encoding data for query strings
- Handling HTTP requests and responses
- Downloading files or images programmatically
- Authenticating with web APIs using custom headers
- Automating web data extraction for basic web scraping
- Handling HTTP errors and timeouts gracefully (see the sketch after this list)
- Interacting with REST APIs
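For example, a request wrapped in error handling might look like the sketch below. The URL and the five-second timeout are arbitrary choices for illustration:
>>> import urllib.request
>>> from urllib.error import HTTPError, URLError
>>> # HTTPError subclasses URLError, so catch it first
>>> try:
...     with urllib.request.urlopen("http://www.example.com", timeout=5) as response:
...         body = response.read()
... except HTTPError as error:
...     print(f"HTTP error: {error.code} {error.reason}")
... except URLError as error:
...     print(f"Connection failed: {error.reason}")
...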
Real-World Example
Suppose you want to download an image from the web and verify its format and size. You can combine urllib
with the popular Pillow library for image processing:
>>> import urllib.request
>>> from PIL import Image
>>> import io
>>> url = "https://www.python.org/static/community_logos/python-logo.png"
>>> filename = "python-logo.png"
>>> # Download the image data
>>> with urllib.request.urlopen(url) as response:
... img_data = response.read()
...
>>> # Save to file
>>> with open(filename, "wb") as file:
...     _ = file.write(img_data)  # Assign to _ so the byte count isn't echoed
...
>>> # Load the image with Pillow and check details
>>> with Image.open(io.BytesIO(img_data)) as img:
... print(img.format, img.size)
...
PNG (601, 203)
In this example, you download an image from the web, save it locally, and use Pillow to check its format and dimensions. These are common practical steps for anyone working with images in modern Python.
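If you only need the file on disk, urllib.request.urlretrieve() from the table above condenses the download-and-save steps into a single call. Here's a minimal sketch reusing the url and filename variables from the example above:
>>> import urllib.request
>>> # Download the URL straight to a local file in one step
>>> path, headers = urllib.request.urlretrieve(url, filename)
>>> path
'python-logo.png'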
Related Resources
Tutorial
Python's urllib.request for HTTP Requests
In this tutorial, you'll be making HTTP requests with Python's built-in urllib.request. You'll try out examples and review common errors encountered, all while learning more about HTTP requests and Python in general.
By Leodanis Pozo Ramos • Updated July 29, 2025