urllib
The Python urllib package is a collection of modules for working with URLs. It allows you to fetch data across the web, parse URLs, and handle various internet protocols. The urllib package is a staple for networking tasks in Python.
Here’s a quick example:
>>> import urllib.request
>>> response = urllib.request.urlopen("http://www.example.com")
>>> html = response.read()
>>> html[:60]
b'<!doctype html>\n<html>\n<head>\n <title>Example Domain</title>'
Key Features
- Opens and reads URLs
- Parses and constructs URLs
- Handles URL encoding and decoding
- Manages cookies and HTTP headers
- Supports multiple protocols (HTTP, HTTPS, FTP)
- Provides robust error handling for network operations
- Allows customization of HTTP requests (headers, methods), as sketched after this list
- Handles redirects and authentication
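For instance, here's a minimal sketch of customizing a request with your own headers. The User-Agent string and the example.com URL are placeholders for illustration:
>>> import urllib.request
>>> # Wrap the URL in a Request object to attach custom headers
>>> request = urllib.request.Request(
...     "http://www.example.com",
...     headers={"User-Agent": "my-app/1.0"},
... )
>>> with urllib.request.urlopen(request) as response:
...     response.status
...
200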
Frequently Used Classes and Functions
| Object | Type | Description |
|---|---|---|
| urllib.request.urlopen() | Function | Opens a URL and retrieves data |
| urllib.parse.urlparse() | Function | Parses a URL into components |
| urllib.parse.urlencode() | Function | Converts a dictionary into a URL-encoded query string |
| urllib.error.HTTPError | Class | Exception raised for HTTP-related errors |
| urllib.request.Request | Class | Represents an HTTP request object |
| urllib.request.urlretrieve() | Function | Downloads a file from a URL and saves it locally |
Examples
Open and read a URL:
>>> import urllib.request
>>> with urllib.request.urlopen("http://www.example.com") as response:
... html = response.read()
...
Parse a URL into its components:
>>> from urllib.parse import urlparse
>>> url = urlparse("http://www.example.com/index.html;params?query=arg#frag")
>>> url.scheme, url.netloc, url.path
('http', 'www.example.com', '/index.html')
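You can also build query strings from a dictionary with urllib.parse.urlencode(), listed in the table above. A quick sketch with made-up parameters:
>>> from urllib.parse import urlencode
>>> # Turn a dictionary of parameters into a URL-encoded query string
>>> urlencode({"q": "python urllib", "page": 2})
'q=python+urllib&page=2'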
Common Use Cases
- Fetching data from web pages
- Parsing and manipulating URLs
- Encoding data for query strings
- Handling HTTP requests and responses
- Downloading files or images programmatically
- Authenticating with web APIs using custom headers
- Automating web data extraction for basic web scraping
- Handling HTTP errors and timeouts gracefully (see the sketch after this list)
- Interacting with REST APIs
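For example, a request wrapped in error handling might look like the sketch below. The URL and the five-second timeout are arbitrary choices for illustration:
>>> import urllib.request
>>> from urllib.error import HTTPError, URLError
>>> # HTTPError subclasses URLError, so catch it first
>>> try:
...     with urllib.request.urlopen("http://www.example.com", timeout=5) as response:
...         body = response.read()
... except HTTPError as error:
...     print(f"HTTP error: {error.code} {error.reason}")
... except URLError as error:
...     print(f"Connection failed: {error.reason}")
...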
Real-World Example
Suppose you want to download an image from the web and verify its format and size. You can combine urllib
with the popular Pillow library for image processing:
>>> import urllib.request
>>> from PIL import Image
>>> import io
>>> url = "https://www.python.org/static/community_logos/python-logo.png"
>>> filename = "python-logo.png"
>>> # Download the image data
>>> with urllib.request.urlopen(url) as response:
... img_data = response.read()
...
>>> # Save to file
>>> with open(filename, "wb") as file:
...     _ = file.write(img_data)  # Assign to _ so the byte count isn't echoed
...
>>> # Load the image with Pillow and check details
>>> with Image.open(io.BytesIO(img_data)) as img:
... print(img.format, img.size)
...
PNG (601, 203)
In this example, you download an image from the web, save it locally, and use Pillow to check its format and dimensions. These are common practical steps for anyone working with images in modern Python.
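If you only need the file on disk, urllib.request.urlretrieve() from the table above condenses the download-and-save steps into a single call. Here's a minimal sketch reusing the url and filename variables from the example above:
>>> import urllib.request
>>> # Download the URL straight to a local file in one step
>>> path, headers = urllib.request.urlretrieve(url, filename)
>>> path
'python-logo.png'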
Related Resources
Tutorial
Python's urllib.request for HTTP Requests
In this tutorial, you'll be making HTTP requests with Python's built-in urllib.request. You'll try out examples and review common errors encountered, all while learning more about HTTP requests and Python in general.
By Leodanis Pozo Ramos • Updated July 29, 2025