filecmp
The Python filecmp module provides functions for comparing files and directory trees. By default it compares os.stat() signatures (file type, size, and modification time) and reads file contents byte by byte only when those signatures are not enough to decide. It keeps an internal cache of content comparisons to avoid redundant work. The dircmp class supports recursive directory comparison with attributes that are computed lazily on first access:
>>> import filecmp
>>> from pathlib import Path
>>> Path("a.txt").write_text("hello")
5
>>> Path("b.txt").write_text("hello")
5
>>> filecmp.cmp("a.txt", "b.txt", shallow=False)
True
>>> Path("b.txt").write_text("world")
5
>>> filecmp.cmp("a.txt", "b.txt", shallow=False)
False
Key Features
- Compares two files with cmp(), returning True if they appear equal
- Compares a list of files across two directories with cmpfiles(), sorting results into match, mismatch, and error lists
- Caches comparison results and invalidates entries automatically when file statistics change
- Compares entire directory trees with dircmp, including recursive subdirectory traversal
- Computes dircmp attributes lazily, only when first accessed
- Supports both shallow (stat-based) and deep (content-based) comparison modes
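The difference between the shallow and deep modes is easiest to see when two files share the same size and modification time but hold different bytes. The following is a minimal sketch, assuming a temporary directory and a forced timestamp (both chosen only for illustration):

```python
import filecmp
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    left = Path(tmp, "left.txt")
    right = Path(tmp, "right.txt")
    left.write_text("hello")
    right.write_text("world")  # same length, different content

    # Force identical modification times so both stat signatures match.
    same_time_ns = 1_700_000_000 * 10**9  # arbitrary illustrative timestamp
    os.utime(left, ns=(same_time_ns, same_time_ns))
    os.utime(right, ns=(same_time_ns, same_time_ns))

    # Shallow mode trusts the matching signatures and never reads the files.
    shallow_result = filecmp.cmp(str(left), str(right), shallow=True)
    # Deep mode reads both files and notices the different bytes.
    deep_result = filecmp.cmp(str(left), str(right), shallow=False)

print("shallow:", shallow_result)
print("deep:", deep_result)
```

When content correctness matters more than speed, passing shallow=False is the safer choice.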
Frequently Used Classes and Functions
| Object | Type | Description |
|---|---|---|
| filecmp.cmp() | Function | Compares two files and returns True if they appear equal |
| filecmp.cmpfiles() | Function | Compares a set of files in two directories, returning match, mismatch, and error lists |
| filecmp.clear_cache() | Function | Clears the internal file comparison cache |
| filecmp.dircmp | Class | Compares two directory trees with lazily computed difference attributes |
| filecmp.DEFAULT_IGNORES | List | Names excluded from dircmp comparisons by default |
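As a rough illustration of how DEFAULT_IGNORES, the ignore argument of dircmp, and clear_cache() fit together, the sketch below extends the default ignore list with a hypothetical build directory. The directory and file names are assumptions made up for this example:

```python
import filecmp
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    left = Path(tmp, "left")
    right = Path(tmp, "right")
    (left / "build").mkdir(parents=True)  # hypothetical directory to skip
    (left / "__pycache__").mkdir()        # already listed in DEFAULT_IGNORES
    right.mkdir()
    (left / "app.py").write_text("print('hi')\n")
    (right / "app.py").write_text("print('hi')\n")

    # With the defaults, __pycache__ is skipped but build is reported.
    default_view = filecmp.dircmp(str(left), str(right))
    default_left_only = sorted(default_view.left_only)

    # Extending the default list hides build as well.
    custom_ignore = filecmp.DEFAULT_IGNORES + ["build"]
    custom_view = filecmp.dircmp(str(left), str(right), ignore=custom_ignore)
    custom_left_only = sorted(custom_view.left_only)

# Drop cached comparison results, for example between independent runs.
filecmp.clear_cache()

print(default_left_only)
print(custom_left_only)
```

Building the new list with + leaves DEFAULT_IGNORES itself unmodified, so other callers still see the standard defaults.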
Examples
Comparing files in two directories and sorting them into match, mismatch, and error groups:
compare_dirs.py
import filecmp
from pathlib import Path
Path("dir1").mkdir(exist_ok=True)
Path("dir2").mkdir(exist_ok=True)
Path("dir1/a.txt").write_text("same content")
Path("dir2/a.txt").write_text("same content")
Path("dir1/b.txt").write_text("version 1")
Path("dir2/b.txt").write_text("version 2")
match, mismatch, errors = filecmp.cmpfiles(
    "dir1", "dir2", ["a.txt", "b.txt"], shallow=False
)
print("Match:", match)
print("Mismatch:", mismatch)
$ python compare_dirs.py
Match: ['a.txt']
Mismatch: ['b.txt']
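The script above never populates the error list. A minimal sketch of when it does, assuming a hypothetical c.txt that exists on only one side (all paths here are illustrative and live in a temporary directory):

```python
import filecmp
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    left = Path(tmp, "left")
    right = Path(tmp, "right")
    left.mkdir()
    right.mkdir()
    (left / "a.txt").write_text("same content")
    (right / "a.txt").write_text("same content")
    (left / "b.txt").write_text("version 1")
    (right / "b.txt").write_text("version 2")
    (left / "c.txt").write_text("only on the left")  # missing on the right

    match, mismatch, errors = filecmp.cmpfiles(
        str(left), str(right), ["a.txt", "b.txt", "c.txt"], shallow=False
    )

print("Match:", match)
print("Mismatch:", mismatch)
print("Errors:", errors)
```

A name lands in the error list when the comparison itself cannot be carried out, as with the missing file here, rather than when the contents differ.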
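Inspecting dircmp attributes to see which files match, which differ, and which are unique to each side (the shallow keyword argument of dircmp requires Python 3.13 or later):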
>>> import filecmp
>>> dc = filecmp.dircmp("dir1", "dir2", shallow=False)
>>> dc.same_files
['a.txt']
>>> dc.diff_files
['b.txt']
>>> dc.left_only
[]
>>> dc.right_only
[]
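In the session above, left_only and right_only are empty because both directories contain the same names. The sketch below uses made-up file names in a temporary directory to show those attributes with content, along with common_files and common_dirs:

```python
import filecmp
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    left = Path(tmp, "left")
    right = Path(tmp, "right")
    (left / "assets").mkdir(parents=True)
    (right / "assets").mkdir(parents=True)
    (left / "shared.txt").write_text("same")
    (right / "shared.txt").write_text("same")
    (left / "draft.txt").write_text("left side only")
    (right / "notes.txt").write_text("right side only")

    dc = filecmp.dircmp(str(left), str(right))
    # Attributes are computed on first access, so read them while
    # the temporary directories still exist.
    left_only = dc.left_only
    right_only = dc.right_only
    common_files = dc.common_files
    common_dirs = dc.common_dirs

print(left_only, right_only, common_files, common_dirs)
```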
Common Use Cases
The most common tasks for filecmp include:
- Verifying that a backup directory matches its source
- Detecting changed files before a deployment or sync operation
- Finding files that exist in one directory but not another
- Recursively auditing large directory trees for differences
- Checking whether a file copy operation preserved content correctly
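The last use case in the list can be sketched in a few lines. The helper name copy_and_verify and the file names are hypothetical, introduced only for this illustration:

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def copy_and_verify(src, dst):
    """Copy src to dst, then confirm the bytes match (hypothetical helper)."""
    shutil.copy2(src, dst)
    return filecmp.cmp(src, dst, shallow=False)


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp, "report.txt")
    target = Path(tmp, "report_copy.txt")
    source.write_text("quarterly numbers\n")

    copy_ok = copy_and_verify(str(source), str(target))

    # Simulate silent corruption that keeps the file size unchanged.
    target.write_text("quarterly numbErs\n")
    # Clear the cache so a coarse timestamp cannot produce a stale hit.
    filecmp.clear_cache()
    still_ok = filecmp.cmp(str(source), str(target), shallow=False)

print(copy_ok, still_ok)
```

Using shallow=False matters here because shutil.copy2() preserves the modification time, so a stat-only check would report a match without reading either file.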
Real-World Example
A backup audit script uses dircmp to walk a directory tree recursively and report every file that is new, missing, or changed between a source and a backup:
sync_check.py
import filecmp
import sys

def report_changes(dcmp, indent=""):
    for name in dcmp.left_only:
        print(f"{indent} [new] {name}")
    for name in dcmp.right_only:
        print(f"{indent} [missing] {name}")
    for name in dcmp.diff_files:
        print(f"{indent} [changed] {name}")
    for sub, sub_dcmp in dcmp.subdirs.items():
        print(f"{indent}{sub}/")
        report_changes(sub_dcmp, indent + " ")

source, backup = sys.argv[1], sys.argv[2]
dc = filecmp.dircmp(source, backup)
print(f"Comparing {source!r} vs {backup!r}:")
report_changes(dc)
Run it against a project directory and its backup:
$ python sync_check.py project/ backup/
Comparing 'project/' vs 'backup/':
 [new] new_feature.py
 [changed] README.md
src/
  [changed] main.py
The subdirs attribute maps the name of each common subdirectory to a dircmp instance for that pair of subdirectories, so recursive traversal needs no additional setup.
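When the result needs to feed other code instead of being printed, the same traversal can collect its findings into a list. This is a sketch under stated assumptions: collect_differences is a hypothetical helper, the tree is built in a temporary directory, and the changed files are given different sizes because dircmp compares in shallow mode by default:

```python
import filecmp
import tempfile
from pathlib import Path


def collect_differences(dcmp, prefix=""):
    """Return (kind, relative_path) pairs for every difference in the tree."""
    found = []
    for name in dcmp.left_only:
        found.append(("new", prefix + name))
    for name in dcmp.right_only:
        found.append(("missing", prefix + name))
    for name in dcmp.diff_files:
        found.append(("changed", prefix + name))
    for name in dcmp.funny_files:
        # Files that exist on both sides but could not be compared.
        found.append(("uncomparable", prefix + name))
    for sub, sub_dcmp in dcmp.subdirs.items():
        found.extend(collect_differences(sub_dcmp, f"{prefix}{sub}/"))
    return found


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp, "source")
    backup = Path(tmp, "backup")
    (source / "src").mkdir(parents=True)
    (backup / "src").mkdir(parents=True)
    # Different sizes guarantee that the default shallow check sees a change.
    (source / "README.md").write_text("docs, second revision")
    (backup / "README.md").write_text("docs")
    (source / "src" / "main.py").write_text("print('updated')\n")
    (backup / "src" / "main.py").write_text("print(1)\n")
    (source / "new_feature.py").write_text("pass\n")

    differences = collect_differences(filecmp.dircmp(str(source), str(backup)))

for kind, path in differences:
    print(kind, path)
```

An empty list then means the two trees match as far as the chosen comparison mode can tell.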
By Leodanis Pozo Ramos • Updated April 9, 2026