Skip to content

filecmp

The Python filecmp module provides functions for comparing files and directory trees. It uses an internal cache to avoid redundant comparisons, checking OS-level file statistics by default and falling back to byte-by-byte content comparison when needed. The dircmp class supports recursive directory comparison with attributes that are computed lazily on first access:

Language: Python
>>> import filecmp
>>> from pathlib import Path
>>> Path("a.txt").write_text("hello")
5
>>> Path("b.txt").write_text("hello")
5
>>> filecmp.cmp("a.txt", "b.txt", shallow=False)
True
>>> Path("b.txt").write_text("world")
5
>>> filecmp.cmp("a.txt", "b.txt", shallow=False)
False

Key Features

  • Compares two files with cmp(), returning True if they appear equal
  • Compares a list of files across two directories with cmpfiles(), sorting results into match, mismatch, and error lists
  • Caches comparison results and invalidates entries automatically when file statistics change
  • Compares entire directory trees with dircmp, including recursive subdirectory traversal
  • Computes dircmp attributes lazily, only when first accessed
  • Supports both shallow (stat-based) and deep (content-based) comparison modes

Frequently Used Classes and Functions

Object Type Description
filecmp.cmp() Function Compares two files and returns True if they appear equal
filecmp.cmpfiles() Function Compares a set of files in two directories, returning match, mismatch, and error lists
filecmp.clear_cache() Function Clears the internal file comparison cache
filecmp.dircmp Class Compares two directory trees with lazily computed difference attributes
filecmp.DEFAULT_IGNORES List Names excluded from dircmp comparisons by default

Examples

Comparing files in two directories and sorting them into match, mismatch, and error groups:

Language: Python Filename: compare_dirs.py
import filecmp
from pathlib import Path

Path("dir1").mkdir(exist_ok=True)
Path("dir2").mkdir(exist_ok=True)
Path("dir1/a.txt").write_text("same content")
Path("dir2/a.txt").write_text("same content")
Path("dir1/b.txt").write_text("version 1")
Path("dir2/b.txt").write_text("version 2")

match, mismatch, errors = filecmp.cmpfiles(
    "dir1", "dir2", ["a.txt", "b.txt"], shallow=False
)
print("Match:", match)
print("Mismatch:", mismatch)
Language: Shell
$ python compare_dirs.py
Match: ['a.txt']
Mismatch: ['b.txt']

Inspecting dircmp attributes to see which files are unique to each side:

Language: Python
>>> import filecmp
>>> dc = filecmp.dircmp("dir1", "dir2", shallow=False)
>>> dc.same_files
['a.txt']
>>> dc.diff_files
['b.txt']
>>> dc.left_only
[]
>>> dc.right_only
[]

Common Use Cases

The most common tasks for filecmp include:

  • Verifying that a backup directory matches its source
  • Detecting changed files before a deployment or sync operation
  • Finding files that exist in one directory but not another
  • Recursively auditing large directory trees for differences
  • Checking whether a file copy operation preserved content correctly

Real-World Example

A backup audit script uses dircmp to walk a directory tree recursively and report every file that is new, missing, or changed between a source and a backup:

Language: Python Filename: sync_check.py
import filecmp
import sys

def report_changes(dcmp, indent=""):
    for name in dcmp.left_only:
        print(f"{indent}  [new]     {name}")
    for name in dcmp.right_only:
        print(f"{indent}  [missing] {name}")
    for name in dcmp.diff_files:
        print(f"{indent}  [changed] {name}")
    for sub, sub_dcmp in dcmp.subdirs.items():
        print(f"{indent}{sub}/")
        report_changes(sub_dcmp, indent + "  ")

source, backup = sys.argv[1], sys.argv[2]
dc = filecmp.dircmp(source, backup)
print(f"Comparing {source!r} vs {backup!r}:")
report_changes(dc)

Run it against a project directory and its backup:

Language: Shell
$ python sync_check.py project/ backup/
Comparing 'project/' vs 'backup/':
  [new]     new_feature.py
  [changed] README.md
src/
    [changed] main.py

The subdirs attribute provides pre-built dircmp instances for each common subdirectory, making recursive traversal straightforward without additional setup.

Tutorial

Python's pathlib Module: Taming the File System

Python's pathlib module enables you to handle file and folder paths in a modern way. This built-in module provides intuitive semantics that work the same way on different operating systems. In this tutorial, you'll get to know pathlib and explore common tasks when interacting with paths.

intermediate python stdlib

For additional information on related topics, take a look at the following resources:


By Leodanis Pozo Ramos • Updated April 9, 2026