Skip to content

test pyramid

The test pyramid is a heuristic for how to spread an automated test suite across several layers of granularity. It recommends many fast, narrow tests at the base, fewer wider tests in the middle, and only a small set of slow end-to-end tests at the top. Mike Cohn introduced the model in his 2009 book Succeeding with Agile, and Martin Fowler later popularized it as a counterweight to suites dominated by brittle UI tests.

The shape encodes two ideas at once. First, every test type has a different cost in run time, flakiness, and maintenance, so a healthy suite mixes granularities instead of relying on one. Second, the higher you go, the fewer tests you should have, because lower-level tests already cover most of the logic and run a thousand times faster, which gives the pyramid its characteristic shape:

Three stacked boxes narrowing upward, a narrow End to end box on top, a medium Integration box, and a wide teal Unit tests base, with a yellow Push tests down circle pointing to the base.
Push every test as far down the pyramid as it can usefully go.

How It Shows Up in Practice

A Python team usually maps each layer of the pyramid to a different tool and a different stage of the pipeline:

  • Unit tests at the base run a single function or class in isolation. They are written with pytest or the standard library’s unittest, mock out I/O, and finish in milliseconds. A typical project has hundreds or thousands of them, and they run on every commit.
  • Integration tests in the middle exercise how modules collaborate, including real databases, HTTP clients, and message queues. Common tools are pytest fixtures, httpx, requests, and testcontainers-python for spinning up disposable services. They run on every pull request but are slower and fewer.
  • End-to-end tests at the top drive the whole system through the user interface or a full public API. Playwright, Selenium, and Robot Framework are typical choices. A team might have a few dozen of these, run on a schedule or before each release, and accept that they take minutes rather than seconds.

The pyramid describes a shape rather than a numeric prescription. A frequently quoted 70 / 20 / 10 split across the three layers is a community rule of thumb that Cohn never put in print. The guiding principle is to push every test as far down the pyramid as it can usefully go.

A pure-stdlib sketch shows the contrast between the layers without any third-party dependency:

Language: Python
import json
import tempfile
from pathlib import Path

def total(items):
    return sum(item["price"] for item in items)

def test_total_unit():
    assert total([{"price": 10}, {"price": 5}]) == 15

def test_total_integration():
    with tempfile.TemporaryDirectory() as tmp:
        cart_file = Path(tmp) / "cart.json"
        cart_file.write_text(json.dumps([{"price": 10}, {"price": 5}]))
        items = json.loads(cart_file.read_text())
        assert total(items) == 15

test_total_unit()
print("test_total_unit passed")
test_total_integration()
print("test_total_integration passed")
Language: Program Output
test_total_unit passed
test_total_integration passed

The unit test never touches the filesystem and would survive a refactor of how the cart is stored. The integration test reads a real JSON file and would also catch a serialization bug. An end-to-end test would click through a browser and is omitted here because it requires a running web app.

Common Variations and Antipatterns

The most cited failure mode is the test ice cream cone, an inverted pyramid where most tests are slow UI or manual checks and the unit layer is thin or missing. Suites in this shape are slow, flaky, and discouraging to extend, and they are common in codebases where automated testing was added only after a QA team was already in place.

Several modern shapes adjust the pyramid for different architectures. Kent C. Dodds’ testing trophy widens the integration layer and adds static analysis from mypy, ruff, or pyright as a base, on the argument that integration tests give the best confidence per minute for frontend code. The testing honeycomb, associated with microservice teams, deliberately shrinks the unit layer and grows the cross-service integration layer.

The original pyramid still works as a starting point, and teams pick a variation when their architecture or risk profile pulls them away from the default shape.

Tutorial

Getting Started With Testing in Python

Learn Python testing in depth by writing unit and integration tests, measuring performance, and uncovering security issues. Find bugs before your users do!

intermediate best-practices testing

For additional information on related topics, take a look at the following resources:


By Martin Breuss • Updated May 29, 2026