Skip to content

Commit 6cb8403

Browse files
authored
Merge pull request #10731 from The-OpenROAD-Project-staging/docs-testing-strategy
docs: add C++-first testing strategy guide
2 parents 995b782 + f350101 commit 6cb8403

2 files changed

Lines changed: 214 additions & 0 deletions

File tree

docs/agents/testing-strategy.md

Lines changed: 210 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,210 @@
1+
# OpenROAD Testing Strategy
2+
3+
This document defines **where** new tests should live and **how** to handle the
4+
dependencies that make C++ unit testing of OpenROAD hard. It complements
5+
`testing.md`, which covers the mechanics of writing and registering a test.
6+
7+
## Goal
8+
9+
Move the bulk of *functional correctness* coverage into minimally-scoped C++
10+
unit tests, while keeping a thin layer of Tcl/Python tests that prove the
11+
bindings work and a small curated set of full-flow tests that prove the stages
12+
compose. Core logic should be tested with semantic assertions, not golden-file
13+
diffs.
14+
15+
> **Policy.** The end state is to **migrate off** the bulk of the Tcl/Python
16+
> golden tests, not keep them indefinitely. But that migration is a large,
17+
> incremental effort delivered as a long series of small steps -- never a single
18+
> big-bang rewrite. Two things happen in parallel:
19+
>
20+
> 1. **New tests** default to C++ unit tests now (this strategy's pyramid).
21+
> 2. **Existing golden tests** are retired in batches as their coverage is
22+
> re-expressed in C++ -- the C++ test and the removal of the Tcl/Python test
23+
> it replaces land in the *same* change. Prioritized by pain (flaky/slow/
24+
> frequently-broken modules and code you are already touching), with each
25+
> batch a reviewable change of its own.
26+
>
27+
> A golden test is only deleted once equivalent or better coverage exists in
28+
> C++; don't drop coverage to hit a ratio. Until then it stays.
29+
30+
## Why move off golden-file tests
31+
32+
The repo today is ~1,674 Tcl + ~265 Python regression tests vs. ~55 C++ unit
33+
tests. Most functionality is verified by reading LEF/DEF, running a command, and
34+
`diff`-ing a golden `.ok`/`.defok`/`.vok` file. That model has structural
35+
weaknesses:
36+
37+
1. **It tests serialization, not logic.** A passing diff means bytes matched, not
38+
that the result is *correct*. When a buffer count changes you learn something
39+
differed, not whether the new number is right.
40+
2. **It couples to unrelated behavior.** `diff_file` breaks on any upstream
41+
formatting/ordering change, and it only reports the first difference (so
42+
golden regen must be wholesale).
43+
3. **No fault isolation.** A flow test exercises the readers + STA + the unit
44+
under test at once; a failure could be anywhere.
45+
4. **It's slow.** Each test spawns a full `openroad` process and re-parses the
46+
Nangate45/Sky130 libraries.
47+
48+
C++ unit tests built on the existing fixtures fix all four: semantic assertions,
49+
no file coupling, single-unit scope, sub-millisecond setup.
50+
51+
## The test pyramid
52+
53+
Aim every **new** test at the lowest tier that can express it.
54+
55+
### Tier 1 -- C++ unit tests (default; target the majority of new tests)
56+
57+
One function/class, inputs built programmatically, assertions on observable
58+
state (`EXPECT_EQ(net->getITermCount(), 3)`), not on a serialized dump. This is
59+
where algorithmic correctness, edge cases, and regression-bug pins belong.
60+
61+
### Tier 2 -- Binding smoke tests (thin: ~1 per public command)
62+
63+
Their *only* job is to prove the Tcl/Python binding marshals arguments and
64+
returns without crashing -- not to validate the algorithm. Call the command once
65+
on a trivial design and assert it ran. Prefer `PASSFAIL_TESTS` (exit-code only)
66+
over golden diffs so there is no `.ok` file to maintain.
67+
68+
### Tier 3 -- Full-flow integration tests (keep deliberately few)
69+
70+
A curated handful per module that prove the stages compose end-to-end on a real
71+
design (e.g. gcd, aes). These are the "the flow still works" canaries. You want
72+
dozens across the project, not hundreds -- and this small curated set is the
73+
*permanent* home of golden-file testing. Most of today's golden tests are really
74+
unit-logic tests wearing a flow-test costume: those are the migration target
75+
(demote to Tier 1), not this canary layer. Resist adding new Tier-3 tests when a
76+
Tier-1 test would do.
77+
78+
The migration shape is therefore: **a new feature lands as Tier-1 semantic tests
79+
plus one Tier-2 smoke test, instead of a new Tier-3 golden test.**
80+
81+
## Handling dependencies
82+
83+
This is the crux. Pick the **lightest fixture that exposes the dependency the
84+
unit actually needs.**
85+
86+
| Dependency of the unit under test | Fixture / approach | Cost |
87+
| --- | --- | --- |
88+
| None -- pure algorithm/geometry/graph kernel | No fixture; plain structs + `odb` geom primitives | trivial |
89+
| An `odb` database (cells, nets, placement) | `tst::Fixture` / `odb::SimpleDbFixture` + `makeInst`/`makeBTerm`/`makeNets` | sub-ms |
90+
| STA timing | `tst::IntegratedFixture` (kNangate45 / kSky130hd) | ms (libs loaded once) |
91+
| Genuine cross-tool composition | Keep as a Tier-3 flow test | full process |
92+
93+
The existing fixture stack already supports this:
94+
95+
- **`tst::Fixture`** (`src/tst/include/tst/fixture.h`) -- owns `db_`, `sta_`,
96+
`logger_`; provides `loadTechLef`/`loadLibaryLef`/`readLiberty` and the netlist
97+
builders `makeInst`, `makeBTerm`, `makeNets`. The header explicitly states
98+
these are meant to make C++ setup "competitive with writing a Verilog or DEF
99+
test case by hand."
100+
- **`odb::SimpleDbFixture`** (`src/odb/test/cpp/helper/helper.h`) -- pre-builds a
101+
minimal tech/lib/chip/block and `createMaster*` helpers.
102+
- **`tst::IntegratedFixture`** (`src/tst/include/tst/IntegratedFixture.h`) -- wires
103+
`sta_`, `resizer_`, `dp_`, `grt_`, `ant_`, `stt_`, `ep_` against real libs and
104+
offers `readVerilogAndSetup`. This is how `rsz` and `dbSta` already test; use
105+
them as the reference pattern.
106+
107+
### Principles
108+
109+
1. **Depend on data/interfaces, not the whole pipeline.** If a function needs a
110+
fully placed-and-routed DB just to test one calculation, that is a design
111+
smell. Refactor the kernel to take the data it needs (a struct, a span, an
112+
`odb` geometry) so it can be tested with no fixture. *This
113+
refactor-for-testability is the single highest-leverage move* -- it improves
114+
the code and makes the test trivial.
115+
2. **Builders over checked-in files.** Every time a test needs "a block with a
116+
row of 3 placed cells," that should be a fixture method, not a new DEF.
117+
Growing per-module builder helpers (the `SimpleDbFixture` pattern, extended to
118+
dpl/grt/cts/...) is the shared library that makes Tier-1 cheap. Invest here.
119+
3. **Inject dependencies.** Prefer constructor/parameter injection so a unit can
120+
be handed a minimal hand-built DB (or a narrow fake of an *external* tool's
121+
output) instead of discovering global state.
122+
4. **Do not mock `odb`.** It is the lingua franca, it is cheap to instantiate,
123+
and faking it is more work than building a tiny real one. Fake/stub only
124+
expensive *external* tools, and only when the unit needs a narrow slice of
125+
their output.
126+
127+
## Keeping binding tests minimal but honest
128+
129+
- One smoke test per public Tcl command and per Python command, on a trivial
130+
design, asserting invocation + basic return marshaling. Convert to
131+
`PASSFAIL_TESTS` where possible.
132+
- Treat the C++ test as the source of truth for correctness; bindings prove
133+
plumbing only. Do **not** duplicate algorithm assertions across Tcl + Python.
134+
- A future linter (not yet implemented) could enumerate public commands from the
135+
`.tcl`/`.i` files and flag any without a smoke test, guaranteeing binding
136+
coverage without hand-curation.
137+
138+
## Decision tree for a new test
139+
140+
```
141+
Is it pure logic with no DB? -> Tier 1, no fixture
142+
Does it need an odb DB only? -> Tier 1, tst::Fixture / SimpleDbFixture
143+
Does it need STA/resizer/router state? -> Tier 1, tst::IntegratedFixture
144+
Is it only proving a binding marshals? -> Tier 2, smoke test (PASSFAIL)
145+
Does it genuinely span multiple tools? -> Tier 3, flow test (golden, used sparingly)
146+
```
147+
148+
## Runbook: retiring a batch of golden tests
149+
150+
**Rule: the C++ test and the removal of the Tcl/Python test(s) it replaces land
151+
in the same change.** Once a behavior is covered by a C++ unit test, the golden
152+
test that previously pinned it is redundant and should be deleted in that same
153+
commit -- not left behind "for safety" and not deferred to a later cleanup. The
154+
one gate is coverage equivalence (step 5): you only delete what the C++ test now
155+
covers.
156+
157+
A "batch" is one such reviewable change: it adds C++ tests for a small, related
158+
group of behaviors and removes the golden tests they supersede. Keep batches
159+
small enough to review in one sitting -- a few related tests, not a whole module
160+
at once.
161+
162+
1. **Pick a target.** Prioritize by pain: flaky/slow/frequently-broken tests,
163+
tests for code you are already modifying, or a tightly-related cluster (e.g.
164+
all `buffer_ports*`). Avoid the curated Tier-3 flow canaries -- those stay.
165+
2. **Characterize what each golden test actually verifies.** Read the `.tcl`/`.py`
166+
and its `.ok`/`.defok`/`.vok`. Write down the *intent* ("inserts a buffer on
167+
each output port", "rejects nets wider than the layer max"), not the byte
168+
diff. This intent is what the C++ test will assert.
169+
3. **Choose the fixture** via the decision tree above -- the lightest one that
170+
exposes the dependency. If the unit needs the whole pipeline to test one
171+
calculation, refactor the kernel to take its data directly first (a separate,
172+
prior change), then test it with no/low fixture.
173+
4. **Write the C++ unit test(s)** with semantic assertions on observable state.
174+
One behavior per `TEST_F`; build inputs with `makeInst`/`makeBTerm`/`makeNets`
175+
or `createMaster*` rather than checked-in DEFs. Add a builder helper if the
176+
same construction recurs. Register in BOTH build systems.
177+
5. **Confirm coverage is equal-or-better.** The C++ tests must cover every
178+
behavior the golden test pinned (use the intent list from step 2 as a
179+
checklist). A C++ test that asserts *more* (e.g. an error/rejection path the
180+
golden never reached) is the goal. If some behavior genuinely cannot be
181+
re-expressed in C++, keep that one golden test and note why -- but still
182+
remove the rest of the batch.
183+
6. **Delete the superseded Tcl/Python test(s) and their artifacts, and
184+
de-register from BOTH build systems** -- in this same change. Remove the
185+
`.tcl`/`.py` and its `.ok`/`.defok`/`.vok`; drop the name from
186+
`src/<module>/test/CMakeLists.txt` (`or_integration_tests`) **and**
187+
`src/<module>/test/BUILD` (`regression_test`). Remove checked-in DEF/LEF data
188+
only if nothing else references it (grep first -- fixtures like Nangate45 and shared `data/` files
189+
are used by many tests).
190+
7. **Check residual binding coverage.** Removing a *golden* test does not have to
191+
mean the command loses all Tcl/Python exercise: a command is often also
192+
touched by a broader binding test (e.g. odb's `test_inst.py`/`test_inst.tcl`).
193+
If the deleted test was the *only* thing invoking that command from a binding,
194+
add a thin Tier-2 smoke test (`PASSFAIL`, trivial design) so the entry point
195+
stays exercised. Note in the commit where binding coverage now lives.
196+
8. **Verify.** Run the module's C++ tests and remaining regressions in both
197+
builds (`ctest`/`make test` and the Bazel target) to confirm the new tests
198+
pass and nothing references the removed names.
199+
200+
> The discipline that keeps this safe: deletion is gated on coverage
201+
> equivalence (step 5), and every add/remove touches CMake + Bazel together (a
202+
> half-registered or half-deregistered test breaks Bazel CI while passing local
203+
> `make test`).
204+
205+
## Registration reminder
206+
207+
Every new test -- C++ or Tcl -- must be registered in **both** CMake and Bazel.
208+
Forgetting the Bazel `BUILD` entry passes local `make test` but breaks Bazel CI.
209+
See `testing.md` for the exact macros (`or_integration_tests` / `regression_test`
210+
/ `cc_test` + `gtest_discover_tests`).

docs/agents/testing.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,10 @@
22

33
OpenROAD has two types of tests: **integration tests** (Tcl) and **unit tests** (C++).
44

5+
> For **where** new tests should live (the C++-first test pyramid) and **how** to
6+
> handle DB/STA dependencies, see `testing-strategy.md`. This guide covers the
7+
> mechanics of writing and registering a test.
8+
59
## CAUTION -- Dual Registration Required
610

711
**Every new test MUST be registered in BOTH build systems. Missing either one is a build break.**

0 commit comments

Comments
 (0)