|
| 1 | +# OpenROAD Testing Strategy |
| 2 | + |
| 3 | +This document defines **where** new tests should live and **how** to handle the |
| 4 | +dependencies that make C++ unit testing of OpenROAD hard. It complements |
| 5 | +`testing.md`, which covers the mechanics of writing and registering a test. |
| 6 | + |
| 7 | +## Goal |
| 8 | + |
| 9 | +Move the bulk of *functional correctness* coverage into minimally-scoped C++ |
| 10 | +unit tests, while keeping a thin layer of Tcl/Python tests that prove the |
| 11 | +bindings work and a small curated set of full-flow tests that prove the stages |
| 12 | +compose. Core logic should be tested with semantic assertions, not golden-file |
| 13 | +diffs. |
| 14 | + |
| 15 | +> **Policy.** The end state is to **migrate off** the bulk of the Tcl/Python |
| 16 | +> golden tests, not keep them indefinitely. But that migration is a large, |
| 17 | +> incremental effort delivered as a long series of small steps -- never a single |
| 18 | +> big-bang rewrite. Two things happen in parallel: |
| 19 | +> |
| 20 | +> 1. **New tests** default to C++ unit tests now (this strategy's pyramid). |
| 21 | +> 2. **Existing golden tests** are retired in batches as their coverage is |
| 22 | +> re-expressed in C++ -- the C++ test and the removal of the Tcl/Python test |
| 23 | +> it replaces land in the *same* change. Prioritized by pain (flaky/slow/ |
| 24 | +> frequently-broken modules and code you are already touching), with each |
| 25 | +> batch a reviewable change of its own. |
| 26 | +> |
| 27 | +> A golden test is only deleted once equivalent or better coverage exists in |
| 28 | +> C++; don't drop coverage to hit a ratio. Until then it stays. |
| 29 | +
|
| 30 | +## Why move off golden-file tests |
| 31 | + |
| 32 | +The repo today is ~1,674 Tcl + ~265 Python regression tests vs. ~55 C++ unit |
| 33 | +tests. Most functionality is verified by reading LEF/DEF, running a command, and |
| 34 | +`diff`-ing a golden `.ok`/`.defok`/`.vok` file. That model has structural |
| 35 | +weaknesses: |
| 36 | + |
| 37 | +1. **It tests serialization, not logic.** A passing diff means bytes matched, not |
| 38 | + that the result is *correct*. When a buffer count changes you learn something |
| 39 | + differed, not whether the new number is right. |
| 40 | +2. **It couples to unrelated behavior.** `diff_file` breaks on any upstream |
| 41 | + formatting/ordering change, and it only reports the first difference (so |
| 42 | + golden regen must be wholesale). |
| 43 | +3. **No fault isolation.** A flow test exercises the readers + STA + the unit |
| 44 | + under test at once; a failure could be anywhere. |
| 45 | +4. **It's slow.** Each test spawns a full `openroad` process and re-parses the |
| 46 | + Nangate45/Sky130 libraries. |
| 47 | + |
| 48 | +C++ unit tests built on the existing fixtures fix all four: semantic assertions, |
| 49 | +no file coupling, single-unit scope, sub-millisecond setup. |
| 50 | + |
| 51 | +## The test pyramid |
| 52 | + |
| 53 | +Aim every **new** test at the lowest tier that can express it. |
| 54 | + |
| 55 | +### Tier 1 -- C++ unit tests (default; target the majority of new tests) |
| 56 | + |
| 57 | +One function/class, inputs built programmatically, assertions on observable |
| 58 | +state (`EXPECT_EQ(net->getITermCount(), 3)`), not on a serialized dump. This is |
| 59 | +where algorithmic correctness, edge cases, and regression-bug pins belong. |
| 60 | + |
| 61 | +### Tier 2 -- Binding smoke tests (thin: ~1 per public command) |
| 62 | + |
| 63 | +Their *only* job is to prove the Tcl/Python binding marshals arguments and |
| 64 | +returns without crashing -- not to validate the algorithm. Call the command once |
| 65 | +on a trivial design and assert it ran. Prefer `PASSFAIL_TESTS` (exit-code only) |
| 66 | +over golden diffs so there is no `.ok` file to maintain. |
| 67 | + |
| 68 | +### Tier 3 -- Full-flow integration tests (keep deliberately few) |
| 69 | + |
| 70 | +A curated handful per module that prove the stages compose end-to-end on a real |
| 71 | +design (e.g. gcd, aes). These are the "the flow still works" canaries. You want |
| 72 | +dozens across the project, not hundreds -- and this small curated set is the |
| 73 | +*permanent* home of golden-file testing. Most of today's golden tests are really |
| 74 | +unit-logic tests wearing a flow-test costume: those are the migration target |
| 75 | +(demote to Tier 1), not this canary layer. Resist adding new Tier-3 tests when a |
| 76 | +Tier-1 test would do. |
| 77 | + |
| 78 | +The migration shape is therefore: **a new feature lands as Tier-1 semantic tests |
| 79 | +plus one Tier-2 smoke test, instead of a new Tier-3 golden test.** |
| 80 | + |
| 81 | +## Handling dependencies |
| 82 | + |
| 83 | +This is the crux. Pick the **lightest fixture that exposes the dependency the |
| 84 | +unit actually needs.** |
| 85 | + |
| 86 | +| Dependency of the unit under test | Fixture / approach | Cost | |
| 87 | +| --- | --- | --- | |
| 88 | +| None -- pure algorithm/geometry/graph kernel | No fixture; plain structs + `odb` geom primitives | trivial | |
| 89 | +| An `odb` database (cells, nets, placement) | `tst::Fixture` / `odb::SimpleDbFixture` + `makeInst`/`makeBTerm`/`makeNets` | sub-ms | |
| 90 | +| STA timing | `tst::IntegratedFixture` (kNangate45 / kSky130hd) | ms (libs loaded once) | |
| 91 | +| Genuine cross-tool composition | Keep as a Tier-3 flow test | full process | |
| 92 | + |
| 93 | +The existing fixture stack already supports this: |
| 94 | + |
| 95 | +- **`tst::Fixture`** (`src/tst/include/tst/fixture.h`) -- owns `db_`, `sta_`, |
| 96 | + `logger_`; provides `loadTechLef`/`loadLibaryLef`/`readLiberty` and the netlist |
| 97 | + builders `makeInst`, `makeBTerm`, `makeNets`. The header explicitly states |
| 98 | + these are meant to make C++ setup "competitive with writing a Verilog or DEF |
| 99 | + test case by hand." |
| 100 | +- **`odb::SimpleDbFixture`** (`src/odb/test/cpp/helper/helper.h`) -- pre-builds a |
| 101 | + minimal tech/lib/chip/block and `createMaster*` helpers. |
| 102 | +- **`tst::IntegratedFixture`** (`src/tst/include/tst/IntegratedFixture.h`) -- wires |
| 103 | + `sta_`, `resizer_`, `dp_`, `grt_`, `ant_`, `stt_`, `ep_` against real libs and |
| 104 | + offers `readVerilogAndSetup`. This is how `rsz` and `dbSta` already test; use |
| 105 | + them as the reference pattern. |
| 106 | + |
| 107 | +### Principles |
| 108 | + |
| 109 | +1. **Depend on data/interfaces, not the whole pipeline.** If a function needs a |
| 110 | + fully placed-and-routed DB just to test one calculation, that is a design |
| 111 | + smell. Refactor the kernel to take the data it needs (a struct, a span, an |
| 112 | + `odb` geometry) so it can be tested with no fixture. *This |
| 113 | + refactor-for-testability is the single highest-leverage move* -- it improves |
| 114 | + the code and makes the test trivial. |
| 115 | +2. **Builders over checked-in files.** Every time a test needs "a block with a |
| 116 | + row of 3 placed cells," that should be a fixture method, not a new DEF. |
| 117 | + Growing per-module builder helpers (the `SimpleDbFixture` pattern, extended to |
| 118 | + dpl/grt/cts/...) is the shared library that makes Tier-1 cheap. Invest here. |
| 119 | +3. **Inject dependencies.** Prefer constructor/parameter injection so a unit can |
| 120 | + be handed a minimal hand-built DB (or a narrow fake of an *external* tool's |
| 121 | + output) instead of discovering global state. |
| 122 | +4. **Do not mock `odb`.** It is the lingua franca, it is cheap to instantiate, |
| 123 | + and faking it is more work than building a tiny real one. Fake/stub only |
| 124 | + expensive *external* tools, and only when the unit needs a narrow slice of |
| 125 | + their output. |
| 126 | + |
| 127 | +## Keeping binding tests minimal but honest |
| 128 | + |
| 129 | +- One smoke test per public Tcl command and per Python command, on a trivial |
| 130 | + design, asserting invocation + basic return marshaling. Convert to |
| 131 | + `PASSFAIL_TESTS` where possible. |
| 132 | +- Treat the C++ test as the source of truth for correctness; bindings prove |
| 133 | + plumbing only. Do **not** duplicate algorithm assertions across Tcl + Python. |
| 134 | +- A future linter (not yet implemented) could enumerate public commands from the |
| 135 | + `.tcl`/`.i` files and flag any without a smoke test, guaranteeing binding |
| 136 | + coverage without hand-curation. |
| 137 | + |
| 138 | +## Decision tree for a new test |
| 139 | + |
| 140 | +``` |
| 141 | +Is it pure logic with no DB? -> Tier 1, no fixture |
| 142 | +Does it need an odb DB only? -> Tier 1, tst::Fixture / SimpleDbFixture |
| 143 | +Does it need STA/resizer/router state? -> Tier 1, tst::IntegratedFixture |
| 144 | +Is it only proving a binding marshals? -> Tier 2, smoke test (PASSFAIL) |
| 145 | +Does it genuinely span multiple tools? -> Tier 3, flow test (golden, used sparingly) |
| 146 | +``` |
| 147 | + |
| 148 | +## Runbook: retiring a batch of golden tests |
| 149 | + |
| 150 | +**Rule: the C++ test and the removal of the Tcl/Python test(s) it replaces land |
| 151 | +in the same change.** Once a behavior is covered by a C++ unit test, the golden |
| 152 | +test that previously pinned it is redundant and should be deleted in that same |
| 153 | +commit -- not left behind "for safety" and not deferred to a later cleanup. The |
| 154 | +one gate is coverage equivalence (step 5): you only delete what the C++ test now |
| 155 | +covers. |
| 156 | + |
| 157 | +A "batch" is one such reviewable change: it adds C++ tests for a small, related |
| 158 | +group of behaviors and removes the golden tests they supersede. Keep batches |
| 159 | +small enough to review in one sitting -- a few related tests, not a whole module |
| 160 | +at once. |
| 161 | + |
| 162 | +1. **Pick a target.** Prioritize by pain: flaky/slow/frequently-broken tests, |
| 163 | + tests for code you are already modifying, or a tightly-related cluster (e.g. |
| 164 | + all `buffer_ports*`). Avoid the curated Tier-3 flow canaries -- those stay. |
| 165 | +2. **Characterize what each golden test actually verifies.** Read the `.tcl`/`.py` |
| 166 | + and its `.ok`/`.defok`/`.vok`. Write down the *intent* ("inserts a buffer on |
| 167 | + each output port", "rejects nets wider than the layer max"), not the byte |
| 168 | + diff. This intent is what the C++ test will assert. |
| 169 | +3. **Choose the fixture** via the decision tree above -- the lightest one that |
| 170 | + exposes the dependency. If the unit needs the whole pipeline to test one |
| 171 | + calculation, refactor the kernel to take its data directly first (a separate, |
| 172 | + prior change), then test it with no/low fixture. |
| 173 | +4. **Write the C++ unit test(s)** with semantic assertions on observable state. |
| 174 | + One behavior per `TEST_F`; build inputs with `makeInst`/`makeBTerm`/`makeNets` |
| 175 | + or `createMaster*` rather than checked-in DEFs. Add a builder helper if the |
| 176 | + same construction recurs. Register in BOTH build systems. |
| 177 | +5. **Confirm coverage is equal-or-better.** The C++ tests must cover every |
| 178 | + behavior the golden test pinned (use the intent list from step 2 as a |
| 179 | + checklist). A C++ test that asserts *more* (e.g. an error/rejection path the |
| 180 | + golden never reached) is the goal. If some behavior genuinely cannot be |
| 181 | + re-expressed in C++, keep that one golden test and note why -- but still |
| 182 | + remove the rest of the batch. |
| 183 | +6. **Delete the superseded Tcl/Python test(s) and their artifacts, and |
| 184 | + de-register from BOTH build systems** -- in this same change. Remove the |
| 185 | + `.tcl`/`.py` and its `.ok`/`.defok`/`.vok`; drop the name from |
| 186 | + `src/<module>/test/CMakeLists.txt` (`or_integration_tests`) **and** |
| 187 | + `src/<module>/test/BUILD` (`regression_test`). Remove checked-in DEF/LEF data |
| 188 | + only if nothing else references it (grep first -- fixtures like Nangate45 and shared `data/` files |
| 189 | + are used by many tests). |
| 190 | +7. **Check residual binding coverage.** Removing a *golden* test does not have to |
| 191 | + mean the command loses all Tcl/Python exercise: a command is often also |
| 192 | + touched by a broader binding test (e.g. odb's `test_inst.py`/`test_inst.tcl`). |
| 193 | + If the deleted test was the *only* thing invoking that command from a binding, |
| 194 | + add a thin Tier-2 smoke test (`PASSFAIL`, trivial design) so the entry point |
| 195 | + stays exercised. Note in the commit where binding coverage now lives. |
| 196 | +8. **Verify.** Run the module's C++ tests and remaining regressions in both |
| 197 | + builds (`ctest`/`make test` and the Bazel target) to confirm the new tests |
| 198 | + pass and nothing references the removed names. |
| 199 | + |
| 200 | +> The discipline that keeps this safe: deletion is gated on coverage |
| 201 | +> equivalence (step 5), and every add/remove touches CMake + Bazel together (a |
| 202 | +> half-registered or half-deregistered test breaks Bazel CI while passing local |
| 203 | +> `make test`). |
| 204 | +
|
| 205 | +## Registration reminder |
| 206 | + |
| 207 | +Every new test -- C++ or Tcl -- must be registered in **both** CMake and Bazel. |
| 208 | +Forgetting the Bazel `BUILD` entry passes local `make test` but breaks Bazel CI. |
| 209 | +See `testing.md` for the exact macros (`or_integration_tests` / `regression_test` |
| 210 | +/ `cc_test` + `gtest_discover_tests`). |
0 commit comments