Skip to content

Commit daa0fbe

Browse files
authored
Merge pull request #10757 from The-OpenROAD-Project-staging/docs-testing-strategy
docs: cover binding tests for expensive commands
2 parents d679188 + 713b984 commit daa0fbe

1 file changed

Lines changed: 90 additions & 1 deletion

File tree

docs/agents/testing-strategy.md

Lines changed: 90 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -63,7 +63,11 @@ where algorithmic correctness, edge cases, and regression-bug pins belong.
6363
Their *only* job is to prove the Tcl/Python binding marshals arguments and
6464
returns without crashing -- not to validate the algorithm. Call the command once
6565
on a trivial design and assert it ran. Prefer `PASSFAIL_TESTS` (exit-code only)
66-
over golden diffs so there is no `.ok` file to maintain.
66+
over golden diffs so there is no `.ok` file to maintain. For commands whose
67+
execution is expensive (e.g. `detailed_route`), running them even once is the
68+
wrong cost trade-off -- see [Binding tests for expensive
69+
commands](#binding-tests-for-expensive-commands) for how to prove translation
70+
without executing the work.
6771

6872
### Tier 3 -- Full-flow integration tests (keep deliberately few)
6973

@@ -135,13 +139,98 @@ The existing fixture stack already supports this:
135139
`.tcl`/`.i` files and flag any without a smoke test, guaranteeing binding
136140
coverage without hand-curation.
137141

142+
### Binding tests for expensive commands
143+
144+
The Tier-2 recipe -- "call the command once on a trivial design" -- assumes the
145+
command is cheap to run. For commands that do heavy work (`global_route`,
146+
`global_placement`, `clock_tree_synthesis`, `detailed_route`, ...), executing the
147+
algorithm contributes *nothing* to the binding guarantee and costs
148+
seconds-to-minutes per test. The translation contract you actually want to pin is
149+
narrow: every flag/key reaches the right C++ parameter and defaults are applied.
150+
None of that requires the algorithm to run.
151+
152+
**Preferred policy: validate arguments in C++, not in the binding.** A value
153+
check written in a `.tcl` proc (`sta::check_positive_integer`, range/cardinality
154+
guards) only protects the Tcl entry point -- the Python binding and direct C++
155+
callers bypass it, so the check has to be duplicated or is simply missing. Put
156+
the check behind the C++ entry point instead and one implementation covers all
157+
three usages. `utl::Validator` (`src/utl/include/utl/validation.h`) exists to make
158+
this easy: construct it with a `Logger*` and `ToolId`, then call
159+
`check_positive` / `check_non_negative` / `check_range` / `check_percentage` /
160+
`check_non_null`, each of which emits a tool-scoped logged error on violation. Use
161+
it in the engine's argument-ingestion path (e.g. where parameters are set) rather
162+
than re-deriving the same guard per language. This also keeps the `.tcl`/`.py`
163+
proc to near-pure marshaling, which shrinks what the binding test must cover --
164+
and the validation itself becomes a cheap Tier-1 C++ test that exercises the error
165+
paths directly, with no process launch.
166+
167+
The key observation is that a command's .tcl proc or .py function does two separable
168+
things: it **configures** the engine from the parsed arguments, then calls a
169+
distinct **execute** entry point that does the expensive work. The execute step
170+
is almost always a single thin SWIG free function -- `grt::global_route`,
171+
`cts::run_triton_cts`, the `gpl::replace_*_cmd` calls. Because it is a plain proc
172+
in the tool's namespace, a binding test can rename it (in Tcl) or reassign/mock it (in Python) to a no-op spy, then
173+
invoke the *real* public command. All the argument handling runs; the engine does
174+
not, so the test is sub-millisecond.
175+
176+
Mind the proc's preconditions, though. Invoking the real command also runs any
177+
guards that sit *before* the execute call, and many commands require a loaded
178+
design: `global_route` errors `GRT-0051`/`GRT-0052` on a missing tech/block
179+
(`src/grt/src/GlobalRouter.tcl`) and `clock_tree_synthesis` errors `CTS-0103` on
180+
a missing block (`src/cts/src/TritonCTS.tcl`) before their execute calls are ever
181+
reached. So a no-design spy test for those fails on the guard, not on the spy.
182+
Give the test the *minimal* DB the proc's preconditions demand -- a tiny LEF/DEF
183+
or a `SimpleDbFixture`-style block is enough, since the expensive *algorithm*
184+
still never runs. (A command with no such precondition can be spied with no
185+
design loaded at all.) If you instead want to assert
186+
that a precondition guard itself fires, that is a separate, cheaper test: invoke
187+
the command with the precondition unmet and check the error code -- no spy needed,
188+
because the guard errors out before the execute call regardless.
189+
190+
What you assert depends on where the configure logic lives, which varies by
191+
command:
192+
193+
- **Setters, then a separate execute (e.g. `global_route`, `clock_tree_synthesis`).**
194+
The proc translates each flag/key into its own cheap `set_*` SWIG call
195+
(`grt::set_infinite_cap`, `cts::set_insertion_delay`, ...) before the execute
196+
call. Spy *only* the execute; let the setters run for real and assert the
197+
resulting configured state via getters (or spy the individual setters and check
198+
they were called with the right values). This is the most common shape.
199+
- **Arguments forwarded to C++ (e.g. `global_placement`).** The proc passes the
200+
raw key/flag arrays to a C++ command function that parses them itself. This is
201+
the shape the validate-in-C++ policy points toward: parsing *and* `utl::Validator`
202+
checks live in one place that all bindings share, so a C++ unit test on that
203+
parsing/validation is the natural binding check; spying the execute still lets
204+
the proc reach it without running the placer.
205+
- **Configure and execute fused (e.g. `detailed_route`).** A single
206+
`detailed_route_cmd` both marshals its arguments and calls `main()`. Spying it
207+
skips the run, so capture the arguments the spy received and assert them. Better,
208+
split the marshaling (setParams) from execution (main) so the cheap part is
209+
independently reachable -- this is the refactor that makes the command match the
210+
others, and lets a C++ test assert a `getParams()` round-trip directly.
211+
212+
In every case the heavy `main()`/run is off the translation path, so the test
213+
costs nothing at runtime. As with cheap commands, the C++ unit test remains the
214+
source of truth for algorithmic correctness -- do not assert behavior here.
215+
216+
The reusable design principle: **validate arguments in C++, and keep the
217+
expensive execute step as its own thin free function distinct from argument
218+
handling.** Validation in C++ covers Tcl, Python, and C++ callers from one place;
219+
a separate execute step keeps the heavy work off the translation path. Most
220+
commands already separate execute; a fused entry point like `detailed_route_cmd`
221+
is the outlier worth refactoring. Commands built this way are cheap to
222+
binding-test regardless of how expensive their execution is. (Free-function entry
223+
points also matter because a method on a SWIG object is much harder to intercept
224+
than a namespaced proc.)
225+
138226
## Decision tree for a new test
139227

140228
```
141229
Is it pure logic with no DB? -> Tier 1, no fixture
142230
Does it need an odb DB only? -> Tier 1, tst::Fixture / SimpleDbFixture
143231
Does it need STA/resizer/router state? -> Tier 1, tst::IntegratedFixture
144232
Is it only proving a binding marshals? -> Tier 2, smoke test (PASSFAIL)
233+
...and the command is expensive to run? -> Tier 2, intercept the C++ entry (no execution)
145234
Does it genuinely span multiple tools? -> Tier 3, flow test (golden, used sparingly)
146235
```
147236

0 commit comments

Comments
 (0)