grt: bound soft-NDR overflow-loop restarts (#8466)#10710
grt: bound soft-NDR overflow-loop restarts (#8466)#10710saurav-fermions wants to merge 2 commits into
Conversation
When global routing finds NDR nets sitting on congested edges, it disables the NDR (GRT-0273) and restarts the entire congestion (overflow) iteration loop. With low overflow this is done one net at a time, so N congested NDR nets cost ~N full overflow loops. After CTS started applying a half-pitch NDR to every clock net by default (5ca1833), designs with many clock nets whose NDR cannot be honored hit this path repeatedly, turning a ~3 minute global route into ~20 hours. Bound the number of soft-NDR-triggered loop restarts. Once the budget is exhausted, disable every remaining congested NDR net in a single batch so the loop makes guaranteed progress and terminates, making the restart count independent of the number of NDR nets. Behavior for cases that converge within the budget (e.g. existing grt regression tests) is unchanged; the CTS half-NDR feature is left intact. Signed-off-by: Saurav Singh <saurav.singh@fermions.co>
There was a problem hiding this comment.
Welcome to OpenROAD! Thanks for opening your first PR.
Before we review:
- Contribution Guide: https://openroad.readthedocs.io/en/latest/contrib/contributing.html
- Build Instructions: https://openroad.readthedocs.io/en/latest/contrib/BuildWithCMake.html
Please ensure:
- CI passes
- Code is properly formatted
- Tests are included where applicable
A maintainer will review shortly!
There was a problem hiding this comment.
Code Review
This pull request addresses a runtime regression (issue #8466) caused by repeatedly restarting the overflow loop when disabling NDR nets one-by-one. It introduces a budget (max_soft_ndr_resets) to limit individual restarts, after which all remaining congested NDR nets are disabled in a single batch. The reviewer suggested using constexpr for the maximum reset constant and optimizing the loop that collects congested NDR nets by storing the returned vector locally and reserving capacity in the target vector to avoid unnecessary copies and reallocations.
| // number of restarts; once exceeded we disable every remaining congested | ||
| // NDR net in a single batch instead of one-per-restart. | ||
| int soft_ndr_resets = 0; | ||
| const int max_soft_ndr_resets = 4; |
There was a problem hiding this comment.
Define tunable parameters as named constants instead of using hardcoded magic numbers. Use constexpr instead of const for compile-time constants to ensure they are evaluated at compile time.
| const int max_soft_ndr_resets = 4; | |
| constexpr int max_soft_ndr_resets = 4; |
References
- Define tunable parameters as named constants instead of using hardcoded magic numbers.
| for (const auto& ndr : graph2d_.getCongestedNDRnets()) { | ||
| net_ids.push_back(ndr.net_id); | ||
| } |
There was a problem hiding this comment.
Calling graph2d_.getCongestedNDRnets() directly in the loop condition can be inefficient because getCongestedNDRnets() returns std::vector<NDRCongestion> by value, which copies the vector. Storing it in a local variable avoids any potential lifetime issues and allows us to call reserve() on net_ids to prevent multiple reallocations during the loop. Prefer this local fix over changing the method's return type to avoid breaking changes for callers.
| for (const auto& ndr : graph2d_.getCongestedNDRnets()) { | |
| net_ids.push_back(ndr.net_id); | |
| } | |
| const auto congested_ndrs = graph2d_.getCongestedNDRnets(); | |
| net_ids.reserve(congested_ndrs.size()); | |
| for (const auto& ndr : congested_ndrs) { | |
| net_ids.push_back(ndr.net_id); | |
| } |
References
- When fixing a const-correctness issue inside a const method, prefer local fixes over changing the method's return type to const to avoid breaking changes for callers.
|
@saurav-fermions I think the feature is fine. But I'm not sure Recently, I improved the conditions for applying soft-NDR. Instead of always applying it at the end of congestion iterations, we do it in the middle if there is a stagnation in the minimum overflow. But I can see this can fail in some specific scenarios |
Make max_soft_ndr_resets constexpr and hoist the batch-disable getCongestedNDRnets() result into a local with reserve() to avoid re-evaluation and vector reallocations. Addresses gemini-code-assist review on The-OpenROAD-Project#10710. Signed-off-by: Saurav Singh <saurav.singh@fermions.co>
|
Thanks. I ran the cap sweep (0, 1, 2, 4, 8, 16, and unbounded): runtime tracks the number of full overflow-loop restarts, so unbounded does roughly one disable per restart and blows up, while a small cap (0–4) gives the lowest runtime with no wirelength/via/congestion penalty — higher caps just cost more time for no quality gain, so 4 looks like a reasonable default. I couldn't run Nangate45 mempool locally (not available in my tree), so I reproduced the soft-NDR storm on a smaller wide-NDR stress design; happy to rerun on mempool if you can point me at the flow. I also made |
Summary
Bound the number of soft-NDR-triggered restarts of the global-routing
overflow-removal loop. The CTS default flip to half-NDR-on-clock-nets
(
5ca183346a) attaches a non-default routing rule to every clock net; GRT thendisables them one at a time on congestion (GRT-0273) and restarts its entire
congestion/overflow iteration loop after each disable. With many clock-NDR
nets this is O(N) full overflow loops, turning a ~3-minute global route into
~20 hours. This caps the soft-NDR restart count; the CTS feature is left intact
(not reverted).
Type of Change
Impact
Eliminates the runtime blow-up while preserving routing behavior. Measured on a
Nangate45 stress design with wide NDR on 150 nets: 19.1s → 4.4s, soft-NDR
restarts 1001 → 487. No change to the default (no-NDR) flow.
Verification
ctest -R '^grt\.'132/132 (rebuilt from source).Related Issues
Fixes #8466
Developed with SAIGE, Fermions' autonomous RTL/EDA debugging agent; root-caused, tested, and signed off by the submitter (@saurav-fermions).