Add Kahn's algorithm as parallel BFS traversal option in visitParallel by openroad-ci · Pull Request #351 · The-OpenROAD-Project/OpenSTA

openroad-ci · 2026-04-21T18:42:30Z

Replaces the per-level barrier model in BfsIterator::visitParallel with
Kahn's topological traversal for both forward (arrival) and backward
(required-time) propagation. Enabled by default via use_kahns_ flag;
original level-based BFS retained as a fallback.

The original visitParallel processes vertices one level at a time with a
finishTasks() barrier between levels. Uneven level populations leave
threads idle at barriers. Kahn's removes that: a vertex becomes eligible
for processing as soon as all its predecessors are done, so vertices at
different levels execute concurrently.

Key changes:

include/sta/Bfs.hh: Add kahnForEachSuccessor virtual (forward out-edges,
backward in-edges), persistent KahnState, use_kahns_ toggle,
kahn_pred_ (separate from search_pred_), resetLevelBounds helper.
search/Bfs.cc: Two-stage Kahn's implementation -- single-threaded
discovery with flat in-degree array indexed by vertex ID, then
recursive-dispatch parallel traversal (each ready vertex dispatched
as its own DispatchQueue task; single finishTasks at end, no batch
barriers). KahnState arrays persist across calls to avoid
re-allocation.
search/Search.cc: Wire search_adj_ as kahn_pred_ on arrival and
required iterators in the Search constructor.

Bugs caught by regressions and fixed:

Vertex IDs can exceed vertexCount() after cell deletions
(ObjectTable block-based IDs). Fix: dynamic array growth during
discovery + bounds checks. Caught by rmp.gcd_restructure.
Arrival iterator is constructed with search_pred_ = nullptr -- the
original BFS never uses it, the visitor provides its own filter.
Kahn's discovery needed the filter upfront. Fix: separate kahn_pred_
wired in Search constructor; falls back to original BFS if null.
Caught by rmp.gcd_restructure.

… level. version 1 Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

dsengupta0628 · 2026-04-21T18:47:34Z

Document to be deleted- currently kept only to track the changes

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

maliberty · 2026-04-21T21:16:09Z

@codex review

maliberty · 2026-04-21T21:16:44Z

/gemini review

gemini-code-assist

Code Review

This pull request implements Kahn's algorithm for topological traversal as an alternative to the existing level-based parallel BFS in OpenSTA. The goal is to eliminate per-level barriers and improve thread utilization during timing analysis. The implementation includes a functional specification, Tcl plumbing for a new design-level toggle (sta_use_kahns_bfs), and the core logic within the BfsIterator class. Feedback focuses on potential performance bottlenecks: the fine-grained scheduling of individual vertices into the dispatch queue may introduce excessive overhead, and the current visitor implementation performs redundant enqueuing into the level-based queue that is unnecessary when Kahn's algorithm is active.

gemini-code-assist · 2026-04-21T21:21:27Z

+              dispatch_queue_->dispatch([&process, succ](size_t t) {
+                process(succ, t);
+              });


Dispatching every single ready vertex as a separate task into the DispatchQueue can lead to significant overhead and lock contention, especially for large designs with many vertices having low in-degrees. This fine-grained scheduling might negate the performance benefits of eliminating per-level barriers. Consider processing a chain of ready vertices in the current thread or batching ready vertices before dispatching to amortize the task management cost.

gemini-code-assist · 2026-04-21T21:21:27Z

+      process = [&, bfs_index, pred, in_deg_size](Vertex *vertex,
+                                                  size_t tid) {
+        vertex->setBfsInQueue(bfs_index, false);
+        visitors[tid]->visit(vertex);
+        total_visited.fetch_add(1, std::memory_order_relaxed);
+        kahnForEachSuccessor(vertex, pred, [&](Vertex *succ) {
+          VertexId sid = graph_->id(succ);
+          if (sid < in_deg_size && in_deg[sid] >= 0) {
+            int prev = in_degree[sid]
+                .fetch_sub(1, std::memory_order_acq_rel);
+            if (prev == 1) {
+              // Successor is now ready -- dispatch immediately.
+              dispatch_queue_->dispatch([&process, succ](size_t t) {
+                process(succ, t);
+              });
+            }
+          }
+        });
+      };


The ArrivalVisitor::visit method (called via visitors[tid]->visit(vertex) at line 391) unconditionally calls search_->arrivalIterator()->enqueueAdjacentVertices(...). When Kahn's algorithm is active, this enqueues successors into the level-based queue_, which is redundant because Kahn's Stage 3 already handles successor propagation via kahnForEachSuccessor. These redundant enqueue calls involve mutex locking and vector operations, adding unnecessary overhead. Furthermore, the queue_ is cleared at the end of visitParallel (line 427), meaning this work is entirely wasted. Consider gating the enqueueAdjacentVertices call in the visitor when Kahn's is enabled.

The enqueueAdjacentVertices call in ArrivalVisitor::visit is not unconditional — it is guarded by always_to_endpoints_ || arrivals_changed and is skipped entirely when arrivals do not change. The call also does not generally acquire a mutex: BfsIterator::enqueue runs an unlocked !vertex->bfsInQueue(bfs_index_) check up front, and in Kahn's mode every active vertex has that flag set during Phase 1 discovery, so for active successors enqueue returns on the flag read without acquiring queue_lock_. And the cleanup in visitParallel only clears levels up to to_level -- entries beyond that are deliberately preserved for subsequent calls, so enqueues onto those levels are not wasted.

That said, you are right that some redundant work is paid on the Kahn's path. For each visited vertex, enqueueAdjacentVertices iterates out-edges and evaluates adj_pred_->searchThru/searchTo on each;
Kahn's Stage 3 then iterates the same out-edges again via kahnForEachSuccessor. The successor enqueue calls are fast-path no-ops for active successors, but the edge iteration and predicate evaluation cost
is paid twice. For a ~100K-vertex graph with out-degree ~3, that is on the order of tens of milliseconds per visitParallel. We are not applying the gate today because (a) it has to be designed not to drop beyond-to_level enqueues, which are correct behavior and required for incremental correctness; (b) the cleanest version is a scoped in_kahns_traversal_ flag plus a check on the successor's bfsInQueue state, which adds complexity in exchange for the edge-iteration and predicate-evaluation savings only; and (c) I am yet to profile to show this as a hot spot. Keeping the visitor oblivious to which BFS algorithm is above it preserves a useful abstraction boundary -- the visitor is shared with the non-Kahn's path.

If profiling on larger designs surfaces this as material wall-time overhead, the targeted fix is well-scoped and we should revisit. This is consistent with the roadmap items already documented in the spec for reducing per-vertex overhead.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 907d5f8c64

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-21T21:22:22Z

  check_crpr_(new CheckCrpr(this))
 {
  initVars();
+  arrival_iter_->setKahnPred(search_adj_);


Preserve loop-aware predicate for Kahn arrival discovery

Use a loop-aware predicate when enabling Kahn traversal for arrivals: arrival_iter_->setKahnPred(search_adj_) binds a SearchAdj that was constructed with tag_bldr_ == nullptr, but SearchAdj::searchThru() only re-enables disabled loop edges via hasPendingLoopPaths() when a tag builder is present. In multi-threaded runs where sta_use_kahns_bfs is enabled (default) and sta_dynamic_loop_breaking is on, Kahn discovery/propagation will skip loop-qualified edges that the original ArrivalVisitor path (adj_pred_) can traverse, which can leave parts of the valid arrival cone unvisited and produce incorrect arrival/slack updates.

Useful? React with 👍 / 👎.

Hmm this is a pretty narrow scoped issue, but a bu nonetheless. We cannot use Kahn's BFS when sta_dynamic_loop_breaking is on. This feature relies on arrival tags that only emerge during propagation to decide whether an otherwise-disabled loop edge can be traversed. Kahn's needs the active subgraph and in-degrees known before propagation begins, so it cannot consult those tags. To avoid silently missing vertices, visitParallel should guard the Kahn's path with an explicit check on variables_->dynamicLoopBreaking() and falls back to the original level-based BFS whenever dynamic loop breaking is active.

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

dsengupta0628 · 2026-04-22T01:09:57Z

@codex review

chatgpt-codex-connector · 2026-04-22T01:10:02Z

To use Codex here, create a Codex account and connect to github.

dsengupta0628 · 2026-04-22T01:10:23Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces Kahn's algorithm for topological traversal as an alternative to the existing level-based parallel BFS in OpenSTA. The goal is to eliminate per-level synchronization barriers, allowing threads to process vertices as soon as their dependencies are met. The implementation includes a functional specification, persistent state management to minimize allocation overhead, and Tcl integration for runtime toggling. Feedback suggests further optimizing performance by moving the active_vertices vector into the persistent KahnState to avoid repeated allocations during frequent incremental updates.

gemini-code-assist · 2026-04-22T01:14:32Z

+      kahn_state_->resetPrevious();
+
+      std::vector<int> &in_deg = kahn_state_->in_degree_init;
+      std::vector<Vertex*> active_vertices;


The active_vertices vector is re-allocated on every call to visitParallel. For frequent incremental updates with small active sets, this overhead can be avoided by moving the vector into the persistent KahnState struct, similar to in_degree_init and prev_ids.

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

maliberty · 2026-04-22T13:30:45Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2daf7a9bb7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-22T13:41:48Z

+      level = saved_first;
+      while (levelLessOrEqual(level, saved_last)
+             && levelLessOrEqual(level, to_level)) {
+        queue_[level].clear();
+        incrLevel(level);


Preserve visitor-enqueued vertices when clearing Kahn levels

After Kahn traversal finishes, this loop clears every queued level in the original [saved_first, saved_last] window without checking whether new vertices were enqueued during visitor->visit(). ArrivalVisitor::enqueueRefPinInputDelays() can enqueue vertices that are not part of the precomputed Kahn active set, so they are removed here before being visited; because queue_[level].clear() does not reset bfsInQueue, those vertices can also remain permanently marked as in-queue and be skipped by future enqueue() calls. This can drop required arrival propagation in designs using reference-pin input delays.

Useful? React with 👍 / 👎.

I was able to create a regression to catch this issue too. Implemented a fix now

Post-Kahn cleanup was queue_[level].clear(), which dropped pointers enqueued during Stage 2 by ArrivalVisitor::enqueueRefPinInputDelays without clearing their bfsInQueue flag - leaving it stuck at true and silently no-op'ing every subsequent enqueue() for those vertices. Fix: replaced clear() with a filter that keeps entries whose bfsInQueue is still true (extracted into helper BfsIterator::dropProcessedEntries), so ref-pin-enqueued vertices survive to the next call as seeds.

chatgpt-codex-connector · 2026-04-22T13:41:48Z

+        kahnForEachSuccessor(vertex, kahn_pred_,
+                             [&](Vertex *succ) {
+          if (!levelLessOrEqual(succ->level(), to_level))
+            return;
+          VertexId sid = graph_->id(succ);


Honor clks_only frontier when discovering Kahn active vertices

The Kahn discovery phase unconditionally follows graph successors, but findClkArrivals() relies on ArrivalVisitor with clks_only=true to stop propagation at register clock endpoints (it postpones fanout instead of enqueueing it). Because discovery runs before visitor logic and ignores that stop condition, fanout beyond reg clock pins is still added to active_vertices and visited in this call, which can turn a clock-only update into broad/full arrival propagation and regress runtime/state behavior for clock-latency flows.

Useful? React with 👍 / 👎.

Level-BFS stops at register CK via postponeClkFanouts, but Kahn's Stage 1 walked successors unconditionally, discovering and visiting the full data fanout past every reg CK on clock-only passes (findClkArrivals).

Fix: added virtual VertexVisitor::stopDiscoveryAtRegClk() returning false by default, overridden in ArrivalVisitor to return clks_only_. When true, Stage 1 discovery continues past vertex->isRegClk() (and Stage 2 worker mirrors the skip for efficiency), narrowing Kahn's active set to match level-BFS exactly - Kahn's itself still runs, only the reg-CK fanout is excluded.

…stead of immediate dispatch-when-ready Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

…l].clear() which dropped vertices that ArrivalVisitor::enqueueRefPinInputDelays had enqueued during Stage 2 without clearing their bfsInQueue flag leaving it stuck at true and permanently suppressing all future enqueue() calls for those vertices. Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

…marked as in-queue and be skipped by future enqueue() Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

jhkim-pii · 2026-04-22T22:02:54Z

How about using .md format for this document?

I had this initially but moved to txt as I was using Microsoft Copilot to create the func spec from that. But now that I am using Claude Cowork to make the spec, I will make it into md again.

jhkim-pii · 2026-04-22T22:08:38Z

@dsengupta0628 Looks like a big changes on STA core algorithm.
Are you starting the source code differentiation from the upstream STA now?

dsengupta0628 · 2026-04-22T22:47:22Z

@dsengupta0628 Looks like a big changes on STA core algorithm. Are you starting the source code differentiation from the upstream STA now?

It is just a different type of BFS, but sits on the same BFS iterator- so not too big a difference in architecture. Tom is also getting feedback from James regarding this. If he agrees, we can push this upstream too. If not, we just need to merge carefully in the future.

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

…it_skip Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

dsengupta0628 · 2026-04-27T13:23:05Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7979ce01d9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-27T13:32:31Z

+          pred_changed[vid].store(in_deg[vid] == 0 ? 1 : 0,
+                                  std::memory_order_relaxed);


Preserve seed invalidations when visit-skip is enabled

The visit-skip initialization currently keys only on in_deg == 0, so a vertex that was explicitly seeded (queued as invalid) but has an active-set predecessor starts with pred_changed=0 and can be skipped entirely if that predecessor reports no change. In incremental/batched edits that invalidate both a fanin and its fanout, this drops the mandatory recomputation of the fanout and can leave stale timing state. Seed membership needs to be tracked independently of in-degree so all original queued seeds are forced to run visit().

Useful? React with 👍 / 👎.

dsengupta0628 added 6 commits April 15, 2026 19:27

Kahn's BFS iinitial implementation- still uses batch, but better than…

0f939c2

… level. version 1 Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

dispatch when ready

c31983a

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

merged from latest OSTA master

6bf6e84

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

update to remove levelfinished call which was a dead code

a5b6c78

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

update doc of spec- to be deleted before code is merged

6b2b5e2

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

update doc of spec - to be deleted before code is merged

1942547

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

openroad-ci assigned dsengupta0628 Apr 21, 2026

dsengupta0628 requested review from maliberty, precisionmoon and tspyrou April 21, 2026 18:47

enable/disable using set sta_use_kahns_bfs 1 or 0

907d5f8

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

gemini-code-assist Bot reviewed Apr 21, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Apr 21, 2026

View reviewed changes

dsengupta0628 added 2 commits April 21, 2026 21:56

disable Kahns when dynamic_loop_breaking is enabled

100c885

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

remove design specific info from doc

2daf7a9

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

gemini-code-assist Bot reviewed Apr 22, 2026

View reviewed changes

add some debug info for dispatch queue and regression

d53029a

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

chatgpt-codex-connector Bot reviewed Apr 22, 2026

View reviewed changes

dsengupta0628 added 5 commits April 22, 2026 14:24

address variable persistence-gemini fdbk, add local batch dispatch in…

a530706

…stead of immediate dispatch-when-ready Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

fix to address codex review regarding vertices remaining permanently …

a062f2b

…marked as in-queue and be skipped by future enqueue() Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

fix for walk past reg/ck for clk-only pass

2368afc

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

updated doc to include testing strategies with vtune

9356d4b

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

updated doc to remove blocked content from doc

bc3791b

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

dsengupta0628 requested a review from jhkim-pii April 22, 2026 21:30

jhkim-pii reviewed Apr 22, 2026

View reviewed changes

dsengupta0628 added 3 commits April 23, 2026 15:22

update doc

bea4533

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

add the arrivals unchanged optimization under a TCL variable kahn_vis…

8a42d2f

…it_skip Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

size-4 shard dispatch for spillover vertices and update doc

7979ce0

Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>

dsengupta0628 requested review from ahmed532 and povik April 24, 2026 14:53

chatgpt-codex-connector Bot reviewed Apr 27, 2026

View reviewed changes

dsengupta0628 closed this May 6, 2026

		pred_changed[vid].store(in_deg[vid] == 0 ? 1 : 0,
		std::memory_order_relaxed);

Uh oh!

Conversation

openroad-ci commented Apr 21, 2026

Uh oh!

dsengupta0628 commented Apr 21, 2026

Uh oh!

maliberty commented Apr 21, 2026

Uh oh!

maliberty commented Apr 21, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

dsengupta0628 Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

dsengupta0628 Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

dsengupta0628 commented Apr 22, 2026

Uh oh!

chatgpt-codex-connector Bot commented Apr 22, 2026

Uh oh!

dsengupta0628 commented Apr 22, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

maliberty commented Apr 22, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

dsengupta0628 Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

dsengupta0628 Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

dsengupta0628 Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

jhkim-pii Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

dsengupta0628 Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

jhkim-pii commented Apr 22, 2026

Uh oh!

dsengupta0628 commented Apr 22, 2026

Uh oh!

dsengupta0628 commented Apr 27, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment