Revert "[hipGraph] Add graph signal pool and remove pre/post markers for non-… (#5333)"#5951
Merged
Merged
Conversation
…for non-… (#5333)" This reverts commit f17b057. Reverting due to two issues observed in GraphBench: 1. Resource leak: per-launch GraphSignalPool was deleted via a forward- declared pointer in ~AccumulateCommand, so ~GraphSignalPool never ran and every per-launch HSA signal was leaked ("Resource leak detected by SharedSignalPool, N Signals leaked"). 2. Per-launch host overhead: shifting from the runtime's rotating signal_list_ (which reuses signals after warmup with WaitCurrent + WaitNext) to a per-launch graph signal pool reintroduces N× signal_create + signal_destroy on every hipGraphLaunch, causing measurable regression in graph_bench across straight/paths2/paths4/ full2/full4 topologies. A safer redesign that preserves PR #5333's per-launch isolation while recycling signals (mirroring the runtime pool's WaitCurrent/WaitNext discipline) is needed before re-landing. Co-authored-by: Cursor <cursoragent@cursor.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Reverts the changes from #5333 to restore the prior hipGraph signal/HW-event lifecycle and queue-tracking behavior, addressing the reported resource leak and per-launch overhead regressions in GraphBench.
Changes:
- Removes the per-launch
GraphSignalPoolplumbing (command ownership, device factory methods, and graph propagation). - Restores HW-event lifetime management via
AccumulateCommandreleasing per-device HW events after graph completion. - Reverts
HwQueueTrackergraph-pool special-casing and reintroduces marker-based synchronization around non-captured nodes in segmented graph launches.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| projects/clr/rocclr/platform/command.hpp | Drops graph signal pool fields/APIs from Command/AccumulateCommand. |
| projects/clr/rocclr/platform/command.cpp | Restores AccumulateCommand destructor to always release per-device HW events. |
| projects/clr/rocclr/device/rocm/rocvirtual.hpp | Reverts GetLastSignal/addSystemScope interface changes tied to graph pool plumbing. |
| projects/clr/rocclr/device/rocm/rocvirtual.cpp | Removes graph-pool fast path and restores runtime signal rotation behavior in tracker paths. |
| projects/clr/rocclr/device/rocm/rocdevice.hpp | Removes GraphSignalPool type and related device virtual APIs. |
| projects/clr/rocclr/device/rocm/rocdevice.cpp | Removes GraphSignalPool implementation and graph pool factory helpers. |
| projects/clr/rocclr/device/device.hpp | Removes graph pool-related forward decls and VirtualDevice/Device graph pool virtuals. |
| projects/clr/hipamd/src/hip_graph_internal.hpp | Removes graph pool propagation helpers and cached signal-count members. |
| projects/clr/hipamd/src/hip_graph_internal.cpp | Reverts segmented graph launch back to CreateHwEvents + marker-based sync around non-captured nodes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
saleelk
approved these changes
May 11, 2026
Contributor
|
Overriding as there are no new errors comparing to develop, all issues are reproduced in develop branch CI |
systems-assistant Bot
pushed a commit
to ROCm/clr
that referenced
this pull request
May 12, 2026
=?UTF-8?q?ool=20and=20remove=20pre/post=20markers=20for=20non-=E2=80=A6?= =?UTF-8?q?=20(#5333)"=20(#5951)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ## Summary Reverts #5333 (`f17b05703a`) due to two issues observed in `GraphBench`: 1. **Resource leak.** Per-launch `GraphSignalPool` was deleted via a forward-declared pointer in `~AccumulateCommand`, so `~GraphSignalPool` never ran and every per-launch HSA signal was leaked (`Resource leak detected by SharedSignalPool, N Signals leaked`). 2. **Per-launch host overhead.** Shifting from the runtime's rotating `signal_list_` (which reuses signals after warmup with `WaitCurrent` + `WaitNext`) to a per-launch graph signal pool reintroduces N× `signal_create` + `signal_destroy` on every `hipGraphLaunch`, causing measurable regression in `graph_bench` across `straight`, `paths2`, `paths4`, `full2`, `full4` topologies. A safer redesign that preserves PR #5333's per-launch isolation while recycling signals (mirroring the runtime pool's `WaitCurrent`/`WaitNext` discipline) is needed before re-landing. ## Test plan - [ ] `graph_bench` runs to completion across all five topologies with no `Signals leaked` warning. - [ ] `graph_bench` per-launch numbers match pre-PR-#5333 baseline. - [ ] Existing `hipGraph` unit tests still pass. Made with [Cursor](https://cursor.com) Co-authored-by: Cursor <cursoragent@cursor.com> [rocm-systems] ROCm/rocm-systems#5951 (commit a23a24f)
cmcknigh
pushed a commit
that referenced
this pull request
May 12, 2026
…for non-… (#5333)" (#5951) ## Summary Reverts #5333 (`f17b05703a`) due to two issues observed in `GraphBench`: 1. **Resource leak.** Per-launch `GraphSignalPool` was deleted via a forward-declared pointer in `~AccumulateCommand`, so `~GraphSignalPool` never ran and every per-launch HSA signal was leaked (`Resource leak detected by SharedSignalPool, N Signals leaked`). 2. **Per-launch host overhead.** Shifting from the runtime's rotating `signal_list_` (which reuses signals after warmup with `WaitCurrent` + `WaitNext`) to a per-launch graph signal pool reintroduces N× `signal_create` + `signal_destroy` on every `hipGraphLaunch`, causing measurable regression in `graph_bench` across `straight`, `paths2`, `paths4`, `full2`, `full4` topologies. A safer redesign that preserves PR #5333's per-launch isolation while recycling signals (mirroring the runtime pool's `WaitCurrent`/`WaitNext` discipline) is needed before re-landing. ## Test plan - [ ] `graph_bench` runs to completion across all five topologies with no `Signals leaked` warning. - [ ] `graph_bench` per-launch numbers match pre-PR-#5333 baseline. - [ ] Existing `hipGraph` unit tests still pass. Made with [Cursor](https://cursor.com) Co-authored-by: Cursor <cursoragent@cursor.com>
6 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Reverts #5333 (
f17b05703a) due to two issues observed inGraphBench:GraphSignalPoolwas deleted via a forward-declared pointer in~AccumulateCommand, so~GraphSignalPoolnever ran and every per-launch HSA signal was leaked (Resource leak detected by SharedSignalPool, N Signals leaked).signal_list_(which reuses signals after warmup withWaitCurrent+WaitNext) to a per-launch graph signal pool reintroduces N×signal_create+signal_destroyon everyhipGraphLaunch, causing measurable regression ingraph_benchacrossstraight,paths2,paths4,full2,full4topologies.A safer redesign that preserves PR #5333's per-launch isolation while recycling signals (mirroring the runtime pool's
WaitCurrent/WaitNextdiscipline) is needed before re-landing.Test plan
graph_benchruns to completion across all five topologies with noSignals leakedwarning.graph_benchper-launch numbers match pre-PR-[hipGraph] Add graph signal pool and remove pre/post markers for non-… #5333 baseline.hipGraphunit tests still pass.Made with Cursor