[CUDA Plugin EP] Add per-node attribution and explicit GPU→ORT event linkage to profiler by tianleiwu · Pull Request #28614 · microsoft/onnxruntime

tianleiwu · 2026-05-21T19:27:24Z

Summary

Implements explicit GPU→ORT event linkage and per-node attribution in the CUDA Plugin EP profiler, resolving the two known profiling limitations tracked in the plugin-EP gap analysis.

Motivation

Previously, GPU kernel events emitted by the plugin EP profiler carried only CUPTI metadata (`stream`, `grid_*`, `block_*`). Consumers had to correlate GPU activity with ORT graph nodes by timestamp proximity, a heuristic that breaks down under concurrent execution, when kernels from different nodes can overlap in time. This PR wires up the `StopEvent` callback to capture node identity from `OrtProfilingEvent` and stamps it onto GPU events during `EndProfiling`.

Key Changes

Plugin Profiler Header (`cuda_profiler_plugin.h`)

- Added an `OrtNodeInfo` struct holding `event_name`, `op_name`, and `node_index`
- Added `std::mutex node_info_mutex_` and `std::unordered_map<uint64_t, OrtNodeInfo> correlation_to_node_` to `CudaPluginEpProfiler`
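The header additions amount to a small value struct plus a lock-guarded hash map. The Python sketch below models that shape; it is not the PR's code. `threading.Lock` stands in for `std::mutex`, a `dict` stands in for `std::unordered_map<uint64_t, OrtNodeInfo>`, and the class and method names (`CorrelationToNodeMap`, `insert`, `take`) are hypothetical. `take` models the swap-under-lock that `EndProfilingImpl` performs.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class OrtNodeInfo:
    """Python stand-in for the C++ OrtNodeInfo struct (field names from the PR)."""
    event_name: str
    op_name: str
    node_index: str  # assumption: kept as the string arg value read from the event


class CorrelationToNodeMap:
    """Hypothetical model of node_info_mutex_ + correlation_to_node_."""

    def __init__(self) -> None:
        self._lock = threading.Lock()            # plays the role of std::mutex
        self._map: dict[int, OrtNodeInfo] = {}  # plays the role of std::unordered_map

    def insert(self, correlation_id: int, info: OrtNodeInfo) -> None:
        # StopEvent side: every writer takes the lock before touching the map.
        with self._lock:
            self._map[correlation_id] = info

    def take(self) -> dict[int, OrtNodeInfo]:
        # EndProfiling side: swap the contents out under the lock so the caller
        # can iterate the returned dict without holding it.
        with self._lock:
            taken, self._map = self._map, {}
        return taken
```

Because `take` hands back the old dictionary and installs a fresh one, the lock is held only for the swap, not for the whole annotation pass.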

Plugin Profiler Implementation (`cuda_profiler_plugin.cc`)

- `StopEventImpl`: for `OrtProfilingEventCategory_NODE` events, reads the event name and the `op_name`/`node_index` args via the `OrtEpApi::ProfilingEvent_*` accessors and inserts them into the correlation→node map under the mutex. Accessor failures are non-fatal: the returned `OrtStatus*` is released and processing continues. The CUPTI pop always executes, whether or not the metadata reads succeed.
- `EndProfilingImpl`: swaps the map out under the mutex so it can be iterated without holding the lock. For each GPU event, always appends `ort_correlation_id`; on a map hit, also appends `ort_event_name`, `ort_op_name`, and `ort_node_index`.
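A rough Python model of the two hooks, assuming events are plain dicts: `stop_event` keeps only NODE-category events, treats a failed metadata read as non-fatal, and runs the CUPTI pop in a `finally` block so it executes on every path; `annotate` stamps `ort_correlation_id` on every GPU event and adds the three attribution keys only on a map hit. All names here (`stop_event`, `read_node_info`, `annotate`, `pop_cupti_correlation`, the `correlation_id` field on GPU events, the `"NODE"` category string) are hypothetical, arg values are assumed to be strings, and locking is left out to keep the focus on the data flow.

```python
from dataclasses import dataclass
from typing import Callable, Optional

NODE = "NODE"  # hypothetical stand-in for OrtProfilingEventCategory_NODE


@dataclass(frozen=True)
class OrtNodeInfo:
    event_name: str
    op_name: str
    node_index: str


def read_node_info(event: dict) -> Optional[OrtNodeInfo]:
    """Hypothetical accessor wrapper: returns None on a failed read (non-fatal)."""
    try:
        args = event["args"]
        return OrtNodeInfo(event["name"], args["op_name"], args["node_index"])
    except KeyError:
        return None


def stop_event(correlation_id: int, event: dict, node_map: dict,
               pop_cupti_correlation: Callable[[], None]) -> None:
    try:
        # Only graph-node executions populate the map.
        if event.get("category") == NODE:
            info = read_node_info(event)
            if info is not None:
                node_map[correlation_id] = info
    finally:
        # The CUPTI pop runs on every path, even if metadata capture failed.
        pop_cupti_correlation()


def annotate(gpu_events: list, snapshot: dict) -> None:
    for ev in gpu_events:
        cid = ev["correlation_id"]  # hypothetical field holding the ORT correlation ID
        args = ev.setdefault("args", {})
        args["ort_correlation_id"] = str(cid)  # always emitted, even on a map miss
        info = snapshot.get(cid)
        if info is not None:  # attribution keys are added only on a map hit
            args["ort_event_name"] = info.event_name
            args["ort_op_name"] = info.op_name
            args["ort_node_index"] = info.node_index
```

Existing CUPTI keys such as `stream` are left untouched; the `ort_` prefix keeps the added keys from colliding with them.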

Python Test (`test_cuda_plugin_ep.py`)

- Extended `_run_profiling_test()` to assert:
  - Every GPU `Kernel` event carries a numeric `ort_correlation_id`
  - `ort_event_name`/`ort_op_name`/`ort_node_index` appear together as a group when present
  - At least one attributed event maps to `MatMul` (the test model's op)
- Preserved the graceful skip when CUPTI is unavailable
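The three assertions can be expressed as a standalone checker over a loaded trace. This sketch assumes the profile is a Chrome-trace-style JSON array whose GPU kernel events have `"cat": "Kernel"` and carry their annotations in `args`. The function name and the acceptance of either a JSON number or a digit string for `ort_correlation_id` are assumptions, not the actual test code.

```python
ATTRIBUTION_KEYS = ("ort_event_name", "ort_op_name", "ort_node_index")


def _is_numeric(value) -> bool:
    # Accept a JSON integer or a string of digits; how the value is
    # serialized is an assumption in this sketch.
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    return isinstance(value, str) and value.isdigit()


def check_kernel_attribution(trace: list, expected_op: str = "MatMul") -> int:
    """Return the number of attributed Kernel events; raise AssertionError on a violation."""
    attributed = 0
    matched_op = False
    for ev in trace:
        if ev.get("cat") != "Kernel":
            continue
        args = ev.get("args", {})
        # Every Kernel event must carry a numeric ort_correlation_id.
        assert _is_numeric(args.get("ort_correlation_id")), ev
        # The attribution keys must be all present or all absent.
        present = [k in args for k in ATTRIBUTION_KEYS]
        assert all(present) or not any(present), ev
        if all(present):
            attributed += 1
            matched_op = matched_op or args["ort_op_name"] == expected_op
    # At least one attributed event must map to the expected op.
    assert matched_op, f"no Kernel event attributed to {expected_op}"
    return attributed
```

A profile file can be checked with `check_kernel_attribution(json.load(f))` after opening it.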

Design Doc (`cuda_plugin_ep_design.md` §14)

- Updated §14.4 to mention the annotation pass
- Rewrote the §14.5 table rows for StopEvent metadata and GPU→ORT linkage
- Added a new §14.6 "Per-Node Attribution" (map lifecycle, NODE-only rationale, worked JSON example)
- Renumbered Build Configuration → §14.7 and Files → §14.8

Design Decisions

- No new ORT C API surface: reuses the 1.25 `OrtEpApi::ProfilingEvent_*` accessors
- NODE-only filter: only graph-node executions populate the map, so `ort_op_name` always refers to an actual ONNX op
- `ort_*`-prefixed arg keys: avoid collisions with existing CUPTI arg names
- Always emit `ort_correlation_id`: provides explicit linkage even on a map miss
- `std::mutex` + `std::unordered_map`: simple and correct, with low contention because `StopEvent` calls are serialized per node and `EndProfiling` is single-threaded
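One consequence of always emitting `ort_correlation_id` under `ort_*`-prefixed keys is that a trace consumer can group GPU work per ORT node execution by key alone, with no timestamp heuristic, and still account for kernels whose attribution lookup missed. A minimal sketch, assuming the Chrome-trace event shape with a `dur` field and assuming that all kernels launched during one node execution share its correlation ID; the function name is hypothetical.

```python
from collections import defaultdict


def kernel_time_by_correlation(trace: list) -> dict:
    """Sum Kernel durations per ort_correlation_id, labeled with ort_op_name when present."""
    totals = defaultdict(int)
    labels = {}
    for ev in trace:
        if ev.get("cat") != "Kernel":
            continue
        args = ev.get("args", {})
        cid = str(args["ort_correlation_id"])  # present even on a map miss
        totals[cid] += int(ev.get("dur", 0))
        # The attribution keys come as a group, so ort_op_name implies the others.
        if "ort_op_name" in args:
            labels[cid] = args["ort_op_name"]
    # Unattributed correlation IDs are still reported, just without an op label.
    return {cid: (labels.get(cid, "<unattributed>"), dur) for cid, dur in totals.items()}
```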

Testing Notes

- Build with `onnxruntime_ENABLE_CUDA_PROFILING=ON`
- Run: `python -m pytest onnxruntime/test/python/transformers/test_cuda_plugin_ep.py -k test_session_profiling -v`
- With CUPTI available, verify that the JSON output contains `ort_correlation_id` on all `Kernel` events and `ort_op_name=MatMul` on at least one

…age to profiler

Wire up the StopEvent callback to read NODE-category ORT profiling events via the 1.25 `OrtEpApi::ProfilingEvent_*` accessors, capturing the event name, `op_name` and `node_index` into a correlation → `OrtNodeInfo` map. In EndProfiling, stamp every GPU event with `ort_correlation_id` (always) and `ort_event_name` / `ort_op_name` / `ort_node_index` (when the map lookup hits).

This resolves the two known limitations in the CUDA Plugin EP profiler:
- GPU→ORT event linkage was implicit (timestamp proximity only)
- No per-node attribution for GPU kernel events

No new ORT C API surface is required.

Copilot

Pull request overview

This PR enhances the CUDA Plugin EP profiling output by adding explicit GPU→ORT linkage and best-effort per-node attribution, so downstream trace consumers can reliably associate CUPTI-recorded GPU kernel events with the originating ORT graph node without relying on timestamp proximity.

Changes:

- Capture NODE-category ORT profiling metadata (`event_name`, `op_name`, `node_index`) at `StopEvent` time and store it keyed by ORT correlation ID.
- During `EndProfiling`, annotate every GPU Kernel event with `ort_correlation_id`, and additionally attach `ort_event_name`/`ort_op_name`/`ort_node_index` when attribution metadata is available.
- Extend the Python profiling test and update the CUDA Plugin EP design doc to reflect the new annotation pass and attribution behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `onnxruntime/test/python/transformers/test_cuda_plugin_ep.py` | Adds assertions that GPU Kernel events include `ort_correlation_id` and validates grouped per-node attribution fields when present. |
| `onnxruntime/core/providers/cuda/plugin/cuda_profiler_plugin.h` | Introduces `OrtNodeInfo` plus a mutex-protected correlation→node metadata map in the CUDA Plugin EP profiler. |
| `onnxruntime/core/providers/cuda/plugin/cuda_profiler_plugin.cc` | Implements metadata capture in `StopEventImpl` and annotates CUPTI-derived GPU events in `EndProfilingImpl`. |
| `docs/cuda_plugin_ep/cuda_plugin_ep_design.md` | Documents the annotation pass, explicit linkage, and the per-node attribution map lifecycle with an example trace record. |


tianleiwu requested a review from Copilot May 21, 2026 21:22

Copilot started reviewing on behalf of tianleiwu May 21, 2026 21:23

Copilot AI reviewed May 21, 2026

tianleiwu wants to merge 1 commit into `main` from `tlwu/cuda_plugin_ep_profiling_ort_id`
