Use an intercepter interface to allow GPU trace matching AFTER symbolization by gnurizen · Pull Request #212 · parca-dev/opentelemetry-ebpf-profiler

gnurizen · 2026-02-20T19:24:41Z

Symbolize GPU traces before kernel timing fixup

GPU samples can sit as raw traces for a while waiting for the fixer to
match them with GPU timing information. During this time, pointers in
the raw traces can go stale because a functional program may
garbage-collect its activation records. Avoid this by symbolizing
traces before parking them in the fixer maps.

This has the nice side effect of removing some channel indirection:
traces now go straight into the fixer maps and, once matched, they
go straight to ReportTraceEvent.

Move CUDA symbolization earlier in the pipeline: ConvertTrace now
handles CUDA frames directly, and parcagpu.Start returns a
TraceInterceptor instead of a filtered channel. The interceptor
diverts symbolized CUDA traces into the GPU fixer post-ConvertTrace,
and completed traces (with timing and kernel name) are reported
directly. This eliminates the Symbolize method on the CUDA
interpreter in favor of demangling in prepTrace.
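
The parking-and-matching step described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the type names (`SymbolizedTrace`, `KernelTiming`, `CompletedTrace`, `Fixer`), the use of a correlation ID as the map key, and the FNV hash are all assumptions made for the example. It only shows the idea that an already-symbolized trace waits in a map until its timing record arrives, at which point the kernel name is attached, the hash is recomputed, and the completed trace is handed back for reporting.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"time"
)

// SymbolizedTrace is a hypothetical stand-in for a trace that has already
// been through ConvertTrace, so it holds names rather than raw pointers.
type SymbolizedTrace struct {
	CorrelationID uint32
	Frames        []string
	Hash          uint64
}

// KernelTiming is a hypothetical timing record from the GPU side.
type KernelTiming struct {
	CorrelationID uint32
	KernelName    string
	Duration      time.Duration
}

// CompletedTrace is a symbolized trace with timing attached.
type CompletedTrace struct {
	Trace    SymbolizedTrace
	Duration time.Duration
}

// Fixer parks whichever half arrives first, keyed by correlation ID.
type Fixer struct {
	traces  map[uint32]*SymbolizedTrace
	timings map[uint32]*KernelTiming
}

func NewFixer() *Fixer {
	return &Fixer{
		traces:  make(map[uint32]*SymbolizedTrace),
		timings: make(map[uint32]*KernelTiming),
	}
}

// hashFrames recomputes a trace hash from frame names (FNV-1a is an
// arbitrary choice for this sketch).
func hashFrames(frames []string) uint64 {
	h := fnv.New64a()
	for _, f := range frames {
		h.Write([]byte(f))
		h.Write([]byte{0}) // separator so "ab","c" differs from "a","bc"
	}
	return h.Sum64()
}

// complete prepends the kernel name as the leaf frame and rehashes.
func complete(t *SymbolizedTrace, k *KernelTiming) *CompletedTrace {
	frames := append([]string{k.KernelName}, t.Frames...)
	return &CompletedTrace{
		Trace: SymbolizedTrace{
			CorrelationID: t.CorrelationID,
			Frames:        frames,
			Hash:          hashFrames(frames),
		},
		Duration: k.Duration,
	}
}

// AddTrace returns a completed trace if timing is already parked,
// otherwise it parks the trace and returns nil.
func (f *Fixer) AddTrace(t *SymbolizedTrace) *CompletedTrace {
	if k, ok := f.timings[t.CorrelationID]; ok {
		delete(f.timings, t.CorrelationID)
		return complete(t, k)
	}
	f.traces[t.CorrelationID] = t
	return nil
}

// AddTiming is the mirror image of AddTrace.
func (f *Fixer) AddTiming(k *KernelTiming) *CompletedTrace {
	if t, ok := f.traces[k.CorrelationID]; ok {
		delete(f.traces, k.CorrelationID)
		return complete(t, k)
	}
	f.timings[k.CorrelationID] = k
	return nil
}

func main() {
	f := NewFixer()
	f.AddTrace(&SymbolizedTrace{CorrelationID: 7, Frames: []string{"cudaLaunchKernel", "main"}})
	if c := f.AddTiming(&KernelTiming{CorrelationID: 7, KernelName: "vecAdd", Duration: time.Millisecond}); c != nil {
		fmt.Println(c.Trace.Frames, c.Duration)
	}
}
```

Because the parked value holds only names, nothing in the map depends on memory in the profiled process staying alive while the trace waits for its timing record.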

Copilot

Pull request overview

This PR restructures the CUDA/GPU trace pipeline so traces are symbolized before being parked for GPU timing correlation, using a new interceptor hook after ConvertTrace to divert CUDA traces into the GPU “fixer” and report completed traces directly.

Changes:

- Add TraceInterceptor support to tracehandler and update call sites/tests.
- Move CUDA frame handling into ProcessManager.ConvertTrace (no longer relying on CUDA interpreter Symbolize for demangling).
- Refactor parcagpu + CUDA fixer to accept symbolized traces, attach timing/kernel-name later, recompute hashes, and emit completed traces.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.

Show a summary per file

| File | Description |
| --- | --- |
| tracehandler/tracehandler.go | Adds interceptor hook and threads it through handler construction/start. |
| tracehandler/tracehandler_test.go | Adds interceptor behavior tests and updates `Start` call signature. |
| processmanager/manager.go | Handles CUDA frames directly during `ConvertTrace` (preserving correlation ID encoding). |
| parcagpu/parcagpu.go | Reworks timing reader and returns an interceptor that diverts CUDA traces post-`ConvertTrace`. |
| interpreter/gpu/cuda.go | Refactors CUDA fixer to store symbolized traces, attach kernel timing/name, recompute hash, and return completed outputs. |
| internal/controller/controller.go | Updates `tracehandler.Start` signature usage (currently still passes `nil` interceptor). |

Comments suppressed due to low confidence (1)

parcagpu/parcagpu.go:79

This select loop calls eventReader.ReadInto() in the default branch. ReadInto blocks, so the goroutine won’t service logTicker/clearTicker ticks (or ctx cancellation) while it’s blocked waiting for events. Consider using Reader.SetDeadline / timed reads, or reading perf events in a dedicated goroutine and sending them over a channel so the outer loop can select on ctx/tickers reliably.

			case <-ctx.Done():
				return
			default:
				if err := eventReader.ReadInto(&data); err != nil {
					readErrorCount.Add(1)
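
The second option in the comment above, reading in a dedicated goroutine and forwarding events over a channel, might look roughly like the sketch below. It is not taken from the PR: `blockingReader`, `chanReader`, `pump`, `run`, and `demo` are hypothetical names, and the reader is a stand-in rather than the real perf event reader. The sketch also stops on the first read error, whereas the quoted code counts errors.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// blockingReader is a hypothetical stand-in for a reader whose Read call
// blocks until an event is available.
type blockingReader interface {
	Read() ([]byte, error)
}

// chanReader is a test double: Read blocks on a channel and fails once
// the channel is closed.
type chanReader struct{ src chan []byte }

func (c chanReader) Read() ([]byte, error) {
	d, ok := <-c.src
	if !ok {
		return nil, errors.New("reader closed")
	}
	return d, nil
}

// pump performs the blocking reads in its own goroutine and forwards
// events on a channel. Note the goroutine only exits once Read returns,
// so a real reader must be closed to unblock it on shutdown.
func pump(ctx context.Context, r blockingReader) <-chan []byte {
	out := make(chan []byte)
	go func() {
		defer close(out)
		for {
			data, err := r.Read()
			if err != nil {
				return
			}
			select {
			case out <- data:
			case <-ctx.Done():
				return
			}
		}
	}()
	return out
}

// run is the outer loop: it never blocks in Read itself, so ticker ticks
// and context cancellation are always serviced.
func run(ctx context.Context, r blockingReader, tick time.Duration) (events, ticks int) {
	ticker := time.NewTicker(tick)
	defer ticker.Stop()
	ch := pump(ctx, r)
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			ticks++
		case _, ok := <-ch:
			if !ok {
				return
			}
			events++
		}
	}
}

// demo feeds two events and then closes the reader.
func demo() int {
	src := make(chan []byte, 2)
	src <- []byte("a")
	src <- []byte("b")
	close(src)
	events, _ := run(context.Background(), chanReader{src}, time.Hour)
	return events
}

func main() {
	fmt.Println("events:", demo())

	// A reader that never yields data: ticks are still serviced until
	// the context expires.
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	_, ticks := run(ctx, chanReader{make(chan []byte)}, 10*time.Millisecond)
	fmt.Println("ticks while blocked:", ticks)
}
```

The trade-off is one extra goroutine and a channel hop per event. The timed-read alternative mentioned in the comment avoids both, at the cost of waking up periodically even when no events arrive.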


gnurizen · 2026-02-25T19:29:17Z

Converting back to draft as we're looking at doing the demangling server-side.

Add a TraceInterceptor callback that is invoked after ConvertTrace on cache-miss. When the interceptor returns true the trace is consumed (skipped for caching and reporting), allowing callers like the GPU subsystem to divert specific traces for further processing. Includes tests covering consume, pass-through, mixed, and non-caching behavior.
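
As a rough illustration of the behavior this commit message describes, the sketch below models a handler that calls an interceptor only on a cache miss and skips caching and reporting when the interceptor returns true. The `Trace`, `Handler`, `convert`, and `HandleTrace` names and the `IsCUDA` flag are assumptions made for the example, and the real signatures in tracehandler may differ.

```go
package main

import "fmt"

// Trace is a hypothetical stand-in for a converted (symbolized) trace.
type Trace struct {
	Hash   uint64
	Frames []string
	IsCUDA bool // hypothetical marker; real code would inspect frame types
}

// TraceInterceptor is called after conversion on a cache miss. Returning
// true means the trace was consumed and must not be cached or reported.
type TraceInterceptor func(t *Trace) bool

// Handler is a simplified model of a trace handler with a conversion cache.
type Handler struct {
	cache     map[uint64]*Trace
	intercept TraceInterceptor // may be nil
	reported  []*Trace
}

func NewHandler(i TraceInterceptor) *Handler {
	return &Handler{cache: make(map[uint64]*Trace), intercept: i}
}

// convert stands in for ConvertTrace.
func convert(hash uint64, cuda bool) *Trace {
	return &Trace{Hash: hash, Frames: []string{fmt.Sprintf("frame_%d", hash)}, IsCUDA: cuda}
}

// HandleTrace reports cached traces directly; on a miss it converts the
// trace and gives the interceptor the first chance to consume it.
func (h *Handler) HandleTrace(hash uint64, cuda bool) {
	if t, ok := h.cache[hash]; ok {
		h.reported = append(h.reported, t)
		return
	}
	t := convert(hash, cuda)
	if h.intercept != nil && h.intercept(t) {
		return // consumed: neither cached nor reported
	}
	h.cache[hash] = t
	h.reported = append(h.reported, t)
}

func main() {
	var parked []*Trace
	h := NewHandler(func(t *Trace) bool {
		if t.IsCUDA {
			parked = append(parked, t) // divert to the GPU subsystem
			return true
		}
		return false
	})
	h.HandleTrace(1, true)
	h.HandleTrace(2, false)
	fmt.Printf("parked=%d reported=%d cached=%d\n", len(parked), len(h.reported), len(h.cache))
}
```

Since a consumed trace is never cached in this model, the same hash reaches the interceptor again on its next occurrence, which is what lets every GPU sample be diverted.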

CUDA stacks can sit as raw traces for a while waiting for the fixer to match them with GPU timing information. During this time, pointers in the raw traces can go stale because a functional program may garbage-collect its activation records. Avoid this by symbolizing traces before parking them in the fixer maps. This has the nice side effect of removing some channel indirection: traces now go straight into the fixer maps and, once matched, they go straight to ReportTraceEvent. Move CUDA symbolization earlier in the pipeline: ConvertTrace now handles CUDA frames directly, and parcagpu.Start returns a TraceInterceptor instead of a filtered channel. The interceptor diverts symbolized CUDA traces into the GPU fixer post-ConvertTrace, and completed traces (with timing and kernel name) are reported directly. This eliminates the Symbolize method on the CUDA interpreter in favor of demangling in prepTrace.

gnurizen changed the title ~~cuda sym first~~ Use an intercepter interface to allow CUDA trace matching AFTER symbolization and simplify excess channeling Feb 20, 2026

gnurizen changed the title ~~Use an intercepter interface to allow CUDA trace matching AFTER symbolization and simplify excess channeling~~ Use an intercepter interface to allow CUDA trace matching AFTER symbolization and simplify channeling Feb 20, 2026

gnurizen force-pushed the cuda-sym-first branch from 2438c7b to 57eab74 Compare February 20, 2026 20:39

gnurizen changed the title ~~Use an intercepter interface to allow CUDA trace matching AFTER symbolization and simplify channeling~~ Use an intercepter interface to allow CUDA trace matching AFTER symbolization and simplify channels Feb 20, 2026

gnurizen force-pushed the cuda-sym-first branch from 57eab74 to 4cd2b0b Compare February 20, 2026 20:47

gnurizen requested review from brancz and umanwizard February 20, 2026 21:24

gnurizen changed the title ~~Use an intercepter interface to allow CUDA trace matching AFTER symbolization and simplify channels~~ Use an intercepter interface to allow CUDA trace matching AFTER symbolization Feb 21, 2026

gnurizen force-pushed the cuda-sym-first branch from 4cd2b0b to 42bac4a Compare February 21, 2026 10:00

gnurizen marked this pull request as ready for review February 23, 2026 19:43

gnurizen changed the title ~~Use an intercepter interface to allow CUDA trace matching AFTER symbolization~~ Use an intercepter interface to allow GPU trace matching AFTER symbolization Feb 23, 2026

umanwizard approved these changes Feb 24, 2026

View reviewed changes

gnurizen requested a review from Copilot February 24, 2026 20:18

Copilot started reviewing on behalf of gnurizen February 24, 2026 20:19 View session

Copilot AI reviewed Feb 24, 2026

View reviewed changes

gnurizen force-pushed the cuda-sym-first branch 2 times, most recently from 536ed20 to f2d2f6c Compare February 25, 2026 15:50

gnurizen marked this pull request as draft February 25, 2026 19:28

gnurizen force-pushed the cuda-sym-first branch from 89cea35 to 05dde60 Compare February 25, 2026 23:44

gnurizen added 6 commits February 26, 2026 14:56

Fix duplicate metric errors by using AddSlice instead of Add

ee8380e

Let GPU samples be cached and fix bogus frame 0 references

b28a9e5

Add tests for GPU sample handling

59dd22b

Remove demangling in favor of backend demanging

c5d6fd7

gnurizen force-pushed the cuda-sym-first branch from 05dde60 to c5d6fd7 Compare February 26, 2026 19:57

gnurizen marked this pull request as ready for review February 26, 2026 20:07

gnurizen requested a review from umanwizard February 26, 2026 20:08

gnurizen merged commit ede3a25 into main Feb 27, 2026
46 checks passed

gnurizen merged 6 commits into main from cuda-sym-first
