Rewrite parcagpu on Proton's CUPTI infrastructure with PC sampling #16

Merged: gnurizen merged 1 commit into main from proton-squash on Apr 9, 2026
@gnurizen (Contributor) commented on Apr 9, 2026

Replace the standalone C CUPTI interposer with a C++ implementation
built on Proton's callback and activity APIs. This integrates CUPTI
activity tracing, kernel correlation filtering, and PC sampling with
stall reason analysis into a single shared library.

Key additions:

  • src/cupti.cpp: CUPTI activity subscriber using Proton callbacks,
    with USDT probes for activity_batch, cubin lifecycle, stall reasons,
    pc_sample_batch, and error reporting
  • src/pc_sampling.cpp: PC sampling with probabilistic windowed
    start/stop, KERNEL_SERIALIZED mode, and token-bucket rate limiting
  • src/correlation_filter.cpp: lightweight correlation ID filter for
    matching kernel activities to profiled ranges
  • ebpf/cupti_bpf.h: shared BPF-side layouts for CUPTI activity and
    PC sampling records
  • test/bpf/: eBPF activity parser test using parca-dev/usdt for USDT
    argument extraction, with cubin loading, SASS lookup, and stall
    reason display
  • test/mock_cupti.c, test/test-pc-mock.sh, test/test-pc-real.sh:
    mock and real PC sampling test harnesses
  • microbenchmarks/: CUDA micro-benchmarks for PC sampling validation
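
As a rough illustration of the token-bucket rate limiting mentioned for src/pc_sampling.cpp, the sketch below shows the general technique in C++. It is not taken from the PR: the class name `TokenBucket`, its parameters, and the use of a caller-supplied monotonic timestamp are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical token-bucket limiter. Names and parameters are illustrative
// and are NOT taken from src/pc_sampling.cpp.
class TokenBucket {
public:
  // capacity: maximum burst size in tokens.
  // refill_per_sec: sustained rate at which tokens are added back.
  TokenBucket(double capacity, double refill_per_sec)
      : capacity_(capacity), refill_per_sec_(refill_per_sec),
        tokens_(capacity) {
    assert(capacity > 0.0 && refill_per_sec >= 0.0);
  }

  // Try to take `cost` tokens at time `now_ns` (monotonic nanoseconds).
  // Returns true if the caller may proceed, for example by opening a
  // sampling window; returns false if the budget is exhausted.
  bool try_acquire(std::uint64_t now_ns, double cost = 1.0) {
    refill(now_ns);
    if (tokens_ < cost) {
      return false;
    }
    tokens_ -= cost;
    return true;
  }

  double tokens() const { return tokens_; }

private:
  // Add tokens for the time elapsed since the last refill, capped at capacity.
  void refill(std::uint64_t now_ns) {
    if (now_ns > last_ns_) {
      const double elapsed_s = static_cast<double>(now_ns - last_ns_) / 1e9;
      tokens_ = std::min(capacity_, tokens_ + elapsed_s * refill_per_sec_);
      last_ns_ = now_ns;
    }
  }

  double capacity_;
  double refill_per_sec_;
  double tokens_;             // starts full so an initial burst is allowed
  std::uint64_t last_ns_ = 0; // timestamp of the last refill
};
```

With a capacity of 2 and a refill rate of 1 token per second, two back-to-back calls to `try_acquire` succeed, a third is rejected, and one more succeeds a second later. Passing the timestamp in explicitly keeps the sketch deterministic and easy to test; a real implementation would more likely read a monotonic clock itself.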

Build system switched to CMake with Proton as a git submodule.
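
For readers unfamiliar with the correlation ID filter listed among the key additions, here is a minimal sketch of the idea, assuming a mark-then-consume design: a launch callback marks the correlation IDs of kernels launched inside a profiled range, and the activity handler later checks each kernel record against that set. The class `CorrelationFilter` and its methods are hypothetical and do not reflect the actual contents of src/correlation_filter.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_set>

// Hypothetical correlation ID filter. The interface is an assumption made
// for illustration, not the API of src/correlation_filter.cpp.
class CorrelationFilter {
public:
  // Called on the launch path: remember that this correlation ID belongs
  // to a kernel launched inside a profiled range.
  void insert(std::uint32_t correlation_id) {
    std::lock_guard<std::mutex> guard(mu_);
    ids_.insert(correlation_id);
  }

  // Called on the activity path: returns true if the ID was marked, and
  // forgets it so the set does not grow without bound.
  bool check_and_remove(std::uint32_t correlation_id) {
    std::lock_guard<std::mutex> guard(mu_);
    return ids_.erase(correlation_id) > 0;
  }

  // Number of IDs still waiting for a matching activity record.
  std::size_t size() const {
    std::lock_guard<std::mutex> guard(mu_);
    return ids_.size();
  }

private:
  mutable std::mutex mu_; // callbacks and activity buffers may run on different threads
  std::unordered_set<std::uint32_t> ids_;
};
```

A mutex-guarded hash set is the simplest choice for a sketch; a production filter on a hot launch path might prefer a lock-free or fixed-size structure instead.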

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@gnurizen merged commit 0feb8a3 into main on Apr 9, 2026
3 checks passed