Rewrite parcagpu on Proton's CUPTI infrastructure with PC sampling #16
Merged
Replace the standalone C CUPTI interposer with a C++ implementation built on Proton's callback and activity APIs. This integrates CUPTI activity tracing, kernel correlation filtering, and PC sampling with stall reason analysis into a single shared library.

Key additions:
- src/cupti.cpp: CUPTI activity subscriber using Proton callbacks, with USDT probes for activity_batch, cubin lifecycle, stall reasons, pc_sample_batch, and error reporting
- src/pc_sampling.cpp: PC sampling with probabilistic windowed start/stop, KERNEL_SERIALIZED mode, and token-bucket rate limiting
- src/correlation_filter.cpp: lightweight correlation ID filter for matching kernel activities to profiled ranges
- ebpf/cupti_bpf.h: shared BPF-side layouts for CUPTI activity and PC sampling records
- test/bpf/: eBPF activity parser test using parca-dev/usdt for USDT argument extraction, with cubin loading, SASS lookup, and stall reason display
- test/mock_cupti.c, test/test-pc-mock.sh, test/test-pc-real.sh: mock and real PC sampling test harnesses
- microbenchmarks/: CUDA micro-benchmarks for PC sampling validation

Build system switched to CMake with Proton as a git submodule.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
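
For reviewers unfamiliar with the token-bucket rate limiting listed under src/pc_sampling.cpp, here is a minimal sketch of the general technique. It is not the code in this PR: the class name `TokenBucket`, its parameters, and the explicit `now_sec` clock argument are assumptions made for illustration. The caller passes the time in explicitly so the behavior is deterministic and easy to test.

```cpp
#include <algorithm>

// Hypothetical token-bucket limiter (illustrative only, not parcagpu's code).
// Tokens refill at `rate_per_sec` up to `capacity`; each permitted event
// consumes one token. The caller supplies the current time in seconds.
class TokenBucket {
public:
  TokenBucket(double rate_per_sec, double capacity)
      : rate_(rate_per_sec), capacity_(capacity),
        tokens_(capacity), last_(0.0) {}

  // Returns true if one event may proceed at time `now_sec`.
  bool try_acquire(double now_sec) {
    // Refill in proportion to elapsed time, ignoring clock regressions.
    double elapsed = std::max(0.0, now_sec - last_);
    tokens_ = std::min(capacity_, tokens_ + elapsed * rate_);
    last_ = std::max(last_, now_sec);

    if (tokens_ >= 1.0) {
      tokens_ -= 1.0;  // Spend one token for this event.
      return true;
    }
    return false;      // Bucket is empty: the event is rate limited.
  }

private:
  double rate_;      // Tokens added per second.
  double capacity_;  // Maximum burst size.
  double tokens_;    // Tokens currently available.
  double last_;      // Timestamp of the most recent refill.
};
```

With a rate of 1 token per second and a capacity of 2, such a limiter would admit a burst of two events and then roughly one event per second. A sampler could consult it before starting each sampling window, which caps overhead under a steady stream of kernel launches.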