Commit 34fe315

committed
Update README
1 parent 03e26df commit 34fe315

4 files changed

Lines changed: 87 additions & 261 deletions

Makefile

Lines changed: 1 addition & 1 deletion
@@ -71,7 +71,7 @@ test: local
 # Clean build artifacts
 clean:
 	@echo "=== Cleaning build artifacts ==="
-	@rm -rf build build-local bin lib zig-out .zig-cache
+	@rm -rf build build-local bin lib
 	@rm -rf CMakeCache.txt CMakeFiles/ cmake_install.cmake compile_commands.json
 	@echo "Clean complete"

README.md

Lines changed: 86 additions & 49 deletions
@@ -1,97 +1,134 @@
 # parcagpu - CUPTI Profiler with USDT Probes
 
-CUDA profiling library that exposes kernel and graph execution events via USDT/DTRACE probes for eBPF/bpftrace monitoring.
+CUDA profiling library that exposes GPU activity via USDT/DTRACE probes for eBPF consumption. Captures kernel executions, PC sampling with stall reasons, and cubin module loading.
 
 ## Building
 
 ```bash
-make # Build everything
-make test # Build and run tests
-make clean # Clean all build artifacts
+make local # Build libparcagpucupti.so locally (CMake, RelWithDebInfo)
+make debug # Build with full debug, no optimizations
+make clean # Clean all build artifacts
 ```
 
-### Components Built
+Docker cross-compilation:
 
-1. **libparcagpucupti.so** (CMake + real CUPTI)
-   - Production library for CUDA injection
-   - Located in `cupti/build/`
-   - Links against real NVIDIA CUPTI
-
-2. **Test Infrastructure** (Zig)
-   - Mock CUPTI library for testing
-   - Test program that simulates CUDA events
-   - Located in `zig-out/`
+```bash
+make build-amd64 # Build .so for AMD64
+make build-arm64 # Build .so for ARM64
+make build-all # Both architectures
+make docker-push # Push multi-arch image to ghcr.io
+```
 
 ## Usage
 
 ### As CUDA Injection Library
 
 ```bash
 export CUDA_INJECTION64_PATH=/path/to/libparcagpucupti.so
-# Run your CUDA application
 ./my_cuda_app
 ```
 
+### Environment Variables
+
+| Variable | Default | Description |
+|---|---|---|
+| `PARCAGPU_DEBUG` | off | Enable debug logging |
+| `PARCAGPU_RATE_LIMIT` | 100 | Token-bucket rate limit for callback probes (events/sec per thread) |
+| `PARCAGPU_SAMPLING_FACTOR` | 18 | PC sampling period; set to 0 to disable PC sampling |
+| `PARCAGPU_PC_SAMPLING_PROBABILITY` | 0.01 | Probability of sampling in each interval window (0-1) |
+| `PARCAGPU_PC_SAMPLING_INTERVAL` | 1.0 | PC sampling interval window in seconds |
+
 ### Monitoring with bpftrace
 
 ```bash
-# Terminal 1: Monitor probes
 sudo bpftrace parcagpu.bt
+```
 
-# Terminal 2: Run CUDA application
-./my_cuda_app
+### Monitoring with the BPF Activity Parser
+
+```bash
+make bpf-test
+sudo test/bpf/activity_parser -pid <PID> -lib <path/to/libparcagpucupti.so> -v
 ```
 
+The activity parser attaches to all USDT probes via eBPF, captures events through a ring buffer, and resolves PC samples to source lines using `llvm-dwarfdump`.
+
 ## Testing
 
 ```bash
-# Run test suite with bpftrace monitoring
-make test
+make test # Basic mock CUPTI test (no GPU, no BPF)
+make test-pc-mock # Mock PC sampling with BPF activity parser (no GPU, requires root)
+make test-pc-real # Real PC sampling with GPU (requires root + GPU)
+make test-multi # test_cupti_prof + BPF activity parser in parallel (requires root)
+```
 
-# Run test continuously (for extended monitoring)
-LD_LIBRARY_PATH=zig-out/lib zig-out/bin/test_cupti_prof cupti/build/libparcagpucupti.so --forever
+### BPF Prerequisites
+
+The BPF-based tests (`test-pc-mock`, `test-pc-real`, `test-multi`) require:
+
+- Root (sudo) for BPF
+- clang, libbpf-dev, bpftool
+- Go 1.21+
+
+Build just the BPF activity parser:
+
+```bash
+make generate # Compile BPF objects via bpf2go
+make bpf-test # generate + build the Go binary
 ```
 
-See [README_TEST.md](README_TEST.md) for detailed testing documentation.
+### Microbenchmarks
 
-## USDT Probes
+CUDA microbenchmarks for testing with real hardware:
 
-The library exposes two USDT probes:
+```bash
+make microbenchmarks # Build all .cu files in microbenchmarks/
+make test-pc-real # Run pc_sample_toy under parcagpu with BPF
+```
+
+## USDT Probes
 
-### parcagpu:kernel_executed
-- **arg0**: start timestamp (ns)
-- **arg1**: end timestamp (ns)
-- **arg2**: correlationId | (deviceId << 32)
-- **arg3**: streamId
-- **arg4**: kernel name (string pointer)
+Defined in `src/probes.d`, provider `parcagpu`:
 
-### parcagpu:graph_executed
-- **arg0**: start timestamp (ns)
-- **arg1**: end timestamp (ns)
-- **arg2**: correlationId | (deviceId << 32)
-- **arg3**: streamId
-- **arg4**: graphId
+| Probe | Arguments | Description |
+|---|---|---|
+| `cuda_correlation` | correlationId, cbid, name | API callback correlation |
+| `kernel_executed` | start, end, correlationId, deviceId, streamId, graphId, graphNodeId, name | Kernel execution timing |
+| `activity_batch` | ptrs, count | Batch of CUPTI activity records |
+| `pc_sample_batch` | records, count | Batch of PC sampling records |
+| `stall_reason_map` | names, count | Stall reason name table |
+| `cubin_loaded` | cubinCrc, cubin, cubinSize | Module load event |
+| `cubin_unloaded` | cubinCrc | Module unload event |
+| `error` | code, message, component | Profiler error event |
 
 ## Requirements
 
-- CUDA Toolkit (CUPTI libraries)
-- Zig (for building test infrastructure)
-- CMake (for building production library)
+- CUDA Toolkit (CUPTI headers/libraries)
+- CMake
+- dtrace (systemtap-sdt-dev)
 - bpftrace (for probe monitoring)
+- clang, libbpf-dev, bpftool, Go 1.21+ (for BPF tests)
 
 ## Directory Structure
 
 ```
 .
 ├── Makefile # Top-level build orchestration
-├── build.zig # Zig build for test infrastructure
-├── cupti/
-│ ├── CMakeLists.txt # CMake build for production library
-│ ├── cupti-prof.c # Main profiler implementation
-│ └── build/ # CMake build output
+├── CMakeLists.txt # CMake build for library and test infrastructure
+├── src/
+│ ├── cupti.cpp # Main CUPTI profiler implementation
+│ ├── pc_sampling.cpp # PC sampling support
+│ ├── probes.d # USDT probe definitions
+│ └── ...
+├── ebpf/
+│ └── cupti_bpf.h # Shared BPF struct definitions
 ├── test/
-│ ├── mock_cupti.c # Mock CUPTI for testing
-│ └── test_cupti_prof.c # Test program
-├── parcagpu.bt # bpftrace monitoring script
-└── test.sh # Test runner
+│ ├── test_cupti_prof.c # Mock CUPTI test harness
+│ ├── mock_cupti.c # Mock CUPTI library
+│ ├── mock_cuda.c # Mock CUDA driver library
+│ ├── test-pc-mock.sh # Mock PC sampling end-to-end test
+│ ├── test-pc-real.sh # Real GPU PC sampling end-to-end test
+│ └── bpf/ # BPF activity parser (Go + eBPF)
+├── microbenchmarks/ # CUDA microbenchmarks (.cu)
+└── parcagpu.bt # bpftrace monitoring script
 ```
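The new `PARCAGPU_RATE_LIMIT` row in the README.md diff describes a token-bucket limit for callback probes, measured in events per second per thread, with a default of 100. The README does not say how the bucket is sized or refilled, so the following is only a minimal C sketch under assumed choices: burst capacity equal to the rate, lazy refill from `CLOCK_MONOTONIC` on each call, and one `_Thread_local` bucket per thread. All names (`token_bucket`, `tb_allow`, `rate_limit_allow`) are hypothetical and are not taken from `src/`.

```c
/* Hypothetical sketch of a per-thread token bucket in the spirit of
 * PARCAGPU_RATE_LIMIT. Not taken from the parcagpu sources. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

typedef struct {
    double tokens;    /* tokens currently available */
    double capacity;  /* maximum burst; assumed equal to the rate */
    double rate;      /* refill rate in events per second */
    uint64_t last_ns; /* timestamp of the last refill (ns) */
} token_bucket;

/* Monotonic clock in nanoseconds. */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Start with a full bucket so a fresh thread can emit a burst immediately. */
static void tb_init(token_bucket *tb, double rate, uint64_t t_ns) {
    assert(rate > 0.0);
    tb->rate = rate;
    tb->capacity = rate;
    tb->tokens = rate;
    tb->last_ns = t_ns;
}

/* Refill for the time elapsed since the last call (capped at capacity),
 * then try to spend one token. Returns true if the probe may fire. */
static bool tb_allow(token_bucket *tb, uint64_t t_ns) {
    if (t_ns > tb->last_ns) {
        double elapsed_s = (double)(t_ns - tb->last_ns) / 1e9;
        tb->tokens += elapsed_s * tb->rate;
        if (tb->tokens > tb->capacity)
            tb->tokens = tb->capacity;
        tb->last_ns = t_ns;
    }
    if (tb->tokens >= 1.0) {
        tb->tokens -= 1.0;
        return true;
    }
    return false;
}

/* One bucket per thread, so the limit applies per thread. */
static _Thread_local token_bucket tls_bucket;
static _Thread_local bool tls_bucket_ready = false;

/* Lazily initialize this thread's bucket, then consult it. */
static bool rate_limit_allow(double rate) {
    uint64_t t = now_ns();
    if (!tls_bucket_ready) {
        tb_init(&tls_bucket, rate, t);
        tls_bucket_ready = true;
    }
    return tb_allow(&tls_bucket, t);
}
```

Under these assumptions, with a rate of 100, a thread that hits 150 callbacks in the same instant emits 100 probes and suppresses 50; after one quiet second the bucket is full again. Refilling lazily inside `tb_allow` avoids a timer thread, but whether parcagpu uses the same burst size is not stated in the diff.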

README_TEST.md

Lines changed: 0 additions & 83 deletions
This file was deleted.
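The README.md diff removes documentation for `parcagpu:kernel_executed` and `parcagpu:graph_executed` in which `arg2` packed `correlationId | (deviceId << 32)`, while the new probe table lists `correlationId` and `deviceId` as separate `kernel_executed` arguments. Consumers written against the old layout had to split `arg2` themselves. A minimal C sketch of that split follows, assuming both IDs are 32-bit values as the 32-bit shift implies; the helper names are hypothetical.

```c
/* Hypothetical helpers for the legacy kernel_executed/graph_executed arg2
 * layout described in the removed README lines:
 * arg2 = correlationId | (deviceId << 32). */
#include <assert.h> /* assert() for callers' sanity checks */
#include <stdint.h>

typedef struct {
    uint32_t correlation_id; /* low 32 bits of arg2 */
    uint32_t device_id;      /* high 32 bits of arg2 */
} legacy_corr_dev;

/* Build arg2 as the old README describes it. The cast to uint64_t must
 * happen before the shift: shifting a 32-bit value by 32 is undefined. */
static uint64_t legacy_pack_arg2(uint32_t correlation_id, uint32_t device_id) {
    return (uint64_t)correlation_id | ((uint64_t)device_id << 32);
}

/* Split arg2 back into its two 32-bit fields. */
static legacy_corr_dev legacy_unpack_arg2(uint64_t arg2) {
    legacy_corr_dev out;
    out.correlation_id = (uint32_t)(arg2 & 0xffffffffu);
    out.device_id = (uint32_t)(arg2 >> 32);
    return out;
}
```

In a bpftrace script written for the old probes, the same split is `arg2 & 0xffffffff` for the correlation ID and `arg2 >> 32` for the device ID.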

0 commit comments
