Commit c39bac5
Add CUDA graph kernel annotations tutorial
This tutorial demonstrates how to use CUDA graph kernel annotations
to produce semantically labeled profiling traces with custom visualization lanes.
Features:
- End-to-end workflow from graph capture to visualization
- Transformer block example with annotated regions
- Post-processing to merge annotations into profiler traces
- Custom stream assignments for semantic organization
- Version checking for cuda-bindings compatibility
- Clear error messages with upgrade instructions
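The version check and upgrade message listed above could look roughly like the sketch below. This is not the tutorial's code: the minimum version `MIN_CUDA_BINDINGS` and the helper names `parse_version`, `check_version`, and `check_installed` are hypothetical, and only `importlib.metadata` from the standard library is assumed to exist.

```python
import re
from importlib import metadata

# Hypothetical minimum; the commit does not state the version it requires.
MIN_CUDA_BINDINGS = (12, 8, 0)


def parse_version(text, width=3):
    """Return the leading numeric components of a version string as a tuple.

    Suffixes such as "rc1" are ignored, so pre-releases compare equal to
    the corresponding release in this simplified sketch.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad so that "12.8" compares equal to "12.8.0".
    parts += [0] * (width - len(parts))
    return tuple(parts)


def check_version(installed, minimum=MIN_CUDA_BINDINGS, package="cuda-bindings"):
    """Raise RuntimeError with an upgrade hint if installed is older than minimum."""
    if parse_version(installed) < minimum:
        wanted = ".".join(str(n) for n in minimum)
        raise RuntimeError(
            f"{package} {installed} is too old; {wanted} or newer is required. "
            f"Upgrade with: pip install --upgrade '{package}>={wanted}'"
        )


def check_installed(package="cuda-bindings", minimum=MIN_CUDA_BINDINGS):
    """Look up the installed distribution and validate its version."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        raise RuntimeError(
            f"{package} is not installed. Install with: pip install {package}"
        ) from None
    check_version(installed, minimum, package)
```

Keeping the comparison in `check_version` separate from the lookup in `check_installed` lets the error path be exercised without the package being installed.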
The tutorial includes:
- mark_kernels() context manager usage
- Graph capture with enable_annotations=True
- Profiling and trace post-processing
- Before/after comparison
- Troubleshooting guide
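As a rough illustration of the trace post-processing step, the sketch below moves annotated kernels into named lanes of a Chrome-trace-format JSON object. It is an assumption-laden sketch, not the tutorial's implementation: `merge_annotations`, the kernel-name-to-region `annotations` mapping, the sample kernel names, and the synthetic lane ids are all hypothetical. Only the Chrome trace event fields (`ph`, `pid`, `tid`, `args`, and `thread_name` metadata events) are taken as given.

```python
def merge_annotations(trace, annotations, first_lane=1000):
    """Return a copy of a Chrome-trace dict with annotated kernels on named lanes.

    annotations maps kernel name -> region label (a hypothetical format).
    The input trace is left unmodified.
    """
    lanes = {}  # (pid, region label) -> synthetic tid
    events = []
    for event in trace["traceEvents"]:
        event = dict(event)  # shallow copy so the caller's event is untouched
        label = annotations.get(event.get("name"))
        if event.get("ph") == "X" and label is not None:
            key = (event.get("pid"), label)
            lane = lanes.setdefault(key, first_lane + len(lanes))
            event["tid"] = lane
            # Build a new args dict instead of mutating the shared one.
            event["args"] = {**event.get("args", {}), "region": label}
        events.append(event)

    # Metadata events give each synthetic lane a readable name in the viewer.
    metadata_events = [
        {"ph": "M", "name": "thread_name", "pid": pid, "tid": lane,
         "args": {"name": label}}
        for (pid, label), lane in lanes.items()
    ]
    return {**trace, "traceEvents": events + metadata_events}


# Tiny made-up trace: two annotated kernels and one unannotated copy.
trace = {"traceEvents": [
    {"name": "qk_matmul", "ph": "X", "ts": 0, "dur": 5, "pid": 1, "tid": 7},
    {"name": "gelu", "ph": "X", "ts": 5, "dur": 3, "pid": 1, "tid": 7},
    {"name": "memcpy", "ph": "X", "ts": 8, "dur": 1, "pid": 1, "tid": 7},
]}
annotations = {"qk_matmul": "attention", "gelu": "mlp"}
merged = merge_annotations(trace, annotations)
```

The merged dict can be written out with `json.dump` and opened in a trace viewer; unannotated events stay on their original lane.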
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
1 parent cdc645a
3 files changed, 572 additions and 0 deletions