
Commit 708b21c

yushangdi and claude authored
Add CUDA graph kernel annotations tutorial (#3915)
This tutorial demonstrates how to use CUDA graph kernel annotations for semantic profiling traces with custom visualization lanes.

Features:
- End-to-end workflow from graph capture to visualization
- Transformer block example with annotated regions
- Post-processing to merge annotations into profiler traces
- Custom stream assignments for semantic organization

The tutorial includes:
- mark_kernels() context manager usage
- Graph capture with enable_annotations=True
- Profiling and trace post-processing
- Before/after comparison
- Troubleshooting guide

Fixes #ISSUE_NUMBER

## Description
<!--- Describe your changes in detail -->

## Checklist
<!--- Make sure to add `x` to all items in the following checklist: -->
- [ ] The issue that is being fixed is referred in the description (see above "Fixes #ISSUE_NUMBER")
- [ ] Only one issue is addressed in this pull request
- [ ] Labels from the issue that this PR is fixing are added to this pull request
- [ ] No unnecessary issues are included into this pull request.

---------

Co-authored-by: yushangdi <yushangdi@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
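To make the "post-processing to merge annotations into profiler traces" step concrete, here is a minimal sketch, not the tutorial's actual code. It assumes the profiler trace is a Chrome-trace-style JSON object with a `traceEvents` list in which GPU kernels appear as complete events (`"ph": "X"`) with `"cat": "kernel"`. The function name `merge_annotations`, the `annotations` mapping (launch-order index to region label), and the `lane_base` parameter are all hypothetical.

```python
import json


def merge_annotations(trace, annotations, lane_base=1000):
    """Return a copy of `trace` with annotated kernels moved to per-label lanes.

    `annotations` is a hypothetical mapping from a kernel's position in
    timestamp order to a semantic region label. Each label gets its own
    lane (a synthetic `tid`) so a trace viewer groups its kernels together.
    """
    # Shallow-copy every event so the caller's trace is left untouched.
    events = [dict(e) for e in trace.get("traceEvents", [])]

    # Assumption: kernels are complete events tagged with cat == "kernel".
    kernels = sorted(
        (e for e in events if e.get("ph") == "X" and e.get("cat") == "kernel"),
        key=lambda e: e["ts"],
    )

    lanes = {}      # label -> synthetic tid
    metadata = []   # "thread_name" metadata events that name each lane
    for index, event in enumerate(kernels):
        label = annotations.get(index)
        if label is None:
            continue  # unannotated kernels stay on their original lane
        if label not in lanes:
            lanes[label] = lane_base + len(lanes)
            metadata.append({
                "ph": "M",
                "name": "thread_name",
                "pid": event.get("pid", 0),
                "tid": lanes[label],
                "args": {"name": label},
            })
        event["tid"] = lanes[label]
        event["args"] = {**event.get("args", {}), "annotation": label}

    return {**trace, "traceEvents": events + metadata}


# Tiny hand-written trace: two kernels and one CPU op.
trace = {"traceEvents": [
    {"ph": "X", "cat": "kernel", "name": "gemm", "ts": 10, "dur": 5, "pid": 0, "tid": 7},
    {"ph": "X", "cat": "kernel", "name": "softmax", "ts": 20, "dur": 2, "pid": 0, "tid": 7},
    {"ph": "X", "cat": "cpu_op", "name": "aten::mm", "ts": 5, "dur": 3, "pid": 1, "tid": 1},
]}
merged = merge_annotations(trace, {0: "attention", 1: "attention"})
print(json.dumps(merged, indent=2))
```

Keying annotations by launch order keeps the sketch independent of kernel names, which can repeat within a graph; the real tutorial may match kernels to annotations differently.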
1 parent c1c6bfb commit 708b21c

7 files changed

Lines changed: 935 additions & 0 deletions


.ci/docker/requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -26,6 +26,7 @@ fastapi
 matplotlib
 librosa
 torch==2.12
+cuda-bindings>=13.1.0 # Required for CUDA graph annotations tutorial
 torchvision
 torchdata
 networkx
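Since the tutorial now depends on `cuda-bindings>=13.1.0`, a guard at the top of a script can fail early with a clear message. The following is a minimal sketch under stated assumptions: the helper names `parse_version`, `meets_minimum`, and `cuda_bindings_ok` are hypothetical, and the comparison only looks at the leading numeric components of the version string rather than implementing full PEP 440 ordering.

```python
from importlib import metadata


def parse_version(text):
    """Parse the leading numeric components of a version, e.g. '13.1.0' -> (13, 1, 0)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first component with no leading digits
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum="13.1.0"):
    """Return True if `installed` is at least `minimum` (numeric parts only)."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '13.1' compares equal to '13.1.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def cuda_bindings_ok(minimum="13.1.0"):
    """Return True if the cuda-bindings distribution is installed and new enough."""
    try:
        installed = metadata.version("cuda-bindings")
    except metadata.PackageNotFoundError:
        return False
    return meets_minimum(installed, minimum)


if not cuda_bindings_ok():
    print("cuda-bindings>=13.1.0 not found; the annotations tutorial will not run.")
```

For stricter handling of pre-release and post-release tags, a dedicated version parser would be a better fit than this numeric-prefix comparison.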
Three additional files (267 KB, 312 KB, 130 KB) are not rendered here.
