Commit 708b21c
Add CUDA graph kernel annotations tutorial (#3915)
This tutorial demonstrates how to use CUDA graph kernel annotations to
produce semantically labeled profiling traces with custom visualization lanes.
Features:
- End-to-end workflow from graph capture to visualization
- Transformer block example with annotated regions
- Post-processing to merge annotations into profiler traces
- Custom stream assignments for semantic organization
The tutorial includes:
- mark_kernels() context manager usage
- Graph capture with enable_annotations=True
- Profiling and trace post-processing
- Before/after comparison
- Troubleshooting guide
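The trace post-processing step listed above can be sketched roughly as follows. This is a minimal illustration, not the tutorial's actual code: it assumes the profiler trace is a Chrome-trace-style dict with a `traceEvents` list, that kernel events carry `"cat": "kernel"`, and that annotations are available as a plain mapping from kernel name to region label. The function name `merge_annotations`, the `annotations` mapping, and the `base_tid` parameter are all hypothetical.

```python
import copy


def merge_annotations(trace, annotations, base_tid=10_000):
    """Move annotated kernel events onto per-region lanes (hypothetical helper).

    trace:       dict in Chrome trace event format with a "traceEvents" list.
    annotations: assumed mapping {kernel name -> region label}.
    base_tid:    first synthetic thread id used for the custom lanes.
    """
    merged = copy.deepcopy(trace)  # leave the caller's trace untouched
    events = merged.setdefault("traceEvents", [])
    lanes = {}  # (pid, label) -> synthetic tid for that lane

    for ev in events:
        if ev.get("cat") != "kernel":
            continue  # only GPU kernel events are reassigned
        label = annotations.get(ev.get("name"))
        if label is None:
            continue  # unannotated kernels stay on their original lane
        key = (ev.get("pid"), label)
        if key not in lanes:
            lanes[key] = base_tid + len(lanes)
        ev["tid"] = lanes[key]
        ev.setdefault("args", {})["annotation"] = label

    # Metadata events give each synthetic lane a readable name in the viewer.
    for (pid, label), tid in lanes.items():
        events.append({
            "name": "thread_name",
            "ph": "M",
            "pid": pid,
            "tid": tid,
            "args": {"name": label},
        })
    return merged
```

Kernels that share a region label end up on the same lane, which is what lets a trace viewer group them by meaning rather than by the stream they ran on.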
Fixes #ISSUE_NUMBER
---------
Co-authored-by: yushangdi <yushangdi@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
1 parent c1c6bfb commit 708b21c
7 files changed
Lines changed: 935 additions & 0 deletions
File tree
- .ci/docker
- _static/img
- advanced_source
Diff (contents not captured): one line added at line 29; context lines 26-31 unchanged.