Commit 298937a

Add before/after screenshots with descriptions

Added chrome://tracing screenshots showing:
- Before: All 65 kernels on single stream with auto-generated names
- After: Kernels organized into semantic lanes (streams 61, 62) with meaningful labels (attention, mlp)

Screenshots demonstrate the value of kernel annotations for understanding execution structure and identifying components.

1 parent 6a79dc3 commit 298937a

3 files changed

Lines changed: 15 additions & 1 deletion

advanced_source/cuda_graph_annotations_tutorial.py

Lines changed: 15 additions & 1 deletion
@@ -57,7 +57,21 @@
 # their function, making it much easier to identify performance bottlenecks
 # and understand execution flow.
 #
-# .. image:: /_static/img/cuda_graph_annotations_before_after.png
+# **Before annotations:** All kernels appear on a single stream with
+# auto-generated names, making it difficult to understand which operations
+# belong to which logical component of your model.
+#
+# .. image:: /_static/img/cuda_graph_trace_before.png
+#    :width: 100%
+#    :alt: CUDA graph trace before annotations showing all kernels on one stream
+#
+# **After annotations:** Kernels are organized into semantic lanes (streams 61
+# and 62) with meaningful labels like "attention" and "mlp", making it easy to
+# identify different components and understand the execution structure.
+#
+# .. image:: /_static/img/cuda_graph_trace_after.png
+#    :width: 100%
+#    :alt: CUDA graph trace after annotations showing kernels organized by function
 #
 # Requirements
 # ------------
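
The before/after contrast described in the added text can be sketched without a GPU by building two small traces in the Trace Event Format that chrome://tracing reads. This is only an illustration, not the tutorial's annotation API: the kernel names, timings, the `build_trace` helper, and the mapping of the "attention" and "mlp" labels to lanes 61 and 62 are all assumptions made for the example.

```python
import json

# Hypothetical kernel records: (auto-generated name, start_us, duration_us, label).
KERNELS = [
    ("kernel_0", 0, 10, "attention"),
    ("kernel_1", 10, 15, "attention"),
    ("kernel_2", 25, 20, "mlp"),
    ("kernel_3", 45, 5, "mlp"),
]

# Lane ids reuse the stream numbers from the commit message; which label
# maps to which lane is an assumption made for this sketch.
LANES = {"attention": 61, "mlp": 62}


def build_trace(kernels, lanes=None):
    """Build a Trace Event Format dict.

    With lanes=None every kernel lands on a single lane (the "before" view).
    With a label-to-lane mapping, kernels are grouped into named lanes
    (the "after" view).
    """
    events = []
    if lanes:
        # Metadata events ("ph": "M") give each lane a readable name.
        for label, tid in lanes.items():
            events.append({
                "name": "thread_name",
                "ph": "M",
                "pid": 0,
                "tid": tid,
                "args": {"name": label},
            })
    for name, start_us, duration_us, label in kernels:
        tid = lanes[label] if lanes else 0
        # Complete events ("ph": "X") carry a start timestamp and a duration.
        events.append({
            "name": name,
            "ph": "X",
            "ts": start_us,
            "dur": duration_us,
            "pid": 0,
            "tid": tid,
        })
    return {"traceEvents": events}


before = build_trace(KERNELS)
after = build_trace(KERNELS, LANES)
after_json = json.dumps(after, indent=2)
```

Writing `after_json` to a file and loading it in chrome://tracing should show two labelled lanes, while the `before` trace puts every kernel on one unnamed lane.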
