Commit 12af895

[None] [chore] Update skills (#13507)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
1 parent 2ea0e63 commit 12af895

12 files changed

Lines changed: 1020 additions & 6 deletions

Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
---
name: ad-conf-check-update
description: >
  Updates the ad-conf-check skill's references/config_log_patterns.md
  by comparing it against the latest TensorRT-LLM AutoDeploy source code.
  Checks for new/removed/renamed configs in default.yaml and verifies that
  log patterns still match the actual source code. Edits the reference doc
  in-place if anything changed.
tools: ["Read", "Write", "Edit", "Bash", "Grep", "Glob"]
model: sonnet
license: Apache-2.0
metadata:
  author: NVIDIA Corporation
---

You are a reference-doc updater for the `ad-conf-check` skill. Your job is to ensure that `references/config_log_patterns.md` is up-to-date with the latest TensorRT-LLM AutoDeploy source code.

You will receive two paths:
- `<trtllm_src>` — the TensorRT-LLM repo root
- `<skill_dir>` — the ad-conf-check skill directory (contains `references/config_log_patterns.md`)

## Procedure

### Phase 1: Detect drift

#### 1a. Config drift — compare `default.yaml` against the reference doc

1. Read `<trtllm_src>/tensorrt_llm/_torch/auto_deploy/config/default.yaml` to get the authoritative list of all config keys and their defaults.
2. Read `<skill_dir>/references/config_log_patterns.md` to get the currently documented config keys.
3. Compare the two:
   - **New configs**: keys in `default.yaml` not documented in `config_log_patterns.md`
   - **Removed configs**: keys documented in `config_log_patterns.md` but no longer in `default.yaml`
   - **Renamed configs**: keys that appear removed but have an obvious successor (e.g., `cuda_graph_batch_sizes` → `cuda_graph_config.batch_sizes`)
   - **Changed defaults**: keys whose default value changed
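
The four drift categories above can be sketched as a set comparison over flattened keys. This is a minimal illustration, not part of the skill: it assumes `default.yaml` has already been parsed into a nested dict (for example with a YAML loader) and that the documented keys and defaults have been extracted from the reference doc into a flat mapping. The function names, the sample keys in the usage note, and the suffix-based rename heuristic are all hypothetical.

```python
def flatten_keys(cfg, prefix=""):
    """Flatten a nested mapping into {dotted.key: value}."""
    flat = {}
    for key, value in cfg.items():
        dotted = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            flat.update(flatten_keys(value, dotted))
        else:
            flat[dotted] = value
    return flat


def detect_config_drift(source_cfg, documented):
    """Compare parsed config content against documented defaults.

    source_cfg: nested dict parsed from the config file (assumed input).
    documented: flat {dotted.key: default} taken from the reference doc.
    """
    current = flatten_keys(source_cfg)
    new = sorted(set(current) - set(documented))
    removed = sorted(set(documented) - set(current))
    changed = sorted(
        key for key in set(current) & set(documented)
        if current[key] != documented[key]
    )
    # Crude rename hint (assumption): a removed key whose name ends with the
    # leaf segment of a new key. A human must still confirm each candidate.
    renamed_candidates = {}
    for old in removed:
        old_leaf = old.split(".")[-1]
        for candidate in new:
            if old_leaf.endswith(candidate.split(".")[-1]):
                renamed_candidates[old] = candidate
                break
    return {
        "new": new,
        "removed": removed,
        "changed": changed,
        "renamed_candidates": renamed_candidates,
    }
```

Calling `detect_config_drift({"cuda_graph_config": {"batch_sizes": [1, 2]}}, {"cuda_graph_batch_sizes": [1, 2]})` would report the old key as removed, the nested key as new, and pair the two as a rename candidate. The heuristic only produces hints; whether a successor is "obvious" remains a judgment call.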

#### 1b. Log-pattern drift — verify patterns against source code

1. For each config documented in `config_log_patterns.md`, grep the TRT-LLM source for the quoted log strings (success/failure indicators).
   - Focus on: `<trtllm_src>/tensorrt_llm/_torch/auto_deploy/` (transforms, compilers, config loaders)
   - Key directories to search:
     - `<trtllm_src>/tensorrt_llm/_torch/auto_deploy/transforms/`
     - `<trtllm_src>/tensorrt_llm/_torch/auto_deploy/compile/`
     - `<trtllm_src>/tensorrt_llm/_torch/auto_deploy/sharding/`
     - `<trtllm_src>/tensorrt_llm/_torch/auto_deploy/`
2. Flag patterns that no longer appear in the source code (stale patterns).
3. Find new log messages in the source that are not yet documented (new patterns).
   - Search for `logger.info`, `logger.warning`, `logger.error`, and `print` calls in the auto_deploy directory.
   - Focus on messages related to config application, transform results, and failure/fallback.
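
The stale-pattern check in step 2 amounts to a recursive substring search. The sketch below is one possible way to do it with the standard library only; `find_stale_patterns` is a hypothetical helper, and plain substring matching is an assumption that will miss messages assembled from f-strings or split across lines, so a hit or miss here still needs a manual look before the doc is edited.

```python
from pathlib import Path


def find_stale_patterns(patterns, search_root, suffix=".py"):
    """Return documented log strings that no file under search_root contains."""
    remaining = set(patterns)
    for path in Path(search_root).rglob(f"*{suffix}"):
        if not remaining:
            break  # every pattern was found; no need to read further files
        text = path.read_text(encoding="utf-8", errors="ignore")
        remaining -= {pattern for pattern in remaining if pattern in text}
    return sorted(remaining)
```

Passing the quoted strings from the reference doc and the `auto_deploy` directory as `search_root` yields the list of candidates to flag as stale. Consistent with the "be conservative" rule, a reported string is a prompt for investigation, not an automatic deletion.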

### Phase 2: Update the reference doc

If Phase 1 found any drift, edit `<skill_dir>/references/config_log_patterns.md` in-place:

1. **Add new config sections** for configs found in `default.yaml` but missing from the doc.
   - Place them in the appropriate section (Top-Level, kv_cache_config, or Transform Parameters).
   - Include the verification source tags (`[log]`, `[graph]`, `[nsys]`) based on what log patterns exist.
   - Document the log patterns found in the source code.

2. **Remove or mark deprecated** configs that no longer exist in `default.yaml`.
   - If a config was renamed, update the section header and add a deprecation note.
   - If a config was fully removed, delete its section.

3. **Update stale log patterns** where the source code has changed.
   - Replace old quoted strings with the current ones from the source.
   - Add newly discovered log patterns.

4. **Preserve the existing structure** and formatting conventions:
   - Section hierarchy: Top-Level → kv_cache_config → Transform Parameters → General Failure Patterns
   - Each config has: header with verification tags, values/transform key, success/failure indicators
   - Use the same markdown style as existing entries.

### Phase 3: Report

After updating (or confirming no changes needed), output a summary:

```
## Reference Doc Update Summary

**Status**: UPDATED / NO CHANGES NEEDED
**TRT-LLM source**: <trtllm_src path>
**Reference doc**: <skill_dir>/references/config_log_patterns.md

### Changes made:
- Added configs: <list or "none">
- Removed configs: <list or "none">
- Updated patterns: <list or "none">
- Renamed configs: <list or "none">
```

## Important rules

- **Do NOT fabricate log patterns.** Every quoted string must come from the actual source code. If you cannot find a log pattern for a config, document it with "No explicit log" as existing entries do.
- **Do NOT change the overall document structure** (section order, heading levels) unless adding/removing sections.
- **Be conservative**: if you're unsure whether a pattern is still valid, keep it and add a note rather than removing it.
- **Preserve existing verification source tags** (`[log]`, `[graph]`, `[nsys]`) and only modify them if evidence supports the change.

.claude/agents/ad-debug-agent.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 ---
 name: ad-debug-agent
 description: Debug the AutoDeploy model onboarding process
-tools: Read, Grep, Glob, Bash, Edit, Write
+tools: ["Read", "Grep", "Glob", "Bash", "Edit", "Write"]
 model: sonnet
 license: Apache-2.0
 metadata:

.claude/agents/ad-onboard-reviewer.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 ---
 name: ad-onboard-reviewer
 description: Independent reviewer for AutoDeploy model onboarding. Validates created model and test files against all onboarding requirements. Use after completing model onboarding work.
-tools: Read, Grep, Glob
+tools: ["Read", "Grep", "Glob"]
 model: sonnet
 license: Apache-2.0
 metadata:

.claude/agents/ad-run-agent.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 ---
 name: ad-run-agent
 description: Run AutoDeploy build and run command for a given model
-tools: Read, Grep, Glob, Bash, Write, Edit
+tools: ["Read", "Grep", "Glob", "Bash", "Write", "Edit"]
 model: sonnet
 license: Apache-2.0
 metadata:
Lines changed: 220 additions & 0 deletions
@@ -0,0 +1,220 @@
---
name: ad-add-fusion-transformation
description: >
  Claude Code skill (trtllm-agent-toolkit): implement or extend TensorRT-LLM AutoDeploy fusion
  transforms under transform/library/ in a TensorRT-LLM checkout. Prefer existing kernels and custom
  ops; use Triton only when no viable existing-kernel path exists. Use ad-graph-dump for
  AD_DUMP_GRAPHS_DIR workflows. Covers TRT-LLM paths, registry, default.yaml registration, graph
  validation, tests, and a review checklist — without prescribing profiling tools or throughput
  targets.
license: Apache-2.0
tags:
  - tensorrt-llm
  - autodeploy
  - fusion
  - graph-transform
  - optimization
metadata:
  author: NVIDIA Corporation
---

# Autodeploy: Add Fusion Transformation Pass

## Where this skill applies

This file lives in the **trtllm-agent-toolkit** plugin. Paths such as `tensorrt_llm/...`, `examples/auto_deploy/...`, and `tests/...` are relative to a **TensorRT-LLM source checkout** on the user’s machine — not the plugin tree.

After installing the plugin (see the toolkit `README.md`), skills use the `trtllm-agent-toolkit:` prefix (for example `trtllm-agent-toolkit:ad-add-fusion-transformation`).

## Related skills in this plugin

| Skill | Use it for |
|-------|------------|
| **ad-graph-dump** | Enabling `AD_DUMP_GRAPHS_DIR`, dump file layout, and how to read SSA graph output. |
| **trtllm-codebase-exploration** | Mapping existing transforms, custom ops, and search patterns before writing a pass. |
| **trtllm-code-contribution** | TensorRT-LLM pre-commit, tests, DCO sign-off, and PR expectations. |
| **triton-kernel-writing** | Implementing a **Triton** op only after existing-kernel lookup fails. |
| **triton-tileir-optimization** | Tuning **existing** Triton kernels for the TileIR backend when that path applies. |
| **cuda-kernel-writing** | Raw CUDA extension work if the viable path is a PyTorch C++ extension (not Triton). |
| **cute-kernel-writing** / **cudeepy-kernel-writing** | CuTe DSL/LIR or CuDeepy-generated kernels when that is the chosen integration path. |

Use this skill when you already know **which subgraph or pattern** you are targeting (from graph dumps, logs, or code reading). For dump capture and file semantics, follow **ad-graph-dump** first.

## When to use this skill

- Adding, extending, or reviewing a fusion under AutoDeploy transforms in a TensorRT-LLM tree.

### Workflow (concise)

1. Confirm the pattern in **current** graph dumps (see **ad-graph-dump**).
2. Search for an existing kernel or custom-op path before new Triton or CUDA.
3. Implement the smallest change that proves correctness and matching; add tests.
4. Re-run dumps and tests; if outputs drift, separate matching issues from metadata loss from numeric differences.

## Finding fusion candidates (lightweight)

Do this before writing a new pass so you work on real graph structure.

### Inputs

- Graph dump directory from a run with `AD_DUMP_GRAPHS_DIR` set (see **ad-graph-dump**).
- Model id and active AutoDeploy config (registry YAML, `default.yaml` overlays).
- TensorRT-LLM source tree for kernel and transform lookup.

### Outputs

- Ordered list of candidates with: graph evidence, existing-kernel lookup (`found` / `not_found`), recommendation (`use_existing_kernel`, `needs_triton_fallback`, `defer`), and trade-offs (complexity, correctness risk).

### Discovery workflow

1. Parse dumps for repeated unfused patterns (element-wise chains, norm chains, epilogues, attention-adjacent ops).
2. Search the tree for equivalent transforms or custom ops; record file/symbol evidence.
3. If nothing fits, mark Triton or other kernel work as a deliberate fallback.
4. Prefer candidates with clear recurrence, existing support, and lower numerical risk.
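
Step 1 of the discovery workflow can be approximated by counting recurring op-name windows. The sketch below is a rough illustration under two stated assumptions: the op names have already been extracted from a dump into a list in graph order (the dump format is covered by **ad-graph-dump** and is not parsed here), and contiguous windows in that order are a good-enough proxy for real dataflow chains. `repeated_op_chains` is a hypothetical helper.

```python
from collections import Counter


def repeated_op_chains(op_names, length=3, min_count=2):
    """Count contiguous op-name windows and keep those seen at least min_count times."""
    windows = Counter(
        tuple(op_names[i:i + length])
        for i in range(len(op_names) - length + 1)
    )
    return [(chain, count) for chain, count in windows.most_common() if count >= min_count]
```

A window that recurs once per decoder layer is a plausible fusion candidate, but because node order does not guarantee that the ops feed each other, each hit should be confirmed against the actual edges in the dump before it is recorded as graph evidence.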

### Per-candidate template

```text
Candidate: <short-name>
Affected graph pattern: <pattern>
Existing kernel lookup: <found|not_found>
Evidence: <path/symbol>
Recommendation: <use_existing_kernel|needs_triton_fallback|defer>
Strengths / weaknesses / risks:
- ...
```

### Guardrails

- Do not skip existing-kernel lookup.
- Do not default to Triton when a viable existing op already exists.
- If uncertain, `defer` and narrow the question with one more dump or test.

---

## Inputs (implementation)

- Chosen candidate or concrete subgraph.
- Active model and config files.
- Fresh graph dumps when available.
- Current baseline: match counts from logs, unit test status, any accuracy notes you already maintain.

## Outputs (implementation)

- Pass design or patch: registered transform, `default.yaml` entry, optional model-registry YAML.
- Path decision: `existing_kernel_path` vs `triton_fallback_path` (or other kernel stack).
- Validation notes: graph evidence, `[SUMMARY] matches=...` before/after from AutoDeploy logs, test results.

## Implementation workflow

1. Align the pass with **observed** graph structure from dumps — not assumed op names from docs alone.
2. Search `transform/library/`, `custom_ops/`, `torch.ops.auto_deploy.*`, and related tests for reuse.
3. Integrate an existing op when possible; otherwise delegate kernel work to the appropriate skill (**triton-kernel-writing**, **cuda-kernel-writing**, etc.).
4. Keep one logical change per patch; extend tests in the same change.
5. Re-read dumps after the change; if match counts collapse, suspect pattern availability or metadata propagation.

## Where fusion passes live

- Transforms: `tensorrt_llm/_torch/auto_deploy/transform/library/`
- Registry / base behavior: `tensorrt_llm/_torch/auto_deploy/transform/interface.py`
- Default transform list: `tensorrt_llm/_torch/auto_deploy/config/default.yaml`
- Dump helper: `tensorrt_llm/_torch/auto_deploy/utils/graph_writer.py`
- Graph utilities: `tensorrt_llm/_torch/auto_deploy/utils/node_utils.py`, `tensorrt_llm/_torch/auto_deploy/utils/_graph.py`
- Custom ops: `tensorrt_llm/_torch/auto_deploy/custom_ops/`

Tests (typical):

- `tests/unittest/auto_deploy/singlegpu/transformations/library/`
- `tests/integration/defs/accuracy/test_llm_api_autodeploy.py` (when behavior or numerics may change)

## How to add a transform

### Implement the pass

Create or update a module under `transform/library/` and register the class:

```python
@TransformRegistry.register("my_transform_key")
class MyTransform(BaseTransform):
    @classmethod
    def get_config_class(cls):
        return MyTransformConfig
```

Use a dedicated config class only when the pass needs parameters beyond the base transform config.
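
To make the registration pattern concrete without depending on a TensorRT-LLM checkout, here is a self-contained toy version of a decorator-based registry. Everything prefixed `Toy` is a stand-in written for this illustration and is not the TensorRT-LLM API; the real `TransformRegistry`, `BaseTransform`, and config base class in `transform/interface.py` may differ in signatures and behavior. The `eps_tolerance` field is a hypothetical extra parameter.

```python
from dataclasses import dataclass


class ToyRegistry:
    """Stand-in registry mapping a string key to a transform class."""

    _registry = {}

    @classmethod
    def register(cls, key):
        def decorator(transform_cls):
            if key in cls._registry:
                raise ValueError(f"duplicate transform key: {key}")
            cls._registry[key] = transform_cls
            return transform_cls
        return decorator

    @classmethod
    def get(cls, key):
        return cls._registry[key]


@dataclass
class ToyTransformConfig:
    enabled: bool = False  # stand-in for the base transform config


@dataclass
class MyTransformConfig(ToyTransformConfig):
    eps_tolerance: float = 1e-6  # hypothetical parameter beyond the base config


class ToyBaseTransform:
    @classmethod
    def get_config_class(cls):
        return ToyTransformConfig

    def __init__(self, **kwargs):
        # Build the config from whichever class the subclass advertises.
        self.config = self.get_config_class()(**kwargs)


@ToyRegistry.register("my_transform_key")
class MyTransform(ToyBaseTransform):
    @classmethod
    def get_config_class(cls):
        return MyTransformConfig
```

With this toy setup, `ToyRegistry.get("my_transform_key")(enabled=True)` builds a `MyTransform` whose `config` is a `MyTransformConfig`. A pass with no extra parameters would simply not override `get_config_class`, which mirrors the guidance above about when a dedicated config class is worth adding.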

### Register in `default.yaml`

Add a key under `transforms:` in `tensorrt_llm/_torch/auto_deploy/config/default.yaml`. **Copy the field set from the closest existing transform** in the same section of the file (required keys depend on the transform config class and on how peers are declared). New experimental passes should stay **`enabled: false`** until covered by tests and dumps.

### Enable for a specific model

For targeted rollout, adjust registry YAMLs under `examples/auto_deploy/model_registry/configs/` rather than turning on unproven passes globally.

## Implementation rules

- Prefer existing AutoDeploy / TRT-LLM ops and `torch.ops.auto_deploy` entries.
- Prefer stable, backend-neutral graph contracts; avoid hiding real dataflow in `node.meta` when an edge should carry it.
- Use metadata for observable tensor facts (shape, dtype) and preserve it across rewrites when replacements should remain traceable.
- **One hypothesis per patch** — do not mix unrelated fusions.

## Existing kernel first, Triton second

Before Triton:

1. Search `transform/library/` and `custom_ops/`.
2. Search `torch.ops.auto_deploy.*` and TRT-LLM custom op definitions.
3. Read tests for similar integrations.

Use **triton-kernel-writing** only when no suitable op exists and you accept owning kernel + integration work.

## Validation order

1. Graph dumps — pattern present, rewrite visible (see **ad-graph-dump**).
2. Unit tests for the transform.
3. Integration or accuracy checks when numerics or end-to-end behavior may change.

## Match counts

AutoDeploy logs `[SUMMARY] matches=<n>` (or `skipped` / `disabled`) per transform. Compare before and after your change; a large drop usually indicates pattern or metadata issues, not “slow runs.”
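
A before/after comparison of match counts can be scripted along these lines. Only the `[SUMMARY] matches=<n>` fragment comes from the text above; the assumption that each such line also carries a `transform=<name>` token is mine, so the name regex will likely need adjusting to the actual log layout. Both helper names are hypothetical.

```python
import re

_SUMMARY = re.compile(r"\[SUMMARY\]\s+matches=(\d+)")
_NAME = re.compile(r"transform=([\w.]+)")  # assumed token; adapt to the real log format


def match_counts(log_lines):
    """Map transform name to match count for lines that report a numeric match count."""
    counts = {}
    for line in log_lines:
        summary = _SUMMARY.search(line)
        name = _NAME.search(line)
        if summary and name:
            counts[name.group(1)] = int(summary.group(1))
    return counts


def dropped_transforms(before, after):
    """Return {name: (before, after)} for transforms whose match count decreased."""
    return {
        name: (count, after.get(name, 0))
        for name, count in before.items()
        if after.get(name, 0) < count
    }
```

Running `match_counts` on the logs from the baseline run and the patched run, then passing both results to `dropped_transforms`, gives a short list of transforms to inspect first. Lines that report `skipped` or `disabled` are ignored by this sketch and would need separate handling.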

## Testing expectations

Follow **trtllm-code-contribution** for repo conventions. Cover:

- Happy-path micrograph or exported-graph rewrites.
- Failure modes that must **not** fuse (multiple consumers, mixed consumers).
- Metadata preservation when an upstream pass feeds your pattern.

Primary unittest location for library transforms:

- `tests/unittest/auto_deploy/singlegpu/transformations/library/`

## Review checklist

- Target structure appears in current dumps.
- Transform registered and listed in `default.yaml` consistently with peer entries.
- Model-registry toggles are intentional.
- Non-zero `matches` where expected, or `skipped` is explained.
- Before/after dump snippets or diffs saved for the review thread.
- Tests cover both success and intentional non-match cases.
- If outputs change, classify match loss vs metadata loss vs acceptable numeric drift.

## Guardrails

- Do not bundle unrelated passes in one change.
- If dumps contradict expectations, document what you observed before chasing unrelated hypotheses.

## Iteration note (template)

```text
Candidate: <name>
Path: <existing_kernel_path|triton_fallback_path|other>
Rationale:
- ...
Graph validation: <pass|fail — what files / ops>
Summary logs: <matches before / after>
Tests: <what ran>
Open risks:
- ...
```
