Commit 2cb3b39

Add end-to-end workflow doc and cross-skill references
- Add common/end-to-end-workflow.md documenting the PTQ → Deploy → Eval pipeline, workspace continuity, unsupported model handling, NEL deployment.command pattern, and NEL CI vs SLURM executor decision table
- Add cross-skill workspace flow to workspace-management.md
- Add "Next steps" to ptq/SKILL.md pointing to deployment/evaluation
- Add pipeline integration note to evaluation/SKILL.md

Depends on PR #1236 (deployment/references/unsupported-models.md).

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
1 parent e952bcd commit 2cb3b39

File tree

4 files changed: +94 −1 lines changed
.claude/skills/common/end-to-end-workflow.md

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
# End-to-End Workflow: PTQ → Deploy → Eval

This document ties together the three domain skills (PTQ, Deployment, Evaluation) for the common workflow of quantizing a model, deploying it, and evaluating accuracy.

## Pipeline Overview

```text
PTQ (quantize)        →   Deployment (serve)        →   Evaluation (benchmark)
─────────────────         ──────────────────            ────────────────────────
hf_ptq.py                 vLLM / SGLang / TRT-LLM       NEL (SLURM or JET)
     ↓                          ↓                             ↓
NVFP4/FP8 checkpoint      OpenAI-compatible API         MMLU, GSM8K, GPQA scores
(safetensors)             (http://host:8000)            (results.yml)
```

## Workspace Continuity

All three stages share the same workspace directory. The PTQ output becomes the deployment input, and eval results land alongside:

```text
workspaces/model-name-format/
  output/           ← PTQ checkpoint (safetensors + config.json)
  eval_results/     ← NEL evaluation artifacts (results.yml per task)
  eval_config.yaml  ← NEL config for evaluation
  scripts/          ← Custom run scripts (if needed)
  logs/             ← SLURM job logs
```

When starting a deployment or evaluation step, always check for an existing workspace from a prior PTQ run:

```bash
ls workspaces/
```
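The check above can be folded into a small helper. A minimal sketch, assuming the directory layout shown earlier; the workspace name `qwen3-0.6b-nvfp4` is a hypothetical example, not a required convention:

```shell
# Sketch: reuse a prior PTQ workspace if one exists, otherwise scaffold a new one.
# The workspace name is illustrative; real names follow <model>-<format>.
ws="workspaces/qwen3-0.6b-nvfp4"

if [ -d "$ws/output" ]; then
    # A prior PTQ run left a checkpoint here; deployment/eval should reuse it.
    echo "Reusing PTQ checkpoint in $ws/output"
else
    # First run for this model/format: create the expected subdirectories.
    mkdir -p "$ws"/output "$ws"/eval_results "$ws"/scripts "$ws"/logs
    echo "Created new workspace at $ws"
fi
```
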
## Unsupported Models

Models not in the verified support matrices require extra work at each stage:

| Stage | What can go wrong | Reference |
|-------|-------------------|-----------|
| **PTQ** | Unknown architecture, FP8 source checkpoint, VLM structure | `ptq/references/unsupported-models.md` |
| **Deployment** | Missing architecture mapping, weight key mismatches, quant/unquant layer confusion | `deployment/references/unsupported-models.md` |
| **Evaluation** | Framework patches needed in deployment container, gated datasets, cluster storage | `evaluation/references/nel-ci-guide.md` |

Each stage has its own debug loop (run → read error → diagnose → patch → re-run). Fixes from one stage often inform the next — e.g., if PTQ required a transformers upgrade, deployment and evaluation will too.
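The loop is mechanical enough to script. A hedged sketch, where `run_stage.sh` is a hypothetical stand-in for whichever stage command is being debugged (the retry cap is arbitrary):

```shell
# Sketch of the per-stage debug loop: run → read error → diagnose → patch → re-run.
# run_stage.sh is a hypothetical placeholder for the actual PTQ/deploy/eval command.
mkdir -p logs
attempt=1
until bash run_stage.sh > "logs/run_${attempt}.log" 2>&1; do
    echo "Attempt ${attempt} failed; last lines of the log:"
    tail -n 20 "logs/run_${attempt}.log"
    # <- diagnose the error and patch code/config here before the next attempt
    attempt=$((attempt + 1))
    if [ "$attempt" -gt 3 ]; then
        echo "Giving up after 3 attempts"
        break
    fi
done
```
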
## NEL Evaluation with Custom Deployments

When the serving framework needs runtime patches (e.g., transformers upgrade, model handler fix), override `deployment.command` in the NEL config to inject fixes before serving:

```yaml
deployment:
  command: >-
    pip install "transformers>=5.0.0.dev0" --pre -q &&
    sed -i 's/old_pattern/new_pattern/' /path/to/framework/file.py &&
    ${deployment.base_command}
```

This works with both NEL SLURM executor and NEL CI (via `NEL_DEPLOYMENT_COMMAND`).
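For the NEL CI path, the same override is passed as an environment variable. A sketch only: the variable name comes from this document, but the exact substitution syntax NEL CI accepts (and whether `${deployment.base_command}` is honored verbatim) is an assumption here:

```shell
# Sketch: equivalent override for NEL CI, passed via NEL_DEPLOYMENT_COMMAND.
# Single quotes keep ${deployment.base_command} literal for NEL to expand;
# the pip pin mirrors the YAML example and is illustrative, not prescriptive.
export NEL_DEPLOYMENT_COMMAND='pip install "transformers>=5.0.0.dev0" --pre -q && ${deployment.base_command}'
echo "$NEL_DEPLOYMENT_COMMAND"
```
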
## Decision: NEL SLURM Executor vs NEL CI (JET)

| Factor | NEL SLURM executor | NEL CI (JET) |
|--------|--------------------|--------------|
| **When to use** | Iterative debugging, checkpoint on non-JET cluster, custom patches needed | Production evals, MLflow tracking, reproducible configs |
| **Checkpoint location** | Any cluster you have SSH access to | Must be on JET cluster `/lustre/` storage |
| **Secrets (HF_TOKEN, NGC)** | Provide your own via `host:` env vars | Managed centrally via JET secrets |
| **Container patches** | Override `deployment.command` | Use `NEL_DEPLOYMENT_COMMAND` |
| **MLflow export** | Manual setup | Automatic |
| **Gated datasets** | Your HF account needs access | Handled by `COMPEVAL_HF_TOKEN` |
.claude/skills/common/workspace-management.md

Lines changed: 19 additions & 0 deletions
@@ -92,6 +92,21 @@ rsync -a --quiet \
  "$MODELOPT_REPO_DIR/" "$MODELOPT_WORKSPACE_ROOT/<name>/"
```

## Cross-Skill Workspace Flow

Workspaces carry over across the PTQ → Deploy → Eval pipeline. Each stage adds to the same directory:

```text
workspaces/model-name-format/
  output/           ← PTQ: quantized checkpoint
  eval_results/     ← Evaluation: NEL artifacts (results.yml per task)
  eval_config.yaml  ← Evaluation: NEL config
  scripts/          ← Deployment/PTQ: custom run scripts
  logs/             ← All: SLURM job logs
```

See `skills/common/end-to-end-workflow.md` for the full pipeline.
## Example Flow

```text
@@ -104,6 +119,10 @@ User: "deploy the model I just quantized"
Agent: ls workspaces/ → sees "qwen3-0.6b-nvfp4"
  → reuse, find checkpoint at workspaces/qwen3-0.6b-nvfp4/output/

User: "evaluate the quantized model on MMLU and GSM8K"
Agent: ls workspaces/ → sees "qwen3-0.6b-nvfp4"
  → reuse, write eval_config.yaml, results to workspaces/qwen3-0.6b-nvfp4/eval_results/

User: "now quantize Llama-3.1-8B with fp8"
Agent: ls workspaces/ → no llama
  → mkdir workspaces/llama-3.1-8b-fp8

.claude/skills/evaluation/SKILL.md

Lines changed: 3 additions & 1 deletion
@@ -12,10 +12,12 @@ license: Apache-2.0

You're an expert in NeMo Evaluator Launcher! Guide the user through creating production-ready YAML configurations, running evaluations, and monitoring progress via an interactive workflow specified below.

-### Workspace (multi-user / Slack bot)
+### Workspace and Pipeline Integration

If `MODELOPT_WORKSPACE_ROOT` is set, read `skills/common/workspace-management.md`. Check for existing workspaces — especially if evaluating a model from a prior PTQ or deployment step. Reuse the existing workspace so you have access to the quantized checkpoint and any code modifications.

+This skill is often the final stage of the PTQ → Deploy → Eval pipeline. If the model required runtime patches during deployment (transformers upgrade, framework source fixes), carry those patches into the NEL config via `deployment.command`. See `skills/common/end-to-end-workflow.md` for the full pipeline.

### Workflow

```text

.claude/skills/ptq/SKILL.md

Lines changed: 2 additions & 0 deletions
@@ -113,6 +113,8 @@ ls -lh <output_path>/

Report the path and size to the user.

+**Next steps**: If the user wants to deploy or evaluate the quantized checkpoint, use the **deployment** or **evaluation** skill. The checkpoint workspace carries over — see `skills/common/end-to-end-workflow.md` for the full PTQ → Deploy → Eval pipeline. If the model required patches during PTQ (e.g., transformers upgrade), the same fixes will likely be needed at deployment and evaluation time.

## Key API Rules

- `mtq.register()` classes **must** define `_setup()` and call it from `__init__`
