.claude/skills/quant-recipe-search/SKILL.md
Lines changed: 4 additions & 2 deletions
@@ -39,9 +39,11 @@ Do not duplicate those workflows here. This skill should leave the user with a c
 
 2. **Define the target**
    - Ask the user what makes the compression successful before choosing recipes.
-   - If the user does not specify, use one of two default objectives:
+   - If the user did not provide an optimization objective, stop and ask them to choose before planning candidates. Do not infer or default silently.
+   - Offer these default objective choices:
      - **Compute / throughput:** typical data-center target. Prefer recipes with activation quantization such as NVFP4 or FP8 when the downstream stack can use fast kernels.
      - **Memory / latency:** typical edge target. Minimize activated memory per forward pass to reduce latency; prefer weight-only or W4A16-style recipes when they preserve accuracy.
+     - **Custom:** user-provided objective, such as checkpoint size, throughput at a fixed batch size, decode latency, prefill latency, or a product-specific memory budget.
    - Default acceptance goal: find the recipe with the best performance for the chosen objective while keeping each benchmark's accuracy loss under 1 percentage point versus the matching baseline.
    - Treat near-threshold or noisy benchmark deltas as inconclusive until reruns confirm whether the drop is a real regression.
    - Record recipe-selection criteria: target active bytes/token, acceptable accuracy loss, calibration budget, and any user-provided throughput/latency goal.
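The acceptance rule in this hunk (pick the best score for the chosen objective, subject to a per-benchmark accuracy loss under 1 percentage point, with near-threshold deltas held back for reruns) can be sketched in Python. This is a minimal illustration, not code from the skill: `Candidate`, `classify`, `select_recipe`, and the 0.25pp `NOISE_BAND_PP` margin are all hypothetical, and `objective_score` assumes higher is better, so latency or memory figures would need to be negated before being passed in.

```python
from dataclasses import dataclass

# The 1pp limit comes from the skill text; the noise band is a
# hypothetical margin chosen only for illustration.
MAX_LOSS_PP = 1.0
NOISE_BAND_PP = 0.25


@dataclass
class Candidate:
    name: str
    objective_score: float      # assumption: higher is better for the chosen objective
    accuracy: dict[str, float]  # benchmark name -> accuracy in percent


def classify(loss_pp: float) -> str:
    """Label one benchmark's accuracy loss (baseline minus candidate, in pp)."""
    if loss_pp < MAX_LOSS_PP - NOISE_BAND_PP:
        return "pass"
    if loss_pp >= MAX_LOSS_PP + NOISE_BAND_PP:
        return "fail"
    return "inconclusive"  # too close to the threshold: rerun before deciding


def select_recipe(candidates: list[Candidate], baseline: dict[str, float]):
    """Return (name of best accepted candidate or None, names needing reruns)."""
    accepted, needs_rerun = [], []
    for cand in candidates:
        # Every baseline benchmark must be present; a missing one raises KeyError.
        labels = [classify(baseline[b] - cand.accuracy[b]) for b in baseline]
        if "fail" in labels:
            continue  # a real regression on any benchmark rejects the recipe
        if "inconclusive" in labels:
            needs_rerun.append(cand.name)
            continue
        accepted.append(cand)
    best = max(accepted, key=lambda c: c.objective_score, default=None)
    return (best.name if best else None), needs_rerun


if __name__ == "__main__":
    baseline = {"mmlu": 70.0, "gsm8k": 80.0}  # invented numbers
    candidates = [
        Candidate("w4a16", 2.0, {"mmlu": 69.8, "gsm8k": 79.9}),
        Candidate("nvfp4", 3.0, {"mmlu": 69.1, "gsm8k": 79.5}),
        Candidate("int3", 4.0, {"mmlu": 67.0, "gsm8k": 79.0}),
    ]
    print(select_recipe(candidates, baseline))  # → ('w4a16', ['nvfp4'])
```

In the invented example, `nvfp4` has the better score but its 0.9pp MMLU drop sits inside the noise band, so it is queued for reruns rather than accepted or rejected outright; `int3` is rejected for a 3pp drop.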
@@ -80,7 +82,7 @@ Do not duplicate those workflows here. This skill should leave the user with a c
 
 ## Practical Defaults
 
-- Ask whether the primary success metric is compute/throughput or memory/latency. Do not assume.
+- Ask whether the primary success metric is compute/throughput, memory/latency, or a custom objective. Do not assume, and do not proceed to candidate planning until the objective is explicit.
 - Default to a `<1pp` per-benchmark accuracy-loss constraint versus the matching baseline unless the user gives another threshold.
 - Prefer active runtime cost over checkpoint size when optimizing routed or sparsely activated models.
 - Always compare against BF16/FP16 and a near-lossless FP8/W8A8 baseline.
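Active runtime cost and checkpoint size can diverge sharply on a routed model, because each token only touches the shared weights plus its top-k experts. The arithmetic can be sketched in Python under simplifying assumptions: `MoEShape` and both helper functions are hypothetical, the parameter counts are invented, and only weight bytes are counted (KV cache, activations, and quantization scale overhead such as NVFP4 block scales are ignored).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEShape:
    """Hypothetical parameter counts for a routed (mixture-of-experts) model."""
    shared_params: int  # attention, embeddings, dense layers: used by every token
    expert_params: int  # params of ONE expert slot, summed over all MoE layers
    num_experts: int    # experts per MoE layer
    top_k: int          # experts each token is routed to per layer


def checkpoint_bytes(shape: MoEShape, weight_bits: int) -> int:
    """Bytes needed to store all weights at the given bit width."""
    total = shape.shared_params + shape.num_experts * shape.expert_params
    return total * weight_bits // 8


def active_bytes_per_token(shape: MoEShape, weight_bits: int) -> int:
    """Weight bytes read for one token: shared weights plus top-k experts."""
    active = shape.shared_params + shape.top_k * shape.expert_params
    return active * weight_bits // 8


if __name__ == "__main__":
    shape = MoEShape(shared_params=2_000_000_000, expert_params=500_000_000,
                     num_experts=64, top_k=4)  # invented sizes
    # Weight bit widths for the BF16 and FP8 baselines and a 4-bit weight recipe.
    for label, bits in [("BF16", 16), ("FP8", 8), ("W4A16", 4)]:
        ckpt = checkpoint_bytes(shape, bits) / 1e9
        active = active_bytes_per_token(shape, bits) / 1e9
        print(f"{label}: checkpoint {ckpt:.1f} GB, active {active:.1f} GB/token")
```

With these invented numbers the checkpoint holds 34B parameters while each token touches only 4B, so checkpoint size overstates per-token weight traffic by 8.5x at every bit width. That gap is consistent with the skill's preference for active runtime cost when comparing recipes on routed models.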