Commit ed0d717

Require explicit recipe search objective
Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>
1 parent 30e8c78 commit ed0d717

1 file changed

Lines changed: 4 additions & 2 deletions

.claude/skills/quant-recipe-search/SKILL.md

@@ -39,9 +39,11 @@ Do not duplicate those workflows here. This skill should leave the user with a c
 
 2. **Define the target**
    - Ask the user what makes the compression successful before choosing recipes.
-   - If the user does not specify, use one of two default objectives:
+   - If the user did not provide an optimization objective, stop and ask them to choose before planning candidates. Do not infer or default silently.
+   - Offer these default objective choices:
      - **Compute / throughput:** typical data-center target. Prefer recipes with activation quantization such as NVFP4 or FP8 when the downstream stack can use fast kernels.
      - **Memory / latency:** typical edge target. Minimize activated memory per forward pass to reduce latency; prefer weight-only or W4A16-style recipes when they preserve accuracy.
+     - **Custom:** user-provided objective, such as checkpoint size, throughput at a fixed batch size, decode latency, prefill latency, or a product-specific memory budget.
    - Default acceptance goal: find the recipe with the best performance for the chosen objective while keeping each benchmark's accuracy loss under 1 percentage point versus the matching baseline.
    - Treat near-threshold or noisy benchmark deltas as inconclusive until reruns confirm whether the drop is a real regression.
    - Record recipe-selection criteria: target active bytes/token, acceptable accuracy loss, calibration budget, and any user-provided throughput/latency goal.
@@ -80,7 +82,7 @@ Do not duplicate those workflows here. This skill should leave the user with a c
 
 ## Practical Defaults
 
-- Ask whether the primary success metric is compute/throughput or memory/latency. Do not assume.
+- Ask whether the primary success metric is compute/throughput, memory/latency, or a custom objective. Do not assume, and do not proceed to candidate planning until the objective is explicit.
 - Default to a `<1pp` per-benchmark accuracy-loss constraint versus the matching baseline unless the user gives another threshold.
 - Prefer active runtime cost over checkpoint size when optimizing routed or sparsely activated models.
 - Always compare against BF16/FP16 and a near-lossless FP8/W8A8 baseline.
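
The rule this commit introduces (no candidate planning without an explicit objective) and the default `<1pp` per-benchmark constraint could be sketched roughly as below. This is an illustrative sketch only, not code from the skill or the repository: `Candidate`, `require_objective`, `max_loss_pp`, and `select_recipe` are hypothetical names, `score` is assumed to be a single "higher is better" number for whichever objective the user chose, and accuracies are assumed to be percentages. It does not model the rerun handling for near-threshold deltas.

```python
from dataclasses import dataclass
from typing import Optional

# The three choices offered to the user (labels are illustrative).
OBJECTIVES = ("compute/throughput", "memory/latency", "custom")


@dataclass
class Candidate:
    """Hypothetical record for one quantization recipe under evaluation."""
    name: str
    score: float                # assumed: higher is better for the chosen objective
    accuracy: dict[str, float]  # benchmark name -> accuracy in percent


def require_objective(objective: Optional[str]) -> str:
    """Refuse to continue until an objective has been stated explicitly."""
    if objective is None or not objective.strip():
        raise ValueError(
            "No optimization objective given; ask the user to choose one of: "
            + ", ".join(OBJECTIVES)
        )
    return objective.strip()


def max_loss_pp(baseline: dict[str, float], candidate: Candidate) -> float:
    """Largest per-benchmark accuracy drop, in percentage points."""
    return max(baseline[b] - candidate.accuracy[b] for b in baseline)


def select_recipe(
    objective: Optional[str],
    baseline: dict[str, float],
    candidates: list[Candidate],
    threshold_pp: float = 1.0,
) -> Optional[Candidate]:
    """Best-scoring candidate whose loss stays under the threshold on every benchmark."""
    require_objective(objective)  # never infer or default silently
    passing = [c for c in candidates if max_loss_pp(baseline, c) < threshold_pp]
    return max(passing, key=lambda c: c.score, default=None)
```

Raising on a missing objective, rather than falling back to one of the defaults, mirrors the "stop and ask" wording in the added lines; a caller would catch the error and prompt the user.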
