Commit 09b2124

scripts(dflash): switch default bench target to Q8_0 + --target flag (#65)
Per Markus 2026-06-04: DFlash quality measurement should use a Q8_0 target rather than Q4_K_M, since Q4_K_M introduces enough target-side quantization noise to confound DFlash's own accept-rate signal. Q8_0 fits in 38 GB total, well within titan A100 80 GB.

* Default `TARGET` is now `gemma-4-31B-it-Q8_0.gguf`. Override via `--target PATH` or `DFLASH_BENCH_TARGET` env var.
* Also added `DFLASH_BENCH_DRAFTER_DIR` env var for consistency.
* Comment block documents VRAM math for Q4_K_M / Q8_0 / BF16 targets so future runs can pick the right card.
1 parent b0daec5 commit 09b2124

1 file changed

Lines changed: 14 additions & 5 deletions

scripts/bench-dflash.sh

@@ -6,11 +6,19 @@
 # pair runs N times so variance is visible (DFlash bench has ±2-3pp
 # run-to-run variance even at temp=0 / fixed seed).
 #
-# VRAM requirement: ~22 GB free (target Q4_K_M ~18 GB + drafter ~1-3 GB +
-# compute). Coordinate centurion-llm scale-down before running.
+# VRAM requirement (target + ~1-3 GB drafter + compute):
+# - Q4_K_M target ~18 GB → ~22 GB total (fits on a single 24 GB card)
+# - Q8_0 target ~33 GB → ~38 GB total (titan A100 80 GB only)
+# - BF16 target ~62 GB → ~67 GB total (titan A100 80 GB only)
+# Coordinate centurion-llm scale-down before running on shared hardware.
 #
 # Usage:
-# scripts/bench-dflash.sh [--quants Q4,Q6,Q8,BF16] [--runs 3] [--ctx 4096]
+# scripts/bench-dflash.sh [--target PATH] [--quants Q4,Q6,Q8,BF16] [--runs 3] [--ctx 4096]
+#
+# Default target is gemma-4-31B-it-Q8_0.gguf — the higher-quality reference
+# preferred for DFlash quality measurement (Markus 2026-06-04). For VRAM-
+# constrained local runs, override with --target gemma-4-31B-it-Q4_K_M.gguf
+# (or set DFLASH_BENCH_TARGET in the env).
 #
 # Output goes to /tmp/dflash-bench-<timestamp>.md with a markdown summary
 # table at the bottom.
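The VRAM totals in the comment block above could also drive a pre-flight check before a run. The following is only a sketch: `pick_target_quant` is a hypothetical helper that is not part of `scripts/bench-dflash.sh`, the 38 GB and 22 GB thresholds are the "total" figures quoted in the comment block, and preferring Q8_0 whenever it fits is an assumption based on the commit message.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper; NOT part of scripts/bench-dflash.sh.
# Thresholds are the "total" figures from the script's comment block:
#   Q8_0 needs ~38 GB, Q4_K_M needs ~22 GB.
pick_target_quant() {
  local free_gb="$1"   # free VRAM in whole GB
  if [ "$free_gb" -ge 38 ]; then
    echo "Q8_0"        # preferred reference target when it fits
  elif [ "$free_gb" -ge 22 ]; then
    echo "Q4_K_M"      # fallback for VRAM-constrained cards
  else
    echo "error: need at least ~22 GB free VRAM, have ${free_gb} GB" >&2
    return 1
  fi
}

# Example wiring (commented out because it needs an NVIDIA GPU):
# nvidia-smi reports memory.free in MiB, so divide by 1024 for an
# approximate GB figure.
# free_mib=$(nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits -i 0)
# pick_target_quant $(( free_mib / 1024 ))
```

The returned quant name would still have to be mapped to a model path and passed via `--target`; BF16 is left out because the commit does not propose it as a default.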
@@ -25,8 +33,8 @@ set -euo pipefail
 
 ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
 BIN="$ROOT/build-cuda/bin/llama-speculative-simple"
-TARGET="$ROOT/models/gemma-4-31B-it-Q4_K_M.gguf"
-DRAFTER_DIR="$ROOT/models/dflash-gemma4-31b-gguf"
+TARGET="${DFLASH_BENCH_TARGET:-$ROOT/models/gemma-4-31B-it-Q8_0.gguf}"
+DRAFTER_DIR="${DFLASH_BENCH_DRAFTER_DIR:-$ROOT/models/dflash-gemma4-31b-gguf}"
 TS=$(date +%Y%m%d-%H%M%S)
 OUT="/tmp/dflash-bench-$TS.md"
 
@@ -36,6 +44,7 @@ CTX=4096
 
 while (( $# )); do
 case "$1" in
+--target) TARGET="$2"; shift 2 ;;
 --quants) QUANTS="$2"; shift 2 ;;
 --runs) RUNS="$2"; shift 2 ;;
 --ctx) CTX="$2"; shift 2 ;;
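
Because `TARGET` takes its env-var default before the argument loop runs, the effective precedence is `--target` flag, then `DFLASH_BENCH_TARGET`, then the built-in Q8_0 path. A minimal sketch of that resolution order follows. `resolve_target` is a hypothetical helper that does not appear in the script, and the guard for a missing `--target` value is an addition for illustration, not something this diff includes.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating the precedence in this diff:
#   --target flag  >  DFLASH_BENCH_TARGET env var  >  built-in default.
# Usage: resolve_target ROOT [script args...]
resolve_target() {
  local root="$1"; shift
  # Env var wins over the built-in default (same ${VAR:-default} form as the diff).
  local target="${DFLASH_BENCH_TARGET:-$root/models/gemma-4-31B-it-Q8_0.gguf}"
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --target)
        # Illustrative guard (not in the diff): report a missing value
        # instead of letting `set -u` abort on an unbound $2.
        if [ "$#" -lt 2 ]; then
          echo "error: --target requires a path" >&2
          return 2
        fi
        target="$2"; shift 2 ;;
      *) shift ;;   # ignore other arguments in this sketch
    esac
  done
  printf '%s\n' "$target"
}
```

For example, with `DFLASH_BENCH_TARGET` unset, `resolve_target /repo` prints `/repo/models/gemma-4-31B-it-Q8_0.gguf`, while `resolve_target /repo --target /y/bf16.gguf` prints `/y/bf16.gguf` whether or not the env var is set.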
