Commit a057095
Disable hipBLASLt auto-tune by default, fix warm prompt regression
Auto-tuning benchmarks 8 GEMM algorithms per shape on first call,
adding ~200ms startup overhead. For quantized models the regular
GEMM path is rarely used, so the overhead is wasted. Disable it by
default; set MLX_ROCM_HIPBLASLT_TUNE=1 to enable it for non-quantized models.
Warm prompt restored: Qwen3-8B 1092 tok/s, Qwen3.5-35B 795 tok/s.
1 parent 25f5912 commit a057095
1 file changed
Lines changed: 4 additions & 3 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 351 | 351 | |
| 352 | 352 | |
| 353 | 353 | |
| 354 | | - |
| 355 | | - |
| 356 | | - |
| | 354 | + |
| | 355 | + |
| | 356 | + |
| | 357 | + |
| 357 | 358 | |
| 358 | 359 | |
| 359 | 360 | |
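
A minimal sketch of the kind of opt-in gate the commit message describes. This is not the actual change. The helper names `tune_enabled_from` and `hipblaslt_tune_enabled` are hypothetical, and treating any value other than `1` as "disabled" is an assumption; only the variable name `MLX_ROCM_HIPBLASLT_TUNE` and the value `1` come from the message.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: decide from the raw variable value whether
// auto-tuning is requested. Assumption: only the exact string "1" enables it;
// an unset variable (nullptr) or any other value keeps tuning off.
bool tune_enabled_from(const char* value) {
  return value != nullptr && std::strcmp(value, "1") == 0;
}

// Hypothetical wrapper: read MLX_ROCM_HIPBLASLT_TUNE once and cache the
// result, so the environment lookup is not repeated on every GEMM call.
bool hipblaslt_tune_enabled() {
  static const bool enabled =
      tune_enabled_from(std::getenv("MLX_ROCM_HIPBLASLT_TUNE"));
  return enabled;
}
```

With a gate like this, the GEMM path would benchmark candidate algorithms only when `hipblaslt_tune_enabled()` returns true, so quantized-model runs skip the first-call tuning cost unless the user opts in.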