@@ -268,21 +268,33 @@ Dequantization and SIMD are fast - the bottleneck is FILE READ.
268268- Subsequent starts use cached model (instant)
269269- Model persists across deploys
270270
271- ### Expected Impact
271+ ### Actual Results (Verified)
272272
273- | Metric | Before (Ephemeral) | After (Volume) |
274- | --------| -------------------| ----------------|
275- | Load time | 208s | ~ 13s (estimated) |
276- | First deploy | 208s | ~ 60s (download) |
277- | Subsequent | 208s | ~ 13s |
273+ | Metric | Before (Ephemeral) | After (Volume) | Improvement |
274+ |--------|--------------------|----------------|-------------|
275+ | **Total load** | **208s** | **4.82s** | **43x faster** |
276+ | Layer weights | ~200s | 4.47s | 45x faster |
277+ | Embeddings | N/A | 341ms | - |
278+ | First deploy | 208s | ~60s (download) | - |
279+
280+ **Profiling breakdown (NVMe Volume):**
281+ ```
282+ ║ Thread pool init: 0.68 ms ( 0.0%)
283+ ║ Embeddings: 341.77 ms ( 7.1%)
284+ ║ RoPE init: 13.76 ms ( 0.3%)
285+ ║ KV cache init: 0.18 ms ( 0.0%)
286+ ║ Layer weights: 4467.82 ms ( 92.6%)
287+ ║ Buffer alloc: 0.05 ms ( 0.0%)
288+ ║ TOTAL: 4824.28 ms
289+ ```
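
As a cross-check on the figures above, the per-phase percentages and the headline speedup can be recomputed from the raw timings. This is a minimal sketch, not part of the project: the names `PHASES_MS`, `phase_shares`, and `speedup` are hypothetical, and the only inputs are the numbers already reported in the table and the profiling breakdown.

```python
# Timings copied from the profiling breakdown, in milliseconds.
PHASES_MS = {
    "Thread pool init": 0.68,
    "Embeddings": 341.77,
    "RoPE init": 13.76,
    "KV cache init": 0.18,
    "Layer weights": 4467.82,
    "Buffer alloc": 0.05,
}
REPORTED_TOTAL_MS = 4824.28  # TOTAL line of the breakdown
BASELINE_TOTAL_S = 208.0     # load time on ephemeral disk, from the table


def phase_shares(phases, total_ms):
    """Return each phase's share of total_ms as a percentage, rounded to 0.1."""
    return {name: round(100.0 * ms / total_ms, 1) for name, ms in phases.items()}


def speedup(before_s, after_s):
    """Return how many times shorter after_s is than before_s."""
    return before_s / after_s


shares = phase_shares(PHASES_MS, REPORTED_TOTAL_MS)
total_speedup = speedup(BASELINE_TOTAL_S, REPORTED_TOTAL_MS / 1000.0)
# Time in the reported total that no listed phase accounts for.
untracked_ms = REPORTED_TOTAL_MS - sum(PHASES_MS.values())

for name, pct in shares.items():
    print(f"{name:<18}{PHASES_MS[name]:>10.2f} ms ({pct:4.1f}%)")
print(f"speedup: {total_speedup:.1f}x, unaccounted: {untracked_ms:.2f} ms")
```

The recomputed shares match the reported column, and the listed phases sum to within about 0.02 ms of the reported total, so layer-weight loading remains the dominant cost even on the NVMe volume.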
278290
279291---
280292
281293## Version History
282294
283295| Version | Date | Changes |
284296| ---------| ------| ---------|
285- | v1.4.0 | 2026-02-02 | Fly.io Volumes for NVMe SSD storage |
297+ | v1.4.0 | 2026-02-02 | Fly.io Volumes - **43x faster load (208s→4.8s)** |
286298| v1.3.0 | 2026-02-02 | Load profiling - found I/O bottleneck |
287299| v1.2.0 | 2026-02-02 | Parallel dequantization (OPT-003) |
288300| v1.1.0 | 2026-02-02 | SIMD optimization (OPT-001) |