Commit 81ae910

gHashTag and ona-agent committed
docs: update with verified NVMe volume results - 43x faster!
VERIFIED RESULTS:
- Before (ephemeral): 208 seconds
- After (NVMe volume): 4.82 seconds
- Improvement: 43x faster load time!

Layer weights: 4.47s (92.6% of total)
Embeddings: 341ms (7.1%)

Co-authored-by: Ona <no-reply@ona.com>
1 parent 99639ff commit 81ae910

1 file changed

Lines changed: 19 additions & 7 deletions

docs/DISCOVERIES.md

@@ -268,21 +268,33 @@ Dequantization and SIMD are fast - the bottleneck is FILE READ.
 - Subsequent starts use cached model (instant)
 - Model persists across deploys
 
-### Expected Impact
+### ACTUAL RESULTS (VERIFIED!)
 
-| Metric | Before (Ephemeral) | After (Volume) |
-|--------|-------------------|----------------|
-| Load time | 208s | ~13s (estimated) |
-| First deploy | 208s | ~60s (download) |
-| Subsequent | 208s | ~13s |
+| Metric | Before (Ephemeral) | After (Volume) | Improvement |
+|--------|-------------------|----------------|-------------|
+| **Total load** | **208s** | **4.82s** | **43x faster!** |
+| Layer weights | ~200s | 4.47s | 45x faster |
+| Embeddings | N/A | 341ms | - |
+| First deploy | 208s | ~60s (download) | - |
+
+**Profiling breakdown (NVMe Volume):**
+```
+║ Thread pool init: 0.68 ms ( 0.0%)
+║ Embeddings: 341.77 ms ( 7.1%)
+║ RoPE init: 13.76 ms ( 0.3%)
+║ KV cache init: 0.18 ms ( 0.0%)
+║ Layer weights: 4467.82 ms ( 92.6%)
+║ Buffer alloc: 0.05 ms ( 0.0%)
+║ TOTAL: 4824.28 ms
+```
 
 ---
 
 ## Version History
 
 | Version | Date | Changes |
 |---------|------|---------|
-| v1.4.0 | 2026-02-02 | Fly.io Volumes for NVMe SSD storage |
+| v1.4.0 | 2026-02-02 | Fly.io Volumes - **43x faster load (208s→4.8s)** |
 | v1.3.0 | 2026-02-02 | Load profiling - found I/O bottleneck |
 | v1.2.0 | 2026-02-02 | Parallel dequantization (OPT-003) |
 | v1.1.0 | 2026-02-02 | SIMD optimization (OPT-001) |
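
The headline figures in this commit (43x faster, 92.6% layer weights, 7.1% embeddings) can be cross-checked from the raw profiling numbers. The following is a minimal Python sketch, not part of the project. The timings are copied from the profiling breakdown and the 208 s baseline from the results table. The names `phases_ms`, `speedup`, and `shares` are hypothetical helpers introduced only for this check.

```python
# Hypothetical sanity check, not project code.
# Timings (milliseconds) copied from the NVMe volume profiling breakdown.
phases_ms = {
    "Thread pool init": 0.68,
    "Embeddings": 341.77,
    "RoPE init": 13.76,
    "KV cache init": 0.18,
    "Layer weights": 4467.82,
    "Buffer alloc": 0.05,
}
total_ms = 4824.28   # reported TOTAL for the NVMe volume run
before_s = 208.0     # reported load time on ephemeral storage


def speedup(before_seconds: float, after_ms: float) -> float:
    """Return how many times faster the new load is than the old one."""
    return before_seconds / (after_ms / 1000.0)


def shares(phases: dict, total: float) -> dict:
    """Return each phase's share of the total, in percent, to one decimal."""
    return {name: round(100.0 * ms / total, 1) for name, ms in phases.items()}


if __name__ == "__main__":
    print(f"speedup: {speedup(before_s, total_ms):.1f}x")
    for name, pct in shares(phases_ms, total_ms).items():
        print(f"{name:<18}{pct:5.1f}%")
```

Under these inputs the computed speedup rounds to the reported 43x, and the layer-weight and embedding shares match the reported 92.6% and 7.1%. The individual phases sum to slightly less than the reported total, which is consistent with rounding or a small amount of untimed work.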
