Commit d8f169e
Add ternary quantization, Flash Attention, and parallel inference
- specs/tri/ternary_smollm2.vibee: Ternary {-1,0,+1} quantization spec
- specs/tri/flash_attention.vibee: IO-aware tiled attention spec
- specs/tri/parallel_inference.vibee: Multi-threaded inference spec
- src/vibeec/gguf_to_tri.zig: GGUF to .tri format converter
- src/vibeec/tri_inference.zig: Ternary model inference engine
- src/vibeec/flash_attention.zig: Online softmax, SIMD dot product
- src/vibeec/parallel_inference.zig: Thread-parallel matmul
- src/vibeec/flash_benchmark.zig: Flash vs standard attention benchmark
- fly.toml: Updated for performance-16x (16 CPU cores)
- Dockerfile.flyio: Fly.io deployment container
- benchmark_flyio.sh: Performance estimation script
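The exact quantization rule in ternary_smollm2.vibee is not shown in this commit. The sketch below is in Python rather than the project's Zig, and it assumes per-tensor absmean scaling as in BitNet b1.58: scale = mean(|W|), then each weight is rounded and clipped to {-1, 0, +1}. It also shows why ternary weights pay off at inference time: the matrix-vector product needs only additions and subtractions. `quantize_ternary` and `ternary_matvec` are hypothetical names.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_ternary(w: Matrix, eps: float = 1e-8) -> Tuple[List[List[int]], float]:
    """Absmean ternary quantization (assumed scheme, BitNet b1.58 style).

    scale = mean(|w|); q = clip(round(w / scale), -1, 1).
    Note: Python's round() rounds exact halves to even.
    """
    flat = [abs(x) for row in w for x in row]
    scale = sum(flat) / len(flat) + eps
    q = [[max(-1, min(1, round(x / scale))) for x in row] for row in w]
    return q, scale


def ternary_matvec(q: List[List[int]], scale: float, x: List[float]) -> List[float]:
    """y = scale * (q @ x), computed with additions and subtractions only."""
    y = []
    for row in q:
        acc = 0.0
        for qi, xi in zip(row, x):
            if qi == 1:
                acc += xi
            elif qi == -1:
                acc -= xi
            # qi == 0: the weight is skipped entirely
        y.append(acc * scale)
    return y


w = [[0.9, -0.05, -1.2], [0.4, 0.0, -0.3]]
q, s = quantize_ternary(w)
print(q, s)
print(ternary_matvec(q, s, [1.0, 2.0, 3.0]))
```

A single per-tensor scale keeps the format simple. Per-row or per-block scales would track the original weights more closely, but they cost extra storage.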
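The core idea behind IO-aware tiled attention is the online softmax. Keys and values are processed one tile at a time, and the code keeps a running max, a running denominator, and a rescaled output accumulator, so the full row of scores is never materialized. Below is a minimal single-query Python sketch with hypothetical names. A plain `dot` stands in for the SIMD dot product in flash_attention.zig, and the tile size is an arbitrary parameter. The tiled result should match standard softmax attention up to rounding.

```python
import math
from typing import List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    # Scalar stand-in for a SIMD dot product.
    return sum(x * y for x, y in zip(a, b))


def standard_attention(q: Vec, K: List[Vec], V: List[Vec]) -> Vec:
    """Reference: materialize all scores, softmax them, then weight V."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * dot(q, k) for k in K]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(w[j] * V[j][c] for j in range(len(V))) / z for c in range(len(V[0]))]


def flash_attention_row(q: Vec, K: List[Vec], V: List[Vec], block: int = 2) -> Vec:
    """Tiled attention for one query using the online-softmax recurrence."""
    scale = 1.0 / math.sqrt(len(q))
    m = -math.inf             # running max of scores seen so far
    l = 0.0                   # running softmax denominator
    acc = [0.0] * len(V[0])   # running unnormalized output
    for start in range(0, len(K), block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        s = [scale * dot(q, k) for k in Kb]
        m_new = max(m, max(s))
        corr = math.exp(m - m_new)  # rescales old sums; exp(-inf) = 0.0 on the first tile
        p = [math.exp(si - m_new) for si in s]
        l = l * corr + sum(p)
        acc = [a * corr + sum(pj * vb[c] for pj, vb in zip(p, Vb))
               for c, a in enumerate(acc)]
        m = m_new
    return [a / l for a in acc]
```

Subtracting the running max keeps every `exp` argument at or below zero, which avoids overflow. Rescaling by `corr` is what lets the earlier tiles stay valid after a larger score appears.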
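The partitioning scheme in parallel_inference.zig is not shown. A common approach, assumed here, splits the output rows of a matrix-vector product into contiguous chunks with one chunk per thread, so no two threads write the same output element and no locking is needed. The sketch below is in Python with hypothetical names. Because of CPython's GIL, it illustrates the work split rather than a real speedup. Native threads, as in Zig, do run the chunks concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Matrix = List[List[float]]


def matvec_rows(W: Matrix, x: List[float], lo: int, hi: int) -> List[float]:
    """Compute rows lo..hi-1 of W @ x (one thread's share of the work)."""
    return [sum(w * xi for w, xi in zip(W[r], x)) for r in range(lo, hi)]


def parallel_matvec(W: Matrix, x: List[float], n_threads: int = 4) -> List[float]:
    """Split output rows into contiguous chunks and run one task per chunk."""
    n = len(W)
    if n == 0:
        return []
    chunk = -(-n // n_threads)  # ceiling division
    ranges = [(lo, min(lo + chunk, n)) for lo in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # map() yields results in input order, so concatenation keeps row order.
        parts = list(pool.map(lambda r: matvec_rows(W, x, r[0], r[1]), ranges))
    return [y for part in parts for y in part]


W = [[3 * i + j for j in range(3)] for i in range(7)]
print(parallel_matvec(W, [1, 0, -1], n_threads=3))
```

Contiguous row chunks also keep each thread's reads of `W` sequential in memory. That matters more for cache behavior in a native implementation than it does here.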
Performance on a 2-core machine:
- GGUF Q8_0: 8.73 tok/s (baseline)
- TRI + SIMD: 7.97 tok/s (-9%)
Expected on Fly.io performance-16x:
- ~50 tok/s (roughly 6x the 2-core throughput)
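The relative figures can be rechecked from the reported throughputs. The helper names below are hypothetical. The calculation shows that "-9%" is about -8.7%, and that the 16-core projection is about 5.7x the Q8_0 baseline or about 6.3x the TRI + SIMD figure.

```python
def pct_change(new: float, old: float) -> float:
    """Relative change of new vs old, in percent."""
    return (new / old - 1.0) * 100.0


def speedup(new: float, old: float) -> float:
    """How many times faster new is than old."""
    return new / old


gguf_q8_0, tri_simd, expected_16x = 8.73, 7.97, 50.0  # tok/s, from the commit message
print(f"TRI + SIMD vs Q8_0: {pct_change(tri_simd, gguf_q8_0):+.1f}%")         # -8.7%
print(f"16x estimate vs Q8_0: {speedup(expected_16x, gguf_q8_0):.1f}x")       # 5.7x
print(f"16x estimate vs TRI + SIMD: {speedup(expected_16x, tri_simd):.1f}x")  # 6.3x
```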
Co-authored-by: Ona <no-reply@ona.com>
1 parent 7945962
11 files changed, 2920 additions, 7 deletions