Commit 0ea9f8b

gHashTag and ona-agent committed
feat(e2e): fix CLI, add streaming spec, E2E inference working
- Fix error_reporter.zig ArrayList API (Zig 0.13 compatibility)
- Fix cli.zig std.io.getStdOut() usage
- Create specs/tri/streaming_loader.vibee for large model support
- Fix tri_inference.zig missing ternary_output_norm field
- Add 10 streaming/large model tests (25 total passing)
- Update PERFORMANCE_COMPARISON.md with E2E results

E2E Results (TinyLlama-1.1B):
- Conversion: 638 MB GGUF → 497 MB TRI (22% smaller)
- Load time: 4.3 seconds
- Inference: 1.98 tok/s (needs SIMD optimization)

Co-authored-by: Ona <no-reply@ona.com>
1 parent 7ca5843 commit 0ea9f8b

5 files changed

Lines changed: 564 additions & 14 deletions
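The relative figures quoted in this commit (22% smaller, 25% less memory, roughly 2x slower load) can be checked from the raw numbers. The following is a minimal sketch of that arithmetic only; the helper names `percent_smaller` and `slowdown` are hypothetical and do not come from the Trinity codebase, and the GGUF baselines (~2 s load, ~800 MB memory) are the estimates given in the updated performance document.

```python
def percent_smaller(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


def slowdown(baseline_s: float, measured_s: float) -> float:
    """How many times slower `measured_s` is than `baseline_s`."""
    return measured_s / baseline_s


# Numbers as reported in this commit (sizes in MB, times in seconds).
size_reduction = percent_smaller(638, 497)    # GGUF Q4_K_M -> TRI file size
memory_reduction = percent_smaller(800, 600)  # approximate runtime memory
load_slowdown = slowdown(2.0, 4.3)            # estimated GGUF load vs measured TRI load

print(f"size:   {size_reduction:.1f}% smaller")
print(f"memory: {memory_reduction:.1f}% less")
print(f"load:   {load_slowdown:.2f}x slower")
```

Under these inputs the size and memory figures agree with the reported 22% and 25%, and the load time works out to slightly more than twice the estimated baseline.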

docs/PERFORMANCE_COMPARISON.md

Lines changed: 23 additions & 0 deletions
@@ -178,6 +178,29 @@ Trinity now supports converting any GGUF model to ternary .tri format:
 - Q4_K, Q5_K, Q6_K, Q8_K (K-quants)
 - TQ1_0, TQ2_0 (native ternary)
 
+### 7.3 E2E Inference Results (TinyLlama-1.1B)
+
+| Metric | GGUF (Q4_K_M) | TRI (Ternary) | Improvement |
+|--------|---------------|---------------|-------------|
+| Model Size | 638 MB | 497 MB | 22% smaller |
+| Load Time | ~2s | 4.3s | -2x (needs streaming) |
+| Inference | ~5-10 tok/s* | 1.98 tok/s | Needs optimization |
+| Memory (runtime) | ~800 MB | ~600 MB | 25% less |
+
+*Estimated for llama.cpp on similar CPU
+
+**Conversion Stats:**
+- Input: TinyLlama-1.1B Q4_K_M (638 MB)
+- Output: TinyLlama-1.1B TRI (497 MB)
+- Conversion time: ~10 seconds
+- Compression vs F32: 16x
+
+**Next optimizations needed:**
+1. SIMD-16 ternary matmul (currently scalar)
+2. Flash Attention integration
+3. Streaming loader for large models
+4. Parallel layer processing
+
 ### 7.2 Performance Targets
 
 | Metric | llama.cpp | vLLM | Trinity Target |

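The diff lists "SIMD-16 ternary matmul (currently scalar)" as the first optimization. As a rough illustration of what a scalar ternary matrix-vector product looks like, here is a minimal Python sketch. It is not Trinity's kernel and does not reflect the .tri on-disk layout: the function name `ternary_matvec`, the list-of-rows weight representation, and the single `scale` factor are all assumptions made for illustration.

```python
from typing import List, Sequence


def ternary_matvec(
    weights: Sequence[Sequence[int]],
    x: Sequence[float],
    scale: float = 1.0,
) -> List[float]:
    """Multiply a ternary matrix (entries in {-1, 0, +1}) by a vector.

    Because every weight is -1, 0, or +1, the inner loop needs only
    additions and subtractions, never a multiply per weight. The single
    `scale` applied per output is a hypothetical stand-in for whatever
    dequantization factor a real format would store.
    """
    out = []
    for row in weights:
        if len(row) != len(x):
            raise ValueError("row length must match input vector length")
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
            elif w != 0:
                raise ValueError(f"non-ternary weight: {w}")
        out.append(acc * scale)
    return out


W = [[1, 0, -1],
     [0, 1, 1]]
x = [2.0, 3.0, 5.0]
print(ternary_matvec(W, x))             # [-3.0, 8.0]
print(ternary_matvec(W, x, scale=0.5))  # [-1.5, 4.0]
```

A SIMD version would presumably process many weights per instruction (16 lanes, going by the "SIMD-16" name) instead of branching on one weight at a time, which is why the scalar loop above is a plausible bottleneck at 1.98 tok/s.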