Commit 7ca5843

gHashTag and ona-agent committed
feat(gguf-converter): add GGUF → TRI converter specification
- Create specs/tri/gguf_to_tri.vibee as single source of truth
- Support F32/F16/BF16/Q4/Q5/Q6/Q8/TQ1/TQ2 tensor types
- Add parallel quantization via thread pool
- Fix Zig syntax errors (packed keyword, while loop)
- Add 10 unit tests for converter
- Update docs with GGUF converter info

Compression ratios:
- F32 → Ternary: 16x
- F16 → Ternary: 8x
- Q8 → Ternary: 4x
- Q4 → Ternary: 2x

Co-authored-by: Ona <no-reply@ona.com>
1 parent 4431189 commit 7ca5843

5 files changed

Lines changed: 941 additions & 58 deletions

docs/PERFORMANCE_COMPARISON.md

Lines changed: 18 additions & 1 deletion
@@ -159,7 +159,24 @@ Speedup: 1.1x over baseline
 | llama.cpp | GGUF | Q4/Q8 | No | High |
 | vLLM | HF | FP16/INT8 | No | High |
 | TGI | HF | FP16/INT8 | No | High |
-| **Trinity** | **.tri** | **Ternary** | **Yes** | **Low** |
+| **Trinity** | **GGUF → .tri** | **Ternary** | **Yes** | **Low** |
+
+### 7.2 GGUF → TRI Converter
+
+Trinity now supports converting any GGUF model to ternary .tri format:
+
+| Input Format | Compression | Memory Savings |
+|--------------|-------------|----------------|
+| F32 → Ternary | 16x | 93.75% |
+| F16 → Ternary | 8x | 87.5% |
+| Q8 → Ternary | 4x | 75% |
+| Q4 → Ternary | 2x | 50% |
+
+**Supported GGUF tensor types:**
+- F32, F16, BF16 (full precision)
+- Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, Q8_1 (legacy quants)
+- Q4_K, Q5_K, Q6_K, Q8_K (K-quants)
+- TQ1_0, TQ2_0 (native ternary)

 ### 7.2 Performance Targets
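
The compression table in this hunk is consistent with a simple bit-width ratio. The sketch below reproduces the listed figures under two assumptions that the diff does not state explicitly: each ternary weight is stored in 2 bits, and the source formats are counted at their nominal bit widths, ignoring per-block scales and other metadata. The names `SOURCE_BITS`, `TERNARY_BITS`, and `compression` are illustrative only.

```python
# Nominal bits per weight for each source format.
# Assumption: per-block scales and metadata are ignored.
SOURCE_BITS = {"F32": 32, "F16": 16, "Q8": 8, "Q4": 4}

# Assumption: one ternary weight occupies 2 bits in the target format.
TERNARY_BITS = 2


def compression(source_bits, target_bits=TERNARY_BITS):
    """Return (compression ratio, memory savings in percent)."""
    ratio = source_bits / target_bits
    savings_pct = (1 - 1 / ratio) * 100
    return ratio, savings_pct


if __name__ == "__main__":
    for name, bits in SOURCE_BITS.items():
        ratio, savings = compression(bits)
        print(f"{name} -> Ternary: {ratio:g}x, {savings:g}% saved")
```

Under these assumptions the memory savings column is simply `1 - 1/ratio`, so a 16x ratio corresponds to 93.75% and a 2x ratio to 50%.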

docs/TECH_TREE_STRATEGY.md

Lines changed: 10 additions & 1 deletion
@@ -37,7 +37,16 @@
 │ ✅ GQA (Grouped Query Attention) support │
 │ ✅ Ternary QKV projection integration │
 │ │
-│ COMPLETED (Phase 6 - E2E Verification) [NEW] │
+│ COMPLETED (Phase 5b - GGUF Converter) │
+│ ═════════════════════════════════════ │
+│ ✅ GGUF → TRI converter specification (gguf_to_tri.vibee) │
+│ ✅ Support for F32/F16/BF16/Q4/Q5/Q6/Q8 tensor types │
+│ ✅ Per-group quantization (group_size=128) │
+│ ✅ Parallel quantization via thread pool │
+│ ✅ Metadata extraction (vocab, tokenizer) │
+│ ✅ CLI integration (vibeec convert) │
+│ │
+│ COMPLETED (Phase 6 - E2E Verification) │
 │ ════════════════════════════════════════════ │
 │ ✅ GPU benchmarks (RTX 3090: 298K tokens/s) │
 │ ✅ 69 unit tests passing (100%) │
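
The Phase 5b entries mention per-group quantization with `group_size=128` and parallel quantization via a thread pool. The following is a minimal Python sketch of how those two ideas could fit together; it is not the converter's implementation, which is specified in `gguf_to_tri.vibee` and not shown on this page. The absmean scaling rule and the names `quantize_group` and `quantize_tensor` are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

GROUP_SIZE = 128  # value taken from the roadmap entry above


def quantize_group(group):
    """Map one group of floats to (scale, values in {-1, 0, 1}).

    Assumption: absmean scaling (scale = mean of |w|), then round and clamp.
    The real converter may use a different rule.
    """
    scale = sum(abs(w) for w in group) / len(group)
    if scale == 0.0:
        return 0.0, [0] * len(group)
    trits = [max(-1, min(1, round(w / scale))) for w in group]
    return scale, trits


def quantize_tensor(weights, group_size=GROUP_SIZE, workers=4):
    """Split a flat weight list into groups and quantize them in a pool."""
    groups = [weights[i:i + group_size]
              for i in range(0, len(weights), group_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order.
        return list(pool.map(quantize_group, groups))


if __name__ == "__main__":
    tensor = [1.0, -1.0, 0.0, 0.1] * 64  # 256 weights, i.e. 2 groups
    for scale, trits in quantize_tensor(tensor):
        print(round(scale, 3), trits[:4])
```

Each group keeps its own scale, so a weight is reconstructed approximately as `scale * trit`. In CPython the thread pool here only illustrates the structure, since pure-Python arithmetic is not sped up by threads; a native implementation would benefit from real parallelism.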
