-# TRINITY LLM Benchmark Results
+# TRINITY Benchmark Results

-**Date**: 2026-02-02
+**Date**: 2026-02-03
 **Platform**: Gitpod (shared-cpu-2x, 2GB RAM)
+**Version**: v1.0.0

-## Summary
+## FIREBIRD VSA Benchmarks
+
+### Vector Operations (SIMD)
+
+| Dimension | Bind | Dot Product | Memory/Vector |
+|-----------|------|-------------|---------------|
+| 1,000 | 17μs | <1μs | <1KB |
+| 10,000 | 10μs | <1μs | 9KB |
+| 100,000 | 60μs | <1μs | 97KB |
+
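The vector table reports bind and dot-product timings but does not define either operation. The sketch below is for illustration only. It assumes hypervectors hold ternary values {-1, 0, +1} stored as one signed byte per element and that bind is element-wise multiplication, which is a common VSA choice. The layout and the names `random_hv`, `bind`, `dot` and `kib_per_vector` are hypothetical, not FIREBIRD's API. Under the one-byte assumption, a vector takes dimension / 1024 KiB: about 0.98, 9.8 and 97.7 KiB. Those values line up with the <1KB / 9KB / 97KB column, but that is an inference from the numbers, not something the source states.

```python
from array import array
import random

# Hypothetical sketch: ternary hypervectors in {-1, 0, +1}, one signed byte per
# element. The layout and function names are assumptions, not FIREBIRD's API.

def random_hv(dim: int, rng: random.Random) -> array:
    """Random ternary hypervector stored as signed bytes ('b' typecode)."""
    return array("b", (rng.choice((-1, 0, 1)) for _ in range(dim)))

def bind(a: array, b: array) -> array:
    """Bind by element-wise multiplication; results stay in {-1, 0, +1}."""
    return array("b", (x * y for x, y in zip(a, b)))

def dot(a: array, b: array) -> int:
    """Dot product, the usual similarity score between hypervectors."""
    return sum(x * y for x, y in zip(a, b))

def kib_per_vector(dim: int) -> float:
    """Storage for one vector at one byte per element, in KiB."""
    return dim * array("b").itemsize / 1024

if __name__ == "__main__":
    rng = random.Random(42)
    a, b = random_hv(10_000, rng), random_hv(10_000, rng)
    recovered = bind(bind(a, b), b)
    # Unbinding with b restores a wherever b is nonzero (b * b == 1 there).
    ok = all(x == y for x, y, z in zip(a, recovered, b) if z != 0)
    print("unbind recovers a where b != 0:", ok)
    print("self-similarity:", dot(a, a), "of", len(a))
    for dim in (1_000, 10_000, 100_000):
        print(f"{dim:>7,} dims -> {kib_per_vector(dim):.2f} KiB")
```

Multiplicative binding is its own inverse on the nonzero positions. That makes it easy to check for correctness, but it is only one of several binding schemes used in VSA work.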
+### Evolution Performance
+
+| Dimension | Population | Generations | Time | Fitness |
+|-----------|------------|-------------|------|---------|
+| 1,000 | 50 | 10 | 10ms | 0.85 |
+| 10,000 | 100 | 50 | 226ms | 0.86 |
+| 100,000 | 100 | 50 | ~2s | 0.85 |
+
+**Throughput**: ~4ms per generation (10K dimension)
+
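You can check the throughput line against the evolution table by dividing total time by the number of generations. This assumes setup cost is negligible. At 10K dimensions, 226 ms / 50 ≈ 4.5 ms. At 1K dimensions it is about 1 ms, and at 100K about 40 ms, although the 100K total is itself approximate. The dictionary below only restates the table's figures.

```python
# Per-generation time = total evolution time / generations, using the table's figures.
# Assumes per-run setup cost is negligible compared with the evolution loop.
runs = {
    1_000: (10.0, 10),      # 10 ms over 10 generations
    10_000: (226.0, 50),    # 226 ms over 50 generations
    100_000: (2000.0, 50),  # ~2 s over 50 generations (approximate)
}

def ms_per_generation(total_ms: float, generations: int) -> float:
    return total_ms / generations

if __name__ == "__main__":
    for dim, (total_ms, gens) in runs.items():
        print(f"{dim:>7,} dims: {ms_per_generation(total_ms, gens):.2f} ms/generation")
```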
+---
+
+## LLM Inference Benchmarks

 | Model | Size | Quant | Status | Speed | Notes |
 |-------|------|-------|--------|-------|-------|
 | Qwen2.5 Coder 1.5B | 1.8 GB | Q8_0 | ❌ | - | OOM |
 | BitNet SmolLM | 69 MB | Ternary | ❌ | - | TensorNotFound |
 | Phi-3 Mini 3.8B | 2.3 GB | Q4_K_M | ❌ | - | UnsupportedQuantization |
-| CodeLlama 7B | 3.9 GB | Q4_K_M | ❌ | - | UnsupportedQuantization |
-| Llama 2 7B | 3.9 GB | Q4_K_M | ❌ | - | UnsupportedQuantization |
-| Mistral 7B | 4.1 GB | Q4_K_M | ❌ | - | UnsupportedQuantization |

-## Supported Quantizations
+### Supported Quantizations

 - ✅ Q8_0 (8-bit)
 - ❌ Q4_K_M (4-bit K-quant) - Not implemented
 - ❌ Q4_0 (4-bit) - Partial support

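Q8_0 is the only format listed as fully supported. In GGML's Q8_0 layout, as used in GGUF files, weights are grouped into blocks of 32. Each block stores one scale `d` as an fp16 value plus 32 signed 8-bit values, and each weight dequantizes to `d * q`. That is 34 bytes per 32 weights, or 8.5 bits per weight. The sketch below round-trips one block in plain Python to show the arithmetic. It is not TRINITY's loader, and the helper names are hypothetical.

```python
import struct

BLOCK = 32  # weights per Q8_0 block
# One fp16 scale ("e") followed by 32 signed bytes ("32b"), little-endian: 34 bytes.
BLOCK_FMT = "<e32b"

def quantize_q8_0(xs):
    """Quantize 32 floats to (scale, int8 values) using the max-abs / 127 scale."""
    assert len(xs) == BLOCK
    amax = max(abs(x) for x in xs)
    d = amax / 127.0 if amax else 0.0
    qs = [round(x / d) if d else 0 for x in xs]
    return d, qs

def dequantize_q8_0(d, qs):
    """Recover approximate weights as scale * int8 value."""
    return [d * q for q in qs]

def pack_block(d, qs):
    """Serialize one block; the scale is rounded to fp16 here, as in the file format."""
    return struct.pack(BLOCK_FMT, d, *qs)

def unpack_block(raw):
    d, *qs = struct.unpack(BLOCK_FMT, raw)
    return d, qs

if __name__ == "__main__":
    xs = [(i - 16) / 8.0 for i in range(BLOCK)]
    raw = pack_block(*quantize_q8_0(xs))
    ys = dequantize_q8_0(*unpack_block(raw))
    print(len(raw), "bytes/block,", len(raw) * 8 / BLOCK, "bits/weight")
    print("max abs error:", max(abs(x - y) for x, y in zip(xs, ys)))
```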
-## Performance Analysis
-
-### Working Models
-
-1. **SmolLM 135M** - Best choice for demos
-   - Speed: 7.6-10.9 tok/s
-   - Memory: ~300 MB runtime
-   - Quality: Basic responses
-
-2. **TinyLlama 1.1B** - Good balance
-   - Speed: 1.7 tok/s
-   - Memory: ~1.5 GB runtime
-   - Quality: Better responses
-
-3. **Qwen2.5 Coder 0.5B** - Coding model
-   - Speed: 1.0-1.8 tok/s
-   - Memory: ~1 GB runtime
-   - Quality: Tokenizer needs work
-
-### Bottlenecks
-
-1. **Q4_K_M not supported** - Most popular models use this
-2. **Tokenizer issues** - Qwen/DeepSeek produce garbage
-3. **Memory limits** - 2GB RAM limits model size
-
-## Comparison with llama.cpp
-
-| Metric | TRINITY | llama.cpp |
-|--------|---------|-----------|
-| SmolLM 135M Q8_0 | 10.9 tok/s | ~15 tok/s |
-| Quantization support | Q8_0 only | Q2-Q8, K-quants |
-| Memory efficiency | Good | Better |
-| SIMD optimization | AVX2 | AVX2/AVX-512/ARM NEON |
+---

 ## Ternary/BitNet Performance

@@ -69,11 +57,40 @@ From `ternary_weights.zig` benchmarks:
 | SIMD 16-wide | 5.0x | +400% |
 | Batch 4-row | 5.2x | +420% |

-Memory savings: **16x** (621 MB → 39 MB for 135M model)
+**Memory savings**: 16x (621 MB → 39 MB for 135M model)
+
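A 16x saving is consistent with moving from 32-bit floats to 2 bits per ternary weight (32 / 2 = 16). It also matches the reported sizes: 621 MB / 16 ≈ 38.8 MB, close to the 39 MB figure. The source does not describe how `ternary_weights.zig` packs weights. The sketch below therefore assumes a simple 2-bit code (00 = 0, 01 = +1, 10 = -1) with four weights per byte, only to show where the ratio comes from. The function names are hypothetical.

```python
# Hypothetical 2-bit ternary packing, 4 weights per byte (not necessarily TRINITY's layout).
ENCODE = {0: 0b00, 1: 0b01, -1: 0b10}
DECODE = {v: k for k, v in ENCODE.items()}

def pack_ternary(weights):
    """Pack ternary weights, lowest bits first within each byte."""
    out = bytearray((len(weights) + 3) // 4)
    for i, w in enumerate(weights):
        out[i // 4] |= ENCODE[w] << (2 * (i % 4))
    return bytes(out)

def unpack_ternary(packed, n):
    """Decode the first n weights from a packed buffer."""
    return [DECODE[(packed[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(n)]

def compression_ratio(bits_before=32, bits_after=2):
    """Size ratio of float32 weights to 2-bit ternary weights."""
    return bits_before / bits_after

if __name__ == "__main__":
    w = [1, -1, 0, 0, 1, 1, -1]
    packed = pack_ternary(w)
    print(len(packed), "bytes; round-trip ok:", unpack_ternary(packed, len(w)) == w)
    print(f"{compression_ratio():.0f}x, 621 MB -> {621 / compression_ratio():.1f} MB")
```

A denser base-3 encoding (5 trits per byte, since 3^5 = 243 ≤ 256) would save more memory. The 2-bit form is shown here because it matches the 16x figure and keeps decoding to a shift and a mask.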
+---
+
+## Comparison: Previous vs Current
+
+| Metric | v0.9 | v1.0 | Improvement |
+|--------|------|------|-------------|
+| Vec27 SIMD | 103ns | 68ns | +34% |
+| Evolution (10K) | 350ms | 226ms | +35% |
+| Memory/vector | 12KB | 9KB | +25% |
+| Tests passing | 75 | 88 | +17% |
+
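The Improvement column uses two conventions. For time and memory, where lower is better, it is the relative reduction (old - new) / old. For tests passing it is the relative increase (new - old) / old. The sketch below recomputes each row from the table's own values. The row labels and the `lower_is_better` flag are added here for clarity.

```python
# (metric, v0.9, v1.0, lower_is_better) taken from the comparison table.
rows = [
    ("Vec27 SIMD (ns)", 103, 68, True),
    ("Evolution 10K (ms)", 350, 226, True),
    ("Memory/vector (KB)", 12, 9, True),
    ("Tests passing", 75, 88, False),
]

def improvement_pct(old, new, lower_is_better):
    """Relative reduction when lower is better, relative increase otherwise."""
    delta = old - new if lower_is_better else new - old
    return 100.0 * delta / old

if __name__ == "__main__":
    for name, old, new, lower in rows:
        print(f"{name:<20} {improvement_pct(old, new, lower):+.0f}%")
```

A 34% cut in time is not a 34% speedup. Going from 103 ns to 68 ns is about 1.51x faster.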
+---
+
+## System Information
+
+```
+Platform: Linux x86_64
+CPU: Shared vCPU (2 cores)
+RAM: 2GB
+SIMD: AVX2 available
+Compiler: Zig 0.13.0
+```
+
+---

 ## Recommendations

 1. **For demos**: Use SmolLM 135M Q8_0
-2. **For coding**: Wait for Qwen tokenizer fix
+2. **For VSA**: Use 10K-100K dimensions
 3. **For production**: Implement Q4_K_M support
 4. **For BitNet**: Fix tensor loading for ternary models
+
+---
+
+*φ² + 1/φ² = 3 = TRINITY | KOSCHEI IS IMMORTAL*