Commit ae8ffa2

bench: add BENCH-007b, BENCH-008, BENCH-010 results
References #13, #14, #23
1 parent bdea315 commit ae8ffa2

3 files changed

Lines changed: 225 additions & 0 deletions

.trinity/results/bench_007b.log

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+BENCH-007b: φ-Distance Extended Range Benchmark
+================================================
+Formats: all
+Cross-reference: whitepaper.md §9.5, issue #12
+
+Range: [-1,1] (BENCH-007 baseline)
+----------------------------------------------------------------------
+Format             MSE        MAE  MaxAbsErr  φ-dist  InRange
+----------------------------------------------------------------------
+f32           0.000000   0.000000   0.000000   0.000  ✓
+GF64          0.000000   0.000000   0.000000   0.264  ✓
+GF32          0.000000   0.000022   0.000086   0.340  ✓
+fp16          0.000000   0.000164   0.000488   0.118  ✓
+bf16          0.000003   0.001303   0.003906   0.525  ✓
+GF16          0.000005   0.001685   0.006572   0.049  ✓
+GF8           0.000635   0.018808   0.072891   0.132  ✓
+GFTernary     0.274327   0.463541   0.808981   0.000  ✓
+
+Range: [-10,10] (BENCH-007b target)
+----------------------------------------------------------------------
+Format             MSE        MAE  MaxAbsErr  φ-dist  InRange
+----------------------------------------------------------------------
+f32           0.000000   0.000000   0.000000   0.000  ✓
+GF64          0.000000   0.000000   0.000000   0.264  ✓
+GF32          0.000000   0.000221   0.000956   0.340  ✓
+fp16          0.000006   0.001824   0.007809   0.118  ✓
+bf16          0.000410   0.014605   0.062475   0.525  ✓
+GF16          0.000520   0.016769   0.072897   0.049  ✓
+GF8           6.390662   1.695289   5.763932   0.132  CLIP
+GFTernary    19.670186   3.578799   8.381966   0.000  CLIP
+
+Range: [-100,100] (stress test)
+----------------------------------------------------------------------
+Format             MSE        MAE  MaxAbsErr  φ-dist  InRange
+----------------------------------------------------------------------
+f32           0.000000   0.000000   0.000000   0.000  ✓
+GF64          0.000000   0.000000   0.000000   0.264  ✓
+GF32          0.000008   0.002144   0.010632   0.340  ✓
+fp16          0.000592   0.017996   0.062494   0.118  ✓
+bf16          0.039039   0.146101   0.499950   0.525  ✓
+GF16          0.047455   0.162744   0.801568   0.049  ✓
+GF8        2928.042315  45.862040  95.763932   0.132  CLIP
+GFTernary  3174.787932  48.406601  98.381966   0.000  CLIP
+
+Range: φ-distributed [-10,10]
+----------------------------------------------------------------------
+Format             MSE        MAE  MaxAbsErr  φ-dist  InRange
+----------------------------------------------------------------------
+f32           0.000000   0.000000   0.000000   0.000  ✓
+GF64          0.000000   0.000000   0.000000   0.264  ✓
+GF32          0.000000   0.000181   0.000593   0.340  ✓
+fp16          0.000003   0.001427   0.003904   0.118  ✓
+bf16          0.000203   0.011353   0.031248   0.525  ✓
+GF16          0.000299   0.013587   0.045082   0.049  ✓
+GF8           0.934011   0.646305   2.279514   0.132  CLIP
+GFTernary     8.539686   2.517691   4.897548   0.000  CLIP
+
+φ-Distance vs MSE Correlation (range [-10,10], n=10000)
+----------------------------------------------------------------------
+Hypothesis: formats with lower φ-distance should have lower quantization error
+(on φ-distributed inputs that match the format's mathematical basis)
+
+Pearson r (φ-distance vs MSE) = -0.4200
+→ WEAK negative correlation: MSE dominated by bit-width, not φ-alignment
+
+MSE Ranking at [-10, 10] (GF family plus fp16/bf16 baselines):
+--------------------------------------------------
+🥇 1. GF64       MSE=0.000000   φ-dist=0.264
+🥈 2. GF32       MSE=0.000000   φ-dist=0.340
+🥉 3. fp16       MSE=0.000006   φ-dist=0.118
+   4. bf16       MSE=0.000410   φ-dist=0.525
+   5. GF16       MSE=0.000520   φ-dist=0.049
+   6. GF8        MSE=6.390662   φ-dist=0.132  ← SATURATES
+   7. GFTernary  MSE=19.670186  φ-dist=0.000  ← SATURATES
+
+Results: .trinity/results/bench_007b_extended_range.log
+Next: BENCH-008 Fashion-MNIST validation
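
The Pearson figure in bench_007b.log can be re-derived from the logged tables. The sketch below is not the benchmark's own code: it assumes the coefficient was taken over all eight formats, pairing φ-dist with the MSE column of the uniform [-10,10] table (the log does not state which rows were used). The names `rows` and `pearson_r` are illustrative.

```python
import math

# (format, φ-dist, MSE) copied from the "Range: [-10,10]" table.
# ASSUMPTION: all eight formats enter the correlation, f32 included.
rows = [
    ("f32",       0.000,  0.000000),
    ("GF64",      0.264,  0.000000),
    ("GF32",      0.340,  0.000000),
    ("fp16",      0.118,  0.000006),
    ("bf16",      0.525,  0.000410),
    ("GF16",      0.049,  0.000520),
    ("GF8",       0.132,  6.390662),
    ("GFTernary", 0.000, 19.670186),
]


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


phi_dist = [row[1] for row in rows]
mse = [row[2] for row in rows]
r_value = pearson_r(phi_dist, mse)
print(f"Pearson r = {r_value:.4f}")
```

Under these assumptions the result should land close to the logged -0.42. With only eight points, the two saturating formats (GF8, GFTernary) account for nearly all of the MSE variance, so the coefficient says little about the formats that stay in range.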

.trinity/results/bench_008.log

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+BENCH-008: Fashion-MNIST MLP Quantization Validation
+======================================================
+Architecture: MLP 784->256->128->10 (He init, synthetic weights)
+Cross-reference: whitepaper.md §9.5, issue #14
+
+Post-Training Quantization Results:
+--------------------------------------------------------------------------------
+Format    phi-dist       MSE       MAE    MaxErr  Sparse%  Acc.Drop%
+--------------------------------------------------------------------------------
+fp32         0.000  0.000000  0.000000  0.000000     0.2%      0.00%
+GF16         0.049  0.000000  0.000016  0.000244     0.2%      0.00%
+fp16         0.118  0.000000  0.000019  0.031249     0.2%      0.00%
+bf16         0.525  0.000000  0.000063  0.000974     0.2%      0.00%
+GFTernary    0.000  0.003346  0.044783  0.426161   100.0%      0.09%
+--------------------------------------------------------------------------------
+
+Synthetic Inference Accuracy (1000 samples):
+--------------------------------------------------
+fp32       synthetic_acc=10.7%  (phi-dist=0.000)
+GF16       synthetic_acc=10.7%  (phi-dist=0.049)
+fp16       synthetic_acc=10.6%  (phi-dist=0.118)
+bf16       synthetic_acc=10.8%  (phi-dist=0.525)
+GFTernary  synthetic_acc=10.0%  (phi-dist=0.000)
+
+phi-Distance vs Weight MSE Correlation:
+--------------------------------------------------
+Pearson r(phi-distance, weight MSE) = -0.3494
+-> Weak negative correlation: bit-width dominates over phi-alignment
+
+GFTernary Special Analysis:
+--------------------------------------------------
+Sparsity: 100.0% of weights -> 0
+phi-distance: 0.000 (perfect Trinity basis)
+Weight MSE: 0.003346
+Memory saving: ~16x vs fp32 (2-bit vs 32-bit)
+
+Results: .trinity/results/bench_008_fashion_mnist.log
+Next: BENCH-009 Transformer attention pattern analysis
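
The GFTernary row in bench_008.log (100% sparsity, MSE 0.003346) is what one would expect if every He-initialised weight rounds to zero, since the error then equals the weight itself. The sketch below illustrates that reading under two loudly stated assumptions that the log does not confirm: weights are drawn from a He-normal distribution with std sqrt(2 / fan_in), and the ternary quantizer is an unscaled round-to-nearest onto {-1, 0, +1}. All function names are hypothetical.

```python
import math
import random

# Layer shapes taken from the log header: MLP 784->256->128->10.
LAYERS = [(784, 256), (256, 128), (128, 10)]


def he_init_weights(layers, seed=0):
    """Draw synthetic weights; ASSUMPTION: He-normal, std = sqrt(2 / fan_in)."""
    rng = random.Random(seed)
    weights = []
    for fan_in, fan_out in layers:
        std = math.sqrt(2.0 / fan_in)
        weights.extend(rng.gauss(0.0, std) for _ in range(fan_in * fan_out))
    return weights


def to_ternary(w):
    """ASSUMPTION: unscaled round-to-nearest onto {-1, 0, +1}."""
    if w > 0.5:
        return 1.0
    if w < -0.5:
        return -1.0
    return 0.0


def quant_stats(weights, quantize):
    """Return MSE, MAE, max error and percentage of weights mapped to zero."""
    n = len(weights)
    quantized = [quantize(w) for w in weights]
    errors = [abs(w - q) for w, q in zip(weights, quantized)]
    zeros = sum(1 for q in quantized if q == 0.0)
    return {
        "mse": sum(e * e for e in errors) / n,
        "mae": sum(errors) / n,
        "max_err": max(errors),
        "sparsity_pct": 100.0 * zeros / n,
    }


stats = quant_stats(he_init_weights(LAYERS), to_ternary)
for key, value in stats.items():
    print(f"{key:>12} = {value:.6f}")
```

Because the largest He-normal standard deviation here is sqrt(2/128) ≈ 0.125, almost no weight crosses the ±0.5 rounding threshold, so the MSE reduces to the parameter-weighted weight variance (about 0.0034). A per-tensor scale factor before rounding would change this picture, but the log gives no indication that one was used.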

.trinity/results/bench_010.log

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
+BENCH-010: Format Analysis Suite (post BUG-001-BF16 fix)
+=========================================================
+seed=42, n=10000 per distribution
+Cross-reference: issue #23
+
+Distribution: GAUSS_001 (n=10000)
+--------------------------------------------------------------------------------
+Format          MSE  MaxAbsErr   ULP_obs    ULP_th  Status
+--------------------------------------------------------------------------------
+fp32       6.34e-20    1.83e-9   1.83e-9  9.31e-10  FAIL
+fp16        7.68e-8    1.56e-2   1.56e-2   7.63e-6  FAIL
+bf16       2.69e-10    1.09e-4   1.09e-4   6.10e-5  FAIL
+gf16       1.74e-11    2.91e-5   2.91e-5   1.53e-5  FAIL
+ternary     1.00e-4    4.61e-2   4.61e-2    1.00e0  FAIL
+
+Distribution: GAUSS_01 (n=10000)
+--------------------------------------------------------------------------------
+Format          MSE  MaxAbsErr   ULP_obs    ULP_th  Status
+--------------------------------------------------------------------------------
+fp32       6.26e-18    1.47e-8   1.47e-8   7.45e-9  FAIL
+fp16        9.97e-8    3.12e-2   3.12e-2   6.10e-5  FAIL
+bf16        2.78e-8    9.63e-4   9.63e-4   4.88e-4  FAIL
+gf16        1.74e-9    2.44e-4   2.44e-4   1.22e-4  FAIL
+ternary     1.00e-2    4.61e-1   4.61e-1    1.00e0  FAIL
+
+Distribution: GAUSS_10 (n=10000)
+--------------------------------------------------------------------------------
+Format          MSE  MaxAbsErr   ULP_obs    ULP_th  Status
+--------------------------------------------------------------------------------
+fp32       6.55e-16    2.31e-7   2.31e-7   5.96e-8  FAIL
+fp16        1.85e-7    7.81e-3   7.81e-3   4.88e-4  FAIL
+bf16        2.83e-6    1.32e-2   1.32e-2   3.91e-3  FAIL
+gf16        1.72e-7    2.46e-3   2.46e-3   9.77e-4  FAIL
+ternary     2.04e-1     3.61e0    3.61e0    1.00e0  FAIL
+
+Distribution: GAUSS_100 (n=10000)
+--------------------------------------------------------------------------------
+Format          MSE  MaxAbsErr   ULP_obs    ULP_th  Status
+--------------------------------------------------------------------------------
+fp32       6.44e-14    1.74e-6   1.74e-6   9.54e-7  FAIL
+fp16        1.80e-5    2.67e-2   2.67e-2   7.81e-3  FAIL
+bf16        2.81e-4    1.23e-1   1.23e-1   6.25e-2  FAIL
+gf16        1.69e-5    2.67e-2   2.67e-2   1.56e-2  FAIL
+ternary      8.53e1     4.51e1    4.51e1    1.00e0  FAIL
+
+Distribution: UNIFORM_1 (n=10000)
+--------------------------------------------------------------------------------
+Format          MSE  MaxAbsErr   ULP_obs    ULP_th  Status
+--------------------------------------------------------------------------------
+fp32       1.67e-16    2.98e-8   2.98e-8   2.98e-8  PASS
+fp16        4.45e-8    4.88e-4   4.88e-4   2.44e-4  FAIL
+bf16        7.26e-7    1.95e-3   1.95e-3   1.95e-3  PASS
+gf16        4.48e-8    4.88e-4   4.88e-4   4.88e-4  PASS
+ternary     8.47e-2    5.00e-1   5.00e-1    1.00e0  FAIL
+
+Distribution: UNIFORM_100 (n=10000)
+--------------------------------------------------------------------------------
+Format          MSE  MaxAbsErr   ULP_obs    ULP_th  Status
+--------------------------------------------------------------------------------
+fp32       2.18e-12    3.81e-6   3.81e-6   3.81e-6  PASS
+fp16        5.88e-4    6.25e-2   6.25e-2   3.12e-2  FAIL
+bf16        9.37e-3    2.50e-1   2.50e-1   2.50e-1  PASS
+gf16        5.77e-4    6.24e-2   6.24e-2   6.25e-2  PASS
+ternary      3.19e3     9.90e1    9.90e1    1.00e0  FAIL
+
+=== Hypothesis Tests ===
+
+H1: bf16 vs gf16 on Uniform [-100,+100]
+bf16 MSE = 9.369421e-3
+gf16 MSE = 5.772505e-4
+DIVERGED (relative difference=0.9384, i.e. 1 - gf16/bf16) -> H1 CONFIRMED (fix resolved collision)
+
+H2: bf16 vs gf16 on Gaussian σ=0.1
+bf16 MSE = 2.784927e-8
+gf16 MSE = 1.740400e-9
+DIVERGED (relative difference=0.9375, i.e. 1 - gf16/bf16) -> H2 FAILED (was genuine bug)
+
+=== Full Result Log ===
+RESULT=fp32 @ GAUSS_001 | MSE=6.34e-20 ULP_th=9.31e-10 ULP_obs=1.83e-9 status=FAIL
+RESULT=fp16 @ GAUSS_001 | MSE=7.68e-8 ULP_th=7.63e-6 ULP_obs=1.56e-2 status=FAIL
+RESULT=bf16 @ GAUSS_001 | MSE=2.69e-10 ULP_th=6.10e-5 ULP_obs=1.09e-4 status=FAIL
+RESULT=gf16 @ GAUSS_001 | MSE=1.74e-11 ULP_th=1.53e-5 ULP_obs=2.91e-5 status=FAIL
+RESULT=ternary @ GAUSS_001 | MSE=1.00e-4 ULP_th=1.00e0 ULP_obs=4.61e-2 status=FAIL
+RESULT=fp32 @ GAUSS_01 | MSE=6.26e-18 ULP_th=7.45e-9 ULP_obs=1.47e-8 status=FAIL
+RESULT=fp16 @ GAUSS_01 | MSE=9.97e-8 ULP_th=6.10e-5 ULP_obs=3.12e-2 status=FAIL
+RESULT=bf16 @ GAUSS_01 | MSE=2.78e-8 ULP_th=4.88e-4 ULP_obs=9.63e-4 status=FAIL
+RESULT=gf16 @ GAUSS_01 | MSE=1.74e-9 ULP_th=1.22e-4 ULP_obs=2.44e-4 status=FAIL
+RESULT=ternary @ GAUSS_01 | MSE=1.00e-2 ULP_th=1.00e0 ULP_obs=4.61e-1 status=FAIL
+RESULT=fp32 @ GAUSS_10 | MSE=6.55e-16 ULP_th=5.96e-8 ULP_obs=2.31e-7 status=FAIL
+RESULT=fp16 @ GAUSS_10 | MSE=1.85e-7 ULP_th=4.88e-4 ULP_obs=7.81e-3 status=FAIL
+RESULT=bf16 @ GAUSS_10 | MSE=2.83e-6 ULP_th=3.91e-3 ULP_obs=1.32e-2 status=FAIL
+RESULT=gf16 @ GAUSS_10 | MSE=1.72e-7 ULP_th=9.77e-4 ULP_obs=2.46e-3 status=FAIL
+RESULT=ternary @ GAUSS_10 | MSE=2.04e-1 ULP_th=1.00e0 ULP_obs=3.61e0 status=FAIL
+RESULT=fp32 @ GAUSS_100 | MSE=6.44e-14 ULP_th=9.54e-7 ULP_obs=1.74e-6 status=FAIL
+RESULT=fp16 @ GAUSS_100 | MSE=1.80e-5 ULP_th=7.81e-3 ULP_obs=2.67e-2 status=FAIL
+RESULT=bf16 @ GAUSS_100 | MSE=2.81e-4 ULP_th=6.25e-2 ULP_obs=1.23e-1 status=FAIL
+RESULT=gf16 @ GAUSS_100 | MSE=1.69e-5 ULP_th=1.56e-2 ULP_obs=2.67e-2 status=FAIL
+RESULT=ternary @ GAUSS_100 | MSE=8.53e1 ULP_th=1.00e0 ULP_obs=4.51e1 status=FAIL
+RESULT=fp32 @ UNIFORM_1 | MSE=1.67e-16 ULP_th=2.98e-8 ULP_obs=2.98e-8 status=PASS
+RESULT=fp16 @ UNIFORM_1 | MSE=4.45e-8 ULP_th=2.44e-4 ULP_obs=4.88e-4 status=FAIL
+RESULT=bf16 @ UNIFORM_1 | MSE=7.26e-7 ULP_th=1.95e-3 ULP_obs=1.95e-3 status=PASS
+RESULT=gf16 @ UNIFORM_1 | MSE=4.48e-8 ULP_th=4.88e-4 ULP_obs=4.88e-4 status=PASS
+RESULT=ternary @ UNIFORM_1 | MSE=8.47e-2 ULP_th=1.00e0 ULP_obs=5.00e-1 status=FAIL
+RESULT=fp32 @ UNIFORM_100 | MSE=2.18e-12 ULP_th=3.81e-6 ULP_obs=3.81e-6 status=PASS
+RESULT=fp16 @ UNIFORM_100 | MSE=5.88e-4 ULP_th=3.12e-2 ULP_obs=6.25e-2 status=FAIL
+RESULT=bf16 @ UNIFORM_100 | MSE=9.37e-3 ULP_th=2.50e-1 ULP_obs=2.50e-1 status=PASS
+RESULT=gf16 @ UNIFORM_100 | MSE=5.77e-4 ULP_th=6.25e-2 ULP_obs=6.24e-2 status=PASS
+RESULT=ternary @ UNIFORM_100 | MSE=3.19e3 ULP_th=1.00e0 ULP_obs=9.90e1 status=FAIL
+
+Results: .trinity/results/bench_010.log
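
The figures printed next to DIVERGED in the hypothesis tests (0.9384 and 0.9375) do not match a plain quotient of the two MSEs. They do match the relative gap 1 - gf16/bf16. The short check below makes that explicit. It is an inference from the logged numbers, not the benchmark's source, and `relative_mse_gap` is a hypothetical helper name.

```python
def relative_mse_gap(mse_bf16, mse_gf16):
    """Relative gap (bf16 - gf16) / bf16, equivalently 1 - gf16/bf16."""
    return (mse_bf16 - mse_gf16) / mse_bf16


# MSE values copied from the H1 and H2 blocks of bench_010.log.
h1_gap = relative_mse_gap(9.369421e-3, 5.772505e-4)
h2_gap = relative_mse_gap(2.784927e-8, 1.740400e-9)

print(f"H1 gap = {h1_gap:.4f}")  # H1 gap = 0.9384
print(f"H2 gap = {h2_gap:.4f}")  # H2 gap = 0.9375
```

A gap near 0.9375 means the gf16 MSE is roughly 1/16 of the bf16 MSE in both tests. That is the ratio expected if gf16 carries two more mantissa bits than bf16, which would also fit the factor-of-four gaps in the ULP_th column, though the log does not state the gf16 bit layout.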
