bench: add BENCH-007b, BENCH-008, BENCH-010 results

gHashTag · gHashTag · commit ae8ffa26784e · 2026-04-30T01:46:09.000+07:00
References #13, #14, #23
diff --git a/.trinity/results/bench_007b.log b/.trinity/results/bench_007b.log
@@ -0,0 +1,77 @@
+BENCH-007b: φ-Distance Extended Range Benchmark
+================================================
+Formats: all
+Cross-reference: whitepaper.md §9.5, issue #12
+
+Range: [-1,1]  (BENCH-007 baseline)
+----------------------------------------------------------------------
+Format              MSE        MAE    MaxAbsErr     φ-dist InRange
+----------------------------------------------------------------------
+f32            0.000000   0.000000     0.000000      0.000      ✓
+GF64           0.000000   0.000000     0.000000      0.264      ✓
+GF32           0.000000   0.000022     0.000086      0.340      ✓
+fp16           0.000000   0.000164     0.000488      0.118      ✓
+bf16           0.000003   0.001303     0.003906      0.525      ✓
+GF16           0.000005   0.001685     0.006572      0.049      ✓
+GF8            0.000635   0.018808     0.072891      0.132      ✓
+GFTernary      0.274327   0.463541     0.808981      0.000      ✓
+
+Range: [-10,10] (BENCH-007b target)
+----------------------------------------------------------------------
+Format              MSE        MAE    MaxAbsErr     φ-dist InRange
+----------------------------------------------------------------------
+f32            0.000000   0.000000     0.000000      0.000      ✓
+GF64           0.000000   0.000000     0.000000      0.264      ✓
+GF32           0.000000   0.000221     0.000956      0.340      ✓
+fp16           0.000006   0.001824     0.007809      0.118      ✓
+bf16           0.000410   0.014605     0.062475      0.525      ✓
+GF16           0.000520   0.016769     0.072897      0.049      ✓
+GF8            6.390662   1.695289     5.763932      0.132   CLIP
+GFTernary     19.670186   3.578799     8.381966      0.000   CLIP
+
+Range: [-100,100] (stress test)
+----------------------------------------------------------------------
+Format              MSE        MAE    MaxAbsErr     φ-dist InRange
+----------------------------------------------------------------------
+f32            0.000000   0.000000     0.000000      0.000      ✓
+GF64           0.000000   0.000000     0.000000      0.264      ✓
+GF32           0.000008   0.002144     0.010632      0.340      ✓
+fp16           0.000592   0.017996     0.062494      0.118      ✓
+bf16           0.039039   0.146101     0.499950      0.525      ✓
+GF16           0.047455   0.162744     0.801568      0.049      ✓
+GF8          2928.042315  45.862040    95.763932      0.132   CLIP
+GFTernary    3174.787932  48.406601    98.381966      0.000   CLIP
+
+Range: φ-distributed [-10,10]
+----------------------------------------------------------------------
+Format              MSE        MAE    MaxAbsErr     φ-dist InRange
+----------------------------------------------------------------------
+f32            0.000000   0.000000     0.000000      0.000      ✓
+GF64           0.000000   0.000000     0.000000      0.264      ✓
+GF32           0.000000   0.000181     0.000593      0.340      ✓
+fp16           0.000003   0.001427     0.003904      0.118      ✓
+bf16           0.000203   0.011353     0.031248      0.525      ✓
+GF16           0.000299   0.013587     0.045082      0.049      ✓
+GF8            0.934011   0.646305     2.279514      0.132   CLIP
+GFTernary      8.539686   2.517691     4.897548      0.000   CLIP
+
+φ-Distance vs MSE Correlation (range [-10,10], n=10000)
+----------------------------------------------------------------------
+Hypothesis: formats with lower φ-distance should have lower quantization error
+(on φ-distributed inputs that match the format's mathematical basis)
+
+Pearson r (φ-distance vs MSE) = -0.4200
+→ WEAK negative correlation (hypothesis not supported): MSE dominated by bit-width, not φ-alignment
+
+Format MSE Ranking at [-10, 10] (f32 reference omitted):
+--------------------------------------------------
+🥇  1. GF64         MSE=0.000000  φ-dist=0.264
+🥈  2. GF32         MSE=0.000000  φ-dist=0.340
+🥉  3. fp16         MSE=0.000006  φ-dist=0.118
+    4. bf16         MSE=0.000410  φ-dist=0.525
+    5. GF16         MSE=0.000520  φ-dist=0.049
+    6. GF8          MSE=6.390662  φ-dist=0.132 ← SATURATES
+    7. GFTernary    MSE=19.670186  φ-dist=0.000 ← SATURATES
+
+Results: .trinity/results/bench_007b_extended_range.log
+Next: BENCH-008 Fashion-MNIST validation
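
Note on the correlation figure in bench_007b.log: the reported Pearson r can be reproduced from the [-10,10] table alone. The sketch below is a minimal check, assuming the coefficient was computed over all eight formats (including the f32 reference). The helper name `pearson_r` is illustrative and not taken from the benchmark source.

```python
import math

# (phi_distance, MSE) pairs copied from the [-10,10] table in bench_007b.log.
ROWS = {
    "f32":       (0.000, 0.000000),
    "GF64":      (0.264, 0.000000),
    "GF32":      (0.340, 0.000000),
    "fp16":      (0.118, 0.000006),
    "bf16":      (0.525, 0.000410),
    "GF16":      (0.049, 0.000520),
    "GF8":       (0.132, 6.390662),
    "GFTernary": (0.000, 19.670186),
}


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_for(rows):
    """Correlate phi-distance against MSE for a dict of table rows."""
    phi = [p for p, _ in rows.values()]
    mse = [m for _, m in rows.values()]
    return pearson_r(phi, mse)


# Assumption: all eight rows enter the statistic.
r_all = correlation_for(ROWS)

# Same statistic without the two rows flagged CLIP in the table.
unclipped = {k: v for k, v in ROWS.items() if k not in ("GF8", "GFTernary")}
r_unclipped = correlation_for(unclipped)

print(f"r (all formats)      = {r_all:.4f}")
print(f"r (CLIP rows removed) = {r_unclipped:.4f}")
```

With only eight points, the two saturating formats (low φ-distance, very large MSE) account for most of the negative coefficient, so the value says more about clipping than about φ-alignment.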
diff --git a/.trinity/results/bench_008.log b/.trinity/results/bench_008.log
@@ -0,0 +1,38 @@
+BENCH-008: Fashion-MNIST MLP Quantization Validation
+======================================================
+Architecture: MLP 784->256->128->10 (He init, synthetic weights)
+Cross-reference: whitepaper.md §9.5, issue #14
+
+Post-Training Quantization Results:
+--------------------------------------------------------------------------------
+Format       phi-dist        MSE        MAE     MaxErr  Sparse%  Acc.Drop%
+--------------------------------------------------------------------------------
+fp32            0.000   0.000000   0.000000   0.000000     0.2%      0.00%
+GF16            0.049   0.000000   0.000016   0.000244     0.2%      0.00%
+fp16            0.118   0.000000   0.000019   0.031249     0.2%      0.00%
+bf16            0.525   0.000000   0.000063   0.000974     0.2%      0.00%
+GFTernary       0.000   0.003346   0.044783   0.426161   100.0%      0.09%
+--------------------------------------------------------------------------------
+
+Synthetic Inference Accuracy (1000 samples; 10-class chance level is 10%):
+--------------------------------------------------
+fp32         synthetic_acc=10.7%  (phi-dist=0.000)
+GF16         synthetic_acc=10.7%  (phi-dist=0.049)
+fp16         synthetic_acc=10.6%  (phi-dist=0.118)
+bf16         synthetic_acc=10.8%  (phi-dist=0.525)
+GFTernary    synthetic_acc=10.0%  (phi-dist=0.000)
+
+phi-Distance vs Weight MSE Correlation:
+--------------------------------------------------
+Pearson r(phi-distance, weight MSE) = -0.3494
+-> WEAK negative correlation: bit-width dominates over phi-alignment
+
+GFTernary Special Analysis:
+--------------------------------------------------
+Sparsity:       100.0% of weights -> 0
+phi-distance:   0.000 (perfect Trinity basis)
+Weight MSE:     0.003346
+Memory saving:  ~16x vs fp32 (2-bit vs 32-bit)
+
+Results: .trinity/results/bench_008_fashion_mnist.log
+Next: BENCH-009 Transformer attention pattern analysis
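
Note on the GFTernary row in bench_008.log: 100% sparsity together with a weight MSE of 0.003346 is what one would expect if every He-initialised weight rounds to 0. The sketch below works this out analytically. It assumes zero-mean He initialisation with variance 2/fan_in, counts weights only (no biases), and assumes a ternary quantizer that maps every |w| < 0.5 to 0 with no per-tensor scale. None of these details are stated in the log.

```python
# Layer shapes taken from the log header: MLP 784->256->128->10.
LAYERS = [(784, 256), (256, 128), (128, 10)]


def he_variance(fan_in):
    """Weight variance under He initialisation (assumed: 2 / fan_in)."""
    return 2.0 / fan_in


total_weights = sum(fan_in * fan_out for fan_in, fan_out in LAYERS)

# If every weight is quantized to 0, the squared error of a weight is w**2,
# so the expected MSE is the count-weighted mean of the per-layer variances.
expected_mse = (
    sum(fan_in * fan_out * he_variance(fan_in) for fan_in, fan_out in LAYERS)
    / total_weights
)

# Largest per-layer standard deviation, to compare against the assumed
# rounding threshold of 0.5.
max_std = max(he_variance(fan_in) ** 0.5 for fan_in, _ in LAYERS)

# Storage ratio quoted in the log: 32-bit floats versus 2-bit ternary codes.
memory_ratio = 32 / 2

print(f"total weights        = {total_weights}")
print(f"expected MSE (all 0) = {expected_mse:.6f}")
print(f"largest layer std    = {max_std:.3f}")
print(f"memory ratio         = {memory_ratio:.0f}x")
```

Under these assumptions the widest layer has a standard deviation of 0.125, so a 0.5 threshold lies four standard deviations out and essentially no weight survives quantization. The reported MSE is then just the mean squared weight, which explains why GFTernary shows a perfect phi-distance but the largest error in the table.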
diff --git a/.trinity/results/bench_010.log b/.trinity/results/bench_010.log
@@ -0,0 +1,110 @@
+BENCH-010: Format Analysis Suite (post BUG-001-BF16 fix)
+=========================================================
+seed=42, n=10000 per distribution
+Cross-reference: issue #23
+
+Distribution: GAUSS_001 (n=10000)
+--------------------------------------------------------------------------------
+Format              MSE    MaxAbsErr      ULP_obs       ULP_th   Status
+--------------------------------------------------------------------------------
+fp32           6.34e-20      1.83e-9      1.83e-9     9.31e-10     FAIL
+fp16            7.68e-8      1.56e-2      1.56e-2      7.63e-6     FAIL
+bf16           2.69e-10      1.09e-4      1.09e-4      6.10e-5     FAIL
+gf16           1.74e-11      2.91e-5      2.91e-5      1.53e-5     FAIL
+ternary         1.00e-4      4.61e-2      4.61e-2       1.00e0     FAIL
+
+Distribution: GAUSS_01 (n=10000)
+--------------------------------------------------------------------------------
+Format              MSE    MaxAbsErr      ULP_obs       ULP_th   Status
+--------------------------------------------------------------------------------
+fp32           6.26e-18      1.47e-8      1.47e-8      7.45e-9     FAIL
+fp16            9.97e-8      3.12e-2      3.12e-2      6.10e-5     FAIL
+bf16            2.78e-8      9.63e-4      9.63e-4      4.88e-4     FAIL
+gf16            1.74e-9      2.44e-4      2.44e-4      1.22e-4     FAIL
+ternary         1.00e-2      4.61e-1      4.61e-1       1.00e0     FAIL
+
+Distribution: GAUSS_10 (n=10000)
+--------------------------------------------------------------------------------
+Format              MSE    MaxAbsErr      ULP_obs       ULP_th   Status
+--------------------------------------------------------------------------------
+fp32           6.55e-16      2.31e-7      2.31e-7      5.96e-8     FAIL
+fp16            1.85e-7      7.81e-3      7.81e-3      4.88e-4     FAIL
+bf16            2.83e-6      1.32e-2      1.32e-2      3.91e-3     FAIL
+gf16            1.72e-7      2.46e-3      2.46e-3      9.77e-4     FAIL
+ternary         2.04e-1       3.61e0       3.61e0       1.00e0     FAIL
+
+Distribution: GAUSS_100 (n=10000)
+--------------------------------------------------------------------------------
+Format              MSE    MaxAbsErr      ULP_obs       ULP_th   Status
+--------------------------------------------------------------------------------
+fp32           6.44e-14      1.74e-6      1.74e-6      9.54e-7     FAIL
+fp16            1.80e-5      2.67e-2      2.67e-2      7.81e-3     FAIL
+bf16            2.81e-4      1.23e-1      1.23e-1      6.25e-2     FAIL
+gf16            1.69e-5      2.67e-2      2.67e-2      1.56e-2     FAIL
+ternary          8.53e1       4.51e1       4.51e1       1.00e0     FAIL
+
+Distribution: UNIFORM_1 (n=10000)
+--------------------------------------------------------------------------------
+Format              MSE    MaxAbsErr      ULP_obs       ULP_th   Status
+--------------------------------------------------------------------------------
+fp32           1.67e-16      2.98e-8      2.98e-8      2.98e-8     PASS
+fp16            4.45e-8      4.88e-4      4.88e-4      2.44e-4     FAIL
+bf16            7.26e-7      1.95e-3      1.95e-3      1.95e-3     PASS
+gf16            4.48e-8      4.88e-4      4.88e-4      4.88e-4     PASS
+ternary         8.47e-2      5.00e-1      5.00e-1       1.00e0     FAIL
+
+Distribution: UNIFORM_100 (n=10000)
+--------------------------------------------------------------------------------
+Format              MSE    MaxAbsErr      ULP_obs       ULP_th   Status
+--------------------------------------------------------------------------------
+fp32           2.18e-12      3.81e-6      3.81e-6      3.81e-6     PASS
+fp16            5.88e-4      6.25e-2      6.25e-2      3.12e-2     FAIL
+bf16            9.37e-3      2.50e-1      2.50e-1      2.50e-1     PASS
+gf16            5.77e-4      6.24e-2      6.24e-2      6.25e-2     PASS
+ternary          3.19e3       9.90e1       9.90e1       1.00e0     FAIL
+
+=== Hypothesis Tests ===
+
+H1: bf16 vs gf16 on Uniform [-100,+100]
+  bf16 MSE = 9.369421e-3
+  gf16 MSE = 5.772505e-4
+  DIVERGED (rel. diff 1 - gf16/bf16 = 0.9384) -> H1 CONFIRMED (fix resolved collision)
+
+H2: bf16 vs gf16 on Gaussian σ=0.1
+  bf16 MSE = 2.784927e-8
+  gf16 MSE = 1.740400e-9
+  DIVERGED (rel. diff 1 - gf16/bf16 = 0.9375) -> H2 FAILED (collision was a genuine bug)
+
+=== Full Result Log ===
+RESULT=fp32 @ GAUSS_001 | MSE=6.34e-20 ULP_th=9.31e-10 ULP_obs=1.83e-9 status=FAIL
+RESULT=fp16 @ GAUSS_001 | MSE=7.68e-8 ULP_th=7.63e-6 ULP_obs=1.56e-2 status=FAIL
+RESULT=bf16 @ GAUSS_001 | MSE=2.69e-10 ULP_th=6.10e-5 ULP_obs=1.09e-4 status=FAIL
+RESULT=gf16 @ GAUSS_001 | MSE=1.74e-11 ULP_th=1.53e-5 ULP_obs=2.91e-5 status=FAIL
+RESULT=ternary @ GAUSS_001 | MSE=1.00e-4 ULP_th=1.00e0 ULP_obs=4.61e-2 status=FAIL
+RESULT=fp32 @ GAUSS_01 | MSE=6.26e-18 ULP_th=7.45e-9 ULP_obs=1.47e-8 status=FAIL
+RESULT=fp16 @ GAUSS_01 | MSE=9.97e-8 ULP_th=6.10e-5 ULP_obs=3.12e-2 status=FAIL
+RESULT=bf16 @ GAUSS_01 | MSE=2.78e-8 ULP_th=4.88e-4 ULP_obs=9.63e-4 status=FAIL
+RESULT=gf16 @ GAUSS_01 | MSE=1.74e-9 ULP_th=1.22e-4 ULP_obs=2.44e-4 status=FAIL
+RESULT=ternary @ GAUSS_01 | MSE=1.00e-2 ULP_th=1.00e0 ULP_obs=4.61e-1 status=FAIL
+RESULT=fp32 @ GAUSS_10 | MSE=6.55e-16 ULP_th=5.96e-8 ULP_obs=2.31e-7 status=FAIL
+RESULT=fp16 @ GAUSS_10 | MSE=1.85e-7 ULP_th=4.88e-4 ULP_obs=7.81e-3 status=FAIL
+RESULT=bf16 @ GAUSS_10 | MSE=2.83e-6 ULP_th=3.91e-3 ULP_obs=1.32e-2 status=FAIL
+RESULT=gf16 @ GAUSS_10 | MSE=1.72e-7 ULP_th=9.77e-4 ULP_obs=2.46e-3 status=FAIL
+RESULT=ternary @ GAUSS_10 | MSE=2.04e-1 ULP_th=1.00e0 ULP_obs=3.61e0 status=FAIL
+RESULT=fp32 @ GAUSS_100 | MSE=6.44e-14 ULP_th=9.54e-7 ULP_obs=1.74e-6 status=FAIL
+RESULT=fp16 @ GAUSS_100 | MSE=1.80e-5 ULP_th=7.81e-3 ULP_obs=2.67e-2 status=FAIL
+RESULT=bf16 @ GAUSS_100 | MSE=2.81e-4 ULP_th=6.25e-2 ULP_obs=1.23e-1 status=FAIL
+RESULT=gf16 @ GAUSS_100 | MSE=1.69e-5 ULP_th=1.56e-2 ULP_obs=2.67e-2 status=FAIL
+RESULT=ternary @ GAUSS_100 | MSE=8.53e1 ULP_th=1.00e0 ULP_obs=4.51e1 status=FAIL
+RESULT=fp32 @ UNIFORM_1 | MSE=1.67e-16 ULP_th=2.98e-8 ULP_obs=2.98e-8 status=PASS
+RESULT=fp16 @ UNIFORM_1 | MSE=4.45e-8 ULP_th=2.44e-4 ULP_obs=4.88e-4 status=FAIL
+RESULT=bf16 @ UNIFORM_1 | MSE=7.26e-7 ULP_th=1.95e-3 ULP_obs=1.95e-3 status=PASS
+RESULT=gf16 @ UNIFORM_1 | MSE=4.48e-8 ULP_th=4.88e-4 ULP_obs=4.88e-4 status=PASS
+RESULT=ternary @ UNIFORM_1 | MSE=8.47e-2 ULP_th=1.00e0 ULP_obs=5.00e-1 status=FAIL
+RESULT=fp32 @ UNIFORM_100 | MSE=2.18e-12 ULP_th=3.81e-6 ULP_obs=3.81e-6 status=PASS
+RESULT=fp16 @ UNIFORM_100 | MSE=5.88e-4 ULP_th=3.12e-2 ULP_obs=6.25e-2 status=FAIL
+RESULT=bf16 @ UNIFORM_100 | MSE=9.37e-3 ULP_th=2.50e-1 ULP_obs=2.50e-1 status=PASS
+RESULT=gf16 @ UNIFORM_100 | MSE=5.77e-4 ULP_th=6.25e-2 ULP_obs=6.24e-2 status=PASS
+RESULT=ternary @ UNIFORM_100 | MSE=3.19e3 ULP_th=1.00e0 ULP_obs=9.90e1 status=FAIL
+
+Results: .trinity/results/bench_010.log
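
Note on the hypothesis tests in bench_010.log: the value printed next to DIVERGED matches a relative difference, 1 - gf16/bf16, rather than a plain ratio of the two MSEs. The sketch below recomputes it from the MSE values printed under H1 and H2. The function name `relative_gap` is illustrative, and the formula is inferred from the numbers, not taken from the benchmark source.

```python
def relative_gap(bf16_mse, gf16_mse):
    """Relative MSE difference, assumed to be 1 - gf16/bf16."""
    return 1.0 - gf16_mse / bf16_mse


# MSE values copied from the H1 and H2 blocks of bench_010.log.
H1_BF16, H1_GF16 = 9.369421e-3, 5.772505e-4   # Uniform [-100, +100]
H2_BF16, H2_GF16 = 2.784927e-8, 1.740400e-9   # Gaussian sigma = 0.1

h1_gap = relative_gap(H1_BF16, H1_GF16)
h2_gap = relative_gap(H2_BF16, H2_GF16)

print(f"H1: gap = {h1_gap:.4f}, bf16/gf16 = {H1_BF16 / H1_GF16:.2f}")
print(f"H2: gap = {h2_gap:.4f}, bf16/gf16 = {H2_BF16 / H2_GF16:.2f}")
```

Both cases put the bf16 MSE at roughly 16 times the gf16 MSE. A 16x MSE gap is what two extra bits of precision would produce under uniform rounding error, though the log does not state the bit layout of either format.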