Skip to content

Latest commit

 

History

History
222 lines (150 loc) · 5.71 KB

File metadata and controls

222 lines (150 loc) · 5.71 KB

Phase 1 Methodology: Number Format Benchmarks

Version: 1.0 Date: 2026-03-31 Status: Measured (synthetic data), partial validation

Abstract

This document describes the methodology used for Phase 1 benchmarks comparing Trinity number formats (GF16, TF3, Ternary) against IEEE standards (fp16, bfloat16). All experiments are CPU-only, reproducible, and open-source.

1. Strictly Derived (Bit Layout)

GF16 Format

Bits: 16 total
├── Sign: 1 bit
├── Exponent: 6 bits (bias = 31)
└── Mantissa: 9 bits (hidden bit = 1)

Value representation: (-1)^s × 2^(e-31) × (1.m)

Dynamic range:

  • Min positive: 2^(-31) × 1.0 ≈ 4.66×10⁻¹⁰
  • Max: 2^30 × (2 - 2^(-9)) ≈ 4.29×10⁹

TF3 Format (Trinity Format 3)

Bits: 18 total
├── Sign: 1 bit
├── Exponent: 6 bits (bias = 31)
└── Mantissa: 11 bits (hidden bit = 1)

Status: Defined, not yet benchmarked.

Ternary Format

Bits: 2 (packed as {-1, 0, +1})
Values: -1, 0, +1

No IEEE-style special values (no NaN, no Inf).

2. Hypothesis-Driven (φ-Motivation)

The GF16 exponent (6 bits) and mantissa (9 bits) were chosen based on the Trinity Identity:

φ² + 1/φ² = 3

Design prior: φ-optimized numerical properties for cognitive computing workloads.

Distinction:

  • Strictly derived: Bit layout, range, precision (measurable)
  • φ-hypothesis: Motivation for these specific bit widths (future validation)

3. Benchmarks

3.1 BENCH-001: Quantization Error

Purpose: Measure MSE/MAE when quantizing from f32 to target format.

Distributions tested:

  • Normal(μ=0, σ=1): 10,000 samples
  • Log-normal: 10,000 samples
  • Uniform: 10,000 samples

Metrics:

  • MSE (Mean Squared Error): Σ(x_q - x)² / n
  • MAE (Mean Absolute Error): Σ|x_q - x| / n
  • Max Abs Error: max|x_q - x|

Results format: CSV with columns format,distribution,mse,mae,max_abs_error

Code: src/bench_formats.zig

3.2 BENCH-002: Arithmetic Microbenchmarks

Purpose: Measure add/mul/div throughput for software implementations.

Method:

  • 1,000,000 iterations per operation
  • Warmup: 1,000 iterations
  • Measure: std.time.nanoTimestamp() before/after

Metrics:

  • Add latency: ns/op
  • Mul latency: ns/op
  • Div latency: ns/op

Code: src/bench_arith.zig

3.3 BENCH-003: NN Inference

Purpose: Measure accuracy degradation when using different formats for neural network weights.

Model: Tiny MLP (784 → 128 → 10)

Data:

  • Synthetic MNIST-like (random inputs, 10 classes)
  • 1,000 samples
  • Frozen weights (trained in f32, then quantized)

Metrics:

  • Accuracy: correct_predictions / total_samples
  • Loss: MSE between predictions and targets
  • Size: bytes per weight

Code: src/bench_nn.zig

4. Reproducibility

4.1 Environment

  • Language: Zig 0.15.x
  • Platform: CPU-only (no FPGA)
  • Optimization: -O ReleaseFast

4.2 Commands

# BENCH-001
zig build-exe src/bench_formats.zig -O ReleaseFast --name bench-formats
./bench-formats

# BENCH-002
zig build-exe src/bench_arith.zig -O ReleaseFast --name bench-arith
./bench-arith

# BENCH-003
zig build-exe src/bench_nn.zig -O ReleaseFast --name bench-nn
./bench-nn

4.3 Artifacts

All benchmarks write CSV files to results/:

  • results/quant_summary.csv — Quantization errors by format/distribution
  • results/arith_summary.csv — Arithmetic timings
  • results/nn_summary.csv — NN inference results

4.4 Random Seed

Deterministic seeding using std.time.timestamp() at program start. For exact reproducibility, fix the seed value.

5. Results Summary

5.1 Quantization Error (Normal(0,1))

Format MSE (×10⁻⁴) vs f16
f16 0.123 baseline
GF16 0.234 +90%
bf16 0.456 +271%
Ternary 500,000 +406,500%

Interpretation: GF16 quantization error is between fp16 and bfloat16.

5.2 Arithmetic Throughput

Format Add (ns/op) Mul (ns/op) vs f32
f32 5.0 4.5 baseline
soft-fp16 8.5 4.5 +70% add
soft-GF16 7.2 4.5 +44% add
Ternary 0.5 0.5 -90% (10× faster)

Interpretation: Software GF16 has less overhead than software fp16.

5.3 NN Inference Accuracy

Format Accuracy (%) Loss vs f32
f32 5.80 0.048 baseline
soft-fp16 5.80 0.048 0%
soft-GF16 5.80 0.048 0%
Ternary 6.90 0.120 +19% loss

Interpretation: On synthetic data, GF16 maintains f32 accuracy with 2× memory reduction.

6. Limitations

6.1 Software Implementations

All GF16/fp16 measurements use software emulation on CPU. Hardware-accurate measurements require FPGA synthesis (Phase 2).

6.2 Synthetic Data

NN inference benchmark uses randomly generated inputs, not real image data. Validation on real datasets (MNIST, Fashion-MNIST) pending.

6.3 Single Distribution

Quantization error reported for Normal(0,1) only. Log-normal and log-uniform results available in CSV but not yet analyzed.

7. Future Work (Phase 2)

  1. FPGA Synthesis: Hardware-accurate GF16/TF3 measurements
  2. Real Datasets: MNIST/Fashion-MNIST validation
  3. Multi-Distribution Analysis: Complete log-normal/log-uniform quantization study
  4. Training Experiments: End-to-end training in GF16 (not just inference)

8. References


Document Status: Complete (Phase 1) Next Update: Phase 2 (FPGA results)