fix(bf16): rewrite encoder/decoder as standard IEEE 754 bit ops #51

Closed
gHashTag wants to merge 5 commits into main from fix/issue-22-bf16-encoder
@gHashTag
Owner

Summary

Replaces the broken f32ToBf16/bf16ToF32 implementation with the standard 2-line bit manipulation approach.

Bugs Fixed

The previous implementation had 4 bugs identified in #22:

  1. Exponent clamped to ±7 instead of IEEE-754 ±127 (treated 8-bit exp as 4-bit)
  2. Wrong mantissa width — 8 bits instead of 7
  3. Wrong bit layout in decoder — extracted 7-bit exp + 8-bit mantissa instead of 8+7
  4. Wrong bias handling in the frexp-based approach
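
To make the 1 + 8 + 7 layout behind bugs 1 to 3 concrete, here is a minimal sketch that splits a BF16 word into its fields. `Bf16Fields` and `bf16Fields` are hypothetical names used only for illustration, not code from this PR, and the single-argument `@truncate` form assumes Zig 0.11 or later.

```zig
const std = @import("std");

// Hypothetical helper type, not part of the repository.
const Bf16Fields = struct { sign: u1, exp: u8, mant: u7 };

// Split a BF16 word into sign (bit 15), biased exponent (bits 14..7),
// and mantissa (bits 6..0).
fn bf16Fields(x: u16) Bf16Fields {
    return .{
        .sign = @as(u1, @truncate(x >> 15)),
        .exp = @as(u8, @truncate(x >> 7)),
        .mant = @as(u7, @truncate(x)),
    };
}

test "fields of 1.0 (0x3F80)" {
    const f = bf16Fields(0x3F80);
    try std.testing.expectEqual(@as(u1, 0), f.sign);
    // Biased exponent 127 means an unbiased exponent of 0.
    try std.testing.expectEqual(@as(u8, 127), f.exp);
    try std.testing.expectEqual(@as(u7, 0), f.mant);
}
```

A decoder that reads 7 exponent bits and 8 mantissa bits instead would shift every field boundary by one bit, which is the failure described in bug 3.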

Fix

Standard BF16 is simply the top 16 bits of an IEEE 754 f32:

[S:1][E:8][M:7] = bits 31..16 of f32
  • f32ToBf16(a) = @truncate(@as(u32, @bitCast(a)) >> 16)
  • bf16ToF32(x) = @bitCast(@as(u32, x) << 16)

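This is the same bit layout used by the major BF16 implementations (PyTorch, TensorFlow, MLX). Note that a plain right shift truncates the low 16 mantissa bits (round toward zero) rather than rounding to nearest even.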
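
The two conversions can be sketched as a standalone Zig file. This is an illustration written from the description above, assuming Zig 0.11 or later (single-argument `@bitCast` and `@truncate`); the code in the repository may differ in detail.

```zig
const std = @import("std");

// Encode: keep the top 16 bits of the f32 bit pattern (truncation, no rounding).
fn f32ToBf16(a: f32) u16 {
    const bits: u32 = @bitCast(a);
    return @truncate(bits >> 16);
}

// Decode: widen to u32 first, then move the 16 bits back into the high half.
fn bf16ToF32(x: u16) f32 {
    const bits: u32 = @as(u32, x) << 16;
    return @bitCast(bits);
}

test "BF16: roundtrip 1.0" {
    try std.testing.expectEqual(@as(u16, 0x3F80), f32ToBf16(1.0));
    try std.testing.expectEqual(@as(f32, 1.0), bf16ToF32(0x3F80));
}
```

Widening `x` to `u32` before the shift matters: shifting a `u16` left by 16 is not a valid operation in Zig, so the cast has to come first.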

Impact

This will change bf16 benchmark results — the broken codec artificially made bf16 look like GF16 on narrow ranges. With the fix, bf16 will correctly show its full IEEE-754 dynamic range (±127 exponent), making the GF16 vs bf16 comparison honest.
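
As a rough illustration of that dynamic range, the following sketch checks the largest finite and smallest normal BF16 bit patterns and a large round trip. The helper bodies are assumptions based on the description in this PR (Zig 0.11 or later), not a copy of the repository code.

```zig
const std = @import("std");

// Assumed shape of the fixed codec, reproduced so the example is self-contained.
fn f32ToBf16(a: f32) u16 {
    const bits: u32 = @bitCast(a);
    return @truncate(bits >> 16);
}

fn bf16ToF32(x: u16) f32 {
    const bits: u32 = @as(u32, x) << 16;
    return @bitCast(bits);
}

test "BF16 keeps the f32 exponent range" {
    // 0x7F7F: exponent field 0xFE, mantissa all ones, the largest finite BF16.
    try std.testing.expect(bf16ToF32(0x7F7F) > @as(f32, 3.0e38));

    // 0x0080: exponent field 1, mantissa 0, the smallest normal value (2^-126).
    try std.testing.expectEqual(std.math.floatMin(f32), bf16ToF32(0x0080));

    // 1e10 survives the round trip; truncating to 7 mantissa bits
    // loses less than 2^-7 of the value and never rounds up.
    const x: f32 = 1e10;
    const y = bf16ToF32(f32ToBf16(x));
    try std.testing.expect(y <= x);
    try std.testing.expect((x - y) / x < @as(f32, 1.0 / 128.0));
}
```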

Tests Added

  • BF16: roundtrip 1.0 — verifies exact encoding (0x3F80)
  • BF16: roundtrip 100.0 — verifies dynamic range
  • BF16: roundtrip 1e10 — verifies large values work (was broken before)
  • BF16: roundtrip small values — 8 values from ±0.5 to ±3.14 to ±1e-10
  • BF16: special values — inf, NaN, ±0 preservation
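
A special-values test along these lines might look like the sketch below. It is not the test added in this PR; the codec bodies are assumed from the description above, and negative zero is built from its bit pattern to avoid relying on how a `-0.0` literal is evaluated.

```zig
const std = @import("std");

// Assumed shape of the fixed codec, reproduced so the example is self-contained.
fn f32ToBf16(a: f32) u16 {
    const bits: u32 = @bitCast(a);
    return @truncate(bits >> 16);
}

fn bf16ToF32(x: u16) f32 {
    const bits: u32 = @as(u32, x) << 16;
    return @bitCast(bits);
}

test "BF16: special values (sketch)" {
    // Infinity keeps its all-ones exponent and zero mantissa.
    const inf = std.math.inf(f32);
    try std.testing.expect(std.math.isInf(bf16ToF32(f32ToBf16(inf))));
    try std.testing.expect(std.math.isInf(bf16ToF32(f32ToBf16(-inf))));

    // The quiet NaN from std.math.nan has its payload bit in the top
    // 16 bits, so it is still a NaN after truncation.
    const nan = std.math.nan(f32);
    try std.testing.expect(std.math.isNan(bf16ToF32(f32ToBf16(nan))));

    // Signed zeros differ only in bit 15 of the BF16 word.
    const neg_zero: f32 = @bitCast(@as(u32, 0x80000000));
    try std.testing.expectEqual(@as(u16, 0x8000), f32ToBf16(neg_zero));
    try std.testing.expectEqual(@as(u16, 0x0000), f32ToBf16(0.0));
}
```

One caveat of a pure shift: a NaN whose payload sits only in the low 16 bits of the f32 would truncate to infinity, so a test that wants to cover that case needs a NaN constructed with such a payload.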

Closes #22

The test file does #include <goldenfloat/gf16.hpp> but the cpp/include
directory was not in the include path, causing a build failure.

Closes #36

- Fix root Cargo.toml malformed cfg: cfg(unix(all(not(...)))) -> cfg(unix)
- Add #include "gf16.h" to Go cgo preamble (was missing, all C names unresolved)
- Add zig-out/bin to Python library search paths for CI compatibility

Closes #28 — replace goto-bus/setup-zig with mlugg/setup-zig in both
test-bindings.yml and release.yml.

Closes #35 — add #![allow(non_camel_case_types)] to goldenfloat-sys
lib.rs for FFI-style type names.

Closes #38 — GF16.fromF32 now maps NaN to exp=0x3F,mant=1 (not inf).
GF16.toF32 returns std.math.nan when exp=0x3F and mant!=0.

Closes #36 — add cpp/include to CMakeLists include_directories so
<goldenfloat/gf16.hpp> resolves.

Closes #37 — remove import "C" from gf16_test.go (Go forbids cgo
in test files). Rewrite tests as exported Test* functions using the
Go wrapper API from gf16.go.

Previous implementation had multiple bugs:
- Exponent clamped to ±7 instead of IEEE-754 ±127 (7-bit range vs 8-bit)
- Wrong mantissa width (8 bits instead of 7)
- Wrong bit layout for decode (7-bit exp + 8-bit mantissa instead of 8+7)
- Wrong bias handling in frexp path

Standard BF16 is simply the top 16 bits of an IEEE 754 f32:
  [S:1][E:8][M:7] = bits 31..16 of f32

This replaces 54 lines of broken frexp-based code with 2 lines of
correct bit manipulation, matching every major BF16 implementation
(PyTorch, TensorFlow, MLX, etc.).

Added 5 new BF16 tests covering: 1.0, 100.0, 1e10, small values,
and special values (inf, NaN, ±0).

Closes #22

@gHashTag force-pushed the fix/issue-22-bf16-encoder branch from accc088 to 733470f on April 29, 2026 18:05

@gHashTag
Owner Author

Replaced by #52 (clean rebase on main)

@gHashTag closed this on Apr 29, 2026
Development

Successfully merging this pull request may close these issues.

BUG-001-BF16 · f32ToBf16 clamps exponent to ±7 instead of IEEE-754 ±127
