Reword Phase 9 docs to match Phase 14 capability-gate principle

danmcleran · claude · danmcleran · commit 1ce55ee54d60 · 2026-05-11T13:41:48.000-06:00
Two doc nits flagged after the Phase 14 audit. No code paths change.

cpp/qbridge.hpp: replace "A target with FPU (M4F, M7, R82, A55, x86)"
with capability-based phrasing. R82 and A55 ship FPU as optional silicon
per Arm's published RTL configurations; the bridges' FPU requirement is
a capability, not a CPU model.

QUANTIZATION.md Phase 9: replace "__fp16 (ARMv8.2) / _Float16 (gcc) /
bf16 storage typedefs" with the shipped reality (software-only fp16_t /
bf16_t wrapping uint16_t, with promote-to-float via __builtin_memcpy).
Also notes that the matching SIMD vector specialization lives in
Phase 14's simd_neon_fp16.hpp.
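
For illustration only, a minimal sketch of what a software-only storage
tier of this kind might look like. The struct layout and the helper names
fp16ToFloat / bf16ToFloat are assumptions made for this example, not
copied from tinymind_fp16.hpp; __builtin_memcpy is the GCC/Clang builtin,
so the sketch assumes one of those toolchains.

```cpp
#include <cstdint>

// Hypothetical storage structs: 16 raw bits, no __fp16 / _Float16 needed.
struct fp16_t { std::uint16_t bits; };  // IEEE 754 binary16 bit pattern
struct bf16_t { std::uint16_t bits; };  // bfloat16 bit pattern

// Reinterpret a 32-bit pattern as float without violating aliasing rules.
inline float bitsToFloat(std::uint32_t u) {
    float f;
    __builtin_memcpy(&f, &u, sizeof f);
    return f;
}

// bfloat16 is the upper half of a binary32, so promotion is a shift.
inline float bf16ToFloat(bf16_t h) {
    return bitsToFloat(static_cast<std::uint32_t>(h.bits) << 16);
}

// binary16 -> binary32: rebias the exponent (15 -> 127) and widen the
// mantissa (10 -> 23 bits). Subnormal inputs are normalised by hand.
inline float fp16ToFloat(fp16_t h) {
    const std::uint32_t sign = static_cast<std::uint32_t>(h.bits & 0x8000u) << 16;
    const std::uint32_t exp = (h.bits >> 10) & 0x1Fu;
    std::uint32_t man = h.bits & 0x3FFu;

    if (exp == 0x1Fu) {                      // infinity or NaN
        return bitsToFloat(sign | 0x7F800000u | (man << 13));
    }
    if (exp != 0u) {                         // normal number
        return bitsToFloat(sign | ((exp + 112u) << 23) | (man << 13));
    }
    if (man == 0u) {                         // signed zero
        return bitsToFloat(sign);
    }
    std::uint32_t e = 113u;                  // subnormal: shift until the
    while ((man & 0x400u) == 0u) {           // implicit leading 1 appears
        man <<= 1;
        --e;
    }
    return bitsToFloat(sign | (e << 23) | ((man & 0x3FFu) << 13));
}
```

Because the structs only hold a uint16_t and the conversions use integer
shifts plus one memcpy, nothing here depends on the target ISA, which is
the property the reworded Phase 9 text describes.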

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/QUANTIZATION.md b/QUANTIZATION.md
@@ -148,7 +148,7 @@ Each phase = self-contained PR. Builds on prior. Ships with regression corner +
   - `qValueToAffine<QV>(QValue, AffineParams)` → `int8_t`
   - `affineToFloat` / `floatToAffine` promoted from calibration helpers to runtime (freestanding-safe when `FLOAT=1`)
 - New gate `TINYMIND_ENABLE_FP16` in `cpp/include/tinymind_platform.hpp`.
-- `cpp/include/tinymind_fp16.hpp` — `__fp16` (ARMv8.2) / `_Float16` (gcc) / `bf16` storage typedefs + scalar promote-to-float ops. Pure storage tier, no SIMD yet.
+- `cpp/include/tinymind_fp16.hpp` — software-only `fp16_t` (IEEE 754 binary16) and `bf16_t` (bfloat16) storage structs wrapping `uint16_t`, plus scalar promote-to-float ops via `__builtin_memcpy`. No compiler-builtin `__fp16` / `_Float16` dependency — keeps the header capability-gate-clean per the Phase 14 design rule, and the storage tier compiles on any toolchain regardless of ISA. Hosts that natively support `_Float16` / `__fp16` may add a thin adapter without disturbing this header. Pure storage tier; the matching SIMD vector specialization lands in Phase 14 as `cpp/include/simd/simd_neon_fp16.hpp`.
 - `unit_test/embedded/Makefile` — expand to 4-way `(FLOAT, STD, QUANT, FP16)` matrix. Add `fp16_hosted` corner.
 
 **Tests:** `unit_test/quantization/test_qbridge.cpp` — round-trip Q8.8 ↔ int8 ↔ Q8.8 within tolerance. fp16 ↔ fp32 round-trip.
diff --git a/cpp/qbridge.hpp b/cpp/qbridge.hpp
@@ -41,9 +41,11 @@
  * All conversions run scalar at the layer boundary; the inner loops of
  * each pipeline stay native to their own type system. The conversions
  * themselves use float arithmetic, so this whole file is gated on
- * TINYMIND_ENABLE_FLOAT. A target with FPU (M4F, M7, R82, A55, x86) can
- * use the bridges at runtime; a pure-integer M0+ build keeps both
- * pipelines siloed by simply not including this header.
+ * TINYMIND_ENABLE_FLOAT. Any target whose silicon ships an FPU (capability,
+ * not CPU model — Arm publishes thousands of RTL configurations per core
+ * and FPU is often an optional component) can use the bridges at runtime;
+ * a pure-integer build keeps both pipelines siloed by simply not
+ * including this header.
  *
  * No <cmath> dependency: rounding uses sign-aware float-to-int casting.
  * No <type_traits>. Freestanding-safe at FLOAT=1, STD=0.