Half float->half: IEEE round-to-nearest-even on every backend (was truncating) - 4.14.0-local.3
Found by validating against numpy.float16 / ml_dtypes.bfloat16: ILGPU.Half
float->half TRUNCATED toward zero (managed van der Zijp table) and the WebGPU/
WebGL/Wasm emitters truncated AND flushed all subnormals to signed zero - diverging
from numpy/PyTorch in ~half of all values AND from ILGPU's own CUDA/OpenCL (which
were already round-to-nearest). So a Half model gave different results on WebGPU vs
CUDA, and every non-exact conversion lost up to 1/2 ULP.
Fix: replaced the managed conversion (HalfConversion.tt) with a direct RNE bit-manip
(rebias + RNE mantissa rounding + proper subnormal rounding + overflow->Inf, mirrors
the bf16/FP8 conversions) and rewrote WGSL/GLSL _f32_to_f16 + the Wasm EmitF32ToF16
inline bytecode to match. CUDA (cvt.rn) + OpenCL (vstore_half) unchanged - already
correct. Corrected the false "lossless / flush-to-zero" f16 doc claims.
bf16 was already bit-exact to ml_dtypes.bfloat16 (verified, no change).
Validation: new DemoConsole -- bf16-f16-oracle (all 65536 patterns, decode +
round-trip + RNE/subnormal/overflow probes): Half now decode 65536/65536, round-trip
65536/65536, probes 64060/64060 (was ~32294); bf16 perfect. New PMT
Half_FloatToHalf_RoundToNearestEven 9/0 all backend lanes; PMT_FILTER=Half 204/0/8
(no regression). Forks 2.0.30.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
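
The fix described in the commit message (rebias, RNE mantissa rounding, subnormal rounding, overflow->Inf) can be sketched in a few lines of bit manipulation. This is a minimal Python illustration under stated assumptions, not the ILGPU C# code or the WGSL/GLSL/Wasm emitters: the function name `f32_to_f16_rne` is hypothetical, the input is assumed to be a value that fits in float32, and the NaN handling (quiet bit set, top payload bits kept) is one reasonable choice rather than something the commit specifies.

```python
import struct


def f32_to_f16_rne(value: float) -> int:
    """Return IEEE binary16 bits for a float32 value, round-to-nearest-even.

    Illustrative sketch only; the name and NaN policy are assumptions.
    """
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    sign = (bits >> 16) & 0x8000
    exp = (bits >> 23) & 0xFF
    man = bits & 0x7FFFFF

    if exp == 0xFF:  # Inf or NaN
        if man == 0:
            return sign | 0x7C00
        return sign | 0x7E00 | (man >> 13)  # quiet NaN, keep top payload bits

    e = exp - 127 + 15  # rebias from float32 (127) to half (15)
    if e >= 0x1F:
        return sign | 0x7C00  # too large for half: overflow -> Inf

    if e <= 0:
        # Result is a half subnormal (or zero): round instead of flushing.
        if e < -10:
            return sign  # below half of the smallest subnormal: signed zero
        man |= 0x800000  # restore the implicit leading 1
        shift = 14 - e  # 14..24 bits are discarded
        half_man = man >> shift
        rem = man & ((1 << shift) - 1)
        halfway = 1 << (shift - 1)
        if rem > halfway or (rem == halfway and (half_man & 1)):
            half_man += 1  # may carry into the smallest normal, which is correct
        return sign | half_man

    # Normal range: drop 13 mantissa bits with round-to-nearest-even.
    half = (e << 10) | (man >> 13)
    rem = man & 0x1FFF
    if rem > 0x1000 or (rem == 0x1000 and (half & 1)):
        half += 1  # a carry can bump the exponent, up to and including Inf
    return sign | half
```

A truncating conversion corresponds to skipping both `rem` checks; flushing subnormals corresponds to returning `sign` for every `e <= 0`.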
CHANGELOG.md: 7 additions & 0 deletions
@@ -2,6 +2,13 @@
 
 This file tracks notable changes per release. The README's "Recent Highlights" section links here for the full version history.
 
+## 4.14.0-local.3 (2026-06-17) - Half float→half is now IEEE round-to-nearest-even on every backend (was truncating)
+
+Fixes a real conversion-correctness + cross-backend-consistency bug in the most-used low-precision type, found by validating against the authoritative references (numpy.float16 / ml_dtypes.bfloat16). Forks bump to `2.0.30`.
+
+- **`ILGPU.Half` `float→half` now uses IEEE round-to-nearest-even (incl. proper subnormal rounding + overflow→Inf) on CPU + WebGPU + WebGL + Wasm** - bit-exact to `numpy.float16` / PyTorch / CUDA (`cvt.rn.f16.f32`) / OpenCL (`vstore_half`). **Before:** the managed conversion used the van der Zijp TABLE method which **truncates toward zero** (`HalfConversion.tt`: shift with no round bit), and the WebGPU/WebGL/Wasm emitters **truncated AND flushed every subnormal to signed zero**. That diverged from numpy/PyTorch in ~half of all values (every non-exact conversion lost up to ½ ULP) AND from ILGPU's own CUDA/OpenCL backends (which were already round-to-nearest) - so a Half model produced different results on WebGPU vs CUDA. Replaced the managed conversion with a direct RNE bit-manip (mirrors the bf16/FP8 conversions) and rewrote the WGSL/GLSL `_f32_to_f16` + the Wasm `EmitF32ToF16` inline bytecode to match. CUDA/OpenCL unchanged (already correct). The "f16 emulation is lossless / matches numpy byte-for-byte" doc claims (which were false for encode) are corrected.
+- **Validated exhaustively:** new `DemoConsole -- bf16-f16-oracle` checks managed BFloat16 + Half vs `ml_dtypes.bfloat16` / `numpy.float16` over **all 65536 patterns** (decode + round-trip identity) + RNE/overflow/subnormal probes. `BFloat16`: bit-exact (decode 65536/65536, round-trip 65536/65536, probes 67503/67503) - was already correct. `Half`: now decode 65536/65536, round-trip 65536/65536, probes 64060/64060 (was ~32294/64060 - the subnormal region + RNE midpoints). Cross-backend gate: new PMT `Half_FloatToHalf_RoundToNearestEven` (kernel `(Half)x` over subnormals/midpoints/overflow/specials, bit-exact vs the managed=numpy reference) **9/0 all backend lanes**; existing Half suite `PMT_FILTER=Half` **204/0/8** (no regression).
+
 ## 4.14.0-local.2 (2026-06-17) - Float8E4M3 is now bit-exact to float8_e4m3fn (overflow → NaN), saturating opt-in
 
 `Float8E4M3` float→fp8 conversion changed from saturating to the `fn` (`float8_e4m3fn`) convention as the DEFAULT, matching the dtype it is named after. Forks bump to `2.0.29`.
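
The exhaustive validation described in the changelog entry (all 65536 patterns, round-trip identity, and the observation that truncation diverges in roughly half of all values) can be approximated with a small stdlib-only script. This is a sketch under assumptions, not the `DemoConsole -- bf16-f16-oracle` tool: it uses Python's `struct` format `e` (IEEE 754 binary16, rounding to nearest even) as a stand-in oracle instead of `numpy.float16`, all function names are hypothetical, NaN patterns are skipped in the round-trip, and the truncating encoder only handles the normal half range.

```python
import random
import struct


def half_bits_to_float(h: int) -> float:
    """Decode 16 raw bits as an IEEE binary16 value."""
    return struct.unpack("<e", struct.pack("<H", h))[0]


def float_to_half_bits_rne(x: float) -> int:
    """Encode with struct's 'e' format, used here as the RNE reference."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def float_to_half_bits_truncate(x: float) -> int:
    """Truncating encoder for the normal half range only (illustrative)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = (bits >> 16) & 0x8000
    exp = ((bits >> 23) & 0xFF) - 112  # rebias 127 -> 15
    return sign | (exp << 10) | ((bits >> 13) & 0x3FF)  # shift, no round bit


def round_trip_failures() -> int:
    """Count non-NaN half patterns that do not survive decode + encode."""
    failures = 0
    for h in range(0x10000):
        value = half_bits_to_float(h)
        if value != value:  # NaN: payload round-trip is not checked here
            continue
        if float_to_half_bits_rne(value) != h:
            failures += 1
    return failures


def truncation_mismatch_rate(samples: int = 20000, seed: int = 0) -> float:
    """Fraction of random float32 values where truncation differs from RNE."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(samples):
        # Round the sample to float32 first so both encoders see the same input.
        x = struct.unpack("<f", struct.pack("<f", rng.uniform(1.0, 1000.0)))[0]
        if float_to_half_bits_truncate(x) != float_to_half_bits_rne(x):
            mismatches += 1
    return mismatches / samples
```

Calling `round_trip_failures()` should report no failures for a correct encoder, while `truncation_mismatch_rate()` lands near one half because the 13 discarded mantissa bits exceed the midpoint about half the time.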
SpawnDev.ILGPU/SpawnDev.ILGPU.csproj: 4 additions & 4 deletions
@@ -4,9 +4,9 @@
 <TargetFramework>net10.0</TargetFramework>
 <ImplicitUsings>enable</ImplicitUsings>
 <Nullable>enable</Nullable>
-<Version>4.14.0-local.2</Version>
+<Version>4.14.0-local.3</Version>
 <!-- Brief current-version highlights only. Full per-version history with code samples lives in CHANGELOG.md (linked from the README). -->
-<PackageReleaseNotes>4.14.0 makes Float8E4M3 bit-exact to PyTorch/JAX/ml_dtypes float8_e4m3fn (the dtype it is named after): the cast operator and the IR-level convert now use the fn convention - finite overflow AND +-Inf map to NaN (was: saturate to +-448) - verified against the ml_dtypes oracle and on all 6 backends. The saturating (NVIDIA TE / OCP) cast is available opt-in via Float8E4M3.FromSingleSaturating / FromSingle(x, saturate: true). 4.13.2 is a packaging fix over 4.13.1: removes stray Wasm/repro JSON files that the Razor SDK swept into the package, and bundles the precompiled-shaders precompiler tool (tools/) that 4.13.0/4.13.1 were missing. The 4.13.x line brings full low-precision floating-point support across ALL 6 backends (CPU, OpenCL, WebGPU, WebGL, Wasm, CUDA): Half, BFloat16, and FP8 (Float8E4M3 + Float8E5M2) - including FP8 radix-sort keys (4.13.1) - plus generic INumber<T> mixed-precision kernels, PrecisionConvert, and bf16/FP8 portability to pre-Ampere CUDA cards (GTX 1080 / RTX 2060). Full per-version history with code samples: CHANGELOG.md at https://github.com/LostBeard/SpawnDev.ILGPU/blob/master/CHANGELOG.md</PackageReleaseNotes>
+<PackageReleaseNotes>4.14.0 fixes low-precision float CONVERSION CORRECTNESS against the references. (1) Half (float->half) is now IEEE round-to-nearest-even on every backend (CPU + WebGPU + WebGL + Wasm), bit-exact to numpy.float16 / PyTorch / CUDA / OpenCL - it previously truncated toward zero and flushed subnormals to zero (diverging from numpy AND from CUDA/OpenCL). (2) Float8E4M3 is now bit-exact to PyTorch/JAX/ml_dtypes float8_e4m3fn: the cast + IR convert use the fn convention (finite overflow AND +-Inf -> NaN; was saturate to +-448); saturating is opt-in via Float8E4M3.FromSingleSaturating. BFloat16 was already bit-exact to ml_dtypes.bfloat16 (verified). All validated exhaustively against ml_dtypes/numpy oracles + cross-backend PMT gates. 4.13.2 is a packaging fix over 4.13.1: removes stray Wasm/repro JSON files that the Razor SDK swept into the package, and bundles the precompiled-shaders precompiler tool (tools/) that 4.13.0/4.13.1 were missing. The 4.13.x line brings full low-precision floating-point support across ALL 6 backends (CPU, OpenCL, WebGPU, WebGL, Wasm, CUDA): Half, BFloat16, and FP8 (Float8E4M3 + Float8E5M2) - including FP8 radix-sort keys (4.13.1) - plus generic INumber<T> mixed-precision kernels, PrecisionConvert, and bf16/FP8 portability to pre-Ampere CUDA cards (GTX 1080 / RTX 2060). Full per-version history with code samples: CHANGELOG.md at https://github.com/LostBeard/SpawnDev.ILGPU/blob/master/CHANGELOG.md</PackageReleaseNotes>