
Commit 25fbbd9

LostBeard and claude committed
Promote 4.13.1-local.1 -> 4.13.1 stable (FP8 radix keys on all 6 backends)
Full PMT sweep green: 3613 pass / 0 fail / 224 skip (browser lanes confirmed genuinely executed; Fp8Radix 24/24 Success on browser backends). Forks stay 2.0.27.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
1 parent e815bb7 commit 25fbbd9

2 files changed

Lines changed: 3 additions & 3 deletions


CHANGELOG.md

Lines changed: 2 additions & 2 deletions
@@ -2,9 +2,9 @@
This file tracks notable changes per release. The README's "Recent Highlights" section links here for the full version history.

-## 4.13.1 (unreleased) - FP8 radix-sort keys on all 6 backends
+## 4.13.1 (2026-06-16) - FP8 radix-sort keys on all 6 backends

-### local.1 - FP8 (Float8E4M3 / Float8E5M2) radix-sort keys
+### FP8 (Float8E4M3 / Float8E5M2) radix-sort keys

- **FP8 arrays can now be radix-sorted on all 6 backends** (keys-only + key/value pairs, ascending + descending) - closing the tracked 4.13.0 follow-up. Added: `Interop.FloatAsInt(Float8E4M3)` / `(Float8E5M2)` (the raw 8-bit pattern, like the `Half`/`BFloat16` twins); the IR `FloatAsIntCast` lowering for FP8 across all backends (constant-fold + `Int8` result sizing in `IR/Construction/Cast.cs`; per-backend codegen on PTX `EmitF32ToFP8Bits`, OpenCL `_f32_to_e4m3_bits`, WGSL/GLSL `_f32_to_e4m3`, Wasm `EmitF32ToFP8`); and `Ascending`/`DescendingFloat8E4M3`/`E5M2` radix operations (the sign-flip + ones-complement float key transform at 8-bit width - both E4M3 and E5M2 are magnitude-monotonic, exponent above mantissa). On WebGL, FP8 keys sort via the unpacked-f32 working representation (same as Half/bf16, since the whole-texel scatter can't move a sub-word value); on the other 5 backends, they sort as native 1-byte keys.
- **WebGPU packed-sub-word fix (the hard part).** `Float8E4M3`/`Float8E5M2` are their OWN `BasicValueType` (NOT `Int8`), so they were silently skipped by every `case Int8/Int16/BFloat16` switch in the WGSL codegen and fell to a default that maps FP8 -> `f32`. For a packed FP8 key buffer this meant: the binding was declared `array<f32>` instead of `array<atomic<u32>>`, and the kernel read each key via a raw whole-word deref instead of a 4-per-word byte extract + `_e4m3_to_f32` - so the radix sort read garbage and corrupted the result (WebGPU only; the 5 other backends were correct). Fixed by adding FP8 to all four WGSL sub-word classification switches (body-struct binding-type, body-struct LEA, direct-param LEA, direct-param coalesce) so FP8 is declared packed `array<atomic<u32>>` and extracted+converted at load/store - exactly the path bf16 (2-per-word) already used. Localized with Dawn's `dump_shaders` Tint-output dump (`PMT_DAWN_DUMP=1`) rather than by reading the WGSL.
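The sign-flip + ones-complement key transform from the first bullet can be sketched in Python as follows. This is an illustration, not SpawnDev.ILGPU code: the helper names are hypothetical, and the decoders assume the OCP FP8 encodings (E4M3: exponent bias 7, no infinities, NaN only at S.1111.111; E5M2: bias 15 with IEEE-style inf/NaN), used here only to check the ordering. Because an FP8 key is a single 8-bit radix digit, one stable counting-sort pass is the entire LSD sort.

```python
import math

def e4m3_to_float(b: int) -> float:
    """Decode an OCP E4M3 byte: 1 sign, 4 exponent (bias 7), 3 mantissa bits, no infinities."""
    sign = -1.0 if b & 0x80 else 1.0
    exp, man = (b >> 3) & 0xF, b & 0x7
    if exp == 0xF and man == 0x7:
        return math.nan                                  # S.1111.111 is the only NaN pattern
    if exp == 0:
        return sign * (man / 8.0) * 2.0 ** -6            # subnormal
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)

def e5m2_to_float(b: int) -> float:
    """Decode an OCP E5M2 byte: 1 sign, 5 exponent (bias 15), 2 mantissa bits, IEEE-style inf/NaN."""
    sign = -1.0 if b & 0x80 else 1.0
    exp, man = (b >> 2) & 0x1F, b & 0x3
    if exp == 0x1F:
        return sign * math.inf if man == 0 else math.nan
    if exp == 0:
        return sign * (man / 4.0) * 2.0 ** -14           # subnormal
    return sign * (1.0 + man / 4.0) * 2.0 ** (exp - 15)

def fp8_key_ascending(bits: int) -> int:
    """Map a raw FP8 pattern to an unsigned byte whose integer order is numeric order."""
    if bits & 0x80:
        return ~bits & 0xFF    # negative: ones-complement, so larger magnitudes get smaller keys
    return bits | 0x80         # non-negative: set the sign bit, so positives sort above negatives

def fp8_key_descending(bits: int) -> int:
    """Descending order is just the complement of the ascending key."""
    return ~fp8_key_ascending(bits) & 0xFF

def radix_sort_pairs(keys, values, key_fn):
    """Stable LSD radix sort; an 8-bit key is one 8-bit digit, so one counting pass suffices."""
    digits = [key_fn(k) for k in keys]
    counts = [0] * 256
    for d in digits:
        counts[d] += 1
    offsets, running = [0] * 256, 0
    for d in range(256):                         # exclusive prefix sum = first slot of each bucket
        offsets[d], running = running, running + counts[d]
    out_keys, out_values = [0] * len(keys), [None] * len(values)
    for k, v, d in zip(keys, values, digits):    # scatter in input order, which keeps it stable
        out_keys[offsets[d]], out_values[offsets[d]] = k, v
        offsets[d] += 1
    return out_keys, out_values

keys = [0x38, 0xB8, 0x00, 0x40]                  # E4M3 patterns for 1.0, -1.0, 0.0, 2.0
print(radix_sort_pairs(keys, ["a", "b", "c", "d"], fp8_key_ascending)[1])   # ['b', 'c', 'a', 'd']
print(radix_sort_pairs(keys, ["a", "b", "c", "d"], fp8_key_descending)[1])  # ['d', 'a', 'c', 'b']
```

As written, the transform puts -0 just below +0 and orders NaN patterns by their bits (positive-sign NaNs last, negative-sign NaNs first when ascending); the changelog does not say how the library treats NaN keys.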
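The packed layout the WebGPU fix switches FP8 to (four values per 32-bit word, extracted and converted at load/store) can be modeled on the host. In this hypothetical sketch a Python list of ints stands in for an `array<atomic<u32>>` binding, lane 0 is assumed to occupy the low-order byte, and the helper names are made up; the last helper reproduces the pre-fix failure mode of reading a whole packed word as one `f32`.

```python
import struct

def load_fp8_lane(words: list[int], i: int) -> int:
    """Return the raw 8-bit pattern of packed element i (4 elements per u32 word)."""
    shift = 8 * (i & 3)                  # assumed: lane 0 occupies the low-order byte
    return (words[i >> 2] >> shift) & 0xFF

def store_fp8_lane(words: list[int], i: int, bits: int) -> None:
    """Overwrite element i without disturbing the other three bytes in its word.

    On a GPU, neighbouring lanes may be written by other invocations at the same time,
    which is why the binding must be atomic (for example, clear the lane with atomicAnd,
    then set it with atomicOr). Single-threaded Python only needs a read-modify-write.
    """
    idx, shift = i >> 2, 8 * (i & 3)
    mask = 0xFF << shift
    words[idx] = (words[idx] & ~mask & 0xFFFFFFFF) | ((bits & 0xFF) << shift)

def whole_word_as_f32(word: int) -> float:
    """Pre-fix failure mode: treating a packed word as a single f32 element."""
    return struct.unpack("<f", struct.pack("<I", word))[0]

words = [0, 0]
for i, b in enumerate([0x38, 0xB8, 0x00, 0x40, 0x7E]):  # E4M3: 1.0, -1.0, 0.0, 2.0, 448.0
    store_fp8_lane(words, i, b)
print(hex(words[0]), [hex(load_fp8_lane(words, i)) for i in range(5)])
print(whole_word_as_f32(words[0]))  # not a decoded FP8 value: four lanes mixed into one f32
```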

SpawnDev.ILGPU/SpawnDev.ILGPU.csproj

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
<TargetFramework>net10.0</TargetFramework>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
-<Version>4.13.1-local.1</Version>
+<Version>4.13.1</Version>
<!-- Brief current-version highlights only. Full per-version history with code samples lives in CHANGELOG.md (linked from the README). -->
<PackageReleaseNotes>4.13.0 brings full low-precision floating-point support across ALL 6 backends (CPU, OpenCL, WebGPU, WebGL, Wasm, CUDA): Half, BFloat16, and now FP8 (Float8E4M3 + Float8E5M2), plus generic INumber&lt;T&gt; mixed-precision kernels and PrecisionConvert for transpilable generic float&lt;-&gt;T conversion inside a kernel. This release also fixes bf16 on PRE-AMPERE CUDA cards (GTX 1080 / RTX 2060 etc.): the PTX bf16 path used sm_80+ cvt instructions and failed to compile on older cards; it now uses portable bit-manipulation that works on every CUDA architecture (FP8 likewise). Full per-version history with code samples: CHANGELOG.md at https://github.com/LostBeard/SpawnDev.ILGPU/blob/master/CHANGELOG.md</PackageReleaseNotes>
<GeneratePackageOnBuild>True</GeneratePackageOnBuild>
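The release notes say the PTX bf16 path no longer relies on sm_80+ `cvt` instructions and instead uses portable bit manipulation. The emitter itself is not part of this commit, so the following is only a generic Python sketch of the widely used integer-only round-to-nearest-even f32 -> bf16 conversion; whether SpawnDev.ILGPU's PTX does exactly this is an assumption, and the function names are hypothetical.

```python
import math
import struct

def f32_bits(x: float) -> int:
    """Raw IEEE-754 binary32 bit pattern of x (x is first rounded to f32)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def f32_to_bf16_bits(x: float) -> int:
    """f32 -> bf16 with round-to-nearest-even, using integer ops only."""
    bits = f32_bits(x)
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        # NaN: truncate, but force a quiet-NaN mantissa bit so a low-bit payload can't become inf
        return ((bits >> 16) | 0x0040) & 0xFFFF
    lsb = (bits >> 16) & 1                           # low bit of the kept half; ties round to even
    return ((bits + 0x7FFF + lsb) >> 16) & 0xFFFF    # a mantissa carry bumps the exponent (max -> inf)

def bf16_bits_to_f32(h: int) -> float:
    """bf16 -> f32 is exact: a bf16 value is the top 16 bits of an f32."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]

print(hex(f32_to_bf16_bits(1.0)), bf16_bits_to_f32(f32_to_bf16_bits(math.pi)))  # 0x3f80 3.140625
```

Only integer adds, shifts, and masks are involved, which is why this style of conversion does not depend on a particular GPU architecture.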
