Stage 4.13.0 stable: version flip 4.13.0-local.10 -> 4.13.0 + CHANGELOG release date

LostBeard · claude · LostBeard · commit bb2c8d538973 · 2026-06-16T20:10:16.000-04:00
Promotes the wrapper to stable 4.13.0. The forks stay at the clean 2.0.26, matching the PackageReference,
so the four-package version sync is satisfied. CHANGELOG header dated 2026-06-16.

4.13.0 = low-precision floats on all 6 backends (Half + BFloat16 + FP8 Float8E4M3/E5M2) + generic
INumber<T> mixed-precision kernels + PrecisionConvert + bf16/FP8 portability to pre-Ampere CUDA.

Release gate: full PMT sweep 3569 pass / 1 transient (OpenCL RadixSort2M GPU-contention flake; passes
9/9 in isolation, not a regression) / 224 skip; FP8 PrecisionConvert round-trip + relu 257/257; BFloat16
107/0 incl. CUDA; AcceleratorRequirements 19/0/1. Consumed by ML (Tuvok bumped to local.10, GGUFDecode
KVCache 8/0, bf16 KV path validated on the pre-Ampere fix).

Source staged to master FIRST per the nuget.org hard-gate; the nuget.org push (3 packages: forks 2.0.26
+ SpawnDev.ILGPU 4.13.0) awaits Captain per-push sign-off.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,7 +2,9 @@
 
 This file tracks notable changes per release. The README's "Recent Highlights" section links here for the full version history.
 
-## 4.13.0 (unreleased) - BFloat16 (bfloat16) Phases 0-3b: core type (CPU) + WebGPU + WebGL + Wasm + OpenCL + CUDA codegen (all 6 backends)
+## 4.13.0 (2026-06-16) - Low-precision floats on all 6 backends: BFloat16 + FP8 (Float8E4M3 / Float8E5M2), generic INumber<T> mixed-precision kernels, PrecisionConvert, and bf16/FP8 portability to pre-Ampere CUDA cards
+
+> 4.13.0 was developed across the local.5 -> local.10 series; the dated headline above is the stable cut. Per-milestone detail follows.
 
 ### local.10 - FP8 complete on ALL 6 backends + bf16 pre-Ampere CUDA fix
 
diff --git a/SpawnDev.ILGPU/SpawnDev.ILGPU.csproj b/SpawnDev.ILGPU/SpawnDev.ILGPU.csproj
@@ -4,7 +4,7 @@
 		<TargetFramework>net10.0</TargetFramework>
 		<ImplicitUsings>enable</ImplicitUsings>
 		<Nullable>enable</Nullable>
-		<Version>4.13.0-local.10</Version>
+		<Version>4.13.0</Version>
 		<!-- Brief current-version highlights only. Full per-version history with code samples lives in CHANGELOG.md (linked from the README). -->
 		<PackageReleaseNotes>4.13.0 brings full low-precision floating-point support across ALL 6 backends (CPU, OpenCL, WebGPU, WebGL, Wasm, CUDA): Half, BFloat16, and now FP8 (Float8E4M3 + Float8E5M2), plus generic INumber&lt;T&gt; mixed-precision kernels and PrecisionConvert for transpilable generic float&lt;-&gt;T conversion inside a kernel. This release also fixes bf16 on PRE-AMPERE CUDA cards (GTX 1080 / RTX 2060 etc.): the PTX bf16 path used sm_80+ cvt instructions and failed to compile on older cards; it now uses portable bit-manipulation that works on every CUDA architecture (FP8 likewise). Full per-version history with code samples: CHANGELOG.md at https://github.com/LostBeard/SpawnDev.ILGPU/blob/master/CHANGELOG.md</PackageReleaseNotes>
 		<GeneratePackageOnBuild>True</GeneratePackageOnBuild>