Commit 4bf6fb5

LostBeard and claude committed
Wasm: dispose per-dispatch MessageEvent/Event in worker-response handlers (4.12.1-local.10)
WasmAccelerator.EnsurePersistentHandlers installs persistent per-worker OnMessage/OnError handlers. Each worker response delivers a MessageEvent (and each error an Event) JSObject that the handler OWNS - SpawnDev.BlazorJS does not auto-dispose an ActionEvent handler's argument (ActionCallback<T1>.Invoke calls the delegate and never disposes the arg; confirmed by the library author). The handlers never disposed msg/err, so every (dispatch x worker) response left a MessageEvent reclaimable only by the finalizer (disposal-breakdown over a TurboQuant lane: MessageEvent created=9971, proper=0, finalizer=9969). Between GCs this transient pile-up spikes the main-thread V8 heap during a heavy dispatch storm - the likely trigger of the ML late-lane heavy-test timeouts.

Fix: `using` the MessageEvent/Event arg in both handlers so each disposes deterministically on every path (including the stray-message early return). WasmDispatchResponse is a plain DTO, so it stays valid after msg is disposed.

Also: corrected the WasmMemoryBuffer header comment (data buffers are staged through the shared linear memory on the main thread, NOT zero-copy shared to workers - the inaccurate line had seeded a worker-pinned-SAB hypothesis).

Guard: WasmTests.Wasm_DispatchResponse_DoesNotLeakMessageEvent (alive-MessageEvent count via BlazorJS IDisposableTracker, kept off the tracker's Console/verbose paths).

NOTE: a separate, slower persistent retained-object climb (non-MessageEvent, main-thread V8) remains under investigation via a CDP bytes-by-type + retainers heap snapshot - tracked, distinct from this transient fix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
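The ownership rule behind this fix can be illustrated with a self-contained C# sketch that runs without a browser. It is a hypothetical model, not SpawnDev.BlazorJS or SpawnDev.ILGPU code: `FakeJsEvent` stands in for a JSObject wrapper such as `MessageEvent`, its static `Alive` counter stands in for the alive count that `IDisposableTracker` reports, and `OwnershipSketch.Invoke` mimics a marshaller that calls the delegate without disposing the argument (the behavior described above for `ActionCallback<T1>.Invoke`). Making `using var` the handler's first statement disposes the argument on every exit path, including the stray-message early return.

```csharp
using System;

// Hypothetical stand-in for a JSObject wrapper (e.g. MessageEvent); NOT the BlazorJS type.
public sealed class FakeJsEvent : IDisposable
{
    // Wrappers created but not yet disposed (stand-in for a tracker's alive count).
    public static int Alive;
    public bool IsDisposed { get; private set; }

    public FakeJsEvent() => Alive++;

    public void Dispose()
    {
        if (IsDisposed) return; // idempotent, so a double dispose cannot under-count
        IsDisposed = true;
        Alive--;
    }
}

public static class OwnershipSketch
{
    // Hypothetical marshaller: creates the arg, invokes the handler, never disposes the arg,
    // so the handler owns it.
    public static void Invoke(Action<FakeJsEvent> handler) => handler(new FakeJsEvent());

    // Pre-fix shape: never disposes, so every response leaves one wrapper alive.
    public static void LeakyHandler(FakeJsEvent msg, bool inFlight)
    {
        if (!inFlight) return; // stray message: early return
        // ... copy msg data into a plain DTO ...
    }

    // Post-fix shape: `using var` first, so the wrapper is disposed on every exit path,
    // including the early return.
    public static void OwningHandler(FakeJsEvent msg, bool inFlight)
    {
        using var _msgScope = msg;
        if (!inFlight) return;
        // ... copy msg data into a plain DTO ...
    }

    // Simulates `responses` worker responses (every other one "stray") and returns the alive count.
    public static int Run(bool useFix, int responses)
    {
        FakeJsEvent.Alive = 0;
        for (int i = 0; i < responses; i++)
        {
            bool inFlight = i % 2 == 0;
            if (useFix) Invoke(e => OwningHandler(e, inFlight));
            else Invoke(e => LeakyHandler(e, inFlight));
        }
        return FakeJsEvent.Alive;
    }
}
```

Under these assumptions, `OwnershipSketch.Run(useFix: false, responses: 160)` leaves 160 wrappers alive while `Run(useFix: true, responses: 160)` leaves 0, which is the same clean separation the guard test's bound relies on.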
1 parent 3784cfb commit 4bf6fb5

5 files changed

Lines changed: 96 additions & 5 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@ Wrapper-only (forks stay **2.0.16**). Adds a new selection-gate capability flag:
 - **Wasm SIMD128 emitter foundation (Phase 1 of the SIMD port).** Additive groundwork only - no production kernel emits v128 yet, so the scalar path is byte-identical. Adds the v128 value type and the 0xFD-prefixed SIMD opcode set to `WasmOpCodes` (spec-verified; sub-opcodes are u32-LEB128 after the prefix, so multi-byte ones like `f32x4.add`=228 encode correctly), v128 emit helpers in `WasmModuleBuilder` (`EmitSimd`/`EmitSimdMem`/`EmitSimdLane`/`EmitV128Const`/`EmitI8x16Shuffle`), and the runtime SIMD capability surface: `WasmBackend.RuntimeSupportsWasmSimd` (via `System.Runtime.Intrinsics.Wasm.PackedSimd.IsSupported` - if the running Blazor WASM build has SIMD enabled, the browser/workers accept v128), `ForceScalar`/`ForceSimd` test overrides, `EffectiveWasmSimd`, `WasmCapabilityContext.WasmSimd`, and `WasmAccelerator.SupportsSimd`. **Non-SIMD devices stay first-class forever** (the scalar path is a supported mode, not a deprecated fallback - real hardware/browsers without wasm SIMD are common; see the dual-build technique in `BlazorWASMSIMDDetectExample`). Verified by the offline `DemoConsole -- wasm-simd-probe`: a hand-built v128 module is `wasm-validate`-clean and `wasm2wat`-decodes to the intended instructions.
 - **Wasm: bound the persistent-worker module cache (late-lane memory-pressure fix).** The process-persistent worker pool keeps every distinct kernel's compiled `WebAssembly.Module` in a per-worker cache (`_modulesById`) for the tab's life. Across a long test lane each per-test accelerator's kernels get fresh ids, so the cache accumulated unbounded (measured 2 -> 1057 across a ~570-test lane) until late, heavy tests hit process-memory pressure and timed out (the committed shared linear memory was flat/small - the module cache was the driver). Fix: when cumulative kernels compiled since the last flush cross `WasmBackend.ModuleCacheFlushThreshold` (default 256; 0 disables), the host instructs the workers to drop their module/instance caches at the next fresh accelerator's FIRST dispatch (safe - that accelerator re-sends its own kernels; the cleared modules are disposed accelerators' dead weight). Bounds peak modules to ~the threshold. Short workloads never reach it -> never flush -> kernels stay fully warm. Diagnostics `WasmAccelerator.TotalKernelsCompiled` / `SharedWasmMemoryPages`; guard `WasmTests.Wasm_ModuleCacheFlush_DoesNotBreakCorrectness` (flushes every accelerator, asserts CPU-oracle).
 - **Wasm: fixed a host-write SNAPSHOT SharedArrayBuffer leak (the real ML-lane heavy-test memory leak).** `WasmMemoryBuffer.PrepareHostWrite` allocates a full-buffer-size SharedArrayBuffer when a host write lands while a dispatch is in flight on that buffer (the lazy copy-out race defense). `CompleteDispatchIntent` removed the snapshot from its tracking dict but **never `Dispose()`d the SharedArrayBuffer** (despite its own doc claiming "that tier's SAB is freed"), and the all-intents-complete path dropped the dict without disposing either - so every materialized snapshot leaked a full-buffer-size JS SharedArrayBuffer. Under a long heavy-workload lane (ML's CopyFromCPU+dispatch pattern) this accumulated to ~1.5 GiB of JS heap, slowing late tests into timeouts (root-caused via a resident-memory trace: heap 154->1644 MiB; worker pool flat, linear memory flat, module cache flat by magnitude). Fix: dispose the snapshot SAB on release + on buffer dispose (`DisposeAllSnapshots`). New diagnostic `WasmMemoryBuffer.LiveSnapshotBytes`; guard `WasmTests.Wasm_HostWriteSnapshot_DoesNotLeakSAB` (deterministically materializes snapshots, asserts the resident bytes return to baseline). Also adds resident-count diagnostics `WasmMemoryBuffer.LiveBufferCount`/`LiveBufferBytes` + `WasmAccelerator.LiveAcceleratorCount`.
+- **Wasm: dispatch-response handlers now dispose the per-dispatch `MessageEvent`/`Event` JSObject.** `WasmAccelerator.EnsurePersistentHandlers` installs persistent per-worker `OnMessage`/`OnError` handlers; each worker response delivers a `MessageEvent` (and each error an `Event`) JSObject that the handler **owns** - SpawnDev.BlazorJS does not auto-dispose an `ActionEvent` handler's argument (`ActionCallback<T1>.Invoke` calls the delegate and never disposes the arg; confirmed by the library author). The handlers never disposed `msg`/`err`, so every (dispatch x worker) response created a `MessageEvent` that was reclaimed only by the finalizer (disposal-breakdown over a TurboQuant lane: `MessageEvent created=9971, proper=0, finalizer=9969`). Between GCs this transient pile-up spikes the main-thread V8 heap during a heavy dispatch storm - the likely trigger of the late-lane heavy-test timeouts. Fix: `using` the `MessageEvent`/`Event` arg in both handlers so each disposes deterministically on every path (including the stray-message early return). Guard `WasmTests.Wasm_DispatchResponse_DoesNotLeakMessageEvent` (alive-`MessageEvent` count via BlazorJS `IDisposableTracker`, kept off the tracker's verbose/Console paths). NOTE: a separate, slower persistent retained-object climb (non-`MessageEvent`) remains under investigation via a CDP bytes-by-type + retainers heap snapshot - tracked, distinct from this transient fix.

 ## 4.12.0 (2026-06-13) - Sync/async contract: async-only where it waits/observes, sync for fire-and-forget

SpawnDev.ILGPU.Demo/UnitTests/WasmTests.cs

Lines changed: 74 additions & 0 deletions
@@ -1884,6 +1884,80 @@ public async Task Wasm_HostWriteSnapshot_DoesNotLeakSAB()
             finally { accelerator.Dispose(); context.Dispose(); }
         }

+        // Wasm per-dispatch MessageEvent leak guard (2026-06-15, Geordi). EnsurePersistentHandlers installs
+        // persistent OnMessage/OnError handlers on each worker; every worker response delivers a MessageEvent
+        // JSObject that the handler OWNS - SpawnDev.BlazorJS ActionCallback<T1>.Invoke calls the delegate and
+        // does NOT dispose the arg (verified ActionCallback.cs:59-63). Before the fix the handler never disposed
+        // msg/err, so every (dispatch x worker) response pinned a MessageEvent (+ its .data graph) in the V8 JS
+        // heap until finalization - which JS-heap growth alone never triggers under a long Wasm lane → the
+        // ~1.6 GiB ML-lane late-test memory-pressure leak (Tuvok CDP). Fix = `using` on msg+err so each disposes
+        // in-handler on every path (incl. the stray-message early return).
+        //
+        // This guard uses BlazorJS IDisposableTracker to count ALIVE MessageEvent JSObjects after N dispatches.
+        // It enables ONLY UndisposedHandleVerboseMode (NOT CreatedHandleVerboseMode) so the tracker's
+        // Console.WriteLine paths (which trip #blazor-error-ui and would false-FAIL the run) never fire: the
+        // created-notice (line 95) is gated on CreatedHandleVerboseMode (kept off), and the finalizer-warning
+        // (line 37) only fires for TRACKED objects disposed via finalizer; with the fix every MessageEvent is
+        // DisposedProper in-handler, and objects created while the flag was off carry a null tracker so their
+        // disposal short-circuits before the Console path. We never force GC inside the measured window.
+        // With the fix: alive MessageEvents ≈ 0. Without it: ≈ dispatches * workerCount (hundreds).
+        [TestMethod(Timeout = 120000)]
+        public async Task Wasm_DispatchResponse_DoesNotLeakMessageEvent()
+        {
+            const int count = 4096;
+            const int dispatches = 40;
+            bool savedUndisposed = SpawnDev.BlazorJS.IDisposableTracker.UndisposedHandleVerboseMode;
+            bool savedCreated = SpawnDev.BlazorJS.IDisposableTracker.CreatedHandleVerboseMode;
+            var context = Context.Create().Wasm().ToContext();
+            var accelerator = await context.CreateWasmAcceleratorAsync();
+            try
+            {
+                // Warm-up dispatch (tracking OFF): installs the persistent handlers + compiles the worker module
+                // so their one-time JSObjects are not in the measured window.
+                using (var warm = accelerator.Allocate1D<int>(count))
+                {
+                    var wk = accelerator.LoadAutoGroupedStreamKernel<Index1D, ArrayView<int>>((i, v) => v[i] = i);
+                    wk((Index1D)count, warm.View);
+                    await accelerator.SynchronizeAsync();
+                }
+
+                // Enable tracking with the Console-safe flag only, then clear for a clean baseline.
+                SpawnDev.BlazorJS.IDisposableTracker.CreatedHandleVerboseMode = false;
+                SpawnDev.BlazorJS.IDisposableTracker.UndisposedHandleVerboseMode = true;
+                SpawnDev.BlazorJS.IDisposableTracker.JSObjectTraces.Clear();
+
+                var k = accelerator.LoadAutoGroupedStreamKernel<Index1D, ArrayView<int>>((i, v) => v[i] = i * 3);
+                for (int r = 0; r < dispatches; r++)
+                {
+                    using var buf = accelerator.Allocate1D<int>(count);
+                    k((Index1D)count, buf.View);
+                    await accelerator.SynchronizeAsync();
+                }
+                // Let any inline TCS continuation unwind so the final handler lambda exits and its `using` disposes.
+                await Task.Yield();
+
+                long aliveMsgEvents = 0;
+                foreach (var t in SpawnDev.BlazorJS.IDisposableTracker.JSObjectTraces.Values)
+                    if (t.Type != null && t.Type.Contains("MessageEvent"))
+                        aliveMsgEvents += t.AliveCount;
+
+                // Fix → ~0 (at most a straggler); bug → dispatches*workerCount (>=160). Bound cleanly separates.
+                const long bound = 8;
+                if (aliveMsgEvents > bound)
+                    throw new Exception(
+                        $"Per-dispatch MessageEvent JSObjects leaked: {aliveMsgEvents} alive after {dispatches} dispatches " +
+                        $"(bound {bound}). WasmAccelerator.EnsurePersistentHandlers must Dispose the MessageEvent/Event " +
+                        $"arg in MsgHandler/ErrHandler on every path (the ML-lane ~1.6 GiB V8-heap leak has regressed).");
+            }
+            finally
+            {
+                SpawnDev.BlazorJS.IDisposableTracker.UndisposedHandleVerboseMode = savedUndisposed;
+                SpawnDev.BlazorJS.IDisposableTracker.CreatedHandleVerboseMode = savedCreated;
+                SpawnDev.BlazorJS.IDisposableTracker.JSObjectTraces.Clear();
+                accelerator.Dispose(); context.Dispose();
+            }
+        }
+
         // Wasm SIMD128 emitter foundation (Phase 1, 2026-06-14, Geordi). Pure-CPU regression guard on
         // the v128 encoding - NO browser/GPU needed, just byte assertions. Locks the part most likely
         // to silently break: SIMD sub-opcodes are u32-LEB128 after the 0xFD prefix (NOT single bytes

SpawnDev.ILGPU/SpawnDev.ILGPU.csproj

Lines changed: 2 additions & 2 deletions
@@ -4,9 +4,9 @@
     <TargetFramework>net10.0</TargetFramework>
     <ImplicitUsings>enable</ImplicitUsings>
     <Nullable>enable</Nullable>
-    <Version>4.12.1-local.9</Version>
+    <Version>4.12.1-local.10</Version>
     <!-- Brief current-version highlights only. Full per-version history with code samples lives in CHANGELOG.md (linked from the README). -->
-    <PackageReleaseNotes>4.12.1: WebGPU cooperative GEMV grid-stride fix; ±inf/NaN scalar kernel params on WebGL+Wasm; AcceleratorRequirements.RequiresScatterStores flag; Wasm process-persistent shared Web Worker pool AND shared linear memory keyed per MaxLinearMemoryPages (default-WorkerCount accelerators share one pool + one WebAssembly.Memory per distinct max per tab, fixing worker-churn starvation and the WebAssembly.Memory-reservation accumulation across long test lanes - at both the default 1 GiB and custom maxes like 2 GiB); Wasm SIMD128 emitter foundation (additive groundwork, scalar path unchanged). Forks stay 2.0.16. Full per-version history with details: CHANGELOG.md at https://github.com/LostBeard/SpawnDev.ILGPU/blob/master/CHANGELOG.md</PackageReleaseNotes>
+    <PackageReleaseNotes>4.12.1: Wasm process-persistent shared worker pool + shared linear memory (per MaxLinearMemoryPages) - fixes worker-churn starvation and WebAssembly.Memory accumulation across long test lanes; Wasm dispatch-response handlers now dispose the per-dispatch MessageEvent/Event JSObject (removes per-dispatch JS-object churn); WebGPU GEMV grid-stride fix; ±inf/NaN scalar kernel params on WebGL+Wasm; AcceleratorRequirements.RequiresScatterStores; Wasm SIMD128 emitter foundation (additive, scalar path unchanged). Forks stay 2.0.16. Full per-version history: CHANGELOG.md at https://github.com/LostBeard/SpawnDev.ILGPU/blob/master/CHANGELOG.md</PackageReleaseNotes>
     <GeneratePackageOnBuild>True</GeneratePackageOnBuild>
     <GenerateDocumentationFile>true</GenerateDocumentationFile>
     <EmbedAllSources>true</EmbedAllSources>

SpawnDev.ILGPU/Wasm/WasmAccelerator.cs

Lines changed: 12 additions & 0 deletions
@@ -1869,6 +1869,15 @@ private void EnsurePersistentHandlers(Worker worker)

             state.MsgHandler = new Action<MessageEvent>((msg) =>
             {
+                // The MessageEvent JSObject is created per worker-response by the SpawnDev.BlazorJS
+                // callback marshaller and is OWNED by this handler: ActionCallback<T1>.Invoke calls
+                // the delegate and does NOT dispose the arg (verified in SpawnDev.BlazorJS/ActionCallback.cs).
+                // Without disposing it, every (dispatch x worker) response pins a MessageEvent (and its
+                // .data graph) in the V8 JS heap until finalization - which JS-heap growth alone never
+                // triggers under a long Wasm dispatch lane -> the late-lane memory-pressure leak. Dispose
+                // on EVERY exit path, including the stray-message early return below. WasmDispatchResponse
+                // is a plain C# DTO (no JSObjects), so it stays valid after msg is disposed.
+                using var _msgScope = msg;
                 var tcs = state.CurrentTcs;
                 if (tcs == null) return; // No in-flight dispatch; ignore late or stray message.
                 state.CurrentTcs = null;
@@ -1898,6 +1907,9 @@ private void EnsurePersistentHandlers(Worker worker)

             state.ErrHandler = new Action<Event>((err) =>
             {
+                // Same ownership rule as MsgHandler: the Event JSObject is handler-owned and must be
+                // disposed on every path (rare - only fires on a worker-level error - but still leaks if not).
+                using var _errScope = err;
                 var tcs = state.CurrentTcs;
                 if (tcs == null) return;
                 state.CurrentTcs = null;

SpawnDev.ILGPU/Wasm/WasmMemoryBuffer.cs

Lines changed: 7 additions & 3 deletions
@@ -4,8 +4,11 @@
 //
 // File: WasmMemoryBuffer.cs
 //
-// Manages GPU memory buffers backed by SharedArrayBuffer regions.
-// Each buffer is a slice of a SharedArrayBuffer for zero-copy sharing across workers.
+// Manages GPU memory buffers, each backed by its OWN SharedArrayBuffer (persistent device storage).
+// Per dispatch the SAB is staged into/out of the shared WebAssembly linear memory on the MAIN thread
+// (copy-IN before, copy-OUT after); workers run against the linear memory, NOT against per-buffer SABs
+// (only the shared linear memory is PostMessage'd to workers). The SharedArrayBuffer backing enables
+// zero-copy host-side reads/writes and SAB-to-SAB (Wasm-to-Wasm) device copies.
 // ---------------------------------------------------------------------------------------

 using global::ILGPU;
@@ -17,7 +20,8 @@
 namespace SpawnDev.ILGPU.Wasm
 {
     /// <summary>
-    /// Wasm memory buffer backed by a SharedArrayBuffer for zero-copy sharing across workers.
+    /// Wasm memory buffer backed by its own SharedArrayBuffer. Staged into/out of the shared
+    /// WebAssembly linear memory per dispatch (workers run against the linear memory, not this SAB).
     /// </summary>
     public class WasmMemoryBuffer : MemoryBuffer, IBrowserMemoryBuffer
     {
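The staging model described in the corrected WasmMemoryBuffer header comment can be sketched as a simplified, single-threaded C# model. This is an illustration under stated assumptions, not the library's implementation: `byte[]` arrays stand in for the buffer's own SharedArrayBuffer and for the shared `WebAssembly.Memory`, `StagingSketch.StageAndDispatch` and `TimesThree` are hypothetical names, the `kernel` delegate stands in for the workers, and copy-out is done eagerly for simplicity (the changelog notes the real copy-out is lazy). The point it shows is that the kernel only ever touches linear memory, while the per-buffer storage is read and written only by the copy-in and copy-out on the calling (main) thread.

```csharp
using System;

public static class StagingSketch
{
    // Hypothetical model: copy-IN the device buffer to linear memory at `offset`, run the kernel
    // against linear memory only, then copy-OUT the slice back into the device buffer.
    public static void StageAndDispatch(
        byte[] deviceBuffer,              // stand-in for the buffer's own SharedArrayBuffer
        byte[] linearMemory,              // stand-in for the shared WebAssembly linear memory
        int offset,
        Action<byte[], int, int> kernel)  // (memory, offset, length): stand-in for the workers
    {
        if (offset < 0 || offset + deviceBuffer.Length > linearMemory.Length)
            throw new ArgumentOutOfRangeException(nameof(offset), "Buffer does not fit in linear memory.");

        // copy-IN (main thread)
        Buffer.BlockCopy(deviceBuffer, 0, linearMemory, offset, deviceBuffer.Length);

        // the "workers" see only linear memory, never deviceBuffer
        kernel(linearMemory, offset, deviceBuffer.Length);

        // copy-OUT (main thread; eager here, lazy in the real backend per the changelog)
        Buffer.BlockCopy(linearMemory, offset, deviceBuffer, 0, deviceBuffer.Length);
    }

    // Example kernel: multiplies each byte by 3 in place (wraps modulo 256 in an unchecked context).
    public static void TimesThree(byte[] memory, int offset, int length)
    {
        for (int i = offset; i < offset + length; i++)
            memory[i] = (byte)(memory[i] * 3);
    }
}
```

For example, staging the 4-byte buffer `{ 1, 2, 3, 100 }` at offset 8 of a 16-byte linear memory with `TimesThree` leaves the buffer holding `{ 3, 6, 9, 44 }` (300 wraps to 44 in a byte).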
