test(#1352 + #1353): pin no-side-effects-on-failed-JIT-trace contract (#1387)

ooples · franklinic · web-flow · commit 4eb7cd74f3b1 · 2026-05-19T15:05:15.000-04:00
* test(#1352 + #1353): pin no-side-effects-on-failed-JIT-trace contract

Both #1352 (JIT compiled-replay trace fails inside LayerNorm with "Destination is too short") and #1353 (the failed trace silently mutates trace-time captured tensors via the LayerNorm lazy callback's mean/variance copyback) closed as not-reproducible on AiDotNet 0.204 + AiDotNet.Tensors 0.81.3.

Investigation evidence:

1. The textbook repro from #1352 (per-sample-trained Transformer<float> V=256 dModel=128 L=2 ctx=64, with both the issue's no-explicit-optimizer config so Vaswani-Adam + NoamSchedule installs, and a thicker 2-epoch / 512-sample training driver) does NOT throw "Destination is too short" through GetOrCompileInference. Post-JIT logits match baseline byte-for-byte.

2. A focused architectural probe with a CpuEngine subclass counting eager LayerNorm invocations -- two chained LayerNorms inside the forward closure (both survive DCE -- node A has consumer, node B is leaf), closure throws after recording both -- shows delta=0 across scope.Dispose. The lazy callbacks do NOT re-fire during the safety-net Realize() on the partial graph.

Root cause of the silent fix: CpuEngine.LayerNorm's GraphMode branch ends with `eagerResult.AsSpan().CopyTo(lazyResult.AsWritableSpan())`. AsWritableSpan() on a tensor with a non-null LazySource auto-materializes the node, setting IsRealized=true and running the callback exactly once at trace time. By the time CompiledModelCache's using-scope Dispose hits the safety-net Realize() on a failed trace, every recorded node short-circuits via `if (IsRealized) return;`. The mutation-on-failure channel both issues describe is closed off as a side effect of this pre-realization pattern (which exists for the ooples/AiDotNet.Tensors#1331-family correctness reasons -- LayerNorm savedState mean/variance refresh).

What this PR ships: a single regression-pin test that asserts the no-replay invariant via the spy-engine counter. If a future AiDotNet or AiDotNet.Tensors change moves the eager-copy elsewhere (or otherwise breaks the trace-time pre-realization that closes off the mutation channel), this test fails immediately -- before the regression reaches consumers wrapping GetOrCompileInference in try/catch (the documented JIT fallback pattern).

Closes #1352
Closes #1353

* fix(PR #1387 review): explicit spy-count precondition + clarify DCE comment

* fix(PR #1387 follow-up): tighten spy precondition from >0 to >=2 to match two-call invariant

---------

Co-authored-by: franklinic <franklin@ivorycloud.com>
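
Illustrative note: the pre-realization argument in the commit message can be sketched with a toy model. Everything below is hypothetical and self-contained (`ToyLazyNode`, `ToyLazyScope`, and `ToyDemo` are stand-ins invented for this sketch, not AiDotNet or AiDotNet.Tensors types); it only assumes the two behaviours the message describes: `Realize()` short-circuits on `IsRealized`, and the scope's `Dispose` runs a safety-net `Realize()` over every recorded node.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a lazy graph node (not the real library type).
sealed class ToyLazyNode
{
    private readonly Action _callback;
    public bool IsRealized { get; private set; }

    public ToyLazyNode(Action callback) => _callback = callback;

    public void Realize()
    {
        if (IsRealized) return; // the short-circuit quoted in the commit message
        IsRealized = true;
        _callback();
    }
}

// Hypothetical stand-in for the lazy scope: Dispose realizes every recorded node.
sealed class ToyLazyScope : IDisposable
{
    private readonly List<ToyLazyNode> _nodes = new List<ToyLazyNode>();

    public ToyLazyNode Record(Action callback, bool preRealize)
    {
        var node = new ToyLazyNode(callback);
        _nodes.Add(node);
        // Models the trace-time materialization the message attributes to
        // AsWritableSpan(): the callback runs once, while recording.
        if (preRealize) node.Realize();
        return node;
    }

    public void Dispose()
    {
        foreach (var node in _nodes) node.Realize(); // safety-net realize
    }
}

static class ToyDemo
{
    // Returns how many callbacks fired during Dispose after a failed "trace".
    public static int CallbacksFiredDuringDispose(bool preRealize)
    {
        int count = 0;
        int countAtThrow = -1;
        try
        {
            using (var scope = new ToyLazyScope())
            {
                scope.Record(() => count++, preRealize);
                scope.Record(() => count++, preRealize);
                countAtThrow = count;
                throw new InvalidOperationException("forced trace failure");
            }
        }
        catch (InvalidOperationException)
        {
            // The using-block's finally (Dispose) has already run here.
        }
        return count - countAtThrow;
    }

    static void Main()
    {
        Console.WriteLine(CallbacksFiredDuringDispose(preRealize: true));  // prints 0
        Console.WriteLine(CallbacksFiredDuringDispose(preRealize: false)); // prints 2
    }
}
```

In this sketch the `preRealize: false` case models the regression the pinned test is meant to catch: callbacks first fire during `Dispose`, after the closure has already thrown, so the delta across dispose is nonzero.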
diff --git a/tests/AiDotNet.Tests/IntegrationTests/Jit/CompiledInferenceLazyCallbackSideEffectRegressionTests.cs b/tests/AiDotNet.Tests/IntegrationTests/Jit/CompiledInferenceLazyCallbackSideEffectRegressionTests.cs
@@ -0,0 +1,221 @@
+using System.Threading;
+using System.Threading.Tasks;
+using AiDotNet.Tensors.Engines;
+using AiDotNet.Tensors.Engines.Compilation;
+using AiDotNet.Tensors.LinearAlgebra;
+using Xunit;
+using Xunit.Abstractions;
+
+namespace AiDotNet.Tests.IntegrationTests.Jit;
+
+/// <summary>
+/// Regression pin for the side-effect contract that closed issues #1352 and
+/// #1353 (failed JIT trace inside LayerNorm and the associated trace-time
+/// mean/variance state-mutation channel). Both issues were closed as
+/// not-reproducible on AiDotNet 0.204 + AiDotNet.Tensors 0.81.3 — the
+/// textbook repros pass cleanly because the #1331-family shape-tracking
+/// fixes neutralized the upstream "Destination is too short" trigger AND
+/// because <see cref="CpuEngine"/>'s LayerNorm/RmsNorm/BatchNorm/GroupNorm/
+/// InstanceNorm/Dropout lazy callbacks now pre-realize their nodes at trace
+/// time via <c>eagerResult.AsSpan().CopyTo(lazyResult.AsWritableSpan())</c>:
+/// <c>AsWritableSpan()</c> on a tensor with a non-null <c>LazySource</c>
+/// auto-materializes the node, setting <c>IsRealized=true</c> and running
+/// the callback exactly once. By the time
+/// <see cref="CompiledModelCache{T}.GetOrCompileInference(Tensor{T}, System.Func{Tensor{T}})"/>'s
+/// <c>using</c>-scope <c>Dispose</c> hits the safety-net <c>Realize()</c>
+/// on a failed trace, every recorded node short-circuits via
+/// <c>if (IsRealized) return;</c>.
+///
+/// <para>
+/// The contract this test pins: a <c>forward</c> closure that throws after
+/// recording lazy LayerNorm nodes must NOT cause the scope's auto-realize
+/// to re-execute those callbacks during dispose. The signal is a spy
+/// engine's eager-LayerNorm counter — if the count after dispose exceeds
+/// the count at the throw point, a future change has regressed the
+/// pre-realization optimization and reopened the #1352/#1353 mutation
+/// channel. Without that channel closed, every consumer that wraps
+/// <c>GetOrCompileInference</c> in try/catch (the documented JIT fallback
+/// pattern) leaks state corruption back into their model when the trace
+/// fails.
+/// </para>
+/// </summary>
+[Collection("NonParallelIntegration")]
+public class CompiledInferenceLazyCallbackSideEffectRegressionTests
+{
+    private readonly ITestOutputHelper _output;
+
+    public CompiledInferenceLazyCallbackSideEffectRegressionTests(ITestOutputHelper output)
+    {
+        _output = output;
+    }
+
+    /// <summary>
+    /// When the <c>forward</c> closure passed to
+    /// <see cref="CompiledModelCache{T}.GetOrCompileInference(Tensor{T}, System.Func{Tensor{T}})"/>
+    /// throws after recording two chained lazy LayerNorm nodes, the lazy-
+    /// graph scope's auto-realize on disposal must not re-execute any of
+    /// the recorded callbacks. Two chained nodes are recorded so neither
+    /// can be removed by the graph compiler's
+    /// <c>DeadCodeEliminationPass</c> (node A has a consumer, node B is a
+    /// leaf — both survive DCE). A spy engine subclasses
+    /// <see cref="CpuEngine"/> and counts each invocation of the eager
+    /// LayerNorm kernel; the count snapshotted at the throw point must
+    /// equal the count after <see cref="CompiledModelCache{T}"/> has
+    /// disposed its internal scope and propagated the original exception.
+    /// </summary>
+    [Fact(Timeout = 60_000)]
+    public async Task GetOrCompileInference_ForwardThrowsAfterTwoLayerNorms_DoesNotReplayLazyCallbacksOnDispose()
+    {
+        await Task.Yield();
+
+        var spy = new LayerNormSpyEngine();
+        var previousEngine = AiDotNetEngine.Current;
+        AiDotNetEngine.Current = spy;
+        try
+        {
+            const int B = 2;
+            const int F = 8;
+            var input = MakeInput(B, F);
+            var gamma = MakeGamma(F);
+            var beta = MakeBeta(F);
+
+            int eagerCountAtThrow = -1;
+            bool twoLayerNormsRecorded = false;
+
+            using var cache = new CompiledModelCache<float>();
+
+            var thrown = Assert.ThrowsAny<System.Exception>(() =>
+                cache.GetOrCompileInference(input, () =>
+                {
+                    // Two chained LayerNorms — neither is eliminated by
+                    // DeadCodeEliminationPass (node A has node B as
+                    // consumer, node B is a graph leaf, and the pass
+                    // keeps both consumers AND leaves). Both stay in
+                    // the realized node list and would BOTH re-fire if
+                    // scope.Dispose's safety-net Realize ran on the
+                    // partial graph.
+                    var ln1 = AiDotNetEngine.Current.LayerNorm(
+                        input, gamma, beta, 1e-5,
+                        out _, out _);
+                    _ = AiDotNetEngine.Current.LayerNorm(
+                        ln1, gamma, beta, 1e-5,
+                        out _, out _);
+
+                    eagerCountAtThrow = spy.EagerInvocationCount;
+                    twoLayerNormsRecorded = true;
+
+                    // Force partial-trace failure. The scope's auto-
+                    // realize fires from the using-block's implicit
+                    // finally — see LazyTensorScope.Dispose.
+                    throw new System.InvalidOperationException(
+                        "AIDN-1352-1353 forced-trace-failure sentinel");
+                }));
+
+            Assert.True(
+                twoLayerNormsRecorded,
+                "Test precondition failed: lazy LayerNorm ops never ran, " +
+                "so the scope's auto-realize channel can't be exercised.");
+
+            // PR #1387 review C8XnD: also pin that the spy ACTUALLY
+            // observed `LayerNorm` invocations. Without this guard, a
+            // future change that stopped routing `AiDotNetEngine.Current`
+            // through `LayerNormSpyEngine` (e.g. a static-Current-cache
+            // change, or an engine-binding refactor that captures the
+            // pre-test engine reference) would leave both counters at 0
+            // and the delta assertion below would pass vacuously —
+            // turning this regression pin into a no-op signal.
+            //
+            // Follow-up review C9TmK: tightened the threshold from > 0
+            // to >= 2 — this test makes exactly two user-visible
+            // LayerNorm calls, so anything less means at least one
+            // didn't route through the spy. The exact count varies
+            // (~3 per visible call, ~6 in total, due to GraphMode-
+            // recursive entry + AsWritableSpan auto-materialization; see
+            // the spy class XML doc) so we only assert the lower bound,
+            // not the precise number.
+            Assert.True(
+                eagerCountAtThrow >= 2,
+                $"Test precondition failed: the spy engine observed " +
+                $"{eagerCountAtThrow} LayerNorm invocations at the throw " +
+                "point, but the two user-visible calls should produce at " +
+                "least 2 hits. The `AiDotNetEngine.Current` override may " +
+                "not be reaching `LayerNormSpyEngine.LayerNorm` — fix the " +
+                "spy wiring before trusting the delta check below.");
+
+            // The original exception must propagate unmasked. The
+            // partial-trace path's safety-net Realize can only mask this
+            // by throwing its own exception during dispose; with the
+            // pre-realization optimization in place it short-circuits
+            // every node and exits cleanly.
+            Assert.IsType<System.InvalidOperationException>(thrown);
+            Assert.Equal(
+                "AIDN-1352-1353 forced-trace-failure sentinel",
+                thrown.Message);
+
+            // The decisive regression signal: did scope.Dispose's
+            // Realize() re-execute any lazy LayerNorm callback?
+            int eagerCountAfterDispose = spy.EagerInvocationCount;
+            _output.WriteLine(
+                $"eager LayerNorm calls: at throw={eagerCountAtThrow}, " +
+                $"post-Dispose={eagerCountAfterDispose}, " +
+                $"delta={eagerCountAfterDispose - eagerCountAtThrow}");
+            Assert.Equal(eagerCountAtThrow, eagerCountAfterDispose);
+        }
+        finally
+        {
+            AiDotNetEngine.Current = previousEngine;
+        }
+    }
+
+    /// <summary>
+    /// <see cref="CpuEngine"/> subclass that counts every invocation of
+    /// the LayerNorm entry point. Each user-visible LayerNorm under
+    /// GraphMode currently produces three spy hits (the outer dispatch,
+    /// the GraphMode-branch recursive eager call after scope is nulled,
+    /// and the trace-time auto-materialization triggered by
+    /// <c>AsWritableSpan</c>), so two chained LayerNorms produce six
+    /// hits at the throw point. The exact factor is implementation-
+    /// dependent and not what the test asserts on — the assertion is
+    /// on the DELTA across scope.Dispose, which must remain zero.
+    /// </summary>
+    private sealed class LayerNormSpyEngine : CpuEngine
+    {
+        private int _eagerCount;
+        public int EagerInvocationCount => Volatile.Read(ref _eagerCount);
+
+        public override Tensor<T> LayerNorm<T>(
+            Tensor<T> input,
+            Tensor<T> gamma,
+            Tensor<T> beta,
+            double epsilon,
+            out Tensor<T> mean,
+            out Tensor<T> variance)
+        {
+            Interlocked.Increment(ref _eagerCount);
+            return base.LayerNorm(input, gamma, beta, epsilon, out mean, out variance);
+        }
+    }
+
+    private static Tensor<float> MakeInput(int batch, int features)
+    {
+        var t = new Tensor<float>(new[] { batch, features });
+        for (int b = 0; b < batch; b++)
+            for (int f = 0; f < features; f++)
+                t[b, f] = (b * 13 + f * 7 + 1) * 0.1f;
+        return t;
+    }
+
+    private static Tensor<float> MakeGamma(int features)
+    {
+        var t = new Tensor<float>(new[] { features });
+        for (int f = 0; f < features; f++) t[f] = 1.0f;
+        return t;
+    }
+
+    private static Tensor<float> MakeBeta(int features)
+    {
+        var t = new Tensor<float>(new[] { features });
+        for (int f = 0; f < features; f++) t[f] = 0.0f;
+        return t;
+    }
+}