Skip to content

Commit 4eb7cd7

Browse files
ooplesfranklinic
andauthored
test(#1352 + #1353): pin no-side-effects-on-failed-JIT-trace contract (#1387)
* test(#1352 + #1353): pin no-side-effects-on-failed-JIT-trace contract Both #1352 (JIT compiled-replay trace fails inside LayerNorm with "Destination is too short") and #1353 (the failed trace silently mutates trace-time captured tensors via the LayerNorm lazy callback's mean/variance copyback) closed as not-reproducible on AiDotNet 0.204 + AiDotNet.Tensors 0.81.3. Investigation evidence: 1. The textbook repro from #1352 (per-sample-trained Transformer<float> V=256 dModel=128 L=2 ctx=64, with both the issue's no-explicit- optimizer config so Vaswani-Adam + NoamSchedule installs, and a thicker 2-epoch / 512-sample training driver) does NOT throw "Destination is too short" through GetOrCompileInference. Post-JIT logits match baseline byte-for-byte. 2. A focused architectural probe with a CpuEngine subclass counting eager LayerNorm invocations -- two chained LayerNorms inside the forward closure (both survive DCE -- node A has consumer, node B is leaf), closure throws after recording both -- shows delta=0 across scope.Dispose. The lazy callbacks do NOT re-fire during the safety-net Realize() on the partial graph. Root cause of the silent fix: CpuEngine.LayerNorm's GraphMode branch ends with `eagerResult.AsSpan().CopyTo(lazyResult.AsWritableSpan())`. AsWritableSpan() on a tensor with a non-null LazySource auto- materializes the node, setting IsRealized=true and running the callback exactly once at trace time. By the time CompiledModelCache's using- scope Dispose hits the safety-net Realize() on a failed trace, every recorded node short-circuits via `if (IsRealized) return;`. The mutation-on-failure channel both issues describe is closed off as a side effect of this pre-realization pattern (which exists for the ooples/AiDotNet.Tensors#1331-family correctness reasons -- LayerNorm savedState mean/variance refresh). What this PR ships: a single regression-pin test that asserts the no-replay invariant via the spy-engine counter. If a future AiDotNet or AiDotNet.Tensors change moves the eager-copy elsewhere (or otherwise breaks the trace-time pre-realization that closes off the mutation channel), this test fails immediately -- before the regression reaches consumers wrapping GetOrCompileInference in try/catch (the documented JIT fallback pattern). Closes #1352 Closes #1353 * fix(PR #1387 review): explicit spy-count precondition + clarify DCE comment * fix(PR #1387 follow-up): tighten spy precondition from >0 to >=2 to match two-call invariant --------- Co-authored-by: franklinic <franklin@ivorycloud.com>
1 parent df96cde commit 4eb7cd7

1 file changed

Lines changed: 221 additions & 0 deletions

File tree

Lines changed: 221 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,221 @@
1+
using System.Threading;
2+
using System.Threading.Tasks;
3+
using AiDotNet.Tensors.Engines;
4+
using AiDotNet.Tensors.Engines.Compilation;
5+
using AiDotNet.Tensors.LinearAlgebra;
6+
using Xunit;
7+
using Xunit.Abstractions;
8+
9+
namespace AiDotNet.Tests.IntegrationTests.Jit;
10+
11+
/// <summary>
12+
/// Regression pin for the side-effect contract that closed issues #1352 and
13+
/// #1353 (failed JIT trace inside LayerNorm and the associated trace-time
14+
/// mean/variance state-mutation channel). Both issues were closed as
15+
/// not-reproducible on AiDotNet 0.204 + AiDotNet.Tensors 0.81.3 — the
16+
/// textbook repros pass cleanly because the #1331-family shape-tracking
17+
/// fixes neutralized the upstream "Destination is too short" trigger AND
18+
/// because <see cref="CpuEngine"/>'s LayerNorm/RmsNorm/BatchNorm/GroupNorm/
19+
/// InstanceNorm/Dropout lazy callbacks now pre-realize their nodes at trace
20+
/// time via <c>eagerResult.AsSpan().CopyTo(lazyResult.AsWritableSpan())</c>:
21+
/// <c>AsWritableSpan()</c> on a tensor with a non-null <c>LazySource</c>
22+
/// auto-materializes the node, setting <c>IsRealized=true</c> and running
23+
/// the callback exactly once. By the time
24+
/// <see cref="CompiledModelCache{T}.GetOrCompileInference(Tensor{T}, System.Func{Tensor{T}})"/>'s
25+
/// <c>using</c>-scope <c>Dispose</c> hits the safety-net <c>Realize()</c>
26+
/// on a failed trace, every recorded node short-circuits via
27+
/// <c>if (IsRealized) return;</c>.
28+
///
29+
/// <para>
30+
/// The contract this test pins: a <c>forward</c> closure that throws after
31+
/// recording lazy LayerNorm nodes must NOT cause the scope's auto-realize
32+
/// to re-execute those callbacks during dispose. The signal is a spy
33+
/// engine's eager-LayerNorm counter — if the count after dispose exceeds
34+
/// the count at the throw point, a future change has regressed the
35+
/// pre-realization optimization and reopened the #1352/#1353 mutation
36+
/// channel. Without that channel closed, every consumer that wraps
37+
/// <c>GetOrCompileInference</c> in try/catch (the documented JIT fallback
38+
/// pattern) leaks state corruption back into their model when the trace
39+
/// fails.
40+
/// </para>
41+
/// </summary>
42+
[Collection("NonParallelIntegration")]
43+
public class CompiledInferenceLazyCallbackSideEffectRegressionTests
44+
{
45+
private readonly ITestOutputHelper _output;
46+
47+
public CompiledInferenceLazyCallbackSideEffectRegressionTests(ITestOutputHelper output)
48+
{
49+
_output = output;
50+
}
51+
52+
/// <summary>
53+
/// When the <c>forward</c> closure passed to
54+
/// <see cref="CompiledModelCache{T}.GetOrCompileInference(Tensor{T}, System.Func{Tensor{T}})"/>
55+
/// throws after recording two chained lazy LayerNorm nodes, the lazy-
56+
/// graph scope's auto-realize on disposal must not re-execute any of
57+
/// the recorded callbacks. Two chained nodes are recorded so neither
58+
/// can be removed by the graph compiler's
59+
/// <c>DeadCodeEliminationPass</c> (node A has a consumer, node B is a
60+
/// leaf — both survive DCE). A spy engine subclasses
61+
/// <see cref="CpuEngine"/> and counts each invocation of the eager
62+
/// LayerNorm kernel; the count snapshotted at the throw point must
63+
/// equal the count after <see cref="CompiledModelCache{T}"/> has
64+
/// disposed its internal scope and propagated the original exception.
65+
/// </summary>
66+
[Fact(Timeout = 60_000)]
67+
public async Task GetOrCompileInference_ForwardThrowsAfterTwoLayerNorms_DoesNotReplayLazyCallbacksOnDispose()
68+
{
69+
await Task.Yield();
70+
71+
var spy = new LayerNormSpyEngine();
72+
var previousEngine = AiDotNetEngine.Current;
73+
AiDotNetEngine.Current = spy;
74+
try
75+
{
76+
const int B = 2;
77+
const int F = 8;
78+
var input = MakeInput(B, F);
79+
var gamma = MakeGamma(F);
80+
var beta = MakeBeta(F);
81+
82+
int eagerCountAtThrow = -1;
83+
bool twoLayerNormsRecorded = false;
84+
85+
using var cache = new CompiledModelCache<float>();
86+
87+
var thrown = Assert.ThrowsAny<System.Exception>(() =>
88+
cache.GetOrCompileInference(input, () =>
89+
{
90+
// Two chained LayerNorms — neither is eliminated by
91+
// DeadCodeEliminationPass (node A has node B as
92+
// consumer, node B is a graph leaf, and the pass
93+
// keeps both consumers AND leaves). Both stay in
94+
// the realized node list and would BOTH re-fire if
95+
// scope.Dispose's safety-net Realize ran on the
96+
// partial graph.
97+
var ln1 = AiDotNetEngine.Current.LayerNorm(
98+
input, gamma, beta, 1e-5,
99+
out _, out _);
100+
_ = AiDotNetEngine.Current.LayerNorm(
101+
ln1, gamma, beta, 1e-5,
102+
out _, out _);
103+
104+
eagerCountAtThrow = spy.EagerInvocationCount;
105+
twoLayerNormsRecorded = true;
106+
107+
// Force partial-trace failure. The scope's auto-
108+
// realize fires from the using-block's implicit
109+
// finally — see LazyTensorScope.Dispose.
110+
throw new System.InvalidOperationException(
111+
"AIDN-1352-1353 forced-trace-failure sentinel");
112+
}));
113+
114+
Assert.True(
115+
twoLayerNormsRecorded,
116+
"Test precondition failed: lazy LayerNorm ops never ran, " +
117+
"so the scope's auto-realize channel can't be exercised.");
118+
119+
// PR #1387 review C8XnD: also pin that the spy ACTUALLY
120+
// observed `LayerNorm` invocations. Without this guard, a
121+
// future change that stopped routing `AiDotNetEngine.Current`
122+
// through `LayerNormSpyEngine` (e.g. a static-Current-cache
123+
// change, or an engine-binding refactor that captures the
124+
// pre-test engine reference) would leave both counters at 0
125+
// and the delta assertion below would pass vacuously —
126+
// turning this regression pin into a no-op signal.
127+
//
128+
// Follow-up review C9TmK: tightened the threshold from > 0
129+
// to >= 2 — this test makes exactly two user-visible
130+
// LayerNorm calls, so anything less means at least one
131+
// didn't route through the spy. The exact count varies
132+
// (~6 per visible call due to GraphMode-recursive entry +
133+
// AsWritableSpan auto-materialization — see the spy class
134+
// XML doc) so we only assert the lower bound, not the
135+
// precise number.
136+
Assert.True(
137+
eagerCountAtThrow >= 2,
138+
$"Test precondition failed: the spy engine observed " +
139+
$"{eagerCountAtThrow} LayerNorm invocations at the throw " +
140+
"point, but the two user-visible calls should produce at " +
141+
"least 2 hits. The `AiDotNetEngine.Current` override may " +
142+
"not be reaching `LayerNormSpyEngine.LayerNorm` — fix the " +
143+
"spy wiring before trusting the delta check below.");
144+
145+
// The original exception must propagate unmasked. The
146+
// partial-trace path's safety-net Realize can only mask this
147+
// by throwing its own exception during dispose; with the
148+
// pre-realization optimization in place it short-circuits
149+
// every node and exits cleanly.
150+
Assert.IsType<System.InvalidOperationException>(thrown);
151+
Assert.Equal(
152+
"AIDN-1352-1353 forced-trace-failure sentinel",
153+
thrown.Message);
154+
155+
// The decisive regression signal: did scope.Dispose's
156+
// Realize() re-execute any lazy LayerNorm callback?
157+
int eagerCountAfterDispose = spy.EagerInvocationCount;
158+
_output.WriteLine(
159+
$"eager LayerNorm calls: at throw={eagerCountAtThrow}, " +
160+
$"post-Dispose={eagerCountAfterDispose}, " +
161+
$"delta={eagerCountAfterDispose - eagerCountAtThrow}");
162+
Assert.Equal(eagerCountAtThrow, eagerCountAfterDispose);
163+
}
164+
finally
165+
{
166+
AiDotNetEngine.Current = previousEngine;
167+
}
168+
}
169+
170+
/// <summary>
171+
/// <see cref="CpuEngine"/> subclass that counts every invocation of
172+
/// the LayerNorm entry point. Each user-visible LayerNorm under
173+
/// GraphMode currently produces three spy hits (the outer dispatch,
174+
/// the GraphMode-branch recursive eager call after scope is nulled,
175+
/// and the trace-time auto-materialization triggered by
176+
/// <c>AsWritableSpan</c>), so two chained LayerNorms produce six
177+
/// hits at the throw point. The exact factor is implementation-
178+
/// dependent and not what the test asserts on — the assertion is
179+
/// on the DELTA across scope.Dispose, which must remain zero.
180+
/// </summary>
181+
private sealed class LayerNormSpyEngine : CpuEngine
182+
{
183+
private int _eagerCount;
184+
public int EagerInvocationCount => Volatile.Read(ref _eagerCount);
185+
186+
public override Tensor<T> LayerNorm<T>(
187+
Tensor<T> input,
188+
Tensor<T> gamma,
189+
Tensor<T> beta,
190+
double epsilon,
191+
out Tensor<T> mean,
192+
out Tensor<T> variance)
193+
{
194+
Interlocked.Increment(ref _eagerCount);
195+
return base.LayerNorm(input, gamma, beta, epsilon, out mean, out variance);
196+
}
197+
}
198+
199+
private static Tensor<float> MakeInput(int batch, int features)
200+
{
201+
var t = new Tensor<float>(new[] { batch, features });
202+
for (int b = 0; b < batch; b++)
203+
for (int f = 0; f < features; f++)
204+
t[b, f] = (b * 13 + f * 7 + 1) * 0.1f;
205+
return t;
206+
}
207+
208+
private static Tensor<float> MakeGamma(int features)
209+
{
210+
var t = new Tensor<float>(new[] { features });
211+
for (int f = 0; f < features; f++) t[f] = 1.0f;
212+
return t;
213+
}
214+
215+
private static Tensor<float> MakeBeta(int features)
216+
{
217+
var t = new Tensor<float>(new[] { features });
218+
for (int f = 0; f < features; f++) t[f] = 0.0f;
219+
return t;
220+
}
221+
}

0 commit comments

Comments
 (0)