Commit 900134c
When CompiledTapeTrainingStep.TryStepWithFusedOptimizer caught an
exception (anything thrown by plan.Step or ConfigureOptimizer), it
logged to Trace and returned false. The caller (NeuralNetworkBase) saw
ran=false plus _fusedTrainingCommitted=true and threw a generic
InvalidOperationException listing 'common causes' but NOT the actual
exception text. Tests failing on this path got an opaque error;
debugging required reproducing the failure locally just to see the
Trace output.
## Fix
CompiledTapeTrainingStep now stashes the caught exception in a
[ThreadStatic] field on the catch path, cleared on entry to each call.
NeuralNetworkBase's fused-committed throw at line 6240 reads it back
via GetLastFallbackException() and quotes the type+message inline,
plus attaches the underlying exception as innerException so callers
that introspect ex.InnerException get the full stack.
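
The stash-and-rethrow pattern described above can be sketched roughly as follows. This is a minimal illustration, not the committed code: `FusedStepSketch`, `CallerSketch`, `TryStep`, `Train`, and the message wording are hypothetical stand-ins, and only `GetLastFallbackException()`, the `[ThreadStatic]` field, and the clear-on-entry behaviour are taken from the description.

```csharp
using System;

// Hypothetical stand-in for CompiledTapeTrainingStep; only the stash pattern is shown.
public static class FusedStepSketch
{
    // One slot per thread, so parallel test runs do not see each other's failures.
    [ThreadStatic]
    private static Exception _lastFallbackException;

    public static Exception GetLastFallbackException()
    {
        return _lastFallbackException;
    }

    public static bool TryStep(Action step)
    {
        _lastFallbackException = null; // cleared on entry to each call
        try
        {
            step();
            return true;
        }
        catch (Exception ex)
        {
            _lastFallbackException = ex; // stash before reporting failure
            return false;
        }
    }
}

// Hypothetical stand-in for the caller's fused-committed throw.
public static class CallerSketch
{
    public static void Train(Action step)
    {
        if (FusedStepSketch.TryStep(step))
        {
            return;
        }

        Exception cause = FusedStepSketch.GetLastFallbackException();
        string detail = cause == null
            ? "no exception was captured"
            : cause.GetType().Name + ": " + cause.Message;

        // Quote type+message inline and keep the original as InnerException.
        throw new InvalidOperationException(
            "Fused training step failed (" + detail + ").", cause);
    }
}
```

A thread-static slot is only meaningful when it is read on the same thread that ran the failing call, and before the next call on that thread clears it, which is why the sketch reads it immediately after `TryStep` returns false.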
Three concrete exceptions observed under parallel test load (traced
during the AiDotNet#1395 investigation) now surface inline instead of
being hidden:
- InvalidOperationException 'Parameter N has a layout that does not
expose a live CPU backing array' (ConfigureOptimizerDouble)
- ArgumentException 'gradOutput shape [...] must be [...]' (kernel
backward shape mismatch)
- InvalidOperationException 'Lazy tensor produced NaN at index 0'
(DifferentiableNeuralComputer numerical guard)
Paired with the AiDotNet.Tensors-side commit
"fix(AiDotNet#1395): drop IsDeterministicMode from CompiledModelCache
shape-key", which removes the cross-test cache-key drift that caused
the throw in the first place.
Build clean on net10.0 + net471.
Co-authored-by: franklinic <franklin@ivorycloud.com>
1 parent befe892 commit 900134c
2 files changed
Lines changed: 57 additions & 2 deletions
| File | Original lines | New lines | Added | Removed |
|---|---|---|---|---|
| 1 | 6262-6281 | 6262-6300 | 21 | 2 |
| 2 | 102-107 | 102-131 | 24 | 0 |
| 2 | 282-287 | 306-316 | 5 | 0 |
| 2 | 505-510 | 534-546 | 7 | 0 |