fix(neuralnetworks): restore Predict override bypass for compiled-replay bug by ooples · Pull Request #1167 · ooples/AiDotNet

ooples · 2026-04-20T02:17:32Z

Summary

PR #1155 changed public override Tensor<T> Predict(Tensor<T> input) => Forward(input) to protected override Tensor<T> PredictEager(Tensor<T> input) => Forward(input) in 11 network classes. That reroutes Predict through the compiled-replay path (NeuralNetworkBase.PredictCompiled → CompiledModelHost.Predict), which silently truncates the forward pass for these models.

On a ResNet-18 with [3, 32, 32] input, the compiled plan returns a shape-[64] tensor (first-conv-block's 64 channels) instead of the shape-[10] logits. Diagnostic:

Forward direct shape:        [10]  ✓
Predict (compile OFF) shape: [10]  ✓
Predict (compile ON) shape:  [64]  ✗  regression

This PR restores master's previous behavior — Predict calls Forward directly — in all 11 affected classes.

Root cause

The compiled-replay tracer does not capture shape-conditional control flow in Forward (rank-3 → rank-4 batch promotion, final Reshape, etc.). Fixing that is a larger infrastructure change; this hotfix closes the immediate CI failures so PR #1163 and PR #1165 can merge, and any future PR that merges master.

Once the tracer is hardened, the Predict overrides restored here can be removed.

Affected models

ResNetNetwork, VGGNetwork, EfficientNetNetwork, MobileNetV2Network, ConvolutionalNeuralNetwork, UNet3D, VoxelCNN, FastText, GloVe, Word2Vec, SiameseNeuralNetwork.

Test plan

All 4 ResNetNetworkTests.ResNetNetwork_*_ReturnsCorrectShape tests pass locally on net10.0
All 2 VGGNetworkTests.VGGNetwork_Predict_* tests pass locally on net10.0
CI: the full 08a NN-Classic shard turns green
CI: the ModelFamily shards (Classification, NN, Diffusion A-Z, Regression, Generated Layers) turn green

🤖 Generated with Claude Code

Summary by CodeRabbit

Refactor
- Updated inference across 11 neural-network models to use a single, direct prediction pathway. Predictions now consistently follow the model's direct execution route, bypassing the previous compiled-replay path to address shape-conditional inference differences. This standardizes runtime behavior across models and reduces discrepancies between eager and compiled execution.

vercel · 2026-04-20T02:17:36Z

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments

Project	Deployment	Actions	Updated (UTC)
aidotnet_website	Ignored		Apr 20, 2026 3:03am
aidotnet-playground-api	Ignored	Preview	Apr 20, 2026 3:03am

coderabbitai · 2026-04-20T02:17:54Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ec5c0aae-322b-4249-b972-10d378462571

📥 Commits

Reviewing files that changed from the base of the PR and between 257c5fc and 98b7fcd.

📒 Files selected for processing (11)

src/NeuralNetworks/ConvolutionalNeuralNetwork.cs
src/NeuralNetworks/EfficientNetNetwork.cs
src/NeuralNetworks/FastText.cs
src/NeuralNetworks/GloVe.cs
src/NeuralNetworks/MobileNetV2Network.cs
src/NeuralNetworks/ResNetNetwork.cs
src/NeuralNetworks/SiameseNeuralNetwork.cs
src/NeuralNetworks/UNet3D.cs
src/NeuralNetworks/VGGNetwork.cs
src/NeuralNetworks/VoxelCNN.cs
src/NeuralNetworks/Word2Vec.cs

Walkthrough

Eleven neural-network classes changed their prediction override from protected PredictEager to public Predict, routing all calls to Forward(input) and bypassing the base class's compiled-plan / compiled-replay dispatch due to shape-conditional tracer issues.

Changes

Cohort / File(s)	Summary
Conv/Classic CNNs `src/NeuralNetworks/ConvolutionalNeuralNetwork.cs`, `src/NeuralNetworks/VGGNetwork.cs`, `src/NeuralNetworks/ResNetNetwork.cs`	Replaced `protected override Tensor<T> PredictEager(...)` with `public override Tensor<T> Predict(...) => Forward(input)` to bypass `NeuralNetworkBase<T>.PredictCompiled`; removed/updated XML docs describing compiled-replay routing.
Efficient / Mobile / Voxel / 3D `src/NeuralNetworks/EfficientNetNetwork.cs`, `src/NeuralNetworks/MobileNetV2Network.cs`, `src/NeuralNetworks/VoxelCNN.cs`, `src/NeuralNetworks/UNet3D.cs`	Same override change to force direct `Forward` execution (hotfix comments added citing shape-conditional control flow).
Embedding / NLP models `src/NeuralNetworks/FastText.cs`, `src/NeuralNetworks/GloVe.cs`, `src/NeuralNetworks/Word2Vec.cs`	Changed visibility/override from `PredictEager`→`Predict`, routed inference to `Forward` directly and replaced compiled-path docs with comments about shape mismatches.
Specialty networks `src/NeuralNetworks/SiameseNeuralNetwork.cs`, `src/NeuralNetworks/VGGNetwork.cs`	Overridden public `Predict` to call `Forward`, removing the eager-only hook and bypassing compiled replay. (VGG also included doc cleanup.)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

feat: wire JIT compilation through AiModelBuilder #1142 — Introduced/modified NeuralNetworkBase.Predict / compiled-plan replay behavior that these subclasses now bypass.
feat: complete all AiDotNet-side JIT items for #1015 (6 of 6) #1149 — Adds CompileForward / compiled-dispatch support in base/unet paths; touches the same compiled-inference machinery.
fix: ConditionalGAN, MeshCNN, NEAT custom test setup (#1042) #1055 — Earlier changes to model inference dispatch for related networks; overlaps override sites and behavior.

⚠️ Production Readiness Concerns

BLOCKING: Systematically overriding the public Predict to short-circuit PredictCompiled is a hotfix that raises architectural and API-surface issues that must be addressed before merging:

Subclasses now break the intended base-class dispatch contract by bypassing compiled planning; this is an API/behavioral change and must be intentional and documented in the public API.
There is no remediation of the tracer or compiled-plan machinery; this defers a core infrastructure defect rather than fixing it.
Performance/optimization paths (compiled plans, caching) are permanently disabled for these models without a toggle/fallback, potentially regressing production performance.
The change may silently alter outputs for callers expecting compiled-path behavior (different intermediate shapes); migration guidance/tests are missing.
Any TODOs, stubs, or simplified comments that imply temporary fixes must be flagged and resolved prior to release.

Addressing these requires either: restoring a safe compiled fallback, surfacing a clear opt-out/opt-in, or fixing the tracer so compiled replay is reliable. Treat this as blocking until a design decision and corresponding tests/documentation are added.

Poem

🧩 Models once traced along the replay track,
Now stride straight through Forward — no turning back.
Shapes that fooled the tracer, left plans in a fog,
So Forward stands guard, a pragmatic cog.
⚙️ Correctness first; optimization may jog.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the core change: restoring public Predict overrides to bypass the compiled-replay path for affected neural network classes due to a tracer bug.
Docstring Coverage	✅ Passed	Docstring coverage is 94.12% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/predict-override-restore

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/NeuralNetworks/MobileNetV2Network.cs`:
- Around line 292-298: The override Predict currently bypasses the compiled path
by calling Forward directly, which omits the shape-promotion/squeeze behavior
under compilation; modify MobileNetV2Network<T>.Predict to call the compiled
execution path when compilation is enabled (use
NeuralNetworkBase<T>.PredictCompiled or the existing compiled dispatcher) and
ensure that the compiled path applies the same 3D→4D promotion and final squeeze
as Forward so rank-3 inputs produce a final [numClasses] logits shape; update
the Predict implementation to detect compilation mode, route to PredictCompiled
with the same shape-transform logic, and keep Forward as the uncompiled
reference to avoid regression in compiled-mode tests.

In `@src/NeuralNetworks/UNet3D.cs`:
- Around line 239-246: Remove the duplicate XML <summary> on the Predict method
and replace it with a brief normal code comment explaining the hotfix: that
Predict calls Forward directly to avoid tracer issues observed when routing
through NeuralNetworkBase{T}.PredictCompiled; keep the method signature public
and its implementation Predict(Tensor<T> input) => Forward(input), but move the
rationale into a single-line or block comment (non-XML) so public API docs
remain the single user-facing summary in the earlier XML on the class.

In `@src/NeuralNetworks/VGGNetwork.cs`:
- Around line 342-350: The public XML <summary> on the Predict override
currently exposes tracer/compiled-replay internals; replace that XML doc with a
brief, user-facing summary (or remove it) and move the detailed regression
rationale into a regular non-XML code comment immediately above the Predict
override (e.g., referencing Predict, Forward, and
NeuralNetworkBase{T}.PredictCompiled) so the hotfix behavior (public override
Tensor<T> Predict(...) => Forward(...)) remains but implementation details are
not part of the public API docs.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b69c6eb1-de0e-4504-b7e8-0b52cebcb245

📥 Commits

Reviewing files that changed from the base of the PR and between 4fe35b0 and 257c5fc.

📒 Files selected for processing (11)

src/NeuralNetworks/ConvolutionalNeuralNetwork.cs
src/NeuralNetworks/EfficientNetNetwork.cs
src/NeuralNetworks/FastText.cs
src/NeuralNetworks/GloVe.cs
src/NeuralNetworks/MobileNetV2Network.cs
src/NeuralNetworks/ResNetNetwork.cs
src/NeuralNetworks/SiameseNeuralNetwork.cs
src/NeuralNetworks/UNet3D.cs
src/NeuralNetworks/VGGNetwork.cs
src/NeuralNetworks/VoxelCNN.cs
src/NeuralNetworks/Word2Vec.cs

ooples · 2026-04-20T02:29:47Z

Upstream root-cause issue filed: ooples/AiDotNet.Tensors#228

This PR is a band-aid — the real bug is in CompiledModelCache.GetOrCompileInference + CompiledInferencePlan.Compile (the forward lambda's return is discarded; the plan picks optimizedSteps[last].OutputBuffer as the output and that heuristic fails when the forward ends with a metadata-view op like Reshape on contiguous data).

Once Tensors #228 is fixed and a new Tensors package is bumped here, the 11 public override Tensor<T> Predict(...) => Forward(input); overrides this PR restores should be reverted so the affected models pick up compiled replay again.

Tensors 0.50.1 is the release the open compile-tracer bugs point to; 0.50.0 was the one I initially pinned but it's strictly a prerequisite step. The API change that matters for us: GetOrCompileInference / GetOrCompileTraining accept Func<Tensor<T>> (return = plan output tensor) instead of Action (tail-op inferred). The explicit return fixes the shape-conditional truncation that forced PR #1167's Predict-bypass workaround: the tracer no longer guesses which op produced the plan's output, so rank-3 -> rank-4 promotions and post-forward reshapes no longer get silently dropped. Three call sites updated to return the last tensor explicitly: src/AiModelBuilder.cs:1344 — return nnModel.Predict(input) src/Training/CompiledTapeTrainingStep.cs:139 — return computeLoss(...) src/Training/CompiledTapeTrainingStep.cs:269 — return computeLoss(...) Follow-up (separate PR): revert the 11 Predict-override-restore sites from PR #1167 back to PredictEager now that the tracer actually captures their forward behaviour correctly. Build: dotnet build -f net10.0 + net471 both green, 0 errors.

ooples · 2026-04-20T13:37:00Z

Superseded by #1179 (Tensors 0.50.1 bump). The upstream tracer now captures shape-conditional control flow in Forward (rank-3 → rank-4 batch promotion, trailing Reshape) correctly, which was the root cause this workaround was papering over. No code change needed — the 11 models keep the PredictEager routing from PR #1155, and the compiled-replay path no longer truncates their output.

Closing this PR + deleting the branch.

ooples · 2026-04-20T13:37:25Z

Auto-closed when I reset the branch to master. This is the intended outcome: Tensors 0.50.1 (bumped in #1179) fixes the compiled-replay tracer upstream, so the 11 models keep their PredictEager routing from #1155 and no code change is needed here. Workaround obsolete.

* deps: bump AiDotNet.Tensors 0.46.1 -> 0.50.0 and AiDotNet.Native.OpenBLAS 0.28.0 -> 0.50.0 Unifies the AiDotNet ecosystem on 0.50.0: AiDotNet.Tensors 0.46.1 -> 0.50.0 AiDotNet.Native.OneDNN 0.50.0 (already on 0.50.0) AiDotNet.Native.OpenBLAS 0.28.0 -> 0.50.0 AiDotNet.Native.CLBlast 0.50.0 (already on 0.50.0) Supersedes dependabot PRs #1170 (OpenBLAS) and #1171 (Tensors), which dependabot auto-closed when the branches were rebased onto the current master that already had OneDNN + CLBlast at 0.50.0. Bundling both into one human-authored PR avoids the tamper-detection closure and keeps all four native backends at the same release line. Build: dotnet build -f net10.0 + net471 both green, 0 errors. * deps: Tensors 0.50.0 -> 0.50.1 + adapt to Func<Tensor<T>> trace lambdas Tensors 0.50.1 is the release the open compile-tracer bugs point to; 0.50.0 was the one I initially pinned but it's strictly a prerequisite step. The API change that matters for us: GetOrCompileInference / GetOrCompileTraining accept Func<Tensor<T>> (return = plan output tensor) instead of Action (tail-op inferred). The explicit return fixes the shape-conditional truncation that forced PR #1167's Predict-bypass workaround: the tracer no longer guesses which op produced the plan's output, so rank-3 -> rank-4 promotions and post-forward reshapes no longer get silently dropped. Three call sites updated to return the last tensor explicitly: src/AiModelBuilder.cs:1344 — return nnModel.Predict(input) src/Training/CompiledTapeTrainingStep.cs:139 — return computeLoss(...) src/Training/CompiledTapeTrainingStep.cs:269 — return computeLoss(...) Follow-up (separate PR): revert the 11 Predict-override-restore sites from PR #1167 back to PredictEager now that the tracer actually captures their forward behaviour correctly. Build: dotnet build -f net10.0 + net471 both green, 0 errors. --------- Co-authored-by: franklinic <franklin@ivorycloud.com>

coderabbitai Bot requested changes Apr 20, 2026

View reviewed changes

Comment thread src/NeuralNetworks/MobileNetV2Network.cs Outdated

Comment thread src/NeuralNetworks/UNet3D.cs Outdated

Comment thread src/NeuralNetworks/VGGNetwork.cs Outdated

ooples mentioned this pull request Apr 20, 2026

Compiled inference plan returns wrong output tensor when forward ends in a metadata view (Reshape/Squeeze/etc.) ooples/AiDotNet.Tensors#228

Closed

ooples mentioned this pull request Apr 20, 2026

fix(compilation): propagate caller's returned tensor as final output (closes #228) ooples/AiDotNet.Tensors#229

Merged

9 tasks

coderabbitai Bot approved these changes Apr 20, 2026

View reviewed changes

ooples closed this Apr 20, 2026

ooples force-pushed the fix/predict-override-restore branch from 98b7fcd to 8ffb878 Compare April 20, 2026 13:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(neuralnetworks): restore Predict override bypass for compiled-replay bug#1167

fix(neuralnetworks): restore Predict override bypass for compiled-replay bug#1167
ooples wants to merge 0 commit intomasterfrom
fix/predict-override-restore

ooples commented Apr 20, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

vercel Bot commented Apr 20, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Apr 20, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ooples commented Apr 20, 2026

Uh oh!

ooples commented Apr 20, 2026

Uh oh!

ooples commented Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

ooples commented Apr 20, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root cause

Affected models

Test plan

Summary by CodeRabbit

Uh oh!

vercel Bot commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

⚠️ Production Readiness Concerns

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ooples commented Apr 20, 2026

Uh oh!

ooples commented Apr 20, 2026

Uh oh!

ooples commented Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ooples commented Apr 20, 2026 •

edited by coderabbitai Bot

Loading

vercel Bot commented Apr 20, 2026 •

edited

Loading

coderabbitai Bot commented Apr 20, 2026 •

edited

Loading