Skip to content

fix(neuralnetworks): restore Predict override bypass for compiled-replay bug#1167

Closed
ooples wants to merge 0 commit intomasterfrom
fix/predict-override-restore
Closed

fix(neuralnetworks): restore Predict override bypass for compiled-replay bug#1167
ooples wants to merge 0 commit intomasterfrom
fix/predict-override-restore

Conversation

@ooples
Copy link
Copy Markdown
Owner

@ooples ooples commented Apr 20, 2026

Summary

PR #1155 changed public override Tensor<T> Predict(Tensor<T> input) => Forward(input) to protected override Tensor<T> PredictEager(Tensor<T> input) => Forward(input) in 11 network classes. That reroutes Predict through the compiled-replay path (NeuralNetworkBase.PredictCompiledCompiledModelHost.Predict), which silently truncates the forward pass for these models.

On a ResNet-18 with [3, 32, 32] input, the compiled plan returns a shape-[64] tensor (first-conv-block's 64 channels) instead of the shape-[10] logits. Diagnostic:

Forward direct shape:        [10]  ✓
Predict (compile OFF) shape: [10]  ✓
Predict (compile ON) shape:  [64]  ✗  regression

This PR restores master's previous behavior — Predict calls Forward directly — in all 11 affected classes.

Root cause

The compiled-replay tracer does not capture shape-conditional control flow in Forward (rank-3 → rank-4 batch promotion, final Reshape, etc.). Fixing that is a larger infrastructure change; this hotfix closes the immediate CI failures so PR #1163 and PR #1165 can merge, and any future PR that merges master.

Once the tracer is hardened, the Predict overrides restored here can be removed.

Affected models

ResNetNetwork, VGGNetwork, EfficientNetNetwork, MobileNetV2Network, ConvolutionalNeuralNetwork, UNet3D, VoxelCNN, FastText, GloVe, Word2Vec, SiameseNeuralNetwork.

Test plan

  • All 4 ResNetNetworkTests.ResNetNetwork_*_ReturnsCorrectShape tests pass locally on net10.0
  • All 2 VGGNetworkTests.VGGNetwork_Predict_* tests pass locally on net10.0
  • CI: the full 08a NN-Classic shard turns green
  • CI: the ModelFamily shards (Classification, NN, Diffusion A-Z, Regression, Generated Layers) turn green

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Refactor
    • Updated inference across 11 neural-network models to use a single, direct prediction pathway. Predictions now consistently follow the model's direct execution route, bypassing the previous compiled-replay path to address shape-conditional inference differences. This standardizes runtime behavior across models and reduces discrepancies between eager and compiled execution.

@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 20, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments
Project Deployment Actions Updated (UTC)
aidotnet_website Ignored Ignored Apr 20, 2026 3:03am
aidotnet-playground-api Ignored Ignored Preview Apr 20, 2026 3:03am

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ec5c0aae-322b-4249-b972-10d378462571

📥 Commits

Reviewing files that changed from the base of the PR and between 257c5fc and 98b7fcd.

📒 Files selected for processing (11)
  • src/NeuralNetworks/ConvolutionalNeuralNetwork.cs
  • src/NeuralNetworks/EfficientNetNetwork.cs
  • src/NeuralNetworks/FastText.cs
  • src/NeuralNetworks/GloVe.cs
  • src/NeuralNetworks/MobileNetV2Network.cs
  • src/NeuralNetworks/ResNetNetwork.cs
  • src/NeuralNetworks/SiameseNeuralNetwork.cs
  • src/NeuralNetworks/UNet3D.cs
  • src/NeuralNetworks/VGGNetwork.cs
  • src/NeuralNetworks/VoxelCNN.cs
  • src/NeuralNetworks/Word2Vec.cs

Walkthrough

Eleven neural-network classes changed their prediction override from protected PredictEager to public Predict, routing all calls to Forward(input) and bypassing the base class's compiled-plan / compiled-replay dispatch due to shape-conditional tracer issues.

Changes

Cohort / File(s) Summary
Conv/Classic CNNs
src/NeuralNetworks/ConvolutionalNeuralNetwork.cs, src/NeuralNetworks/VGGNetwork.cs, src/NeuralNetworks/ResNetNetwork.cs
Replaced protected override Tensor<T> PredictEager(...) with public override Tensor<T> Predict(...) => Forward(input) to bypass NeuralNetworkBase<T>.PredictCompiled; removed/updated XML docs describing compiled-replay routing.
Efficient / Mobile / Voxel / 3D
src/NeuralNetworks/EfficientNetNetwork.cs, src/NeuralNetworks/MobileNetV2Network.cs, src/NeuralNetworks/VoxelCNN.cs, src/NeuralNetworks/UNet3D.cs
Same override change to force direct Forward execution (hotfix comments added citing shape-conditional control flow).
Embedding / NLP models
src/NeuralNetworks/FastText.cs, src/NeuralNetworks/GloVe.cs, src/NeuralNetworks/Word2Vec.cs
Changed visibility/override from PredictEagerPredict, routed inference to Forward directly and replaced compiled-path docs with comments about shape mismatches.
Specialty networks
src/NeuralNetworks/SiameseNeuralNetwork.cs, src/NeuralNetworks/VGGNetwork.cs
Overridden public Predict to call Forward, removing the eager-only hook and bypassing compiled replay. (VGG also included doc cleanup.)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

⚠️ Production Readiness Concerns

BLOCKING: Systematically overriding the public Predict to short-circuit PredictCompiled is a hotfix that raises architectural and API-surface issues that must be addressed before merging:

  • Subclasses now break the intended base-class dispatch contract by bypassing compiled planning; this is an API/behavioral change and must be intentional and documented in the public API.
  • There is no remediation of the tracer or compiled-plan machinery; this defers a core infrastructure defect rather than fixing it.
  • Performance/optimization paths (compiled plans, caching) are permanently disabled for these models without a toggle/fallback, potentially regressing production performance.
  • The change may silently alter outputs for callers expecting compiled-path behavior (different intermediate shapes); migration guidance/tests are missing.
  • Any TODOs, stubs, or simplified comments that imply temporary fixes must be flagged and resolved prior to release.

Addressing these requires either: restoring a safe compiled fallback, surfacing a clear opt-out/opt-in, or fixing the tracer so compiled replay is reliable. Treat this as blocking until a design decision and corresponding tests/documentation are added.

Poem

🧩 Models once traced along the replay track,
Now stride straight through Forward — no turning back.
Shapes that fooled the tracer, left plans in a fog,
So Forward stands guard, a pragmatic cog.
⚙️ Correctness first; optimization may jog.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the core change: restoring public Predict overrides to bypass the compiled-replay path for affected neural network classes due to a tracer bug.
Docstring Coverage ✅ Passed Docstring coverage is 94.12% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/predict-override-restore

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/NeuralNetworks/MobileNetV2Network.cs`:
- Around line 292-298: The override Predict currently bypasses the compiled path
by calling Forward directly, which omits the shape-promotion/squeeze behavior
under compilation; modify MobileNetV2Network<T>.Predict to call the compiled
execution path when compilation is enabled (use
NeuralNetworkBase<T>.PredictCompiled or the existing compiled dispatcher) and
ensure that the compiled path applies the same 3D→4D promotion and final squeeze
as Forward so rank-3 inputs produce a final [numClasses] logits shape; update
the Predict implementation to detect compilation mode, route to PredictCompiled
with the same shape-transform logic, and keep Forward as the uncompiled
reference to avoid regression in compiled-mode tests.

In `@src/NeuralNetworks/UNet3D.cs`:
- Around line 239-246: Remove the duplicate XML <summary> on the Predict method
and replace it with a brief normal code comment explaining the hotfix: that
Predict calls Forward directly to avoid tracer issues observed when routing
through NeuralNetworkBase{T}.PredictCompiled; keep the method signature public
and its implementation Predict(Tensor<T> input) => Forward(input), but move the
rationale into a single-line or block comment (non-XML) so public API docs
remain the single user-facing summary in the earlier XML on the class.

In `@src/NeuralNetworks/VGGNetwork.cs`:
- Around line 342-350: The public XML <summary> on the Predict override
currently exposes tracer/compiled-replay internals; replace that XML doc with a
brief, user-facing summary (or remove it) and move the detailed regression
rationale into a regular non-XML code comment immediately above the Predict
override (e.g., referencing Predict, Forward, and
NeuralNetworkBase{T}.PredictCompiled) so the hotfix behavior (public override
Tensor<T> Predict(...) => Forward(...)) remains but implementation details are
not part of the public API docs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b69c6eb1-de0e-4504-b7e8-0b52cebcb245

📥 Commits

Reviewing files that changed from the base of the PR and between 4fe35b0 and 257c5fc.

📒 Files selected for processing (11)
  • src/NeuralNetworks/ConvolutionalNeuralNetwork.cs
  • src/NeuralNetworks/EfficientNetNetwork.cs
  • src/NeuralNetworks/FastText.cs
  • src/NeuralNetworks/GloVe.cs
  • src/NeuralNetworks/MobileNetV2Network.cs
  • src/NeuralNetworks/ResNetNetwork.cs
  • src/NeuralNetworks/SiameseNeuralNetwork.cs
  • src/NeuralNetworks/UNet3D.cs
  • src/NeuralNetworks/VGGNetwork.cs
  • src/NeuralNetworks/VoxelCNN.cs
  • src/NeuralNetworks/Word2Vec.cs

Comment thread src/NeuralNetworks/MobileNetV2Network.cs Outdated
Comment thread src/NeuralNetworks/UNet3D.cs Outdated
Comment thread src/NeuralNetworks/VGGNetwork.cs Outdated
@ooples
Copy link
Copy Markdown
Owner Author

ooples commented Apr 20, 2026

Upstream root-cause issue filed: ooples/AiDotNet.Tensors#228

This PR is a band-aid — the real bug is in CompiledModelCache.GetOrCompileInference + CompiledInferencePlan.Compile (the forward lambda's return is discarded; the plan picks optimizedSteps[last].OutputBuffer as the output and that heuristic fails when the forward ends with a metadata-view op like Reshape on contiguous data).

Once Tensors #228 is fixed and a new Tensors package is bumped here, the 11 public override Tensor<T> Predict(...) => Forward(input); overrides this PR restores should be reverted so the affected models pick up compiled replay again.

ooples pushed a commit that referenced this pull request Apr 20, 2026
Tensors 0.50.1 is the release the open compile-tracer bugs point to;
0.50.0 was the one I initially pinned but it's strictly a prerequisite
step. The API change that matters for us:

  GetOrCompileInference / GetOrCompileTraining accept Func<Tensor<T>>
  (return = plan output tensor) instead of Action (tail-op inferred).

The explicit return fixes the shape-conditional truncation that forced
PR #1167's Predict-bypass workaround: the tracer no longer guesses
which op produced the plan's output, so rank-3 -> rank-4 promotions
and post-forward reshapes no longer get silently dropped.

Three call sites updated to return the last tensor explicitly:
  src/AiModelBuilder.cs:1344           — return nnModel.Predict(input)
  src/Training/CompiledTapeTrainingStep.cs:139  — return computeLoss(...)
  src/Training/CompiledTapeTrainingStep.cs:269  — return computeLoss(...)

Follow-up (separate PR): revert the 11 Predict-override-restore sites
from PR #1167 back to PredictEager now that the tracer actually
captures their forward behaviour correctly.

Build: dotnet build -f net10.0 + net471 both green, 0 errors.
@ooples
Copy link
Copy Markdown
Owner Author

ooples commented Apr 20, 2026

Superseded by #1179 (Tensors 0.50.1 bump). The upstream tracer now captures shape-conditional control flow in Forward (rank-3 → rank-4 batch promotion, trailing Reshape) correctly, which was the root cause this workaround was papering over. No code change needed — the 11 models keep the PredictEager routing from PR #1155, and the compiled-replay path no longer truncates their output.

Closing this PR + deleting the branch.

@ooples ooples closed this Apr 20, 2026
@ooples ooples force-pushed the fix/predict-override-restore branch from 98b7fcd to 8ffb878 Compare April 20, 2026 13:37
@ooples
Copy link
Copy Markdown
Owner Author

ooples commented Apr 20, 2026

Auto-closed when I reset the branch to master. This is the intended outcome: Tensors 0.50.1 (bumped in #1179) fixes the compiled-replay tracer upstream, so the 11 models keep their PredictEager routing from #1155 and no code change is needed here. Workaround obsolete.

ooples added a commit that referenced this pull request Apr 21, 2026
* deps: bump AiDotNet.Tensors 0.46.1 -> 0.50.0 and AiDotNet.Native.OpenBLAS 0.28.0 -> 0.50.0

Unifies the AiDotNet ecosystem on 0.50.0:
  AiDotNet.Tensors              0.46.1 -> 0.50.0
  AiDotNet.Native.OneDNN        0.50.0 (already on 0.50.0)
  AiDotNet.Native.OpenBLAS      0.28.0 -> 0.50.0
  AiDotNet.Native.CLBlast       0.50.0 (already on 0.50.0)

Supersedes dependabot PRs #1170 (OpenBLAS) and #1171 (Tensors), which
dependabot auto-closed when the branches were rebased onto the current
master that already had OneDNN + CLBlast at 0.50.0. Bundling both into
one human-authored PR avoids the tamper-detection closure and keeps
all four native backends at the same release line.

Build: dotnet build -f net10.0 + net471 both green, 0 errors.

* deps: Tensors 0.50.0 -> 0.50.1 + adapt to Func<Tensor<T>> trace lambdas

Tensors 0.50.1 is the release the open compile-tracer bugs point to;
0.50.0 was the one I initially pinned but it's strictly a prerequisite
step. The API change that matters for us:

  GetOrCompileInference / GetOrCompileTraining accept Func<Tensor<T>>
  (return = plan output tensor) instead of Action (tail-op inferred).

The explicit return fixes the shape-conditional truncation that forced
PR #1167's Predict-bypass workaround: the tracer no longer guesses
which op produced the plan's output, so rank-3 -> rank-4 promotions
and post-forward reshapes no longer get silently dropped.

Three call sites updated to return the last tensor explicitly:
  src/AiModelBuilder.cs:1344           — return nnModel.Predict(input)
  src/Training/CompiledTapeTrainingStep.cs:139  — return computeLoss(...)
  src/Training/CompiledTapeTrainingStep.cs:269  — return computeLoss(...)

Follow-up (separate PR): revert the 11 Predict-override-restore sites
from PR #1167 back to PredictEager now that the tracer actually
captures their forward behaviour correctly.

Build: dotnet build -f net10.0 + net471 both green, 0 errors.

---------

Co-authored-by: franklinic <franklin@ivorycloud.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant