fix(neuralnetworks): restore Predict override bypass for compiled-replay bug#1167
Walkthrough: Eleven neural-network classes changed their prediction override from a protected `PredictEager` back to a public `Predict` that calls `Forward` directly.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the core change: restoring public Predict overrides to bypass the compiled-replay path for affected neural network classes due to a tracer bug. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 94.12% which is sufficient. The required threshold is 80.00%. |
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/NeuralNetworks/MobileNetV2Network.cs`:
- Around lines 292-298: The `Predict` override currently bypasses the compiled path by calling `Forward` directly, which omits the shape-promotion/squeeze behavior under compilation. Modify `MobileNetV2Network<T>.Predict` to call the compiled execution path when compilation is enabled (use `NeuralNetworkBase<T>.PredictCompiled` or the existing compiled dispatcher), and ensure that the compiled path applies the same 3D→4D promotion and final squeeze as `Forward`, so rank-3 inputs produce a final `[numClasses]` logits shape. Update the `Predict` implementation to detect compilation mode, route to `PredictCompiled` with the same shape-transform logic, and keep `Forward` as the uncompiled reference to avoid regression in compiled-mode tests.
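A minimal sketch of the dispatch shape this comment asks for. The member names `IsCompilationEnabled` and `PredictCompiled`, and the `Reshape` overload, are assumptions for illustration, not copied from the codebase:

```csharp
// Hypothetical sketch only -- IsCompilationEnabled, PredictCompiled, and
// the Reshape overload are assumed names, not verbatim from the repo.
public override Tensor<T> Predict(Tensor<T> input)
{
    if (!IsCompilationEnabled)
    {
        return Forward(input); // uncompiled reference path
    }

    // Mirror Forward's shape handling: promote rank-3 [C, H, W] to a
    // rank-4 [1, C, H, W] batch before entering the compiled plan.
    bool promoted = input.Shape.Length == 3;
    var batched = promoted
        ? input.Reshape(1, input.Shape[0], input.Shape[1], input.Shape[2])
        : input;

    var output = PredictCompiled(batched);

    // Squeeze the synthetic batch dimension so a rank-3 input still
    // yields a final [numClasses] logits tensor.
    return promoted ? output.Reshape(output.Shape[^1]) : output;
}
```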
In `@src/NeuralNetworks/UNet3D.cs`:
- Around lines 239-246: Remove the duplicate XML `<summary>` on the `Predict` method and replace it with a brief regular code comment explaining the hotfix: `Predict` calls `Forward` directly to avoid tracer issues observed when routing through `NeuralNetworkBase{T}.PredictCompiled`. Keep the method signature public and its implementation `Predict(Tensor<T> input) => Forward(input)`, but move the rationale into a single-line or block comment (non-XML) so the earlier XML on the class remains the single user-facing summary in the public API docs.
In `@src/NeuralNetworks/VGGNetwork.cs`:
- Around lines 342-350: The public XML `<summary>` on the `Predict` override currently exposes tracer/compiled-replay internals. Replace that XML doc with a brief, user-facing summary (or remove it) and move the detailed regression rationale into a regular non-XML code comment immediately above the `Predict` override (e.g., referencing `Predict`, `Forward`, and `NeuralNetworkBase{T}.PredictCompiled`), so the hotfix behavior (`public override Tensor<T> Predict(...) => Forward(...)`) remains but implementation details are not part of the public API docs.
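Applied to both the UNet3D and VGGNetwork comments above, the suggested split between user-facing XML docs and a plain hotfix comment might look like this; the wording is illustrative, not taken from the repo:

```csharp
/// <summary>
/// Runs a forward pass and returns the network's predictions.
/// </summary>
// HOTFIX: call Forward directly instead of routing through
// NeuralNetworkBase<T>.PredictCompiled -- the compiled-replay tracer
// drops shape-conditional ops (see the upstream tracer issue).
// Revert to PredictEager once the tracer is hardened.
public override Tensor<T> Predict(Tensor<T> input) => Forward(input);
```

Keeping the rationale out of the XML block means generated API documentation shows only the one-line summary, while the hotfix context stays visible to maintainers reading the source.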
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b69c6eb1-de0e-4504-b7e8-0b52cebcb245
📒 Files selected for processing (11)
- src/NeuralNetworks/ConvolutionalNeuralNetwork.cs
- src/NeuralNetworks/EfficientNetNetwork.cs
- src/NeuralNetworks/FastText.cs
- src/NeuralNetworks/GloVe.cs
- src/NeuralNetworks/MobileNetV2Network.cs
- src/NeuralNetworks/ResNetNetwork.cs
- src/NeuralNetworks/SiameseNeuralNetwork.cs
- src/NeuralNetworks/UNet3D.cs
- src/NeuralNetworks/VGGNetwork.cs
- src/NeuralNetworks/VoxelCNN.cs
- src/NeuralNetworks/Word2Vec.cs
Upstream root-cause issue filed: ooples/AiDotNet.Tensors#228. This PR is a band-aid; the real bug is in the upstream compile tracer. Once Tensors #228 is fixed and a new Tensors package is bumped here, the 11 restored `Predict` overrides can be reverted.
Tensors 0.50.1 is the release the open compile-tracer bugs point to; 0.50.0 was the one I initially pinned, but it's strictly a prerequisite step.

The API change that matters for us: GetOrCompileInference / GetOrCompileTraining accept `Func<Tensor<T>>` (return = plan output tensor) instead of `Action` (tail-op inferred). The explicit return fixes the shape-conditional truncation that forced PR #1167's Predict-bypass workaround: the tracer no longer guesses which op produced the plan's output, so rank-3 -> rank-4 promotions and post-forward reshapes no longer get silently dropped.

Three call sites updated to return the last tensor explicitly:
- src/AiModelBuilder.cs:1344 — `return nnModel.Predict(input)`
- src/Training/CompiledTapeTrainingStep.cs:139 — `return computeLoss(...)`
- src/Training/CompiledTapeTrainingStep.cs:269 — `return computeLoss(...)`

Follow-up (separate PR): revert the 11 Predict-override-restore sites from PR #1167 back to PredictEager now that the tracer actually captures their forward behaviour correctly.

Build: dotnet build -f net10.0 + net471 both green, 0 errors.
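The 0.50.0 -> 0.50.1 call-site change described above can be sketched as follows; the `compiler`/`key` names and the exact `GetOrCompileInference` signatures are inferred from this description, not copied from the Tensors package:

```csharp
// 0.50.0: Action trace lambda -- the tracer had to infer which tail op
// produced the plan's output, and could guess wrong.
var plan = compiler.GetOrCompileInference(key, () =>
{
    nnModel.Predict(input); // output tensor inferred, sometimes truncated
});

// 0.50.1: Func<Tensor<T>> trace lambda -- the returned tensor IS the
// plan output, so trailing reshapes/promotions stay in the plan.
var plan2 = compiler.GetOrCompileInference(key, () => nnModel.Predict(input));
```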
Superseded by #1179 (Tensors 0.50.1 bump). The upstream tracer now captures shape-conditional control flow in `Forward` (rank-3 → rank-4 batch promotion, trailing `Reshape`) correctly, which was the root cause this workaround was papering over. No code change needed here; the 11 models keep the `Predict` overrides restored in PR #1167 for now. Closing this PR and deleting the branch.
98b7fcd to 8ffb878
* deps: bump AiDotNet.Tensors 0.46.1 -> 0.50.0 and AiDotNet.Native.OpenBLAS 0.28.0 -> 0.50.0

  Unifies the AiDotNet ecosystem on 0.50.0:
  - AiDotNet.Tensors 0.46.1 -> 0.50.0
  - AiDotNet.Native.OneDNN 0.50.0 (already on 0.50.0)
  - AiDotNet.Native.OpenBLAS 0.28.0 -> 0.50.0
  - AiDotNet.Native.CLBlast 0.50.0 (already on 0.50.0)

  Supersedes dependabot PRs #1170 (OpenBLAS) and #1171 (Tensors), which dependabot auto-closed when the branches were rebased onto the current master that already had OneDNN + CLBlast at 0.50.0. Bundling both into one human-authored PR avoids the tamper-detection closure and keeps all four native backends at the same release line.

  Build: dotnet build -f net10.0 + net471 both green, 0 errors.

* deps: Tensors 0.50.0 -> 0.50.1 + adapt to Func<Tensor<T>> trace lambdas

---------

Co-authored-by: franklinic <franklin@ivorycloud.com>
Summary
PR #1155 changed `public override Tensor<T> Predict(Tensor<T> input) => Forward(input)` to `protected override Tensor<T> PredictEager(Tensor<T> input) => Forward(input)` in 11 network classes. That reroutes `Predict` through the compiled-replay path (`NeuralNetworkBase.PredictCompiled` → `CompiledModelHost.Predict`), which silently truncates the forward pass for these models. On a ResNet-18 with `[3, 32, 32]` input, the compiled plan returns a shape-`[64]` tensor (the first conv block's 64 channels) instead of the shape-`[10]` logits.

This PR restores master's previous behavior — `Predict` calls `Forward` directly — in all 11 affected classes.

Root cause

The compiled-replay tracer does not capture shape-conditional control flow in `Forward` (rank-3 → rank-4 batch promotion, final `Reshape`, etc.). Fixing that is a larger infrastructure change; this hotfix closes the immediate CI failures so PR #1163 and PR #1165 can merge, along with any future PR that merges master. Once the tracer is hardened, the `Predict` overrides restored here can be removed.

Affected models
ResNetNetwork, VGGNetwork, EfficientNetNetwork, MobileNetV2Network, ConvolutionalNeuralNetwork, UNet3D, VoxelCNN, FastText, GloVe, Word2Vec, SiameseNeuralNetwork.
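The shape-conditional control flow the tracer misses can be sketched as follows; the helper names (`Reshape`, `Squeeze`, `RunBackbone`) are illustrative, and only the branching pattern matters:

```csharp
public Tensor<T> Forward(Tensor<T> input)
{
    // Rank-3 [C, H, W] input is promoted to a rank-4 [1, C, H, W] batch.
    // A replay tracer that records the ops executed for one concrete
    // input never sees the other branch, so replaying that plan against
    // a different rank truncates the pass (e.g. a [64] feature tensor
    // instead of the [10] logits).
    bool promoted = input.Shape.Length == 3;
    var x = promoted
        ? input.Reshape(1, input.Shape[0], input.Shape[1], input.Shape[2])
        : input;

    x = RunBackbone(x); // conv blocks + classifier head (illustrative)

    // The final squeeze back to [numClasses] is equally shape-conditional.
    return promoted ? x.Squeeze(0) : x;
}
```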
Test plan
- `ResNetNetworkTests.ResNetNetwork_*_ReturnsCorrectShape` tests pass locally on `net10.0`
- `VGGNetworkTests.VGGNetwork_Predict_*` tests pass locally on `net10.0`

🤖 Generated with Claude Code