fix(quantization): validate bias scale in QDQ Conv → QLinearConv fusion by Rishi-Dave · Pull Request #28229 · microsoft/onnxruntime

Rishi-Dave · 2026-04-25T11:39:25Z

Summary

Add CheckConvBiasScale validator inside ConvNodeGroupSelector::Check
Skip QDQ Conv → QLinearConv fusion when bias DQ scale ≠ input_scale × weight_scale (within 1% relative tolerance)
Adds Python test coverage for both matching and mismatched bias scales

Motivation

The ONNX QLinearConv spec requires the int32 bias to use scale x_scale × w_scale[i] so the fused kernel can reuse it directly. The current QDQ selector only verifies the bias dtype is INT32 — it never checks that the bias DQ's scale satisfies this relationship. When a model is constructed with an arbitrary bias scale (e.g. user-supplied or from a non-canonical quantizer), the selector still fuses the subgraph and the QLinearConv kernel produces silently wrong outputs at ORT_ENABLE_EXTENDED and above on CPU EP. CUDA and disabled-optimization paths produce correct results, making the bug particularly hard to diagnose.

Changes

onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selectors.cc: add CheckConvBiasScale static helper. Returns false (skip fusion) when:
- any of x/w/b scales is not a constant initializer
- any scale dtype is not float32
- x_scale is not a scalar / 1-element rank-1 tensor
- b_scale length is neither 1 nor num_channels
- any per-channel bias scale differs from x_scale × w_scale[i] by more than atol=1e-6 + rtol=1e-2 × |expected|
onnxruntime/test/python/quantization/test_qdq.py: new TestConvBiasScaleValidation class with two cases — mismatched bias scale (asserts optimized output matches unoptimized) and matching bias scale (asserts correctness preserved when fusion is allowed).

Test Plan

python -m pytest onnxruntime/test/python/quantization/test_qdq.py::TestConvBiasScaleValidation -v
Existing QDQ Conv tests (verify_quantize_conv family) should continue to pass — fusion is unchanged for canonical quantizer-produced models where bias_scale equals input_scale × weight_scale exactly.
Reproduce the issue with the model from Conv + Q/DQ produces bad output in CPUExecutionProvider #24711 and confirm CPU ORT_ENABLE_ALL output now matches ORT_DISABLE_ALL.

ONNX QLinearConv reuses the int32 bias values directly without re-scaling, which requires bias_scale[i] == x_scale * w_scale[i] for each output channel. Previously ConvNodeGroupSelector::Check did not verify this invariant, so a QDQ model whose bias DQ used an arbitrary scale (e.g. produced by a non-conformant quantizer) would be silently fused into QLinearConv with wrong outputs. Add a static helper CheckConvBiasScale() that reads the three scale initializers, handles both per-tensor (scalar) and per-channel weight scales, and rejects the fusion when the tolerance check |b_scale - x_scale*w_scale| > atol + rtol*|x_scale*w_scale| (atol=1e-6, rtol=1e-2) fails for any channel. If any scale is not a constant initializer the helper returns false conservatively. Add two Python test cases in TestConvBiasScaleValidation: - test_mismatched_bias_scale_skips_fusion: verifies that optimized and unoptimized sessions agree when bias_scale is 2x the correct value, proving fusion was safely skipped. - test_matching_bias_scale_allows_fusion: verifies the same agreement when bias_scale is exact, ensuring no regression on valid models. Fixes microsoft#24711

Copilot

Pull request overview

This PR hardens the QDQ Conv → QLinearConv fusion in the QDQ transformer by validating that the bias dequantization scale conforms to the ONNX QLinearConv requirement (bias_scale == x_scale * w_scale[i]), preventing silent wrong results when models use non-canonical bias scales.

Changes:

Add CheckConvBiasScale validation to ConvNodeGroupSelector::Check to skip fusion when bias scale does not match x_scale * w_scale[i] within tolerance.
Add Python tests that run the same model with optimizations enabled vs disabled to detect bias-scale-related corruption.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
`onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selectors.cc`	Adds bias scale validation gate to prevent incorrect QDQ Conv fusion into QLinearConv.
`onnxruntime/test/python/quantization/test_qdq.py`	Adds regression tests for matched vs mismatched bias scales under different optimization levels.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…scale check CheckConvBiasScale derived num_channels from w_scales.size() alone, which is 1 for per-tensor weight scale even when bias_scale is a per-channel vector. That caused valid graphs (per-channel bias with scalar weight scale) to be rejected and only one channel to be validated when num_channels was inferred wrongly. Use the larger of w_num/b_num for num_channels and broadcast either scale across channels when scalar. Reject only when both are vectors with mismatched lengths. Also extend the matching/mismatched-bias-scale tests to save the optimized graph and assert QLinearConv presence/absence directly via op counts, so the tests fail fast if fusion silently disables for an unrelated reason.

tianleiwu

Thanks for tightening this fusion path. I found one remaining correctness gap in the same bias-DQ area: the scale is now validated, but the bias zero point is still discarded by the QLinearConv rewrite, so a matching-scale model with nonzero bias zero point can still be fused into silent wrong output.

…fusion CheckConvBiasScale only validated the bias dequantize scale, but ConvMoves discards both the bias DQ scale and zero-point during the rewrite to QLinearConv (which has no bias_zp input and assumes implicit zero). A model with matching b_scale = x_scale * w_scale but nonzero b_zp would therefore be silently fused into a graph computing bias_q * b_scale instead of the spec-correct (bias_q - b_zp) * b_scale, producing shifted outputs. Add a sibling helper CheckConvBiasZeroPoint that, when the optional bias zero-point input is present and constant, requires it to be int32 with every element zero. Absent or empty-name inputs are treated as zero (matching the DequantizeLinear default). Non-constant or non-int32 zero points are conservatively rejected so fusion is skipped rather than producing silently wrong results. Wire the new check into ConvNodeGroupSelector::Check immediately after the scale check, only on the bias-DQ branch (dq_nodes.size() == 3). Add a regression test test_nonzero_bias_zp_skips_fusion that builds a conformant-scale graph with b_zp=1, asserts the optimized graph contains no QLinearConv (the primary structural guarantee), and additionally compares optimized vs unoptimized outputs as a defense-in-depth check.

Rishi-Dave · 2026-05-03T12:58:29Z

Thanks for catching the bias zero-point gap. Pushed 0f7900d adding CheckConvBiasZeroPoint in qdq_selectors.cc, invoked from ConvNodeGroupSelector::Check right after CheckConvBiasScale. The helper rejects fusion unless the bias DQ's zero-point input is absent/empty (implicit 0) or a constant int32 initializer with every element equal to 0 — covering per-tensor and per-channel cases. Non-constant or non-int32 zero points are conservatively rejected.

Regression test TestConvBiasScaleValidation::test_nonzero_bias_zp_skips_fusion in test/python/quantization/test_qdq.py builds a QDQ Conv with matching b_scale == x_scale * w_scale but b_zp = 1 and asserts the optimized graph contains zero QLinearConv nodes. Happy to fold followups (stronger numeric assertion, positive control) into this PR if preferred.

tianleiwu

The latest push addresses all prior review concerns:

Bias scale validation (CheckConvBiasScale): correctly handles scalar vs per-channel broadcasting for both weight and bias scales, with conservative early-return when initializers are non-constant.
Bias zero-point validation (CheckConvBiasZeroPoint): rejects fusion when the bias DQ zero-point is present but not a constant int32 zero — closing the last known silent-corruption path.
Test coverage: three well-structured tests covering mismatched scale, matching scale, and nonzero zero-point, each with both structural (op count) and numeric assertions.

One minor nit on the tolerance comment (see inline). LGTM otherwise.

Concern addressed in latest push: CheckConvBiasZeroPoint now validates bias zero-point. Superseded by APPROVE review.

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Address review feedback on the QDQ Conv -> QLinearConv fusion validator: - Replace the inaccurate 'matching convention in optimizer/utils.cc' comment in CheckConvBiasScale. The actual helper optimizer_utils::IsInitializerWithExpectedValue uses atol=1e-8 / rtol=1e-5, which is tighter than the tolerances used here. Document why looser tolerances are intentional (bias_scale = input_scale * weight_scale accumulates fp rounding from upstream tooling). - Add a clarifying comment in CheckConvBiasZeroPoint explaining that fusion is skipped when the bias DequantizeLinear has a nonzero zero-point, since QLinearConv has no bias_zero_point input. Comments only; no behavior change.

Rishi-Dave · 2026-05-04T11:27:57Z

Thanks for the careful read. Addressed both points in 2c2e3ed (comments only — no behavior change):

Tolerance comment in CheckConvBiasScale. You're right that "matching convention in optimizer/utils.cc" was misleading — optimizer_utils::IsInitializerWithExpectedValue uses atol=1e-8, rtol=1e-5, which is tighter than what we use here. Replaced the comment to make it explicit that the looser tolerances (atol=1e-6, rtol=1e-2) are intentional: bias_scale is computed as input_scale * weight_scale and tends to accumulate fp rounding from upstream quantization tooling, so a tighter check would reject otherwise valid fusions. Kept the existing helper untouched since its tolerances aren't appropriate for this case.
Bias zero-point gating. Added an inline comment above the zero-point block in CheckConvBiasZeroPoint noting that fusion assumes symmetrically-quantized bias and is skipped on a nonzero zero_point because QLinearConv has no bias_zero_point input. Will mirror this in the PR description as well.

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…te scales Address copilot-pull-request-reviewer feedback: harden the QDQ Conv -> QLinearConv fusion gating against malformed initializers. - CheckConvBiasScale: return false when x_scale, w_scale, or b_scale initializers are empty (would otherwise out-of-bounds on x_scales[0], w_scales[i], or b_scales[i]). - CheckConvBiasScale: reject non-finite scale values via std::isfinite. NaN compares unequal to itself so the tolerance check could pass or fail unpredictably; Inf arithmetic is similarly unreliable. Be conservative and refuse fusion. - CheckConvBiasZeroPoint: reject empty zero-point initializers rather than silently treating an empty span as "all zeros". Include <cmath> for std::isfinite.

Rishi-Dave · 2026-05-05T11:44:18Z

Pushed 37f30de addressing the four review points from the latest pass:

CheckConvBiasScale: added empty-span guards on x_scales, w_scales, b_scales before any indexing — prevents the out-of-bounds path you flagged at the DataAsSpan<float>() call sites.
CheckConvBiasScale: added std::isfinite checks on x_scale, w_scale[i], b_scale[i] inside the per-channel loop, before the atol + rtol * |expected| tolerance comparison. NaN/Inf scales now conservatively refuse fusion instead of producing an unpredictable comparison result.
CheckConvBiasZeroPoint: extracted the int32 span into zp_values and reject when empty — an empty initializer is no longer silently treated as "all zeros".
Included <cmath> for std::isfinite.

lintrunner -a clean on the touched file. PR title/description updated separately to reflect the broadened input validation.

tianleiwu

Summary

The fix is correct, well-defended, and addresses a real silent-corruption bug. Previous review concerns (empty initializer guard, NaN/Inf rejection, empty zero-point check, tolerance comment accuracy) have all been addressed in the latest push.

Positive highlights:

Conservative fusion-gate design (returns false when verification is uncertain).
Proper handling of per-tensor vs per-channel broadcasting.
Good edge-case coverage: empty spans, non-finite values, absent zero-point.
Clear comments explaining the intentionally looser tolerance.
CheckConvBiasZeroPoint correctly identifies that QLinearConv has no bias_zero_point input.

Two non-blocking suggestions inline.

tianleiwu · 2026-05-20T16:38:16Z

+    const float expected = x_scale * w_scale;
+    if (std::abs(b_scale - expected) > (atol + rtol * std::abs(expected))) {
+      return false;
+    }


Nit: x_scale is loop-invariant — the std::isfinite(x_scale) check could be hoisted above the loop to avoid redundant evaluation on every channel iteration. Not a correctness issue (optimizer-time cost is negligible), just a clarity improvement:

if (!std::isfinite(x_scale)) { return false; } for (size_t i = 0; i < num_channels; ++i) { const float w_scale = (w_num == 1) ? w_scales[0] : w_scales[i]; const float b_scale = (b_num == 1) ? b_scales[0] : b_scales[i]; if (!std::isfinite(w_scale) || !std::isfinite(b_scale)) { return false; } ... }

tianleiwu · 2026-05-20T16:38:16Z

+        """
+        inp_shape = [1, 1, 4, 4]
+        weight_shape = [1, 1, 1, 1]
+


Suggestion: all three test cases use weight_shape = [1, 1, 1, 1] (single output channel). Consider adding a per-channel test (e.g., weight_shape = [4, 1, 1, 1] with a per-channel w_scale array and matching/mismatched per-channel b_scale) to exercise the broadcasting loop in CheckConvBiasScale. This would also complement the existing C++ tests in onnxruntime/test/optimizer/qdq_transformer_test.cc which test per-channel weight quantization.

tianleiwu · 2026-05-20T21:40:54Z

Many CI pipeline failed. Please take a look.

Rishi-Dave · 2026-05-21T11:34:33Z

Thanks for the approval, @tianleiwu. On the CI failures — they span Linux/Windows/macOS/Android/WebGPU/WASM across unrelated EPs (CUDA, TensorRT, OpenVINO, QNN, DML, XNNPACK), which is consistent with an upstream main breakage rather than anything in this two-file change. The diff only touches qdq_selectors.cc (new CheckConvBiasScale) and test_qdq.py. Could a maintainer trigger a fresh run once upstream is green? Happy to dig in if any failing job actually exercises the QDQ optimizer path.

On the two nits: I can address both in a follow-up — hoist the std::isfinite(x_scale) guard above the per-channel loop in CheckConvBiasScale, and add a per-channel test variant (weight_shape=[4,1,1,1] with 4-element w_scale/b_scale) to exercise the broadcasting branch. Let me know if you'd prefer those folded into this PR or kept as a separate change.

…er-channel coverage - Hoist isfinite(x_scale) above the per-channel loop in CheckConvBiasScale so it is evaluated once per Conv node rather than once per output channel. - Update bias scales in QDQ_Selector_Test_ConvClip / ConvClipNonScalar / Conv_Relu test cases so b_scale == x_scale * w_scale, matching the new QLinearConv fusion invariant. - Regenerate qdq_conv.onnx and qdq_convs.onnx with Scale=1.0 (opset 13 pinned) so the shared scale in those static fixtures also satisfies b_scale == x_scale * w_scale. - Add a per-channel test in test_qdq.py exercising a non-finite element in a 1-D w_scale to ensure fusion is rejected when any channel's scale is non-finite.

Rishi-Dave · 2026-05-22T11:48:28Z

Thanks for the review. Pushed 6d1fa1a addressing all three items:

Nit 1 (qdq_selectors.cc) — hoisted the std::isfinite(x_scale) guard above the per-channel loop; only w_scale / b_scale are checked per iteration now.

Nit 2 (test_qdq.py) — added test_per_channel_nonfinite_w_scale_skips_fusion with weight_shape=[4,1,1,1] and a 1-D w_scale containing a NaN, asserting fusion is rejected.

CI failures — root cause was test fixtures that violated the new invariant b_scale ≈ x_scale * w_scale:

qdq_conv.onnx / qdq_convs.onnx shared Scale=256.0 for x/w/b (off by 256×). Regenerated with Scale=1.0, opset 13 pinned.
QDQ_Selector_Test_ConvClip{,NonScalar} and Conv_Relu used b_scale=0.007 against 0.02348 * 0.307 ≈ 0.00720836 (~2.9% off, exceeded the 1% rtol). Updated to the exact product.

One open question: the committed .ort artifacts under runtime_optimization/ were not regenerated (requires a full C++ ORT build with save config). If CI still flags GraphRuntimeOptimizationTest.QDQConv after this push due to byte-level .ort mismatch, please let me know and I'll coordinate on regenerating them.

tianleiwu · 2026-05-22T22:00:32Z

@Rishi-Dave, see Ci failures.

Rishi-Dave · 2026-05-23T11:14:03Z

Thanks @tianleiwu. Took a closer look — the failures appear unrelated to this PR's diff:

The failing jobs are spread across Android, Linux arm64, Windows CUDA/DML/TensorRT, WebGPU, CoreML, WASM, QNN, OpenVINO, and XNNPACK — i.e. EPs and platforms that don't touch the QDQ Conv → QLinearConv selector path.
lintrunner, Python format, CodeQL, Validation, and the minimal-build checks are all green.
One Linux x64 run did show a QDQTransformerTests.QDQ_Selector_Test failure, but that was on the GitHub auto-generated PR-merge commit, which seems to have pulled in the pre-fix qdq_conv.onnx (Scale=256.0) from upstream during the synthetic merge. On the branch HEAD (6d1fa1a), the fixture is regenerated with Scale=1.0 so bias_scale == x_scale * w_scale holds and the selector accepts the match — sibling builder-based tests (ConvClip, ConvClipNonScalar, Conv_Relu) confirm the path.

Could you re-trigger the failed jobs when convenient? Happy to dig into any specific check if you'd like.

tianleiwu · 2026-05-26T15:10:02Z

Could you re-trigger the failed jobs when convenient? Happy to dig into any specific check if you'd like.

@Rishi-Dave, I re-triggered the jobs by reopening the PR. It seems that not help. Could you try merge main branch?

tianleiwu requested a review from Copilot April 25, 2026 16:42

Copilot started reviewing on behalf of tianleiwu April 25, 2026 16:43 View session

Copilot AI reviewed Apr 25, 2026

View reviewed changes

Comment thread onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selectors.cc

Comment thread onnxruntime/test/python/quantization/test_qdq.py

tianleiwu previously requested changes May 3, 2026

View reviewed changes

Comment thread onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selectors.cc

tianleiwu previously approved these changes May 3, 2026

View reviewed changes

Comment thread onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selectors.cc Outdated

tianleiwu requested a review from Copilot May 3, 2026 17:22

Copilot started reviewing on behalf of tianleiwu May 3, 2026 17:23 View session

Copilot AI reviewed May 3, 2026

View reviewed changes

Comment thread onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selectors.cc Outdated

Comment thread onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selectors.cc

Rishi-Dave dismissed tianleiwu’s stale review via 2c2e3ed May 4, 2026 11:27

tianleiwu requested a review from Copilot May 5, 2026 01:06

Copilot started reviewing on behalf of tianleiwu May 5, 2026 01:07 View session

Copilot AI reviewed May 5, 2026

View reviewed changes

tianleiwu previously approved these changes May 20, 2026

View reviewed changes

Rishi-Dave dismissed tianleiwu’s stale review via 6d1fa1a May 22, 2026 11:48

tianleiwu closed this May 25, 2026

tianleiwu reopened this May 25, 2026

Conversation

Rishi-Dave commented Apr 25, 2026

Summary

Motivation

Changes

Test Plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

tianleiwu left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Rishi-Dave commented May 3, 2026

Uh oh!

tianleiwu left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Rishi-Dave commented May 4, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Rishi-Dave commented May 5, 2026

Uh oh!

tianleiwu left a comment

Choose a reason for hiding this comment

Summary

Uh oh!

tianleiwu May 20, 2026

Choose a reason for hiding this comment

Uh oh!

tianleiwu May 20, 2026

Choose a reason for hiding this comment

Uh oh!

tianleiwu commented May 20, 2026

Uh oh!

Rishi-Dave commented May 21, 2026

Uh oh!

Rishi-Dave commented May 22, 2026

Uh oh!

tianleiwu commented May 22, 2026

Uh oh!

Rishi-Dave commented May 23, 2026

Uh oh!

tianleiwu commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants