fix(quantization): validate bias scale in QDQ Conv → QLinearConv fusion#28229
fix(quantization): validate bias scale in QDQ Conv → QLinearConv fusion#28229Rishi-Dave wants to merge 6 commits into
Conversation
ONNX QLinearConv reuses the int32 bias values directly without re-scaling, which requires bias_scale[i] == x_scale * w_scale[i] for each output channel. Previously ConvNodeGroupSelector::Check did not verify this invariant, so a QDQ model whose bias DQ used an arbitrary scale (e.g. produced by a non-conformant quantizer) would be silently fused into QLinearConv with wrong outputs. Add a static helper CheckConvBiasScale() that reads the three scale initializers, handles both per-tensor (scalar) and per-channel weight scales, and rejects the fusion when the tolerance check |b_scale - x_scale*w_scale| > atol + rtol*|x_scale*w_scale| (atol=1e-6, rtol=1e-2) fails for any channel. If any scale is not a constant initializer the helper returns false conservatively. Add two Python test cases in TestConvBiasScaleValidation: - test_mismatched_bias_scale_skips_fusion: verifies that optimized and unoptimized sessions agree when bias_scale is 2x the correct value, proving fusion was safely skipped. - test_matching_bias_scale_allows_fusion: verifies the same agreement when bias_scale is exact, ensuring no regression on valid models. Fixes microsoft#24711
There was a problem hiding this comment.
Pull request overview
This PR hardens the QDQ Conv → QLinearConv fusion in the QDQ transformer by validating that the bias dequantization scale conforms to the ONNX QLinearConv requirement (bias_scale == x_scale * w_scale[i]), preventing silent wrong results when models use non-canonical bias scales.
Changes:
- Add
CheckConvBiasScalevalidation toConvNodeGroupSelector::Checkto skip fusion when bias scale does not matchx_scale * w_scale[i]within tolerance. - Add Python tests that run the same model with optimizations enabled vs disabled to detect bias-scale-related corruption.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selectors.cc |
Adds bias scale validation gate to prevent incorrect QDQ Conv fusion into QLinearConv. |
onnxruntime/test/python/quantization/test_qdq.py |
Adds regression tests for matched vs mismatched bias scales under different optimization levels. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…scale check CheckConvBiasScale derived num_channels from w_scales.size() alone, which is 1 for per-tensor weight scale even when bias_scale is a per-channel vector. That caused valid graphs (per-channel bias with scalar weight scale) to be rejected and only one channel to be validated when num_channels was inferred wrongly. Use the larger of w_num/b_num for num_channels and broadcast either scale across channels when scalar. Reject only when both are vectors with mismatched lengths. Also extend the matching/mismatched-bias-scale tests to save the optimized graph and assert QLinearConv presence/absence directly via op counts, so the tests fail fast if fusion silently disables for an unrelated reason.
tianleiwu
left a comment
There was a problem hiding this comment.
Thanks for tightening this fusion path. I found one remaining correctness gap in the same bias-DQ area: the scale is now validated, but the bias zero point is still discarded by the QLinearConv rewrite, so a matching-scale model with nonzero bias zero point can still be fused into silent wrong output.
…fusion CheckConvBiasScale only validated the bias dequantize scale, but ConvMoves discards both the bias DQ scale and zero-point during the rewrite to QLinearConv (which has no bias_zp input and assumes implicit zero). A model with matching b_scale = x_scale * w_scale but nonzero b_zp would therefore be silently fused into a graph computing bias_q * b_scale instead of the spec-correct (bias_q - b_zp) * b_scale, producing shifted outputs. Add a sibling helper CheckConvBiasZeroPoint that, when the optional bias zero-point input is present and constant, requires it to be int32 with every element zero. Absent or empty-name inputs are treated as zero (matching the DequantizeLinear default). Non-constant or non-int32 zero points are conservatively rejected so fusion is skipped rather than producing silently wrong results. Wire the new check into ConvNodeGroupSelector::Check immediately after the scale check, only on the bias-DQ branch (dq_nodes.size() == 3). Add a regression test test_nonzero_bias_zp_skips_fusion that builds a conformant-scale graph with b_zp=1, asserts the optimized graph contains no QLinearConv (the primary structural guarantee), and additionally compares optimized vs unoptimized outputs as a defense-in-depth check.
|
Thanks for catching the bias zero-point gap. Pushed Regression test |
tianleiwu
left a comment
There was a problem hiding this comment.
The latest push addresses all prior review concerns:
- Bias scale validation (
CheckConvBiasScale): correctly handles scalar vs per-channel broadcasting for both weight and bias scales, with conservative early-return when initializers are non-constant. - Bias zero-point validation (
CheckConvBiasZeroPoint): rejects fusion when the bias DQ zero-point is present but not a constant int32 zero — closing the last known silent-corruption path. - Test coverage: three well-structured tests covering mismatched scale, matching scale, and nonzero zero-point, each with both structural (op count) and numeric assertions.
One minor nit on the tolerance comment (see inline). LGTM otherwise.
Concern addressed in latest push: CheckConvBiasZeroPoint now validates bias zero-point. Superseded by APPROVE review.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Address review feedback on the QDQ Conv -> QLinearConv fusion validator: - Replace the inaccurate 'matching convention in optimizer/utils.cc' comment in CheckConvBiasScale. The actual helper optimizer_utils::IsInitializerWithExpectedValue uses atol=1e-8 / rtol=1e-5, which is tighter than the tolerances used here. Document why looser tolerances are intentional (bias_scale = input_scale * weight_scale accumulates fp rounding from upstream tooling). - Add a clarifying comment in CheckConvBiasZeroPoint explaining that fusion is skipped when the bias DequantizeLinear has a nonzero zero-point, since QLinearConv has no bias_zero_point input. Comments only; no behavior change.
|
Thanks for the careful read. Addressed both points in 2c2e3ed (comments only — no behavior change):
|
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…te scales Address copilot-pull-request-reviewer feedback: harden the QDQ Conv -> QLinearConv fusion gating against malformed initializers. - CheckConvBiasScale: return false when x_scale, w_scale, or b_scale initializers are empty (would otherwise out-of-bounds on x_scales[0], w_scales[i], or b_scales[i]). - CheckConvBiasScale: reject non-finite scale values via std::isfinite. NaN compares unequal to itself so the tolerance check could pass or fail unpredictably; Inf arithmetic is similarly unreliable. Be conservative and refuse fusion. - CheckConvBiasZeroPoint: reject empty zero-point initializers rather than silently treating an empty span as "all zeros". Include <cmath> for std::isfinite.
|
Pushed 37f30de addressing the four review points from the latest pass:
|
tianleiwu
left a comment
There was a problem hiding this comment.
Summary
The fix is correct, well-defended, and addresses a real silent-corruption bug. Previous review concerns (empty initializer guard, NaN/Inf rejection, empty zero-point check, tolerance comment accuracy) have all been addressed in the latest push.
Positive highlights:
- Conservative fusion-gate design (returns
falsewhen verification is uncertain). - Proper handling of per-tensor vs per-channel broadcasting.
- Good edge-case coverage: empty spans, non-finite values, absent zero-point.
- Clear comments explaining the intentionally looser tolerance.
CheckConvBiasZeroPointcorrectly identifies that QLinearConv has nobias_zero_pointinput.
Two non-blocking suggestions inline.
| const float expected = x_scale * w_scale; | ||
| if (std::abs(b_scale - expected) > (atol + rtol * std::abs(expected))) { | ||
| return false; | ||
| } |
There was a problem hiding this comment.
Nit: x_scale is loop-invariant — the std::isfinite(x_scale) check could be hoisted above the loop to avoid redundant evaluation on every channel iteration. Not a correctness issue (optimizer-time cost is negligible), just a clarity improvement:
if (!std::isfinite(x_scale)) {
return false;
}
for (size_t i = 0; i < num_channels; ++i) {
const float w_scale = (w_num == 1) ? w_scales[0] : w_scales[i];
const float b_scale = (b_num == 1) ? b_scales[0] : b_scales[i];
if (!std::isfinite(w_scale) || !std::isfinite(b_scale)) {
return false;
}
...
}| """ | ||
| inp_shape = [1, 1, 4, 4] | ||
| weight_shape = [1, 1, 1, 1] | ||
|
|
There was a problem hiding this comment.
Suggestion: all three test cases use weight_shape = [1, 1, 1, 1] (single output channel). Consider adding a per-channel test (e.g., weight_shape = [4, 1, 1, 1] with a per-channel w_scale array and matching/mismatched per-channel b_scale) to exercise the broadcasting loop in CheckConvBiasScale. This would also complement the existing C++ tests in onnxruntime/test/optimizer/qdq_transformer_test.cc which test per-channel weight quantization.
|
Many CI pipeline failed. Please take a look. |
|
Thanks for the approval, @tianleiwu. On the CI failures — they span Linux/Windows/macOS/Android/WebGPU/WASM across unrelated EPs (CUDA, TensorRT, OpenVINO, QNN, DML, XNNPACK), which is consistent with an upstream main breakage rather than anything in this two-file change. The diff only touches On the two nits: I can address both in a follow-up — hoist the |
…er-channel coverage - Hoist isfinite(x_scale) above the per-channel loop in CheckConvBiasScale so it is evaluated once per Conv node rather than once per output channel. - Update bias scales in QDQ_Selector_Test_ConvClip / ConvClipNonScalar / Conv_Relu test cases so b_scale == x_scale * w_scale, matching the new QLinearConv fusion invariant. - Regenerate qdq_conv.onnx and qdq_convs.onnx with Scale=1.0 (opset 13 pinned) so the shared scale in those static fixtures also satisfies b_scale == x_scale * w_scale. - Add a per-channel test in test_qdq.py exercising a non-finite element in a 1-D w_scale to ensure fusion is rejected when any channel's scale is non-finite.
|
Thanks for the review. Pushed Nit 1 ( Nit 2 ( CI failures — root cause was test fixtures that violated the new invariant
One open question: the committed |
|
@Rishi-Dave, see Ci failures. |
|
Thanks @tianleiwu. Took a closer look — the failures appear unrelated to this PR's diff:
Could you re-trigger the failed jobs when convenient? Happy to dig into any specific check if you'd like. |
@Rishi-Dave, I re-triggered the jobs by reopening the PR. It seems that not help. Could you try merge main branch? |
Summary
CheckConvBiasScalevalidator insideConvNodeGroupSelector::CheckMotivation
Fixes #24711.
The ONNX QLinearConv spec requires the int32 bias to use scale
x_scale × w_scale[i]so the fused kernel can reuse it directly. The current QDQ selector only verifies the bias dtype is INT32 — it never checks that the bias DQ's scale satisfies this relationship. When a model is constructed with an arbitrary bias scale (e.g. user-supplied or from a non-canonical quantizer), the selector still fuses the subgraph and the QLinearConv kernel produces silently wrong outputs atORT_ENABLE_EXTENDEDand above on CPU EP. CUDA and disabled-optimization paths produce correct results, making the bug particularly hard to diagnose.Changes
onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selectors.cc: addCheckConvBiasScalestatic helper. Returnsfalse(skip fusion) when:x_scaleis not a scalar / 1-element rank-1 tensorb_scalelength is neither 1 nornum_channelsx_scale × w_scale[i]by more thanatol=1e-6 + rtol=1e-2 × |expected|onnxruntime/test/python/quantization/test_qdq.py: newTestConvBiasScaleValidationclass with two cases — mismatched bias scale (asserts optimized output matches unoptimized) and matching bias scale (asserts correctness preserved when fusion is allowed).Test Plan
python -m pytest onnxruntime/test/python/quantization/test_qdq.py::TestConvBiasScaleValidation -vverify_quantize_convfamily) should continue to pass — fusion is unchanged for canonical quantizer-produced models where bias_scale equals input_scale × weight_scale exactly.ORT_ENABLE_ALLoutput now matchesORT_DISABLE_ALL.