Commit d43a48b

Fix ASan OOM in QDQ Gemm transformer tests (#28797)
# PR: Fix ASan OOM in QDQ Gemm transformer tests

## Description

PR #28131 ("Reject QDQ Gemm→QGemm fusion when alpha != 1 with bias") added `alpha_not_one` coverage to the QDQ Gemm fusion tests. This multiplied the number of `TransformerTester` session builds inside the already-large `Gemm_U8U8U8` test matrix and pushed the AddressSanitizer (ASan) build of `onnxruntime_test_all` over its allocator limit, causing the `windows_x64_asan` CI to fail with `AddressSanitizer: Out of memory. The process has exhausted 8192MB for size class 8192`.

This PR isolates the `alpha != 1` coverage into small, dedicated tests so the peak memory of any single test is reduced.

## Summary of Changes

| File | Change |
|------|--------|
| `onnxruntime/test/optimizer/qdq_transformer_test.cc` | Added an `opset_version` parameter to `QDQTransformerGemmTests` (default `0` = run opsets 12/18/19); replaced the three hardcoded `TransformerTester` calls with a loop over the selected opset(s); removed the inline `alpha_not_one` block from the templated `QDQTransformerGemmTests()`; added a dedicated `TEST(QDQTransformerTests, Gemm_AlphaNotOne_U8U8U8)` that runs only uint8/uint8/uint8 at opset 19. |
| `onnxruntime/test/optimizer/qdq_transformer_fastmath_test.cc` | Same refactor for the fastmath variant; added `TEST(QDQTransformerTests, Gemm_AlphaNotOne_U8U8U8_FastMath)` running only uint8/uint8/uint8 at opset 19. |

The net effect is that the incremental `alpha_not_one` work added by #28131 drops from 24 session builds (4 alpha variants × 3 opsets, in each of the regular and fastmath files) to 8, and is no longer part of the large `Gemm_U8U8U8` matrix test, which directly lowers the peak memory consumed in a single test.
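The 24 → 8 figure can be checked with a minimal standalone sketch. `SelectOpsets` mirrors the selection expression introduced in the diff; `AlphaRunCount` is a hypothetical helper, not part of the test files, and it counts only the variants × opsets × files product used in the description (each call also iterates over several input shapes, which this ignores).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same selection rule as the refactored tests: 0 is a sentinel meaning
// "run the default opsets 12, 18 and 19"; any other value runs only that opset.
std::vector<int> SelectOpsets(int opset_version) {
  return opset_version == 0 ? std::vector<int>{12, 18, 19}
                            : std::vector<int>{opset_version};
}

// Hypothetical helper (illustration only): number of alpha != 1 runs,
// following the accounting in the PR description.
//   alpha_variants: the four (has_output_q, has_bias) combinations
//   test_files:     the regular and the fastmath test file
std::size_t AlphaRunCount(int opset_version,
                          std::size_t alpha_variants = 4,
                          std::size_t test_files = 2) {
  return alpha_variants * SelectOpsets(opset_version).size() * test_files;
}
```

With the default sentinel, `AlphaRunCount(0)` gives 4 × 3 × 2 = 24; pinning the opset, `AlphaRunCount(19)` gives 4 × 1 × 2 = 8.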
## Testing

- Build with `--enable_address_sanitizer` (the `windows_x64_asan` configuration) and run `onnxruntime_test_all`; confirm `QDQTransformerTests.Gemm_AlphaNotOne_U8U8U8`, `QDQTransformerTests.Gemm_AlphaNotOne_U8U8U8_FastMath`, and `QDQTransformerTests.Gemm_U8U8U8` pass and the suite no longer hits the ASan OOM.
- Fusion behavior is unchanged: the same `alpha != 1` rejection logic is still exercised, just with a narrower datatype/opset footprint.

## Motivation and Context

The ASan failure is the sanitizer's internal allocator size-class limit (8 GB for `size class 8192`), not a runner RAM cap that can simply be raised. Loosening it via `ASAN_OPTIONS` quarantine tuning would weaken the sanitizer's bug-detection guarantees, so the fix targets the test's memory footprint instead.

### Options considered

1. **`--test_parallel` (reduce CTest concurrency).** Lower the parallelism in the ASan workflow (e.g., `--test_parallel 1`) so fewer test binaries run concurrently. This only addresses cumulative/overlapping process memory; it does **not** reduce the peak memory of a single test, and it slows the CI down for every run. Rejected as a blunt, non-durable workaround.
2. **Shard the ASan tests.** Split `onnxruntime_test_all` into N gtest shards (`GTEST_TOTAL_SHARDS` / `GTEST_SHARD_INDEX`) so the ASan allocator resets between shards. This helps with cumulative growth across the whole binary, but it still does **not** reduce the peak memory of any individual test: if one test alone approaches the limit, sharding the binary will not help. Rejected for the same root-cause reason.
3. **Break the test into smaller tests (chosen).** Isolate the `alpha != 1` coverage into dedicated tests that run a single datatype (uint8/uint8/uint8) at a single opset (19), and remove the alpha cases from the large `Gemm_U8U8U8` matrix. This reduces the work done in the heaviest single test and addresses the peak-memory problem at its source while keeping the same fusion behavior under test.
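The argument against sharding (option 2) can be illustrated with a toy model. Everything here is an assumption for illustration: `ShardStats`, `ShardByIndex`, and `WorstSingleTest` are hypothetical names, the per-test peak values passed in are made up by the caller, summing peaks is only a crude stand-in for cumulative allocator growth, and the round-robin assignment (`i % total_shards`) is a simplification of how a gtest binary distributes tests across shards.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Per-shard summary in the toy model (all values in MB).
struct ShardStats {
  std::size_t total_mb = 0;       // stand-in for cumulative growth in one process
  std::size_t max_single_mb = 0;  // heaviest individual test in the shard
};

// Assigns test i to shard (i % total_shards) and accumulates both metrics.
std::vector<ShardStats> ShardByIndex(const std::vector<std::size_t>& peak_mb,
                                     std::size_t total_shards) {
  assert(total_shards > 0);
  std::vector<ShardStats> shards(total_shards);
  for (std::size_t i = 0; i < peak_mb.size(); ++i) {
    ShardStats& s = shards[i % total_shards];
    s.total_mb += peak_mb[i];
    s.max_single_mb = std::max(s.max_single_mb, peak_mb[i]);
  }
  return shards;
}

// Heaviest single test across all shards; independent of the shard count.
std::size_t WorstSingleTest(const std::vector<ShardStats>& shards) {
  std::size_t worst = 0;
  for (const ShardStats& s : shards) worst = std::max(worst, s.max_single_mb);
  return worst;
}
```

For made-up peaks such as `{500, 9000, 700, 300}`, adding shards lowers each shard's `total_mb`, but `WorstSingleTest` stays at 9000 for any shard count, which is why only shrinking the heavy test itself (option 3) addresses the limit.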
Reference: PR #28131 (merge commit `585273033e`).

## Checklist

- [x] Tests added/updated
- [ ] Documentation updated (if applicable)
- [x] No breaking changes (test-only change)
- [ ] CI passes

1 parent b3e1a9e commit d43a48b

2 files changed

Lines changed: 43 additions & 67 deletions

onnxruntime/test/optimizer/qdq_transformer_fastmath_test.cc

Lines changed: 22 additions & 35 deletions
@@ -324,7 +324,7 @@ TEST(QDQTransformerTests, MatMul_S8S8U8_DisableFastMath) {
 
 template <typename Input1Type, typename Input2Type, typename OutputType, typename BiasType = int32_t>
 void QDQTransformerGemmTests(bool has_output_q, bool has_bias, bool beta_not_one = false,
-                             bool disable_fastmath = false, bool alpha_not_one = false) {
+                             bool disable_fastmath = false, bool alpha_not_one = false, int opset_version = 0) {
   auto test_case = [&](const std::vector<int64_t>& input1_shape, const std::vector<int64_t>& input2_shape,
                        bool use_contrib_qdq = false) {
     auto build_test_case = [&](ModelTestBuilder& builder) {
@@ -435,33 +435,19 @@ void QDQTransformerGemmTests(bool has_output_q, bool has_bias, bool beta_not_one
           kOrtSessionOptionsMlasGemmFastMathArm64Bfloat16, "1"));
     };
 
-    TransformerTester(build_test_case,
-                      check_binary_op_graph,
-                      TransformerLevel::Level1,
-                      TransformerLevel::Level2,
-                      12 /*opset_version*/,
-                      NAN /*per_sample_tolerance*/,
-                      NAN /*relative_per_sample_tolerance*/,
-                      std::make_unique<QDQSelectorActionTransformer>(QDQIsInt8Allowed()),
-                      add_session_options);
-    TransformerTester(build_test_case,
-                      check_binary_op_graph,
-                      TransformerLevel::Level1,
-                      TransformerLevel::Level2,
-                      18 /*opset_version*/,
-                      NAN /*per_sample_tolerance*/,
-                      NAN /*relative_per_sample_tolerance*/,
-                      std::make_unique<QDQSelectorActionTransformer>(QDQIsInt8Allowed()),
-                      add_session_options);
-    TransformerTester(build_test_case,
-                      check_binary_op_graph,
-                      TransformerLevel::Level1,
-                      TransformerLevel::Level2,
-                      19 /*opset_version*/,
-                      NAN /*per_sample_tolerance*/,
-                      NAN /*relative_per_sample_tolerance*/,
-                      std::make_unique<QDQSelectorActionTransformer>(QDQIsInt8Allowed()),
-                      add_session_options);
+    const auto opset_versions = opset_version == 0 ? std::vector<int>{12, 18, 19}
+                                                   : std::vector<int>{opset_version};
+    for (int current_opset_version : opset_versions) {
+      TransformerTester(build_test_case,
+                        check_binary_op_graph,
+                        TransformerLevel::Level1,
+                        TransformerLevel::Level2,
+                        current_opset_version,
+                        NAN /*per_sample_tolerance*/,
+                        NAN /*relative_per_sample_tolerance*/,
+                        std::make_unique<QDQSelectorActionTransformer>(QDQIsInt8Allowed()),
+                        add_session_options);
+    }
 
     if (disable_fastmath) {
       auto add_session_options = [&](SessionOptions& so) {
@@ -498,17 +484,18 @@ void QDQTransformerGemmTests() {
   QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(false, true, true);
   QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(true, false, true);
   QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(true, true, true);
-  if constexpr (std::is_same_v<Input1Type, uint8_t> && std::is_same_v<Input2Type, uint8_t> &&
-                std::is_same_v<OutputType, uint8_t> && std::is_same_v<BiasType, int32_t>) {
-    QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(false, false, false, false, true);
-    QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(false, true, false, false, true);
-    QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(true, false, false, false, true);
-    QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(true, true, false, false, true);
-  }
   // dummy test to disable the fastmath session
   QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(true, true, true, true);
 }
 
+TEST(QDQTransformerTests, Gemm_AlphaNotOne_U8U8U8_FastMath) {
+  constexpr int opset_version = 19;
+  QDQTransformerGemmTests<uint8_t, uint8_t, uint8_t>(false, false, false, false, true, opset_version);
+  QDQTransformerGemmTests<uint8_t, uint8_t, uint8_t>(false, true, false, false, true, opset_version);
+  QDQTransformerGemmTests<uint8_t, uint8_t, uint8_t>(true, false, false, false, true, opset_version);
+  QDQTransformerGemmTests<uint8_t, uint8_t, uint8_t>(true, true, false, false, true, opset_version);
+}
+
 TEST(QDQTransformerTests, Gemm_U8U8U8_FastMath) {
   QDQTransformerGemmTests<uint8_t, uint8_t, uint8_t>();
   QDQTransformerGemmTests<uint8_t, uint8_t, uint8_t, uint8_t>();

onnxruntime/test/optimizer/qdq_transformer_test.cc

Lines changed: 21 additions & 32 deletions
@@ -719,7 +719,7 @@ TEST(QDQTransformerTests, MatMul_S8S8U8) {
 
 template <typename Input1Type, typename Input2Type, typename OutputType, typename BiasType = int32_t>
 void QDQTransformerGemmTests(bool has_output_q, bool has_bias, bool beta_not_one = false,
-                             bool alpha_not_one = false) {
+                             bool alpha_not_one = false, int opset_version = 0) {
   auto test_case = [&](const std::vector<int64_t>& input1_shape, const std::vector<int64_t>& input2_shape,
                        bool use_contrib_qdq = false) {
     auto build_test_case = [&](ModelTestBuilder& builder) {
@@ -825,30 +825,18 @@ void QDQTransformerGemmTests(bool has_output_q, bool has_bias, bool beta_not_one
       }
     };
 
-    TransformerTester(build_test_case,
-                      check_binary_op_graph,
-                      TransformerLevel::Level1,
-                      TransformerLevel::Level2,
-                      12 /*opset_version*/,
-                      0.01 /*per_sample_tolerance*/,
-                      0.01 /*relative_per_sample_tolerance*/,
-                      std::make_unique<QDQSelectorActionTransformer>(QDQIsInt8Allowed()));
-    TransformerTester(build_test_case,
-                      check_binary_op_graph,
-                      TransformerLevel::Level1,
-                      TransformerLevel::Level2,
-                      18 /*opset_version*/,
-                      0.01 /*per_sample_tolerance*/,
-                      0.01 /*relative_per_sample_tolerance*/,
-                      std::make_unique<QDQSelectorActionTransformer>(QDQIsInt8Allowed()));
-    TransformerTester(build_test_case,
-                      check_binary_op_graph,
-                      TransformerLevel::Level1,
-                      TransformerLevel::Level2,
-                      19 /*opset_version*/,
-                      0.01 /*per_sample_tolerance*/,
-                      0.01 /*relative_per_sample_tolerance*/,
-                      std::make_unique<QDQSelectorActionTransformer>(QDQIsInt8Allowed()));
+    const auto opset_versions = opset_version == 0 ? std::vector<int>{12, 18, 19}
+                                                   : std::vector<int>{opset_version};
+    for (int current_opset_version : opset_versions) {
+      TransformerTester(build_test_case,
+                        check_binary_op_graph,
+                        TransformerLevel::Level1,
+                        TransformerLevel::Level2,
+                        current_opset_version,
+                        0.01 /*per_sample_tolerance*/,
+                        0.01 /*relative_per_sample_tolerance*/,
+                        std::make_unique<QDQSelectorActionTransformer>(QDQIsInt8Allowed()));
+    }
   };
 
   test_case({2, 2}, {2, 4});
@@ -868,13 +856,14 @@ void QDQTransformerGemmTests() {
   QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(false, true, true);
   QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(true, false, true);
   QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(true, true, true);
-  if constexpr (std::is_same_v<Input1Type, uint8_t> && std::is_same_v<Input2Type, uint8_t> &&
-                std::is_same_v<OutputType, uint8_t> && std::is_same_v<BiasType, int32_t>) {
-    QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(false, false, false, true);
-    QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(false, true, false, true);
-    QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(true, false, false, true);
-    QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(true, true, false, true);
-  }
+}
+
+TEST(QDQTransformerTests, Gemm_AlphaNotOne_U8U8U8) {
+  constexpr int opset_version = 19;
+  QDQTransformerGemmTests<uint8_t, uint8_t, uint8_t>(false, false, false, true, opset_version);
+  QDQTransformerGemmTests<uint8_t, uint8_t, uint8_t>(false, true, false, true, opset_version);
+  QDQTransformerGemmTests<uint8_t, uint8_t, uint8_t>(true, false, false, true, opset_version);
+  QDQTransformerGemmTests<uint8_t, uint8_t, uint8_t>(true, true, false, true, opset_version);
 }
 
 TEST(QDQTransformerTests, Gemm_U8U8U8) {
