Fix ASan OOM in QDQ Gemm transformer tests (#28797)

tianleiwu · web-flow · commit d43a48bdbb50 · 2026-06-04T23:43:06.000Z
# PR: Fix ASan OOM in QDQ Gemm transformer tests

## Description

PR #28131 ("Reject QDQ Gemm→QGemm fusion when alpha != 1 with bias") added `alpha_not_one` coverage to the QDQ Gemm fusion tests. This multiplied the number of `TransformerTester` session builds inside the already-large `Gemm_U8U8U8` test matrix and pushed the AddressSanitizer (ASan) build of `onnxruntime_test_all` over its allocator limit, causing the `windows_x64_asan` CI to fail with `AddressSanitizer: Out of memory. The process has exhausted 8192MB for size class 8192`.

This PR isolates the `alpha != 1` coverage into small, dedicated tests so the peak memory of any single test is reduced.

## Summary of Changes

| File | Change |
|------|--------|
| `onnxruntime/test/optimizer/qdq_transformer_test.cc` | Added an `opset_version` parameter to `QDQTransformerGemmTests` (default `0` = run opsets 12/18/19); replaced the three hardcoded `TransformerTester` calls with a loop over the selected opset(s); removed the inline `alpha_not_one` block from the templated `QDQTransformerGemmTests()`; added a dedicated `TEST(QDQTransformerTests, Gemm_AlphaNotOne_U8U8U8)` that runs only uint8/uint8/uint8 at opset 19. |
| `onnxruntime/test/optimizer/qdq_transformer_fastmath_test.cc` | Same refactor for the fastmath variant; added `TEST(QDQTransformerTests, Gemm_AlphaNotOne_U8U8U8_FastMath)` running only uint8/uint8/uint8 at opset 19. |

The net effect is that the incremental `alpha_not_one` work added by #28131 drops from 24 session builds (4 alpha variants × 3 opsets, in each of the regular and fastmath files) to 8, and is no longer part of the large `Gemm_U8U8U8` matrix test, directly lowering the peak memory consumed in a single test.

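
The 24 → 8 figure can be checked with a small compile-time sketch. It follows the PR description's own accounting (one build per alpha case, per opset, per test file). The helper name `AlphaCaseBuilds` and its parameters are hypothetical and are not part of the onnxruntime test code.

```cpp
// Hypothetical helper, not part of onnxruntime: counts the alpha != 1
// session builds using the accounting from the PR description.
constexpr int AlphaCaseBuilds(int alpha_cases, int opsets, int test_files) {
  return alpha_cases * opsets * test_files;
}

// Before: 4 alpha cases x 3 opsets (12, 18, 19) x 2 files (regular, fastmath).
static_assert(AlphaCaseBuilds(4, 3, 2) == 24, "builds added by #28131");

// After: the same 4 cases at opset 19 only, still in both files.
static_assert(AlphaCaseBuilds(4, 1, 2) == 8, "builds after this PR");
```

Because the checks are `static_assert`s, the snippet verifies the arithmetic as soon as it compiles; nothing needs to run.
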
## Testing

- Build with `--enable_address_sanitizer` (the `windows_x64_asan` configuration) and run `onnxruntime_test_all`; confirm `QDQTransformerTests.Gemm_AlphaNotOne_U8U8U8`, `QDQTransformerTests.Gemm_AlphaNotOne_U8U8U8_FastMath`, and `QDQTransformerTests.Gemm_U8U8U8` pass and the suite no longer hits the ASan OOM.
- Fusion behavior is unchanged: the same `alpha != 1` rejection logic is still exercised, just with a narrower datatype/opset footprint.

## Motivation and Context

The ASan failure is the sanitizer's internal allocator size-class limit (8 GB for `size class 8192`), not a runner RAM cap that can simply be raised. Loosening it via `ASAN_OPTIONS` quarantine tuning would weaken the sanitizer's bug-detection guarantees, so the fix targets the test's memory footprint instead.

### Options considered

1. **`--test_parallel` (reduce CTest concurrency).** Lower the parallelism in the ASan workflow (e.g., `--test_parallel 1`) so fewer test binaries run concurrently. This only addresses cumulative/overlapping process memory; it does **not** reduce the peak memory of a single test, and it slows the CI down for every run. Rejected as a blunt, non-durable workaround.
2. **Shard the ASan tests.** Split `onnxruntime_test_all` into N gtest shards (`GTEST_TOTAL_SHARDS` / `GTEST_SHARD_INDEX`) so the ASan allocator resets between shards. This helps with cumulative growth across the whole binary, but it still does **not** reduce the peak memory of any individual test: if one test alone approaches the limit, sharding the binary will not help. Rejected for the same root-cause reason.
3. **Break the test into smaller tests (chosen).** Isolate the `alpha != 1` coverage into dedicated tests that run a single datatype (uint8/uint8/uint8) at a single opset (19), and remove the alpha cases from the large `Gemm_U8U8U8` matrix. This reduces the work done in the heaviest single test and addresses the peak-memory problem at its source while keeping the same fusion behavior under test.

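
The reasoning behind rejecting option 2 can be illustrated with a toy model. The sketch below is not onnxruntime or GoogleTest code: the function names and the per-test peak numbers are made up, and round-robin assignment by test index is a simplifying assumption about how tests are spread across shards. It shows that however many shards are used, the heaviest single test still lands in some shard with its full peak.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Peak memory (MB) of the heaviest test that lands in the given shard.
// Assumption: the i-th test is assigned to shard (i % total_shards).
inline int HeaviestTestInShard(const std::vector<int>& peak_mb_per_test,
                               std::size_t total_shards, std::size_t shard_index) {
  int heaviest = 0;
  for (std::size_t i = 0; i < peak_mb_per_test.size(); ++i) {
    if (i % total_shards == shard_index) {
      heaviest = std::max(heaviest, peak_mb_per_test[i]);
    }
  }
  return heaviest;
}

// Largest single-test peak observed across all shards.
inline int WorstSingleTestPeak(const std::vector<int>& peak_mb_per_test,
                               std::size_t total_shards) {
  int worst = 0;
  for (std::size_t s = 0; s < total_shards; ++s) {
    worst = std::max(worst, HeaviestTestInShard(peak_mb_per_test, total_shards, s));
  }
  return worst;
}

inline void DemoShardingKeepsPeak() {
  // Made-up per-test peaks in MB; 9000 stands in for one test that alone
  // exceeds an 8192 MB limit.
  const std::vector<int> peaks = {300, 9000, 120, 450};
  for (std::size_t shards = 1; shards <= 4; ++shards) {
    // The worst single-test peak is the same for every shard count.
    assert(WorstSingleTestPeak(peaks, shards) == 9000);
  }
}
```

Splitting the heavy test itself, as option 3 does, is the only change in this model that lowers the 9000 entry.
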
Reference: PR #28131 (merge commit `585273033e`).

## Checklist

- [x] Tests added/updated
- [ ] Documentation updated (if applicable)
- [x] No breaking changes (test-only change)
- [ ] CI passes

diff --git a/onnxruntime/test/optimizer/qdq_transformer_fastmath_test.cc b/onnxruntime/test/optimizer/qdq_transformer_fastmath_test.cc
@@ -324,7 +324,7 @@ TEST(QDQTransformerTests, MatMul_S8S8U8_DisableFastMath) {
 
 template <typename Input1Type, typename Input2Type, typename OutputType, typename BiasType = int32_t>
 void QDQTransformerGemmTests(bool has_output_q, bool has_bias, bool beta_not_one = false,
-                             bool disable_fastmath = false, bool alpha_not_one = false) {
+                             bool disable_fastmath = false, bool alpha_not_one = false, int opset_version = 0) {
   auto test_case = [&](const std::vector<int64_t>& input1_shape, const std::vector<int64_t>& input2_shape,
                        bool use_contrib_qdq = false) {
     auto build_test_case = [&](ModelTestBuilder& builder) {
@@ -435,33 +435,19 @@ void QDQTransformerGemmTests(bool has_output_q, bool has_bias, bool beta_not_one
           kOrtSessionOptionsMlasGemmFastMathArm64Bfloat16, "1"));
     };
 
-    TransformerTester(build_test_case,
-                      check_binary_op_graph,
-                      TransformerLevel::Level1,
-                      TransformerLevel::Level2,
-                      12 /*opset_version*/,
-                      NAN /*per_sample_tolerance*/,
-                      NAN /*relative_per_sample_tolerance*/,
-                      std::make_unique<QDQSelectorActionTransformer>(QDQIsInt8Allowed()),
-                      add_session_options);
-    TransformerTester(build_test_case,
-                      check_binary_op_graph,
-                      TransformerLevel::Level1,
-                      TransformerLevel::Level2,
-                      18 /*opset_version*/,
-                      NAN /*per_sample_tolerance*/,
-                      NAN /*relative_per_sample_tolerance*/,
-                      std::make_unique<QDQSelectorActionTransformer>(QDQIsInt8Allowed()),
-                      add_session_options);
-    TransformerTester(build_test_case,
-                      check_binary_op_graph,
-                      TransformerLevel::Level1,
-                      TransformerLevel::Level2,
-                      19 /*opset_version*/,
-                      NAN /*per_sample_tolerance*/,
-                      NAN /*relative_per_sample_tolerance*/,
-                      std::make_unique<QDQSelectorActionTransformer>(QDQIsInt8Allowed()),
-                      add_session_options);
+    const auto opset_versions = opset_version == 0 ? std::vector<int>{12, 18, 19}
+                                                   : std::vector<int>{opset_version};
+    for (int current_opset_version : opset_versions) {
+      TransformerTester(build_test_case,
+                        check_binary_op_graph,
+                        TransformerLevel::Level1,
+                        TransformerLevel::Level2,
+                        current_opset_version,
+                        NAN /*per_sample_tolerance*/,
+                        NAN /*relative_per_sample_tolerance*/,
+                        std::make_unique<QDQSelectorActionTransformer>(QDQIsInt8Allowed()),
+                        add_session_options);
+    }
 
     if (disable_fastmath) {
       auto add_session_options = [&](SessionOptions& so) {
@@ -498,17 +484,18 @@ void QDQTransformerGemmTests() {
   QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(false, true, true);
   QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(true, false, true);
   QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(true, true, true);
-  if constexpr (std::is_same_v<Input1Type, uint8_t> && std::is_same_v<Input2Type, uint8_t> &&
-                std::is_same_v<OutputType, uint8_t> && std::is_same_v<BiasType, int32_t>) {
-    QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(false, false, false, false, true);
-    QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(false, true, false, false, true);
-    QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(true, false, false, false, true);
-    QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(true, true, false, false, true);
-  }
   // dummy test to disable the fastmath session
   QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(true, true, true, true);
 }
 
+TEST(QDQTransformerTests, Gemm_AlphaNotOne_U8U8U8_FastMath) {
+  constexpr int opset_version = 19;
+  QDQTransformerGemmTests<uint8_t, uint8_t, uint8_t>(false, false, false, false, true, opset_version);
+  QDQTransformerGemmTests<uint8_t, uint8_t, uint8_t>(false, true, false, false, true, opset_version);
+  QDQTransformerGemmTests<uint8_t, uint8_t, uint8_t>(true, false, false, false, true, opset_version);
+  QDQTransformerGemmTests<uint8_t, uint8_t, uint8_t>(true, true, false, false, true, opset_version);
+}
+
 TEST(QDQTransformerTests, Gemm_U8U8U8_FastMath) {
   QDQTransformerGemmTests<uint8_t, uint8_t, uint8_t>();
   QDQTransformerGemmTests<uint8_t, uint8_t, uint8_t, uint8_t>();
diff --git a/onnxruntime/test/optimizer/qdq_transformer_test.cc b/onnxruntime/test/optimizer/qdq_transformer_test.cc
@@ -719,7 +719,7 @@ TEST(QDQTransformerTests, MatMul_S8S8U8) {
 
 template <typename Input1Type, typename Input2Type, typename OutputType, typename BiasType = int32_t>
 void QDQTransformerGemmTests(bool has_output_q, bool has_bias, bool beta_not_one = false,
-                             bool alpha_not_one = false) {
+                             bool alpha_not_one = false, int opset_version = 0) {
   auto test_case = [&](const std::vector<int64_t>& input1_shape, const std::vector<int64_t>& input2_shape,
                        bool use_contrib_qdq = false) {
     auto build_test_case = [&](ModelTestBuilder& builder) {
@@ -825,30 +825,18 @@ void QDQTransformerGemmTests(bool has_output_q, bool has_bias, bool beta_not_one
       }
     };
 
-    TransformerTester(build_test_case,
-                      check_binary_op_graph,
-                      TransformerLevel::Level1,
-                      TransformerLevel::Level2,
-                      12 /*opset_version*/,
-                      0.01 /*per_sample_tolerance*/,
-                      0.01 /*relative_per_sample_tolerance*/,
-                      std::make_unique<QDQSelectorActionTransformer>(QDQIsInt8Allowed()));
-    TransformerTester(build_test_case,
-                      check_binary_op_graph,
-                      TransformerLevel::Level1,
-                      TransformerLevel::Level2,
-                      18 /*opset_version*/,
-                      0.01 /*per_sample_tolerance*/,
-                      0.01 /*relative_per_sample_tolerance*/,
-                      std::make_unique<QDQSelectorActionTransformer>(QDQIsInt8Allowed()));
-    TransformerTester(build_test_case,
-                      check_binary_op_graph,
-                      TransformerLevel::Level1,
-                      TransformerLevel::Level2,
-                      19 /*opset_version*/,
-                      0.01 /*per_sample_tolerance*/,
-                      0.01 /*relative_per_sample_tolerance*/,
-                      std::make_unique<QDQSelectorActionTransformer>(QDQIsInt8Allowed()));
+    const auto opset_versions = opset_version == 0 ? std::vector<int>{12, 18, 19}
+                                                   : std::vector<int>{opset_version};
+    for (int current_opset_version : opset_versions) {
+      TransformerTester(build_test_case,
+                        check_binary_op_graph,
+                        TransformerLevel::Level1,
+                        TransformerLevel::Level2,
+                        current_opset_version,
+                        0.01 /*per_sample_tolerance*/,
+                        0.01 /*relative_per_sample_tolerance*/,
+                        std::make_unique<QDQSelectorActionTransformer>(QDQIsInt8Allowed()));
+    }
   };
 
   test_case({2, 2}, {2, 4});
@@ -868,13 +856,14 @@ void QDQTransformerGemmTests() {
   QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(false, true, true);
   QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(true, false, true);
   QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(true, true, true);
-  if constexpr (std::is_same_v<Input1Type, uint8_t> && std::is_same_v<Input2Type, uint8_t> &&
-                std::is_same_v<OutputType, uint8_t> && std::is_same_v<BiasType, int32_t>) {
-    QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(false, false, false, true);
-    QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(false, true, false, true);
-    QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(true, false, false, true);
-    QDQTransformerGemmTests<Input1Type, Input2Type, OutputType, BiasType>(true, true, false, true);
-  }
+}
+
+TEST(QDQTransformerTests, Gemm_AlphaNotOne_U8U8U8) {
+  constexpr int opset_version = 19;
+  QDQTransformerGemmTests<uint8_t, uint8_t, uint8_t>(false, false, false, true, opset_version);
+  QDQTransformerGemmTests<uint8_t, uint8_t, uint8_t>(false, true, false, true, opset_version);
+  QDQTransformerGemmTests<uint8_t, uint8_t, uint8_t>(true, false, false, true, opset_version);
+  QDQTransformerGemmTests<uint8_t, uint8_t, uint8_t>(true, true, false, true, opset_version);
 }
 
 TEST(QDQTransformerTests, Gemm_U8U8U8) {