Bump iOS XCTest timeout for ExecuTorchLLMTests (#19354)

psiddh · web-flow · commit 0ee31fc1e928 · 2026-05-07T09:14:24.000-07:00
Summary:
The 13 XCTestCase methods in
`xplat/executorch/extension/llm/apple:ExecuTorchLLMTests`
(including testLLaMA, testPhi4, testGemma, testLLaVA, testVoxtral,
and their reset variants) regularly hit the 1800-second per-test ceiling
enforced by `fbobjc/Tools/xctest_runner` for the `long_running`
label. LLM inference on iOS-sim CPU (1B-class models,
128-768 token sequences, each test calls `generate()` twice)
routinely exceeds 30 minutes per test method, producing spurious
"Test timed out after 1800 seconds" flakes on the test-issues
dashboard for owner `ai_infra_mobile_platform`.

Per the runner formula
`TEST_CASE_TIMEOUT(60s) * label_multiplier * 3`:

| label          | multiplier | per-XCTestCase budget |
|----------------|-----------:|----------------------:|
| long_running   |        x10 |                 1800s |
| glacial (here) |        x30 |                 5400s |
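
As a quick sanity check of the table, the runner formula can be
reproduced in a few lines of Python. This is a minimal sketch, not the
runner's code. The constant names and the `per_test_budget_s` helper
are hypothetical, and only the two labels quoted above are modeled.

```python
# Hypothetical re-statement of the formula quoted above:
#   TEST_CASE_TIMEOUT(60s) * label_multiplier * 3
TEST_CASE_TIMEOUT_S = 60
FIXED_FACTOR = 3  # constant from the quoted formula; its meaning is not described here

# Only the two labels from the table are modeled.
LABEL_MULTIPLIERS = {
    "long_running": 10,
    "glacial": 30,
}


def per_test_budget_s(label: str) -> int:
    """Return the per-XCTestCase budget, in seconds, for a test label."""
    return TEST_CASE_TIMEOUT_S * LABEL_MULTIPLIERS[label] * FIXED_FACTOR


for label in LABEL_MULTIPLIERS:
    secs = per_test_budget_s(label)
    print(f"{label}: {secs}s ({secs // 60} min)")
# long_running: 1800s (30 min)
# glacial: 5400s (90 min)
```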

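Switching to `glacial` (the highest tier supported by the runner)
gives each test 90 minutes. Adding
`test_test_rule_timeout_ms = 14400000` (14,400,000 ms = 4h) sets the
rule-level wall-clock budget for the whole auto-generated test bundle
(all 13 XCTestCase methods), including xctest setup/teardown overhead.
This is a total bundle/shard cap, not a multiple of the per-test
budget: 4h holds fewer than three test cases running to the full
90-minute ceiling.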
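
The bundle-level numbers can be checked the same way. This sketch uses
hypothetical variable names, with values taken from the diff and the
table above. It converts `test_test_rule_timeout_ms` to hours and
compares it with the 90-minute per-test ceiling.

```python
# Values from the diff and the summary; variable names are hypothetical.
RULE_TIMEOUT_MS = 14_400_000  # test_test_rule_timeout_ms
PER_TEST_BUDGET_S = 5400      # glacial per-XCTestCase budget
NUM_TEST_METHODS = 13         # XCTestCase methods in ExecuTorchLLMTests

rule_timeout_s = RULE_TIMEOUT_MS // 1000
rule_timeout_h = rule_timeout_s / 3600
print(f"rule timeout: {rule_timeout_s}s = {rule_timeout_h:.1f}h")

# How many methods could run all the way to the per-test ceiling
# before the bundle-level budget is exhausted?
full_budget_methods = rule_timeout_s // PER_TEST_BUDGET_S
print(f"methods fitting at the full 90-min ceiling: {full_budget_methods}")

# Even share of the bundle budget per method, ignoring setup/teardown.
avg_per_method_min = rule_timeout_s / NUM_TEST_METHODS / 60
print(f"even share per method: {avg_per_method_min:.1f} min")
```

Because only two methods fit at the full 90-minute ceiling, the 4h value
acts as a cap on the total bundle/shard run rather than as the sum of
per-test budgets.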

Note: this diff is unrelated to T269848646. T269848646 tracks a
separate cluster of 446 iOS-sim test-run *cancellations*
(`duration: 0.00`, "test execution was cancelled because the test
run was cancelled") that is owned by testinfra and is not
addressed here.

Reviewed By: shoumikhin

Differential Revision: D104147313
diff --git a/extension/llm/apple/BUCK b/extension/llm/apple/BUCK
@@ -16,7 +16,17 @@ non_fbcode_target(_kind = fb_apple_library,
     ],
     sdks = IOS,
     visibility = EXECUTORCH_CLIENTS,
-    test_labels = ["long_running"],
+    # `glacial` raises the per-XCTestCase timeout from 1800s -> 5400s (90 min)
+    # via fbobjc/Tools/xctest_runner: TEST_CASE_TIMEOUT(60s) * 30 * 3.
+    # Required because LLM inference (LLaMA, Phi4, Gemma, LLaVA, Voxtral)
+    # on iOS-sim CPU regularly exceeds 30 minutes per test method.
+    test_labels = ["glacial"],
+    # Rule-level wall-clock for the whole auto-generated test bundle:
+    # ExecuTorchLLMTests currently contains 13 XCTestCase methods, and
+    # individual methods can exceed 30 minutes on iOS-sim CPU. This 4h
+    # budget is intended as the total bundle/shard wall-clock, including
+    # xctest setup/teardown overhead; it is not based on "5 testcases".
+    test_test_rule_timeout_ms = 14400000,
     test_deps = [
         ":ExecuTorchLLMTestResource",
         "//xplat/executorch/backends/xnnpack:xnnpack_backendApple",