Commit 8de1414

fix(test-plans): mitigate scheduled e2e-autotest flakiness (#1622)
* fix(test-plans): mitigate scheduled e2e-autotest flakiness

  Triage of the last 8 scheduled e2e-autotest runs identified three failure categories: a real plan bug, LLM screenshot-based false downgrades, and real timing flakes. This change addresses all three.

  Category A: real plan bug
  * java-pack-help-center-webview was missing vscjava.vscode-java-pack from setup.extensions. On scheduled runs (no PR VSIX) java.welcome was unregistered and the open-help-center step silently timed out. This was the #1 failure across the last 8 nightly runs (7/8). Now installs the pack from the marketplace on schedule runs while still letting --vsix override on PR runs.

  Category B: LLM downgrade noise on ls-ready
  * Add skipLlmVerify: true (introduced in @vscjava/vscode-autotest 0.7.5) to every ls-ready step that has no structured verify* field. The waitForLanguageServer action is itself the authoritative deterministic check; the LLM was downgrading these whenever the status bar still showed background indexing ("Java: Searching... 0%"), even though the LS was fully functional.
    Affected: java-dependency-viewer, java-extension-pack, java-fresh-import, java-maven-resolve-type, java-maven, java-new-file-snippet, java-single-file, java-webview-migration.

  Category C: real timing flakes
  * java-test-runner: bump wait-test-discovery from 45s to 90s (the vscode-java-test discovery scan can take longer than 45s on a cold cache) and add retries: 1 to run-all-tests so a discovery-still-warming first invocation can retry.
  * java-maven-resolve-type: add retries: 1 to save-after-resolve so a slow Maven re-import on a cold cache (where the LS hasn't yet republished zero-errors at the time of save) can retry instead of failing the plan.

  Plans whose flaky steps already carry a structured verify* field (e.g. verify-completion with verifyCompletion: { notEmpty: true }, save-after-organize with verifyFile, verify-help-center-content with verifyWebview) no longer need plan changes because the framework auto-skip in @vscjava/vscode-autotest 0.7.5 already short-circuits the LLM re-check whenever any structured verifier is present.

* fix(test-plans): restrict skipLlmVerify to G1/G4; add retries for cold-cache flakes

  Reverts the over-broad framework auto-skip (any structured verify -> no LLM) that was landed in autotest v0.7.5/0.7.6. LLM screenshot verification is the anti-silent-pass safety net and must stay enabled on steps where the screenshot carries unique signal (popup visibility, decoration lag, panel content).

  Final policy:
  - skipLlmVerify=true on Group 1 (16 ls-ready steps): waitForLanguageServer polls the same status bar text the LLM would read, so LLM adds zero signal.
  - skipLlmVerify=true on Group 4 (3 disk-write steps: save-after-organize, add-gson-dependency, create-formatter-profile): action mutates a file not open in any editor; before/after screenshots are by-design identical and LLM always downgrades. verifyFile from disk is the authoritative signal.
  - retries: 1 on 8 verify-completion steps to mitigate cold-cache 'Loading...' LLM downgrades while keeping the screenshot check enabled.
  - retries: 1 on java-maven-resolve-type save-after-resolve (kept from prior commit) for Maven indexer warm-up.
  - Wait bump 45 -> 90s on java-test-runner wait-test-discovery (kept).
  - java-pack-help-center-webview setup.extensions hard-requires java-pack (kept); fixes the real bug (5/8 failures).

  LLM coverage preserved on verify-completion (popup visibility), verifyEditor (guards against page-wide DOM stale-tab fallback), verifyProblems (diagnostics red squiggle lag) and verifyWebview (visual rendering).

  Requires autotest >= 0.7.7 to honor skipLlmVerify without the auto-skip side effect.

* ci: retrigger E2E AutoTest against @vscjava/vscode-autotest@0.7.8

  The 0.7.7 release did not actually honor skipLlmVerify because planParser dropped the field on deserialize. 0.7.8 contains the parser fix; this empty commit restarts CI so the matrix installs the correct version.
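The 0.7.7 regression described in the last commit is a common failure mode for parsers that copy only a fixed set of known keys. The following is a minimal Python sketch of that failure mode and its fix. It is not the planParser code, which this commit does not show; `KNOWN_FIELDS_BUGGY`, `KNOWN_FIELDS_FIXED`, and `parse_step` are hypothetical names chosen for illustration.

```python
# Hypothetical illustration: an allow-list parser silently drops any key it
# does not know about, so a flag set in the plan never reaches the runner.
KNOWN_FIELDS_BUGGY = {"id", "action", "verify", "timeout", "retries"}
KNOWN_FIELDS_FIXED = KNOWN_FIELDS_BUGGY | {"skipLlmVerify"}


def parse_step(raw, known_fields):
    """Copy only the recognised keys from a raw step mapping."""
    return {key: value for key, value in raw.items() if key in known_fields}


raw_step = {
    "id": "ls-ready",
    "action": "waitForLanguageServer",
    "skipLlmVerify": True,
    "timeout": 120,
}

buggy = parse_step(raw_step, KNOWN_FIELDS_BUGGY)
fixed = parse_step(raw_step, KNOWN_FIELDS_FIXED)

# The buggy parser loses the flag, so a runner would fall back to its default.
print(buggy.get("skipLlmVerify", False))  # False
print(fixed.get("skipLlmVerify", False))  # True
```

A round-trip assertion of this kind (set the flag in a plan, parse it, check that it survives) is one way such a regression could be caught before release.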
1 parent c429a05 commit 8de1414

17 files changed

Lines changed: 76 additions & 1 deletion

test-plans/java-basic-editing.yaml

Lines changed: 5 additions & 0 deletions
@@ -48,6 +48,7 @@ steps:
   - id: "ls-ready"
     action: "waitForLanguageServer"
     verify: "Java extension has activated and the simple-app project tree is visible in the Explorer sidebar"
+    skipLlmVerify: true # waitForLanguageServer reads the same status bar text the LLM would inspect
     verifyProblems:
       errors: 2
     timeout: 120
@@ -140,6 +141,10 @@ steps:
       path: "~/src/app/App.java"
       contains: "import java.io.File"
     timeout: 10
+    # Save All has no on-screen change (tab dirty dot clears on the saved
+    # tab but the active editor isn't the saved file). LLM downgrades on
+    # before==after by-design.
+    skipLlmVerify: true

   # ── Step 8: Rename Symbol (F2) ──────────────────────────────
   - id: "close-all-before-rename"
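This file shows both cases the final policy allows: a waitForLanguageServer step (Group 1) and a disk-write step (Group 4). A small lint over parsed plan steps could keep the flag from spreading beyond those groups. The sketch below assumes steps are already loaded as dictionaries. `skip_allowed` and `audit_skip_llm_verify` are hypothetical helpers, not part of @vscjava/vscode-autotest, and every action string other than `waitForLanguageServer` is a placeholder.

```python
# Step ids named in the commit message as Group 4 (disk-write steps).
DISK_WRITE_STEP_IDS = {
    "save-after-organize",
    "add-gson-dependency",
    "create-formatter-profile",
}


def skip_allowed(step):
    """Return True if the stated policy permits skipLlmVerify on this step."""
    if step.get("action") == "waitForLanguageServer":  # Group 1
        return True
    return step.get("id") in DISK_WRITE_STEP_IDS  # Group 4


def audit_skip_llm_verify(steps):
    """Return the ids of steps that set skipLlmVerify outside the policy."""
    return [
        step.get("id")
        for step in steps
        if step.get("skipLlmVerify") is True and not skip_allowed(step)
    ]


# Placeholder plan data; only the ids and the flag matter for the audit.
steps = [
    {"id": "ls-ready", "action": "waitForLanguageServer", "skipLlmVerify": True},
    {"id": "save-after-organize", "action": "placeholder", "skipLlmVerify": True},
    {"id": "verify-completion", "action": "placeholder", "skipLlmVerify": True},
    {"id": "open-app", "action": "placeholder"},
]
print(audit_skip_llm_verify(steps))  # ['verify-completion']
```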

test-plans/java-debugger.yaml

Lines changed: 1 addition & 0 deletions
@@ -44,6 +44,7 @@ steps:
     verify: "Java workspace has loaded; Problems panel shows no errors"
     verifyProblems:
       errors: 0
+    skipLlmVerify: true # waitForLanguageServer is authoritative; LLM only sees the same status bar
     timeout: 120

   # ── Open App.java ────────────────────────────────────────

test-plans/java-dependency-viewer.yaml

Lines changed: 5 additions & 0 deletions
@@ -28,6 +28,11 @@ steps:
   - id: "ls-ready"
     action: "waitForLanguageServer"
     verify: "Java workspace has loaded; Explorer shows the project tree and Problems panel is settled"
+    # waitForLanguageServer is the authoritative deterministic check — the
+    # status bar can still flicker "Java: Searching... 0%" for background
+    # indexing right after the LS reports ready, which has historically
+    # caused LLM screenshot downgrades. Skip LLM here.
+    skipLlmVerify: true
     timeout: 120

   # ── Open dependency view ─────────────────────────────────
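The comment in this hunk contrasts a deterministic readiness check with a screenshot judgement. As a rough illustration of the deterministic side, the sketch below polls a readiness predicate until it holds or a timeout elapses. It does not reproduce how waitForLanguageServer is implemented; `wait_until` and its injectable `clock` and `sleep` parameters are assumptions made so the example runs without real delays.

```python
import time


def wait_until(predicate, timeout_s, interval_s=0.5,
               clock=time.monotonic, sleep=time.sleep):
    """Poll predicate until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Simulated readiness signal: not ready on the first two polls, ready on the third.
signals = iter([False, False, True])
ready = wait_until(lambda: next(signals), timeout_s=120, interval_s=0)
print(ready)  # True
```

The outcome depends only on the readiness signal, so transient status bar text that appears after the signal flips cannot change the verdict.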

test-plans/java-extension-pack.yaml

Lines changed: 3 additions & 0 deletions
@@ -30,6 +30,9 @@ steps:
   - id: "ls-ready"
     action: "waitForLanguageServer"
     verify: "Java workspace has loaded; Explorer shows the project tree and Problems panel is settled"
+    # waitForLanguageServer is the authoritative deterministic check —
+    # status-bar background indexing can cause spurious LLM downgrades.
+    skipLlmVerify: true
     timeout: 120

   # ── Trigger Classpath configuration command ──────────────

test-plans/java-fresh-import.yaml

Lines changed: 6 additions & 0 deletions
@@ -45,6 +45,9 @@ steps:
   - id: "ls-ready"
     action: "waitForLanguageServer"
     verify: "spring-petclinic project has been imported; Java extension is activated and ready for editing"
+    # waitForLanguageServer is authoritative — skip LLM screenshot re-check
+    # (status bar background indexing causes false downgrades).
+    skipLlmVerify: true
     timeout: 300

   # ── Verify completion ────────────────────────────────────
@@ -67,3 +70,6 @@ steps:
     verifyCompletion:
       notEmpty: true
     waitBefore: 5
+    # LLM may downgrade if it sees "Loading..." spinner on cold cache; retry
+    # gives the LS a warmed cache so the popup is fully rendered.
+    retries: 1
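The `retries: 1` setting above trades one extra attempt for keeping the screenshot check enabled. A minimal sketch of that retry behaviour follows, assuming a step runner that returns a verdict string. `run_with_retries` and the verdict values `"pass"` and `"downgraded"` are hypothetical and may not match the framework's actual runner or vocabulary.

```python
def run_with_retries(run_step, retries):
    """Run a step, re-running it up to `retries` times if it does not pass.

    Returns (verdict, attempts) for the last attempt made.
    """
    attempts = 0
    verdict = "fail"
    for _ in range(retries + 1):
        attempts += 1
        verdict = run_step()
        if verdict == "pass":
            break
    return verdict, attempts


# Simulated cold cache: the first attempt is downgraded, the second passes.
outcomes = iter(["downgraded", "pass"])
print(run_with_retries(lambda: next(outcomes), retries=1))  # ('pass', 2)
```

With `retries=1` a step that is downgraded twice still fails, so a persistent problem is reported rather than masked.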

test-plans/java-gradle-java25.yaml

Lines changed: 3 additions & 0 deletions
@@ -36,6 +36,7 @@ steps:
     verifyProblems:
       errors: 0
     timeout: 300
+    skipLlmVerify: true # waitForLanguageServer is authoritative; LLM only sees the same status bar

   # ── Step 2: Open Java file ───────────────────────────────
   # wiki: "Open Foo.java, make sure the editing experience is correctly working"
@@ -54,6 +55,8 @@ steps:
     verifyCompletion:
       notEmpty: true
     waitBefore: 8
+    # Retry once on cold-cache "Loading..." LLM downgrades.
+    retries: 1

   # ── Step 4: Verify editing ────────────────────────────────
   - id: "goto-line"

test-plans/java-gradle.yaml

Lines changed: 1 addition & 0 deletions
@@ -36,6 +36,7 @@ steps:
     verifyProblems:
       errors: 0
     timeout: 300
+    skipLlmVerify: true # waitForLanguageServer is authoritative; LLM only sees the same status bar

   # ── Step 2: Open Foo.java and verify editing experience ─
   # wiki: "Open Foo.java file, make sure the editing experience

test-plans/java-maven-java25.yaml

Lines changed: 3 additions & 0 deletions
@@ -34,6 +34,7 @@ steps:
     verifyProblems:
       errors: 0
     timeout: 180
+    skipLlmVerify: true # waitForLanguageServer is authoritative; LLM only sees the same status bar

   # ── Step 2: Open Java file ───────────────────────────────
   # wiki: "Open Bar.java, make sure the editing experience is correctly working"
@@ -52,6 +53,8 @@ steps:
     verifyCompletion:
       notEmpty: true
     waitBefore: 8
+    # Retry once on cold-cache "Loading..." LLM downgrades.
+    retries: 1

   # ── Step 4: Verify editing ────────────────────────────────
   - id: "goto-line"

test-plans/java-maven-multimodule.yaml

Lines changed: 5 additions & 0 deletions
@@ -32,6 +32,7 @@ steps:
     action: "waitForLanguageServer"
     verify: "Multimodule Maven workspace has loaded; the Java extension is initialized for the project with module1 and module2 visible in the Explorer (the Problems panel may briefly show diagnostics that are still being recomputed after import — the verifyProblems checks below pin the final state)"
     timeout: 180
+    skipLlmVerify: true # waitForLanguageServer is authoritative; LLM only sees the same status bar

   # ── Step 2: Verify module1 Foo.java ──────────────────────
   # wiki: "make sure the editing experience is correctly working
@@ -48,6 +49,8 @@ steps:
     verifyCompletion:
       notEmpty: true
     waitBefore: 8
+    # Retry once on cold-cache "Loading..." LLM downgrades.
+    retries: 1

   # Close module1's tab first so the next `open file Foo.java` request
   # disambiguates to module2/Foo.java rather than re-focusing the already-
@@ -69,3 +72,5 @@ steps:
     verifyCompletion:
       notEmpty: true
     waitBefore: 8
+    # Retry once on cold-cache "Loading..." LLM downgrades.
+    retries: 1

test-plans/java-maven-resolve-type.yaml

Lines changed: 10 additions & 0 deletions
@@ -49,6 +49,8 @@ steps:
     action: "waitForLanguageServer"
     verify: "maven-resolve-type project has been imported; the Java extension is activated and pom.xml is visible in the Explorer"
     timeout: 180
+    # waitForLanguageServer is authoritative — skip LLM screenshot re-check.
+    skipLlmVerify: true

   # ── Open Java file ──────────────────────────────────────
   - id: "open-app"
@@ -103,6 +105,10 @@ steps:
       path: "~/pom.xml"
       contains: "com.google.code.gson"
     waitBefore: 2
+    # Disk-only insertLineInFile: pom.xml isn't in any editor, so before/after
+    # screenshots are necessarily identical. LLM always downgrades; verifyFile
+    # reading from disk is the only meaningful signal.
+    skipLlmVerify: true

   # Re-open pom.xml so the AFTER screenshot shows the new <dependency>
   # block. Loading fresh from disk avoids any in-memory/disk mismatch.
@@ -161,6 +167,10 @@ steps:
       errors: 0
     waitBefore: 20
     timeout: 90
+    # Maven re-import on a cold cache can take significantly longer than the
+    # waitBefore window; a single retry (with the LS likely already settled
+    # by then) recovers without inflating the happy-path wait further.
+    retries: 1

   # After save, the language server publishes diagnostics (status bar updates
   # to 0 errors, verified deterministically above). However, on Linux runners
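For the disk-only pom.xml step in the second hunk of this file, the commit relies on verifyFile reading from disk rather than on screenshots. The sketch below shows what a check with `path` and `contains` could look like. It is an illustration under stated assumptions, not the framework's verifier: `verify_file` is a hypothetical helper, and treating a leading `~/` as "relative to the workspace root" is an assumption about the plan syntax.

```python
import tempfile
from pathlib import Path


def verify_file(workspace_root, path, contains):
    """Return True if the file exists on disk and contains the given text."""
    # Assumption: a leading "~/" in a plan path means "relative to the workspace root".
    relative = path[2:] if path.startswith("~/") else path
    target = Path(workspace_root) / relative
    if not target.is_file():
        return False
    return contains in target.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as root:
    pom = Path(root) / "pom.xml"
    pom.write_text("<project></project>", encoding="utf-8")
    before = verify_file(root, "~/pom.xml", "com.google.code.gson")
    # Simulate the disk-only edit: no editor is involved, only the file changes.
    pom.write_text(
        "<project><groupId>com.google.code.gson</groupId></project>",
        encoding="utf-8",
    )
    after = verify_file(root, "~/pom.xml", "com.google.code.gson")

print(before, after)  # False True
```

The check flips from False to True although nothing visible in an editor would have changed, which is the situation the skipLlmVerify comment describes.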
