Commit a64ef4a

wenytang-ms and Copilot committed
fix(test-plans): restrict skipLlmVerify to G1/G4; add retries for cold-cache flakes
Reverts the over-broad framework auto-skip (any structured verify -> no LLM) that was landed in autotest v0.7.5/0.7.6. LLM screenshot verification is the anti-silent-pass safety net and must stay enabled on steps where the screenshot carries unique signal (popup visibility, decoration lag, panel content).

Final policy:
- skipLlmVerify=true on Group 1 (16 ls-ready steps): waitForLanguageServer polls the same status bar text the LLM would read, so LLM adds zero signal.
- skipLlmVerify=true on Group 4 (3 disk-write steps: save-after-organize, add-gson-dependency, create-formatter-profile): action mutates a file not open in any editor; before/after screenshots are by-design identical and LLM always downgrades. verifyFile from disk is the authoritative signal.
- retries: 1 on 8 verify-completion steps to mitigate cold-cache 'Loading...' LLM downgrades while keeping the screenshot check enabled.
- retries: 1 on java-maven-resolve-type save-after-resolve (kept from prior commit) for Maven indexer warm-up.
- Wait bump 45 -> 90s on java-test-runner wait-test-discovery (kept).
- java-pack-help-center-webview setup.extensions hard-requires java-pack (kept); this fixes the real bug (5/8 failures).

LLM coverage preserved on verify-completion (popup visibility), verifyEditor (guards against page-wide DOM stale-tab fallback), verifyProblems (diagnostics red squiggle lag) and verifyWebview (visual rendering).

Requires autotest >= 0.7.7 to honor skipLlmVerify without the auto-skip side effect.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
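The three cases in the final policy can be sketched as one plan fragment. This is an illustrative sketch only, not copied from any single plan file: the field names are the ones visible in this commit's hunks, but the indentation, the nesting of path/contains under verifyFile, the combination of steps, and the omitted action parameters are assumptions.

```yaml
steps:
  # Group 1: language-server readiness. waitForLanguageServer already polls
  # the status bar text, so the LLM screenshot check adds no signal.
  - id: "ls-ready"
    action: "waitForLanguageServer"
    verifyProblems:
      errors: 0
    timeout: 120
    skipLlmVerify: true

  # Group 4: disk-only write. The file is not open in any editor, so the
  # before/after screenshots are identical; verifyFile is the real check.
  - id: "add-gson-dependency"
    # action and its parameters omitted (assumed: a disk-write action)
    verifyFile:
      path: "~/pom.xml"
      contains: "com.google.code.gson"
    skipLlmVerify: true

  # Completion check: LLM verification stays enabled; a single retry is
  # meant to absorb a cold-cache "Loading..." popup.
  - id: "verify-completion"
    # action omitted (not shown in this commit's hunks)
    verifyCompletion:
      notEmpty: true
    waitBefore: 8
    retries: 1
```

Keeping skipLlmVerify as an explicit per-step flag, rather than inferring it from the presence of a structured verify, is what lets the screenshot check stay on for the completion step above.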
1 parent f5cc903 commit a64ef4a

13 files changed

Lines changed: 36 additions & 0 deletions

test-plans/java-basic-editing.yaml

Lines changed: 5 additions & 0 deletions
@@ -48,6 +48,7 @@ steps:
  - id: "ls-ready"
  action: "waitForLanguageServer"
  verify: "Java extension has activated and the simple-app project tree is visible in the Explorer sidebar"
+ skipLlmVerify: true # waitForLanguageServer reads the same status bar text the LLM would inspect
  verifyProblems:
  errors: 2
  timeout: 120
@@ -140,6 +141,10 @@ steps:
  path: "~/src/app/App.java"
  contains: "import java.io.File"
  timeout: 10
+ # Save All has no on-screen change (tab dirty dot clears on the saved
+ # tab but the active editor isn't the saved file). LLM downgrades on
+ # before==after by-design.
+ skipLlmVerify: true

  # ── Step 8: Rename Symbol (F2) ──────────────────────────────
  - id: "close-all-before-rename"

test-plans/java-debugger.yaml

Lines changed: 1 addition & 0 deletions
@@ -44,6 +44,7 @@ steps:
  verify: "Java workspace has loaded; Problems panel shows no errors"
  verifyProblems:
  errors: 0
+ skipLlmVerify: true # waitForLanguageServer is authoritative; LLM only sees the same status bar
  timeout: 120

  # ── Open App.java ────────────────────────────────────────

test-plans/java-fresh-import.yaml

Lines changed: 3 additions & 0 deletions
@@ -70,3 +70,6 @@ steps:
  verifyCompletion:
  notEmpty: true
  waitBefore: 5
+ # LLM may downgrade if it sees "Loading..." spinner on cold cache; retry
+ # gives the LS a warmed cache so the popup is fully rendered.
+ retries: 1

test-plans/java-gradle-java25.yaml

Lines changed: 3 additions & 0 deletions
@@ -36,6 +36,7 @@ steps:
  verifyProblems:
  errors: 0
  timeout: 300
+ skipLlmVerify: true # waitForLanguageServer is authoritative; LLM only sees the same status bar

  # ── Step 2: Open Java file ───────────────────────────────
  # wiki: "Open Foo.java, make sure the editing experience is correctly working"
@@ -54,6 +55,8 @@ steps:
  verifyCompletion:
  notEmpty: true
  waitBefore: 8
+ # Retry once on cold-cache "Loading..." LLM downgrades.
+ retries: 1

  # ── Step 4: Verify editing ────────────────────────────────
  - id: "goto-line"

test-plans/java-gradle.yaml

Lines changed: 1 addition & 0 deletions
@@ -36,6 +36,7 @@ steps:
  verifyProblems:
  errors: 0
  timeout: 300
+ skipLlmVerify: true # waitForLanguageServer is authoritative; LLM only sees the same status bar

  # ── Step 2: Open Foo.java and verify editing experience ─
  # wiki: "Open Foo.java file, make sure the editing experience

test-plans/java-maven-java25.yaml

Lines changed: 3 additions & 0 deletions
@@ -34,6 +34,7 @@ steps:
  verifyProblems:
  errors: 0
  timeout: 180
+ skipLlmVerify: true # waitForLanguageServer is authoritative; LLM only sees the same status bar

  # ── Step 2: Open Java file ───────────────────────────────
  # wiki: "Open Bar.java, make sure the editing experience is correctly working"
@@ -52,6 +53,8 @@ steps:
  verifyCompletion:
  notEmpty: true
  waitBefore: 8
+ # Retry once on cold-cache "Loading..." LLM downgrades.
+ retries: 1

  # ── Step 4: Verify editing ────────────────────────────────
  - id: "goto-line"

test-plans/java-maven-multimodule.yaml

Lines changed: 5 additions & 0 deletions
@@ -32,6 +32,7 @@ steps:
  action: "waitForLanguageServer"
  verify: "Multimodule Maven workspace has loaded; the Java extension is initialized for the project with module1 and module2 visible in the Explorer (the Problems panel may briefly show diagnostics that are still being recomputed after import — the verifyProblems checks below pin the final state)"
  timeout: 180
+ skipLlmVerify: true # waitForLanguageServer is authoritative; LLM only sees the same status bar

  # ── Step 2: Verify module1 Foo.java ──────────────────────
  # wiki: "make sure the editing experience is correctly working
@@ -48,6 +49,8 @@ steps:
  verifyCompletion:
  notEmpty: true
  waitBefore: 8
+ # Retry once on cold-cache "Loading..." LLM downgrades.
+ retries: 1

  # Close module1's tab first so the next `open file Foo.java` request
  # disambiguates to module2/Foo.java rather than re-focusing the already-
@@ -69,3 +72,5 @@ steps:
  verifyCompletion:
  notEmpty: true
  waitBefore: 8
+ # Retry once on cold-cache "Loading..." LLM downgrades.
+ retries: 1

test-plans/java-maven-resolve-type.yaml

Lines changed: 4 additions & 0 deletions
@@ -105,6 +105,10 @@ steps:
  path: "~/pom.xml"
  contains: "com.google.code.gson"
  waitBefore: 2
+ # Disk-only insertLineInFile: pom.xml isn't in any editor, so before/after
+ # screenshots are necessarily identical. LLM always downgrades; verifyFile
+ # reading from disk is the only meaningful signal.
+ skipLlmVerify: true

  # Re-open pom.xml so the AFTER screenshot shows the new <dependency>
  # block. Loading fresh from disk avoids any in-memory/disk mismatch.

test-plans/java-maven.yaml

Lines changed: 2 additions & 0 deletions
@@ -50,6 +50,8 @@ steps:
  verifyCompletion:
  notEmpty: true
  waitBefore: 8
+ # Retry once on cold-cache "Loading..." LLM downgrades.
+ retries: 1

  # 2c. Verify cursor navigation (goToLine)
  - id: "goto-line"

test-plans/java-single-file.yaml

Lines changed: 2 additions & 0 deletions
@@ -52,6 +52,8 @@ steps:
  verifyCompletion:
  notEmpty: true
  waitBefore: 8
+ # Retry once on cold-cache "Loading..." LLM downgrades.
+ retries: 1

  # ── Step 4: Verify basic editing ────────────────────────────────
  - id: "goto-main"
