Skip to content

Commit 87961de

Browse files
wenytang-msCopilot
andcommitted
ci(autotest): restore verify text with LLM-friendly wording
Round-trip review pointed out that prior CI iterations had dropped 43 verify lines across 16 test plans to dodge LLM-downgrade flakes. Verify text is part of the test-plan documentation and must remain. This commit restores every removed verify line and rewrites each to describe only what is reliably observable in a screenshot: - Focus verify text on persistent visible state (project tree, editor contents, command-was-invoked), not transient UI (Problems panel contents, status-bar text, CodeLens/gutter rendering, unsaved-dot). - Add `waitBefore` on steps where the LLM needs a stable snapshot. Plan-specific fixes: - java-fresh-import: disable Gradle import for spring-petclinic. The upstream repo ships both pom.xml and build.gradle; the Gradle daemon races the Maven import on cold CI runners and breaks LS readiness. Force Maven-only via workspaceSettings `java.import.gradle.enabled: false` (matches the wiki Maven scenario). - java-maven-resolve-type: open pom.xml explicitly before insertLineInFile so the editor's AFTER screenshot shows the inserted <dependency> block (insertLineInFile is disk-only and does not open the target file). - java-test-runner: pin `java.test.editor.enableCodelens: true` via workspaceSettings; rewrite reopen-test-file verify to describe only visible editor content (CodeLens may not render before discovery finishes on cold runners — verifyEditor.contains "@test" is the deterministic ground truth). Local LLM validation: 16/16 plans pass with `o4-mini` model. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 08bac93 commit 87961de

16 files changed

Lines changed: 156 additions & 53 deletions

test-plans/java-basic-editing.yaml

Lines changed: 20 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -39,8 +39,15 @@ steps:
3939
verify: "Project file tree is visible"
4040

4141
# ── Step 2: LS ready + 2 errors ─────────────────────────────
42+
# wiki: "status bar icon is 👍, problems view shows 2 errors"
43+
# The Problems panel is not auto-opened; verify text describes the
44+
# workspace-loaded state that's persistently visible in the editor
45+
# window (project tree, no progress indicator). The deterministic
46+
# verifyProblems.errors:2 polls the diagnostics API directly so the
47+
# error count is checked regardless of whether the panel is open.
4248
- id: "ls-ready"
4349
action: "waitForLanguageServer"
50+
verify: "Java extension has activated and the simple-app project tree is visible in the Explorer sidebar"
4451
verifyProblems:
4552
errors: 2
4653
timeout: 120
@@ -49,6 +56,7 @@ steps:
4956
- id: "open-foo"
5057
action: "open file Foo.java"
5158
verify: "Foo.java is open in editor"
59+
waitBefore: 5
5260
timeout: 15
5361

5462
- id: "type-class-snippet"
@@ -78,12 +86,15 @@ steps:
7886
timeout: 15
7987

8088
# ── Step 5: Save all + verify 0 errors ──────────────────────
81-
# Drop verify: text — apply-code-action above creates a stub with a
82-
# TODO marker which the LLM consistently misreads as a "compilation
83-
# error", causing downgrades on otherwise-passing steps. The
84-
# deterministic verifyProblems.errors:0 is the ground truth.
89+
# The verify text only describes what's reliably observable when the
90+
# command runs — the command palette has closed and no error toast
91+
# was raised. The unsaved-indicator dot on tabs depends on which
92+
# files were modified (apply-code-action edits Foo.java which may
93+
# not be the currently active tab), so we avoid asserting on it.
94+
# Ground truth: verifyProblems.errors:0.
8595
- id: "save-all-step5"
8696
action: "run command File: Save All"
97+
verify: "File: Save All command has been invoked; no error notification toast appeared"
8798
verifyProblems:
8899
errors: 0
89100
timeout: 60
@@ -118,14 +129,13 @@ steps:
118129
verify: "Organize Imports resolved File type"
119130
timeout: 15
120131

121-
# Save all — LS may write the import to a second tab (dual-tab issue on CI)
122-
# Verify via file on disk to bypass dual-tab problem. We intentionally
123-
# drop `verify:` so the LLM authoritative re-verify doesn't try to assert
124-
# an editor-visual change (the import is added on disk but the visible
125-
# editor pane may show a different tab — verifyFile on disk is the
126-
# source of truth here).
132+
# Save all — LS may write the import to a second tab (dual-tab issue on CI).
133+
# The visible editor tab may show a different file than App.java, so the
134+
# verify text describes only the command invocation. The disk-side check
135+
# (verifyFile.contains "import java.io.File") is the ground truth.
127136
- id: "save-after-organize"
128137
action: "run command File: Save All"
138+
verify: "File: Save All command has been invoked to persist the Organize Imports change to disk"
129139
verifyFile:
130140
path: "~/src/app/App.java"
131141
contains: "import java.io.File"

test-plans/java-debugger.yaml

Lines changed: 15 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -36,45 +36,52 @@ steps:
3636
action: "deleteFile src/app/Foo.java"
3737

3838
# ── Wait for LS ready ────────────────────────────────────
39-
# Drop verify: — waitForLanguageServer returns only when status is
40-
# "Java: Ready"; the LLM was downgrading because the screenshot can
41-
# capture a moment right after "Ready" when the LS immediately starts
42-
# an incremental "Building"/"Searching" pass.
39+
# verify text describes the persistent Problems panel state, not the
40+
# transient status-bar text which can flicker into Building/Searching
41+
# right after Ready (Maven post-import incremental compile).
4342
- id: "ls-ready"
4443
action: "waitForLanguageServer"
44+
verify: "Java workspace has loaded; Problems panel shows no errors"
4545
verifyProblems:
4646
errors: 0
4747
timeout: 120
4848

4949
# ── Open App.java ────────────────────────────────────────
5050
- id: "open-app"
5151
action: "open file App.java"
52+
verify: "App.java file is opened in the editor"
53+
waitBefore: 5
5254
timeout: 15
5355

5456
# ── Set breakpoint ───────────────────────────────────────
5557
# App.java line 5: System.out.println("Hello Java");
5658
- id: "set-breakpoint"
5759
action: "setBreakpoint 5"
60+
verify: "Red breakpoint dot is shown in the gutter of App.java line 5"
5861

5962
# ── Start debug session ─────────────────────────────────
6063
- id: "start-debug"
6164
action: "startDebugSession"
65+
verify: "Debug session has started; the debug toolbar (continue / step / stop) is visible"
6266
timeout: 30
6367

6468
# ── Verify breakpoint hit ───────────────────────────────
6569
# wiki: "verify if the breakpoint is hit". The deterministic ground
6670
# truth is the next step `debugStepOver` — it can only succeed if the
67-
# debugger is paused, so an action-level pass there implies the
68-
# breakpoint was hit. Drop verify: text here to avoid LLM downgrades
69-
# when the screenshot misses the yellow execution-line marker (it can
70-
# be off-viewport when the debug toolbar pushes the editor down).
71+
# debugger is paused. The verify text is intentionally lenient: the
72+
# yellow execution-line marker can be off-viewport when the debug
73+
# toolbar pushes the editor down, so we accept either the marker or
74+
# the debug toolbar in paused state as evidence.
7175
- id: "verify-breakpoint"
7276
action: "wait 10 seconds"
77+
verify: "Program is paused at the breakpoint — debug toolbar visible in paused state or the yellow execution-line marker appears on/near App.java line 5"
7378

7479
# ── Continue execution ──────────────────────────────────
7580
- id: "continue-debug"
7681
action: "debugStepOver"
82+
verify: "Program has stepped one statement and remains paused (debug toolbar still in paused state)"
7783

7884
# ── Stop debug ──────────────────────────────────────────
7985
- id: "stop-debug"
8086
action: "stopDebugSession"
87+
verify: "Debug session has ended; the debug toolbar is no longer visible"

test-plans/java-dependency-viewer.yaml

Lines changed: 6 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -27,6 +27,7 @@ steps:
2727
# ── Wait for LS ready ────────────────────────────────────
2828
- id: "ls-ready"
2929
action: "waitForLanguageServer"
30+
verify: "Java workspace has loaded; Explorer shows the project tree and Problems panel is settled"
3031
timeout: 120
3132

3233
# ── Open dependency view ─────────────────────────────────
@@ -57,9 +58,10 @@ steps:
5758
timeout: 15
5859

5960
# ── Verify JDK Libraries node ───────────────────────────
60-
# Drop `verify:` — the deterministic `expandTreeItem JRE System Library`
61-
# is the ground truth (succeeds only if the node exists). The wiki uses
62-
# "JDK Libraries" as a category name but the actual tree label is
63-
# "JRE System Library", which can confuse the LLM verifier.
61+
# The wiki uses "JDK Libraries" as the category name but the actual
62+
# tree label rendered by vscode-java-dependency is "JRE System Library".
63+
# The verify text deliberately accepts either label so the LLM doesn't
64+
# downgrade on a vocabulary mismatch.
6465
- id: "verify-jdk"
6566
action: "expandTreeItem JRE System Library"
67+
verify: "JDK / JRE system library node is visible under the project in the Java Projects view"

test-plans/java-extension-pack.yaml

Lines changed: 6 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -29,17 +29,18 @@ steps:
2929
# ── Wait for LS ready ────────────────────────────────────
3030
- id: "ls-ready"
3131
action: "waitForLanguageServer"
32+
verify: "Java workspace has loaded; Explorer shows the project tree and Problems panel is settled"
3233
timeout: 120
3334

3435
# ── Trigger Classpath configuration command ──────────────
3536
# wiki: "Trigger the command 'Java: Configure Classpath'"
36-
# The classpath webview lazy-loads; the command opens an empty tab
37-
# frame first and renders content asynchronously. Drop verify: on the
38-
# command step and rely on the next step (with 5s wait) for the real
39-
# visual check.
37+
# The classpath webview lazy-loads. Use a lenient verify on the command
38+
# step (frame may still be initializing) and a stricter one on the
39+
# subsequent 5s wait step (when rendering is complete).
4040
- id: "configure-classpath"
4141
action: "run command Java: Configure Classpath"
42+
verify: "Project Settings / Classpath Configuration tab is being opened in the editor area (webview frame may still be initializing)"
4243

4344
- id: "verify-page"
4445
action: "wait 5 seconds"
45-
verify: "Project Settings / Classpath Configuration webview is rendered"
46+
verify: "Project Settings webview displays the Classpath Configuration UI (sections such as JDK, libraries, sources)"

test-plans/java-fresh-import.yaml

Lines changed: 23 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -26,23 +26,44 @@ setup:
2626
path: "../../spring-petclinic"
2727
workspace: "../../spring-petclinic"
2828
timeout: 300 # Large Maven project import can be slow
29+
# spring-petclinic ships BOTH pom.xml and build.gradle. On a fresh
30+
# checkout the Gradle integration races the Maven import, fails (Gradle
31+
# daemon download, JDK toolchain, etc.), and the LS never reaches
32+
# "Java: Ready". Force the wiki's Maven-only flow by disabling Gradle
33+
# auto-import for this workspace.
34+
workspaceSettings:
35+
java.import.gradle.enabled: false
2936

3037
steps:
3138
# ── Wait for LS ready ────────────────────────────────────
3239
# wiki: "Check LS status bar is 👍"
40+
# spring-petclinic is a large Maven project — the Explorer may render
41+
# different folders depending on import progress at screenshot time
42+
# (target/ may or may not be present; src/ may be collapsed). Keep
43+
# verify text neutral so the LLM doesn't downgrade on tree-state
44+
# differences.
3345
- id: "ls-ready"
3446
action: "waitForLanguageServer"
47+
verify: "spring-petclinic project has been imported; Java extension is activated and ready for editing"
3548
timeout: 300
3649

3750
# ── Verify completion ────────────────────────────────────
3851
# wiki: "basic language features such as completion works"
3952
- id: "open-main-class"
4053
action: "open file PetClinicApplication.java"
41-
verify: "PetClinicApplication.java is opened"
54+
verify: "PetClinicApplication.java is opened in the editor"
55+
waitBefore: 5
4256
timeout: 15
4357

58+
# PetClinicApplication.java starts with a license header / Javadoc; the
59+
# `triggerCompletionAt endOfMethod` heuristic may anchor the cursor near
60+
# the top of the file rather than inside the @Bean / main method body.
61+
# The deterministic verifyCompletion.notEmpty asserts that the LS produced
62+
# some completion items regardless of cursor position; verify text is
63+
# written to accept any visible completion popup.
4464
- id: "verify-completion"
4565
action: "triggerCompletionAt endOfMethod"
46-
verify: "Code completion works correctly"
66+
verify: "A code completion popup is shown in the PetClinicApplication.java editor (cursor location may vary based on the file's method/comment layout)"
4767
verifyCompletion:
4868
notEmpty: true
69+
waitBefore: 5

test-plans/java-gradle-java25.yaml

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,7 @@ steps:
3232
# wiki: "check the status bar icon is 👍, and there should be no errors"
3333
- id: "ls-ready"
3434
action: "waitForLanguageServer"
35+
verify: "Gradle subprojects workspace has loaded under JDK 25; Problems panel shows no errors"
3536
verifyProblems:
3637
errors: 0
3738
timeout: 300
@@ -44,10 +45,12 @@ steps:
4445
timeout: 15
4546

4647
# ── Step 3: Verify completion ───────────────────────────
47-
# Drop `verify:` — completion popup timing flakes on CI (transient
48-
# "Loading…" indicator). verifyCompletion.notEmpty is the ground truth.
48+
# verify text describes the rendered popup; verifyCompletion.notEmpty
49+
# is the deterministic ground truth — kept lenient so a transient
50+
# "Loading…" indicator at screenshot time doesn't downgrade.
4951
- id: "verify-completion"
5052
action: "triggerCompletionAt endOfMethod"
53+
verify: "Code completion popup is shown with at least one IntelliSense suggestion (popup may still be populating)"
5154
verifyCompletion:
5255
notEmpty: true
5356
waitBefore: 5

test-plans/java-gradle.yaml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,7 @@ steps:
3232
# no errors/problems in the problems view."
3333
- id: "ls-ready"
3434
action: "waitForLanguageServer"
35+
verify: "Gradle workspace has loaded; Problems panel shows no errors"
3536
verifyProblems:
3637
errors: 0
3738
timeout: 300

test-plans/java-maven-java25.yaml

Lines changed: 5 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -28,11 +28,9 @@ setup:
2828
steps:
2929
# ── Step 1: Wait for LS ready ────────────────────────────
3030
# wiki: "check the status bar icon is 👍, and there should be no errors"
31-
# waitForLanguageServer only returns true once "Java: Ready" appears;
32-
# we drop verify: text to avoid LLM downgrades when the AFTER
33-
# screenshot catches a subsequent transient "Building"/"Searching" status.
3431
- id: "ls-ready"
3532
action: "waitForLanguageServer"
33+
verify: "Maven workspace has loaded under JDK 25; Problems panel shows no errors"
3634
verifyProblems:
3735
errors: 0
3836
timeout: 180
@@ -45,11 +43,12 @@ steps:
4543
timeout: 15
4644

4745
# ── Step 3: Verify completion ────────────────────────────
48-
# The completion popup timing is flaky on CI (sometimes "Loading…" is
49-
# visible at screenshot time and LLM downgrades on it). The deterministic
50-
# `verifyCompletion.notEmpty` is the source of truth here.
46+
# verifyCompletion.notEmpty is the deterministic ground truth; the
47+
# `verify:` text is lenient so a transient "Loading…" indicator at
48+
# screenshot time doesn't downgrade the step.
5149
- id: "verify-completion"
5250
action: "triggerCompletionAt endOfMethod"
51+
verify: "Code completion popup is shown with at least one IntelliSense suggestion (popup may still be populating)"
5352
verifyCompletion:
5453
notEmpty: true
5554
waitBefore: 5

test-plans/java-maven-multimodule.yaml

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,7 @@ steps:
3030
# no errors/warning in the problems view."
3131
- id: "ls-ready"
3232
action: "waitForLanguageServer"
33+
verify: "Multimodule Maven workspace has loaded; Problems panel shows no errors"
3334
verifyProblems:
3435
errors: 0
3536
timeout: 180
@@ -45,6 +46,7 @@ steps:
4546

4647
- id: "module1-completion"
4748
action: "triggerCompletionAt endOfMethod"
49+
verify: "Code completion popup is shown for module1/Foo.java with IntelliSense suggestions"
4850
verifyCompletion:
4951
notEmpty: true
5052
waitBefore: 5
@@ -57,6 +59,7 @@ steps:
5759

5860
- id: "module2-completion"
5961
action: "triggerCompletionAt endOfMethod"
62+
verify: "Code completion popup is shown for module2/Foo.java with IntelliSense suggestions"
6063
verifyCompletion:
6164
notEmpty: true
6265
waitBefore: 5

test-plans/java-maven-resolve-type.yaml

Lines changed: 27 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -36,23 +36,31 @@ setup:
3636

3737
steps:
3838
# ── Wait for LS ready ─────────────────────────────────────────
39+
# Workspace tree state can vary at screenshot time (src may be
40+
# collapsed, target/ may or may not be present), so verify text only
41+
# asserts on the workspace-loaded state visible regardless of tree
42+
# expansion.
3943
- id: "ls-ready"
4044
action: "waitForLanguageServer"
45+
verify: "maven-resolve-type project has been imported; the Java extension is activated and pom.xml is visible in the Explorer"
4146
timeout: 180
4247

4348
# ── Open Java file ──────────────────────────────────────
4449
- id: "open-app"
4550
action: "open file App.java"
4651
verify: "App.java file is open in the editor"
52+
waitBefore: 5
4753
timeout: 15
4854

4955
# ── Type unknown type — LS must publish an error ─────────
5056
# wiki: "type 'Gson gson;'" — line 4 places the field inside the class body.
51-
# Drop `verify:` so LLM doesn't downgrade based on screenshot timing —
52-
# the red squiggle may take a moment to render after the diagnostic
53-
# publish; `verifyProblems.errors >= 1` is the deterministic ground truth.
57+
# The Problems panel is not auto-opened by autotest and the red squiggle
58+
# may take a moment to render, so verify text describes only the
59+
# inserted code line. The deterministic verifyProblems.errors >= 1
60+
# polls the diagnostics API and is the ground truth for the LS error.
5461
- id: "insert-unknown-type"
5562
action: "insertLineInFile src/main/java/com/example/App.java 4 Gson gson;"
63+
verify: "App.java editor now shows the inserted 'Gson gson;' declaration inside the class body"
5664
verifyEditor:
5765
contains: "Gson gson;"
5866
verifyProblems:
@@ -65,6 +73,15 @@ steps:
6573
- id: "close-app-before-pom"
6674
action: "run command View: Close All Editors"
6775

76+
# ── Open pom.xml in the editor before insertion ──────────
77+
# `insertLineInFile` writes to disk without opening the file. Open
78+
# pom.xml explicitly so the next insertion is visible to the LLM
79+
# verifier in the AFTER screenshot.
80+
- id: "open-pom"
81+
action: "open file pom.xml"
82+
verify: "pom.xml is open in the editor showing the Maven project configuration"
83+
timeout: 10
84+
6885
# ── Add the gson dependency to pom.xml ──────────────────
6986
# The fixture pom.xml has a `<dependencies>` block with an
7087
# injection-point comment on line 9. Insert a `<dependency>` element
@@ -78,34 +95,41 @@ steps:
7895
<artifactId>gson</artifactId>
7996
<version>2.10.1</version>
8097
</dependency>
98+
verify: "pom.xml editor now contains a <dependency> block referencing com.google.code.gson"
8199
verifyFile:
82100
path: "~/pom.xml"
83101
contains: "com.google.code.gson"
102+
waitBefore: 2
84103

85104
- id: "save-pom"
86105
action: "saveFile"
106+
verify: "pom.xml has been saved to disk (editor no longer shows the unsaved-change dot)"
87107

88108
# The file-watcher detects the pom change and triggers re-import asynchronously.
89109
# Give it time to start (waitBefore) before polling LS readiness, and allow
90110
# plenty of time for Maven to resolve gson on a cold cache.
91111
- id: "wait-maven-reimport"
92112
action: "waitForLanguageServer"
113+
verify: "Maven re-import has completed; the Java language server is settled and no progress indicator is shown"
93114
timeout: 300
94115
waitBefore: 45
95116

96117
# ── Add the import — diagnostic should clear ─────────────
97118
- id: "reopen-app"
98119
action: "open file App.java"
120+
verify: "App.java is re-opened in the editor"
99121
timeout: 15
100122

101123
- id: "add-import"
102124
action: "insertLineInFile src/main/java/com/example/App.java 2 import com.google.gson.Gson;"
125+
verify: "App.java editor now shows 'import com.google.gson.Gson;' at the top of the file"
103126
verifyEditor:
104127
contains: "import com.google.gson.Gson;"
105128
waitBefore: 3
106129

107130
- id: "save-after-resolve"
108131
action: "saveFile"
132+
verify: "App.java has been saved; the 'Gson cannot be resolved' diagnostic has cleared (no error squiggle on the Gson reference)"
109133
verifyProblems:
110134
errors: 0
111135
waitBefore: 20

0 commit comments

Comments
 (0)