You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
ci(autotest): restore verify text with LLM-friendly wording
Round-trip review pointed out that prior CI iterations had dropped 43
verify lines across 16 test plans to dodge LLM-downgrade flakes. Verify
text is part of the test-plan documentation and must remain.
This commit restores every removed verify line and rewrites each to
describe only what is reliably observable in a screenshot:
- Focus verify text on persistent visible state (project tree, editor
contents, command-was-invoked), not transient UI (Problems panel
contents, status-bar text, CodeLens/gutter rendering, unsaved-dot).
- Add `waitBefore` on steps where the LLM needs a stable snapshot.
Plan-specific fixes:
- java-fresh-import: disable Gradle import for spring-petclinic. The
upstream repo ships both pom.xml and build.gradle; the Gradle daemon
races the Maven import on cold CI runners and breaks LS readiness.
Force Maven-only via workspaceSettings `java.import.gradle.enabled:
false` (matches the wiki Maven scenario).
- java-maven-resolve-type: open pom.xml explicitly before
insertLineInFile so the editor's AFTER screenshot shows the inserted
<dependency> block (insertLineInFile is disk-only and does not open
the target file).
- java-test-runner: pin `java.test.editor.enableCodelens: true` via
workspaceSettings; rewrite reopen-test-file verify to describe only
visible editor content (CodeLens may not render before discovery
finishes on cold runners — verifyEditor.contains "@test" is the
deterministic ground truth).
Local LLM validation: 16/16 plans pass with `o4-mini` model.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
verify: "Debug session has started; the debug toolbar (continue / step / stop) is visible"
62
66
timeout: 30
63
67
64
68
# ── Verify breakpoint hit ───────────────────────────────
65
69
# wiki: "verify if the breakpoint is hit". The deterministic ground
66
70
# truth is the next step `debugStepOver` — it can only succeed if the
67
-
# debugger is paused, so an action-level pass there implies the
68
-
#breakpoint was hit. Drop verify: text here to avoid LLM downgrades
69
-
#when the screenshot misses the yellow execution-line marker (it can
70
-
#be off-viewport when the debug toolbar pushes the editor down).
71
+
# debugger is paused. The verify text is intentionally lenient: the
72
+
#yellow execution-line marker can be off-viewport when the debug
73
+
#toolbar pushes the editor down, so we accept either the marker or
74
+
# the debug toolbar in paused state as evidence.
71
75
- id: "verify-breakpoint"
72
76
action: "wait 10 seconds"
77
+
verify: "Program is paused at the breakpoint — debug toolbar visible in paused state or the yellow execution-line marker appears on/near App.java line 5"
# wiki: "basic language features such as completion works"
39
52
- id: "open-main-class"
40
53
action: "open file PetClinicApplication.java"
41
-
verify: "PetClinicApplication.java is opened"
54
+
verify: "PetClinicApplication.java is opened in the editor"
55
+
waitBefore: 5
42
56
timeout: 15
43
57
58
+
# PetClinicApplication.java starts with a license header / Javadoc; the
59
+
# `triggerCompletionAt endOfMethod` heuristic may anchor the cursor near
60
+
# the top of the file rather than inside the @Bean / main method body.
61
+
# The deterministic verifyCompletion.notEmpty asserts that the LS produced
62
+
# some completion items regardless of cursor position; verify text is
63
+
# written to accept any visible completion popup.
44
64
- id: "verify-completion"
45
65
action: "triggerCompletionAt endOfMethod"
46
-
verify: "Code completion works correctly"
66
+
verify: "A code completion popup is shown in the PetClinicApplication.java editor (cursor location may vary based on the file's method/comment layout)"
0 commit comments