Commit 9d55adf

abrichr and claude committed
fix: use screenshot-only milestones in notepad-hello.yaml
PowerShell process checks via /execute_windows time out when the WAA Flask server is slow. VLM screenshot checks work reliably (proven with confidence 1.00). Screenshot-only checks are simpler, more robust, and have no server dependency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 0079c5b commit 9d55adf

1 file changed

Lines changed: 3 additions & 14 deletions

example_tasks/notepad-hello.yaml

@@ -9,25 +9,14 @@ setup:
   - sleep: 1
 
 evaluate:
-  # Use notepad* wildcard to match both classic Notepad and Windows 11 modern Notepad
-  - check: command
-    run: "powershell -c \"Get-Process notepad* -ErrorAction SilentlyContinue | Measure | Select -ExpandProperty Count\""
-    expect: "1"
-    match: contains
-
   - check: screenshot
-    description: "A Notepad window (classic or modern Windows 11 style) is open with 'Hello World' typed in the text area"
-
-combine: and
+    description: "A Notepad window is open with 'Hello World' typed in the text area"
 max_steps: 10
 
 milestones:
   - name: "Notepad is open"
-    check: command
-    # Wildcard catches both classic 'notepad' and Win11 'Notepad' process names
-    run: "powershell -c \"Get-Process notepad* -ErrorAction SilentlyContinue | Measure | Select -ExpandProperty Count\""
-    expect: "1"
-    match: contains
+    check: screenshot
+    description: "A Notepad window is visible on screen (classic or modern Windows 11 style)"
 
   - name: "Text is typed"
     check: screenshot
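
For reference, applying the hunk above should leave the evaluate and milestones sections of example_tasks/notepad-hello.yaml looking roughly as follows. This is a sketch reconstructed from the diff only: the indentation is an assumption (the page dropped leading whitespace), and anything outside the hunk, including the rest of the "Text is typed" milestone, is not shown.

```yaml
# Post-commit state of the changed region (reconstructed from the diff;
# indentation is assumed, not confirmed against the repository).
evaluate:
  - check: screenshot
    description: "A Notepad window is open with 'Hello World' typed in the text area"
max_steps: 10

milestones:
  - name: "Notepad is open"
    check: screenshot
    description: "A Notepad window is visible on screen (classic or modern Windows 11 style)"

  - name: "Text is typed"
    check: screenshot
    # remaining fields of this milestone lie outside the hunk
```

With the command check and `combine: and` removed, the task is judged by a single screenshot check, so neither evaluation nor the first milestone depends on /execute_windows.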
