
Commit 17317d0

fix: support saved screenshot diffs (#404)
1 parent 657945d commit 17317d0


6 files changed: +105 additions, -13 deletions


skills/agent-device/references/verification.md

Lines changed: 3 additions & 1 deletion
````diff
@@ -54,14 +54,16 @@ Use `diff screenshot` when comparing the current rendered screen against a saved
 
 ```bash
 agent-device diff screenshot --baseline ./baseline.png --out /tmp/diff.png
+agent-device diff screenshot --baseline ./baseline.png ./current.png --out /tmp/diff.png
 agent-device diff screenshot --baseline ./baseline.png --out /tmp/diff.png --overlay-refs
 ```
 
 - Text output includes ranked changed regions with screen-space rectangles, shape, size, density, average color, and luminance. JSON also includes normalized bounds.
 - The diff PNG uses a light grayscale current-screen context with changed pixels tinted red and changed regions outlined.
+- When a current image path is provided, `diff screenshot` compares the two saved files instead of capturing from the live device or requiring an active session.
 - Install `tesseract` when you want `diff screenshot` to add best-effort OCR text deltas, movement clusters, and bbox size-change hints. OCR improves the text/JSON descriptions only; it does not change the pixel comparison or the diff PNG.
 - When OCR is available, `diff screenshot` also reports best-effort non-text visual deltas by masking OCR text boxes out of the pixel diff and clustering the remaining residuals. Treat these as hints for icons, controls, and separators, not semantic icon recognition.
-- Add `--overlay-refs` to `diff screenshot` when you also want a separate current-screen overlay guide. The raw screenshot is still used for pixel comparison; the overlay guide is only context for non-text controls, icons, and tappable regions. When overlay refs intersect changed regions, the output lists the best current-screen ref matches under the affected region.
+- Add `--overlay-refs` to `diff screenshot` when you also want a separate current-screen overlay guide for a live capture. The raw screenshot is still used for pixel comparison; the overlay guide is only context for non-text controls, icons, and tappable regions. When overlay refs intersect changed regions, the output lists the best current-screen ref matches under the affected region. Saved-image comparisons do not have live accessibility refs, so omit `--overlay-refs` when passing a current image path.
 
 ## Session recording
 
````
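
The verification notes say JSON output also includes normalized bounds for each changed region. A minimal sketch, assuming "normalized" means each coordinate divided by the screenshot's width or height (the commit does not show the exact JSON shape); `PixelRect`, `NormalizedRect`, and `normalizeRect` are hypothetical names:

```typescript
// Hypothetical shape: a changed region in screen-space pixels.
interface PixelRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical shape: the same rectangle as fractions (0..1) of the image size.
interface NormalizedRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Divide each coordinate by the matching image dimension so a region can be
// compared across screenshots captured at different resolutions.
function normalizeRect(rect: PixelRect, imageWidth: number, imageHeight: number): NormalizedRect {
  if (imageWidth <= 0 || imageHeight <= 0) {
    throw new RangeError('image dimensions must be positive');
  }
  return {
    x: rect.x / imageWidth,
    y: rect.y / imageHeight,
    width: rect.width / imageWidth,
    height: rect.height / imageHeight,
  };
}

const normalized = normalizeRect({ x: 100, y: 200, width: 50, height: 40 }, 1000, 2000);
console.log(normalized); // { x: 0.1, y: 0.1, width: 0.05, height: 0.02 }
```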

src/__tests__/cli-diff.test.ts

Lines changed: 51 additions & 0 deletions
```diff
@@ -283,6 +283,57 @@ describe('cli diff commands', () => {
     }
   });
 
+  test('diff screenshot uses supplied current image instead of capturing from daemon', async () => {
+    const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cli-diff-test-'));
+    const baseline = path.join(dir, 'baseline.png');
+    const current = path.join(dir, 'current.png');
+    fs.writeFileSync(baseline, solidPngBuffer(10, 10, { r: 0, g: 0, b: 0 }));
+    fs.writeFileSync(current, solidPngBuffer(10, 10, { r: 255, g: 255, b: 255 }));
+
+    try {
+      const result = await runCliCapture([
+        'diff',
+        'screenshot',
+        '--baseline',
+        baseline,
+        current,
+        '--threshold',
+        '0',
+      ]);
+      assert.equal(result.code, null);
+      assert.equal(result.calls.length, 0);
+      assert.match(result.stdout, /100% pixels differ/);
+      assert.match(result.stdout, /100 different \/ 100 total pixels/);
+      assert.equal(result.stderr, '');
+    } finally {
+      fs.rmSync(dir, { recursive: true, force: true });
+    }
+  });
+
+  test('diff screenshot rejects overlay refs with supplied current image', async () => {
+    const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cli-diff-test-'));
+    const baseline = path.join(dir, 'baseline.png');
+    const current = path.join(dir, 'current.png');
+    fs.writeFileSync(baseline, solidPngBuffer(10, 10, { r: 0, g: 0, b: 0 }));
+    fs.writeFileSync(current, solidPngBuffer(10, 10, { r: 255, g: 255, b: 255 }));
+
+    try {
+      const result = await runCliCapture([
+        'diff',
+        'screenshot',
+        '--baseline',
+        baseline,
+        current,
+        '--overlay-refs',
+      ]);
+      assert.equal(result.code, 1);
+      assert.equal(result.calls.length, 0);
+      assert.match(result.stderr, /saved-image comparisons have no live accessibility refs/);
+    } finally {
+      fs.rmSync(dir, { recursive: true, force: true });
+    }
+  });
+
   test('diff screenshot uses os.tmpdir for temporary current capture', async () => {
     const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cli-diff-test-'));
     const baseline = path.join(dir, 'baseline.png');
```
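
The first new test expects `100% pixels differ` and `100 different / 100 total pixels` for a 10x10 all-black baseline against a 10x10 all-white current image at `--threshold 0`. The sketch below shows why those numbers follow under a simple per-pixel rule. The scoring rule, the two-decimal rounding, and the names `countDifferentPixels`, `formatPercentLabel`, and `solidRgba` are assumptions for illustration, not agent-device's `compareScreenshots` implementation; only the `<0.01` labeling mirrors the `pctLabel` logic visible in `output.ts`.

```typescript
// Hypothetical summary shape; the real result type is not shown in this commit.
interface PixelDiffSummary {
  differentPixels: number;
  totalPixels: number;
  mismatchPercentage: number; // assumed to be rounded to two decimals
}

// Compares two same-sized RGBA buffers. A pixel counts as different when its
// largest RGB channel difference, scaled to 0..1, exceeds `threshold`.
// This rule is an illustrative assumption, not the real comparison metric.
function countDifferentPixels(
  baseline: Uint8Array,
  current: Uint8Array,
  threshold: number,
): PixelDiffSummary {
  if (baseline.length !== current.length || baseline.length % 4 !== 0) {
    throw new Error('expected two RGBA buffers of the same size');
  }
  const totalPixels = baseline.length / 4;
  let differentPixels = 0;
  for (let i = 0; i < baseline.length; i += 4) {
    let maxDelta = 0;
    for (let channel = 0; channel < 3; channel++) {
      const delta = Math.abs(baseline[i + channel] - current[i + channel]) / 255;
      maxDelta = Math.max(maxDelta, delta);
    }
    if (maxDelta > threshold) differentPixels++;
  }
  const mismatchPercentage =
    totalPixels === 0 ? 0 : Math.round((differentPixels / totalPixels) * 10000) / 100;
  return { differentPixels, totalPixels, mismatchPercentage };
}

// Same labeling rule as pctLabel in output.ts: a nonzero diff that rounds to 0 shows as "<0.01".
function formatPercentLabel(summary: PixelDiffSummary): string {
  return summary.mismatchPercentage === 0 && summary.differentPixels > 0
    ? '<0.01'
    : String(summary.mismatchPercentage);
}

// Raw RGBA stand-in for the test's solidPngBuffer helper (no PNG encoding).
function solidRgba(width: number, height: number, r: number, g: number, b: number): Uint8Array {
  const data = new Uint8Array(width * height * 4);
  for (let i = 0; i < data.length; i += 4) {
    data[i] = r;
    data[i + 1] = g;
    data[i + 2] = b;
    data[i + 3] = 255; // fully opaque
  }
  return data;
}

const summary = countDifferentPixels(solidRgba(10, 10, 0, 0, 0), solidRgba(10, 10, 255, 255, 255), 0);
console.log(`${formatPercentLabel(summary)}% pixels differ`); // 100% pixels differ
console.log(`${summary.differentPixels} different / ${summary.totalPixels} total pixels`); // 100 different / 100 total pixels
```

With one changed pixel out of 100,000 the rounded percentage is 0, which is exactly the case the `<0.01` label exists for.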

src/cli/commands/screenshot.ts

Lines changed: 22 additions & 0 deletions
```diff
@@ -52,6 +52,13 @@ export const diffCommand: ClientCommandHandler = async ({ positionals, flags, cl
 
   const baselinePath = resolveUserPath(baselineRaw);
   const outputPath = typeof flags.out === 'string' ? resolveUserPath(flags.out) : undefined;
+  const currentRaw = positionals[1];
+  if (positionals.length > 2) {
+    throw new AppError(
+      'INVALID_ARGS',
+      'diff screenshot accepts at most one current screenshot path',
+    );
+  }
 
   let thresholdNum = 0.1;
   if (flags.threshold != null && flags.threshold !== '') {
@@ -61,6 +68,21 @@ export const diffCommand: ClientCommandHandler = async ({ positionals, flags, cl
     }
   }
 
+  if (currentRaw) {
+    if (flags.overlayRefs) {
+      throw new AppError(
+        'INVALID_ARGS',
+        'diff screenshot <current.png> cannot use --overlay-refs because saved-image comparisons have no live accessibility refs',
+      );
+    }
+    const result = await compareScreenshots(baselinePath, resolveUserPath(currentRaw), {
+      threshold: thresholdNum,
+      outputPath,
+    });
+    writeCommandOutput(flags, result, () => formatScreenshotDiffText(result));
+    return true;
+  }
+
   const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'agent-device-diff-current-'));
   const tmpScreenshotPath = path.join(tmpDir, `current-${Date.now()}.png`);
   const screenshotResult = await client.capture.screenshot({ path: tmpScreenshotPath });
```
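
The two checks added to `diffCommand` amount to a small mode-resolution step. A minimal sketch, assuming `positionals[0]` holds the `screenshot` kind (as the `['kind', 'current?']` schema suggests); `InvalidArgsError` is a hypothetical stand-in for the real `AppError`, threshold parsing is omitted, and path resolution via `resolveUserPath` is left out:

```typescript
// Hypothetical stand-in for agent-device's AppError('INVALID_ARGS', ...).
class InvalidArgsError extends Error {}

type DiffScreenshotMode =
  | { kind: 'saved'; currentPath: string } // compare two files on disk, no session needed
  | { kind: 'live'; overlayRefs: boolean }; // capture from the device, refs available

// Follows the order of checks in diffCommand: too many positionals first,
// then the saved-image path, which cannot be combined with --overlay-refs.
function resolveDiffScreenshotMode(
  positionals: readonly string[],
  flags: { overlayRefs?: boolean },
): DiffScreenshotMode {
  if (positionals.length > 2) {
    throw new InvalidArgsError('diff screenshot accepts at most one current screenshot path');
  }
  const currentRaw = positionals[1];
  if (currentRaw) {
    if (flags.overlayRefs) {
      throw new InvalidArgsError(
        'diff screenshot <current.png> cannot use --overlay-refs because saved-image comparisons have no live accessibility refs',
      );
    }
    return { kind: 'saved', currentPath: currentRaw };
  }
  return { kind: 'live', overlayRefs: Boolean(flags.overlayRefs) };
}

const mode = resolveDiffScreenshotMode(['screenshot', 'current.png'], {});
console.log(mode); // { kind: 'saved', currentPath: 'current.png' }
```

Returning a discriminated union keeps the saved-image path and the live-capture path from sharing a flag that only makes sense for one of them.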

src/utils/command-schema.ts

Lines changed: 2 additions & 2 deletions
```diff
@@ -993,10 +993,10 @@ const COMMAND_SCHEMAS: Record<string, CommandSchema> = {
   },
   diff: {
     usageOverride:
-      'diff snapshot | diff screenshot --baseline <path> [--out <diff.png>] [--threshold <0-1>] [--overlay-refs]',
+      'diff snapshot | diff screenshot --baseline <path> [current.png] [--out <diff.png>] [--threshold <0-1>] [--overlay-refs]',
     helpDescription: 'Diff accessibility snapshot or compare screenshots pixel-by-pixel',
     summary: 'Diff snapshot or screenshot',
-    positionalArgs: ['kind'],
+    positionalArgs: ['kind', 'current?'],
     allowedFlags: [...SNAPSHOT_FLAGS, 'baseline', 'threshold', 'out', 'overlayRefs'],
   },
   'ensure-simulator': {
```
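
The schema change declares the new positional as `'current?'`. One plausible reading is that a trailing `?` marks an optional argument. The hypothetical helpers `positionalArity` and `checkPositionalCount` below turn such a spec into a minimum and maximum count; whether agent-device's schema layer enforces arity this way is not shown in the commit (the command handler also checks `positionals.length > 2` itself).

```typescript
// Hypothetical: minimum and maximum positional counts implied by a spec.
interface Arity {
  min: number;
  max: number;
}

// Treats a trailing '?' as "optional", e.g. ['kind', 'current?'] -> { min: 1, max: 2 }.
function positionalArity(spec: readonly string[]): Arity {
  const firstOptional = spec.findIndex((name) => name.endsWith('?'));
  // Reject specs like ['a?', 'b'] where a required argument follows an optional one.
  if (firstOptional !== -1 && spec.slice(firstOptional).some((name) => !name.endsWith('?'))) {
    throw new Error('required positionals must come before optional ones');
  }
  const min = firstOptional === -1 ? spec.length : firstOptional;
  return { min, max: spec.length };
}

// Returns an error message, or null when the count fits the spec.
function checkPositionalCount(spec: readonly string[], positionals: readonly string[]): string | null {
  const { min, max } = positionalArity(spec);
  if (positionals.length < min) return `expected at least ${min} positional argument(s)`;
  if (positionals.length > max) return `expected at most ${max} positional argument(s)`;
  return null;
}

const diffSpec = ['kind', 'current?'];
console.log(positionalArity(diffSpec)); // { min: 1, max: 2 }
console.log(checkPositionalCount(diffSpec, ['screenshot', 'a.png', 'b.png'])); // expected at most 2 positional argument(s)
```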

src/utils/output.ts

Lines changed: 24 additions & 8 deletions
```diff
@@ -218,7 +218,8 @@ export function formatScreenshotDiffText(data: ScreenshotDiffResult): string {
     const indicator = useColor ? colorize('✗', 'red') : '✗';
     const pctLabel =
       mismatchPercentage === 0 && differentPixels > 0 ? '<0.01' : String(mismatchPercentage);
-    lines.push(`${indicator} ${pctLabel}% pixels differ`);
+    const summary = `${pctLabel}% pixels differ`;
+    lines.push(`${indicator} ${useColor ? colorize(summary, 'red') : summary}`);
   }
 
   if (diffPath && !match) {
@@ -244,13 +245,13 @@ export function formatScreenshotDiffText(data: ScreenshotDiffResult): string {
 
   const hints = !match && !dimensionMismatch ? formatScreenshotDiffHints(data) : [];
   if (hints.length > 0) {
-    lines.push(' Hints:');
+    lines.push(` ${formatMuted('Hints:', useColor)}`);
     for (const hint of hints) lines.push(` - ${hint}`);
   }
 
   const regions = Array.isArray(data.regions) ? data.regions : [];
   if (!match && !dimensionMismatch && regions.length > 0) {
-    lines.push(' Changed regions:');
+    lines.push(` ${formatMuted('Changed regions:', useColor)}`);
     for (const region of regions.slice(0, 5)) {
       const share =
         region.shareOfDiffPercentage === 0 && region.differentPixels > 0
@@ -280,11 +281,17 @@ export function formatScreenshotDiffText(data: ScreenshotDiffResult): string {
   if (!match && !dimensionMismatch && ocrMatches.length > 0) {
     const shownOcrMatches = ocrMatches.slice(0, 8);
     lines.push(
-      ` OCR text deltas (${data.ocr?.provider}; baselineBlocks=${data.ocr?.baselineBlocks} ` +
-        `currentBlocks=${data.ocr?.currentBlocks}; showing ${shownOcrMatches.length}/${ocrMatches.length}; px):`,
+      ` ${formatMuted(
+        `OCR text deltas (${data.ocr?.provider}; baselineBlocks=${data.ocr?.baselineBlocks} ` +
+          `currentBlocks=${data.ocr?.currentBlocks}; showing ${shownOcrMatches.length}/${ocrMatches.length}; px):`,
+        useColor,
+      )}`,
     );
     lines.push(
-      ' item | text | movePx | sizeDeltaPx | bboxBaseline | bboxCurrent | confidence | issueHint',
+      ` ${formatMuted(
+        'item | text | movePx | sizeDeltaPx | bboxBaseline | bboxCurrent | confidence | issueHint',
+        useColor,
+      )}`,
     );
     for (const [index, ocrMatch] of shownOcrMatches.entries()) {
       const delta = ocrMatch.delta;
@@ -303,9 +310,14 @@ export function formatScreenshotDiffText(data: ScreenshotDiffResult): string {
   if (!match && !dimensionMismatch && nonTextDeltas.length > 0) {
     const shownNonTextDeltas = nonTextDeltas.slice(0, 8);
     lines.push(
-      ` Non-text visual deltas (showing ${shownNonTextDeltas.length}/${nonTextDeltas.length}; px):`,
+      ` ${formatMuted(
+        `Non-text visual deltas (showing ${shownNonTextDeltas.length}/${nonTextDeltas.length}; px):`,
+        useColor,
+      )}`,
+    );
+    lines.push(
+      ` ${formatMuted('item | region | slot | kind | bboxCurrent | nearestText', useColor)}`,
     );
-    lines.push(' item | region | slot | kind | bboxCurrent | nearestText');
     for (const delta of shownNonTextDeltas) {
       lines.push(
         ` ${delta.index} | ${delta.regionIndex ? `r${delta.regionIndex}` : '-'} | ` +
@@ -437,6 +449,10 @@ function colorize(text: string, format: Parameters<typeof styleText>[0]): string
   return styleText(format, text);
 }
 
+function formatMuted(text: string, useColor: boolean): string {
+  return useColor ? colorize(text, 'dim') : text;
+}
+
 function buildSnapshotNotices(
   data: Record<string, unknown>,
   nodes: SnapshotNode[],
```
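
`formatMuted` dims section labels only when color output is enabled, and the pixel-difference summary is now tinted red along with the `✗` indicator. The patch routes styling through `styleText` from `node:util`; this self-contained sketch swaps in raw ANSI SGR escape codes (2/22 for dim, 31/39 for red) so the emitted bytes are explicit, and `formatSummaryLine` is a hypothetical wrapper around the lines the patch changes:

```typescript
// Standard ANSI SGR sequences: 2 = dim, 22 = normal intensity, 31 = red, 39 = default color.
const ANSI = {
  dim: ['\u001b[2m', '\u001b[22m'],
  red: ['\u001b[31m', '\u001b[39m'],
} as const;

type Style = keyof typeof ANSI;

// Stand-in for the patch's colorize(), which delegates to node:util styleText.
function colorize(text: string, style: Style): string {
  const [open, close] = ANSI[style];
  return `${open}${text}${close}`;
}

// Same contract as the new formatMuted helper: dim the label only when color is on.
function formatMuted(text: string, useColor: boolean): string {
  return useColor ? colorize(text, 'dim') : text;
}

// Hypothetical wrapper reproducing the changed summary line: both the indicator
// and the "N% pixels differ" text are red when color is enabled.
function formatSummaryLine(pctLabel: string, useColor: boolean): string {
  const indicator = useColor ? colorize('✗', 'red') : '✗';
  const summary = `${pctLabel}% pixels differ`;
  return `${indicator} ${useColor ? colorize(summary, 'red') : summary}`;
}

console.log(formatSummaryLine('100', false)); // ✗ 100% pixels differ
console.log(JSON.stringify(formatMuted('Hints:', true))); // "\u001b[2mHints:\u001b[22m"
```

Keeping the plain-text path byte-identical when color is off is what lets assertions like `/100% pixels differ/` in the CLI tests keep matching.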

website/docs/docs/commands.md

Lines changed: 3 additions & 2 deletions
```diff
@@ -544,6 +544,7 @@ agent-device screenshot textedit.png # App-session window capture on macOS
 agent-device screenshot --fullscreen # Force full-screen capture on macOS app sessions
 agent-device open --platform macos --surface desktop && agent-device screenshot desktop.png
 agent-device diff screenshot --baseline baseline.png --out diff.png
+agent-device diff screenshot --baseline baseline.png current.png --out diff.png
 agent-device diff screenshot --baseline baseline.png --out diff.png --overlay-refs
 agent-device record start # Start screen recording to auto filename
 agent-device record start session.mp4 # Start recording to explicit path
@@ -553,10 +554,10 @@ agent-device record stop # Stop active recording
 
 - Recordings always produce a video artifact. When touch visualization is enabled, they also produce a gesture telemetry sidecar that can be used for post-processing or inspection.
 - `screenshot --overlay-refs` captures a fresh full snapshot and burns visible `@eN` refs plus their target rectangles into the saved PNG.
-- `diff screenshot` compares the current screenshot to `--baseline`, prints ranked changed regions with screen-space rectangles, shape, size, density, average color, and luminance, and writes a diff PNG with a light grayscale current-screen context, red-tinted changed pixels, and outlined changed regions when `--out` is provided. JSON also includes normalized bounds.
+- `diff screenshot` compares the current live screenshot to `--baseline`, or compares `--baseline` to an optional saved `current.png` path without requiring an active session, then prints ranked changed regions with screen-space rectangles, shape, size, density, average color, and luminance, and writes a diff PNG with a light grayscale current-screen context, red-tinted changed pixels, and outlined changed regions when `--out` is provided. JSON also includes normalized bounds.
 - If `tesseract` is installed, `diff screenshot` also adds best-effort OCR text deltas, movement clusters, and bbox size-change hints to the text and JSON output. OCR improves descriptions only; it does not change the pixel comparison or the diff PNG.
 - When OCR is available, `diff screenshot` also reports best-effort non-text visual deltas by masking OCR text boxes out of the diff and clustering remaining residuals. These are hints for icons, controls, and separators, not semantic icon recognition.
-- `diff screenshot --overlay-refs` additionally writes a separate current-screen overlay guide without using that annotated image for the pixel comparison. If current-screen refs intersect changed regions, the output lists the best ref matches under those regions.
+- `diff screenshot --overlay-refs` additionally writes a separate current-screen overlay guide for live captures without using that annotated image for the pixel comparison. If current-screen refs intersect changed regions, the output lists the best ref matches under those regions. Saved-image comparisons do not have live accessibility refs, so `--overlay-refs` is unavailable when a `current.png` path is provided.
 - In `--json` mode, each overlay ref also includes a screenshot-space `center` point for coordinate fallback like `press <x> <y>`.
 - Burned-in touch overlays are exported only on macOS hosts, because the overlay pipeline depends on Swift + AVFoundation helpers.
 - On Linux or other non-macOS hosts, `record stop` still succeeds and returns the raw video plus telemetry sidecar, and includes `overlayWarning` when burn-in overlays were skipped.
```
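
The commands reference notes that in `--json` mode each overlay ref carries a screenshot-space `center` point for a `press <x> <y>` fallback. A sketch of deriving such a point from a target rectangle, assuming a hypothetical `{ x, y, width, height }` shape (`RefRect`) and rounding to whole pixels; the commit does not show how agent-device computes or rounds `center`:

```typescript
// Hypothetical shape of an overlay ref's target rectangle in screenshot space.
interface RefRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Midpoint of the rectangle, rounded to whole pixels so it can be passed to `press <x> <y>`.
function refCenter(rect: RefRect): { x: number; y: number } {
  return {
    x: Math.round(rect.x + rect.width / 2),
    y: Math.round(rect.y + rect.height / 2),
  };
}

const center = refCenter({ x: 40, y: 300, width: 120, height: 44 });
console.log(`press ${center.x} ${center.y}`); // press 100 322
```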
