Commit 996d1fc

feat: add session perf metrics command (#121)
* feat: add session perf metrics command
* refactor: simplify open perf result shaping
* docs: improve perf guidance in docs and skill
* docs: split perf guidance into skill reference
1 parent ab33826 commit 996d1fc

13 files changed

Lines changed: 378 additions & 4 deletions

README.md

Lines changed: 21 additions & 0 deletions
@@ -17,6 +17,7 @@ The project is in early development and considered experimental. Pull requests a
 - Core commands: `open`, `back`, `home`, `app-switcher`, `press`, `long-press`, `focus`, `type`, `fill`, `scroll`, `scrollintoview`, `wait`, `alert`, `screenshot`, `close`, `reinstall`, `push`.
 - Inspection commands: `snapshot` (accessibility tree), `diff snapshot` (structural baseline diff), `appstate`, `apps`, `devices`.
 - Clipboard commands: `clipboard read`, `clipboard write <text>`.
+- Performance command: `perf` (alias: `metrics`) returns a metrics JSON blob for the active session; startup timing is currently sampled.
 - App logs: `logs path` returns session log metadata; `logs start` / `logs stop` stream app output; `logs clear` truncates session app logs; `logs clear --restart` resets and restarts stream in one step; `logs doctor` checks readiness; `logs mark` writes timeline markers.
 - Device tooling: `adb` (Android), `simctl`/`devicectl` (iOS via Xcode).
 - Minimal dependencies; TypeScript executed directly on Node 22+ (no build step).
@@ -154,6 +155,7 @@ agent-device scrollintoview @e42
 - `settings faceid match|nonmatch|enroll|unenroll` (iOS simulator only)
 - `settings permission grant|deny|reset camera|microphone|photos|contacts|notifications [full|limited]`
 - `appstate`, `apps`, `devices`, `session list`
+- `perf` (alias: `metrics`)

 Push notification simulation:

@@ -278,6 +280,25 @@ Assertions:
 - `is` predicates: `visible`, `hidden`, `exists`, `editable`, `selected`, `text`.
 - `is text` uses exact equality.

+Performance metrics:
+- `perf` (or `metrics`) requires an active session and returns a JSON metrics blob.
+- Current metric: `startup` sampled from the elapsed wall-clock time around each session `open` command dispatch (`open-command-roundtrip`), unit `ms`.
+- Startup samples are session-scoped and include sample history from recent `open` actions.
+- Platform support for current sampling: iOS simulator, iOS physical device, Android emulator/device.
+- `fps`, `memory`, and `cpu` are reported as not yet implemented in this release.
+- Quick usage:
+
+```bash
+agent-device open Settings --platform ios
+agent-device perf --json
+```
+
+- How to read it:
+  - `metrics.startup.lastDurationMs`: most recent startup sample in milliseconds.
+  - `metrics.startup.samples[]`: recent startup history for this session.
+  - `sampling.startup.method`: currently `open-command-roundtrip`.
+- Caveat: startup here is command-to-launch round-trip timing, not true app TTI/first-interactive telemetry.
+
 Replay update:
 - `replay <path>` runs deterministic replay from `.ad` scripts.
 - `replay -u <path>` attempts selector updates on failures and atomically rewrites the same file.
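
The README hunk above describes `startup` as wall-clock time measured around the `open` command dispatch. As a minimal sketch of that idea (not the project's actual implementation), the wrapper below times an arbitrary async dispatch. The names `measureOpenRoundtrip`, `dispatchOpen`, and the injectable `now` clock are hypothetical; the sample fields mirror the test fixture in this commit.

```typescript
// Hypothetical sample shape; field names follow the test fixture in this commit.
interface StartupSample {
  durationMs: number;
  measuredAt: string; // ISO timestamp taken when the dispatch finished
  method: 'open-command-roundtrip';
  appTarget: string;
}

// Times an async "open" dispatch with a wall clock.
// `dispatchOpen` stands in for whatever actually launches the app (an assumption).
// `now` is injectable so the timing logic can be exercised with a fake clock.
async function measureOpenRoundtrip(
  appTarget: string,
  dispatchOpen: () => Promise<void>,
  now: () => number = Date.now,
): Promise<StartupSample> {
  const startedAt = now();
  await dispatchOpen();
  const finishedAt = now();
  return {
    // Clamp at zero in case the wall clock is adjusted backwards mid-measurement.
    durationMs: Math.max(0, Math.round(finishedAt - startedAt)),
    measuredAt: new Date(finishedAt).toISOString(),
    method: 'open-command-roundtrip',
    appTarget,
  };
}
```

Because the timer wraps the whole dispatch, the value includes tool and transport overhead, which is why the caveat above warns against reading it as app TTI.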

skills/agent-device/SKILL.md

Lines changed: 7 additions & 0 deletions
@@ -88,6 +88,7 @@ agent-device is visible 'id="anchor"'
 agent-device appstate
 agent-device clipboard read
 agent-device clipboard write "token"
+agent-device perf --json
 agent-device push <bundle|package> <payload.json|inline-json>
 agent-device get text @e1
 agent-device screenshot out.png
@@ -103,6 +104,11 @@ agent-device trace stop ./trace.log
 agent-device batch --steps-file /tmp/batch-steps.json --json
 ```

+### Performance Check
+
+- Use `agent-device perf --json` (or `metrics --json`) after `open`.
+- For detailed metric semantics, caveats, and interpretation guidance, see [references/perf-metrics.md](references/perf-metrics.md).
+
 ## Guardrails (High Value Only)

 - Re-snapshot after UI mutations (navigation/modal/list changes).
@@ -145,3 +151,4 @@ agent-device batch --steps-file /tmp/batch-steps.json --json
 - [references/video-recording.md](references/video-recording.md)
 - [references/coordinate-system.md](references/coordinate-system.md)
 - [references/batching.md](references/batching.md)
+- [references/perf-metrics.md](references/perf-metrics.md)

skills/agent-device/references/perf-metrics.md

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+# Performance Metrics (`perf` / `metrics`)
+
+Use this reference when you need to measure launch performance in agent workflows.
+
+## Quick flow
+
+```bash
+agent-device open Settings --platform ios
+agent-device perf --json
+```
+
+Alias:
+
+```bash
+agent-device metrics --json
+```
+
+## What is measured today
+
+- Session-scoped `startup` timing only.
+- Sampling method: `open-command-roundtrip`.
+- Unit: milliseconds (`ms`).
+- Source: elapsed wall-clock time around each session `open` command dispatch for the active app target.
+
+## Output fields to use
+
+- `metrics.startup.lastDurationMs`: most recent startup sample.
+- `metrics.startup.lastMeasuredAt`: ISO timestamp of most recent sample.
+- `metrics.startup.sampleCount`: number of retained samples.
+- `metrics.startup.samples[]`: recent startup history for the current session.
+- `sampling.startup.method`: current sampling method identifier.
+
+## Platform support (current)
+
+- iOS simulator: supported for startup sampling.
+- iOS physical device: supported for startup sampling.
+- Android emulator/device: supported for startup sampling.
+- `fps`, `memory`, and `cpu`: currently placeholders (`available: false`).
+
+## Interpretation guidance
+
+- Treat startup values as command round-trip timing, not true app first-frame or first-interactive telemetry.
+- Compare like-for-like runs:
+  - same device target
+  - same app build
+  - same workflow/session steps
+- Use multiple runs and compare trend/median, not one-off samples.
+
+## Common pitfalls
+
+- Running `perf` before any `open` in the session yields no startup sample yet.
+- Comparing values across different devices/runtimes introduces large noise.
+- Interpreting current `startup` as CPU/FPS/memory would be incorrect.
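
The interpretation guidance above recommends comparing medians rather than one-off samples. A minimal sketch of that calculation follows. It assumes the parsed `perf --json` output exposes `metrics.startup.samples[]` at the top level and that each sample carries a `durationMs` number; the diff does not show the exact payload envelope or sample shape, so treat both as assumptions. `median` and `medianStartupMs` are hypothetical helper names.

```typescript
// Assumed subset of the `perf --json` payload (only the fields this sketch reads).
interface PerfPayload {
  metrics?: {
    startup?: {
      samples?: Array<{ durationMs: number }>; // per-sample shape is an assumption
    };
  };
}

// Median of a list of numbers; undefined for an empty list.
function median(values: number[]): number | undefined {
  if (values.length === 0) return undefined;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Parses the JSON text printed by `perf --json` and returns the median startup
// duration in milliseconds, or undefined when no samples were captured yet.
function medianStartupMs(payloadJson: string): number | undefined {
  const payload = JSON.parse(payloadJson) as PerfPayload;
  const samples = payload.metrics?.startup?.samples ?? [];
  return median(samples.map((sample) => sample.durationMs));
}
```

Returning `undefined` for an empty history matches the first pitfall above: running `perf` before any `open` yields no sample to summarize.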

src/cli.ts

Lines changed: 5 additions & 0 deletions
@@ -368,6 +368,11 @@ export async function runCli(argv: string[], deps: CliDeps = DEFAULT_CLI_DEPS):
         return;
       }
     }
+    if (command === 'perf') {
+      process.stdout.write(`${JSON.stringify(data, null, 2)}\n`);
+      if (logTailStopper) logTailStopper();
+      return;
+    }
   }
   if (logTailStopper) logTailStopper();
   return;

src/core/__tests__/capabilities.test.ts

Lines changed: 1 addition & 0 deletions
@@ -69,6 +69,7 @@ test('core commands support iOS simulator, iOS device, and Android', () => {
   'longpress',
   'logs',
   'open',
+  'perf',
   'press',
   'record',
   'screenshot',

src/core/capabilities.ts

Lines changed: 1 addition & 0 deletions
@@ -33,6 +33,7 @@ const COMMAND_CAPABILITY_MATRIX: Record<string, CommandCapability> = {
   logs: { ios: { simulator: true, device: true }, android: { emulator: true, device: true, unknown: true } },
   longpress: { ios: { simulator: true, device: true }, android: { emulator: true, device: true, unknown: true } },
   open: { ios: { simulator: true, device: true }, android: { emulator: true, device: true, unknown: true } },
+  perf: { ios: { simulator: true, device: true }, android: { emulator: true, device: true, unknown: true } },
   reinstall: { ios: { simulator: true, device: true }, android: { emulator: true, device: true, unknown: true } },
   press: { ios: { simulator: true, device: true }, android: { emulator: true, device: true, unknown: true } },
   push: { ios: { simulator: true }, android: { emulator: true, device: true, unknown: true } },
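
To illustrate how a matrix like the one in this hunk can gate a command per platform and device kind, here is a small lookup sketch. The `CommandCapability` type and the `isSupported` helper are assumptions for illustration only; the real type definition and lookup code are not part of this diff. The two entries are copied from the hunk above.

```typescript
type Platform = 'ios' | 'android';

// Hypothetical shape: per platform, a map from device kind to a support flag.
type CommandCapability = Partial<Record<Platform, Record<string, boolean>>>;

// Two entries copied from the hunk above, enough to show both outcomes.
const matrix: Record<string, CommandCapability> = {
  perf: { ios: { simulator: true, device: true }, android: { emulator: true, device: true, unknown: true } },
  push: { ios: { simulator: true }, android: { emulator: true, device: true, unknown: true } },
};

// Returns true only when the command, platform, and device kind are all listed as supported.
// A missing command, platform, or kind is treated as unsupported.
function isSupported(command: string, platform: Platform, kind: string): boolean {
  return matrix[command]?.[platform]?.[kind] === true;
}
```

With these entries, `perf` is allowed on an iOS physical device while `push` is not, because the `push` row omits `device` under `ios`.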

src/daemon/handlers/__tests__/session.test.ts

Lines changed: 112 additions & 0 deletions
@@ -764,6 +764,118 @@ test('clipboard rejects unsupported iOS physical devices', async () => {
   }
 });

+test('perf requires an active session', async () => {
+  const sessionStore = makeSessionStore();
+  const response = await handleSessionCommands({
+    req: {
+      token: 't',
+      session: 'default',
+      command: 'perf',
+      positionals: [],
+      flags: {},
+    },
+    sessionName: 'default',
+    logPath: path.join(os.tmpdir(), 'daemon.log'),
+    sessionStore,
+    invoke: noopInvoke,
+  });
+  assert.ok(response);
+  assert.equal(response?.ok, false);
+  if (response && !response.ok) {
+    assert.equal(response.error.code, 'SESSION_NOT_FOUND');
+  }
+});
+
+test('perf returns startup samples captured from open actions', async () => {
+  const sessionStore = makeSessionStore();
+  const sessionName = 'perf-session';
+  const measuredAt = new Date('2026-02-24T10:00:00.000Z').toISOString();
+  const session = makeSession(sessionName, {
+    platform: 'ios',
+    id: 'sim-1',
+    name: 'iPhone 16',
+    kind: 'simulator',
+    booted: true,
+  });
+  session.actions.push({
+    ts: Date.now(),
+    command: 'open',
+    positionals: ['Settings'],
+    flags: {},
+    result: {
+      startup: {
+        durationMs: 184,
+        measuredAt,
+        method: 'open-command-roundtrip',
+        appTarget: 'Settings',
+        appBundleId: 'com.apple.Preferences',
+      },
+    },
+  });
+  sessionStore.set(sessionName, session);
+
+  const response = await handleSessionCommands({
+    req: {
+      token: 't',
+      session: sessionName,
+      command: 'perf',
+      positionals: [],
+      flags: {},
+    },
+    sessionName,
+    logPath: path.join(os.tmpdir(), 'daemon.log'),
+    sessionStore,
+    invoke: noopInvoke,
+  });
+  assert.ok(response);
+  assert.equal(response?.ok, true);
+  if (response && response.ok) {
+    const startup = (response.data?.metrics as any)?.startup;
+    assert.equal(startup?.available, true);
+    assert.equal(startup?.lastDurationMs, 184);
+    assert.equal(startup?.lastMeasuredAt, measuredAt);
+    assert.equal(startup?.method, 'open-command-roundtrip');
+    assert.equal(startup?.sampleCount, 1);
+    assert.equal(Array.isArray(startup?.samples), true);
+  }
+});
+
+test('perf reports startup metric as unavailable when no sample exists', async () => {
+  const sessionStore = makeSessionStore();
+  const sessionName = 'perf-session-empty';
+  sessionStore.set(
+    sessionName,
+    makeSession(sessionName, {
+      platform: 'android',
+      id: 'emulator-5554',
+      name: 'Pixel Emulator',
+      kind: 'emulator',
+      booted: true,
+    }),
+  );
+
+  const response = await handleSessionCommands({
+    req: {
+      token: 't',
+      session: sessionName,
+      command: 'perf',
+      positionals: [],
+      flags: {},
+    },
+    sessionName,
+    logPath: path.join(os.tmpdir(), 'daemon.log'),
+    sessionStore,
+    invoke: noopInvoke,
+  });
+  assert.ok(response);
+  assert.equal(response?.ok, true);
+  if (response && response.ok) {
+    const startup = (response.data?.metrics as any)?.startup;
+    assert.equal(startup?.available, false);
+    assert.match(String(startup?.reason ?? ''), /no startup sample captured yet/i);
+  }
+});
+
 test('open URL on existing iOS session clears stale app bundle id', async () => {
   const sessionStore = makeSessionStore();
   const sessionName = 'ios-session';
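
The tests above pin down the observable shape of the `startup` metric without showing the handler that produces it. The sketch below is one way such a handler could derive the metric from recorded session actions; it is a reconstruction from the assertions, not the code in this commit. `buildStartupMetric`, the `SessionAction` subset, and the `maxSamples` cap of 20 are all hypothetical.

```typescript
// Sample shape taken from the test fixture above.
interface StartupSample {
  durationMs: number;
  measuredAt: string;
  method: string;
  appTarget?: string;
  appBundleId?: string;
}

// Assumed subset of a recorded session action (only what this sketch reads).
interface SessionAction {
  command: string;
  result?: { startup?: StartupSample };
}

type StartupMetric =
  | { available: true; lastDurationMs: number; lastMeasuredAt: string; method: string; sampleCount: number; samples: StartupSample[] }
  | { available: false; reason: string };

// Collects startup samples from `open` actions, keeps the most recent ones,
// and summarizes them in the shape the tests assert on.
function buildStartupMetric(actions: SessionAction[], maxSamples = 20): StartupMetric {
  const collected: StartupSample[] = [];
  for (const action of actions) {
    const sample = action.result?.startup;
    if (action.command === 'open' && sample) collected.push(sample);
  }
  const samples = collected.slice(-maxSamples); // retention cap is an assumption
  const last = samples[samples.length - 1];
  if (!last) {
    return { available: false, reason: 'No startup sample captured yet.' };
  }
  return {
    available: true,
    lastDurationMs: last.durationMs,
    lastMeasuredAt: last.measuredAt,
    method: last.method,
    sampleCount: samples.length,
    samples,
  };
}
```

Modeling the metric as a discriminated union on `available` keeps the "no sample yet" case explicit, which lines up with the third test's check on `reason`.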
