docs: split perf guidance into skill reference

thymikee · thymikee · commit 25f33e3f1633 · 2026-02-25T13:20:27.000+01:00
diff --git a/skills/agent-device/SKILL.md b/skills/agent-device/SKILL.md
@@ -102,18 +102,10 @@ agent-device trace stop ./trace.log
 agent-device batch --steps-file /tmp/batch-steps.json --json
 ```
 
-### Performance Check (current support)
+### Performance Check
 
-```bash
-agent-device open Settings --platform ios
-agent-device perf --json
-```
-
-- `perf` (alias: `metrics`) is session-scoped and currently reports startup timing samples only.
-- Sampling method is `open-command-roundtrip` in milliseconds around each `open` dispatch.
-- Startup values are useful for regression tracking between builds and runs on the same target/setup.
-- Do not treat current startup value as app-level first frame / first interactive telemetry.
-- `fps`, `memory`, and `cpu` are placeholders in the current release.
+- Use `agent-device perf --json` (or `metrics --json`) after `open`.
+- For detailed metric semantics, caveats, and interpretation guidance, see [references/perf-metrics.md](references/perf-metrics.md).
 
 ## Guardrails (High Value Only)
 
@@ -131,7 +123,6 @@ agent-device perf --json
 - `full|limited` mode applies only to iOS `photos`; other targets reject mode.
 - On Android, non-ASCII `fill/type` may require an ADB keyboard IME on some system images; only install IME APKs from trusted sources and verify checksum/signature.
 - If using `--save-script`, prefer explicit path syntax (`--save-script=flow.ad` or `./flow.ad`).
-- For perf analysis, compare like-for-like runs (same device, app build, and session workflow) to reduce noise.
 
 ## Security and Trust Notes
 
@@ -157,3 +148,4 @@ agent-device perf --json
 - [references/video-recording.md](references/video-recording.md)
 - [references/coordinate-system.md](references/coordinate-system.md)
 - [references/batching.md](references/batching.md)
+- [references/perf-metrics.md](references/perf-metrics.md)
diff --git a/skills/agent-device/references/perf-metrics.md b/skills/agent-device/references/perf-metrics.md
@@ -0,0 +1,53 @@
+# Performance Metrics (`perf` / `metrics`)
+
+Use this reference when you need to measure launch performance in agent workflows.
+
+## Quick flow
+
+```bash
+agent-device open Settings --platform ios
+agent-device perf --json
+```
+
+Alias:
+
+```bash
+agent-device metrics --json
+```
+
+## What is measured today
+
+- Session-scoped `startup` timing only.
+- Sampling method: `open-command-roundtrip`.
+- Unit: milliseconds (`ms`).
+- Source: elapsed wall-clock time around each session `open` command dispatch for the active app target.
+
+## Output fields to use
+
+- `metrics.startup.lastDurationMs`: most recent startup sample.
+- `metrics.startup.lastMeasuredAt`: ISO timestamp of most recent sample.
+- `metrics.startup.sampleCount`: number of retained samples.
+- `metrics.startup.samples[]`: recent startup history for the current session.
+- `sampling.startup.method`: current sampling method identifier.
+
+## Platform support (current)
+
+- iOS simulator: supported for startup sampling.
+- iOS physical device: supported for startup sampling.
+- Android emulator/device: supported for startup sampling.
+- `fps`, `memory`, and `cpu`: currently placeholders (`available: false`).
+
+## Interpretation guidance
+
+- Treat startup values as command round-trip timing, not true app first-frame or first-interactive telemetry.
+- Compare like-for-like runs:
+  - same device target
+  - same app build
+  - same workflow/session steps
+- Use multiple runs and compare trend/median, not one-off samples.
+
+## Common pitfalls
+
+- Running `perf` before any `open` in the session yields no startup sample yet.
+- Comparing values across different devices/runtimes introduces large noise.
+- Interpreting current `startup` as CPU/FPS/memory would be incorrect.