Rename skill from agent-device-pr-media to agent-device-evidence
The old name baked in "PR" and "media" - both became misleading once
issues entered scope as a valid input source and the skill's output
grew beyond MP4s (stills + JSON manifest).
`agent-device-evidence` keeps the parent-skill prefix, drops the
source-kind specifier, and uses "evidence" to cover all outputs.
Cache path moved from ~/.cache/agent-device-pr-media/ to
~/.cache/agent-device-evidence/. Prior caches under the old path are
orphaned; users can `rm -rf ~/.cache/agent-device-pr-media` to
reclaim disk - no migration script for v1.
.claude/skills/agent-device-evidence/ISSUES.md
+4 -4 (4 additions & 4 deletions)
@@ -1,4 +1,4 @@
-# agent-device-pr-media issues
+# agent-device-evidence issues
 
 ## Resolved by v1 design (SKILL.md)
 
@@ -41,7 +41,7 @@
 - **`### Tests` is the only hard structural anchor.** Everything inside it is freeform - authors use single numbered lists, restarted numbered lists, h4 sub-headers, prose, "N/A", or leave it empty. The skill does **not** enforce a separator convention. Flow segmentation is LLM-driven; the LLM uses whatever signals are present (h4 headers, numbered restarts, prose markers like "Test case N:" / "Repeat with...", state-change cues). Sample (9 recent merged PRs, 2026-05-07): 0/9 used `#### Test case N:` headers - that convention exists (e.g. PR #89743) but is a minority pattern, not a default.
 - **Platform restriction is opt-in** via PR title or Tests prose ("iOS only", "Android only", "On iOS:"). Default is both. No file-path heuristics, no PR-checklist parsing.
 - **Per-flow artifact**: one MP4 per flow per platform (or one PNG for verify-only single-step flows). Not one big MP4.
 ## Generalize input source: PR or issue (added 2026-05-07)
@@ -56,14 +56,14 @@
 - **`expected` field added to per-flow manifest** (issues only) - populated from `## Expected Result:`. The driver MAY use it as a final-state assertion target.
 - **New exit code `8 BAD_INPUT`** for malformed / non-PR-non-issue source URLs. Replaces the old exit `2 SKIP` behavior.
 - **Run-output dir generalized** from `<pr-num>/` to `<source-kind>-<source-num>/` (e.g. `pr-89475/`, `issue-89855/`).
-- **Skill name not yet aligned with broadened scope.** Directory + cache path + cross-links still say `agent-device-pr-media`. Rename to something like `agent-device-flow-evidence` or `agent-device-test-recorder` is a candidate follow-up.
+- **Skill renamed** from `agent-device-pr-media` to `agent-device-evidence` to match the broadened scope (PR + issue inputs, not just PRs). Cache path moved from `~/.cache/agent-device-pr-media/`to `~/.cache/agent-device-evidence/`. Existing prior caches under the old path are orphaned - users can `rm -rf ~/.cache/agent-device-pr-media` to reclaim disk; not worth a migration script for v1.
 
 Reference: https://github.com/Expensify/App/issues/89855 (and 6 other recent bug-tagged issues sampled 2026-05-07) - all follow the `## Action Performed:` + `## Expected Result:` + `## Platforms:` template consistently.
 
 ## Phase 1 cache (added 2026-05-07)
 
 - **Problem**: Phase 1 (LLM-driven exploration) is the expensive part; Phase 2 is just `agent-device replay`. Re-running on a PR whose Tests steps haven't changed wastes Phase 1's full cost on every invocation.
-- **Solution**: content-addressable `.ad` cache at `~/.cache/agent-device-pr-media/.ad-cache/<fingerprint>.ad`, shared across PRs.
+- **Solution**: content-addressable `.ad` cache at `~/.cache/agent-device-evidence/.ad-cache/<fingerprint>.ad`, shared across PRs.
 - **Fingerprint** = `sha256(precondition + json(steps) + platform + bundle_id + agent_device_version)`. Title is excluded (label only). PR number / app SHA excluded so the cache shares across PRs; correctness enforced at replay time, not lookup time.
 - **Self-healing**: on Phase 2 replay failure (cache hit path only), invalidate the entry, re-run Phase 1, retry Phase 2 once. Stale entries get evicted lazily as the underlying app changes.
 - **Manifest**: per-flow `cache: "hit" | "miss" | "invalidated" | "bypassed"` and `fingerprint` so reviewers can see what was reused.
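A minimal Python sketch of the fingerprint and cache-path scheme from the Phase 1 cache notes, using the post-rename cache root. It assumes that `json(steps)` means key-sorted, compact JSON and that the five inputs are joined with a NUL separator. Both are illustrative choices, and the skill's actual serialization may differ. The names `flow_fingerprint` and `cache_path` are hypothetical.

```python
import hashlib
import json
from pathlib import Path

# Cache root after the rename (previously ~/.cache/agent-device-pr-media/.ad-cache).
CACHE_ROOT = Path.home() / ".cache" / "agent-device-evidence" / ".ad-cache"


def flow_fingerprint(precondition: str, steps: list, platform: str,
                     bundle_id: str, agent_device_version: str) -> str:
    """Hypothetical fingerprint over the five inputs listed in ISSUES.md.

    Title, PR/issue number and app SHA are deliberately not inputs, so
    identical steps share one cache entry across PRs.
    """
    # ASSUMPTION: canonical JSON (sorted keys, no whitespace) so that equal
    # step lists always serialize to the same bytes.
    steps_json = json.dumps(steps, sort_keys=True, separators=(",", ":"))
    # ASSUMPTION: NUL-separated fields, so ("ab", "c") and ("a", "bc")
    # cannot produce the same payload.
    payload = "\0".join(
        [precondition, steps_json, platform, bundle_id, agent_device_version])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cache_path(fingerprint: str) -> Path:
    """Location of the cached .ad replay script for a fingerprint."""
    return CACHE_ROOT / f"{fingerprint}.ad"
```

Because the source number is not hashed, two PRs or issues with the same steps, platform, bundle ID and agent-device version resolve to the same `.ad` file. Whether that file is still valid is decided later, at replay time.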
.claude/skills/agent-device-evidence/SKILL.md
+7 -7 (7 additions & 7 deletions)
@@ -1,10 +1,10 @@
 ---
-name: agent-device-pr-media
+name: agent-device-evidence
 description: Records iOS/Android native MP4 evidence for test/repro flows extracted from an Expensify GitHub PR or issue. Use when the user asks to "record the flow for PR #X", "capture mobile evidence for issue #Y", or "produce screenshots/videos for <PR or issue URL>". Mobile-native only - declines mWeb and Desktop.
 Records `iOS: Native` and `Android: Native` MP4 evidence for the test or repro steps declared in an Expensify GitHub **PR or issue**. The source of truth is the test/repro steps themselves, not the surrounding code or context - the skill works equally well on a PR's `### Tests` section, an issue's `## Action Performed:` block, or any future Markdown body where steps are clearly authored.
@@ -111,7 +111,7 @@ Phase 1 is the expensive part - it runs the LLM-driven exploration loop to produ
 - **Hit** (and `--no-cache` is not set): copy cached `.ad` to `$TEST_FLOW.ad`, mark `cache: "hit"` in the manifest, **skip Phase 1 entirely**, proceed to Phase 2.
 - **Miss** (or `--no-cache`): mark `cache: "miss"`, run Phase 1 normally.
 3. **On Phase 1 success** (cache miss path): write `$TEST_FLOW.ad` to the cache, write `<fingerprint>.meta.json` with `{created_ts, original_pr, hits: 1}`.
@@ -172,7 +172,7 @@ Two phases per flow. Lifecycle delegated to the parent skill's bring-up. Phase 1
 3. **Set up run directory** - persistent cache, latest-run-wins:
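The Hit/Miss branches in the SKILL.md hunk, combined with the self-healing rule from ISSUES.md, can be sketched as one per-flow function. This is a hypothetical illustration. `record_flow`, `run_phase1` and `run_phase2` stand in for the skill's LLM exploration and `agent-device replay` steps. The sketch makes three assumptions: it re-caches the script produced after an invalidation, it writes to the cache under `--no-cache` too, and it never emits the `"bypassed"` manifest value, whose trigger is not shown in this diff.

```python
import json
import shutil
import time
from pathlib import Path
from typing import Callable


def record_flow(
    fingerprint: str,
    cache_dir: Path,
    test_flow: Path,
    run_phase1: Callable[[Path], None],
    run_phase2: Callable[[Path], bool],
    original_pr: str,
    no_cache: bool = False,
) -> dict:
    """Hypothetical per-flow driver: cache lookup, Phase 1/2, one self-healing retry.

    run_phase1 writes a fresh .ad script to test_flow (expensive exploration);
    run_phase2 replays test_flow and returns True on success.
    Returns the per-flow manifest fields plus an "ok" flag.
    """
    cached = cache_dir / f"{fingerprint}.ad"
    meta = cache_dir / f"{fingerprint}.meta.json"

    def explore_and_store() -> None:
        # Phase 1, then persist the script and its metadata to the cache.
        run_phase1(test_flow)
        cache_dir.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(test_flow, cached)
        meta.write_text(json.dumps(
            {"created_ts": int(time.time()), "original_pr": original_pr, "hits": 1}))

    if not no_cache and cached.exists():
        # Hit: reuse the cached script and skip Phase 1 entirely.
        shutil.copyfile(cached, test_flow)
        if run_phase2(test_flow):
            return {"cache": "hit", "fingerprint": fingerprint, "ok": True}
        # Self-healing (hit path only): evict, re-explore, retry Phase 2 once.
        cached.unlink()
        meta.unlink(missing_ok=True)
        explore_and_store()  # ASSUMPTION: the fresh script is re-cached
        return {"cache": "invalidated", "fingerprint": fingerprint,
                "ok": run_phase2(test_flow)}

    # Miss (or --no-cache): run Phase 1 normally, then replay once.
    explore_and_store()  # ASSUMPTION: also cached when --no-cache is set
    return {"cache": "miss", "fingerprint": fingerprint, "ok": run_phase2(test_flow)}
```

Keeping the retry inside the hit branch matches the "cache hit path only" rule. A Phase 2 failure right after a fresh Phase 1 is reported as-is instead of looping.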