
Commit afd1b38

Rename skill from agent-device-pr-media to agent-device-evidence
The old name baked in "PR" and "media" - both became misleading once issues entered scope as a valid input source and the skill's output grew beyond MP4s (stills + JSON manifest). `agent-device-evidence` keeps the parent-skill prefix, drops the source-kind specifier, and uses "evidence" to cover all outputs. Cache path moved from ~/.cache/agent-device-pr-media/ to ~/.cache/agent-device-evidence/. Prior caches under the old path are orphaned; users can `rm -rf ~/.cache/agent-device-pr-media` to reclaim disk - no migration script for v1.
1 parent 995d6bd commit afd1b38

2 files changed

Lines changed: 11 additions & 11 deletions

File tree

.claude/skills/agent-device-pr-media/ISSUES.md renamed to .claude/skills/agent-device-evidence/ISSUES.md

Lines changed: 4 additions & 4 deletions
@@ -1,4 +1,4 @@
-# agent-device-pr-media issues
+# agent-device-evidence issues

 ## Resolved by v1 design (SKILL.md)

@@ -41,7 +41,7 @@
 - **`### Tests` is the only hard structural anchor.** Everything inside it is freeform - authors use single numbered lists, restarted numbered lists, h4 sub-headers, prose, "N/A", or leave it empty. The skill does **not** enforce a separator convention. Flow segmentation is LLM-driven; the LLM uses whatever signals are present (h4 headers, numbered restarts, prose markers like "Test case N:" / "Repeat with...", state-change cues). Sample (9 recent merged PRs, 2026-05-07): 0/9 used `#### Test case N:` headers - that convention exists (e.g. PR #89743) but is a minority pattern, not a default.
 - **Platform restriction is opt-in** via PR title or Tests prose ("iOS only", "Android only", "On iOS:"). Default is both. No file-path heuristics, no PR-checklist parsing.
 - **Per-flow artifact**: one MP4 per flow per platform (or one PNG for verify-only single-step flows). Not one big MP4.
-- **Persistent cache**: `~/.cache/agent-device-pr-media/<pr-num>/<run-ts>/`. Survives reboots; latest-run-wins.
+- **Persistent cache**: `~/.cache/agent-device-evidence/<pr-num>/<run-ts>/`. Survives reboots; latest-run-wins.

 ## Generalize input source: PR or issue (added 2026-05-07)

@@ -56,14 +56,14 @@
 - **`expected` field added to per-flow manifest** (issues only) - populated from `## Expected Result:`. The driver MAY use it as a final-state assertion target.
 - **New exit code `8 BAD_INPUT`** for malformed / non-PR-non-issue source URLs. Replaces the old exit `2 SKIP` behavior.
 - **Run-output dir generalized** from `<pr-num>/` to `<source-kind>-<source-num>/` (e.g. `pr-89475/`, `issue-89855/`).
-- **Skill name not yet aligned with broadened scope.** Directory + cache path + cross-links still say `agent-device-pr-media`. Rename to something like `agent-device-flow-evidence` or `agent-device-test-recorder` is a candidate follow-up.
+- **Skill renamed** from `agent-device-pr-media` to `agent-device-evidence` to match the broadened scope (PR + issue inputs, not just PRs). Cache path moved from `~/.cache/agent-device-pr-media/` to `~/.cache/agent-device-evidence/`. Existing prior caches under the old path are orphaned - users can `rm -rf ~/.cache/agent-device-pr-media` to reclaim disk; not worth a migration script for v1.

 Reference: https://github.com/Expensify/App/issues/89855 (and 6 other recent bug-tagged issues sampled 2026-05-07) - all follow the `## Action Performed:` + `## Expected Result:` + `## Platforms:` template consistently.

 ## Phase 1 cache (added 2026-05-07)

 - **Problem**: Phase 1 (LLM-driven exploration) is the expensive part; Phase 2 is just `agent-device replay`. Re-running on a PR whose Tests steps haven't changed wastes Phase 1's full cost on every invocation.
-- **Solution**: content-addressable `.ad` cache at `~/.cache/agent-device-pr-media/.ad-cache/<fingerprint>.ad`, shared across PRs.
+- **Solution**: content-addressable `.ad` cache at `~/.cache/agent-device-evidence/.ad-cache/<fingerprint>.ad`, shared across PRs.
 - **Fingerprint** = `sha256(precondition + json(steps) + platform + bundle_id + agent_device_version)`. Title is excluded (label only). PR number / app SHA excluded so the cache shares across PRs; correctness enforced at replay time, not lookup time.
 - **Self-healing**: on Phase 2 replay failure (cache hit path only), invalidate the entry, re-run Phase 1, retry Phase 2 once. Stale entries get evicted lazily as the underlying app changes.
 - **Manifest**: per-flow `cache: "hit" | "miss" | "invalidated" | "bypassed"` and `fingerprint` so reviewers can see what was reused.
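
Reviewer note: the fingerprint bullet in the hunk above could be sketched in POSIX shell roughly as follows. This is a minimal illustration, not the skill's actual code. The function names (`sha256_hex`, `fingerprint`, `cache_path`), the unit-separator byte placed between fields, and the assumption that the steps arrive already serialized as JSON are all hypothetical; the diff only states which fields feed the hash. The example bundle id and version are placeholders.

```shell
# Hypothetical helpers; names and field separator are assumptions, not from SKILL.md.

# Hash stdin with whichever SHA-256 tool is installed
# (sha256sum from GNU coreutils, or shasum as shipped on macOS).
sha256_hex() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | cut -d' ' -f1
  else
    shasum -a 256 | cut -d' ' -f1
  fi
}

# fingerprint PRECONDITION STEPS_JSON PLATFORM BUNDLE_ID CLI_VERSION
# Fields are joined with the ASCII unit separator (octal 037) so that
# "ab" + "c" and "a" + "bc" cannot collide. Title, PR number, and app SHA
# are deliberately not inputs, matching the exclusions listed above.
fingerprint() {
  printf '%s\037%s\037%s\037%s\037%s' "$1" "$2" "$3" "$4" "$5" | sha256_hex
}

# cache_path FINGERPRINT: prints where the cached Phase 1 script would live.
cache_path() {
  printf '%s/.cache/agent-device-evidence/.ad-cache/%s.ad\n' "$HOME" "$1"
}

# Example with placeholder values.
fp=$(fingerprint "Signed in as admin" '["Open Settings","Tap Profile"]' ios com.example.app 1.0.0)
cache_path "$fp"
```

Because the PR number is not hashed, two PRs with identical steps, platform, bundle id, and CLI version would resolve to the same `.ad` path, which is the sharing behavior the bullet describes.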

.claude/skills/agent-device-pr-media/SKILL.md renamed to .claude/skills/agent-device-evidence/SKILL.md

Lines changed: 7 additions & 7 deletions
@@ -1,10 +1,10 @@
 ---
-name: agent-device-pr-media
+name: agent-device-evidence
 description: Records iOS/Android native MP4 evidence for test/repro flows extracted from an Expensify GitHub PR or issue. Use when the user asks to "record the flow for PR #X", "capture mobile evidence for issue #Y", or "produce screenshots/videos for <PR or issue URL>". Mobile-native only - declines mWeb and Desktop.
 allowed-tools: Bash(agent-device *) Bash(gh pr view *) Bash(gh issue view *) Bash(gh api *) Bash(mkdir -p *) Bash(rm -rf *) Bash(ls *) Bash(file *) Bash(test *) Bash(date *) Read Write
 ---

-# agent-device-pr-media
+# agent-device-evidence

 Records `iOS: Native` and `Android: Native` MP4 evidence for the test or repro steps declared in an Expensify GitHub **PR or issue**. The source of truth is the test/repro steps themselves, not the surrounding code or context - the skill works equally well on a PR's `### Tests` section, an issue's `## Action Performed:` block, or any future Markdown body where steps are clearly authored.

@@ -111,7 +111,7 @@ Phase 1 is the expensive part - it runs the LLM-driven exploration loop to produ
 ### Cache layout

 ```
-~/.cache/agent-device-pr-media/.ad-cache/
+~/.cache/agent-device-evidence/.ad-cache/
 ├── <fingerprint>.ad # the cached Phase 1 script
 └── <fingerprint>.meta.json # {created_ts, original_pr, last_used_ts, hits}
 ```
@@ -147,7 +147,7 @@ Fields NOT included (intentionally):
 For each flow, in order:

 1. **Compute fingerprint** from flow + platform + bundle_id + CLI version.
-2. **Look up** `~/.cache/agent-device-pr-media/.ad-cache/<fingerprint>.ad`:
+2. **Look up** `~/.cache/agent-device-evidence/.ad-cache/<fingerprint>.ad`:
 - **Hit** (and `--no-cache` is not set): copy cached `.ad` to `$TEST_FLOW.ad`, mark `cache: "hit"` in the manifest, **skip Phase 1 entirely**, proceed to Phase 2.
 - **Miss** (or `--no-cache`): mark `cache: "miss"`, run Phase 1 normally.
 3. **On Phase 1 success** (cache miss path): write `$TEST_FLOW.ad` to the cache, write `<fingerprint>.meta.json` with `{created_ts, original_pr, hits: 1}`.
@@ -172,7 +172,7 @@ Two phases per flow. Lifecycle delegated to the parent skill's bring-up. Phase 1
 3. **Set up run directory** - persistent cache, latest-run-wins:
 ```bash
 PR_NUM=<num>; RUN_TS=$(date -u +%Y%m%dT%H%M%SZ)
-RUN_DIR="$HOME/.cache/agent-device-pr-media/$PR_NUM/$RUN_TS"
+RUN_DIR="$HOME/.cache/agent-device-evidence/$PR_NUM/$RUN_TS"
 mkdir -p "$RUN_DIR/ios" "$RUN_DIR/android"
 # Optional: rm -rf prior runs for this PR before mkdir to keep disk lean
 ```
@@ -207,7 +207,7 @@ On cache miss (or `--no-cache`):
 test -s "$TEST_FLOW.ad" || { record per-flow status "phase1_failed: empty script"; continue }
 ```

-7. **Write to cache** - on success, copy `$TEST_FLOW.ad` to `~/.cache/agent-device-pr-media/.ad-cache/<fingerprint>.ad` and write the meta sidecar.
+7. **Write to cache** - on success, copy `$TEST_FLOW.ad` to `~/.cache/agent-device-evidence/.ad-cache/<fingerprint>.ad` and write the meta sidecar.

 ### Phase 2 - Recording (per flow, deterministic replay)
@@ -262,7 +262,7 @@ For flows classified `kind: still`:
 ### Run-output layout

 ```
-~/.cache/agent-device-pr-media/
+~/.cache/agent-device-evidence/
 ├── .ad-cache/ # cross-source Phase 1 cache (see "Phase 1 cache")
 │ ├── <fingerprint>.ad
 │ └── <fingerprint>.meta.json
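
Reviewer note: ISSUES.md says the run-output directory was generalized to `<source-kind>-<source-num>/` and that malformed source URLs exit with `8 BAD_INPUT`. A rough POSIX shell sketch of how a run directory under the renamed cache root might be derived from a source URL is shown here. The helper names `source_slug` and `run_dir`, the strict URL shapes accepted, and the optional timestamp argument are assumptions for illustration only; the timestamp format is the one used in the SKILL.md `RUN_TS` line.

```shell
# Hypothetical helpers; not taken from SKILL.md.

# source_slug URL: prints "pr-<num>" or "issue-<num>", or returns 8 (BAD_INPUT).
source_slug() {
  _url=${1%/}                      # tolerate one trailing slash
  case "$_url" in
    https://github.com/*/*/pull/*)   _kind=pr;    _num=${_url##*/pull/} ;;
    https://github.com/*/*/issues/*) _kind=issue; _num=${_url##*/issues/} ;;
    *) return 8 ;;
  esac
  # Reject empty or non-numeric ids (for example ".../pull/123/files").
  case "$_num" in
    ''|*[!0-9]*) return 8 ;;
  esac
  printf '%s-%s\n' "$_kind" "$_num"
}

# run_dir URL [RUN_TS]: prints the per-run directory path without creating it.
run_dir() {
  _slug=$(source_slug "$1") || return 8
  _ts=${2:-$(date -u +%Y%m%dT%H%M%SZ)}
  printf '%s/.cache/agent-device-evidence/%s/%s\n' "$HOME" "$_slug" "$_ts"
}

# Example using the issue referenced in ISSUES.md.
run_dir "https://github.com/Expensify/App/issues/89855"
```

Keeping path derivation separate from `mkdir -p` lets a caller check the exit status for `8` before touching the filesystem.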
