From 53ff9d49c5bad6d212e38e9069eb77789657b490 Mon Sep 17 00:00:00 2001
From: Anton Lykhoyda
Date: Mon, 18 May 2026 15:37:42 +0200
Subject: [PATCH] docs: align README/CLAUDE.md/docs-site with 74-tool surface + add Actions concept section
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Audit reconciled doc claims against the actual code surface (commands/,
agents/, skills/, scripts/cdp-bridge/src/index.ts) and surfaced the
Actions concept that was already implemented but not documented as its
own family.

Counts (everything now reads from a single 2026-05-18 audit of
trackedTool calls):
- MCP tools: was "70+ / 64 / 51 / 38" depending on page → 74 everywhere
- Commands: was "16 / 13 / 8 listed" → 17 everywhere
- Agents: 5 (unchanged, was correct)

Outdated info fixed:
- docs-site/architecture.mdx implementation table said iOS routes
  through agent-device CLI; reality since PR #164 / D1219 is
  rn-fast-runner. Per-platform dispatch table now matches CLAUDE.md.
- tools/index.mdx labelled cdp_interact as deprecated; the source
  description in index.ts does not, and the tool is the preferred
  JS-level interaction primitive.
- astro.config.mjs sidebar omitted the Actions page, doctor,
  list-learned-actions, and run-action; all added.

New "Actions" concept content:
- docs-site/actions/index.mdx expanded with Why-hybrid section,
  composition pattern (worked prologue example), measured impact
  (210x speedup data point), comparison vs pure-script / pure-LLM
  alternatives, full tool surface (4 MCP + 2 commands), and the
  artifact-first protocol.
- README Actions block tightened to one paragraph + updated table
  linking to the deep doc.
- CLAUDE.md gains a contributor-facing Actions section under
  Architecture so future sessions internalise the LLM+pragmatic
  hybrid rationale.
- MCP Tools table across all three (README / CLAUDE.md /
  tools/index.mdx) regrouped to surface Actions as a 5th family
  alongside CDP / Device / Testing / Macro-Asserts.

Files changed: README.md, CLAUDE.md, docs-site/astro.config.mjs,
docs-site/src/content/docs/actions/index.mdx,
docs-site/src/content/docs/architecture.mdx,
docs-site/src/content/docs/tools/index.mdx

Co-Authored-By: Claude Opus 4.7
---
 CLAUDE.md                                    | 103 +++++++++++++++----
 README.md                                    |  27 ++---
 docs-site/astro.config.mjs                   |  17 ++-
 docs-site/src/content/docs/actions/index.mdx |  81 ++++++++++++---
 docs-site/src/content/docs/architecture.mdx  |  43 +++++---
 docs-site/src/content/docs/tools/index.mdx   |  52 +++++++---
 6 files changed, 235 insertions(+), 88 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index b2be6aa6..5c517ae2 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -43,15 +43,33 @@ Development scaffolding lives in the **sibling workspace repo**:
 - `maestro-runner` — auto-installed to `~/.maestro-runner/`
 
 ### Essential commands
+
+Authoring & lifecycle:
 ```
-/rn-dev-agent:setup — Check & install all prerequisites
+/rn-dev-agent:setup — Check & install all prerequisites; scaffolds .rn-agent/
+/rn-dev-agent:doctor — 14-row diagnostic table for the whole environment
+/rn-dev-agent:check-env — Quick environment-readiness check
 /rn-dev-agent:rn-feature-dev — Full 8-phase feature development pipeline
-/rn-dev-agent:test-feature — Test a feature end-to-end on device
+/rn-dev-agent:test-feature — Test a feature end-to-end; auto-records an action on pass
 /rn-dev-agent:debug-screen — Diagnose and fix the current screen
-/rn-dev-agent:check-env — Verify environment readiness
 /rn-dev-agent:build-and-test — Build app, then test feature
-/rn-dev-agent:proof-capture — Record proof video + screenshots
-/rn-dev-agent:send-feedback — Report a bug
+/rn-dev-agent:proof-capture — Rehearsal-gated video + screenshots + PR body
+/rn-dev-agent:nav-graph — Extract / inspect the app navigation graph
+/rn-dev-agent:send-feedback — Report a bug with sanitised environment context
+```
+
+Actions (replayable flows — see "Actions" section below):
+```
+/rn-dev-agent:list-learned-actions [q] — Inventory of
saved flows + feedback memories +/rn-dev-agent:run-action -e K=V — Replay a saved action; auto-repair-aware +``` + +Experience Engine (cross-session learning): +``` +/rn-dev-agent:rn-agent-health — Health check the local experience store +/rn-dev-agent:rn-agent-compact — Compact telemetry on disk +/rn-dev-agent:rn-agent-export — Export the experience store +/rn-dev-agent:rn-agent-import — Import an experience snapshot ``` ### How it works @@ -105,40 +123,83 @@ Fallback: `xcrun simctl` (iOS) + `adb` (Android) for device lifecycle (boot / in ### MCP Server (cdp-bridge) -64 tools exposed via MCP (count last audited 2026-04-28; per-category breakdown below predates several additions and is approximate): +**74 tools** exposed via MCP (re-audited 2026-05-18; counted from `trackedTool()` calls in `scripts/cdp-bridge/src/index.ts`). Five conceptual families: -**CDP tools** (React internals via Chrome DevTools Protocol over WebSocket): +**CDP tools** — React internals via Chrome DevTools Protocol over WebSocket: - `cdp_status` — health check with domain capabilities + reconnect state - `cdp_connect` / `cdp_disconnect` / `cdp_targets` — connection management - `cdp_evaluate` — arbitrary JS execution in Hermes -- `cdp_reload` — full reload with auto-reconnect -- `cdp_dev_settings` — programmatic dev menu actions -- `cdp_component_tree` / `cdp_component_state` — React fiber introspection +- `cdp_reload` / `cdp_restart` — full reload / restart with auto-reconnect +- `cdp_dev_settings` / `cdp_open_devtools` — dev menu + DevTools attach +- `cdp_component_tree` / `cdp_component_state` / `cdp_diagnostic_renderers` — React fiber introspection - `cdp_navigation_state` / `cdp_nav_graph` / `cdp_navigate` — navigation - `cdp_store_state` / `cdp_dispatch` — Redux/Zustand/React Query state -- `cdp_network_log` / `cdp_network_body` / `cdp_console_log` / `cdp_error_log` — buffered events -- `cdp_wait_for_network` — block until a network request matching url_pattern + method completes (D682, 
GH #65 P3) — collapses "tap → check → tap" into a single deterministic call -- `cdp_interact` — press/type/scroll by testID via fiber tree -- `cdp_heap_usage` — JS memory usage -- `cdp_cpu_profile` — CPU profiling with hot function ranking -- `cdp_object_inspect` — handle-based lazy object inspection -- `cdp_exception_breakpoint` — catch exceptions with timed capture +- `cdp_network_log` / `cdp_network_body` / `cdp_wait_for_network` — network buffer + sync (D682) +- `cdp_console_log` / `cdp_error_log` / `cdp_native_errors` / `cdp_metro_events` — log/error/metro streams +- `cdp_interact` — press/type/scroll by testID via fiber tree (JS-level; not deprecated — preferred over device_* when a testID is reliable) +- `cdp_heap_usage` / `cdp_cpu_profile` / `cdp_object_inspect` / `cdp_exception_breakpoint` — profiling + inspection +- `cdp_mmkv` — read/write MMKV storage - `cdp_set_shared_value` — set Reanimated SharedValue by testID for proof captures - `collect_logs` — parallel multi-source log collection -**Device tools** (14 — iOS: in-tree `rn-fast-runner` `/command` endpoint; Android: `agent-device` CLI): +**Device tools** (14, native interaction — iOS: in-tree `rn-fast-runner` `/command`; Android: `agent-device` CLI): - `device_list` / `device_screenshot` / `device_snapshot` - `device_find` / `device_press` / `device_fill` / `device_swipe` / `device_scroll` - `device_scrollintoview` / `device_back` / `device_longpress` / `device_pinch` - `device_permission` / `device_batch` +Plus device helpers filed alongside CDP in code: `device_deeplink`, `device_accept_system_dialog`, `device_dismiss_system_dialog`, `device_focus_next`, `device_pick_date`, `device_pick_value`, `device_record`, `device_reset_state`. + iOS-only quirks worth knowing: - `device_fill` may surface a Swift-internal `XCUIElement.typeText` quiescence-timeout from XCTest's main-thread sync. 
The TS client treats this specific error as success on `.type` (`meta.runnerTimeoutShim: true`) because the side-effect (text appended to the field) demonstrably succeeds — observed across the iOS-MVP smoke-tests. - `device_find` non-exact + `device_scrollintoview` ALWAYS route through the TS orchestrators on iOS (never the legacy `agent-device find/scrollintoview` CLI), so they don't respawn the upstream `AgentDeviceRunner`. -**Testing & composite tools** (13): -- `proof_step` / `cross_platform_verify` / `maestro_run` / `maestro_generate` / `maestro_test_all` -- `cdp_auto_login` + device helpers (deeplink, accept/dismiss dialog, focus_next, pick_date, pick_value) +**Actions** (the LLM/pragmatic hybrid — see the Actions section below): +- `cdp_run_action` — replay an action by id with `params`; orchestrates `maestro_run` + `cdp_repair_action` + retry; persists a `RunRecord` with `autoRepair` telemetry +- `cdp_repair_action` — fuzzy-match a stale `testID` against the live snapshot, patch the YAML, retry; refuses on human edits (mtime), >3 repairs/24h, or snapshot infra failure +- `cdp_record_test_save_as_action` — promote a recorded walk to `.rn-agent/actions/.yaml` with metadata header + sidecar; auto-promotes to `status: active` on first clean replay +- `cdp_record_test_*` — start / stop / generate / annotate / save / load / list (recorder upstream of actions) + +**Testing & composite tools**: +- `proof_step` / `cross_platform_verify` — verification primitives +- `maestro_run` / `maestro_generate` / `maestro_test_all` — Maestro orchestration +- `cdp_auto_login` — credentials-or-deeplink login wrapper + +**Macro-Asserts** (state-assertive replays — internal state, not pixels): +- `expect_redux` / `expect_route` / `expect_visible_by_testid` / `expect_text` + +### Actions — the LLM/pragmatic hybrid + +An **action** is a parameterised Maestro flow under `.rn-agent/actions/.yaml` with a metadata header (id / intent / tags / mutates / status / appId). 
Actions are **emitted by the agent** when `/test-feature` verification passes — they are not human-authored. They get replayed via `/run-action` (or directly by `cdp_run_action`) as **prologues** before the agent does new interactive work. + +**Why we have them.** LLM agents are good at improvising on novel screens, slow and stochastic at re-deriving things they've already seen. Pure-script approaches (Detox, Maestro, Appium) are the opposite — fast but brittle to UI drift. Actions sit in the middle: every successful verification adds one, every drift gets quietly absorbed by `cdp_repair_action`, every truly broken flow escalates. Measured: a 3-step wizard that takes ~14 min as an interactive walk runs in ~4 s replayed (~210× speedup); across 35 stories the average dropped from ~12 min to ~4 min once the corresponding actions existed. + +**Composition rule.** The agent never replays an entire job from a script. Each task is two regimes: +1. **Pragmatic reusable actions** for the predictable parts (login, navigation, multi-step setup, locale switching, dismissing gates). +2. **LLM-driven discovery** for the part that is actually new (verifying a specific UI state, exercising a new edge case, debugging a regression). + +**Artifact-first protocol.** `rn-tester` and `rn-debugger` agents are instructed (via `feedback_execute_artifacts_before_manual.md`) to scan saved actions before composing any new `device_*` primitives. Manual primitives are a **fallback**, not the default. Single source of truth for the inventory is `scripts/learned-actions.mjs` — shared by `/list-learned-actions`, `/run-action`, and both agents' Step 0 artifact scans. 
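
The artifact-first rule described above (scan saved actions first, fall back to manual primitives only when nothing matches) could be sketched roughly as follows. This is an illustrative sketch only: the `SavedAction` shape, the `planStep` name, and the naive keyword match are all assumptions made for the example, and the real inventory logic in `scripts/learned-actions.mjs` may work quite differently.

```typescript
// Hypothetical shape for illustration; not the plugin's real type.
interface SavedAction {
  id: string;
  intent: string;
  tags: string[];
  status: string; // the text mentions `status: active`; other values are assumed
}

type Plan =
  | { kind: "replay"; actionId: string } // reuse a saved action as a prologue
  | { kind: "manual" }; // fall back to interactive device_* primitives

// Naive keyword matcher (an assumption): an active action matches when any
// word of the requested intent appears in its tags or in its intent text.
function planStep(intent: string, actions: SavedAction[]): Plan {
  const words = intent
    .toLowerCase()
    .split(/\s+/)
    .filter((w) => w.length > 0);
  const match = actions.find(
    (a) =>
      a.status === "active" &&
      words.some(
        (w) => a.tags.includes(w) || a.intent.toLowerCase().includes(w),
      ),
  );
  return match ? { kind: "replay", actionId: match.id } : { kind: "manual" };
}
```

The point of the sketch is the ordering, not the matcher: the saved library is consulted before any manual step is planned, and only actions already promoted to an active status are eligible for replay.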
+ +**Tool surface for actions** (one conceptual family — see "Actions" in the MCP server list above): + +| Tool / Command | Role | +|---|---| +| `cdp_record_test_save_as_action` | Promote a recorded walk → first-class action with metadata header + sidecar | +| `cdp_run_action` | Replay with params; orchestrates `maestro_run` + `cdp_repair_action` + retry; persists `RunRecord` with `autoRepair` telemetry (passed/failed/refused/skipped + phase timings) | +| `cdp_repair_action` | Fuzzy-match stale `testID` against live snapshot, patch YAML, retry; refuses on human edits (mtime), >3 repairs/24h, or snapshot infra failure | +| `/list-learned-actions` | Read-only inventory (single source of truth: `scripts/learned-actions.mjs`) | +| `/run-action -e K=V` | Side-effecting execution; gates safety checks (mutates flag, appId match, `${VAR}` coverage), then calls `cdp_run_action` | + +**Why hybrid beats either extreme**: + +| Failure mode | Pure script | Pure LLM | This plugin | +|---|---|---|---| +| `testID` renamed | Breaks; human re-records | Re-discovers each run | `cdp_repair_action` patches + retries + logs diff | +| Product logic changed | Passes anyway, masks bug | Probabilistically catches | Refuses to auto-patch logic break; surfaces failure | +| Net-new behaviour | Can't author | Re-derives every session | Discovers interactively, **auto-saves verified walk as new action** | +| Cost over time | Linear (drift = human) | Quadratic (full walk each session) | Sub-linear (drift absorbed, library compounds) | + +Full user-facing doc: [docs-site/actions](docs-site/src/content/docs/actions/index.mdx) (published at `lykhoyda.github.io/rn-dev-agent/actions/`). ### Key Technical Decisions diff --git a/README.md b/README.md index 0a03d687..b4b42294 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,7 @@ A [Claude Code](https://docs.anthropic.com/en/docs/claude-code) plugin that turns Claude into a React Native development partner. 
It explores your codebase, designs architecture, implements features, then **verifies everything live on the simulator** — reading the component tree, store state, and navigation stack through Chrome DevTools Protocol. -**70+ MCP tools** · **5 agents** · **16 commands** · **1180+ tests** · **46 best-practice rules** · [Full documentation](https://lykhoyda.github.io/rn-dev-agent/) +**74 MCP tools** · **5 agents** · **17 commands** · **1180+ tests** · **46 best-practice rules** · [Full documentation](https://lykhoyda.github.io/rn-dev-agent/) --- @@ -74,20 +74,20 @@ Claude runs an [8-phase pipeline](https://lykhoyda.github.io/rn-dev-agent/comman | `/rn-dev-agent:build-and-test ` | Build app (local or EAS), install on device, then test | | `/rn-dev-agent:proof-capture ` | Rehearsal-gated video + screenshots + PR body | -**Actions: replayable app flows** +**Actions: replayable app flows + the LLM/pragmatic hybrid** -A saved, replayable flow through your app — login, navigate to settings, reach a known state. The plugin records actions automatically when verification passes; you replay them in seconds. +An **action** is a saved Maestro flow the agent **emits** when `/test-feature` verification passes — not something you author. Each task is then composed of two regimes: **pragmatic reusable actions** for predictable parts (login, navigation, multi-step setup) and **LLM-driven discovery** for the part that's actually new. The agent uses actions as prologues to reach a known state before doing fresh interactive work. Measured: a 3-step wizard that took **13 min 55 s** as an interactive walk runs in **~4 s** when replayed — ~210× faster. | | | |---|---| -| **What** | A saved, replayable flow through your app (login, navigation, multi-step setup). | +| **What** | A saved, parameterised flow with a metadata header and `${KEY}` placeholders. | | **Where** | `.rn-agent/actions/.yaml`. The plugin's home in your project is `.rn-agent/`. 
| -| **Create one** | Run `/rn-dev-agent:test-feature `. When verification passes, the plugin saves the verified walk as an action. | -| **Run one** | List with `/rn-dev-agent:list-learned-actions`; replay with `/rn-dev-agent:run-action `. The agent also picks an action automatically when it needs to reach a known state (e.g. logged-in home) before doing new work. | -| **Self-repair** | If a `testID` changes, the plugin patches the action against the live UI and retries. Small UI drift is absorbed without re-recording; broken product logic is not auto-fixed. | -| **Why** | Known flows replay in seconds instead of being rediscovered interactively. Repeated setup work like login becomes one fast step. | +| **Create one** | Run `/rn-dev-agent:test-feature `. The verified walk is saved as an action. | +| **Run one** | List with `/rn-dev-agent:list-learned-actions`; replay with `/rn-dev-agent:run-action `. The agent also picks an action automatically when it needs to reach a known state. | +| **Self-repair** | If a `testID` changes, `cdp_repair_action` fuzzy-matches against the live snapshot, patches the YAML, and retries. Small UI drift absorbed; broken product logic surfaced, not auto-fixed. | +| **Why hybrid** | Pure scripts don't adapt; pure LLM re-derives everything every session. Actions are the memory of the LLM loop — every successful verification adds one, every drift gets quietly absorbed, every truly broken flow escalates. | -[Full actions guide](https://lykhoyda.github.io/rn-dev-agent/actions/) +[Full actions guide — why the hybrid matters, tool surface, comparison vs Detox/Maestro/pure-LLM](https://lykhoyda.github.io/rn-dev-agent/actions/) **Setup & diagnostics:** @@ -131,13 +131,14 @@ if (__DEV__) { ## MCP Tools -The plugin exposes a wide surface area of MCP tools across four families. See the [tools reference](https://lykhoyda.github.io/rn-dev-agent/tools/) for the full list. +The plugin exposes **74 MCP tools** across five families. 
See the [tools reference](https://lykhoyda.github.io/rn-dev-agent/tools/) for the full list. | Family | What it's for | Examples | |---|---|---| -| **CDP** | React internals via Chrome DevTools Protocol | `cdp_component_tree`, `cdp_store_state`, `cdp_evaluate`, `cdp_native_errors`, `cdp_record_test_*`, `cdp_repair_action` | -| **Device** | Native interaction with the simulator/emulator | `device_find`, `device_press`, `device_fill`, `device_screenshot`, `device_pick_date` | -| **Testing** | E2E replay and PR-ready proof | `proof_step`, `cross_platform_verify`, `maestro_run`, `cdp_run_action` | +| **CDP** | React internals via Chrome DevTools Protocol | `cdp_status`, `cdp_component_tree`, `cdp_store_state`, `cdp_evaluate`, `cdp_native_errors`, `cdp_navigate`, `collect_logs` | +| **Device** | Native interaction with the simulator/emulator | `device_find`, `device_press`, `device_fill`, `device_screenshot`, `device_pick_date`, `device_batch` | +| **Actions** | Record / replay / self-repair persistent flows ([guide](https://lykhoyda.github.io/rn-dev-agent/actions/)) | `cdp_run_action`, `cdp_repair_action`, `cdp_record_test_save_as_action`, `cdp_record_test_*` | +| **Testing** | E2E replay and PR-ready proof | `proof_step`, `cross_platform_verify`, `maestro_run`, `maestro_test_all`, `cdp_auto_login` | | **Macro-Asserts** | State-assertive replays — internal state, not pixels | `expect_redux`, `expect_route`, `expect_visible_by_testid`, `expect_text` | ### What's new in v0.44.18 (2026-05-05) diff --git a/docs-site/astro.config.mjs b/docs-site/astro.config.mjs index 06252471..463dd798 100644 --- a/docs-site/astro.config.mjs +++ b/docs-site/astro.config.mjs @@ -7,7 +7,7 @@ export default defineConfig({ integrations: [ starlight({ title: 'rn-dev-agent', - description: 'Claude Code plugin for React Native development — 51 MCP tools, 5 agents, 13 commands. 
Explore, build, verify, and test features live on iOS Simulator and Android Emulator via Chrome DevTools Protocol.', + description: 'Claude Code plugin for React Native development — 74 MCP tools, 5 agents, 17 commands. Explore, build, verify, and test features live on iOS Simulator and Android Emulator via Chrome DevTools Protocol.', social: [ { icon: 'github', label: 'GitHub', href: 'https://github.com/Lykhoyda/rn-dev-agent' }, ], @@ -24,7 +24,7 @@ export default defineConfig({ '@context': 'https://schema.org', '@type': 'SoftwareApplication', name: 'rn-dev-agent', - description: 'Claude Code plugin for React Native development with 51 MCP tools for live app verification via Chrome DevTools Protocol.', + description: 'Claude Code plugin for React Native development with 74 MCP tools for live app verification via Chrome DevTools Protocol.', applicationCategory: 'DeveloperApplication', operatingSystem: 'macOS, Linux', url: 'https://lykhoyda.github.io/rn-dev-agent/', @@ -44,6 +44,7 @@ export default defineConfig({ sidebar: [ { label: 'Getting Started', slug: 'getting-started' }, { label: 'Architecture', slug: 'architecture' }, + { label: 'Actions', slug: 'actions' }, { label: 'Commands', items: [ @@ -57,6 +58,14 @@ export default defineConfig({ { label: 'debug-screen', slug: 'commands/debug-screen' }, { label: 'check-env', slug: 'commands/check-env' }, { label: 'setup', slug: 'commands/setup' }, + { label: 'doctor', slug: 'commands/doctor' }, + ], + }, + { + label: 'Actions', + items: [ + { label: 'list-learned-actions', slug: 'commands/list-learned-actions' }, + { label: 'run-action', slug: 'commands/run-action' }, ], }, { @@ -83,7 +92,7 @@ export default defineConfig({ items: [ { label: 'Overview', slug: 'tools' }, { - label: 'CDP Tools (24)', + label: 'CDP Tools', collapsed: false, autogenerate: { directory: 'tools/cdp' }, }, @@ -93,7 +102,7 @@ export default defineConfig({ autogenerate: { directory: 'tools/device' }, }, { - label: 'Testing Tools (13)', + label: 
'Testing Tools (5)', collapsed: true, autogenerate: { directory: 'tools/testing' }, }, diff --git a/docs-site/src/content/docs/actions/index.mdx b/docs-site/src/content/docs/actions/index.mdx index 68d166a7..f79697e1 100644 --- a/docs-site/src/content/docs/actions/index.mdx +++ b/docs-site/src/content/docs/actions/index.mdx @@ -1,21 +1,81 @@ --- title: Actions -description: Saved, replayable flows through your app — what they are, where they live, and how the agent uses them. +description: Saved, replayable flows through your app — what they are, why they exist, and how the agent mixes them with open-ended LLM work. --- import { Aside } from '@astrojs/starlight/components'; -An **action** is a saved, replayable flow through your app — login, navigating to a screen, completing a multi-step form. The plugin records them automatically when verification passes; you replay them in seconds via `/run-action`, and the agent uses them as prologues when it needs to reach a known state. +An **action** is a saved, parameterised flow through your app — login, navigating to a screen, completing a multi-step form. The plugin records them automatically when `/test-feature` verification passes; you replay them in seconds via `/run-action`, and the agent uses them as **prologues** when it needs to reach a known state before doing new work. | | | |---|---| -| **What** | A saved, replayable flow through your app (login, navigation, multi-step setup). | +| **What** | A saved, replayable Maestro flow with a metadata header and `${KEY}` placeholders. | | **Where** | `.rn-agent/actions/.yaml`. The plugin's home in your project is `.rn-agent/`. | -| **Create one** | Run `/rn-dev-agent:test-feature `. When verification passes, the plugin saves the verified walk as an action. | +| **Create one** | Run `/rn-dev-agent:test-feature `. On clean verification, the verified walk is saved as an action. 
| | **Run one** | List with `/rn-dev-agent:list-learned-actions`; replay with `/rn-dev-agent:run-action `. The agent also picks an action automatically when it needs to reach a known state. | | **Self-repair** | If a `testID` changes, the plugin patches the action against the live UI and retries. Small UI drift is absorbed; broken product logic is not. | | **Why** | Known flows replay in seconds instead of being rediscovered interactively. Repeated setup work like login becomes one fast step. | +## Why we have actions — the LLM/pragmatic hybrid + +LLM agents are great at **understanding intent and improvising on novel screens**. They are slow and stochastic at **re-deriving things they've already seen**. A login flow that took fourteen minutes to walk interactively the first time will take fourteen minutes again the next time, every time, if we don't record it. + +Pure-script approaches (Maestro, Detox, Appium) are the opposite: fast and deterministic on the happy path, but they don't adapt. A renamed `testID` breaks the script; a human re-records. + +Actions sit deliberately in the middle. They are **emitted by the agent, not authored by humans** — the verification walk that proves the feature works is the same artefact that gets replayed next time. The agent is in charge of when to *use* an action versus when to discover something new. + +### The composition pattern + +The agent never replays an entire job from a script — that would defeat the point of having an LLM in the loop. Each task is composed of two regimes: + +1. **Pragmatic reusable actions** for the predictable parts — login, "navigate to settings → security", "create a draft task with X title", switching locale, dismissing the subscription gate, getting back to logged-in home. +2. **LLM-driven discovery** for the part that is actually new — verifying a specific UI state, exercising a new edge case, debugging a regression, walking a freshly built feature. + +A worked example. 
You ask: "tap the cart badge." + +- The agent reads the navigation state. The app is on `LoginScreen`, but the cart badge lives on `HomeScreen`. +- It scans saved actions and finds `user-login`, whose recorded outcome is "logged-in home." +- It runs `user-login` (~4 seconds), arrives at `HomeScreen`. +- Then it discovers the cart badge interactively and taps it. + +### Measured impact + +A 3-step task-creation wizard took **13 min 55 s** as an interactive agent walk on first run; the same wizard replayed as an action runs in **~4 seconds** — a ~210× speed-up. Across 35 stories in the test app, average end-to-end time dropped from ~12 min to ~4 min once the corresponding actions existed. The latency win is most of the point, but the deeper one is **determinism**: replayed prologues take a fixed number of turns, so the LLM doesn't waste context re-orienting before getting to the actually-novel work. + +### How this compares with the alternatives + +| Failure mode | Pure script (Detox, Maestro) | Pure LLM (no actions) | This plugin | +|---|---|---|---| +| `testID` renamed in app | Breaks; human re-records | Re-discovers slowly each run | `cdp_repair_action` patches the YAML via fuzzy match against the live snapshot, retries, logs the diff | +| Button moved / restyled | Breaks | Adapts but spends turns | Repair handles it; if structure changed, escalates | +| Product logic changed | Passes anyway, masking the bug | Probabilistically catches it | Refuses to auto-patch a logical break; surfaces the failure to you | +| Net-new behaviour to verify | n/a — can't author for unknown flows | Re-derives every session | Discovers interactively, then **the verified walk auto-saves as a new action** | +| Cost over time | Linear (every drift needs a human) | Quadratic-ish (every session re-pays full walk) | Sub-linear (drift auto-absorbed, new flows compound the library) | + +Said another way: actions are the **memory** of the LLM loop. Every successful verification adds one. 
Every drift gets quietly absorbed. Every truly broken flow escalates. + +## Tool surface + +The hybrid is implemented across **four MCP tools** (one conceptual family, "Actions") and **two slash commands**. + +| Tool | Role | +|---|---| +| `cdp_record_test_save_as_action` | Convert a recorded interactive walk into a first-class `.rn-agent/actions/.yaml` with metadata header and sidecar state file. Auto-promotes to `status: active` after the first clean replay. | +| `cdp_run_action` | Replay an action by id with `params`. Orchestrates `maestro_run` + optional `cdp_repair_action` retry. Persists a `RunRecord` with `autoRepair` telemetry (passed / failed / refused / skipped, phase timings) so the experience engine can learn which flows are stable. | +| `cdp_repair_action` | When a run fails with `SELECTOR_NOT_FOUND`, fuzzy-match the stale selector against the live snapshot, patch the YAML, retry. Refuses on human-edited files (`mtime` check), >3 repairs/24h, or snapshot infrastructure failure. | +| `cdp_record_test_*` (start / stop / generate / annotate / save / load / list) | The recorder upstream of actions — captures device taps + CDP state assertions during interactive walks, before they get promoted to actions. | + +| Command | Role | +|---|---| +| [`/rn-dev-agent:list-learned-actions`](../commands/list-learned-actions/) | Read-only inventory — feedback memories + flows + skeletons + plugin commands. Shared script (`scripts/learned-actions.mjs`) is the single source of truth, also called by `rn-tester` / `rn-debugger` agents before they walk anything manually. | +| [`/rn-dev-agent:run-action`](../commands/run-action/) | Side-effecting execution — looks up the action via the same script, gates safety checks (`mutates` flag, `appId` match, `${VAR}` coverage), then calls `cdp_run_action`. 
| + +## The artifact-first protocol + +Both the `rn-tester` and `rn-debugger` agents are instructed (via `feedback_execute_artifacts_before_manual.md`) to scan saved actions **before** composing any new `device_*` primitives. Manual primitives are the **fallback**, not the default — that's the lever that keeps the LLM from paying full-walk latency on flows it has already verified once. + +In practice this means a session opens with `/list-learned-actions` (or its programmatic equivalent), routes through `/run-action` when a match exists, and only drops to interactive `device_press` / `device_fill` / `cdp_interact` when no action covers the intent. + ## Where actions live The plugin's home in your project is `.rn-agent/`. Actions live in the `actions/` subdirectory; sibling folders hold supporting state. @@ -78,19 +138,6 @@ When a `testID` gets renamed in your app and the action references the old name, Self-repair handles **small UI drift** (a renamed `testID`, a moved button), not broken product logic. It refuses to touch files you've hand-edited (`mtime` check). It refuses after 3 repairs on the same action in 24h to avoid runaway patching. Failures escalate to you. -## The agent picks actions automatically - -You don't have to type `/run-action` yourself. When you ask the agent to do something on screen X but the app is on screen Y, the agent will scan saved actions for one that gets you to X and run it as a prologue — then continue with the new work interactively. - -**Example.** You ask: "tap the cart badge." - -- The agent reads the navigation state. The app is on `LoginScreen`, but the cart badge lives on `HomeScreen`. -- It finds a saved `user-login` action whose recorded outcome is "logged-in home." -- It runs `user-login` (~4 seconds), arrives at `HomeScreen`. -- Then it discovers the cart badge interactively and taps it. - -This is the **hybrid composition** pattern. 
Actions cover the predictable parts (login, onboarding, locale switching, subscription gates) so the agent only spends time on the part that's actually new. - ## What actions are NOT - **Not a magic auto-tester.** Self-repair handles small UI drift; it doesn't fix broken features. diff --git a/docs-site/src/content/docs/architecture.mdx b/docs-site/src/content/docs/architecture.mdx index d55a06d4..82117316 100644 --- a/docs-site/src/content/docs/architecture.mdx +++ b/docs-site/src/content/docs/architecture.mdx @@ -55,13 +55,14 @@ The product layers above are organized; underneath, three implementation layers │ ┌──────▼───▼────────────▼─────────────────▼──────┐ │ │ │ MCP Server (CDP Bridge) │ │ │ │ WebSocket → Metro → Hermes CDP │ │ -│ │ 70+ tools: CDP + device + testing │ │ -│ └─────────────────────┬───────────────────────────┘ │ -│ │ │ -│ ┌─────────────────────▼───────────────────────────┐ │ -│ │ agent-device CLI │ │ -│ │ Native device interaction + fast-runner │ │ -│ └──────────────────────────────────────────────────┘ │ +│ │ 74 tools across 5 families │ │ +│ └─────────┬───────────────────────────┬───────────┘ │ +│ │ │ │ +│ ┌─────────▼──────────┐ ┌─────────▼──────────┐ │ +│ │ rn-fast-runner │ │ agent-device CLI │ │ +│ │ (iOS, in-tree) │ │ (Android, 3-tier) │ │ +│ │ XCTest /command │ │ daemon → CLI │ │ +│ └────────────────────┘ └────────────────────┘ │ └─────────────────────────────────────────────────────┘ │ │ ┌────▼────┐ ┌─────▼─────┐ @@ -72,23 +73,25 @@ The product layers above are organized; underneath, three implementation layers | Implementation tier | Tool | Role | |-------|------|------| -| **Device interaction** | agent-device CLI (auto-installed) | Cross-platform native device control: tap, swipe, fill, find, snapshot, screenshot | +| **Device interaction (iOS)** | In-tree `rn-fast-runner` XCTest rig (`scripts/rn-fast-runner/`) — `POST /command` HTTP | Native iOS device control. Always calls `XCUIApplication.activate()` per request (D1219, PR #164). 
iOS no longer requires `agent-device`. | +| **Device interaction (Android)** | `agent-device` CLI (auto-installed) | 3-tier dispatch: daemon socket → fast-runner → CLI fallback | | **App introspection** | Custom MCP server → Hermes CDP via WebSocket | Persistent WebSocket — reads React fiber tree, store state, network, console, errors | | **E2E testing** | maestro-runner (preferred) / Maestro (fallback) | YAML-based persistent test files; underlying format for actions in `.rn-agent/actions/` | -Fallback: `xcrun simctl` (iOS) + `adb` (Android) for device lifecycle when agent-device is unavailable. +Fallback: `xcrun simctl` (iOS) + `adb` (Android) for device **lifecycle** (boot / install / launch / terminate) — the runner doesn't manage device state, only interaction. ## MCP server (CDP bridge) The MCP server is a Node.js process that maintains a persistent WebSocket connection to the React Native app's Hermes engine through Metro's CDP endpoint. -**70+ tools** across four families: +**74 tools** across five families: - **CDP** — React internals via Chrome DevTools Protocol (component tree, store state, navigation, profiling, network) -- **Device** — Native interaction via agent-device CLI (tap, swipe, fill, snapshot, screenshot) -- **Testing** — Action replay, proof capture, auto-login, cross-platform verify +- **Device** — Native interaction (iOS: rn-fast-runner, Android: agent-device) +- **[Actions](actions/)** — Record / replay / self-repair (`cdp_run_action`, `cdp_repair_action`, `cdp_record_test_save_as_action`, `cdp_record_test_*`) +- **Testing** — Proof capture, auto-login, cross-platform verify, Maestro orchestration - **Macro-Asserts** — State-assertive replays (`expect_redux`, `expect_route`, `expect_visible_by_testid`, `expect_text`) -All tools are registered through a single `trackedTool()` wrapper that adds telemetry via the Experience Engine. 
+All tools are registered through a single `trackedTool()` wrapper that adds telemetry via the Experience Engine — that's the same mechanism that feeds the auto-action capture loop. ### Helper injection @@ -122,15 +125,21 @@ Since MCP is pull-based (tools are called on demand), events that fire between c | Network fallback for RN < 0.83 | Try `Network.enable`, if fails → inject fetch/XHR monkey-patches | | Filter mandatory on component tree | Full dumps waste 10K+ tokens — always scope to testID or component | -## 3-tier device dispatch +## Device dispatch by platform -For iOS, device interactions use a three-tier fallback: +Since PR #164 / D1219 the two platforms use different dispatch paths: -1. **fast-runner** (XCTest HTTP server) — ~216ms tap, ~5ms snapshot, ~74ms screenshot +**iOS — single-endpoint `rn-fast-runner`.** Every iOS `device_*` call short-circuits through `runIOS()` (TS client at `scripts/cdp-bridge/src/runners/rn-fast-runner-client.ts`) to a `POST /command` HTTP endpoint exposed by an in-tree XCTest rig. Coordinate-based gestures map to `.drag`; direction-based swipes/scrolls are pre-computed to coords by `device-interact.ts` before dispatch. `device_find` (non-exact) and `device_scrollintoview` are TS-side orchestrators over `runIOS('snapshot')` — no Swift `.findText` round-trip for fuzzy matching. iOS no longer requires `agent-device`. + +**Android — 3-tier `agent-device` fallback.** The Android path retains the original tiered dispatch via `agent-device-wrapper.ts`: + +1. **fast-runner** (XCTest-style HTTP server bundled with `agent-device`) — lowest latency 2. **agent-device daemon** — persistent process, medium latency 3. **agent-device CLI** — direct invocation, highest latency -The fast-runner is the default path when available, providing 13x faster tap interactions than the CLI fallback. 
+Measured: iOS `rn-fast-runner` delivers ~216ms tap, ~5ms snapshot, ~74ms screenshot — the fast-runner path is ~13× faster than CLI fallbacks on either platform. + +A stale `~/.agent-device/daemon.json` can respawn the upstream `AgentDeviceRunner` and fight the in-tree `rn-fast-runner` for focus on iOS. The plugin detects the legacy daemon at session-open; set `RN_DEVICE_KILL_LEGACY=1` to opt into termination automatically. ## What we're NOT using (and why) diff --git a/docs-site/src/content/docs/tools/index.mdx b/docs-site/src/content/docs/tools/index.mdx index a1d3066b..478051a7 100644 --- a/docs-site/src/content/docs/tools/index.mdx +++ b/docs-site/src/content/docs/tools/index.mdx @@ -1,39 +1,43 @@ --- title: MCP Tools -description: 38 tools for React Native development — CDP introspection, device interaction, and testing. +description: 74 tools for React Native development — CDP introspection, device interaction, actions, testing, and macro-asserts. --- -The MCP server exposes **38 tools** organized in three categories. All tools are available through Claude Code's tool-calling interface. +The MCP server exposes **74 tools** organized into five conceptual families. All tools are available through Claude Code's tool-calling interface. 
-## When to use which category +## When to use which family | Need | Use | Example | |------|-----|---------| -| Read React component tree, store state, navigation | **CDP tools** | `cdp_component_tree`, `cdp_store_state` | -| Tap, swipe, type, scroll on the device | **Device tools** | `device_press`, `device_fill`, `device_swipe` | -| Run E2E test flows, capture proof | **Testing tools** | `maestro_run`, `proof_step` | -| Check environment health | **CDP tools** | `cdp_status` (always call first) | -| Debug crashes spanning JS + native | **CDP tools** | `collect_logs` with multiple sources | +| Read React component tree, store state, navigation | **CDP** | `cdp_component_tree`, `cdp_store_state` | +| Tap, swipe, type, scroll on the device | **Device** | `device_press`, `device_fill`, `device_swipe` | +| Replay or self-repair a known flow | **[Actions](../actions/)** | `cdp_run_action`, `cdp_repair_action` | +| Run E2E flows, capture proof | **Testing** | `maestro_run`, `proof_step` | +| Assert internal state instead of pixels | **Macro-Asserts** | `expect_redux`, `expect_route` | +| Check environment health | **CDP** | `cdp_status` (always call first) | +| Debug crashes spanning JS + native | **CDP** | `collect_logs` with multiple sources | -## CDP Tools (19) +## CDP Tools Read React internals via Chrome DevTools Protocol over a WebSocket connection to Hermes. 
-**Connection lifecycle:** `cdp_status` (auto-connect) → `cdp_connect` (explicit) → `cdp_disconnect` (teardown) → `cdp_targets` (list without connecting) +**Connection lifecycle:** `cdp_status` (auto-connect) → `cdp_connect` (explicit) → `cdp_disconnect` (teardown) → `cdp_targets` (list without connecting) → `cdp_restart` (full process restart) → `cdp_open_devtools` (attach DevTools) -**Component inspection:** `cdp_component_tree` (filtered fiber tree) → `cdp_component_state` (hook state by testID) +**Component inspection:** `cdp_component_tree` (filtered fiber tree) → `cdp_component_state` (hook state by testID) → `cdp_diagnostic_renderers` (fiber-root invisibility diagnosis) **Navigation:** `cdp_navigation_state` (current route) → `cdp_nav_graph` (full graph with go-to-screen) → `cdp_navigate` (by screen name) -**State management:** `cdp_store_state` (Redux/Zustand/React Query) → `cdp_dispatch` (Redux action + read-back) +**State management:** `cdp_store_state` (Redux/Zustand/React Query) → `cdp_dispatch` (Redux action + read-back) → `cdp_mmkv` (MMKV read/write) -**Logs:** `cdp_console_log`, `cdp_network_log`, `cdp_error_log`, `collect_logs` (multi-source) +**Logs & events:** `cdp_console_log`, `cdp_network_log`, `cdp_network_body`, `cdp_wait_for_network`, `cdp_error_log`, `cdp_native_errors`, `cdp_metro_events`, `collect_logs` (multi-source) -**Utilities:** `cdp_evaluate` (raw JS), `cdp_reload` (full reload), `cdp_dev_settings` (dev menu), `cdp_interact` (deprecated) +**Profiling & inspection:** `cdp_heap_usage`, `cdp_cpu_profile`, `cdp_object_inspect`, `cdp_exception_breakpoint`, `cdp_set_shared_value` + +**Interaction & utilities:** `cdp_interact` (press/type/scroll by testID via fiber tree — preferred for JS-level interaction), `cdp_evaluate` (raw JS), `cdp_reload` (full reload), `cdp_dev_settings` (dev menu) ## Device Tools (14) -Native interaction via the agent-device CLI. Requires an open session (`device_snapshot action=open`). +Native interaction. 
iOS routes through the in-tree `rn-fast-runner` `/command` endpoint; Android routes through the `agent-device` CLI (3-tier daemon → fast-runner → CLI dispatch). Requires an open session (`device_snapshot action=open`). **Session:** `device_list` → `device_snapshot` (open/snapshot/close) → `device_screenshot` @@ -41,8 +45,24 @@ Native interaction via the agent-device CLI. Requires an open session (`device_s **Utilities:** `device_permission` (grant/revoke) → `device_batch` (multi-step sequence) +**Helpers** (filed alongside CDP in code): `device_deeplink`, `device_accept_system_dialog`, `device_dismiss_system_dialog`, `device_focus_next`, `device_pick_date`, `device_pick_value`, `device_record`, `device_reset_state`. + +## Actions + +Record / replay / self-repair persistent flows. See the [**Actions concept guide**](../actions/) for why the LLM/pragmatic hybrid matters. + +**Replay & repair:** `cdp_run_action` (orchestrates `maestro_run` + auto-repair + retry; persists `RunRecord` with phase timings) → `cdp_repair_action` (fuzzy-patches stale `testID` against live snapshot; refuses on human edits, >3 repairs/24h) + +**Recording → action promotion:** `cdp_record_test_start` → `cdp_record_test_annotate` (mid-recording state checks) → `cdp_record_test_stop` → `cdp_record_test_generate` (raw Maestro YAML) → `cdp_record_test_save` / `cdp_record_test_save_as_action` (promote to first-class action) → `cdp_record_test_list` / `cdp_record_test_load` + ## Testing Tools (5) E2E testing and proof capture — work even when the app is crashed or on native screens. -`cdp_auto_login` → `maestro_run` / `maestro_generate` → `maestro_test_all` → `proof_step` +`cdp_auto_login` → `maestro_run` / `maestro_generate` → `maestro_test_all` → `proof_step` → `cross_platform_verify` + +## Macro-Asserts + +State-assertive replays — assert internal app state instead of pixels. 
The differentiator vs Maestro Cloud / BrowserStack / KaneAI: those compare screenshots; these read Redux / route / fiber state directly. + +`expect_redux` (state.path equals value) · `expect_route` (active route name) · `expect_visible_by_testid` (testID present in tree) · `expect_text` (component text content)
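
To illustrate the "state.path equals value" idea behind `expect_redux`, here is a minimal sketch. It assumes the store state has already been read into a plain object (for example via `cdp_store_state`). The names `getAtPath` and `expectStateAtPath` are hypothetical and are not the plugin's actual implementation or tool signature.

```typescript
// Hypothetical result shape for a state assertion (not the plugin's real type).
interface AssertResult {
  pass: boolean;
  actual: unknown;
}

// Walk a dotted path such as "cart.items.length" through a plain object.
// Returns undefined as soon as a segment cannot be resolved.
function getAtPath(state: unknown, path: string): unknown {
  let current: unknown = state;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") {
      return undefined;
    }
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Compare the value at `path` with `expected` using JSON serialisation.
// Assumption: values are JSON-serialisable; key order matters in this sketch.
function expectStateAtPath(
  state: unknown,
  path: string,
  expected: unknown
): AssertResult {
  const actual = getAtPath(state, path);
  const pass = JSON.stringify(actual) === JSON.stringify(expected);
  return { pass, actual };
}

// Usage: assert on internal state rather than on a screenshot.
const snapshot = { cart: { items: [{ id: 1 }, { id: 2 }], total: 30 } };
const result = expectStateAtPath(snapshot, "cart.items.length", 2);
console.log(result.pass ? "state matches" : `mismatch, got ${String(result.actual)}`);
```

The check reads the value directly, so it is unaffected by rendering differences that would change a screenshot comparison. A real assertion would likely need a more careful equality check than JSON serialisation.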