Merged
103 changes: 82 additions & 21 deletions CLAUDE.md
@@ -43,15 +43,33 @@ Development scaffolding lives in the **sibling workspace repo**:
- `maestro-runner` — auto-installed to `~/.maestro-runner/`

### Essential commands

Authoring & lifecycle:
```
/rn-dev-agent:setup — Check & install all prerequisites
/rn-dev-agent:setup — Check & install all prerequisites; scaffolds .rn-agent/
/rn-dev-agent:doctor — 14-row diagnostic table for the whole environment
/rn-dev-agent:check-env — Quick environment-readiness check
/rn-dev-agent:rn-feature-dev <desc> — Full 8-phase feature development pipeline
/rn-dev-agent:test-feature <desc> — Test a feature end-to-end on device
/rn-dev-agent:test-feature <desc> — Test a feature end-to-end; auto-records an action on pass
/rn-dev-agent:debug-screen — Diagnose and fix the current screen
/rn-dev-agent:check-env — Verify environment readiness
/rn-dev-agent:build-and-test <desc> — Build app, then test feature
/rn-dev-agent:proof-capture <desc> — Record proof video + screenshots
/rn-dev-agent:send-feedback — Report a bug
/rn-dev-agent:proof-capture <desc> — Rehearsal-gated video + screenshots + PR body
/rn-dev-agent:nav-graph — Extract / inspect the app navigation graph
/rn-dev-agent:send-feedback — Report a bug with sanitised environment context
```

Actions (replayable flows — see "Actions" section below):
```
/rn-dev-agent:list-learned-actions [q] — Inventory of saved flows + feedback memories
/rn-dev-agent:run-action <name> -e K=V — Replay a saved action; auto-repair-aware
```

Experience Engine (cross-session learning):
```
/rn-dev-agent:rn-agent-health — Health check the local experience store
/rn-dev-agent:rn-agent-compact — Compact telemetry on disk
/rn-dev-agent:rn-agent-export — Export the experience store
/rn-dev-agent:rn-agent-import — Import an experience snapshot
```

### How it works
@@ -105,40 +123,83 @@ Fallback: `xcrun simctl` (iOS) + `adb` (Android) for device lifecycle (boot / in

### MCP Server (cdp-bridge)

64 tools exposed via MCP (count last audited 2026-04-28; per-category breakdown below predates several additions and is approximate):
**74 tools** exposed via MCP (re-audited 2026-05-18; counted from `trackedTool()` calls in `scripts/cdp-bridge/src/index.ts`). Five conceptual families:

**CDP tools** (React internals via Chrome DevTools Protocol over WebSocket):
**CDP tools** React internals via Chrome DevTools Protocol over WebSocket:
- `cdp_status` — health check with domain capabilities + reconnect state
- `cdp_connect` / `cdp_disconnect` / `cdp_targets` — connection management
- `cdp_evaluate` — arbitrary JS execution in Hermes
- `cdp_reload` — full reload with auto-reconnect
- `cdp_dev_settings` — programmatic dev menu actions
- `cdp_component_tree` / `cdp_component_state` — React fiber introspection
- `cdp_reload` / `cdp_restart` — full reload / restart with auto-reconnect
- `cdp_dev_settings` / `cdp_open_devtools` — dev menu + DevTools attach
- `cdp_component_tree` / `cdp_component_state` / `cdp_diagnostic_renderers` — React fiber introspection
- `cdp_navigation_state` / `cdp_nav_graph` / `cdp_navigate` — navigation
- `cdp_store_state` / `cdp_dispatch` — Redux/Zustand/React Query state
- `cdp_network_log` / `cdp_network_body` / `cdp_console_log` / `cdp_error_log` — buffered events
- `cdp_wait_for_network` — block until a network request matching url_pattern + method completes (D682, GH #65 P3) — collapses "tap → check → tap" into a single deterministic call
- `cdp_interact` — press/type/scroll by testID via fiber tree
- `cdp_heap_usage` — JS memory usage
- `cdp_cpu_profile` — CPU profiling with hot function ranking
- `cdp_object_inspect` — handle-based lazy object inspection
- `cdp_exception_breakpoint` — catch exceptions with timed capture
- `cdp_network_log` / `cdp_network_body` / `cdp_wait_for_network` — network buffer + sync (D682)
- `cdp_console_log` / `cdp_error_log` / `cdp_native_errors` / `cdp_metro_events` — log/error/metro streams
- `cdp_interact` — press/type/scroll by testID via fiber tree (JS-level; not deprecated — preferred over device_* when a testID is reliable)
- `cdp_heap_usage` / `cdp_cpu_profile` / `cdp_object_inspect` / `cdp_exception_breakpoint` — profiling + inspection
- `cdp_mmkv` — read/write MMKV storage
- `cdp_set_shared_value` — set Reanimated SharedValue by testID for proof captures
- `collect_logs` — parallel multi-source log collection

**Device tools** (14 — iOS: in-tree `rn-fast-runner` `/command` endpoint; Android: `agent-device` CLI):
**Device tools** (14, native interaction — iOS: in-tree `rn-fast-runner` `/command`; Android: `agent-device` CLI):
- `device_list` / `device_screenshot` / `device_snapshot`
- `device_find` / `device_press` / `device_fill` / `device_swipe` / `device_scroll`
- `device_scrollintoview` / `device_back` / `device_longpress` / `device_pinch`
- `device_permission` / `device_batch`

Plus device helpers filed alongside CDP in code: `device_deeplink`, `device_accept_system_dialog`, `device_dismiss_system_dialog`, `device_focus_next`, `device_pick_date`, `device_pick_value`, `device_record`, `device_reset_state`.

iOS-only quirks worth knowing:
- `device_fill` may surface a Swift-internal `XCUIElement.typeText` quiescence-timeout from XCTest's main-thread sync. The TS client treats this specific error as success on `.type` (`meta.runnerTimeoutShim: true`) because the side-effect (text appended to the field) demonstrably succeeds — observed across the iOS-MVP smoke-tests.
- `device_find` non-exact + `device_scrollintoview` ALWAYS route through the TS orchestrators on iOS (never the legacy `agent-device find/scrollintoview` CLI), so they don't respawn the upstream `AgentDeviceRunner`.
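
The `device_fill` quirk above amounts to a narrow error-to-success translation. A minimal sketch of that idea follows, assuming a hypothetical `CommandResult` shape and an assumed marker string for the timeout message. Only the `meta.runnerTimeoutShim` flag is taken from the text; the real TS client may be structured quite differently.

```typescript
// Hypothetical result shape. Only meta.runnerTimeoutShim is named in the docs.
interface CommandResult {
  ok: boolean;
  error?: string;
  meta?: { runnerTimeoutShim?: boolean };
}

// Assumed marker text: the exact XCTest error string is not quoted in the docs.
const QUIESCENCE_MARKER = "quiescence";

// Treat a quiescence timeout on a "type" command as success, and flag that we did so.
// Every other failure is passed through untouched.
function applyTypeTimeoutShim(command: string, result: CommandResult): CommandResult {
  const isShimCase =
    command === "type" &&
    !result.ok &&
    result.error !== undefined &&
    result.error.toLowerCase().indexOf(QUIESCENCE_MARKER) !== -1;
  if (!isShimCase) return result;
  return { ok: true, meta: { ...result.meta, runnerTimeoutShim: true } };
}
```

Keeping the flag in `meta` means callers can still tell a shimmed success from a clean one when reading telemetry.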

**Testing & composite tools** (13):
- `proof_step` / `cross_platform_verify` / `maestro_run` / `maestro_generate` / `maestro_test_all`
- `cdp_auto_login` + device helpers (deeplink, accept/dismiss dialog, focus_next, pick_date, pick_value)
**Actions** (the LLM/pragmatic hybrid — see the Actions section below):
- `cdp_run_action` — replay an action by id with `params`; orchestrates `maestro_run` + `cdp_repair_action` + retry; persists a `RunRecord` with `autoRepair` telemetry
- `cdp_repair_action` — fuzzy-match a stale `testID` against the live snapshot, patch the YAML, retry; refuses on human edits (mtime), >3 repairs/24h, or snapshot infra failure
- `cdp_record_test_save_as_action` — promote a recorded walk to `.rn-agent/actions/<id>.yaml` with metadata header + sidecar; auto-promotes to `status: active` on first clean replay
- `cdp_record_test_*` — start / stop / generate / annotate / save / load / list (recorder upstream of actions)
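
To make the repair step concrete, here is a rough sketch of fuzzy-matching a stale `testID` and of the three refusal gates listed above. The docs do not say which similarity measure `cdp_repair_action` uses. Levenshtein edit distance is swapped in here purely for illustration, and every name (`closestTestId`, `RepairContext`, `refusalReason`) is hypothetical.

```typescript
// Levenshtein edit distance, computed row by row.
function editDistance(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Closest live testID within maxDistance edits, or null if none qualifies.
function closestTestId(stale: string, live: string[], maxDistance: number): string | null {
  let best: string | null = null;
  let bestDistance = maxDistance + 1;
  for (const candidate of live) {
    const distance = editDistance(stale, candidate);
    if (distance < bestDistance) {
      best = candidate;
      bestDistance = distance;
    }
  }
  return best;
}

// Hypothetical inputs for the refusal gates described in the docs.
interface RepairContext {
  editedByHuman: boolean;    // e.g. derived from the file's mtime
  repairsInLast24h: number;
  snapshotAvailable: boolean;
}

// Returns a reason to refuse, or null when a repair may proceed.
// ">3 repairs/24h" is read literally here; the real boundary may differ.
function refusalReason(ctx: RepairContext): string | null {
  if (ctx.editedByHuman) return "human-edit";
  if (ctx.repairsInLast24h > 3) return "repair-limit";
  if (!ctx.snapshotAvailable) return "snapshot-failure";
  return null;
}
```

Checking the gates before matching keeps a refused repair cheap: no snapshot comparison happens when the file is off-limits.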

**Testing & composite tools**:
- `proof_step` / `cross_platform_verify` — verification primitives
- `maestro_run` / `maestro_generate` / `maestro_test_all` — Maestro orchestration
- `cdp_auto_login` — credentials-or-deeplink login wrapper

**Macro-Asserts** (state-assertive replays — internal state, not pixels):
- `expect_redux` / `expect_route` / `expect_visible_by_testid` / `expect_text`

### Actions — the LLM/pragmatic hybrid

An **action** is a parameterised Maestro flow under `.rn-agent/actions/<id>.yaml` with a metadata header (id / intent / tags / mutates / status / appId). Actions are **emitted by the agent** when `/test-feature` verification passes — they are not human-authored. They get replayed via `/run-action` (or directly by `cdp_run_action`) as **prologues** before the agent does new interactive work.
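
The metadata header can be pictured as a small record. The sketch below is an assumption about its shape: the six field names come from the paragraph above, but their types, and every `status` value other than `active`, are guesses made for illustration.

```typescript
// Hypothetical typing of the action metadata header.
interface ActionHeader {
  id: string;
  intent: string;
  tags: string[];
  mutates: boolean; // assumed boolean: does replaying change app or backend state?
  status: string;   // "active" is the only value the docs name
  appId: string;
}

// List the header fields that are absent from a parsed header.
function missingHeaderFields(header: Partial<ActionHeader>): string[] {
  const required: (keyof ActionHeader)[] = ["id", "intent", "tags", "mutates", "status", "appId"];
  return required.filter((key) => header[key] === undefined);
}
```

For instance, `missingHeaderFields({ id: "login", intent: "Log in", tags: [], mutates: false })` reports `status` and `appId` as missing.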

**Why we have them.** LLM agents are good at improvising on novel screens but slow and stochastic at re-deriving things they have already seen. Pure-script approaches (Detox, Maestro, Appium) are the opposite: fast but brittle to UI drift. Actions sit in the middle: every successful verification adds one, every drift gets quietly absorbed by `cdp_repair_action`, and every truly broken flow escalates. Measured: a 3-step wizard that takes ~14 min as an interactive walk runs in ~4 s when replayed (~210× speedup); across 35 stories the average dropped from ~12 min to ~4 min once the corresponding actions existed.
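
As a quick sanity check on the quoted figures, using the rounded values from the paragraph above (the README's exact 13 min 55 s gives roughly 209×):

$$
\text{speedup} = \frac{14 \times 60\ \text{s}}{4\ \text{s}} = \frac{840}{4} = 210,
\qquad
\text{per-story ratio} = \frac{12\ \text{min}}{4\ \text{min}} = 3
$$

If the per-story averages held uniformly, the saving across 35 stories would be about $35 \times (12 - 4) = 280$ minutes. That total is an extrapolation from the stated averages, not a measured number.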

**Composition rule.** The agent never replays an entire job from a script. Each task is two regimes:
1. **Pragmatic reusable actions** for the predictable parts (login, navigation, multi-step setup, locale switching, dismissing gates).
2. **LLM-driven discovery** for the part that is actually new (verifying a specific UI state, exercising a new edge case, debugging a regression).

**Artifact-first protocol.** `rn-tester` and `rn-debugger` agents are instructed (via `feedback_execute_artifacts_before_manual.md`) to scan saved actions before composing any new `device_*` primitives. Manual primitives are a **fallback**, not the default. Single source of truth for the inventory is `scripts/learned-actions.mjs` — shared by `/list-learned-actions`, `/run-action`, and both agents' Step 0 artifact scans.

**Tool surface for actions** (one conceptual family — see "Actions" in the MCP server list above):

| Tool / Command | Role |
|---|---|
| `cdp_record_test_save_as_action` | Promote a recorded walk → first-class action with metadata header + sidecar |
| `cdp_run_action` | Replay with params; orchestrates `maestro_run` + `cdp_repair_action` + retry; persists `RunRecord` with `autoRepair` telemetry (passed/failed/refused/skipped + phase timings) |
| `cdp_repair_action` | Fuzzy-match stale `testID` against live snapshot, patch YAML, retry; refuses on human edits (mtime), >3 repairs/24h, or snapshot infra failure |
| `/list-learned-actions` | Read-only inventory (single source of truth: `scripts/learned-actions.mjs`) |
| `/run-action <name> -e K=V` | Side-effecting execution; gates safety checks (mutates flag, appId match, `${VAR}` coverage), then calls `cdp_run_action` |
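
The `${VAR}` coverage gate in the last row can be sketched as a comparison between the placeholders found in a flow and the `-e K=V` pairs supplied on the command line. This is an illustration under stated assumptions, not the plugin's implementation: the function names are hypothetical and the flow is scanned as plain text rather than parsed as YAML.

```typescript
// Collect each distinct ${NAME} placeholder that appears in a flow body.
function placeholders(flowYaml: string): string[] {
  const pattern = /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(flowYaml)) !== null) {
    if (names.indexOf(match[1]) === -1) names.push(match[1]);
  }
  return names;
}

// Parse repeated "-e K=V" arguments; the value may itself contain "=".
function parseEnvArgs(args: string[]): Record<string, string> {
  const env: Record<string, string> = {};
  for (let i = 0; i + 1 < args.length; i++) {
    if (args[i] !== "-e") continue;
    const pair = args[i + 1];
    const eq = pair.indexOf("=");
    if (eq > 0) env[pair.slice(0, eq)] = pair.slice(eq + 1);
    i++; // skip the K=V token just consumed
  }
  return env;
}

// Placeholders with no supplied value; a non-empty result would block the run.
function uncoveredPlaceholders(flowYaml: string, env: Record<string, string>): string[] {
  return placeholders(flowYaml).filter(
    (name) => !Object.prototype.hasOwnProperty.call(env, name),
  );
}
```

Running the check before replay turns a missing parameter into an immediate, explicit refusal rather than a failure partway through the flow.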

**Why hybrid beats either extreme**:

| Failure mode | Pure script | Pure LLM | This plugin |
|---|---|---|---|
| `testID` renamed | Breaks; human re-records | Re-discovers each run | `cdp_repair_action` patches + retries + logs diff |
| Product logic changed | Passes anyway, masks bug | Probabilistically catches | Refuses to auto-patch logic break; surfaces failure |
| Net-new behaviour | Can't author | Re-derives every session | Discovers interactively, **auto-saves verified walk as new action** |
| Cost over time | Linear (drift = human) | Quadratic (full walk each session) | Sub-linear (drift absorbed, library compounds) |

Full user-facing doc: [docs-site/actions](docs-site/src/content/docs/actions/index.mdx) (published at `lykhoyda.github.io/rn-dev-agent/actions/`).

### Key Technical Decisions

27 changes: 14 additions & 13 deletions README.md
@@ -2,7 +2,7 @@

A [Claude Code](https://docs.anthropic.com/en/docs/claude-code) plugin that turns Claude into a React Native development partner. It explores your codebase, designs architecture, implements features, then **verifies everything live on the simulator** — reading the component tree, store state, and navigation stack through Chrome DevTools Protocol.

**70+ MCP tools** · **5 agents** · **16 commands** · **1180+ tests** · **46 best-practice rules** · [Full documentation](https://lykhoyda.github.io/rn-dev-agent/)
**74 MCP tools** · **5 agents** · **17 commands** · **1180+ tests** · **46 best-practice rules** · [Full documentation](https://lykhoyda.github.io/rn-dev-agent/)

---

@@ -74,20 +74,20 @@ Claude runs an [8-phase pipeline](https://lykhoyda.github.io/rn-dev-agent/comman
| `/rn-dev-agent:build-and-test <desc>` | Build app (local or EAS), install on device, then test |
| `/rn-dev-agent:proof-capture <desc>` | Rehearsal-gated video + screenshots + PR body |

**Actions: replayable app flows**
**Actions: replayable app flows + the LLM/pragmatic hybrid**

A saved, replayable flow through your app — login, navigate to settings, reach a known state. The plugin records actions automatically when verification passes; you replay them in seconds.
An **action** is a saved Maestro flow the agent **emits** when `/test-feature` verification passes — not something you author. Each task is then composed of two regimes: **pragmatic reusable actions** for predictable parts (login, navigation, multi-step setup) and **LLM-driven discovery** for the part that's actually new. The agent uses actions as prologues to reach a known state before doing fresh interactive work. Measured: a 3-step wizard that took **13 min 55 s** as an interactive walk runs in **~4 s** when replayed — ~210× faster.

| | |
|---|---|
| **What** | A saved, replayable flow through your app (login, navigation, multi-step setup). |
| **What** | A saved, parameterised flow with a metadata header and `${KEY}` placeholders. |
| **Where** | `.rn-agent/actions/<name>.yaml`. The plugin's home in your project is `.rn-agent/`. |
| **Create one** | Run `/rn-dev-agent:test-feature <description>`. When verification passes, the plugin saves the verified walk as an action. |
| **Run one** | List with `/rn-dev-agent:list-learned-actions`; replay with `/rn-dev-agent:run-action <name>`. The agent also picks an action automatically when it needs to reach a known state (e.g. logged-in home) before doing new work. |
| **Self-repair** | If a `testID` changes, the plugin patches the action against the live UI and retries. Small UI drift is absorbed without re-recording; broken product logic is not auto-fixed. |
| **Why** | Known flows replay in seconds instead of being rediscovered interactively. Repeated setup work like login becomes one fast step. |
| **Create one** | Run `/rn-dev-agent:test-feature <description>`. The verified walk is saved as an action. |
| **Run one** | List with `/rn-dev-agent:list-learned-actions`; replay with `/rn-dev-agent:run-action <name>`. The agent also picks an action automatically when it needs to reach a known state. |
| **Self-repair** | If a `testID` changes, `cdp_repair_action` fuzzy-matches against the live snapshot, patches the YAML, and retries. Small UI drift absorbed; broken product logic surfaced, not auto-fixed. |
| **Why hybrid** | Pure scripts don't adapt; pure LLM re-derives everything every session. Actions are the memory of the LLM loop — every successful verification adds one, every drift gets quietly absorbed, every truly broken flow escalates. |

[Full actions guide](https://lykhoyda.github.io/rn-dev-agent/actions/)
[Full actions guide — why the hybrid matters, tool surface, comparison vs Detox/Maestro/pure-LLM](https://lykhoyda.github.io/rn-dev-agent/actions/)

**Setup & diagnostics:**

@@ -131,13 +131,14 @@ if (__DEV__) {

## MCP Tools

The plugin exposes a wide surface area of MCP tools across four families. See the [tools reference](https://lykhoyda.github.io/rn-dev-agent/tools/) for the full list.
The plugin exposes **74 MCP tools** across five families. See the [tools reference](https://lykhoyda.github.io/rn-dev-agent/tools/) for the full list.

| Family | What it's for | Examples |
|---|---|---|
| **CDP** | React internals via Chrome DevTools Protocol | `cdp_component_tree`, `cdp_store_state`, `cdp_evaluate`, `cdp_native_errors`, `cdp_record_test_*`, `cdp_repair_action` |
| **Device** | Native interaction with the simulator/emulator | `device_find`, `device_press`, `device_fill`, `device_screenshot`, `device_pick_date` |
| **Testing** | E2E replay and PR-ready proof | `proof_step`, `cross_platform_verify`, `maestro_run`, `cdp_run_action` |
| **CDP** | React internals via Chrome DevTools Protocol | `cdp_status`, `cdp_component_tree`, `cdp_store_state`, `cdp_evaluate`, `cdp_native_errors`, `cdp_navigate`, `collect_logs` |
| **Device** | Native interaction with the simulator/emulator | `device_find`, `device_press`, `device_fill`, `device_screenshot`, `device_pick_date`, `device_batch` |
| **Actions** | Record / replay / self-repair persistent flows ([guide](https://lykhoyda.github.io/rn-dev-agent/actions/)) | `cdp_run_action`, `cdp_repair_action`, `cdp_record_test_save_as_action`, `cdp_record_test_*` |
| **Testing** | E2E replay and PR-ready proof | `proof_step`, `cross_platform_verify`, `maestro_run`, `maestro_test_all`, `cdp_auto_login` |
| **Macro-Asserts** | State-assertive replays — internal state, not pixels | `expect_redux`, `expect_route`, `expect_visible_by_testid`, `expect_text` |

### What's new in v0.44.18 (2026-05-05)
17 changes: 13 additions & 4 deletions docs-site/astro.config.mjs
@@ -7,7 +7,7 @@ export default defineConfig({
integrations: [
starlight({
title: 'rn-dev-agent',
description: 'Claude Code plugin for React Native development — 51 MCP tools, 5 agents, 13 commands. Explore, build, verify, and test features live on iOS Simulator and Android Emulator via Chrome DevTools Protocol.',
description: 'Claude Code plugin for React Native development — 74 MCP tools, 5 agents, 17 commands. Explore, build, verify, and test features live on iOS Simulator and Android Emulator via Chrome DevTools Protocol.',
social: [
{ icon: 'github', label: 'GitHub', href: 'https://github.com/Lykhoyda/rn-dev-agent' },
],
@@ -24,7 +24,7 @@ export default defineConfig({
'@context': 'https://schema.org',
'@type': 'SoftwareApplication',
name: 'rn-dev-agent',
description: 'Claude Code plugin for React Native development with 51 MCP tools for live app verification via Chrome DevTools Protocol.',
description: 'Claude Code plugin for React Native development with 74 MCP tools for live app verification via Chrome DevTools Protocol.',
applicationCategory: 'DeveloperApplication',
operatingSystem: 'macOS, Linux',
url: 'https://lykhoyda.github.io/rn-dev-agent/',
@@ -44,6 +44,7 @@ export default defineConfig({
sidebar: [
{ label: 'Getting Started', slug: 'getting-started' },
{ label: 'Architecture', slug: 'architecture' },
{ label: 'Actions', slug: 'actions' },
{
label: 'Commands',
items: [
@@ -57,6 +58,14 @@ export default defineConfig({
{ label: 'debug-screen', slug: 'commands/debug-screen' },
{ label: 'check-env', slug: 'commands/check-env' },
{ label: 'setup', slug: 'commands/setup' },
{ label: 'doctor', slug: 'commands/doctor' },
],
},
{
label: 'Actions',
items: [
{ label: 'list-learned-actions', slug: 'commands/list-learned-actions' },
{ label: 'run-action', slug: 'commands/run-action' },
],
},
{
@@ -83,7 +92,7 @@ export default defineConfig({
items: [
{ label: 'Overview', slug: 'tools' },
{
label: 'CDP Tools (24)',
label: 'CDP Tools',
collapsed: false,
autogenerate: { directory: 'tools/cdp' },
},
@@ -93,7 +102,7 @@ export default defineConfig({
autogenerate: { directory: 'tools/device' },
},
{
label: 'Testing Tools (13)',
label: 'Testing Tools (5)',
collapsed: true,
autogenerate: { directory: 'tools/testing' },
},