
Commit 819d7dc

feat: expose structured MCP command tools (#593)
* feat: expose semantic MCP tools
* docs: remove semantic mcp prd
* refactor: deepen semantic command surface
* refactor: add mcp execution seam
* refactor: deepen command grammar
* refactor: remove legacy command definitions
* refactor: collapse semantic cli wrappers
* refactor: remove local mcp placeholders
* refactor: derive semantic cli routing
* refactor: trim mcp status metadata
* refactor: derive semantic input contracts
* refactor: split semantic grammar modules
* refactor: derive batch input schema
* refactor: centralize cli command schema catalog
* refactor: share semantic cli output projections
* refactor: remove legacy cli output paths
* refactor: consolidate command interface surface
* docs: align command contract wording
* refactor: split command projection from cli grammar
* refactor: trim projection exports
* fix: satisfy fallow command contract audit
* refactor: structure public batch steps
* chore: clean batch architecture references
* fix: keep legacy cli batch steps working
* fix: serialize mcp batches
* chore: tighten command surface cleanup
* fix: serialize mcp stdin requests
* chore: keep mcp config out of command contracts
* fix: project structured batch targets
* chore: harden command input typing
* fix: project maestro backend for replay tests
* fix: preserve session mcp request options
1 parent a5fffc6 commit 819d7dc

104 files changed

Lines changed: 7819 additions & 5446 deletions


AGENTS.md

Lines changed: 38 additions & 15 deletions
@@ -52,12 +52,29 @@ Single-context repo. Read `CONTEXT.md` for domain language and testing/architect
 - Keep modules small for agent context safety:
   - target <= 300 LOC per implementation file when practical.
   - if a file grows past 500 LOC, plan/extract focused submodules before adding new behavior.
-  - exception: generated files, schema/fixture snapshots, and integration test aggregations.
+  - if a file grows past 1,000 LOC, treat it as architecture debt unless it is generated data, a fixture snapshot, or an integration test aggregation.
+  - long guidance/data tables should live behind focused modules instead of sharing a file with parser/runtime logic.
+  - prefer deep modules over mechanical splits: extract when it improves locality for a concept callers already need, not just to reduce line count.
+
+## Context Management
+- Optimize for one-pass agent reads. A module that requires reading many siblings to understand one change is usually too shallow; a module that hides one concept behind a small interface is usually worth keeping.
+- Start with the owning module, then one shared helper, then one downstream caller or adapter. Broaden only when the contract crosses that edge.
+- Use targeted symbol searches before opening large files. For files over 500 LOC, search for the relevant type/function/section first, then read a bounded range.
+- Do not add unrelated exports just to make tests easier. Test through the public interface when possible; if that is awkward, consider whether the module's interface is too shallow.
+- When adding new guidance, examples, schemas, or command metadata, decide whether it belongs in the command surface, CLI grammar, CLI help, MCP projection, or daemon runtime before editing.
+- Prefer updating existing domain vocabulary in `CONTEXT.md` when naming a new durable module concept. Do not coin parallel names in docs, tests, and code.

 ## Routing
 - Keep `src/daemon.ts` as a thin router.
 - Keep command names and daemon routing groups centralized in `src/command-catalog.ts`; do not re-create command string sets in handlers or request policy modules.
-- Keep CLI/client positional grammar in `src/command-codecs.ts` and its `src/command-codecs/*` command-family modules. CLI commands, typed client methods, and daemon interaction adapters should reuse these codecs instead of duplicating selector/ref/positionals parsing.
+- Keep command input/output contracts in the command modules:
+  - command surface and shared schemas: `src/commands/command-surface.ts`, `src/commands/command-contract.ts`, `src/commands/command-input.ts`
+  - typed client command execution: `src/commands/client-command-contracts.ts`
+  - command families: `src/commands/interaction-command-contracts.ts`, `src/commands/batch-command.ts`, with other typed client contracts in `src/commands/client-command-contracts.ts`
+  - CLI positional/flag grammar: `src/commands/cli-grammar.ts` and `src/commands/cli-grammar/*`
+  - typed input to daemon request projection: `src/commands/command-projection.ts`
+  - CLI/client/runtime output projection: `src/commands/cli-output.ts`, `src/commands/client-output.ts`, `src/commands/runtime-output.ts`
+- Do not reintroduce CLI-shaped command adapters or schemas as a second source of truth. CLI, Node.js, and MCP should project from command contracts.
 - Keep `src/daemon/request-router.ts` as request orchestration: auth, diagnostics scope, request admission, locking, handler chain, and fallback dispatch.
 - Put request policies in focused request modules:
   - tenant/lease/selector/lock admission: `src/daemon/request-admission.ts`
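
Reading note on the Routing hunk: the rule that CLI, Node.js, and MCP "should project from command contracts" can be pictured with a small sketch. Everything below is hypothetical and written only for illustration; `CommandContract`, `toCliUsage`, `toMcpTool`, and the example fields are invented names, not the contents of `src/commands/command-contract.ts`. The only external fact assumed is that an MCP tool is described by a `name`, a `description`, and a JSON-Schema-shaped `inputSchema`.

```typescript
// Hypothetical sketch: none of these names are taken from the repository.
type FieldType = "string" | "number" | "boolean";

interface CommandContract {
  name: string;
  description: string;
  input: Record<string, { type: FieldType; required: boolean; description: string }>;
}

// One contract acts as the single source of truth for the command.
const exampleContract: CommandContract = {
  name: "snapshot",
  description: "Illustrative description",
  input: {
    interactive: { type: "boolean", required: false, description: "Illustrative boolean option" },
    scope: { type: "string", required: false, description: "Illustrative string option" },
  },
};

// CLI projection: derive a usage line from the contract instead of hand-writing it.
function toCliUsage(contract: CommandContract): string {
  const flags = Object.entries(contract.input).map(([key, field]) => {
    const flag = field.type === "boolean" ? `--${key}` : `--${key} <${field.type}>`;
    return field.required ? flag : `[${flag}]`;
  });
  return [contract.name, ...flags].join(" ");
}

// MCP projection: derive a tool description from the same contract.
function toMcpTool(contract: CommandContract) {
  const properties: Record<string, { type: FieldType; description: string }> = {};
  const required: string[] = [];
  for (const [key, field] of Object.entries(contract.input)) {
    properties[key] = { type: field.type, description: field.description };
    if (field.required) required.push(key);
  }
  return {
    name: contract.name,
    description: contract.description,
    inputSchema: { type: "object" as const, properties, required },
  };
}
```

Because both projections read the same object, adding a field to the contract changes the CLI usage and the MCP schema together, which is the property the "second source of truth" rule appears to protect.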
@@ -111,17 +128,18 @@ Single-context repo. Read `CONTEXT.md` for domain language and testing/architect

 ## Adding a New CLI Flag

-A new snapshot/command flag touches up to 7 files in a fixed order. Follow this checklist:
+A new snapshot/command flag touches only the layers that need to understand it. Follow this checklist in order:

-1. `src/utils/command-schema.ts`: add to `CliFlags` type, `FLAG_DEFINITIONS` array, and the relevant `*_FLAGS` constant (e.g. `SNAPSHOT_FLAGS`). Update the command's `usageOverride` string.
-2. `src/utils/snapshot.ts` (or the relevant options type): add to `SnapshotOptions` or equivalent.
-3. `src/client-types.ts`: add to `CaptureSnapshotOptions` (or equivalent public options type) **and** `InternalRequestOptions`.
-4. `src/client-normalizers.ts`: map the public option name to the internal flag name in `buildFlags`.
-5. `src/daemon/context.ts`: add to `DaemonCommandContext` type and `contextFromFlags` function.
-6. `src/core/dispatch-context.ts`: add to `DispatchContext` when the flag flows into platform dispatch, then thread it through the relevant dispatcher module.
-7. `src/cli/commands/<command>.ts`: pass the flag from `flags.*` to the client call.
+1. `src/utils/cli-flags.ts`: add to `CliFlags`, `FLAG_DEFINITIONS`, and the relevant exported flag group (e.g. `SNAPSHOT_FLAGS`). Add the flag to `CLI_COMMAND_OVERRIDES` in `src/utils/cli-command-overrides.ts` for each command that supports it; command names/descriptions come from command contracts unless CLI help needs a specific override.
+2. `src/commands/cli-grammar/*`: read the CLI flag into command input when the CLI accepts it.
+3. `src/commands/command-projection.ts` and command-family projection helpers: write the input into the daemon request only if the flag affects daemon execution.
+4. `src/commands/*-command-contracts.ts`: add or update the command input schema only if the option should be available through Node.js or MCP as structured input.
+5. `src/client-types.ts`: update the public typed client option only when the Node.js interface exposes the option.
+6. `src/client-normalizers.ts`: update daemon flag normalization only when the request still needs a public-to-internal option translation.
+7. `src/daemon/context.ts` and `src/core/dispatch-context.ts`: add the field only when it flows into platform dispatch.
+8. Handler/platform modules: thread the option only after the command surface, grammar, and projection prove it belongs there.

-Command-only flags (like `find --first`) that don't flow to the platform layer only need steps 1 and the handler file.
+Command-only flags (like `find --first`) that do not flow to the platform layer usually stop at steps 1-3.

 ## Hard Rules
 - Use process helpers from `src/utils/exec.ts` for TypeScript process execution: `runCmd`, `runCmdStreaming`, `runCmdSync`, `runCmdBackground`, and `runCmdDetached`. Do not import raw `spawn`/`spawnSync` outside `src/utils/exec.ts`; add or extend an exec helper instead. Plain `.mjs` packaging fixtures that cannot import TypeScript helpers should keep child-process usage local and prefer `execFile`/`execFileSync` over spawn.
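
Reading note on the flag checklist: steps 2 and 3 of the new checklist separate "read the CLI flag into command input" from "write the input into the daemon request only if the flag affects daemon execution". The sketch below shows one way that split could look. All types, functions, and flag names here (`ExampleCliFlags`, `readExampleInput`, `projectExampleRequest`, `limit`, `verbose`) are assumptions made up for this note; they are not the repository's grammar or projection code.

```typescript
// Hypothetical shapes for illustration; the repository's real types may differ.
interface ExampleCliFlags {
  limit?: number; // assumed to affect daemon execution
  verbose?: boolean; // assumed to affect only CLI-side output
}

interface ExampleInput {
  query: string;
  limit?: number;
  verbose: boolean;
}

interface ExampleDaemonRequest {
  command: string;
  positionals: string[];
  flags: Record<string, unknown>;
}

// Checklist step 2 (CLI grammar): read CLI flags into typed command input.
function readExampleInput(positionals: string[], flags: ExampleCliFlags): ExampleInput {
  const query = positionals[0];
  if (query === undefined) throw new Error("missing query positional");
  const input: ExampleInput = { query, verbose: flags.verbose ?? false };
  if (flags.limit !== undefined) input.limit = flags.limit;
  return input;
}

// Checklist step 3 (projection): write a field into the daemon request only
// when it affects daemon execution. `verbose` is deliberately left out.
function projectExampleRequest(input: ExampleInput): ExampleDaemonRequest {
  const flags: Record<string, unknown> = {};
  if (input.limit !== undefined) flags["limit"] = input.limit;
  return { command: "example", positionals: [input.query], flags };
}
```

Under these assumptions, a flag that only shapes CLI output never reaches the request object, so the later checklist steps (client types, daemon context, dispatch context) stay untouched for it.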
@@ -190,7 +208,7 @@ Command-only flags (like `find --first`) that don't flow to the platform layer o

 ## Testing Matrix
 - Docs/skills only: no tests required unless a more specific rule below applies.
-- CLI help/guidance changes in `src/utils/command-schema.ts`: run `pnpm exec vitest run src/utils/__tests__/args.test.ts`.
+- CLI help/guidance changes in `src/utils/cli-help.ts`, `src/utils/cli-command-overrides.ts`, or `src/utils/command-schema.ts`: run `pnpm exec vitest run src/utils/__tests__/args.test.ts`.
 - SkillGym prompt/assertion changes: run `pnpm test:skillgym:case <case-id>`; the script builds local CLI help first. For broad validation, use `pnpm test:skillgym`; append `-- --tag fixture-smoke` or `-- --tag skill-guidance` when validating one suite group.
 - Non-TS, no behavior impact: no tests unless requested.
 - Keep tests behavioral; do not assert shapes or cases TypeScript already proves.
@@ -208,6 +226,7 @@ Command-only flags (like `find --first`) that don't flow to the platform layer o
 - Do not run integration tests by default.
 - Do not inspect both iOS and Android codepaths unless task requires both.
 - Prefer targeted `git diff -- <paths>` over broad file reads during review.
+- Keep long help prose in `src/utils/cli-help.ts`; keep flag definitions in `src/utils/cli-flags.ts`; keep CLI-specific command usage/flag metadata in `src/utils/cli-command-overrides.ts`.
 - Prefer `snapshot -i`, `find`, and scoped selectors over repeated full snapshot dumps when exploring Apple desktop UIs.
 - Keep PR summaries short and scoped.

@@ -222,9 +241,10 @@ Command-only flags (like `find --first`) that don't flow to the platform layer o
 - Changing `tsconfig.lib.json`/build tooling without running `pnpm check:tooling`; declaration generation is stricter than `tsc --noEmit`.

 ## Docs & Skills
-- Versioned CLI help is the agent-facing source of truth. Put workflow guidance in `src/utils/command-schema.ts` help topics and assert important copy in `src/utils/__tests__/args.test.ts`.
+- Versioned CLI help is the agent-facing source of truth. Put workflow guidance and help-topic prose in `src/utils/cli-help.ts`, keep flag definitions in `src/utils/cli-flags.ts`, keep CLI command overrides in `src/utils/cli-command-overrides.ts`, and assert important copy in `src/utils/__tests__/args.test.ts`.
+- Keep parser schema and help rendering separate: `src/utils/command-schema.ts` composes contract-derived command schemas with CLI overrides; `src/utils/cli-help.ts` owns help topics and usage rendering.
 - Skills are thin routers. Keep `skills/**/SKILL.md` focused on when to use the skill, version gating, which `agent-device help <topic>` page to read, and a short default loop. Do not duplicate full CLI manuals in skills.
-- For behavior/CLI surface changes, update the versioned help instructions in `src/utils/command-schema.ts` and assert important help copy in `src/utils/__tests__/args.test.ts`. Also update `README.md` and relevant `website/docs/**` when user-facing docs need it.
+- For behavior/CLI surface changes, update the versioned help instructions in `src/utils/cli-help.ts` or the CLI command metadata in `src/utils/cli-command-overrides.ts`, then assert important help copy in `src/utils/__tests__/args.test.ts`. Also update `README.md` and relevant `website/docs/**` when user-facing docs need it.
 - For behavior/CLI surface changes and command-planning guidance changes, write or update a SkillGym case in `test/skillgym/suites/agent-device-smoke-suite.ts` that captures the expected agent command plan.
 - Do not update `skills/**/SKILL.md` for command behavior or workflow guidance unless the user explicitly asks; skills must route to versioned CLI help instead of carrying behavior details.
 - Keep SkillGym cases behavioral and command-planning oriented. Prefer prompts that assert the user-visible contract and expected command family over brittle exact output, but forbid known bad patterns.
@@ -245,6 +265,7 @@ Command-only flags (like `find --first`) that don't flow to the platform layer o

 ## Key Files
 - CLI parse + formatting: `src/bin.ts`, `src/cli.ts`, `src/utils/args.ts`
+- CLI help + option metadata: `src/utils/cli-help.ts`, `src/utils/cli-flags.ts`, `src/utils/cli-command-overrides.ts`, `src/utils/command-schema.ts`, `src/utils/cli-option-schema.ts`
 - Daemon client transport: `src/daemon-client.ts`
 - Daemon state/store: `src/daemon/session-store.ts`
 - Selector DSL and matching: `src/daemon/selectors.ts`
@@ -254,7 +275,9 @@ Command-only flags (like `find --first`) that don't flow to the platform layer o
 - Handler context helpers: `src/daemon/context.ts`, `src/daemon/device-ready.ts`
 - Request routing/policy: `src/daemon/request-router.ts`, `src/daemon/request-admission.ts`, `src/daemon/request-generic-dispatch.ts`
 - Dispatcher + capability map: `src/core/dispatch.ts`, `src/core/dispatch-context.ts`, `src/core/dispatch-interactions.ts`, `src/core/capabilities.ts`
-- Command catalog + positional codecs: `src/command-catalog.ts`, `src/command-codecs.ts`, `src/command-codecs/*`
+- Command catalog + command surface: `src/command-catalog.ts`, `src/commands/command-surface.ts`, `src/commands/command-contract.ts`, `src/commands/client-command-contracts.ts`
+- CLI grammar: `src/commands/cli-grammar.ts`, `src/commands/cli-grammar/*`
+- Daemon request projection: `src/commands/command-projection.ts`
 - Platform backends: `src/platforms/ios/*`, `ios-runner/*`, `src/platforms/android/*`

 ## Pull Requests

CONTEXT.md

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@
 - Target: selected automation destination, such as mobile, tv, or desktop.
 - Modality: broad supported device family, such as mobile, tv, or desktop.
 - Session: daemon-owned state for a selected target and opened app or surface.
+- Command surface: catalog of public command identity, interface exposure, adapter policy, and shared command metadata across CLI, Node.js, MCP, and batch entrypoints.

 ## Testing Principles
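
Reading note on the new "Command surface" term: the definition combines command identity, interface exposure, and shared metadata in one catalog. A minimal sketch of such a catalog follows, assuming a flat list of entries with a per-entry exposure list. The type names, the `exposedOn` field, the summaries, and the exposure choices are all invented for illustration and say nothing about which commands the project actually exposes where.

```typescript
// Hypothetical catalog shape; field names and entries are illustrative only.
type EntryPoint = "cli" | "node" | "mcp" | "batch";

interface SurfaceEntry {
  name: string; // public command identity
  exposedOn: EntryPoint[]; // interface exposure
  summary: string; // shared metadata reused by every entrypoint
}

const exampleSurface: SurfaceEntry[] = [
  { name: "snapshot", exposedOn: ["cli", "node", "mcp", "batch"], summary: "Illustrative summary" },
  { name: "find", exposedOn: ["cli", "node", "mcp"], summary: "Illustrative summary" },
  { name: "help", exposedOn: ["cli"], summary: "Illustrative summary" },
];

// Each entrypoint asks the catalog which commands it may offer,
// instead of keeping its own list of command names.
function commandsFor(entryPoint: EntryPoint): string[] {
  return exampleSurface
    .filter((entry) => entry.exposedOn.includes(entryPoint))
    .map((entry) => entry.name);
}
```

With a catalog of this kind, exposing a command on another entrypoint is a one-line change to its entry, and no entrypoint needs its own copy of the command list.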

README.md

Lines changed: 1 addition & 1 deletion
@@ -83,7 +83,7 @@ Snapshots assign refs like `@e1`, `@e2`, and `@e3` to elements on the current sc

 ## Next Steps

-- **Set up your agent**: run the CLI from Cursor, Codex, Claude Code, Windsurf, or another agent terminal. For skills, rules, MCP discovery, and client-specific setup, see [AI Agent Setup](https://incubator.callstack.com/agent-device/docs/agent-setup).
+- **Set up your agent**: run the CLI from Cursor, Codex, Claude Code, Windsurf, or another agent terminal. For skills, rules, direct MCP tools, and client-specific setup, see [AI Agent Setup](https://incubator.callstack.com/agent-device/docs/agent-setup).
 - **Try the sample app**: clone the repo and run the bundled Expo fixture when you want a guided first dogfood run with screenshots, replay, and performance evidence. See [Quick Start](https://incubator.callstack.com/agent-device/docs/quick-start).
 - **Go deeper**: use [Commands](https://incubator.callstack.com/agent-device/docs/commands), [Replay & E2E](https://incubator.callstack.com/agent-device/docs/replay-e2e), and [Debugging & Profiling](https://incubator.callstack.com/agent-device/docs/debugging-profiling) for production workflows.
