feat(#211): maestro_run structured step results + partial progress on timeout by Lykhoyda · Pull Request #302 · Lykhoyda/rn-dev-agent

Lykhoyda · 2026-06-15T08:22:30Z

Summary

Makes maestro_run return structured per-step results and partial progress on timeout, parsed from maestro-runner stdout (no new file I/O). Closes #211. (Issue item 3 — iOS clearState — already shipped in #276/#201.)

New pure module maestro-step-parser.ts parses the {✓|✗} verb[: sel] (N.Ns) step lines maestro-runner already prints; tap-latency.ts (#263) now derives its tap latencies from the same parser (one regex, not two); maestro_run spreads additive fields into the result on all three paths.

Result shape (additive — `output` preserved for `run-action`)

data on pass/warn, meta on fail:

steps:           {index,name,verb,status:'pass'|'fail',durationMs}[]
failedStep:      MaestroStep | null      // terminal failed step only
reason:          {kind,selector} | null  // sanitized — never the raw runner log
lastStep:        MaestroStep | null      // progress marker
timedOut:        boolean
outputTruncated: boolean                 // maxBuffer overflow, distinct from timeout
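
For readers wiring up a consumer, the field list above could be expressed as TypeScript types roughly as follows. This is a sketch inferred from the description, not the PR's actual declarations: the name `MaestroRunExtras`, the `string` type for `kind`, and the nullable `selector` are assumptions.

```typescript
// Sketch of the additive result fields, inferred from the PR description.
// MaestroRunExtras is a hypothetical name; kind/selector types are assumed.
interface MaestroStep {
  index: number;
  name: string;
  verb: string;
  status: 'pass' | 'fail';
  durationMs: number;
}

interface MaestroReason {
  kind: string;            // assumed: a short classifier string
  selector: string | null; // assumed: nullable when no selector is known
}

interface MaestroRunExtras {
  steps: MaestroStep[];
  failedStep: MaestroStep | null; // terminal failed step only
  reason: MaestroReason | null;   // sanitized, never the raw runner log
  lastStep: MaestroStep | null;   // progress marker
  timedOut: boolean;
  outputTruncated: boolean;       // maxBuffer overflow, distinct from timeout
}

// Illustrative value for a run that timed out after one passing step.
const launch: MaestroStep = {
  index: 0,
  name: 'launchApp',
  verb: 'launchApp',
  status: 'pass',
  durationMs: 2100,
};

const timedOutExample: MaestroRunExtras = {
  steps: [launch],
  failedStep: null,
  reason: null,
  lastStep: launch,
  timedOut: true,
  outputTruncated: false,
};
```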

Process

Plan multi-reviewed pre-code (Codex + Claude): caught the reason-re-embeds-raw blocker, maxBuffer-vs-timeout, verb colon-strip, data/meta envelope placement, terminal-only failedStep.
TDD; per-edit codex-pair review (→ terminal-fail semantics + bounded fields); then full-diff multi-review:
- Codex found a ReDoS in the step regex (overlapping \s+(.*?)\s* → 30s on a glyph + long-whitespace line in untrusted 10MB stdout). Hardened to (\S.*\S|\S)\s*\( — 0ms on all pathological inputs, byte-identical on valid lines.
- Antigravity full-diff review: no material issues; verified envelope placement, the raw-free invariant, #263 non-regression, and no consumer breaks.

Test plan

Unit: full cdp-bridge suite 2087/2087 (#211 parser cases incl. ReDoS timing guard, field capping, terminal-fail, timeout-vs-maxBuffer).
#263 regression: gh-263-tap-latency.test.js green; parseTapLatencies behavior is identical.
Device: real maestro-runner output (piped) emits 0 ANSI bytes (stripAnsi is defensive-only); parseSteps fail-open on real non-step output → []. Step-line format validated by #263's real fixtures (same binary v1.0.9).
Pre-merge nicety: live ✓ step (N.Ns) capture (local WDA startup flaked on a fresh iOS 26.5 sim — environment, not the change) + runFlow sub-flow rendering.

🤖 Generated with Claude Code

…al-progress-on-timeout Stdout-only parser generalizing #263; additive result fields {steps,failedStep,reason,lastStep,timedOut,outputTruncated}. Folds in the pre-code multi-LLM plan review (codex + claude file-grounded): reason must be raw-free (parseMaestroFailure embeds full output), maxBuffer-vs-timeout discrimination, verb trailing-colon strip, data/meta envelope placement, terminal-only failedStep, ANSI/runFlow edges. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ults 6 bite-sized TDD tasks: parseSteps+stripAnsi, buildStepSummary+helpers, classifyExecError, tap-latency refactor, maestro-run wiring, changeset. Encodes the multi-LLM review findings as concrete test cases. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…hangeset path MED#1: add pure formatFailureHeadline (structured, raw-free) for the catch-path message; raw err.message kept only as system-error fallback. MED#2: Task 6 runs from repo root, stages repo-root .changeset/ (two-package frontmatter rn-dev-agent-cdp + rn-dev-agent-plugin). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Pure, fail-open parser for `{✓|✗} verb[: sel] (N.Ns)` lines. verb is the first token (trailing colon stripped); requires trailing (N.Ns) so the rn-maestro-run summary + count lines are excluded; ANSI-stripped first. 7/7 tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
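
A minimal sketch of what such a parser might look like, under stated assumptions. The commit does not show the full pattern, so `STEP_RE` and `ANSI_RE` below are reconstructions, and the sample runner lines in the usage note are illustrative rather than captured output.

```typescript
interface MaestroStep {
  index: number;
  name: string;
  verb: string;
  status: 'pass' | 'fail';
  durationMs: number;
}

// Assumed ANSI stripper: removes CSI sequences such as "\x1b[32m".
const ANSI_RE = /\x1b\[[0-9;]*[A-Za-z]/g;

function stripAnsi(text: string): string {
  return text.replace(ANSI_RE, '');
}

// Assumed reconstruction: glyph, name, then a mandatory trailing "(N.Ns)".
// Lines without the duration suffix (summary, counts) do not match.
const STEP_RE = /^\s*([✓✗])\s+(\S.*\S|\S)\s*\((\d+(?:\.\d+)?)s\)\s*$/;

function parseSteps(output: string): MaestroStep[] {
  const steps: MaestroStep[] = [];
  for (const line of stripAnsi(output).split(/\r?\n/)) {
    const m = STEP_RE.exec(line);
    if (!m) continue; // fail-open: anything that is not a step line is skipped
    const glyph = m[1] ?? '';
    const name = m[2] ?? '';
    const seconds = Number(m[3] ?? '0');
    const firstToken = name.split(/\s+/)[0] ?? '';
    steps.push({
      index: steps.length,
      name,
      verb: firstToken.replace(/:$/, ''), // strip the trailing colon
      status: glyph === '✓' ? 'pass' : 'fail',
      durationMs: Math.round(seconds * 1000),
    });
  }
  return steps;
}
```

Fed a mix of step lines, app logs, and a count line, only the lines carrying a duration suffix come back as steps; input with no step lines at all yields `[]`.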

…y failedStep) findFailedStep/lastObservedStep/summarizeReason/buildStepSummary. summarizeReason projects parseMaestroFailure to {kind,selector}, dropping raw (the full output). failedStep/reason gated on opts.failed so a fail-then-retry-✓ passed run reports no failure. 14/14 tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
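
The projection and gating described here can be sketched as below. `ParsedFailure` is a hypothetical stand-in for whatever `parseMaestroFailure` returns; the commit only states that it carries a `raw` field holding the full output. `gatedReason` is likewise an illustrative helper name.

```typescript
// Hypothetical shape of a parsed failure; only `raw` is stated in the commit.
interface ParsedFailure {
  kind: string;
  selector: string | null;
  raw: string; // the full runner output, which must not leak into the result
}

interface Reason {
  kind: string;
  selector: string | null;
}

// Copy the two safe fields explicitly so `raw` can never ride along.
function summarizeReason(failure: ParsedFailure | null): Reason | null {
  if (failure === null) return null;
  return { kind: failure.kind, selector: failure.selector };
}

// Gate on opts.failed so a run that failed, retried, and passed
// reports no reason at all.
function gatedReason(
  failure: ParsedFailure | null,
  opts: { failed: boolean },
): Reason | null {
  return opts.failed ? summarizeReason(failure) : null;
}
```

Building a fresh object, rather than deleting `raw` from a copy, keeps the raw-free invariant intact even if more fields are later added to the parsed failure.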

classifyExecError distinguishes timeout (killed, no maxBuffer code) from a 10MB buffer overflow. formatFailureHeadline builds a raw-free failure message from the structured summary, naming the failing/last step; raw err.message is fallback only for system errors. 19/19 tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
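
One way the timeout-vs-overflow split could be implemented is sketched below. The `ExecErrorLike` shape and the error code string reflect my understanding of Node's `child_process` errors and should be treated as assumptions; the PR's actual function may inspect other fields.

```typescript
// Minimal view of the error object passed to a child_process callback.
interface ExecErrorLike {
  killed?: boolean;
  code?: string | number | null;
  signal?: string | null;
}

// Assumed: the code Node reports when child output exceeds maxBuffer.
const MAX_BUFFER_CODE = 'ERR_CHILD_PROCESS_STDIO_MAXBUFFER';

function classifyExecError(err: ExecErrorLike): {
  timedOut: boolean;
  outputTruncated: boolean;
} {
  const outputTruncated = err.code === MAX_BUFFER_CODE;
  // A killed child without the maxBuffer code is treated as a timeout.
  const timedOut = err.killed === true && !outputTruncated;
  return { timedOut, outputTruncated };
}
```

Checking the code first matters because a buffer overflow also kills the child, so `killed` alone cannot tell the two cases apart.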

… with #263) Replaces the duplicated ✓-tapOn line regex with a filter over parseSteps. The gh-263 suite (13 tests) stays green — behavior is identical. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
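
The refactor amounts to filtering parsed steps instead of running a second regex. A sketch, assuming latencies are returned as plain millisecond numbers (the real `parseTapLatencies` return shape is not shown in the commit, and `tapLatenciesMs` is a hypothetical name):

```typescript
interface MaestroStep {
  index: number;
  name: string;
  verb: string;
  status: 'pass' | 'fail';
  durationMs: number;
}

// Keep only passing tapOn steps and project their durations.
function tapLatenciesMs(steps: MaestroStep[]): number[] {
  return steps
    .filter((s) => s.status === 'pass' && s.verb === 'tapOn')
    .map((s) => s.durationMs);
}
```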

…rtial progress on timeout Spreads the structured summary into the result on success/warn/catch paths (data on ok/warn, meta on fail; output preserved for run-action). Catch path classifies timeout vs maxBuffer overflow and builds a raw-free headline naming the failing/last step. Full suite 2082/2082. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
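
The envelope placement can be illustrated with two small builders. The envelope types and the helper names `okWithSummary` and `failWithSummary` are hypothetical; only the rule itself (summary under `data` on ok/warn, under `meta` on fail, `output` preserved) comes from the commit.

```typescript
// Hypothetical envelope shapes for illustration only.
interface OkEnvelope<T> {
  ok: true;
  data: T;
}

interface FailEnvelope<M> {
  ok: false;
  error: string;
  meta: M;
}

// Success/warn path: keep `output` and spread the summary next to it.
function okWithSummary<S extends object>(
  output: string,
  summary: S,
): OkEnvelope<{ output: string } & S> {
  return { ok: true, data: { output, ...summary } };
}

// Catch path: the headline is the message, the summary goes under `meta`.
function failWithSummary<S extends object>(
  headline: string,
  summary: S,
): FailEnvelope<S> {
  return { ok: false, error: headline, meta: { ...summary } };
}
```

Because the fields are spread in rather than replacing anything, existing consumers that read only `output` keep working.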

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ded fields - findFailedStep now returns the TERMINAL failed step (last parsed step iff it failed), so a transient ✗ retried-✓ before a later timeout is not mis-blamed. - cap step names + reason.selector to 200 chars so a pathological step name/selector can't balloon the failure headline past the bounded `output`. 3 new tests; full suite 2085/2085. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
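
The terminal-only rule and the 200-character cap can be sketched as follows. `capField` is a hypothetical helper name, and plain truncation without an ellipsis marker is an assumption.

```typescript
interface MaestroStep {
  index: number;
  name: string;
  verb: string;
  status: 'pass' | 'fail';
  durationMs: number;
}

const MAX_FIELD_CHARS = 200;

// Bound a step name or selector; assumed to truncate without a marker.
function capField(text: string): string {
  return text.length > MAX_FIELD_CHARS ? text.slice(0, MAX_FIELD_CHARS) : text;
}

// Terminal semantics: only the last parsed step can be the failed step.
function findFailedStep(steps: MaestroStep[]): MaestroStep | null {
  const last = steps[steps.length - 1];
  return last !== undefined && last.status === 'fail' ? last : null;
}
```

With this rule, a sequence of ✗ then ✓ yields `null`, so an earlier transient failure is not blamed when a later step times out.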

…only Multi-review (Codex) found catastrophic backtracking in STEP_RE: the overlapping `\s+(.*?)\s*` whitespace quantifiers blow up on a glyph + long-whitespace line in the untrusted combined stdout+stderr (10MB app logs). Benchmark: prior pattern 30s @ n=2000; Codex's minimal `(\S.*?)` still 30ms @ n=4000 on trailing ws. Switched to `(\S.*\S|\S)\s*\(` (name starts AND ends non-space) — 0ms on all pathological inputs, byte-identical on valid lines. Also: failure headline now uses the structured reason when there's no terminal ✗ step line (raw-free). +2 tests (ReDoS timing guard, reason-only headline). Full suite 2087/2087. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
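
A small timing harness in the spirit of the ReDoS guard is sketched below. Only the hardened name group `(\S.*\S|\S)\s*\(` is taken from the commit; the surrounding anchors, glyph class, and duration group are assumed, and the helper names are illustrative.

```typescript
// Assumed full pattern built around the hardened name group.
const STEP_RE = /^\s*([✓✗])\s+(\S.*\S|\S)\s*\((\d+(?:\.\d+)?)s\)\s*$/;

function isStepLine(line: string): boolean {
  return STEP_RE.test(line);
}

// The pathological shape from the review: a glyph followed by a long
// run of whitespace and no duration suffix.
function pathologicalLine(n: number): string {
  return '✓' + ' '.repeat(n);
}

// Wall-clock time for a single match attempt, in milliseconds.
function timeMatchMs(line: string): number {
  const start = Date.now();
  isStepLine(line);
  return Date.now() - start;
}
```

Because the name group must start and end on a non-space character, the whitespace before it and the whitespace after it can no longer trade characters with the name, which is what removed the overlapping backtracking.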

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 15b5ad06cf

parseSteps appended every match, so a multi-MB stdout/stderr with many step-shaped lines could bloat the MCP response past the output slice. Cap to the most recent MAX_STEPS=1000 (failures + partial-progress live at the tail), true index preserved so a gap signals truncation. +1 test. Full suite 2088/2088. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
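
The tail cap might look like the sketch below. `MAX_STEPS` is from the commit; `capToMostRecent` and `wasTruncated` are hypothetical names, and the sketch assumes each step's `index` was assigned before capping.

```typescript
const MAX_STEPS = 1000;

// Keep only the most recent `max` steps; indices are left untouched.
// `max` is assumed to be a positive integer.
function capToMostRecent<T extends { index: number }>(
  steps: T[],
  max: number = MAX_STEPS,
): T[] {
  return steps.length > max ? steps.slice(steps.length - max) : steps;
}

// A first index above zero means earlier steps were dropped.
function wasTruncated(steps: { index: number }[]): boolean {
  const first = steps[0];
  return first !== undefined && first.index > 0;
}
```

Keeping the tail rather than the head preserves the entries that matter for diagnosis, since the failure and the partial-progress marker sit at the end of the run.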

Lykhoyda and others added 11 commits June 14, 2026 23:40

chore(#211): changeset for maestro_run structured step results

b175287

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector Bot reviewed Jun 15, 2026

Comment thread scripts/cdp-bridge/src/domain/maestro-step-parser.ts

Lykhoyda merged commit 8305bbd into main Jun 15, 2026
10 checks passed

Lykhoyda deleted the feat/211-maestro-structured-results branch June 15, 2026 11:13

Lykhoyda merged 12 commits into main from feat/211-maestro-structured-results
