Skip to content

feat(#211): maestro_run structured step results + partial progress on timeout#302

Merged
Lykhoyda merged 12 commits into
mainfrom
feat/211-maestro-structured-results
Jun 15, 2026
Merged

feat(#211): maestro_run structured step results + partial progress on timeout#302
Lykhoyda merged 12 commits into
mainfrom
feat/211-maestro-structured-results

Conversation

@Lykhoyda

Copy link
Copy Markdown
Owner

Summary

Makes maestro_run return structured per-step results and partial progress on timeout, parsed from maestro-runner stdout (no new file I/O). Closes #211. (Issue item 3 — iOS clearState — already shipped in #276/#201.)

New pure module maestro-step-parser.ts parses the {✓|✗} verb[: sel] (N.Ns) step lines maestro-runner already prints; tap-latency.ts (#263) now derives its tap latencies from the same parser (one regex, not two); maestro_run spreads additive fields into the result on all three paths.

Result shape (additive — output preserved for run-action)

data on pass/warn, meta on fail:

steps:           {index,name,verb,status:'pass'|'fail',durationMs}[]
failedStep:      MaestroStep | null      // terminal failed step only
reason:          {kind,selector} | null  // sanitized — never the raw runner log
lastStep:        MaestroStep | null      // progress marker
timedOut:        boolean
outputTruncated: boolean                 // maxBuffer overflow, distinct from timeout

Process

  • Plan multi-reviewed pre-code (Codex + Claude): caught the reason-re-embeds-raw blocker, maxBuffer-vs-timeout, verb colon-strip, data/meta envelope placement, terminal-only failedStep.
  • TDD; per-edit codex-pair review (→ terminal-fail semantics + bounded fields); then full-diff multi-review:

Test plan

🤖 Generated with Claude Code

Lykhoyda and others added 11 commits June 14, 2026 23:40
…al-progress-on-timeout

Stdout-only parser generalizing #263; additive result fields
{steps,failedStep,reason,lastStep,timedOut,outputTruncated}. Folds in the
pre-code multi-LLM plan review (codex + claude file-grounded): reason must be
raw-free (parseMaestroFailure embeds full output), maxBuffer-vs-timeout
discrimination, verb trailing-colon strip, data/meta envelope placement,
terminal-only failedStep, ANSI/runFlow edges.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ults

6 bite-sized TDD tasks: parseSteps+stripAnsi, buildStepSummary+helpers,
classifyExecError, tap-latency refactor, maestro-run wiring, changeset.
Encodes the multi-LLM review findings as concrete test cases.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…hangeset path

MED#1: add pure formatFailureHeadline (structured, raw-free) for the catch-path
message; raw err.message kept only as system-error fallback.
MED#2: Task 6 runs from repo root, stages repo-root .changeset/ (two-package
frontmatter rn-dev-agent-cdp + rn-dev-agent-plugin).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pure, fail-open parser for `{✓|✗} verb[: sel] (N.Ns)` lines. verb is the first
token (trailing colon stripped); requires trailing (N.Ns) so the rn-maestro-run
summary + count lines are excluded; ANSI-stripped first. 7/7 tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…y failedStep)

findFailedStep/lastObservedStep/summarizeReason/buildStepSummary. summarizeReason
projects parseMaestroFailure to {kind,selector}, dropping raw (the full output).
failedStep/reason gated on opts.failed so a fail-then-retry-✓ passed run reports
no failure. 14/14 tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
classifyExecError distinguishes timeout (killed, no maxBuffer code) from a 10MB
buffer overflow. formatFailureHeadline builds a raw-free failure message from the
structured summary, naming the failing/last step; raw err.message is fallback
only for system errors. 19/19 tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… with #263)

Replaces the duplicated ✓-tapOn line regex with a filter over parseSteps. The
gh-263 suite (13 tests) stays green — behavior is identical.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rtial progress on timeout

Spreads the structured summary into the result on success/warn/catch paths
(data on ok/warn, meta on fail; output preserved for run-action). Catch path
classifies timeout vs maxBuffer overflow and builds a raw-free headline naming
the failing/last step. Full suite 2082/2082.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ded fields

- findFailedStep now returns the TERMINAL failed step (last parsed step iff it
  failed), so a transient ✗ retried-✓ before a later timeout is not mis-blamed.
- cap step names + reason.selector to 200 chars so a pathological step
  name/selector can't balloon the failure headline past the bounded `output`.
3 new tests; full suite 2085/2085.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…only

Multi-review (Codex) found catastrophic backtracking in STEP_RE: the overlapping
`\s+(.*?)\s*` whitespace quantifiers blow up on a glyph + long-whitespace line in
the untrusted combined stdout+stderr (10MB app logs). Benchmark: prior pattern
30s @ n=2000; Codex's minimal `(\S.*?)` still 30ms @ n=4000 on trailing ws.
Switched to `(\S.*\S|\S)\s*\(` (name starts AND ends non-space) — 0ms on all
pathological inputs, byte-identical on valid lines. Also: failure headline now
uses the structured reason when there's no terminal ✗ step line (raw-free).
+2 tests (ReDoS timing guard, reason-only headline). Full suite 2087/2087.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 15b5ad06cf

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread scripts/cdp-bridge/src/domain/maestro-step-parser.ts
parseSteps appended every match, so a multi-MB stdout/stderr with many
step-shaped lines could bloat the MCP response past the output slice. Cap to the
most recent MAX_STEPS=1000 (failures + partial-progress live at the tail), true
index preserved so a gap signals truncation. +1 test. Full suite 2088/2088.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Lykhoyda Lykhoyda merged commit 8305bbd into main Jun 15, 2026
10 checks passed
@Lykhoyda Lykhoyda deleted the feat/211-maestro-structured-results branch June 15, 2026 11:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

enhancement: maestro_run — structured step results, partial progress on timeout, iOS clearState support

1 participant