feat(#211): maestro_run structured step results + partial progress on timeout#302
Merged
Conversation
…al-progress-on-timeout Stdout-only parser generalizing #263; additive result fields {steps,failedStep,reason,lastStep,timedOut,outputTruncated}. Folds in the pre-code multi-LLM plan review (codex + claude file-grounded): reason must be raw-free (parseMaestroFailure embeds full output), maxBuffer-vs-timeout discrimination, verb trailing-colon strip, data/meta envelope placement, terminal-only failedStep, ANSI/runFlow edges. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ults 6 bite-sized TDD tasks: parseSteps+stripAnsi, buildStepSummary+helpers, classifyExecError, tap-latency refactor, maestro-run wiring, changeset. Encodes the multi-LLM review findings as concrete test cases. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…hangeset path MED#1: add pure formatFailureHeadline (structured, raw-free) for the catch-path message; raw err.message kept only as system-error fallback. MED#2: Task 6 runs from repo root, stages repo-root .changeset/ (two-package frontmatter rn-dev-agent-cdp + rn-dev-agent-plugin). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pure, fail-open parser for `{✓|✗} verb[: sel] (N.Ns)` lines. verb is the first
token (trailing colon stripped); requires trailing (N.Ns) so the rn-maestro-run
summary + count lines are excluded; ANSI-stripped first. 7/7 tests.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…y failedStep)
findFailedStep/lastObservedStep/summarizeReason/buildStepSummary. summarizeReason
projects parseMaestroFailure to {kind,selector}, dropping raw (the full output).
failedStep/reason gated on opts.failed so a fail-then-retry-✓ passed run reports
no failure. 14/14 tests.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
classifyExecError distinguishes timeout (killed, no maxBuffer code) from a 10MB buffer overflow. formatFailureHeadline builds a raw-free failure message from the structured summary, naming the failing/last step; raw err.message is fallback only for system errors. 19/19 tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rtial progress on timeout Spreads the structured summary into the result on success/warn/catch paths (data on ok/warn, meta on fail; output preserved for run-action). Catch path classifies timeout vs maxBuffer overflow and builds a raw-free headline naming the failing/last step. Full suite 2082/2082. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ded fields - findFailedStep now returns the TERMINAL failed step (last parsed step iff it failed), so a transient ✗ retried-✓ before a later timeout is not mis-blamed. - cap step names + reason.selector to 200 chars so a pathological step name/selector can't balloon the failure headline past the bounded `output`. 3 new tests; full suite 2085/2085. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…only Multi-review (Codex) found catastrophic backtracking in STEP_RE: the overlapping `\s+(.*?)\s*` whitespace quantifiers blow up on a glyph + long-whitespace line in the untrusted combined stdout+stderr (10MB app logs). Benchmark: prior pattern 30s @ n=2000; Codex's minimal `(\S.*?)` still 30ms @ n=4000 on trailing ws. Switched to `(\S.*\S|\S)\s*\(` (name starts AND ends non-space) — 0ms on all pathological inputs, byte-identical on valid lines. Also: failure headline now uses the structured reason when there's no terminal ✗ step line (raw-free). +2 tests (ReDoS timing guard, reason-only headline). Full suite 2087/2087. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 15b5ad06cf
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
parseSteps appended every match, so a multi-MB stdout/stderr with many step-shaped lines could bloat the MCP response past the output slice. Cap to the most recent MAX_STEPS=1000 (failures + partial-progress live at the tail), true index preserved so a gap signals truncation. +1 test. Full suite 2088/2088. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Makes
maestro_runreturn structured per-step results and partial progress on timeout, parsed from maestro-runner stdout (no new file I/O). Closes #211. (Issue item 3 — iOSclearState— already shipped in #276/#201.)New pure module
maestro-step-parser.tsparses the{✓|✗} verb[: sel] (N.Ns)step lines maestro-runner already prints;tap-latency.ts(#263) now derives its tap latencies from the same parser (one regex, not two);maestro_runspreads additive fields into the result on all three paths.Result shape (additive —
outputpreserved forrun-action)dataon pass/warn,metaon fail:Process
reason-re-embeds-rawblocker, maxBuffer-vs-timeout, verb colon-strip, data/meta envelope placement, terminal-onlyfailedStep.\s+(.*?)\s*→ 30s on a glyph + long-whitespace line in untrusted 10MB stdout). Hardened to(\S.*\S|\S)\s*\(— 0ms on all pathological inputs, byte-identical on valid lines.Test plan
gh-263-tap-latency.test.jsgreen —parseTapLatenciesbehavior identical.stripAnsiis defensive-only);parseStepsfail-open on real non-step output →[]. Step-line format validated by enhancement: detect wedged simulator test-runtime (degraded tap latency) and recommend reboot #263's real fixtures (same binary v1.0.9).✓ step (N.Ns)capture (local WDA startup flaked on a fresh iOS 26.5 sim — environment, not the change) +runFlowsub-flow rendering.🤖 Generated with Claude Code