
Commit 8305bbd

Lykhoyda and claude authored
feat(#211): maestro_run structured step results + partial progress on timeout (#302)
* docs(#211): design spec — maestro_run structured step results + partial-progress-on-timeout

  Stdout-only parser generalizing #263; additive result fields {steps,failedStep,reason,lastStep,timedOut,outputTruncated}. Folds in the pre-code multi-LLM plan review (codex + claude file-grounded): reason must be raw-free (parseMaestroFailure embeds full output), maxBuffer-vs-timeout discrimination, verb trailing-colon strip, data/meta envelope placement, terminal-only failedStep, ANSI/runFlow edges.

* docs(#211): TDD implementation plan — maestro_run structured step results

  6 bite-sized TDD tasks: parseSteps+stripAnsi, buildStepSummary+helpers, classifyExecError, tap-latency refactor, maestro-run wiring, changeset. Encodes the multi-LLM review findings as concrete test cases.

* docs(#211): amend plan from codex-pair review — raw-free headline + changeset path

  MED#1: add pure formatFailureHeadline (structured, raw-free) for the catch-path message; raw err.message kept only as system-error fallback.
  MED#2: Task 6 runs from repo root, stages repo-root .changeset/ (two-package frontmatter rn-dev-agent-cdp + rn-dev-agent-plugin).

* feat(#211): parseSteps + stripAnsi — structure maestro-runner step lines

  Pure, fail-open parser for `{✓|✗} verb[: sel] (N.Ns)` lines. verb is the first token (trailing colon stripped); requires trailing (N.Ns) so the rn-maestro-run summary + count lines are excluded; ANSI-stripped first. 7/7 tests.

* feat(#211): buildStepSummary + helpers (raw-free reason, terminal-only failedStep)

  findFailedStep/lastObservedStep/summarizeReason/buildStepSummary. summarizeReason projects parseMaestroFailure to {kind,selector}, dropping raw (the full output). failedStep/reason gated on opts.failed so a fail-then-retry-✓ passed run reports no failure. 14/14 tests.

* feat(#211): classifyExecError + formatFailureHeadline

  classifyExecError distinguishes timeout (killed, no maxBuffer code) from a 10MB buffer overflow. formatFailureHeadline builds a raw-free failure message from the structured summary, naming the failing/last step; raw err.message is fallback only for system errors. 19/19 tests.

* refactor(#211): parseTapLatencies derives from shared parseSteps (DRY with #263)

  Replaces the duplicated ✓-tapOn line regex with a filter over parseSteps. The gh-263 suite (13 tests) stays green — behavior is identical.

* feat(#211): maestro_run returns steps/failedStep/reason/lastStep + partial progress on timeout

  Spreads the structured summary into the result on success/warn/catch paths (data on ok/warn, meta on fail; output preserved for run-action). Catch path classifies timeout vs maxBuffer overflow and builds a raw-free headline naming the failing/last step. Full suite 2082/2082.

* chore(#211): changeset for maestro_run structured step results

* fix(#211): address codex-pair review — terminal-fail semantics + bounded fields

  - findFailedStep now returns the TERMINAL failed step (last parsed step iff it failed), so a transient ✗ retried-✓ before a later timeout is not mis-blamed.
  - cap step names + reason.selector to 200 chars so a pathological step name/selector can't balloon the failure headline past the bounded `output`.

  3 new tests; full suite 2085/2085.

* fix(#211): eliminate STEP_RE ReDoS + structured headline when reason-only

  Multi-review (Codex) found catastrophic backtracking in STEP_RE: the overlapping `\s+(.*?)\s*` whitespace quantifiers blow up on a glyph + long-whitespace line in the untrusted combined stdout+stderr (10MB app logs). Benchmark: prior pattern 30s @ n=2000; Codex's minimal `(\S.*?)` still 30ms @ n=4000 on trailing ws. Switched to `(\S.*\S|\S)\s*\(` (name starts AND ends non-space) — 0ms on all pathological inputs, byte-identical on valid lines.

  Also: failure headline now uses the structured reason when there's no terminal ✗ step line (raw-free). +2 tests (ReDoS timing guard, reason-only headline). Full suite 2087/2087.

* fix(#211): cap returned steps to most-recent 1000 (PR #302 review P2)

  parseSteps appended every match, so a multi-MB stdout/stderr with many step-shaped lines could bloat the MCP response past the output slice. Cap to the most recent MAX_STEPS=1000 (failures + partial-progress live at the tail), true index preserved so a gap signals truncation. +1 test. Full suite 2088/2088.

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 1c35f6d commit 8305bbd

10 files changed

Lines changed: 1338 additions & 39 deletions
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
---
"rn-dev-agent-cdp": patch
"rn-dev-agent-plugin": patch
---

`maestro_run` now returns structured per-step results and partial progress on timeout (GH #211).

The result gains `steps[]` (`{index,name,verb,status,durationMs}`), `failedStep`, `reason` (sanitized `{kind,selector}` — never the raw runner log), `lastStep` (progress marker), `timedOut`, and `outputTruncated`. On timeout the partial steps are returned instead of a bare failure, and the failure headline names the failing/last step. Parsed from maestro-runner stdout (the JVM Maestro CLI fallback degrades fail-open to empty steps); `tapOn` latencies for #263 now derive from the shared parser. Additive — `output` is preserved for `run-action` consumers.

docs/superpowers/plans/2026-06-14-211-maestro-structured-results.md

Lines changed: 627 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 163 additions & 0 deletions
@@ -0,0 +1,163 @@
# Design — #211: `maestro_run` structured step results + partial-progress-on-timeout

**Date:** 2026-06-14
**Issue:** [#211](https://github.com/Lykhoyda/rn-dev-agent/issues/211)
**Status:** Approved design → ready for plan
**Branch:** `feat/211-maestro-structured-results` (off `main`; #263 already merged)

## Problem

`maestro_run` works, but verification is harder than it should be:

1. **Output truncated mid-flow** — success is only confirmable via top-level `passed: true` + a separate `cdp_navigation_state`. Per-step pass/fail/durations and terminal assertions aren't visible. The reporter re-ran `grep 'message=' reports/<ts>/junit-report.xml` after *nearly every* failed run (~30/session) to find the failing step.
2. **Timeout returns a bare failure** — a flow that exceeds the cap yields no verdict and no "how far did it get".

Issue item 3 (iOS `clearState` needing `--app-file`) **already shipped** in #276/#201 (`resolveAppFileForClearState`) — out of scope here.

## Goal

Add **structured, additive** fields to the `maestro_run` result so the failing step, reason, and per-step durations are visible without grepping report files, and so a timeout returns partial progress. The per-step durations also become the clean data source #263's degraded-tap-latency heuristic already wants.

## Scope

- **IN:** structured step results; partial progress on timeout.
- **OUT:** iOS `clearState` (shipped); JUnit/report-file parsing (chose stdout-only); `screenshots[]` (YAGNI — the reporter's own suggestion had it empty); bumping the default timeout (the partial-progress *return* is the fix, not a bigger cap).
- **Source:** maestro-runner **stdout** only. The JVM Maestro CLI fallback (iOS-no-adb) emits a different format and degrades **fail-open** to `steps: []` + raw `output`.

## Architecture

Three files; one new pure module. The win is that the per-step data is **already in the stdout** maestro-runner prints — the exact lines #263 parses — so #211's parser is a *generalization* of #263's, and `parseTapLatencies` collapses to a filter over it.

### 1. NEW `src/domain/maestro-step-parser.ts` (pure, no I/O, fail-open)

```ts
export interface ReasonSummary {
  kind: 'SELECTOR_NOT_FOUND' | 'TIMEOUT' | 'ASSERTION_FAILED';
  selector: string | null;
}

export interface MaestroStep {
  index: number; // 0-based observed order — disambiguates loops / runFlow repeats
  name: string; // full step text minus the trailing (N.Ns), e.g. `tapOn: id="submit"`
  verb: string; // first token after the glyph, trailing ':' stripped, e.g. `tapOn`
  status: 'pass' | 'fail';
  durationMs: number;
}

export function stripAnsi(s: string): string; // remove SGR codes before matching
export function parseSteps(output: string): MaestroStep[]; // completed steps only (those with a (N.Ns))
export function findFailedStep(steps: MaestroStep[]): MaestroStep | null; // last status==='fail'
export function lastObservedStep(steps: MaestroStep[]): MaestroStep | null; // steps.at(-1)
export function summarizeReason(output: string): ReasonSummary | null; // sanitized — NO raw

export interface StepSummary {
  steps: MaestroStep[];
  failedStep: MaestroStep | null; // terminal failure only; null unless opts.failed
  reason: ReasonSummary | null; // null unless opts.failed
  lastStep: MaestroStep | null; // last observed (completed) step — the progress marker
}
export function buildStepSummary(output: string, opts: { failed: boolean }): StepSummary;
```

**Line grammar (verified against the #263 fixtures):** each step prints as
` {✓|✗} {verb}[: {selector}] (N.Ns)`. Parser rules:

- `stripAnsi()` first (belt-and-suspenders; `execFile` is not a TTY so color is *usually* off, but unverified against the real binary — see Risks).
- Anchor on a leading status glyph `✓`/`✗` after trimming.
- **Require a trailing `(N.Ns)`** — this excludes the summary line `✗ rn-maestro-run 23.8s` (no parens) and the count lines `3 steps passing` (no glyph). Belt-and-suspenders: also skip a line whose verb is `rn-maestro-run`.
- `verb` = first whitespace-delimited token after the glyph, **with a trailing `:` stripped** (`tapOn:` → `tapOn`). This is load-bearing for the #263 refactor (filter `verb === 'tapOn'`).
- `name` = the line minus the glyph and the trailing `(N.Ns)`.
- `durationMs` = `round(seconds * 1000)`.
- `verb` is the FIRST token, so a verb-name *inside a selector value* (`✓ assertVisible: text="tapOn …"`) is recorded as `assertVisible` — preserves #263 review-finding #2.
- Garbage / empty / CLI-fallback format → `[]`. Never throws.
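
The rules above can be exercised on single lines with a small sketch. It is illustrative only: `StepLine` and `parseStepLine` are hypothetical names, the regular expression follows the `STEP_RE` shape described in this commit, and ANSI stripping is left out for brevity.

```typescript
// Sketch only: StepLine and parseStepLine are illustrative names, not the shipped API.
interface StepLine {
  name: string;
  verb: string;
  status: 'pass' | 'fail';
  durationMs: number;
}

// Glyph, then a name that starts and ends non-space, then a REQUIRED trailing (N.Ns).
const STEP_LINE = /^([✓✗])\s+(\S.*\S|\S)\s*\(([\d.]+)s\)\s*$/;

function parseStepLine(line: string): StepLine | null {
  const m = STEP_LINE.exec(line.trim());
  if (!m) return null; // summary lines, count lines, and garbage end up here
  const name = m[2];
  // verb is the FIRST token only, with a trailing ':' stripped (`tapOn:` becomes `tapOn`).
  const verb = name.split(/\s+/)[0].replace(/:$/, '');
  const seconds = Number(m[3]);
  if (!Number.isFinite(seconds)) return null; // e.g. "(1.2.3s)"
  return {
    name,
    verb,
    status: m[1] === '✓' ? 'pass' : 'fail',
    durationMs: Math.round(seconds * 1000),
  };
}

const sampleLines = [
  '  ✓ launchApp (2.3s)',
  '  ✗ tapOn: id="submit" (12.7s)',
  '  ✓ assertVisible: text="tapOn now" (0.4s)',
  '✗ rn-maestro-run 23.8s', // summary line: no parens, so it is skipped
  '3 steps passing', // count line: no glyph, so it is skipped
];
const parsedLines = sampleLines.map(parseStepLine);
```

The last two entries of `parsedLines` are `null`, and the third records `assertVisible` as its verb even though `tapOn` appears inside the selector value.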

**`failedStep` is terminal-only.** `findFailedStep` returns the last `✗` step, but `buildStepSummary` only populates `failedStep`/`reason` when `opts.failed` is true. maestro-runner logs transient retries; a step that fails-then-retries-✓ on a run that ultimately **passed** must NOT report a `failedStep` (mirrors `parseMaestroFailure`'s END→START terminal-preference, GH#118). The handler passes `failed = !passed`, so on the success path `failedStep` is always null even if a transient `✗` appears in `steps`.

**`reason` is sanitized — never carries `raw`.** `summarizeReason` calls `parseMaestroFailure` but **projects to `{ kind, selector }`**, explicitly dropping the `raw: string` field that every `MaestroFailure` variant carries. Returning the parser's object directly would re-embed the full unsliced runner log into the result, defeating the 2000/4000-char `output` slice. (UNKNOWN → `null`.)
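
A minimal sketch of that projection, assuming a discriminated union in which every variant carries `raw`. The `FailureLike` type and `projectReason` name are hypothetical stand-ins; which real `MaestroFailure` variants carry a `selector` is an assumption here.

```typescript
// Hypothetical stand-in for the repo's MaestroFailure union; the real variants may differ.
type FailureLike =
  | { kind: 'SELECTOR_NOT_FOUND'; selector?: string; raw: string }
  | { kind: 'TIMEOUT'; selector?: string; raw: string }
  | { kind: 'ASSERTION_FAILED'; selector?: string; raw: string }
  | { kind: 'UNKNOWN'; raw: string };

interface ReasonLike {
  kind: 'SELECTOR_NOT_FOUND' | 'TIMEOUT' | 'ASSERTION_FAILED';
  selector: string | null;
}

// Build a NEW object with only the two safe fields; never spread `f`,
// because a spread would copy `raw` (the full runner log) into the result.
function projectReason(f: FailureLike): ReasonLike | null {
  if (f.kind === 'UNKNOWN') return null;
  return { kind: f.kind, selector: f.selector ?? null };
}

const hugeLog = 'x'.repeat(100000); // stands in for a multi-KB runner log
const projected = projectReason({ kind: 'SELECTOR_NOT_FOUND', selector: 'submit', raw: hugeLog });
const unknownReason = projectReason({ kind: 'UNKNOWN', raw: hugeLog });
```

Constructing the object field by field, rather than deleting `raw` from a copy, keeps the result safe even if the parser later gains other large fields.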

### 2. REFACTOR `src/domain/tap-latency.ts`

```ts
import { parseSteps } from './maestro-step-parser.js';
export function parseTapLatencies(output: string): number[] {
  return parseSteps(output)
    .filter((s) => s.verb === 'tapOn' && s.status === 'pass')
    .map((s) => s.durationMs);
}
```

`gh-263-tap-latency.test.js` is the regression guard — the `DEGRADED` fixture must still yield `[2800, 3000]` and the single-failed-tap fixture `[]`. `classifyRuntimeDegradation`, `median`, `resolveFloorMs`, `augmentFailureWithDegradation` are unchanged. (The ≥2-sample gate `MIN_SAMPLES_FOR_DEGRADED` is unaffected — it operates on the filtered latency array.)

### 3. WIRE `src/tools/maestro-run.ts`

The current `meta` object (the payload passed to `okResult`/`warnResult`/`failResult`) is extended with the **same field set on all three paths**. Because `okResult(x)`/`warnResult(x,…)` place `x` in `envelope.data` while `failResult(msg,x)` places `x` in `envelope.meta`, the structured fields appear under `data.*` on pass/warn and `meta.*` on fail — and `output` is preserved on every path (`run-action.ts:144` reads `data.output` then `meta.output`).

Added fields (stable set, present on every path):

```ts
steps: MaestroStep[]
failedStep: MaestroStep | null
reason: ReasonSummary | null
lastStep: MaestroStep | null
timedOut: boolean
outputTruncated: boolean
```

- **success** (exit 0, `passed`): `buildStepSummary(output, { failed: false })` → `steps` + `lastStep`; `failedStep:null, reason:null, timedOut:false, outputTruncated:false`.
- **warn** (exit 0 but `outputIndicatesFlowFailure`): `buildStepSummary(output, { failed: true })`; `timedOut:false, outputTruncated:false`; existing `augmentFailureWithDegradation` (#263) unchanged.
- **catch** (non-zero / timeout / overflow): parse the partial `combined` (stdout+stderr Node attaches to the thrown error); `buildStepSummary(combined, { failed: true })`; existing `#263` augmentation unchanged. Timeout vs overflow discrimination:
  ```ts
  const killed = (err as any).killed === true;
  const overflow = (err as any).code === 'ERR_CHILD_PROCESS_STDIO_MAXBUFFER';
  const timedOut = killed && !overflow;
  const outputTruncated = overflow;
  ```
  `err.killed` is the authoritative timeout discriminator (empirical Node probe: timeout → `killed:true, signal:'SIGTERM', code:null`; normal non-zero → `killed:false, code:N`; a SIGTERM-trapping child can leave `code` non-null while killed, so `code` is used only to *subtract* the maxBuffer case). On a pure timeout `failedStep` is `null` (nothing asserted-failed) and `lastStep` is the last **completed** step — the progress marker (an in-flight step has no `(N.Ns)` yet, so it isn't parsed).
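
The discrimination above can be laid out as a truth table over fake error objects, in the spirit of the fake-error tests this design calls for. This is a sketch: `ExecErrorLike` and `classify` are illustrative names, and the exit code used for the SIGTERM-trapping case is a made-up value.

```typescript
// Only the fields this sketch reads from a rejected execFile error.
interface ExecErrorLike {
  killed?: boolean;
  code?: number | string | null;
  signal?: string | null;
}

function classify(err: unknown): { timedOut: boolean; outputTruncated: boolean } {
  const e = (err ?? {}) as ExecErrorLike;
  const killed = e.killed === true;
  // `code` is consulted only to subtract the maxBuffer case from "killed".
  const overflow = e.code === 'ERR_CHILD_PROCESS_STDIO_MAXBUFFER';
  return { timedOut: killed && !overflow, outputTruncated: overflow };
}

const verdicts = {
  // Timeout: Node kills the child, no exit code.
  timeout: classify({ killed: true, signal: 'SIGTERM', code: null }),
  // Ordinary failing flow: the child exited by itself with a non-zero code.
  nonZeroExit: classify({ killed: false, code: 1 }),
  // Output overflow: also killed, but tagged with the maxBuffer code.
  overflow: classify({ killed: true, code: 'ERR_CHILD_PROCESS_STDIO_MAXBUFFER' }),
  // Child trapped SIGTERM and exited with its own code (hypothetical value 1):
  // still a timeout, which is why `code === null` is not used as the test.
  trappedSigterm: classify({ killed: true, signal: 'SIGTERM', code: 1 }),
  // Non-error input degrades to "neither".
  garbage: classify(undefined),
};
```

Only `timeout` and `trappedSigterm` come out as `timedOut: true`; `overflow` is the single case with `outputTruncated: true`.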

## Result shape (consumer view)

```jsonc
// success/warn → envelope.data ; fail/timeout → envelope.meta
{
  "passed": false,
  "flowFile": "/tmp/rn-maestro-run-….yaml",
  "platform": "ios",
  "runner": "maestro-runner",
  "output": "…sliced 2000/4000…", // unchanged — back-compat
  "steps": [
    { "index": 0, "name": "launchApp", "verb": "launchApp", "status": "pass", "durationMs": 2300 },
    { "index": 1, "name": "tapOn: id=\"submit\"", "verb": "tapOn", "status": "fail", "durationMs": 12700 }
  ],
  "failedStep": { "index": 1, "name": "tapOn: id=\"submit\"", "verb": "tapOn", "status": "fail", "durationMs": 12700 },
  "reason": { "kind": "SELECTOR_NOT_FOUND", "selector": "submit" },
  "lastStep": { "index": 1, "name": "tapOn: id=\"submit\"", "verb": "tapOn", "status": "fail", "durationMs": 12700 },
  "timedOut": false,
  "outputTruncated": false,
  "runtimeDegraded": { "medianTapMs": 1800, "floorMs": 1500, "sampleCount": 3 } // #263, only when degraded
}
```
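
Because the same fields sit under `data` on pass/warn and under `meta` on fail, a consumer can read them through one accessor. The sketch below assumes an envelope with optional `data` and `meta` objects; `EnvelopeLike`, `readRunFields`, and `describeRun` are hypothetical names, and the real envelope may carry more fields.

```typescript
interface StepLike {
  index: number;
  name: string;
  verb: string;
  status: 'pass' | 'fail';
  durationMs: number;
}

interface RunFields {
  passed: boolean;
  output: string;
  steps: StepLike[];
  failedStep: StepLike | null;
  lastStep: StepLike | null;
  timedOut: boolean;
  outputTruncated: boolean;
}

// Hypothetical envelope: the design only states that ok/warn use `data` and fail uses `meta`.
interface EnvelopeLike {
  data?: Partial<RunFields>;
  meta?: Partial<RunFields>;
}

// Read `data` first, then `meta`, mirroring the order described for run-action.
function readRunFields(env: EnvelopeLike): Partial<RunFields> {
  return env.data ?? env.meta ?? {};
}

function describeRun(env: EnvelopeLike): string {
  const f = readRunFields(env);
  if (f.passed) return `passed (${f.steps?.length ?? 0} steps)`;
  if (f.timedOut) return `timed out after "${f.lastStep?.name ?? 'no completed step'}"`;
  if (f.failedStep) return `failed at "${f.failedStep.name}"`;
  return 'failed (no structured step data)';
}

const failedTap: StepLike = { index: 1, name: 'tapOn: id="submit"', verb: 'tapOn', status: 'fail', durationMs: 12700 };
const failDescription = describeRun({
  meta: { passed: false, steps: [failedTap], failedStep: failedTap, lastStep: failedTap, timedOut: false },
});
```

The final branch covers the CLI-fallback case, where `steps` is empty and only `output` is informative.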

## Testing (TDD)

- **NEW `test/unit/gh-211-maestro-step-parser.test.js`** — pure parser + helpers:
  - verbs/status/durations; verb has NO trailing colon; index is observed order.
  - excludes `✗ rn-maestro-run 23.8s` summary line and `N steps passing/failing` count lines.
  - verb-in-selector (`assertVisible: text="tapOn …"`) → verb `assertVisible`.
  - empty / garbage / CLI-format → `[]`; never throws.
  - `stripAnsi` removes SGR codes; an ANSI-wrapped glyph line still parses.
  - `findFailedStep` = last `✗`; `lastObservedStep` = `steps.at(-1)`.
  - `summarizeReason` returns `{ kind, selector }` and **contains no `raw`** (assert the key is absent); UNKNOWN → null.
  - `buildStepSummary(out,{failed:false})` → `failedStep:null,reason:null`; fail-then-retry-✓ output with `{failed:false}` → `failedStep:null`.
- **REGRESSION `test/unit/gh-263-tap-latency.test.js`** stays green (proves `parseTapLatencies` unchanged).
- **NEW `test/unit/gh-211-maestro-run-structured-results.test.js`** — exercise the pure assembly seam directly (no `execFile` mocking): success/warn metas via `buildStepSummary`; catch-path via a fake error `{ killed:true, code:null, stdout:'…partial…', stderr:'' }` asserting `timedOut:true, failedStep:null, lastStep=<last ✓>`; a maxBuffer fake `{ killed:true, code:'ERR_CHILD_PROCESS_STDIO_MAXBUFFER' }` asserting `timedOut:false, outputTruncated:true`.
- **Patch changeset.**

## Risks / open items

- **ANSI (unverified against the real binary).** No ANSI handling exists in the repo and `execFile` is not a TTY (color usually off), but not guaranteed. Mitigation: ship `stripAnsi()` + test now; during device-verify run `~/.maestro-runner/bin/maestro-runner --platform ios test <flow> | grep -c $'\x1b'` to settle whether the strip is load-bearing or belt-and-suspenders.
- **`runFlow` sub-flows.** `runFlow` is allowlisted/used (GH#186). No captured fixture shows how maestro-runner renders sub-flow child glyphs. `steps[]` is documented as a **flat observed list** — no parent/child hierarchy promised; `index` disambiguates repeats. Confirm rendering during device-verify.
- **CLI fallback** produces no structured steps (different format) → `steps: []`, `output` intact. Acceptable: maestro-runner is the default fast path; fail-open matches #263.

## Provenance

Plan reviewed pre-code via `/brainstorm codex,antigravity` (2026-06-14). Codex + Claude file-grounded research caught: the `reason`-re-embeds-`raw` blocker, the maxBuffer-vs-timeout blocker, the `verb` trailing-colon trap, the `data` vs `meta` envelope placement, terminal-only `failedStep`, and ANSI/`runFlow` edges — all folded into this design. (Antigravity hung with no output.)
Lines changed: 129 additions & 0 deletions
@@ -0,0 +1,129 @@
// src/domain/maestro-step-parser.ts
// GH #211: structure maestro_run results from maestro-runner stdout. Pure, no
// I/O, fail-open: unparseable output yields []. Generalizes the #263 step-line
// parser (tap-latency.ts derives parseTapLatencies from parseSteps).
import { parseMaestroFailure } from './maestro-error-parser.js';
// Strip ANSI SGR/color escape sequences. execFile output is usually un-colored
// (child stdout is a pipe, not a TTY) but maestro-runner is not guaranteed to
// honor that, and a glyph-anchored match breaks on a colored `✓`. Built via
// fromCharCode(27) (ESC) to keep a raw control char out of the source/regex.
const ANSI_RE = new RegExp(String.fromCharCode(27) + '\\[[0-9;]*m', 'g');
export function stripAnsi(s) {
    return s.replace(ANSI_RE, '');
}
// ` {✓|✗} <name> (N.Ns)` — the trailing (N.Ns) is REQUIRED, which excludes the
// `✗ rn-maestro-run 23.8s` summary line and the `N steps passing` count lines.
// The name is `\S.*\S|\S` (must start AND end non-whitespace). This keeps a
// duration-looking token inside the selector value (`text="took (2.0s)"`) losing
// to the real trailing `$`-anchored duration, AND removes the overlapping
// whitespace quantifiers (`\s+(.*?)\s*`) that made the prior pattern
// catastrophically backtrack (ReDoS) on a glyph + long-whitespace line — the
// combined stdout+stderr carries untrusted multi-MB app logs (multi-LLM review).
const STEP_RE = /^([✓✗])\s+(\S.*\S|\S)\s*\(([\d.]+)s\)\s*$/;
// Bound any text interpolated into results/headline so a pathological step name
// or selector (e.g. a multi-KB inputText value) can't balloon the failure
// message and defeat the sliced `output` field (codex-pair review).
const MAX_FIELD = 200;
function cap(s) {
    return s.length > MAX_FIELD ? s.slice(0, MAX_FIELD) + '…' : s;
}
// Cap the returned steps so a pathological run (a multi-MB stdout/stderr with
// many step-shaped log lines) can't bloat the MCP response past the `output`
// slice. Keep the most recent steps — failures and partial-progress live at the
// tail — with their true `index` preserved (a gap signals truncation).
const MAX_STEPS = 1000;
export function parseSteps(output) {
    if (!output || typeof output !== 'string')
        return [];
    const steps = [];
    let index = 0;
    for (const raw of stripAnsi(output).split('\n')) {
        const m = STEP_RE.exec(raw.trim());
        if (!m)
            continue;
        const name = m[2].trim();
        const verb = name.split(/\s+/)[0].replace(/:$/, '');
        if (verb === 'rn-maestro-run')
            continue; // belt-and-suspenders vs a future summary format
        const seconds = Number(m[3]);
        if (!Number.isFinite(seconds))
            continue;
        steps.push({
            index: index++,
            name: cap(name),
            verb,
            status: m[1] === '✓' ? 'pass' : 'fail',
            durationMs: Math.round(seconds * 1000),
        });
    }
    return steps.length > MAX_STEPS ? steps.slice(-MAX_STEPS) : steps;
}
// The TERMINAL failed step: the last parsed step iff it failed. maestro-runner
// stops at the first real failure, so the terminal ✗ is the last parsed step; a
// transient ✗ that was retried-✓ before a later timeout is NOT reported, because
// the last parsed step is then the recovery ✓ (codex-pair review).
export function findFailedStep(steps) {
    const last = steps.length ? steps[steps.length - 1] : null;
    return last && last.status === 'fail' ? last : null;
}
export function lastObservedStep(steps) {
    return steps.length ? steps[steps.length - 1] : null;
}
// Project parseMaestroFailure to {kind, selector}, DROPPING its `raw` field —
// every MaestroFailure variant carries `raw` = the full unsliced output, which
// must not be re-embedded into the result (it would defeat the output slice).
export function summarizeReason(output) {
    const f = parseMaestroFailure(output);
    if (f.kind === 'UNKNOWN')
        return null;
    const selector = 'selector' in f ? (f.selector ?? null) : null;
    return { kind: f.kind, selector: selector === null ? null : cap(selector) };
}
// failedStep/reason are populated ONLY when the run's terminal verdict is fail
// (opts.failed). maestro-runner logs transient retries; a fail-then-retry-✓ on
// a PASSED run must not report a failedStep (mirrors parseMaestroFailure GH#118).
export function buildStepSummary(output, opts) {
    const steps = parseSteps(output);
    return {
        steps,
        failedStep: opts.failed ? findFailedStep(steps) : null,
        reason: opts.failed ? summarizeReason(output) : null,
        lastStep: lastObservedStep(steps),
    };
}
// execFile timeout kills the child (killed===true, signal 'SIGTERM', code null).
// A 10MB maxBuffer overflow ALSO rejects with killed===true but code
// 'ERR_CHILD_PROCESS_STDIO_MAXBUFFER' — that's truncation, not a timeout, so it
// must not be mislabeled. `killed` is authoritative; `code` only subtracts the
// overflow case (a SIGTERM-trapping child can leave a non-null exit code).
export function classifyExecError(err) {
    const e = err;
    const killed = e?.killed === true;
    const overflow = e?.code === 'ERR_CHILD_PROCESS_STDIO_MAXBUFFER';
    return { timedOut: killed && !overflow, outputTruncated: overflow };
}
// Headline for a failed maestro_run, built from STRUCTURED data so it never
// re-embeds raw runner/app output. The raw fallbackMsg (err.message, which
// execFile populates with stderr) is used ONLY when there is no structured
// signal — e.g. a spawn/system error with no step output. Raw output still
// lives in the bounded `output` field.
export function formatFailureHeadline(summary, cls, fallbackMsg) {
    if (cls.timedOut) {
        return `Maestro flow timed out${summary.lastStep ? ` after step "${summary.lastStep.name}"` : ''}`;
    }
    if (cls.outputTruncated) {
        return 'Maestro flow output exceeded the 10MB buffer';
    }
    if (summary.failedStep) {
        const r = summary.reason;
        const reasonStr = r ? ` (${r.kind}${r.selector ? `: ${r.selector}` : ''})` : '';
        return `Maestro flow failed at step "${summary.failedStep.name}"${reasonStr}`;
    }
    // No terminal ✗ step line (e.g. it was truncated) but a recognizable error
    // string survived — prefer the structured, raw-free reason over the raw msg.
    if (summary.reason) {
        const r = summary.reason;
        return `Maestro flow failed (${r.kind}${r.selector ? `: ${r.selector}` : ''})`;
    }
    return `Maestro flow failed: ${fallbackMsg.slice(0, 500)}`;
}
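
The ReDoS note on `STEP_RE` can be checked with a small timing guard along the lines of the one the commit message mentions. This is a sketch, not the repo's test: `STEP_PATTERN` copies the pattern above, while `timeMatch`, the input size, and the three inputs are illustrative choices.

```typescript
// Same shape as STEP_RE above; copied so the sketch is self-contained.
const STEP_PATTERN = /^([✓✗])\s+(\S.*\S|\S)\s*\(([\d.]+)s\)\s*$/;

function timeMatch(line: string): { matched: boolean; elapsedMs: number } {
  const start = Date.now();
  const matched = STEP_PATTERN.test(line);
  return { matched, elapsedMs: Date.now() - start };
}

const padding = 50000; // illustrative size, far beyond any realistic step line
const pathologicalLines = [
  '✓' + ' '.repeat(padding), // glyph followed only by whitespace
  '✓ x' + ' '.repeat(padding), // short name, long trailing whitespace, no duration
  '✓ ' + 'x '.repeat(padding), // many tokens, no duration
];
const guardResults = pathologicalLines.map(timeMatch);
const validResult = timeMatch('✓ tapOn: id="submit" (2.8s)');
```

None of the pathological lines match, since each lacks the trailing `(N.Ns)`, and each check should finish quickly because the name group can no longer trade whitespace back and forth with its neighbours. A real guard would pick a time bound generous enough to avoid flakiness on slow CI machines.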
