Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 18 additions & 8 deletions crates/broker/src/runtime/init.rs
Original file line number Diff line number Diff line change
Expand Up @@ -32,17 +32,27 @@ pub(crate) async fn run_init(cmd: InitCommand, telemetry: TelemetryClient) -> Re
let paths = if cmd.persist || custom_state_dir.is_some() {
ensure_runtime_paths(&runtime_cwd, &resolved_name, custom_state_dir.as_deref())?
} else {
// Warn if a stale .agent-relay/ dir exists from a previous persist run.
// Agents can read files from it directly (logs, state) and get confused.
let stale_dir = runtime_cwd.join(".agent-relay");
if stale_dir.exists() {
// Warn only if there is *actual broker state* in .agent-relay/ from a
// prior `--persist` run that could confuse this ephemeral run.
//
// The SDK workflow runner ALWAYS writes .agent-relay/step-outputs/ and
// .agent-relay/team/worker-logs/ regardless of broker mode (those are
// durable artifacts, not broker state), so a bare directory check fires
// on virtually every workflow run — a noisy false positive. The
// discriminator is `state.json`, which only the broker writes and only
// in persist mode.
let stale_state = runtime_cwd.join(".agent-relay").join("state.json");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Stale state check looks for state.json but persist mode creates state-{name}.json

The new stale-state detection checks for {cwd}/.agent-relay/state.json (init.rs:44), but ensure_runtime_paths in crates/broker/src/runtime/paths.rs:140,176,187 always creates state-{safe_name}.json (e.g., state-my-project.json) in persist mode. The only code that creates a bare state.json is ensure_ephemeral_paths (paths.rs:64), but that writes to a temp directory, not .agent-relay/. As a result, this stale-state warning will never fire for any persist-mode run, silently defeating the intent of the check.

Filename used in persist mode vs stale check

Persist mode (paths.rs:187):

state: root.join(format!("state-{safe_name}.json")),

Stale check (init.rs:44):

let stale_state = runtime_cwd.join(".agent-relay").join("state.json");
Prompt for agents
The stale state check hardcodes `state.json` but `ensure_runtime_paths` (paths.rs:140,176,187) creates `state-{safe_name}.json` where safe_name is derived from the broker name. This means the stale check never matches files left by a persist-mode broker.

To fix this, the check should look for any file matching the `state-*.json` glob pattern inside `.agent-relay/`, rather than a single hardcoded name. For example, use `std::fs::read_dir` on the `.agent-relay/` directory and check for any entry whose name starts with `state-` and ends with `.json`. Alternatively, just check for `.agent-relay/state-{safe_name}.json` using the same `resolved_name` variable that is already computed above (after sanitizing it the same way `ensure_runtime_paths` does).

The suggested-removal message should also be updated to reference the correct filename(s).
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in e2ceccf. ensure_runtime_paths writes state-{safe_name}.json (not bare state.json), so my original check would never have fired. Replaced the single-name check with a glob of .agent-relay/state-*.json so it catches state files from any broker name, and the warning now lists every match.

if stale_state.exists() {
eprintln!(
"[agent-relay] WARNING: stale .agent-relay/ directory found in {}",
runtime_cwd.display()
"[agent-relay] WARNING: stale broker state found at {}",
stale_state.display()
);
eprintln!(
"[agent-relay] WARNING: remove it to avoid confusing spawned agents: rm -rf {}",
stale_dir.display()
"[agent-relay] WARNING: this run is ephemeral but a prior --persist run left state behind."
);
eprintln!(
"[agent-relay] WARNING: remove it to avoid confusing spawned agents: rm {}",
stale_state.display()
);
}
ensure_ephemeral_paths(&runtime_cwd, &resolved_name)?
Expand Down
104 changes: 104 additions & 0 deletions packages/sdk/src/workflows/__tests__/scrub-pty-chrome.test.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,104 @@
/**
* Regression tests for WorkflowRunner.scrubForChannel — the function that
* strips PTY/TUI chrome from interactive-agent step output before it gets
* surfaced in workflow logs and channel messages.
*
* The patterns covered here are taken from a real captured run of a
* multi-turn workflow against Claude Code's PTY: when its TUI footer
* overwrites itself faster than the PTY flushes whitespace, lines like
* `bypasspermissionson`, `--INSERT--⏵⏵`, and `Opus 4.7 (1M context) ctx:5%
* $1.45` end up in the captured stream. Before these regex additions, the
* step "Output:" block was unreadable on interactive-agent steps.
*/
import { describe, it, expect } from 'vitest';

import { WorkflowRunner } from '../runner.js';

// scrubForChannel is `private static` — the cast is the minimal-invasive way
// to exercise it from a test without exporting an internal-only helper.
const scrub = (text: string): string =>
(WorkflowRunner as unknown as { scrubForChannel(t: string): string }).scrubForChannel(text);

describe('WorkflowRunner.scrubForChannel — PTY chrome stripping', () => {
it('strips the Claude Code bottom status bar (model + ctx% + cost)', () => {
const input = [
'real content line',
'workflows git:(main) Opus 4.7 (1M context) ctx:5% $1.45',
'Opus4.7(1Mcontext) ctx:6% $1.54',
'another real line',
].join('\n');
const out = scrub(input);
expect(out).toContain('real content line');
expect(out).toContain('another real line');
expect(out).not.toMatch(/ctx\s*:\s*\d+%/);
expect(out).not.toMatch(/\$\d+\.\d+/);
});
Comment on lines +23 to +35
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Assert full footer-line removal, not just token removal.

This test can still pass if scrubForChannel leaves non-token footer fragments behind. Add exact-line absence checks to guard against partial stripping regressions.

Suggested test tightening
   const out = scrub(input);
   expect(out).toContain('real content line');
   expect(out).toContain('another real line');
+  expect(out).not.toContain('workflows git:(main) Opus 4.7 (1M context) ctx:5% $1.45');
+  expect(out).not.toContain('Opus4.7(1Mcontext) ctx:6% $1.54');
   expect(out).not.toMatch(/ctx\s*:\s*\d+%/);
   expect(out).not.toMatch(/\$\d+\.\d+/);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/sdk/src/workflows/__tests__/scrub-pty-chrome.test.ts` around lines
23 - 35, The test currently only ensures tokens are removed but may allow footer
line fragments to remain; update the test around scrub to assert the entire
footer lines are absent by checking that the exact footer strings are not
present as whole lines (e.g., use expect(out.split('\n')).not.toContain(...) or
a multiline regex like not.toMatch(/^[^\n]*Opus.*ctx.*\$\d+\.\d+$/m)) so the
full footer lines produced by Claude are stripped; reference the scrub helper
(and scrubForChannel if used elsewhere) when locating the logic to validate.


it('strips vim-style mode indicators emitted by the input bar', () => {
const input = [
'pre-mode line',
'--INSERT--',
'--INSERT--⏵⏵bypasspermissionson (shift+tabtocycle)',
'post-mode line',
].join('\n');
const out = scrub(input);
expect(out).toContain('pre-mode line');
expect(out).toContain('post-mode line');
expect(out).not.toMatch(/--INSERT--/);
});

it('strips no-whitespace TUI hint variants (bypasspermissionson, pasteagaintoexpand)', () => {
const input = ['before', 'bypasspermissionson', 'pasteagaintoexpand', 'shifttabto cycle', 'after'].join(
'\n'
);
const out = scrub(input);
expect(out).toContain('before');
expect(out).toContain('after');
expect(out).not.toMatch(/bypasspermissionson/);
expect(out).not.toMatch(/pasteagaintoexpand/);
});

it('strips thinking-status fragments without ellipsis anchors', () => {
const input = [
'meaningful: round 3 codex-player guess=19 feedback=correct',
'thinking with high effort',
'↓ 13 tokens · thinking with high effort',
'Crunched for 32s',
'Sautéed for 4s',
'Gitifying…55',
].join('\n');
const out = scrub(input);
expect(out).toContain('feedback=correct');
expect(out).not.toMatch(/thinking with high effort/);
expect(out).not.toMatch(/Crunched for/);
expect(out).not.toMatch(/Gitifying/);
});

it('preserves real content and OWNER_DECISION signals', () => {
const input = [
'Read 1 file, calling relaycast 2 times',
'Transcript verification reports TRANSCRIPT_OK with all 6 lines well-formed.',
'OWNER_DECISION: COMPLETE',
'REASON: All 6 turns executed, history.log has 6 lines.',
'STEP_COMPLETE: repair-transcript',
].join('\n');
const out = scrub(input);
expect(out).toContain('TRANSCRIPT_OK');
expect(out).toContain('OWNER_DECISION: COMPLETE');
expect(out).toContain('STEP_COMPLETE: repair-transcript');
expect(out).toContain('All 6 turns executed');
});

it('does not strip lines that merely mention model names in prose', () => {
// Guard against the new claudeFooterRe (which looks for `Opus|Sonnet|Haiku <num>
// (...context...) ctx:N%`) being too eager and removing prose that
// mentions a model name.
const input = [
'Compared output from Opus 4.7 against Sonnet 4.6 — both passed.',
'We chose Haiku 4.5 for its latency profile.',
].join('\n');
const out = scrub(input);
expect(out).toContain('Opus 4.7 against Sonnet 4.6');
expect(out).toContain('Haiku 4.5 for its latency profile');
});
});
20 changes: 19 additions & 1 deletion packages/sdk/src/workflows/builder.ts
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,25 @@ export interface AgentOptions {
maxTokens?: number;
timeoutMs?: number;
retries?: number;
/** Seconds of silence before considering the agent idle (for idle nudging). */
/**
* Seconds of silence on the agent's PTY before the runtime marks it idle and
* tears it down. Default: 30s. Set to `0` to disable idle detection entirely.
*
* When to override (per-agent):
* - You expect long quiet stretches by design — a long-running reviewer
* waiting for downstream verdicts, a grader watching a file that updates
* every few minutes, or a `@-mention` recipient whose triggering event
* may arrive >30s after spawn. Setting `0` (or a generous N) prevents
* the runtime from killing the agent before the awaited event arrives.
*
* When NOT to override:
* - One-shot worker steps. The default is right; idle-as-complete is what
* makes `OWNER_DECISION: COMPLETE` + clean exit fast.
*
* See the `writing-agent-relay-workflows` skill ("Idle detection beats
* 'wait for X' prompts") for the trade-offs around long-running interactive
* agents and the Per-turn interactive spawn alternative.
*/
idleThresholdSecs?: number;
/** When false, the agent runs as a non-interactive subprocess (no PTY, no relay messaging).
* Default: true. */
Expand Down
94 changes: 62 additions & 32 deletions packages/sdk/src/workflows/runner.ts
Original file line number Diff line number Diff line change
Expand Up @@ -2111,7 +2111,7 @@ export class WorkflowRunner {
const repairRetries =
existing?.repairRetries ??
(hasRepairAgentCandidate
? existing?.maxRetries ?? DEFAULT_WORKFLOW_REPAIR_RETRIES
? (existing?.maxRetries ?? DEFAULT_WORKFLOW_REPAIR_RETRIES)
: existing?.repairRetries);

return {
Expand Down Expand Up @@ -2843,7 +2843,10 @@ export class WorkflowRunner {
this.validateConfig(resolved);
const runtimeConfig = this.applyReliabilityDefaults(resolved);

const permissionResult = this.validatePermissions(runtimeConfig.agents, runtimeConfig.permission_profiles);
const permissionResult = this.validatePermissions(
runtimeConfig.agents,
runtimeConfig.permission_profiles
);
if (permissionResult.errors.length > 0) {
throw new Error(`Permission validation failed:\n ${permissionResult.errors.join('\n ')}`);
}
Expand Down Expand Up @@ -3005,7 +3008,9 @@ export class WorkflowRunner {
throw new Error(`Run "${runId}" is in status "${run.status}" and cannot be resumed`);
}

const resolvedConfig = this.applyReliabilityDefaults(vars ? this.resolveVariables(run.config, vars) : run.config);
const resolvedConfig = this.applyReliabilityDefaults(
vars ? this.resolveVariables(run.config, vars) : run.config
);

// Resolve path definitions (same as execute()) so workdir lookups work on resume
const pathResult = this.resolvePathDefinitions(resolvedConfig.paths, this.cwd);
Expand Down Expand Up @@ -3728,7 +3733,7 @@ export class WorkflowRunner {
errorHandling: ErrorHandlingConfig | undefined,
lifecycle: WorkflowStepLifecycleExecutor<StepState>
): Promise<void> {
const repairRetries = errorHandling?.strategy === 'retry' ? errorHandling.repairRetries ?? 0 : 0;
const repairRetries = errorHandling?.strategy === 'retry' ? (errorHandling.repairRetries ?? 0) : 0;
const repairAgent =
repairRetries > 0
? this.resolveWorkflowRepairAgent(step, stepStates, agentMap, errorHandling)
Expand Down Expand Up @@ -4730,34 +4735,34 @@ export class WorkflowRunner {
promptTaskText: ownerTask,
}
: await this.spawnAndWait(effectiveOwner, resolvedStep, timeoutMs, {
retryAttempt: attempt,
evidenceStepName: step.name,
evidenceRole: usesOwnerFlow ? 'owner' : 'specialist',
preserveOnIdle: !isHubPattern || !this.isLeadLikeAgent(effectiveOwner) ? false : undefined,
logicalName: effectiveOwner.name,
onSpawned: explicitInteractiveWorker
? ({ agent }) => {
explicitWorkerHandle = agent;
}
: undefined,
onChunk: explicitInteractiveWorker
? ({ chunk }) => {
explicitWorkerOutput += WorkflowRunner.stripAnsi(chunk);
if (
!explicitWorkerCompleted &&
this.hasExplicitInteractiveWorkerCompletionEvidence(
step,
explicitWorkerOutput,
ownerTask,
resolvedTask
)
) {
explicitWorkerCompleted = true;
void explicitWorkerHandle?.release().catch(() => undefined);
retryAttempt: attempt,
evidenceStepName: step.name,
evidenceRole: usesOwnerFlow ? 'owner' : 'specialist',
preserveOnIdle: !isHubPattern || !this.isLeadLikeAgent(effectiveOwner) ? false : undefined,
logicalName: effectiveOwner.name,
onSpawned: explicitInteractiveWorker
? ({ agent }) => {
explicitWorkerHandle = agent;
}
}
: undefined,
});
: undefined,
onChunk: explicitInteractiveWorker
? ({ chunk }) => {
explicitWorkerOutput += WorkflowRunner.stripAnsi(chunk);
if (
!explicitWorkerCompleted &&
this.hasExplicitInteractiveWorkerCompletionEvidence(
step,
explicitWorkerOutput,
ownerTask,
resolvedTask
)
) {
explicitWorkerCompleted = true;
void explicitWorkerHandle?.release().catch(() => undefined);
}
}
: undefined,
});
const output = typeof spawnResult === 'string' ? spawnResult : spawnResult.output;
promptTaskText =
typeof spawnResult === 'string'
Expand Down Expand Up @@ -7727,11 +7732,33 @@ export class WorkflowRunner {
/^(?:[\s\u2580-\u259f✢*·▗▖▘▝]+\s*)?(?:Claude\s+Code(?:\s+v?[\d.]+)?|(?:Sonnet|Haiku|Opus)\s*[\d.]+|claude-(?:sonnet|haiku|opus)-[\w.-]+|Running\s+on\s+claude)/iu;
// TUI directory breadcrumb lines (e.g. " ~/Projects/agent-workforce/relay-...")
const dirBreadcrumbRe = /^\s*~[\\/]/u;
// The `\s*` variants below catch the no-whitespace TUI renderings that
// happen when Claude Code's footer overwrites itself faster than the PTY
// can flush whitespace — we see lines like `bypasspermissionson` and
// `pasteagaintoexpand` in real captures.
// Trailing `\b` is intentionally omitted: real PTY captures often show the
// hint with extra word chars stuck to it ("bypasspermissionson",
// "interrupting") because the TUI overwrote whitespace. Leading `\b` is
// enough to anchor without false-positive matches inside unrelated words.
const uiHintRe =
/\b(?:Press\s+up\s+to\s+edit|tab\s+to\s+queue|bypass\s+permissions|esc\s+to\s+interrupt)\b/iu;
/\b(?:Press\s*up\s*to\s*edit|tab\s*to\s*queue|bypass\s*permissions|esc\s*to\s*interrupt|paste\s*again\s*to\s*expand|shift\s*[+]?\s*tab\s*to\s*cycle|running\s+stop\s+hook|fan\s+out\s+subagents)/iu;
// Vim-style mode indicators that Claude Code's input bar emits when the
// user is mid-edit (e.g. `--INSERT--⏵⏵bypasspermissionson`).
const vimModeRe =
/^[-\s]*--?(?:INSERT|NORMAL|VISUAL|REPLACE)--?[-\s]*$|--?(?:INSERT|NORMAL|VISUAL|REPLACE)--/u;
// Claude Code bottom status bar: "Opus 4.7 (1M context) ctx:5% $1.45"
// or "workflows git:(main) Opus 4.7 (1M context) ctx:5% $1.45".
// The shell-prompt + model-label + ctx% + cost line is pure chrome.
const claudeFooterRe =
/(?:Opus|Sonnet|Haiku)\s*\d[\d.]*\s*\(?(?:1M\s*context|context)?\)?\s*ctx\s*:\s*\d+%/iu;
// Any spinner-prefixed word ending in … — catches all Claude thinking animations
// regardless of the specific word used (Thinking, Cascading, Flibbertigibbeting, etc.)
const thinkingLineRe = new RegExp(`^[\\s${SPINNER}]*\\s*\\w[\\w\\s]*\\u2026\\s*$`, 'u');
// Looser thinking-status filter: catches "thinking with high effort", "↓ 13 tokens · thinking",
// "Crunched for 32s", "Befuddling… running stop hook", token-count interludes — all
// ambient TUI status that the previous ellipsis-anchored regex missed.
const thinkingStatusRe =
/\b(?:thinking\s+(?:with\s+\w+\s+effort|more\s+with|harder)|↓\s*\d+\s*tokens?\b|↑\s*\d+\s*tokens?\b|crunched\s+for\s+\d|sautéed\s+for\s+\d|befuddl|flibbertigib|gitifying|flowing\s*…)/iu;
const cursorOnlyRe = /^[\s❯⎿›»◀▶←→↑↓⟨⟩⟪⟫·]+$/u;
// Cursor Agent TUI lines: generating animations, pasted text indicators, UI chrome
const cursorAgentRe =
Expand Down Expand Up @@ -7777,7 +7804,10 @@ export class WorkflowRunner {
if (claudeHeaderRe.test(trimmed)) continue;
if (dirBreadcrumbRe.test(trimmed)) continue;
if (uiHintRe.test(trimmed)) continue;
if (vimModeRe.test(trimmed)) continue;
if (claudeFooterRe.test(trimmed)) continue;
if (thinkingLineRe.test(trimmed)) continue;
if (thinkingStatusRe.test(trimmed)) continue;
if (cursorOnlyRe.test(trimmed)) continue;
if (cursorAgentRe.test(trimmed)) continue;
if (slashCommandRe.test(trimmed)) continue;
Expand Down
Loading