Skip to content

feat: harden autonomy lifecycle, OOM bounds, and provider-boundary finalization#386

Merged
claude-code-best merged 6 commits into
claude-code-best:mainfrom
amDosion:feat/autonomy-lifecycle-upstream
Apr 29, 2026
Merged

feat: harden autonomy lifecycle, OOM bounds, and provider-boundary finalization#386
claude-code-best merged 6 commits into
claude-code-best:mainfrom
amDosion:feat/autonomy-lifecycle-upstream

Conversation

@amDosion
Copy link
Copy Markdown
Collaborator

@amDosion amDosion commented Apr 29, 2026

feat: harden autonomy lifecycle, OOM bounds, and provider-boundary finalization

This PR consolidates a coordinated batch of fixes around autonomy run/flow lifecycle, scheduled task deduplication, provider-boundary state finalization, and matching memory-bound treatments for adjacent long-running subsystems (REPL fullscreen scrollback, skill-search/skill-learning runtime activation). All changes were developed and reviewed together because they touched the same lifecycle invariants and were uncovered by the same long-running session reproductions.

Lifecycle correctness

  • Queued autonomy prompts are not injected unless the persisted run was successfully claimed; queued run claiming is now terminal-safe so a once-consumed/cancelled/failed run can not slip back into queued.
  • Autonomy run/flow finalization happens on completion, provider error, generator close, and cancellation — not just the happy path. New src/__tests__/queryAutonomyProviderBoundary.test.ts covers these provider-boundary transitions.
  • requestManagedAutonomyFlowCancel and resumeManagedAutonomyFlowPrompt carry rootDir and currentDir explicitly across detached async boundaries (proactive-tick, cron, daemon restart) instead of inferring from process state.
  • Active runs/flows are protected from janitor pruning so a running step can not be garbage-collected mid-flight (src/utils/autonomyAuthority.ts).
  • Heartbeat parser now ignores fenced code blocks; the two-phase commit window for autonomy state transitions is documented in docs/internals/autonomy-jira.md.

Ownership and dedup

  • src/utils/autonomyRuns.ts: ownership stamping (run id + rootDir carried end-to-end), source-based dedup against active runs.
  • src/hooks/useScheduledTasks.ts: scheduled ticks deduplicate against runs already active on the same source label.
  • src/utils/processUserInput/processSlashCommand.tsx: forked slash commands now thread the autonomy runId so completion finalizers can find the originating run for deferred completion.
  • New src/utils/autonomyQueueLifecycle.ts and tests collect the queue-side lifecycle invariants in one place.

Memory bounds (related, same review pass)

  • src/screens/REPL.tsx: caps fullscreen scrollback after the compact boundary and updates trailing progress rows in place. Long-running fullscreen sessions could otherwise retain thousands of post-compaction messages and duplicate progress rows, keeping Ink trees alive long after their useful context had moved on.
  • src/services/skillSearch/* and src/services/skillLearning/*: runtime activation is strictly opt-in via existing env toggles; session caches are capped so long-running processes can not grow them forever. Build presence is preserved so operators can still discover and opt into the slash commands.

CI / test contract

  • tests/integration/dependency-overrides.test.ts: smoke test no longer drives Mermaid's browser renderer; it validates the package-resolution contract directly so CI does not regress on unrelated browser timing.
  • New tests/integration/autonomy-lifecycle-user-flow.test.ts: end-to-end CLI subprocess flow exercising status --deep, flows, flow <id>, flow resume, flow cancel against persisted state.
  • src/entrypoints/cli.tsx: claude autonomy … routes through an entrypoint fast path that reuses the slash-command formatter without booting the full interactive CLI. Stdout is flushed before forced exit so coverage subprocesses do not terminate with empty stdout.
  • packages/builtin-tools/src/tools/RemoteTriggerTool/__tests__/RemoteTriggerTool.test.ts: stabilized to prevent audit flake under coverage.

Tests added

  • src/__tests__/queryAutonomyProviderBoundary.test.ts
  • src/hooks/__tests__/useScheduledTasks.test.ts
  • src/utils/__tests__/autonomyAuthority.test.ts
  • src/utils/__tests__/autonomyFlows.test.ts (extended)
  • src/utils/__tests__/autonomyPersistence.test.ts (extended)
  • src/utils/__tests__/autonomyQueueLifecycle.test.ts
  • src/utils/__tests__/autonomyRuns.test.ts (extended)
  • src/utils/processUserInput/__tests__/processSlashCommand.test.ts
  • tests/integration/autonomy-lifecycle-user-flow.test.ts

Docs

  • docs/agent/sur-loop-scheduled-oom.md: System Understanding Report covering the scheduled/loop OOM problem, the call graphs investigated, and the lifecycle invariants this PR establishes.
  • docs/agent/sur-skill-overflow-bugs.md: SUR for the related skill-overflow context.
  • docs/internals/autonomy-jira.md: documents the two-phase commit window and ownership stamping invariants.
  • docs/memory-leak-audit.md: audit notes covering the REPL/scrollback and skill-search bounds.

Invariants this PR establishes

  1. Queued autonomy prompts are not injected unless the persisted run was successfully claimed.
  2. Terminal run/flow states are terminal — completion, failure, and cancellation all finalize state regardless of which provider/error path triggered them.
  3. Autonomy run/flow rootDir is carried explicitly across detached async boundaries instead of inferred from a shared singleton.
  4. State-only CLI subcommands (autonomy status|runs|flows|flow …) bypass full interactive bootstrap so they do not hold unrelated handles open.
  5. REPL fullscreen scrollback and skill-search/skill-learning session caches are explicitly bounded.

Validation

bun run typecheck
CI=true GITHUB_ACTIONS=true bun test            # 3996 pass / 0 fail across 305 files
bun test src/__tests__/queryAutonomyProviderBoundary.test.ts \
         src/hooks/__tests__/useScheduledTasks.test.ts \
         src/utils/__tests__/autonomy{Runs,Flows,Authority,QueueLifecycle,Persistence}.test.ts \
         src/utils/processUserInput/__tests__/processSlashCommand.test.ts \
         tests/integration/autonomy-lifecycle-user-flow.test.ts

Origin

This PR is the consolidated, upstream-targeted version of two fork-side review PRs (fix/loop-scheduled-autonomy-oom and fix/autonomy-lifecycle). The fork-side review history is preserved at amDosion#7 . The fork's own internal chore: keep fork current with upstream sync commits and the docs: update contributors automation are intentionally not included in this PR.

The autonomy CLI handler rootDir threading that the fork added (78f64d8, 98d04dd) is intentionally omitted here because upstream a2cfaf91 (fix: 修复 RemoteTriggerTool 和 autonomy 测试的全量运行失败) already performed the equivalent change with an additional currentDir option. Keeping the upstream version avoids regressing that improvement.

Summary by CodeRabbit

  • Bug Fixes

    • Fixed OOM/stale “zombie” autonomy runs, improved ownership tracking, stale-run recovery, and reliable finalization across long-lived sessions.
  • New Features

    • Claim/consume autonomy queue lifecycle with consumable-claiming and consolidated turn finalization.
    • Bounded caches to prevent unbounded growth in skill-search/learning.
    • CLI autonomy fast-path and deferred-completion for background slash-commands.
  • Documentation

    • Added system-understanding and reliability reports.
  • Tests

    • Expanded unit and integration tests for autonomy lifecycle, queueing, recovery, and scheduled-task behavior.

…nalization

This PR consolidates a coordinated batch of fixes around autonomy run/flow lifecycle, scheduled task deduplication, provider-boundary state finalization, and matching memory-bound treatments for adjacent long-running subsystems (REPL fullscreen scrollback, skill-search/skill-learning runtime activation). All changes were developed and reviewed together because they touched the same lifecycle invariants and were uncovered by the same long-running session reproductions.

## Lifecycle correctness

- Queued autonomy prompts are not injected unless the persisted run was successfully claimed; queued run claiming is now terminal-safe so a once-consumed/cancelled/failed run can not slip back into `queued`.
- Autonomy run/flow finalization happens on completion, provider error, generator close, and cancellation — not just the happy path. New `src/__tests__/queryAutonomyProviderBoundary.test.ts` covers these provider-boundary transitions.
- `requestManagedAutonomyFlowCancel` and `resumeManagedAutonomyFlowPrompt` carry `rootDir` and `currentDir` explicitly across detached async boundaries (proactive-tick, cron, daemon restart) instead of inferring from process state.
- Active runs/flows are protected from janitor pruning so a running step can not be garbage-collected mid-flight (`src/utils/autonomyAuthority.ts`).
- Heartbeat parser now ignores fenced code blocks; the two-phase commit window for autonomy state transitions is documented in `docs/internals/autonomy-jira.md`.

## Ownership and dedup

- `src/utils/autonomyRuns.ts`: ownership stamping (run id + rootDir carried end-to-end), source-based dedup against active runs.
- `src/hooks/useScheduledTasks.ts`: scheduled ticks deduplicate against runs already active on the same source label.
- `src/utils/processUserInput/processSlashCommand.tsx`: forked slash commands now thread the autonomy `runId` so completion finalizers can find the originating run for deferred completion.
- New `src/utils/autonomyQueueLifecycle.ts` and tests collect the queue-side lifecycle invariants in one place.

## Memory bounds (related, same review pass)

- `src/screens/REPL.tsx`: caps fullscreen scrollback after the compact boundary and updates trailing progress rows in place. Long-running fullscreen sessions could otherwise retain thousands of post-compaction messages and duplicate progress rows, keeping Ink trees alive long after their useful context had moved on.
- `src/services/skillSearch/*` and `src/services/skillLearning/*`: runtime activation is strictly opt-in via existing env toggles; session caches are capped so long-running processes can not grow them forever. Build presence is preserved so operators can still discover and opt into the slash commands.

## CI / test contract

- `tests/integration/dependency-overrides.test.ts`: smoke test no longer drives Mermaid's browser renderer; it validates the package-resolution contract directly so CI does not regress on unrelated browser timing.
- New `tests/integration/autonomy-lifecycle-user-flow.test.ts`: end-to-end CLI subprocess flow exercising `status --deep`, `flows`, `flow <id>`, `flow resume`, `flow cancel` against persisted state.
- `src/entrypoints/cli.tsx`: `claude autonomy …` routes through an entrypoint fast path that reuses the slash-command formatter without booting the full interactive CLI. Stdout is flushed before forced exit so coverage subprocesses do not terminate with empty stdout.
- `packages/builtin-tools/src/tools/RemoteTriggerTool/__tests__/RemoteTriggerTool.test.ts`: stabilized to prevent audit flake under coverage.

## Tests added

- `src/__tests__/queryAutonomyProviderBoundary.test.ts`
- `src/hooks/__tests__/useScheduledTasks.test.ts`
- `src/utils/__tests__/autonomyAuthority.test.ts`
- `src/utils/__tests__/autonomyFlows.test.ts` (extended)
- `src/utils/__tests__/autonomyPersistence.test.ts` (extended)
- `src/utils/__tests__/autonomyQueueLifecycle.test.ts`
- `src/utils/__tests__/autonomyRuns.test.ts` (extended)
- `src/utils/processUserInput/__tests__/processSlashCommand.test.ts`
- `tests/integration/autonomy-lifecycle-user-flow.test.ts`

## Docs

- `docs/agent/sur-loop-scheduled-oom.md`: System Understanding Report covering the scheduled/loop OOM problem, the call graphs investigated, and the lifecycle invariants this PR establishes.
- `docs/agent/sur-skill-overflow-bugs.md`: SUR for the related skill-overflow context.
- `docs/internals/autonomy-jira.md`: documents the two-phase commit window and ownership stamping invariants.
- `docs/memory-leak-audit.md`: audit notes covering the REPL/scrollback and skill-search bounds.

## Invariants this PR establishes

1. Queued autonomy prompts are not injected unless the persisted run was successfully claimed.
2. Terminal run/flow states are terminal — completion, failure, and cancellation all finalize state regardless of which provider/error path triggered them.
3. Autonomy run/flow `rootDir` is carried explicitly across detached async boundaries instead of inferred from a shared singleton.
4. State-only CLI subcommands (`autonomy status|runs|flows|flow …`) bypass full interactive bootstrap so they do not hold unrelated handles open.
5. REPL fullscreen scrollback and skill-search/skill-learning session caches are explicitly bounded.

## Validation

```bash
bun run typecheck
CI=true GITHUB_ACTIONS=true bun test            # 3996 pass / 0 fail across 305 files
bun test src/__tests__/queryAutonomyProviderBoundary.test.ts \
         src/hooks/__tests__/useScheduledTasks.test.ts \
         src/utils/__tests__/autonomy{Runs,Flows,Authority,QueueLifecycle,Persistence}.test.ts \
         src/utils/processUserInput/__tests__/processSlashCommand.test.ts \
         tests/integration/autonomy-lifecycle-user-flow.test.ts
```

## Origin

This PR is the consolidated, upstream-targeted version of two fork-side review PRs (fix/loop-scheduled-autonomy-oom and fix/autonomy-lifecycle). The fork-side review history is preserved at #7 . The fork's own internal `chore: keep fork current with upstream` sync commits and the `docs: update contributors` automation are intentionally not included in this PR.

The autonomy CLI handler `rootDir` threading that the fork added (78f64d8, 98d04dd) is intentionally omitted here because upstream `a2cfaf91` (fix: 修复 RemoteTriggerTool 和 autonomy 测试的全量运行失败) already performed the equivalent change with an additional `currentDir` option. Keeping the upstream version avoids regressing that improvement.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 29, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Implements a queue-based autonomy lifecycle: claim/partition/finalize APIs, persisted run ownership and stale-run recovery, deferred completion for background-forked slash commands, bounded skill-system caches and feature-gate separation, extensive tests, and related docs.

Changes

Cohort / File(s) Summary
Autonomy core & lifecycle
src/utils/autonomyQueueLifecycle.ts, src/utils/autonomyRuns.ts, src/utils/autonomyPersistence.ts
Adds partition/claim/finalize APIs, ownerProcessId/ownerSessionId persisted, stale-run recovery (auto-fail), dedupe by ownerKey, conditional create/commit helpers, and a test-only persistence-lock count helper.
Turn execution & integration
src/query.ts, src/query/transitions.ts, src/utils/handlePromptSubmit.ts, src/screens/REPL.tsx, src/cli/print.ts
Switches to claim/consumable queued-command model, centralizes finalization via finalizeAutonomyCommandsForTurn, updates transition discriminated unions and onQuery/handlePromptSubmit to return execution boolean.
Slash-command / user input plumbing
src/utils/processUserInput/processSlashCommand.tsx, src/utils/processUserInput/processUserInput.ts, src/Tool.ts
Threads autonomy through slash handling, adds deferAutonomyCompletion handshake for background forks, and exposes test-only allowBackgroundForkedSlashCommands option on ToolUseContext.
Scheduled / proactive task handling
src/hooks/useScheduledTasks.ts, src/proactive/useProactive.ts, src/hooks/__tests__/useScheduledTasks.test.ts
Adds createScheduledTaskQueuedCommand with shouldCreate gating, disposed/generating guards, safer enqueue/cancel semantics, and tests for dedupe/behavior.
Skill system bounds & feature flags
src/services/skillSearch/..., src/services/skillLearning/..., src/commands/skill-search/index.ts, src/commands/skill-learning/index.ts, scripts/defines.ts
Adds bounded LRU/FIFO caches, splits compile-time (is*CompiledIn) vs runtime (is*Enabled) checks, and enables experimental features in DEFAULT_BUILD_FEATURES.
Teammate / swarm run plumbing
src/tasks/InProcessTeammateTask/..., src/utils/swarm/..., src/utils/swarm/inProcessRunner.ts, src/utils/swarm/spawnInProcess.ts
Propagates optional autonomyRootDir through teammate messages/runner and uses it in run lifecycle updates and failure handling.
Persistence, cleanup & daemon
src/services/compact/postCompactCleanup.ts, src/daemon/main.ts, src/entrypoints/cli.tsx
Makes post-compact cleanup synchronous (removes LSP awaits), tracks worker restart timers with kill grace, and adds an autonomy CLI fast-path while awaiting main().
Types & queue payloads
src/types/textInputTypes.ts, src/tasks/InProcessTeammateTask/types.ts
Adds optional rootDir / autonomyRootDir fields to queued/autonomy-related types to carry provenance.
Model provider & highlighting
src/utils/model/providers.ts, packages/color-diff-napi/src/index.ts, src/utils/model/__tests__/providers.test.ts
Adds DI param to getAPIProvider(settings?), adjusts tests to pass settings, and switches highlight.js import approach while preserving lazy interop.
Tests & mocks (many)
src/utils/__tests__/*, src/__tests__/*, tests/integration/*, packages/builtin-tools/.../RemoteTriggerTool.test.ts, tests/mocks/auth.ts
Large new/updated test coverage: queue lifecycle, runs, flows, persistence locks, scheduled-task concurrency, background-fork behavior; adds shared auth mock.
Documentation
docs/agent/sur-loop-scheduled-oom.md, docs/agent/sur-skill-overflow-bugs.md, docs/internals/autonomy-jira.md
Adds System Understanding Reports and an autonomy reliability Jira index documenting failures, fixes, tests, and verification steps.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI/REPL
    participant Query as query()
    participant QLC as autonomyQueueLifecycle
    participant Runs as autonomyRuns
    participant Persist as persistence

    CLI->>Query: start turn (may include queuedCommands)
    Query->>QLC: claimConsumableQueuedAutonomyCommands(commands, rootDir)
    QLC->>Runs: inspect runs (ownerProcessId / isProcessRunning)
    Runs-->>QLC: consumable vs stale partition
    QLC->>Runs: atomically mark claimed runs running (set owner meta)
    QLC-->>Query: return claimed attachmentCommands
    Query->>Query: execute turn / onQuery (returns didExecute:boolean)
    Query->>QLC: finalizeAutonomyCommandsForTurn(outcome, claimedCommands, currentDir)
    QLC->>Runs: apply completed/failed/cancelled transitions
    Runs->>Persist: persist runs.json updates
    QLC-->>Query: return nextCommands
    Query-->>CLI: deliver outcome + follow-ups
Loading
sequenceDiagram
    participant REPL as REPL/CLI
    participant Handle as handlePromptSubmit
    participant PSI as processUserInput
    participant PSC as processSlashCommand
    participant Fork as BackgroundFork
    participant Runs as autonomyRuns
    participant QLC as autonomyQueueLifecycle

    REPL->>Handle: handle queuedCommands
    Handle->>QLC: claimConsumableQueuedAutonomyCommands
    Handle->>PSI: processUserInput(autonomy)
    PSI->>PSC: processSlashCommand(autonomy)
    alt background-forked (defer)
        PSC->>Fork: spawn background fork (KAIROS or test-only flag)
        Fork->>Runs: finalizeAutonomyRunCompleted/Failed (passes rootDir)
        Fork-->>PSC: returns deferAutonomyCompletion: true
    else inline
        PSC-->>PSI: inline result (no defer)
    end
    PSI-->>Handle: result + deferAutonomyCompletion flag
    Handle->>QLC: finalizeAutonomyCommandsForTurn(with deferred filtering)
    Handle-->>REPL: finish turn
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~150 minutes

Possibly related PRs

Suggested reviewers

  • KonghaYao

"I hopped through queues and patched each seam,
Stale runs now wake from sleepy dream.
Flags split tidy—compile and run,
Bounded caches stop growth, one by one.
Tests hum loud; the lifecycle sings—hip, hop, hooray!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.30% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly and concisely summarizes the main changes: hardening autonomy lifecycle, adding OOM bounds, and improving provider-boundary finalization. It is specific, relates directly to the changeset, and conveys the primary improvements without being vague or misleading.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cli/print.ts (1)

2827-2840: ⚠️ Potential issue | 🟠 Major

Use active-source dedup in the onFire cron path too.

This branch still goes through prepareAutonomyTurnPrompt() + commitAutonomyQueuedPrompt(), so repeated cron fires with the same prompt can stack while an earlier run is still active. That bypasses the new source-dedup invariant the rest of this PR is adding. Please switch this path to createAutonomyQueuedPromptIfNoActiveSource(...) and use the prompt as the source label.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cli/print.ts` around lines 2827 - 2840, The onFire cron callback
currently calls prepareAutonomyTurnPrompt(...) and
commitAutonomyQueuedPrompt(...), which can enqueue duplicate scheduled prompts
while a previous run is active; change this flow to call
createAutonomyQueuedPromptIfNoActiveSource(...) instead to enforce active-source
deduping, passing the prepared prompt (or the original prompt string) as the
source label and including currentDir: cwd() and workload: WORKLOAD_CRON; remove
or replace the commitAutonomyQueuedPrompt(...) invocation and ensure you still
call prepareAutonomyTurnPrompt(...) to build the prompt before calling
createAutonomyQueuedPromptIfNoActiveSource(...).
🧹 Nitpick comments (8)
src/services/compact/postCompactCleanup.ts (1)

31-74: Sync signature now hides async cleanup work

runPostCompactCleanup is now void, but Line 72-74 still launches async cleanup (import(...).then(...)). That makes completion ordering ambiguous for callers expecting “cleanup finished” on return. Consider making this path truly synchronous (or restoring an async contract).

♻️ Proposed change (make cleanup truly synchronous)
 import { resetMicrocompactState } from './microCompact.js'
+import { sweepFileContentCache } from '../../utils/attributionHooks.js'
@@
 export function runPostCompactCleanup(querySource?: QuerySource): void {
@@
   if (feature('COMMIT_ATTRIBUTION')) {
-    void import('../../utils/attributionHooks.js').then(m =>
-      m.sweepFileContentCache(),
-    )
+    sweepFileContentCache()
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/compact/postCompactCleanup.ts` around lines 31 - 74,
runPostCompactCleanup currently returns void but launches async work via
import('../../utils/attributionHooks.js').then(...), which hides completion
ordering; change the function signature to async (export async function
runPostCompactCleanup(...): Promise<void>) and replace the .then(...) call with
an awaited dynamic import and awaited sweep (const m = await import(...); await
m.sweepFileContentCache()), ensuring the function only returns after
sweepFileContentCache completes so callers can await completion.
src/utils/autonomyFlows.ts (1)

170-187: Consider documenting the two-phase sort intent.

The double-sort (first for priority selection, then back to updatedAt order) is correct but non-obvious. A brief comment would clarify:

 function selectPersistedAutonomyFlows(
   flows: AutonomyFlowRecord[],
 ): AutonomyFlowRecord[] {
+  // Phase 1: Sort active flows first (so they survive the cap), then by recency
   const retained = flows
     .slice()
     .map(cloneFlowRecord)
     .sort((left, right) => {
       const leftActive = isManagedFlowStatusActive(left.status)
       const rightActive = isManagedFlowStatusActive(right.status)
       if (leftActive !== rightActive) {
         return leftActive ? -1 : 1
       }
       return right.updatedAt - left.updatedAt
     })
     .slice(0, AUTONOMY_FLOWS_MAX)

+  // Phase 2: Re-sort retained flows by updatedAt for consistent persistence order
   return retained.sort((left, right) => right.updatedAt - left.updatedAt)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/autonomyFlows.ts` around lines 170 - 187, The double-sort in
selectPersistedAutonomyFlows is non-obvious: first it sorts by active status
(using isManagedFlowStatusActive) then by updatedAt to pick the top
AUTONOMY_FLOWS_MAX into retained, and then it re-sorts retained by updatedAt for
final ordering; add a concise comment above selectPersistedAutonomyFlows (or
immediately above the retained creation) explaining this two-phase intent —
i.e., prioritize active flows when selecting the subset, then present the chosen
subset sorted by most-recent updatedAt — and reference the retained variable,
AUTONOMY_FLOWS_MAX, and isManagedFlowStatusActive in the comment.
src/services/skillLearning/featureCheck.ts (1)

11-14: Consider simplifying the return logic.

The current pattern works but could be more concise:

✨ Optional simplification
 export function isSkillLearningCompiledIn(): boolean {
-  if (feature('SKILL_LEARNING')) return true
-  return false
+  return feature('SKILL_LEARNING') ? true : false
 }

Note: The ternary feature('FLAG') ? true : false is compliant with the coding guidelines for feature flag usage in direct conditional positions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/skillLearning/featureCheck.ts` around lines 11 - 14, The
function isSkillLearningCompiledIn currently returns true/false via an if
statement; simplify it by returning the feature check result directly (replace
the if/return block in isSkillLearningCompiledIn with a single return of the
call to feature('SKILL_LEARNING') so it returns the boolean from feature
directly).
src/services/skillSearch/featureCheck.ts (1)

11-14: Same simplification opportunity as skillLearning.

✨ Optional simplification
 export function isSkillSearchCompiledIn(): boolean {
-  if (feature('EXPERIMENTAL_SKILL_SEARCH')) return true
-  return false
+  return feature('EXPERIMENTAL_SKILL_SEARCH') ? true : false
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/skillSearch/featureCheck.ts` around lines 11 - 14, The
isSkillSearchCompiledIn function currently uses an if-return-true pattern;
simplify it by returning the boolean result of
feature('EXPERIMENTAL_SKILL_SEARCH') directly. Update the function
isSkillSearchCompiledIn to return feature('EXPERIMENTAL_SKILL_SEARCH') without
the conditional wrapper so the code is more concise and idiomatic.
tests/integration/autonomy-lifecycle-user-flow.test.ts (1)

22-48: Consider adding error output to assertion messages for debugging.

When tests fail, knowing what stderr contained can be helpful for debugging CI failures.

🔍 Optional: Include stderr in failure message
-  expect(stderr).toBe('')
-  expect(exitCode).toBe(0)
+  expect(stderr).toBe('') // Fail fast with actual stderr visible
+  expect(exitCode).withContext(`stderr: ${stderr}`).toBe(0)

Note: This depends on whether Bun's expect supports withContext. If not, consider logging stderr on failure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/autonomy-lifecycle-user-flow.test.ts` around lines 22 - 48,
The test helper runAutonomyCli currently asserts stderr and exitCode without
surfacing the stderr content on failure; update the assertions in runAutonomyCli
to include stderr (and optionally exitCode) in the failure message — e.g., use
expect(stderr).toBe('', `autonomy CLI stderr: ${stderr}`) or
expect(...).withContext if supported, and similarly include stdout/exitCode in
messages or console.error(stderr) on failure so CI logs show the error output
for debugging; locate these changes in the runAutonomyCli function and adjust
the two expect calls that reference stderr and exitCode.
docs/agent/sur-skill-overflow-bugs.md (1)

81-89: Add language identifier to fenced code block.

The code block should specify a language for consistent markdown rendering and linting compliance.

📝 Proposed fix
-```
+```bash
 bun run typecheck   # 0 errors
 bun test src/services/skillSearch/__tests__/intentNormalize.test.ts
 bun test src/services/skillSearch/__tests__/prefetch.extractQuery.test.ts
 bun test src/services/skillLearning/__tests__/projectContext.test.ts
 bun test src/services/skillLearning/__tests__/promotion.test.ts
 bun run lint
 bun run build
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/agent/sur-skill-overflow-bugs.md` around lines 81 - 89, Update the
fenced code block that contains the bun commands to include a language
identifier for consistent rendering and linting; replace the opening ``` with
```bash (or ```sh) so the block that starts with "bun run typecheck   # 0
errors" becomes a bash-marked fenced block.
src/utils/__tests__/autonomyRuns.test.ts (1)

2-4: Avoid new Node built-in imports in Bun tests.

These fs/path imports bypass the repo's Bun-only runtime contract. Please switch this file to Bun file helpers or the existing test filesystem helpers so the test exercises the same surface area as production.

Based on learnings, "All imports, builds, and execution must use Bun APIs, not Node.js APIs. Runtime is Bun, not Node.js."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/__tests__/autonomyRuns.test.ts` around lines 2 - 4, The test
imports Node built-ins (existsSync, readFileSync, mkdir, writeFile, join,
resolvePath); replace those with Bun-compatible helpers so the test runs in the
Bun-only runtime: remove the Node fs/path imports and use the repo's Bun test
filesystem helpers or Bun APIs (e.g., Bun.file/Bun.read/Bun.write or the
existing test helper functions) to create/read/write files and dirs and to
resolve/join paths; update references to
existsSync/readFileSync/mkdir/writeFile/join/resolvePath in this test to call
the chosen Bun helper functions or the repo test helpers so no Node built-ins
are imported or used.
src/utils/processUserInput/__tests__/processSlashCommand.test.ts (1)

71-73: Use the shared bun:bundle mock instead of redefining it here.

This inline stub can drift from the repo-wide feature-flag mock contract. Please move it to tests/mocks/ and import the shared mock so future bun:bundle changes stay synchronized across suites.

Based on learnings, "Mock only modules with side effects ... Use shared mocks from tests/mocks/ for log.ts, debug.ts, bun:bundle, settings.js, config.ts, and auth.ts."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/processUserInput/__tests__/processSlashCommand.test.ts` around
lines 71 - 73, Replace the inline stubbed module mock for bun:bundle (the call
to mock.module('bun:bundle', () => ({ feature: (name: string) => name ===
'KAIROS', }))) with the shared mock from tests/mocks: remove this local
mock.module declaration and import/require the repository-wide bun:bundle mock
so the test uses the centralized feature-flag behavior; ensure the test file
references the shared mock (same contract exposing feature(name: string))
instead of redefining the KAIROS predicate locally.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/internals/autonomy-jira.md`:
- Around line 262-266: Fix the minor typo in the sentence containing "dropped
due input close" by changing it to a grammatically correct phrase such as
"dropped due to input closing" (or "dropped due to input closure"); locate the
sentence that currently reads "Headless proactive and cron paths cancel any
already-created command that is dropped due input close." and replace that
fragment with "dropped due to input closing" to correct the grammar.

In
`@packages/builtin-tools/src/tools/RemoteTriggerTool/__tests__/RemoteTriggerTool.test.ts`:
- Around line 15-19: Replace the inline mock defined via
mock.module('src/utils/auth.js', ...) with the shared auth mock from
tests/mocks/, and if the test requires different exports
(checkAndRefreshOAuthTokenIfNeeded, getClaudeAIOAuthTokens,
isClaudeAISubscriber) update the shared mock file to include those
implementations; in practice remove the inline mock.module(...) block in
RemoteTriggerTool.test.ts and import/require the common mock from
tests/mocks/auth (extending that file to return the needed async stub and
token/subscriber helpers) so all tests reuse the single shared mock.
- Around line 33-43: Remove the full mock of 'src/constants/oauth.js' in
RemoteTriggerTool.test.ts and use the real constants module instead; locate the
mock.module(...) block that overrides ALL_OAUTH_SCOPES,
CLAUDE_AI_INFERENCE_SCOPE, CLAUDE_AI_OAUTH_SCOPES, CLAUDE_AI_PROFILE_SCOPE,
CONSOLE_OAUTH_SCOPES, MCP_CLIENT_METADATA_URL, OAUTH_BETA_HEADER,
fileSuffixForOauthConfig, and getOauthConfig and delete it so tests import the
actual constants, or if the real module has a side effect, replace the full mock
with a narrow mock only for the specific side-effectful exports (e.g., mock
getOauthConfig or fileSuffixForOauthConfig) while leaving the pure data exports
intact.

In `@src/cli/print.ts`:
- Around line 2841-2845: The code currently calls markAutonomyRunCancelled(...)
when inputClosed is true, which only updates run state and leaves the queued
command record; replace those calls so the queued command is cancelled instead
by invoking cancelQueuedAutonomyCommands({ commands: [command] }) wherever the
inputClosed branches currently call markAutonomyRunCancelled (e.g., in the
handlers using the local variable command); ensure you import/ensure
availability of cancelQueuedAutonomyCommands and remove or stop calling
markAutonomyRunCancelled in those branches so the queue-side record is
retracted.

In `@src/screens/REPL.tsx`:
- Around line 4859-4876: The code currently always finalizes
claim.claimedCommands as completed after calling onQuery, which is wrong if
onQuery returned via the concurrent-guard path; change the call to capture an
execution indicator (e.g., const executed = await onQuery([userMessage],
newAbortController, true, [], mainLoopModel) or have onQuery return a
boolean/enum) and only call finalizeAutonomyCommandsForTurn(..., outcome: {
type: 'completed' }) when executed is true; also move the completed-finalize
into its own try/catch so any error thrown by finalizeAutonomyCommandsForTurn
does not propagate into the outer catch and trigger a second finalize with
outcome: { type: 'failed' } — in the outer catch use the execution indicator to
decide whether to call finalizeAutonomyCommandsForTurn(..., outcome: { type:
'failed', error }) for claim.claimedCommands, and keep enqueue(nextCommand)
behavior unchanged when completed.
- Around line 4853-4855: The code is unsafely forcing command.value to string
when creating the userMessage; update the createUserMessage call (where
userMessage is constructed) to stop casting: use content: command.value instead,
or if you truly need a string-only path add a type guard (e.g., if (typeof
command.value !== 'string') return; ) before calling createUserMessage so
content is passed as a string safely; reference createUserMessage, command, and
userMessage when making the change.

In `@src/utils/autonomyAuthority.ts`:
- Around line 148-155: The fence toggling logic (matching only by character) can
prematurely close an outer fence when encountering a shorter inner fence like
```yaml, leaking in-fence config; fix by detecting multi-char fences and
tracking their length: change the regex to capture a run of the same fence char
(e.g., /^([`~])\1{2,}(.*)$/), compute fenceLen and trailing content from
trimmed, when opening set activeFenceChar and activeFenceLen, and when closing
only clear the fence if fenceChar matches AND fenceLen >= activeFenceLen AND
trailing.trim() === '' (then reset activeFenceLen to 0); keep the existing
masked[i] = '' and continue behavior.

In `@src/utils/autonomyRuns.ts`:
- Around line 254-264: The PID validity check in isValidOwnerProcessId wrongly
rejects the maximum valid Linux PID by using pid < 4_194_304; change the upper
bound to allow the maximum (use pid <= 4_194_304 or pid <= 4194304) so that a
live run owned by pid==4194304 is considered valid; update the numeric literal
(or replace with a named constant like MAX_PID) in isValidOwnerProcessId to
reflect the inclusive maximum.
- Around line 130-147: selectPersistedAutonomyRuns currently enforces
AUTONOMY_RUNS_MAX before preserving active runs, which can evict queued/running
records; change the logic in selectPersistedAutonomyRuns to first separate
cloned records into active (isAutonomyRunActive) and inactive lists, sort both
by createdAt descending, always retain all active runs, then append the most
recent inactive runs up to AUTONOMY_RUNS_MAX - active.length (if that remaining
budget is <= 0, keep only active), and finally return the combined list sorted
by createdAt descending; refer to the function selectPersistedAutonomyRuns,
helper cloneRunRecord, isAutonomyRunActive, and the AUTONOMY_RUNS_MAX constant
when making the change.

In `@src/utils/processUserInput/processSlashCommand.tsx`:
- Around line 203-216: In finalizeDeferredAutonomyRunCompleted, the follow-up
autonomy commands returned by finalizeAutonomyRunCompleted are being enqueued
before the worker's scheduled-task-result is queued, which can advance the flow
out of order; change the sequence so you first create/queue the
scheduled-task-result notification (the background result for
autonomy.runId/rootDir/spawnTimeWorkload) and only after that loop over
nextCommands and call enqueue(nextCommand). Apply the same fix to the other
occurrence mentioned (lines around the second finalizeAutonomyRunCompleted call)
so the result is always enqueued before any follow-up commands.

In `@tests/integration/dependency-overrides.test.ts`:
- Around line 94-96: Replace uses of
streamdownRequire.resolve('mermaid/package.json') with a filesystem-based
lookup: call streamdownRequire.resolve('mermaid') to get the package entrypoint,
compute the package root with path.dirname on that result, then read the
package.json via fs.readFileSync or fs.promises.readFile from
path.join(packageRoot, 'package.json'). Update the mermaidPackagePath usage (and
the similar occurrences around the other
streamdownRequire.resolve('mermaid/package.json') calls) to use this packageRoot
+ 'package.json' lookup so you avoid ERR_PACKAGE_PATH_NOT_EXPORTED in
environments that honor package exports.
- Line 2: The test currently imports readFileSync from node:fs and uses it to
read mermaid/package.json; replace that Node API usage with Bun's async file API
by removing the readFileSync import and changing the call at the
mermaidPackagePath usage to await Bun.file(mermaidPackagePath).text(); keep the
existing async test signature and leave other fs imports (mkdtempSync,
writeFileSync, rmSync) intact so only the read operation switches to
Bun.file(...).text() for compliance with Bun APIs.

---

Outside diff comments:
In `@src/cli/print.ts`:
- Around line 2827-2840: The onFire cron callback currently calls
prepareAutonomyTurnPrompt(...) and commitAutonomyQueuedPrompt(...), which can
enqueue duplicate scheduled prompts while a previous run is active; change this
flow to call createAutonomyQueuedPromptIfNoActiveSource(...) instead to enforce
active-source deduping, passing the prepared prompt (or the original prompt
string) as the source label and including currentDir: cwd() and workload:
WORKLOAD_CRON; remove or replace the commitAutonomyQueuedPrompt(...) invocation
and ensure you still call prepareAutonomyTurnPrompt(...) to build the prompt
before calling createAutonomyQueuedPromptIfNoActiveSource(...).

---

Nitpick comments:
In `@docs/agent/sur-skill-overflow-bugs.md`:
- Around line 81-89: Update the fenced code block that contains the bun commands
to include a language identifier for consistent rendering and linting; replace
the opening ``` with ```bash (or ```sh) so the block that starts with "bun run
typecheck   # 0 errors" becomes a bash-marked fenced block.

In `@src/services/compact/postCompactCleanup.ts`:
- Around line 31-74: runPostCompactCleanup currently returns void but launches
async work via import('../../utils/attributionHooks.js').then(...), which hides
completion ordering; change the function signature to async (export async
function runPostCompactCleanup(...): Promise<void>) and replace the .then(...)
call with an awaited dynamic import and awaited sweep (const m = await
import(...); await m.sweepFileContentCache()), ensuring the function only
returns after sweepFileContentCache completes so callers can await completion.

In `@src/services/skillLearning/featureCheck.ts`:
- Around line 11-14: The function isSkillLearningCompiledIn currently returns
true/false via an if statement; simplify it by returning the feature check
result directly (replace the if/return block in isSkillLearningCompiledIn with a
single return of the call to feature('SKILL_LEARNING') so it returns the boolean
from feature directly).

In `@src/services/skillSearch/featureCheck.ts`:
- Around line 11-14: The isSkillSearchCompiledIn function currently uses an
if-return-true pattern; simplify it by returning the boolean result of
feature('EXPERIMENTAL_SKILL_SEARCH') directly. Update the function
isSkillSearchCompiledIn to return feature('EXPERIMENTAL_SKILL_SEARCH') without
the conditional wrapper so the code is more concise and idiomatic.

In `@src/utils/__tests__/autonomyRuns.test.ts`:
- Around line 2-4: The test imports Node built-ins (existsSync, readFileSync,
mkdir, writeFile, join, resolvePath); replace those with Bun-compatible helpers
so the test runs in the Bun-only runtime: remove the Node fs/path imports and
use the repo's Bun test filesystem helpers or Bun APIs (e.g.,
Bun.file/Bun.read/Bun.write or the existing test helper functions) to
create/read/write files and dirs and to resolve/join paths; update references to
existsSync/readFileSync/mkdir/writeFile/join/resolvePath in this test to call
the chosen Bun helper functions or the repo test helpers so no Node built-ins
are imported or used.

In `@src/utils/autonomyFlows.ts`:
- Around line 170-187: The double-sort in selectPersistedAutonomyFlows is
non-obvious: first it sorts by active status (using isManagedFlowStatusActive)
then by updatedAt to pick the top AUTONOMY_FLOWS_MAX into retained, and then it
re-sorts retained by updatedAt for final ordering; add a concise comment above
selectPersistedAutonomyFlows (or immediately above the retained creation)
explaining this two-phase intent — i.e., prioritize active flows when selecting
the subset, then present the chosen subset sorted by most-recent updatedAt — and
reference the retained variable, AUTONOMY_FLOWS_MAX, and
isManagedFlowStatusActive in the comment.

In `@src/utils/processUserInput/__tests__/processSlashCommand.test.ts`:
- Around line 71-73: Replace the inline stubbed module mock for bun:bundle (the
call to mock.module('bun:bundle', () => ({ feature: (name: string) => name ===
'KAIROS', }))) with the shared mock from tests/mocks: remove this local
mock.module declaration and import/require the repository-wide bun:bundle mock
so the test uses the centralized feature-flag behavior; ensure the test file
references the shared mock (same contract exposing feature(name: string))
instead of redefining the KAIROS predicate locally.

In `@tests/integration/autonomy-lifecycle-user-flow.test.ts`:
- Around line 22-48: The test helper runAutonomyCli currently asserts stderr and
exitCode without surfacing the stderr content on failure; update the assertions
in runAutonomyCli to include stderr (and optionally exitCode) in the failure
message — e.g., use expect(stderr).toBe('', `autonomy CLI stderr: ${stderr}`) or
expect(...).withContext if supported, and similarly include stdout/exitCode in
messages or console.error(stderr) on failure so CI logs show the error output
for debugging; locate these changes in the runAutonomyCli function and adjust
the two expect calls that reference stderr and exitCode.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e3298aca-864c-4aca-9355-0d911549369e

📥 Commits

Reviewing files that changed from the base of the PR and between 4f1649e and f2e9af4.

📒 Files selected for processing (51)
  • docs/agent/sur-loop-scheduled-oom.md
  • docs/agent/sur-skill-overflow-bugs.md
  • docs/internals/autonomy-jira.md
  • packages/builtin-tools/src/tools/RemoteTriggerTool/__tests__/RemoteTriggerTool.test.ts
  • packages/color-diff-napi/src/index.ts
  • scripts/defines.ts
  • src/Tool.ts
  • src/__tests__/handlePromptSubmit.test.ts
  • src/__tests__/queryAutonomyProviderBoundary.test.ts
  • src/cli/print.ts
  • src/commands/skill-learning/index.ts
  • src/commands/skill-search/index.ts
  • src/daemon/main.ts
  • src/entrypoints/cli.tsx
  • src/hooks/__tests__/useScheduledTasks.test.ts
  • src/hooks/useReplBridge.tsx
  • src/hooks/useScheduledTasks.ts
  • src/proactive/useProactive.ts
  • src/query.ts
  • src/query/transitions.ts
  • src/screens/REPL.tsx
  • src/services/compact/postCompactCleanup.ts
  • src/services/skillLearning/featureCheck.ts
  • src/services/skillLearning/projectContext.ts
  • src/services/skillLearning/promotion.ts
  • src/services/skillSearch/featureCheck.ts
  • src/services/skillSearch/intentNormalize.ts
  • src/services/skillSearch/prefetch.ts
  • src/tasks/InProcessTeammateTask/InProcessTeammateTask.tsx
  • src/tasks/InProcessTeammateTask/types.ts
  • src/types/textInputTypes.ts
  • src/utils/__tests__/autonomyAuthority.test.ts
  • src/utils/__tests__/autonomyFlows.test.ts
  • src/utils/__tests__/autonomyPersistence.test.ts
  • src/utils/__tests__/autonomyQueueLifecycle.test.ts
  • src/utils/__tests__/autonomyRuns.test.ts
  • src/utils/autonomyAuthority.ts
  • src/utils/autonomyFlows.ts
  • src/utils/autonomyPersistence.ts
  • src/utils/autonomyQueueLifecycle.ts
  • src/utils/autonomyRuns.ts
  • src/utils/handlePromptSubmit.ts
  • src/utils/model/__tests__/providers.test.ts
  • src/utils/model/providers.ts
  • src/utils/processUserInput/__tests__/processSlashCommand.test.ts
  • src/utils/processUserInput/processSlashCommand.tsx
  • src/utils/processUserInput/processUserInput.ts
  • src/utils/swarm/inProcessRunner.ts
  • src/utils/swarm/spawnInProcess.ts
  • tests/integration/autonomy-lifecycle-user-flow.test.ts
  • tests/integration/dependency-overrides.test.ts

Comment thread docs/internals/autonomy-jira.md
Comment thread src/cli/print.ts Outdated
Comment thread src/screens/REPL.tsx
Comment thread src/utils/autonomyRuns.ts
Comment thread src/utils/autonomyRuns.ts
Comment thread src/utils/processUserInput/processSlashCommand.tsx
Comment thread tests/integration/dependency-overrides.test.ts Outdated
Comment thread tests/integration/dependency-overrides.test.ts Outdated
Twelve actionable items (7 Major + 5 Minor) from the CodeRabbit review on
claude-code-best#386:

- docs/internals/autonomy-jira.md: typo "due input close" → "due to input close".
- src/utils/autonomyRuns.ts:
  - selectPersistedAutonomyRuns no longer evicts active (queued/running) runs
    when the combined list exceeds AUTONOMY_RUNS_MAX. Active runs are kept in
    full and the inactive history is capped to the remaining budget so
    persisted ownership for live work survives.
  - isValidOwnerProcessId now allows pid <= 4_194_304 so a live run owned by
    the maximum Linux PID is not treated as stale.
- src/utils/autonomyAuthority.ts: maskCodeFencedLines tracks the active fence
  length and only closes the fence when a same-character run of equal-or-
  greater length appears with no trailing content, so a nested ```yaml inside
  an outer ```` block no longer leaks fake `tasks:` entries into the parser.
- src/cli/print.ts: late-shutdown branches in the cron and scheduled-task
  paths now call cancelQueuedAutonomyCommands({ commands: [command] }) instead
  of markAutonomyRunCancelled(...). Updating run state alone left the
  queue-side record orphaned for resume/recovery.
- src/utils/processUserInput/processSlashCommand.tsx: scheduled-task-result
  notification is enqueued before finalizeAutonomyRunCompleted (which queues
  follow-up autonomy commands) so both at priority: 'later' land in order and
  the next autonomy step can not run before the worker's output is observed.
- src/screens/REPL.tsx + src/utils/handlePromptSubmit.ts:
  - onQuery now returns Promise<boolean>: false from the concurrent-guard
    skip path, true otherwise. Other call sites use `void onQuery(...)` and
    are unaffected. handlePromptSubmit's onQuery prop type matches.
  - The autonomy-prompt callsite captures the executed flag, finalizes
    claim.claimedCommands as { type: 'completed' } only when onQuery actually
    ran, and runs the completed-finalize in its own try/catch so a failure
    there does not propagate into the outer catch and trigger a second
    finalize as { type: 'failed' } for the same commands.
  - Removed the unsafe `command.value as string` cast; createUserMessage
    already accepts `string | ContentBlockParam[]`.
  - createUserMessage mock in src/__tests__/handlePromptSubmit.test.ts now
    matches the new Promise<boolean> shape.
- packages/builtin-tools/src/tools/RemoteTriggerTool/__tests__/
  RemoteTriggerTool.test.ts:
  - Inline auth mock replaced with the shared tests/mocks/auth (added).
  - The full mock of src/constants/oauth.js is replaced by a narrow
    side-effect-only mock that overrides the env-reading helpers
    (getOauthConfig, fileSuffixForOauthConfig, MCP_CLIENT_METADATA_URL) and
    delegates pure data exports to the real module.
- tests/integration/dependency-overrides.test.ts:
  - mermaid does not export `./package.json` in its exports map, so
    require.resolve('mermaid/package.json') throws
    ERR_PACKAGE_PATH_NOT_EXPORTED in runtimes that honor exports semantics.
    The test now resolves the package entry and walks up to the package
    root via a small findPackageJson helper.
  - readFileSync from node:fs is replaced with `await Bun.file(...).text()`
    to match the project's Bun-API requirement.

Validation:
- bun run typecheck (clean).
- bun test → 3996 pass / 0 fail across 305 test files.

Targets PRs:
- #8 (fork-internal review)
- claude-code-best#386 (upstream review, same head branch)
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/utils/processUserInput/processSlashCommand.tsx (1)

228-276: ⚠️ Potential issue | 🟠 Major

Keep post-success finalization errors out of the worker-failure catch path.

After Line 268 enqueues a successful <scheduled-task-result>, any error from Line 269 falls into the outer catch and emits a second failed result on Lines 272-275. That produces contradictory task output and marks the autonomy run failed even though the slash command itself completed.

Suggested restructuring
-    void (async () => {
+    void (async () => {
+      let resultText: string
+      try {
         // ... runAgent loop ...
-      const resultText = extractResultText(agentMessages, 'Command completed');
-      logForDebugging(`Background forked command /${commandName} completed (agent ${agentId})`);
-      enqueueResult(`<scheduled-task-result command="/${commandName}">\n${resultText}\n</scheduled-task-result>`);
-      await finalizeDeferredAutonomyRunCompleted();
-    })().catch(async err => {
-      logError(err);
-      enqueueResult(
-        `<scheduled-task-result command="/${commandName}" status="failed">\n${err instanceof Error ? err.message : String(err)}\n</scheduled-task-result>`,
-      );
-      await finalizeDeferredAutonomyRunFailed(err);
-    });
+        resultText = extractResultText(agentMessages, 'Command completed');
+      } catch (err) {
+        logError(err);
+        enqueueResult(
+          `<scheduled-task-result command="/${commandName}" status="failed">\n${err instanceof Error ? err.message : String(err)}\n</scheduled-task-result>`,
+        );
+        await finalizeDeferredAutonomyRunFailed(err);
+        return;
+      }
+
+      logForDebugging(`Background forked command /${commandName} completed (agent ${agentId})`);
+      enqueueResult(`<scheduled-task-result command="/${commandName}">\n${resultText}\n</scheduled-task-result>`);
+      await finalizeDeferredAutonomyRunCompleted();
+    })().catch(logError);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/processUserInput/processSlashCommand.tsx` around lines 228 - 276,
The post-success finalization (await finalizeDeferredAutonomyRunCompleted()) can
throw and currently bubbles to the outer catch, causing a conflicting failed
result; wrap the finalizeDeferredAutonomyRunCompleted() call in its own
try/catch so errors are logged but not rethrown (do not call enqueueResult or
finalizeDeferredAutonomyRunFailed on that path). Modify the async IIFE success
path (after enqueueResult(...) and before/around
finalizeDeferredAutonomyRunCompleted()) to use: try { await
finalizeDeferredAutonomyRunCompleted(); } catch (e) { logError(e); } ensuring
the outer .catch only reflects actual run failures from the agent/runAgent
logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/builtin-tools/src/tools/RemoteTriggerTool/__tests__/RemoteTriggerTool.test.ts`:
- Line 2: Update the shared test mock import to include the .ts extension: in
the test that imports authMock (symbol authMock in RemoteTriggerTool.test.ts),
change the import path from '../../../../../../tests/mocks/auth' to
'../../../../../../tests/mocks/auth.ts' so the shared test mock uses the .ts
extension per the test import rule.

In `@src/utils/handlePromptSubmit.ts`:
- Around line 461-468: The early return after processing queued autonomy
commands leaves the freshly published abortController installed, which can leak
stale cancellation state; before returning when commands.length === 0 in
handlePromptSubmit, call the code-path that clears or resets the published
abortController (the same place that published the fresh controller earlier) so
the global/posted abortController is set to null/cleared; update the block that
handles queuedAutonomyClaim (returned from
claimConsumableQueuedAutonomyCommands) to clear abortController (or call
resetAbortController) immediately prior to the return to ensure no stale
controller persists for the next turn.
- Around line 621-649: The catch block is turning a post-success finalizer error
into a run-level failure and double-finalizing claimedAutonomyCommands; change
the flow so finalizer errors do not overwrite a successful turn outcome: keep
the successful-path call that awaits finalizeAutonomyCommandsForTurn(...) and
enqueue nextCommands as-is, and in the catch only mark commands failed when the
exception came from the turn body (not from a separate finalizer call) —
implement this by moving the finalization call(s) into their own try/catch
(referencing claimedAutonomyCommands, finalizeAutonomyCommandsForTurn,
deferredAutonomyRunIds, enqueue, and the outer catch) and on finalizer errors
log/handle the error without re-invoking finalizeAutonomyCommandsForTurn with
outcome {type: 'failed'} for commands that were already finalized by the success
path.

In `@tests/mocks/auth.ts`:
- Around line 13-14: Update the mock implementations so they match the
production contract in src/utils/auth.ts: change
checkAndRefreshOAuthTokenIfNeeded to return a Promise<boolean> (e.g., resolve
true/false as appropriate for tests) and expand getClaudeAIOAuthTokens to return
the full token shape (include accessToken, refreshToken, scopes,
subscriptionType, expiresAt or expiresIn, and any other fields the real function
returns) so tests that branch on these values behave like production.

---

Outside diff comments:
In `@src/utils/processUserInput/processSlashCommand.tsx`:
- Around line 228-276: The post-success finalization (await
finalizeDeferredAutonomyRunCompleted()) can throw and currently bubbles to the
outer catch, causing a conflicting failed result; wrap the
finalizeDeferredAutonomyRunCompleted() call in its own try/catch so errors are
logged but not rethrown (do not call enqueueResult or
finalizeDeferredAutonomyRunFailed on that path). Modify the async IIFE success
path (after enqueueResult(...) and before/around
finalizeDeferredAutonomyRunCompleted()) to use: try { await
finalizeDeferredAutonomyRunCompleted(); } catch (e) { logError(e); } ensuring
the outer .catch only reflects actual run failures from the agent/runAgent
logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b72ef657-5093-42b1-b3c6-d8885125aa10

📥 Commits

Reviewing files that changed from the base of the PR and between f2e9af4 and 452a7e6.

📒 Files selected for processing (11)
  • docs/internals/autonomy-jira.md
  • packages/builtin-tools/src/tools/RemoteTriggerTool/__tests__/RemoteTriggerTool.test.ts
  • src/__tests__/handlePromptSubmit.test.ts
  • src/cli/print.ts
  • src/screens/REPL.tsx
  • src/utils/autonomyAuthority.ts
  • src/utils/autonomyRuns.ts
  • src/utils/handlePromptSubmit.ts
  • src/utils/processUserInput/processSlashCommand.tsx
  • tests/integration/dependency-overrides.test.ts
  • tests/mocks/auth.ts
✅ Files skipped from review due to trivial changes (1)
  • src/utils/autonomyRuns.ts
🚧 Files skipped from review as they are similar to previous changes (5)
  • docs/internals/autonomy-jira.md
  • tests/integration/dependency-overrides.test.ts
  • src/tests/handlePromptSubmit.test.ts
  • src/screens/REPL.tsx
  • src/cli/print.ts

} from 'src/bootstrap/state.js'
import { logMock } from '../../../../../../tests/mocks/log'
import { debugMock } from '../../../../../../tests/mocks/debug'
import { authMock } from '../../../../../../tests/mocks/auth'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use .ts extension for shared test mock imports.

Please update this shared mock import to include the .ts extension for consistency with the test import rule.

Patch
-import { authMock } from '../../../../../../tests/mocks/auth'
+import { authMock } from '../../../../../../tests/mocks/auth.ts'

As per coding guidelines, "When importing shared test mocks, use .ts extension ... Never inline mock definitions; update shared mock files instead".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import { authMock } from '../../../../../../tests/mocks/auth'
import { authMock } from '../../../../../../tests/mocks/auth.ts'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/builtin-tools/src/tools/RemoteTriggerTool/__tests__/RemoteTriggerTool.test.ts`
at line 2, Update the shared test mock import to include the .ts extension: in
the test that imports authMock (symbol authMock in RemoteTriggerTool.test.ts),
change the import path from '../../../../../../tests/mocks/auth' to
'../../../../../../tests/mocks/auth.ts' so the shared test mock uses the .ts
extension per the test import rule.

Comment thread src/utils/handlePromptSubmit.ts
Comment thread src/utils/handlePromptSubmit.ts Outdated
Comment thread tests/mocks/auth.ts Outdated


Four inline + one outside-diff actionable comment from the second CodeRabbit
review on claude-code-best#386:

- tests/mocks/auth.ts: align mock return contracts with src/utils/auth.ts.
  checkAndRefreshOAuthTokenIfNeeded resolves to a Promise<boolean> and
  getClaudeAIOAuthTokens returns the full token shape (refreshToken, expiresAt,
  scopes, subscriptionType, rateLimitTier) so tests that branch on these
  values can not silently drift away from production.
- src/utils/handlePromptSubmit.ts (461-468): clear the freshly-published
  abortController before the early return when every claimed autonomy command
  was skipped as non-consumable, so this turn's stale controller does not leak
  into the next turn.
- src/utils/handlePromptSubmit.ts (621-649): separate execution failure from
  finalizer failure. The turn body now writes to a `turnError` slot; a single
  pass after the inner try decides whether to finalize claimed commands as
  `completed` or `failed`, with each finalize call wrapped in its own
  try/catch so a failure inside finalize does not flip a successful turn into
  `failed` and double-finalize the same commands. The outer catch only
  rethrows the original turn error.
- src/utils/processUserInput/processSlashCommand.tsx (228-276): wrap the
  post-success `finalizeDeferredAutonomyRunCompleted()` call in its own
  try/catch so a finalize failure no longer falls into the worker-failure
  catch path and emits a contradictory `<scheduled-task-result status="failed">`
  for a slash command that actually succeeded.

Outside scope (not changed) — the CodeRabbit suggestion to add a `.ts`
extension to the shared `tests/mocks/auth` import contradicts the project's
existing convention: every other test imports the shared mocks without the
extension (e.g. `tests/mocks/log`, `tests/mocks/debug`,
`tests/mocks/file-system`), and the project's tsconfig does not enable
`allowImportingTsExtensions`, so adding the extension fails typecheck. The
import is kept extension-less to match the rest of the suite.

Validation:
- bun run typecheck (clean).
- bun test → 3996 pass / 0 fail across 305 test files.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/utils/processUserInput/processSlashCommand.tsx (1)

358-365: Remove forced casts where type guards already narrow the union.

Lines 358/365/374 use as AssistantMessage in branches already narrowed by message.type / normalizedMsg.type. This weakens strict typing and hides mismatches.

Proposed refactor
-        const contentLength = getAssistantMessageContentLength(message as AssistantMessage);
+        const contentLength = getAssistantMessageContentLength(message);
         if (contentLength > 0) {
           context.setResponseLength(len => len + contentLength);
         }

         const normalizedMsg = normalizedNew[0];
         if (normalizedMsg && normalizedMsg.type === 'assistant') {
-          progressMessages.push(createProgressMessage(message as AssistantMessage));
+          progressMessages.push(createProgressMessage(message));
           updateProgress();
         }
       }

       if (message.type === 'user') {
         const normalizedMsg = normalizedNew[0];
         if (normalizedMsg && normalizedMsg.type === 'user') {
-          progressMessages.push(createProgressMessage(normalizedMsg as AssistantMessage));
+          progressMessages.push(createProgressMessage(normalizedMsg));
           updateProgress();
         }
       }
As per coding guidelines: `Use type guards (type narrowing) with union types instead of forced type casting`.

Also applies to: 372-375

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/processUserInput/processSlashCommand.tsx` around lines 358 - 365,
The code is using forced casts like "as AssistantMessage" even though earlier
type checks on message.type and normalizedMsg.type already narrow the union;
remove these unnecessary casts and rely on the narrowed types so the compiler
enforces correct shapes—update occurrences around
getAssistantMessageContentLength(message as AssistantMessage),
createProgressMessage(message as AssistantMessage) and
createProgressMessage(normalizedMsg as AssistantMessage) to pass
message/normalizedMsg directly after the runtime checks, and if needed adjust
the signatures of getAssistantMessageContentLength and createProgressMessage (or
add proper type predicates) so they accept the union but benefit from TypeScript
narrowing in processSlashCommand.tsx.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/utils/processUserInput/processSlashCommand.tsx`:
- Around line 277-283: The detached catch currently calls
finalizeDeferredAutonomyRunFailed(err) directly which can throw and cause an
unhandled rejection; wrap the call to finalizeDeferredAutonomyRunFailed(err)
inside its own try/catch, log any error locally (e.g., via logError) and avoid
rethrowing so the detached promise cannot reject, and keep the existing
enqueueResult(...) and logError(err) behavior intact; reference
finalizeDeferredAutonomyRunFailed, logError, enqueueResult, and commandName when
making this change.

---

Nitpick comments:
In `@src/utils/processUserInput/processSlashCommand.tsx`:
- Around line 358-365: The code is using forced casts like "as AssistantMessage"
even though earlier type checks on message.type and normalizedMsg.type already
narrow the union; remove these unnecessary casts and rely on the narrowed types
so the compiler enforces correct shapes—update occurrences around
getAssistantMessageContentLength(message as AssistantMessage),
createProgressMessage(message as AssistantMessage) and
createProgressMessage(normalizedMsg as AssistantMessage) to pass
message/normalizedMsg directly after the runtime checks, and if needed adjust
the signatures of getAssistantMessageContentLength and createProgressMessage (or
add proper type predicates) so they accept the union but benefit from TypeScript
narrowing in processSlashCommand.tsx.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1a481644-aa1d-4105-8d17-9e5a7ad2dc06

📥 Commits

Reviewing files that changed from the base of the PR and between 452a7e6 and 189766c.

📒 Files selected for processing (3)
  • src/utils/handlePromptSubmit.ts
  • src/utils/processUserInput/processSlashCommand.tsx
  • tests/mocks/auth.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/mocks/auth.ts

Comment on lines +277 to +283
})().catch(async err => {
logError(err);
enqueueResult(
`<scheduled-task-result command="/${commandName}" status="failed">\n${err instanceof Error ? err.message : String(err)}\n</scheduled-task-result>`,
)
})
);
await finalizeDeferredAutonomyRunFailed(err);
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Guard failure-finalization errors inside the detached catch.

If finalizeDeferredAutonomyRunFailed(err) throws on Line 282, the detached promise can reject unhandled. Wrap it in its own try/catch and log locally.

Proposed fix
     })().catch(async err => {
       logError(err);
       enqueueResult(
         `<scheduled-task-result command="/${commandName}" status="failed">\n${err instanceof Error ? err.message : String(err)}\n</scheduled-task-result>`,
       );
-      await finalizeDeferredAutonomyRunFailed(err);
+      try {
+        await finalizeDeferredAutonomyRunFailed(err);
+      } catch (finalizeError) {
+        logError(finalizeError);
+      }
     });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
})().catch(async err => {
logError(err);
enqueueResult(
`<scheduled-task-result command="/${commandName}" status="failed">\n${err instanceof Error ? err.message : String(err)}\n</scheduled-task-result>`,
)
})
);
await finalizeDeferredAutonomyRunFailed(err);
});
})().catch(async err => {
logError(err);
enqueueResult(
`<scheduled-task-result command="/${commandName}" status="failed">\n${err instanceof Error ? err.message : String(err)}\n</scheduled-task-result>`,
);
try {
await finalizeDeferredAutonomyRunFailed(err);
} catch (finalizeError) {
logError(finalizeError);
}
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/processUserInput/processSlashCommand.tsx` around lines 277 - 283,
The detached catch currently calls finalizeDeferredAutonomyRunFailed(err)
directly which can throw and cause an unhandled rejection; wrap the call to
finalizeDeferredAutonomyRunFailed(err) inside its own try/catch, log any error
locally (e.g., via logError) and avoid rethrowing so the detached promise cannot
reject, and keep the existing enqueueResult(...) and logError(err) behavior
intact; reference finalizeDeferredAutonomyRunFailed, logError, enqueueResult,
and commandName when making this change.

claude added 2 commits April 29, 2026 12:38
应用 PR claude-code-best#386 review 的剩余 nit。pid_max 边界、REPL cast、autonomy-jira typo
三处与远端 fixup (452a7e6) 内容相同,rebase 时已去重,本次提交仅包含 code
fence 语言标签这一项。
- src/cli/print.ts: cron onFire 改用 createAutonomyQueuedPromptIfNoActiveSource
  并以 prompt 文本作为 sourceId,避免同一定时提示在前一次 run 仍活跃时被重复
  入队叠加;顺手移除 4 个已没人引用的 dead import
  (commitAutonomyQueuedPrompt / prepareAutonomyTurnPrompt /
   markAutonomyRunCancelled / createAutonomyQueuedPrompt)
- src/services/compact/postCompactCleanup.ts: 在 void import().then() 处加
  注释,明确 sweepFileContentCache 是有意的 fire-and-forget,函数对外保持
  同步签名是设计而非疏忽
- src/utils/autonomyFlows.ts: 给 selectPersistedAutonomyFlows 的两阶段排序
  加文档注释(先按 active+updatedAt 选 top-N,再统一按 updatedAt 重排)
- tests/integration/autonomy-lifecycle-user-flow.test.ts: stderr 断言失败时
  把实际 stderr 内容写进 message,方便 CI 失败时定位
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/cli/print.ts (1)

2865-2883: ⚠️ Potential issue | 🟠 Major

Route this headless scheduled-task failure through the queue lifecycle helper.

This branch creates a queued autonomy command and then calls markAutonomyRunFailed(...) directly. Everywhere else in this file now goes through the queue lifecycle helpers, so this path can drift and leave queue-side state behind if markAutonomyRunFailed() does not retract the created command as well.

Verify whether markAutonomyRunFailed() also cleans up the queue-side record created by createAutonomyQueuedPromptIfNoActiveSource(). If it does not, this branch should use the same queue lifecycle helper path as the other autonomy branches.

#!/bin/bash
set -euo pipefail

rg -n -C4 \
  'export async function (createAutonomyQueuedPromptIfNoActiveSource|cancelQueuedAutonomyCommands|markAutonomyRunFailed)|\b(createAutonomyQueuedPromptIfNoActiveSource|cancelQueuedAutonomyCommands|markAutonomyRunFailed)\(' \
  src
🧹 Nitpick comments (1)
tests/integration/autonomy-lifecycle-user-flow.test.ts (1)

5-14: Prefer src/* alias imports over deep relative paths.

Please switch these imports to the configured alias for consistency and easier refactors.

Proposed refactor
 import {
   resetStateForTests,
   setOriginalCwd,
   setProjectRoot,
-} from '../../src/bootstrap/state'
+} from 'src/bootstrap/state'
 import {
   listAutonomyRuns,
   startManagedAutonomyFlowFromHeartbeatTask,
-} from '../../src/utils/autonomyRuns'
-import { listAutonomyFlows } from '../../src/utils/autonomyFlows'
+} from 'src/utils/autonomyRuns'
+import { listAutonomyFlows } from 'src/utils/autonomyFlows'

As per coding guidelines "Path alias src/* is available via tsconfig and maps to ./src/*. Use this for imports throughout the codebase".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/autonomy-lifecycle-user-flow.test.ts` around lines 5 - 14,
Replace the deep relative imports with the project's tsconfig path alias by
updating the import statements that reference resetStateForTests,
setOriginalCwd, setProjectRoot, listAutonomyRuns,
startManagedAutonomyFlowFromHeartbeatTask, and listAutonomyFlows to use the
src/* alias (e.g., import { resetStateForTests, setOriginalCwd, setProjectRoot }
from 'src/bootstrap/state' and import the autonomy utilities from
'src/utils/autonomyRuns' and 'src/utils/autonomyFlows') so the module resolution
uses the configured path alias instead of ../../src/ deep relative paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cli/print.ts`:
- Around line 2299-2331: The code currently treats any non-explicit-error path
from ask() as a completed turn; modify the ask() handling around
finalizeAutonomyCommandsForTurn and the loop that enqueues nextCommands to track
whether a terminal result was observed (e.g., a boolean seenTerminalResult
initialized before iterating ask()), set it true only when you actually receive
a terminal result, and on iterator/abort/early-close paths call
finalizeAutonomyCommandsForTurn with outcome type 'cancelled' or 'failed' (not
'completed') using claimedAutonomyCommands and
currentDir/cmd.workload/options.workload; ensure the existing enqueue(...) +
randomUUID() logic runs only when seenTerminalResult is true and map caught
exceptions to the failed outcome as already done.

---

Nitpick comments:
In `@tests/integration/autonomy-lifecycle-user-flow.test.ts`:
- Around line 5-14: Replace the deep relative imports with the project's
tsconfig path alias by updating the import statements that reference
resetStateForTests, setOriginalCwd, setProjectRoot, listAutonomyRuns,
startManagedAutonomyFlowFromHeartbeatTask, and listAutonomyFlows to use the
src/* alias (e.g., import { resetStateForTests, setOriginalCwd, setProjectRoot }
from 'src/bootstrap/state' and import the autonomy utilities from
'src/utils/autonomyRuns' and 'src/utils/autonomyFlows') so the module resolution
uses the configured path alias instead of ../../src/ deep relative paths.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 54cdb153-cace-4984-805e-ad13af0beb2e

📥 Commits

Reviewing files that changed from the base of the PR and between f8388e4 and 6b7cfda.

📒 Files selected for processing (4)
  • src/cli/print.ts
  • src/services/compact/postCompactCleanup.ts
  • src/utils/autonomyFlows.ts
  • tests/integration/autonomy-lifecycle-user-flow.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/services/compact/postCompactCleanup.ts
  • src/utils/autonomyFlows.ts

Comment thread src/cli/print.ts
Comment on lines +2299 to +2331
await finalizeAutonomyCommandsForTurn({
commands: claimedAutonomyCommands,
outcome: {
type: 'failed',
message: 'ask() returned an error result',
},
currentDir: cwd(),
priority: 'later',
workload: cmd.workload ?? options.workload,
})
} else {
for (const runId of autonomyRunIds) {
const nextCommands = await finalizeAutonomyRunCompleted({
runId,
currentDir: cwd(),
priority: 'later',
workload: cmd.workload ?? options.workload,
const nextCommands = await finalizeAutonomyCommandsForTurn({
commands: claimedAutonomyCommands,
outcome: { type: 'completed' },
currentDir: cwd(),
priority: 'later',
workload: cmd.workload ?? options.workload,
})
for (const nextCommand of nextCommands) {
enqueue({
...nextCommand,
uuid: randomUUID(),
})
for (const nextCommand of nextCommands) {
enqueue({
...nextCommand,
uuid: randomUUID(),
})
}
}
}
} catch (error) {
for (const runId of autonomyRunIds) {
await finalizeAutonomyRunFailed({
runId,
error: String(error),
})
}
await finalizeAutonomyCommandsForTurn({
commands: claimedAutonomyCommands,
outcome: { type: 'failed', error },
currentDir: cwd(),
priority: 'later',
workload: cmd.workload ?? options.workload,
})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't finalize autonomy turns as completed when ask() exits without a terminal result.

This branch only distinguishes “explicit error result” vs “everything else.” If the async iterator closes early on abort/generator-close without yielding a final result, the code falls into the completed path and advances the autonomy run/flow anyway. Track whether a terminal result was actually seen, and map aborted/closed turns to cancelled/failed before calling finalizeAutonomyCommandsForTurn.

Suggested direction
           let lastResultIsError = false
+          let sawTerminalResult = false
           try {
             await runWithWorkload(
               cmd.workload ?? options.workload,
               async () => {
                 for await (const message of ask({
@@
                   if (message.type === 'result') {
+                    sawTerminalResult = true
                     lastResultIsError = !!(message as Record<string, unknown>)
                       .is_error
@@
-            if (lastResultIsError) {
-              await finalizeAutonomyCommandsForTurn({
-                commands: claimedAutonomyCommands,
-                outcome: {
-                  type: 'failed',
-                  message: 'ask() returned an error result',
-                },
-                currentDir: cwd(),
-                priority: 'later',
-                workload: cmd.workload ?? options.workload,
-              })
-            } else {
-              const nextCommands = await finalizeAutonomyCommandsForTurn({
-                commands: claimedAutonomyCommands,
-                outcome: { type: 'completed' },
-                currentDir: cwd(),
-                priority: 'later',
-                workload: cmd.workload ?? options.workload,
-              })
+            const outcome =
+              !sawTerminalResult
+                ? abortController.signal.aborted
+                  ? { type: 'cancelled' as const }
+                  : {
+                      type: 'failed' as const,
+                      message: 'ask() closed before emitting a terminal result',
+                    }
+                : lastResultIsError
+                  ? {
+                      type: 'failed' as const,
+                      message: 'ask() returned an error result',
+                    }
+                  : { type: 'completed' as const }
+
+            const nextCommands = await finalizeAutonomyCommandsForTurn({
+              commands: claimedAutonomyCommands,
+              outcome,
+              currentDir: cwd(),
+              priority: 'later',
+              workload: cmd.workload ?? options.workload,
+            })
+
+            if (outcome.type === 'completed') {
               for (const nextCommand of nextCommands) {
                 enqueue({
                   ...nextCommand,
                   uuid: randomUUID(),
                 })
               }
             }
           } catch (error) {
             await finalizeAutonomyCommandsForTurn({
               commands: claimedAutonomyCommands,
-              outcome: { type: 'failed', error },
+              outcome: abortController.signal.aborted
+                ? { type: 'cancelled' }
+                : { type: 'failed', error },
               currentDir: cwd(),
               priority: 'later',
               workload: cmd.workload ?? options.workload,
             })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cli/print.ts` around lines 2299 - 2331, The code currently treats any
non-explicit-error path from ask() as a completed turn; modify the ask()
handling around finalizeAutonomyCommandsForTurn and the loop that enqueues
nextCommands to track whether a terminal result was observed (e.g., a boolean
seenTerminalResult initialized before iterating ask()), set it true only when
you actually receive a terminal result, and on iterator/abort/early-close paths
call finalizeAutonomyCommandsForTurn with outcome type 'cancelled' or 'failed'
(not 'completed') using claimedAutonomyCommands and
currentDir/cmd.workload/options.workload; ensure the existing enqueue(...) +
randomUUID() logic runs only when seenTerminalResult is true and map caught
exceptions to the failed outcome as already done.

简化 (S1, S2):
- src/cli/print.ts: 抽出 dispatchHeadlessCronCommand 本地 helper,把
  cron 三个入口(onFire / onFireTask agent / onFireTask 非-agent)共享的
  「dedup-claim → input-close-recheck → onSuccess」管线集中到一处,
  避免三个分支在「claim 与 dispatch 之间发生 inputClosed」的处理上漂移。
  enqueueAndRun 再抽出来,使两个非-agent 分支共用一个 onSuccess 回调。
  约 -55 行重复模板。
- src/utils/autonomyPersistence.ts: 新增 retainActiveFirst<T> 泛型
  helper —— active 记录无条件保留(不参与 cap),inactive 按 timestamp
  desc 填满剩余预算;统一 selectPersistedAutonomyRuns / Flows 的两阶段
  排序语义。
- src/utils/autonomyRuns.ts、autonomyFlows.ts: 改用 retainActiveFirst,
  删掉重复的内联两阶段排序逻辑。

复用 (R1, review #8):
- tests/mocks/file-system.ts: 新增 readTempFile / tempPathExists 两个
  Bun.file 包装,补齐 Node fs.readFileSync / existsSync 在测试里的
  Bun-only 等价物。
- src/utils/__tests__/autonomyRuns.test.ts: 把全部 Node fs/path 导入
  (existsSync, readFileSync, mkdir, writeFile, path.join/resolve)替换为
  tests/mocks/file-system 的共享 helper + node:path(带 node: 前缀)。
  不再有 6 处 mkdir + writeFile 模板,统一用 writeTempFile(自带 mkdir-p)。
  解决 review #8 (Major) 的 Bun-only 运行时契约违反。

防御 (D1, OOM 早期信号):
- src/services/compact/postCompactCleanup.ts: 在 void import().then() 末尾
  补 .catch(logError)。当前 attributionHooks 是 stub,但当真实现被恢复
  且 sweepFileContentCache 抛错时,这个 .catch 阻止它变成 unhandled
  rejection(函数返回值是 void,调用者无从观察异步失败)。
- src/utils/autonomyRuns.ts: 给 active runs 加 100 条软上限 + 一次性
  warn。selectPersistedAutonomyRuns 仍然永不淘汰 active 记录,但跨过
  阈值时 logError 一次,作为 finalize-leak 早期信号——避免 active 无限
  增长悄悄使 AUTONOMY_RUNS_MAX 失效。
@claude-code-best claude-code-best merged commit edae3a7 into claude-code-best:main Apr 29, 2026
3 checks passed
@amDosion amDosion deleted the feat/autonomy-lifecycle-upstream branch April 30, 2026 06:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants