Skip to content

fix(acp): recover from persist() failure instead of cascading (#3366)#3383

Merged
aegis-gh-agent[bot] merged 3 commits into
developfrom
fix/3366-acp-persist-cascade
May 14, 2026
Merged

fix(acp): recover from persist() failure instead of cascading (#3366)#3383
aegis-gh-agent[bot] merged 3 commits into
developfrom
fix/3366-acp-persist-cascade

Conversation

@OneStepAt4time

Copy link
Copy Markdown
Owner

Summary

Fixes #3366

FileAcpLocalStorageProfile.persist() chained writes onto writeChain without error recovery. A single failed write (ENOENT, EACCES, inode change) permanently poisoned all subsequent writes because every .then() chained onto the rejected promise — blocking ALL session creation.

Root Cause

// Before: chains .then() — one rejection poisons everything
this.writeChain = this.writeChain.then(
  () => writeFile(tmpFile, content, "utf8").then(() => rename(tmpFile, this.config.filePath)),
);
await this.writeChain; // if previous rejected, this rejects too

Fix

// After: catches errors, resets chain, retries on previous failure
const prevChain = this.writeChain;
this.writeChain = prevChain
  .then(
    () => doWrite(),           // previous succeeded → do this write
    () => doWrite(),           // previous failed → still try this write
  )
  .then(() => { this.persistError = null; })
  .catch((err) => {
    this.persistError = err;
    logger.error({ ... });
    unlink(tmpFile).catch(() => {});  // clean up stale tmp
    this.writeChain = Promise.resolve();  // RESET chain
  });
await this.writeChain;

Changes

  • persist(): error recovery with chain reset on failure
  • stop(): catches pending chain rejections so shutdown always completes
  • getPersistError(): new method for diagnostics/health checks
  • 3 regression tests: recovery after failure, cascade prevention across multiple failures, graceful shutdown with pending rejection

Verification

npx tsc --noEmit: ✅ Zero errors
npm run build: ✅ Success
npm test: ✅ 4211 passed (9 pre-existing failures on develop too)
New tests: 3/3 passed

Verification

The writeChain pattern in FileAcpLocalStorageProfile.persist() chained
.then() without error recovery. A single failed write (ENOENT, EACCES,
inode change) poisoned all subsequent writes because every new .then()
chained onto the rejected promise — blocking ALL session creation
permanently.

Fix:
- persist() now catches write errors, logs them, cleans up stale tmp
  files, and resets the writeChain to Promise.resolve()
- If the previous write failed, the current write still gets a fresh
  attempt via the .then(onFulfilled, onRejected) rejection handler
- stop() swallows pending chain rejections so shutdown always completes
- Added getPersistError() for diagnostics/health checks
- Added 3 regression tests covering recovery, cascade prevention, and
  graceful shutdown

@aegis-gh-agent aegis-gh-agent Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved.

P1 fix — clean error recovery pattern for the persist() cascade bug.

  • persist() catches errors, logs via structured logger, cleans up stale tmp file, resets writeChain so next write gets a fresh attempt.
  • Both .then() handlers (success + recovery) execute the write — ensures no cascade.
  • stop() swallows chain errors — clean shutdown guaranteed.
  • getPersistError() on interface for diagnostics.
  • 3 tests covering recovery, multiple sequential failures, and clean stop.

CI green. All merge gates pass. Merging.

@aegis-gh-agent aegis-gh-agent Bot merged commit ed2474c into develop May 14, 2026
17 checks passed
@aegis-gh-agent aegis-gh-agent Bot deleted the fix/3366-acp-persist-cascade branch May 14, 2026 09:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant