
Commit 69109d8

chore: format with prettier [skip ci]
1 parent 7c7c985 commit 69109d8

9 files changed

Lines changed: 4571 additions & 3644 deletions

docs/plans/session-lifecycle-persistence.md

Lines changed: 20 additions & 15 deletions
@@ -19,7 +19,7 @@ save/restore middle.
 2. **The "last-stop manifest" is the existing SQLite state, not a new file.**
 `ListAllSessions` already records id, kind (worker/orchestrator), harness,
 `is_terminated`, and `Metadata{branch, workspacePath, agentSessionId,
-prompt}`. The `session_worktrees` table already has a `preserved_ref` column
+prompt}`. The `session_worktrees` table already has a `preserved_ref` column
 (migration 0009) that nothing currently writes. No manifest.json, no new
 migration, no new format. The manifest is a query.
 3. **Uncommitted work is captured as a git commit object pointed to by a ref**
@@ -29,7 +29,7 @@ save/restore middle.
 the stable key the rest of the system already uses.
 4. **Untracked files: respect `.gitignore`.** Build the preserve commit through
 a temp index (`GIT_INDEX_FILE=<tmp> git add -A; git write-tree; git
-commit-tree`) so tracked + staged + new (non-ignored) files are captured,
+commit-tree`) so tracked + staged + new (non-ignored) files are captured,
 side-effect-free, without mutating the working tree or the stash stack.
 Ignored paths (`node_modules/`, build output, ignored `.env`) are skipped.
 Log a one-line count of skipped ignored paths so it is never silent. (Chosen
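Item 4 asks for a one-line count of the ignored paths the preserve commit skips. A minimal sketch of that count, assuming the adapter can capture the stdout of `git status --porcelain=v1 --ignored` in the worktree: porcelain v1 marks ignored entries with the status `!!`. Git reports a fully ignored directory as a single entry (for example `!! node_modules/`), so this counts entries rather than files. `countIgnoredEntries` and `ignoredSummary` are hypothetical names, and the real adapter is Go.

```ts
// Hypothetical helper. Input: stdout of `git status --porcelain=v1 --ignored`.
// Porcelain v1 prefixes ignored entries with the two-character status "!!".
function countIgnoredEntries(porcelain: string): number {
  return porcelain.split("\n").filter((line) => line.startsWith("!! ")).length;
}

// One log line, so skipping ignored paths is never silent.
function ignoredSummary(porcelain: string): string {
  const n = countIgnoredEntries(porcelain);
  return `preserve: skipped ${n} ignored path${n === 1 ? "" : "s"} (.gitignore)`;
}

const sample = [" M src/app.ts", "?? notes.md", "!! node_modules/", "!! .env"].join("\n");
console.log(ignoredSummary(sample)); // preserve: skipped 2 ignored paths (.gitignore)
```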
@@ -84,6 +84,7 @@ save/restore middle.
 ## Tasks (smallest coherent diff first; each ends with ONE runnable check)

 ### Task 1 — `ForceDestroy` on the workspace port + gitworktree adapter
+
 Add `ForceDestroy(ctx, info) error` to the `ports.Workspace` interface and the
 gitworktree adapter. It runs `git worktree remove --force <path>`, then prune,
 then `os.RemoveAll` as a backstop. New arg builder in `commands.go`; leave the
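Task 1's removal order (force remove, prune, then a delete-everything backstop) can be sketched as follows. The `git` runner and `remove` parameters are hypothetical seams so the order can be tested without a repository; Node's `fs.promises.rm` with `recursive` and `force` stands in for `os.RemoveAll`. Git failures are collected so the backstop always runs, and the first one is re-thrown afterwards for the caller to log.

```ts
import { rm } from "node:fs/promises";

// Hypothetical seams: run `git <args>` in `cwd` (reject on non-zero exit),
// and delete a directory tree. Injected so the order is testable.
type GitRunner = (cwd: string, args: string[]) => Promise<void>;
type RemoveTree = (path: string) => Promise<void>;

// Like os.RemoveAll: recursive, and a missing path is not an error.
const removeTree: RemoveTree = (path) => rm(path, { recursive: true, force: true });

// Sketch of ForceDestroy. Only call it after the session's work is captured.
async function forceDestroy(
  git: GitRunner,
  repoDir: string,
  worktreePath: string,
  remove: RemoveTree = removeTree,
): Promise<void> {
  let firstError: unknown = undefined;
  const steps: string[][] = [
    ["worktree", "remove", "--force", worktreePath], // drop checkout + registration
    ["worktree", "prune"], // clear stale worktree metadata
  ];
  for (const args of steps) {
    try {
      await git(repoDir, args);
    } catch (err) {
      if (firstError === undefined) firstError = err; // keep going: backstop still runs
    }
  }
  await remove(worktreePath); // backstop in case git left the directory behind
  if (firstError !== undefined) throw firstError;
}
```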
@@ -93,6 +94,7 @@ comment that ForceDestroy is only safe after the work is captured.
 `ForceDestroy`, and asserts the path is gone and the worktree is deregistered.

 ### Task 2 — `StashUncommitted` + `ApplyPreserved` on the gitworktree adapter
+
 - `StashUncommitted(ctx, info) (ref string, err error)`: build the preserve
 commit via a temp index that respects `.gitignore`
 (`GIT_INDEX_FILE=<tmp> git add -A` → `git write-tree` → `git commit-tree`),
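The temp-index sequence for `StashUncommitted` can be written out as a plain list of git invocations. This is a sketch, not the adapter's `commands.go`: the builder name, the commit message, and the leading `git read-tree HEAD` are assumptions. Seeding the throwaway index from `HEAD` keeps files that are tracked but also match `.gitignore`; starting from an empty index, `git add -A` would treat them as ignored and leave them out of the tree. The ref name follows the `refs/ao/preserved/<id>` form used in Task 4's check.

```ts
// One git invocation: extra environment plus argv (without the leading "git").
interface GitStep {
  env: Record<string, string>;
  args: string[];
}

// Hypothetical builder for StashUncommitted. "<tree>" and "<commit>" are
// placeholders for the stdout of the previous step (write-tree, commit-tree).
function preserveSteps(tmpIndex: string, sessionId: string): GitStep[] {
  // Every index-touching step uses the throwaway index, never .git/index.
  const tmp = { GIT_INDEX_FILE: tmpIndex };
  return [
    // Assumption: seed from HEAD so tracked-but-ignored files stay in the tree.
    { env: tmp, args: ["read-tree", "HEAD"] },
    // Tracked edits + new non-ignored files; ignored paths are skipped.
    { env: tmp, args: ["add", "-A"] },
    { env: tmp, args: ["write-tree"] }, // prints the tree id
    // commit-tree and update-ref do not read the index.
    { env: {}, args: ["commit-tree", "<tree>", "-p", "HEAD", "-m", `preserve ${sessionId}`] },
    { env: {}, args: ["update-ref", `refs/ao/preserved/${sessionId}`, "<commit>"] },
  ];
}

for (const step of preserveSteps("/tmp/ao-index-s1", "s1")) {
  const env = Object.entries(step.env).map(([k, v]) => `${k}=${v} `).join("");
  console.log(`${env}git ${step.args.join(" ")}`);
}
```

Nothing in this sequence touches the real index, the working tree, or `refs/stash`, which is what makes the capture side-effect-free.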
@@ -104,11 +106,12 @@ comment that ForceDestroy is only safe after the work is captured.
 the commit). On clean success delete the ref (`git update-ref -d`); on
 conflict, keep the ref, leave conflict markers, return a sentinel the caller
 logs.
-**Check:** Go test that round-trips a tracked edit AND a new non-ignored file
-through StashUncommitted → ForceDestroy → re-add → ApplyPreserved and asserts
-both reappear; and that a path matched by `.gitignore` does NOT reappear.
+**Check:** Go test that round-trips a tracked edit AND a new non-ignored file
+through StashUncommitted → ForceDestroy → re-add → ApplyPreserved and asserts
+both reappear; and that a path matched by `.gitignore` does NOT reappear.

 ### Task 3 — `SaveAndTeardownAll` + `RestoreAll` on the session manager
+
 - `SaveAndTeardownAll(ctx)`: `ListAllSessions`; for each live (non-terminated)
 session with a non-empty `Metadata.WorkspacePath`: `StashUncommitted` →
 `UpsertSessionWorktree(preserved_ref=...)` (commit) → `MarkTerminated`
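The order that Task 3's check (a) asserts can be modelled with fakes. This is a sketch under assumptions: the port and row shapes are hypothetical, `stashUncommitted` is assumed to return the preserve ref, and `forceDestroy` is assumed to run after `markTerminated` (the visible text fixes only that `preserved_ref` is written before `ForceDestroy`). A failed capture or write skips teardown for that session instead of aborting the whole shutdown.

```ts
// Hypothetical shapes/ports named after the plan's calls; the real code is Go.
interface SessionRow {
  id: string;
  isTerminated: boolean;
  workspacePath: string; // Metadata.WorkspacePath
}

interface SavePorts {
  listAllSessions(): Promise<SessionRow[]>;
  stashUncommitted(s: SessionRow): Promise<string>; // returns the preserve ref
  upsertSessionWorktree(sessionId: string, preservedRef: string): Promise<void>;
  markTerminated(id: string): Promise<void>;
  forceDestroy(s: SessionRow): Promise<void>;
}

// Capture -> record ref -> mark terminated -> tear down, one session at a time.
// Any failure before forceDestroy leaves that worktree on disk, because
// ForceDestroy is only safe once the work has been captured and recorded.
async function saveAndTeardownAll(p: SavePorts, log: (msg: string) => void): Promise<void> {
  for (const s of await p.listAllSessions()) {
    if (s.isTerminated || s.workspacePath === "") continue; // live sessions with a worktree only
    try {
      const ref = await p.stashUncommitted(s);
      await p.upsertSessionWorktree(s.id, ref); // preserved_ref is written before any destroy
      await p.markTerminated(s.id);
      await p.forceDestroy(s);
    } catch (err) {
      log(`save ${s.id}: ${String(err)}`); // best-effort: keep saving the other sessions
    }
  }
}
```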
@@ -127,24 +130,26 @@ both reappear; and that a path matched by `.gitignore` does NOT reappear.
 gate on `preserved_ref` being non-empty: a clean worktree at shutdown
 writes a row with an empty `preserved_ref` and must still be restored.
 No new column is needed (consistent with Task 6 leaving `state` alone).
-**Check:** Go test with fakes asserting (a) save calls capture-then-force in
-order and writes preserved_ref before ForceDestroy, (b) RestoreAll restores BOTH
-a worker and an orchestrator, (c) a session the user killed before shutdown is
-not resurrected.
+**Check:** Go test with fakes asserting (a) save calls capture-then-force in
+order and writes preserved_ref before ForceDestroy, (b) RestoreAll restores BOTH
+a worker and an orchestrator, (c) a session the user killed before shutdown is
+not resurrected.

 ### Task 4 — Wire into daemon boot/shutdown (`daemon.go`)
+
 - After `startSession` returns and before `srv.Run(ctx)`: call `RestoreAll`
 (best-effort; log failures; never block boot).
 - After `srv.Run(ctx)` returns and before the store closes: call
 `SaveAndTeardownAll` with a fresh bounded context (not the cancelled `ctx`).
 - Expose the manager (or a minimal `LifecycleSaver`/`LifecycleRestorer` seam)
 from the wiring up to `Run`.
-**Check:** Manual run documented in report — spawn a session, edit a tracked
-file + add a new file, `POST /shutdown`; assert worktree removed and
-`refs/ao/preserved/<id>` exists; restart daemon; assert worktree re-created and
-both edits reapplied. Plus `go build ./backend/...` green.
+**Check:** Manual run documented in report — spawn a session, edit a tracked
+file + add a new file, `POST /shutdown`; assert worktree removed and
+`refs/ao/preserved/<id>` exists; restart daemon; assert worktree re-created and
+both edits reapplied. Plus `go build ./backend/...` green.

 ### Task 5 — Frontend: call `/shutdown` before kill (`main.ts`)
+
 In `before-quit`: `event.preventDefault()` once, `await fetch(
 http://127.0.0.1:<port>/shutdown, {method:'POST'})` with an ~8s bounded timeout
 (port from the running.json the app already reads), then `killDaemon` +
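One way to hold Electron's quit until the daemon has saved, sketched with injected dependencies so it runs without Electron. `QuitDeps` and `makeBeforeQuitHandler` are hypothetical; `shutdownDaemon` would wrap the `POST /shutdown` fetch, and `quit` would call `app.quit()`, which emits `before-quit` again. The 8-second budget comes from the plan and is enforced with an `AbortController`, so a hung daemon still gets killed.

```ts
// Minimal stand-in for Electron's before-quit event.
interface QuitEvent {
  preventDefault(): void;
}

// Hypothetical seams: POST /shutdown, the existing killDaemon, and app.quit().
interface QuitDeps {
  shutdownDaemon(signal: AbortSignal): Promise<void>;
  killDaemon(): void;
  quit(): void;
}

// Hold the first quit until the daemon has saved (or the timeout fires), then
// kill and quit for real. quit() re-emits before-quit; "done" lets it through.
function makeBeforeQuitHandler(deps: QuitDeps, timeoutMs = 8_000): (event: QuitEvent) => void {
  let state: "idle" | "saving" | "done" = "idle";
  return (event) => {
    if (state === "done") return;
    event.preventDefault(); // also holds repeat quits while the save is running
    if (state === "saving") return;
    state = "saving";
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    deps
      .shutdownDaemon(controller.signal)
      .catch(() => {
        // Timed out or daemon unreachable: fall through to the kill.
      })
      .finally(() => {
        clearTimeout(timer);
        state = "done";
        deps.killDaemon();
        deps.quit();
      });
  };
}

// Possible wiring (assumes `app`, `port`, and `killDaemon` from main.ts):
// app.on("before-quit", makeBeforeQuitHandler({
//   shutdownDaemon: async (signal) => {
//     await fetch(`http://127.0.0.1:${port}/shutdown`, { method: "POST", signal });
//   },
//   killDaemon,
//   quit: () => app.quit(),
// }));
```

Compared with a bare once-flag, the three-state guard also holds any further quit attempts while the save is in flight, instead of letting a second quit skip the save.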
@@ -153,6 +158,7 @@ http://127.0.0.1:<port>/shutdown, {method:'POST'})` with an ~8s bounded timeout
 log shows the save ran and exited cleanly (not just SIGTERM-killed).

 ### Task 6 — Trim the over-built `session_worktrees.state` enum usage
+
 No schema change. Ensure the save/restore code reads/writes only `preserved_ref`
 and leaves `state` at its default; add `ponytail:` comments noting the enum is
 unused multi-repo scaffolding.
@@ -208,6 +214,5 @@ endpoint. No new file, migration, format, or endpoint.
 ## Execution order

 Tasks are sequential where coupled: Task 2 shares the gitworktree adapter with
-Task 1 (do 1 then 2, same package); Task 3 depends on 1 + 2; Task 4 depends on
-3. Task 5 (frontend) and Task 6 (storage cleanup) are independent and can run
+Task 1 (do 1 then 2, same package); Task 3 depends on 1 + 2; Task 4 depends on 3. Task 5 (frontend) and Task 6 (storage cleanup) are independent and can run
 anytime. Suggested order: 1 → 2 → 3 → 4, then 5 and 6.

docs/superpowers/plans/2026-06-24-crash-proof-session-reconcile.md

Lines changed: 26 additions & 12 deletions
@@ -35,10 +35,12 @@
 ## Task 1: Widen `runtimeController` with `IsAlive` and adopt-alive live pass

 **Files:**
+
 - Modify: `backend/internal/session_manager/manager.go:64-67` (interface), add methods near `manager.go:558-623`
 - Test: `backend/internal/session_manager/manager_test.go:138-152` (fake), new test fn

 **Interfaces:**
+
 - Consumes: `domain.SessionRecord` (`.IsTerminated`, `.Metadata.WorkspacePath`, `.Metadata.Branch`, `.Metadata.RuntimeHandleID`); `runtimeHandle(meta)` -> `ports.RuntimeHandle`; `workspaceInfo(rec)` -> `ports.WorkspaceInfo`; `m.workspace.StashUncommitted`, `m.lcm.MarkTerminated`, `m.store.ListAllSessions`.
 - Produces: `func (m *Manager) reconcileLive(ctx context.Context, rec domain.SessionRecord) error`; widened `runtimeController` with `IsAlive(ctx context.Context, handle ports.RuntimeHandle) (bool, error)`.

@@ -210,10 +212,12 @@ git -c user.email=dev@theharshitsingh.com commit -m "feat(session): reconcile li
 ## Task 2: Reap pass and the `Reconcile` entry point

 **Files:**
+
 - Modify: `backend/internal/session_manager/manager.go` (add `reconcileReap`, `Reconcile`; the latter reuses the existing `RestoreAll` body)
 - Test: `backend/internal/session_manager/manager_test.go`

 **Interfaces:**
+
 - Consumes: `m.store.ListAllSessions`, `m.runtime.IsAlive`, `m.runtime.Destroy`, `reconcileLive` (Task 1), the existing `RestoreAll` method (`manager.go:637`).
 - Produces: `func (m *Manager) Reconcile(ctx context.Context) error`; `func (m *Manager) reconcileReap(ctx context.Context, rec domain.SessionRecord) error`.

@@ -334,7 +338,7 @@ func (m *Manager) Reconcile(ctx context.Context) error {
 }
 ```

-> Note: the live pass re-reads `rec.IsTerminated` from the pre-pass snapshot, so a session terminated *by* the live pass is not also reaped in the same run. That is fine: its tmux is already gone (that is why it was terminated), so reaping would be a no-op anyway.
+> Note: the live pass re-reads `rec.IsTerminated` from the pre-pass snapshot, so a session terminated _by_ the live pass is not also reaped in the same run. That is fine: its tmux is already gone (that is why it was terminated), so reaping would be a no-op anyway.

 - [ ] **Step 4: Run the tests, verify they pass**
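The snapshot behaviour in the note can be seen in a small model of the two passes. This is a sketch under stated assumptions, not the plan's Go code: the port names are hypothetical, the `RestoreAll` step that `Reconcile` also runs is omitted, and a failing `IsAlive` probe leaves the row untouched. Both passes iterate one `ListAllSessions` snapshot, so the reap pass still sees `isTerminated === false` for a row the live pass has just terminated, and skips it.

```ts
interface Rec {
  id: string;
  isTerminated: boolean;
  handle: string; // stands in for Metadata.RuntimeHandleID
}

// Hypothetical ports named after the plan's calls.
interface ReconcilePorts {
  listAllSessions(): Promise<Rec[]>;
  isAlive(handle: string): Promise<boolean>;
  stashUncommitted(rec: Rec): Promise<void>;
  markTerminated(id: string): Promise<void>;
  destroy(handle: string): Promise<void>;
}

// Returns a trace of decisions so the behaviour is easy to assert on.
async function reconcile(p: ReconcilePorts): Promise<string[]> {
  const trace: string[] = [];
  const snapshot = await p.listAllSessions(); // read once, used by both passes

  // Live pass: rows the DB still considers running.
  for (const rec of snapshot) {
    if (rec.isTerminated) continue;
    let alive: boolean;
    try {
      alive = await p.isAlive(rec.handle);
    } catch {
      continue; // probe failed: leave the row as-is (assumption)
    }
    if (alive) {
      trace.push(`adopt ${rec.id}`); // agent keeps running
      continue;
    }
    await p.stashUncommitted(rec).catch(() => undefined); // best-effort
    await p.markTerminated(rec.id); // no relaunch
    trace.push(`terminate ${rec.id}`);
  }

  // Reap pass: terminated in the snapshot, but tmux is still running.
  for (const rec of snapshot) {
    if (!rec.isTerminated) continue;
    const alive = await p.isAlive(rec.handle).catch(() => false);
    if (alive) {
      await p.destroy(rec.handle);
      trace.push(`reap ${rec.id}`);
    }
  }
  return trace;
}
```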

@@ -354,11 +358,13 @@ git -c user.email=dev@theharshitsingh.com commit -m "feat(session): reconcile re
 ## Task 3: Wire `Reconcile` into daemon boot

 **Files:**
+
 - Modify: `backend/internal/daemon/lifecycle_wiring.go:64-67` (interface)
 - Modify: `backend/internal/daemon/daemon.go:144-149` (boot call)
 - Test: `backend/internal/daemon/wiring_test.go`

 **Interfaces:**
+
 - Consumes: `Manager.Reconcile` (Task 2).
 - Produces: `sessionLifecycle` interface gains `Reconcile(ctx context.Context) error`.

@@ -429,9 +435,11 @@ git -c user.email=dev@theharshitsingh.com commit -m "feat(daemon): run Reconcile
 ## Task 4: Integration test over the sqlite store

 **Files:**
+
 - Modify: `backend/internal/integration/lifecycle_sqlite_test.go`

 **Interfaces:**
+
 - Consumes: the real `Manager.Reconcile`, a real sqlite store, and the test's runtime fake (find how this file already fakes the runtime; reuse it, scripting `IsAlive` per handle).

 - [ ] **Step 1: Read the existing integration harness**
@@ -498,10 +506,12 @@ git -c user.email=dev@theharshitsingh.com commit -m "test(integration): reconcil
 ## Task 5: Frontend wedged-orphan kill+replace branch

 **Files:**
+
 - Modify: `frontend/src/main.ts` (in `startDaemonInner`, around lines 457-495)
 - Test: `frontend/src/main.test.ts` or the existing main-process test file

 **Interfaces:**
+
 - Consumes: existing `inspectExistingDaemon`, `resolveDaemonFromPort`, `readDaemonProbe`, `killDaemon`, `parseRunFile`/`defaultRunFilePath`, `expectedDaemonPort`.
 - Produces: a pure decision helper, e.g. `function planDaemonTakeover(probe: DaemonProbe | null): "reuse" | "replace"`, unit-testable without spawning.
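A possible body for that pure helper, assuming the probe reports whether the port holder passed the health check. The `healthy` field is a hypothetical stand-in for whatever the real `DaemonProbe` exposes. Mapping `null` (nothing listening) to `"reuse"` is also an assumption; it only means "nothing to kill", and the caller's `&& holderProbe` guard makes that case a no-op either way.

```ts
// Hypothetical probe shape; the real DaemonProbe in main.ts may carry more.
interface DaemonProbe {
  healthy: boolean; // answered the health check and can be attached to
}

// "replace" only when something answers on the port but is not a healthy daemon.
function planDaemonTakeover(probe: DaemonProbe | null): "reuse" | "replace" {
  if (probe === null) return "reuse"; // nothing holds the port
  return probe.healthy ? "reuse" : "replace";
}

console.log(planDaemonTakeover({ healthy: false })); // replace
```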

@@ -553,22 +563,26 @@ export function planDaemonTakeover(probe: DaemonProbe | null): "reuse" | "replac
 Then, in `startDaemonInner`, after the existing `inspectExistingDaemon` + `resolveDaemonFromPort` attach attempts fail (i.e. just before `spawn`), add: probe the expected port; if something answers but is unhealthy, SIGTERM the holder via the run-file PID and wait for the port to free before spawning. Concretely, before the `spawn(...)` at line 505:

 ```ts
-  // A process may hold the port without being a healthy daemon we can attach to
-  // (wedged orphan from a crash, or a PID-dead-but-port-held run-file). Spawning
-  // then would make the Go child collide and exit 1. Detect it and clear it.
-  const holderProbe = await readDaemonProbe(expectedDaemonPort(process.env));
-  if (planDaemonTakeover(holderProbe) === "replace" && holderProbe) {
-    const runFile = parseRunFile(await readRunFileSafe(defaultRunFilePath()));
-    if (runFile?.pid) {
+// A process may hold the port without being a healthy daemon we can attach to
+// (wedged orphan from a crash, or a PID-dead-but-port-held run-file). Spawning
+// then would make the Go child collide and exit 1. Detect it and clear it.
+const holderProbe = await readDaemonProbe(expectedDaemonPort(process.env));
+if (planDaemonTakeover(holderProbe) === "replace" && holderProbe) {
+  const runFile = parseRunFile(await readRunFileSafe(defaultRunFilePath()));
+  if (runFile?.pid) {
+    try {
+      process.kill(-runFile.pid, "SIGTERM");
+    } catch {
       try {
-        process.kill(-runFile.pid, "SIGTERM");
+        process.kill(runFile.pid, "SIGTERM");
       } catch {
-        try { process.kill(runFile.pid, "SIGTERM"); } catch { /* already gone */ }
+        /* already gone */
       }
     }
-    await waitForPortFree(expectedDaemonPort(process.env), 8_000);
-    await rmRunFileSafe(defaultRunFilePath());
   }
+  await waitForPortFree(expectedDaemonPort(process.env), 8_000);
+  await rmRunFileSafe(defaultRunFilePath());
+}
 ```

 > Use the file's existing run-file read/parse helpers (`parseRunFile`, `defaultRunFilePath`). If `readRunFileSafe`/`rmRunFileSafe`/`waitForPortFree` do not exist, add small local helpers: `readRunFileSafe` wraps `fs.readFile` returning `""` on ENOENT; `rmRunFileSafe` wraps `fs.rm` ignoring ENOENT; `waitForPortFree` polls `readDaemonProbe` until it returns null or the timeout elapses. Keep each to a few lines, matching the file's existing async style.
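The three fallback helpers described in that note might look like this. They are sketches under assumptions: `waitForPortFree` takes the probe as an extra first parameter (the plan's call passes only port and timeout, so main.ts could wrap it around `readDaemonProbe`), and it resolves to a boolean rather than `void`.

```ts
import { readFile, rm } from "node:fs/promises";

// Returns the run file's contents, or "" when it does not exist.
async function readRunFileSafe(path: string): Promise<string> {
  try {
    return await readFile(path, "utf8");
  } catch (err) {
    if ((err as { code?: string }).code === "ENOENT") return "";
    throw err; // permission and other errors still surface
  }
}

// Deletes the run file; `force: true` makes a missing file a no-op.
async function rmRunFileSafe(path: string): Promise<void> {
  await rm(path, { force: true });
}

// Polls `probe` until nothing answers on the port (null) or the timeout
// elapses. Resolves true if the port was freed. `probe` stands in for
// main.ts's readDaemonProbe; passing it in keeps the helper testable.
async function waitForPortFree<T>(
  probe: (port: number) => Promise<T | null>,
  port: number,
  timeoutMs: number,
  intervalMs = 100,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if ((await probe(port)) === null) return true;
    if (Date.now() >= deadline) return false;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Returning `false` on timeout lets `startDaemonInner` decide whether to spawn anyway or report that the port is still held; the snippet in the plan awaits the helper without checking.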

docs/superpowers/plans/2026-06-24-restore-recreate-orchestrator.md

Lines changed: 35 additions & 31 deletions
@@ -22,11 +22,13 @@
 ### Task 1: Typed error for un-resumable restore (fixes the 500)

 **Files:**
+
 - Modify: `backend/internal/session_manager/manager.go` (sentinel near line 25; the "nothing to resume from" return at line 480)
 - Modify: `backend/internal/service/session/service.go` (`toAPIError`, near line 450)
 - Test: `backend/internal/service/session/service_test.go` (new test for the mapping)

 **Interfaces:**
+
 - Produces: `sessionmanager.ErrNotResumable` (a sentinel `error`), and the wire contract `409` with code `SESSION_NOT_RESUMABLE` from `POST /api/v1/sessions/{id}/restore` when a terminated session has neither `agent_session_id` nor `prompt`. Task 2 (frontend) consumes the `SESSION_NOT_RESUMABLE` code.

 - [ ] **Step 1: Write the failing test**
@@ -117,12 +119,14 @@ Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>"
 ### Task 2: Restore-unavailable popup + recreate via existing orchestrator endpoint

 **Files:**
+
 - Modify: `frontend/src/renderer/lib/spawn-orchestrator.ts` (optional `clean` param)
 - Create: `frontend/src/renderer/components/RestoreUnavailableDialog.tsx` (the popup)
 - Modify: `frontend/src/renderer/components/TerminalPane.tsx` (route `SESSION_NOT_RESUMABLE` to the dialog)
 - Test: `frontend/src/renderer/lib/spawn-orchestrator.test.ts` (new; clean param)

 **Interfaces:**
+
 - Consumes from Task 1: the restore response error envelope `{ code: "SESSION_NOT_RESUMABLE", message, ... }`.
 - Consumes existing: `spawnOrchestrator(projectId, clean?)` (extended here), `isOrchestrator(session)` from `frontend/src/renderer/types/workspace.ts`, `apiClient`/`apiErrorMessage` from `lib/api-client`, `workspaceQueryKey` already imported in `TerminalPane.tsx`.
 - Produces: `RestoreUnavailableDialog` React component with props `{ open: boolean; session: SessionView; onOpenChange: (open: boolean) => void; onRecreated: (newOrchestratorId: string) => void }`.
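The TerminalPane step of this plan reads the code with `(restoreError as { code?: string }).code`. A small guard, sketched here as an optional alternative, narrows the envelope without trusting its shape. `errorCode` and `isNotResumable` are hypothetical helper names; only the `SESSION_NOT_RESUMABLE` string and its meaning come from Task 1.

```ts
// Wire code from Task 1: a terminated session with neither agent_session_id
// nor prompt (HTTP 409 from POST /api/v1/sessions/{id}/restore).
const SESSION_NOT_RESUMABLE = "SESSION_NOT_RESUMABLE";

// Reads `code` from an error envelope of unknown shape without a blind cast.
function errorCode(err: unknown): string | undefined {
  if (typeof err !== "object" || err === null || !("code" in err)) return undefined;
  const code = (err as { code: unknown }).code;
  return typeof code === "string" ? code : undefined;
}

function isNotResumable(err: unknown): boolean {
  return errorCode(err) === SESSION_NOT_RESUMABLE;
}

console.log(isNotResumable({ code: "SESSION_NOT_RESUMABLE", message: "nothing to resume from" })); // true
```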
@@ -251,9 +255,7 @@ export function RestoreUnavailableDialog({ open, session, onOpenChange, onRecrea
 <Dialog.Portal>
   <Dialog.Overlay className="fixed inset-0 z-50 bg-black/50" />
   <Dialog.Content className="fixed left-1/2 top-1/2 z-50 w-[420px] -translate-x-1/2 -translate-y-1/2 rounded-lg border border-border bg-surface p-5 shadow-lg">
-    <Dialog.Title className="text-sm font-medium text-foreground">
-      Session can no longer be restored
-    </Dialog.Title>
+    <Dialog.Title className="text-sm font-medium text-foreground">Session can no longer be restored</Dialog.Title>
     <Dialog.Description className="mt-2 text-[13px] text-muted-foreground">
       {orchestrator
         ? "This orchestrator has no saved agent session to resume. You can create a new orchestrator on the same branch; its committed work is preserved and the old worktree is cleaned."
@@ -287,45 +289,47 @@ In `frontend/src/renderer/components/TerminalPane.tsx`, add state and a dialog m
 Add state near the other `useState` hooks in `AttachedTerminal`:

 ```tsx
-const [restoreUnavailable, setRestoreUnavailable] = useState(false);
+const [restoreUnavailable, setRestoreUnavailable] = useState(false);
 ```

 Replace the `catch`/error handling inside `restoreSession` so a `SESSION_NOT_RESUMABLE` code opens the dialog instead of setting the inline error. The `restoreError` returned by `apiClient.POST` is the parsed error envelope, so read its `code`:

 ```tsx
-    try {
-      const { error: restoreError } = await apiClient.POST("/api/v1/sessions/{sessionId}/restore", {
-        params: { path: { sessionId: session.id } },
-      });
-      if (restoreError) {
-        const code = (restoreError as { code?: string }).code;
-        if (code === "SESSION_NOT_RESUMABLE") {
-          setRestoreUnavailable(true);
-          return;
-        }
-        throw new Error(apiErrorMessage(restoreError, "Unable to restore session"));
-      }
-      await queryClient.invalidateQueries({ queryKey: workspaceQueryKey });
-    } catch (err) {
-      setRestoreError(err instanceof Error ? err.message : "Unable to restore session");
-    } finally {
-      setIsRestoring(false);
+try {
+  const { error: restoreError } = await apiClient.POST("/api/v1/sessions/{sessionId}/restore", {
+    params: { path: { sessionId: session.id } },
+  });
+  if (restoreError) {
+    const code = (restoreError as { code?: string }).code;
+    if (code === "SESSION_NOT_RESUMABLE") {
+      setRestoreUnavailable(true);
+      return;
     }
+    throw new Error(apiErrorMessage(restoreError, "Unable to restore session"));
+  }
+  await queryClient.invalidateQueries({ queryKey: workspaceQueryKey });
+} catch (err) {
+  setRestoreError(err instanceof Error ? err.message : "Unable to restore session");
+} finally {
+  setIsRestoring(false);
+}
 ```

 Mount the dialog inside the component's returned JSX (e.g. just before the closing tag of the root `div` in `AttachedTerminal`, alongside the other absolutely-positioned children):

 ```tsx
-{session && (
-  <RestoreUnavailableDialog
-    open={restoreUnavailable}
-    session={session}
-    onOpenChange={setRestoreUnavailable}
-    onRecreated={async () => {
-      await queryClient.invalidateQueries({ queryKey: workspaceQueryKey });
-    }}
-  />
-)}
+{
+  session && (
+    <RestoreUnavailableDialog
+      open={restoreUnavailable}
+      session={session}
+      onOpenChange={setRestoreUnavailable}
+      onRecreated={async () => {
+        await queryClient.invalidateQueries({ queryKey: workspaceQueryKey });
+      }}
+    />
+  );
+}
 ```

 Add the import at the top of the file:

docs/superpowers/specs/2026-06-24-crash-proof-session-reconcile-design.md

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ restore logic in as one branch. Iterating `ListAllSessions`:

 Reconcile iterates `ListAllSessions` and acts per session:

-| DB state | tmux via `IsAlive(handle)` | Action |
+| DB state | tmux via `IsAlive(handle)` | Action |
 | ----------------------------- | -------------------------- | ------------------------------------------------------------------ |
 | `is_terminated=0` | alive | **Adopt** — no-op, leave live. Agent keeps running. |
 | `is_terminated=0` | gone | `StashUncommitted` (best-effort) -> `MarkTerminated`. No relaunch. |
