
Commit 7ea5a1b

ZaynJarvis and claude committed
refactor(plugin/codex): active-window heuristic + idle-TTL sweep at SessionStart (v0.4.0)
Source of truth: examples/codex-memory-plugin/DESIGN.md (added in this commit).

Behavioral changes:
- SessionStart matcher widens from `clear` to `clear|startup`. Both sources run the same active-window heuristic; `resume` is a hard no-op (still fires on short reconnects).
- Heuristic (DESIGN.md §3): count state files (excluding new session_id) within ACTIVE_WINDOW_MS (default 2 min). 0 → noop, 1 → commit it (just-ended session), ≥2 → skip and rely on idle TTL. Tunable via OPENVIKING_CODEX_ACTIVE_WINDOW_MS.
- Idle-TTL sweep returns at the tail of session-start-commit.mjs only (not every Stop). Default IDLE_TTL_MS = 30 min via OPENVIKING_CODEX_IDLE_TTL_MS. Catches SIGTERM/Ctrl+C/`/exit` orphans and the ≥2-active skip path.
- Stop hook deliberately does NOT sweep; state-write-on-every-turn already gives us the freshness signal. Marker comment added.
- Stop hook adds post-compact transcript-shrink defense: if allTurns.length < state.capturedTurnCount, reset capturedTurnCount = 0.
- Commit-on-failure preserves state everywhere (PreCompact, heuristic, idle sweep). A non-2xx /commit no longer clears ovSessionId; the next sweep retries.
- session-state.mjs saveState now uses atomic write (tmpfile + rename) for crash safety. listStates ignores the brief `<id>.json.tmp` window.

Bump: package.json + .codex-plugin/plugin.json → 0.4.0.

Docs: README "How It Works" gained a DESIGN.md pointer and rewrites the SessionStart section to reflect heuristic + idle TTL. VERIFICATION.md step 6 now exercises all four heuristic branches (0/1/≥2 active, idle TTL, resume). Phase-2 resume context inject documented in DESIGN.md but explicitly out of scope here.

Verified locally with synthetic stdin tests against a fake OV server: 1-active commit, ≥2-active skip, idle TTL sweep, resume noop, unreachable-server keeps state.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 328b89c commit 7ea5a1b

10 files changed

Lines changed: 593 additions & 76 deletions


examples/codex-memory-plugin/.codex-plugin/plugin.json

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 {
   "name": "openviking-memory",
-  "version": "0.3.1",
-  "description": "Long-term semantic memory for Codex, powered by OpenViking. Recall on UserPromptSubmit, incremental add_message on Stop (per turn), commit on PreCompact, and commit on SessionStart with source=clear (when /clear orphans the prior session).",
+  "version": "0.4.0",
+  "description": "Long-term semantic memory for Codex, powered by OpenViking. Recall on UserPromptSubmit, incremental add_message on Stop (per turn), commit on PreCompact, and active-window heuristic + idle-TTL sweep on SessionStart (source=startup|clear).",
   "author": {
     "name": "OpenViking",
     "url": "https://github.com/volcengine/OpenViking"
examples/codex-memory-plugin/DESIGN.md

Lines changed: 275 additions & 0 deletions
@@ -0,0 +1,275 @@
# Codex memory plugin: commit decision design

This document records *why* the plugin commits when it commits. The commit
shape (which OpenViking session is sealed by which hook event) is the part
worth understanding before reading code: the codex hook surface gives us
**no clean SessionEnd signal**, so we have to reason about which observable
events imply "context for a particular codex `session_id` is gone".

## Vocabulary

- **codex `session_id`**: the codex thread/session id. Stable across
  process restarts when zouk-daemon resumes the same thread; replaced when
  `/clear`, `/new`, fresh codex startup, or zouk reset occurs.
- **OV session**: `viking://session/<uuid>`. We open one per codex
  `session_id`, append messages on every `Stop`, and commit it (which
  triggers OV's memory extractor) at session-end-equivalent moments.
- **State file**: `~/.openviking/codex-plugin-state/<safe-codex-session-id>.json`,
  shape `{ codexSessionId, ovSessionId, capturedTurnCount, createdAt, lastUpdatedAt }`.
- **Active window**: state files whose `lastUpdatedAt` is within
  `ACTIVE_WINDOW_MS` (default 2 min) of "now". Used to detect "the codex
  session that just ended".

## Codex hook surface (what we observe)

| Codex event | Fires when | What we learn |
|---|---|---|
| `SessionStart` source=`startup` | fresh codex process; `/new`; zouk daemon spawn-without-sessionId; zouk reset | new `session_id` was created |
| `SessionStart` source=`resume` | `/resume`; short reconnect; zouk daemon spawn-with-sessionId | same `session_id` continues |
| `SessionStart` source=`clear` | `/clear` (creates a fresh thread, preserves prior thread on disk as resumable) | new `session_id`; previous one orphaned |
| `UserPromptSubmit` | every user turn before model | recall context inject |
| `Stop` | end of every model turn (NOT end of session) | append turns to OV session |
| `PreCompact` | `/compact` or auto-compact | context is about to be summarized |
| `PostCompact` | after compaction | (unused) |
| SIGTERM / SIGINT / Ctrl+C / `/exit` | process killed | **no hook fires**; confirmed in `codex-rs/hooks/src/events/` |

Verified against codex-rs `main` on 2026-05-10. Upstream issues #17421 and
#20374 requested a `SessionEnd` hook; OpenAI rejected it for two reasons:
"threads can always be resumed" and "/exit only makes sense in TUI". It is
not landing.

## Commit triggers

We commit an OV session in exactly these places. Everything else is no-op
or append-only.

### 1. `PreCompact`: deterministic, current session

Codex fires `PreCompact` before summarizing. We catch up with any
unappended turns from the transcript, commit the OV session for this codex
`session_id`, and clear `ovSessionId` so the next `Stop` opens a fresh OV
session for the post-compact half. `capturedTurnCount` is preserved unless
the transcript was truncated by compaction (see "Post-compact transcript
shrink" below).

### 2. `SessionStart` source=`clear`: heuristic, same shape as `startup`

`/clear` creates a brand-new codex `session_id` and orphans the previous
in-memory thread (preserved on disk). Naively committing "every state file
whose `codexSessionId` ≠ new id" would falsely commit concurrent codex
processes' still-active sessions on the same machine.

Instead, we treat `clear` and `startup` identically: both run the
**active-window heuristic** below. `/clear` only invalidates the current
codex process's *previous* session; the heuristic correctly catches that
session (a single recently-touched orphan) without trampling unrelated
parallel codex processes.

### 3. `SessionStart` source=`startup`: heuristic, active-window

Triggered by `/new`, fresh codex CLI startup, and zouk daemon
spawn-without-sessionId (including zouk's "reset codex" UI action).

The hook script gates internally on `source ∈ {startup, clear}`. On a
match, it iterates state files (excluding the new `session_id` itself) and
counts how many were touched within `ACTIVE_WINDOW_MS`:

```
recently-active count   ⇒  action
─────────────────────────────────
0                       ⇒  no-op (no orphan to commit)
1                       ⇒  commit it (the just-ended session)
≥2                      ⇒  skip; rely on idle TTL
```

The single-recent case captures the common path: user runs codex, hits
`/new` or `/clear` after a turn or two; the previous session's `Stop` just
fired and bumped `lastUpdatedAt`; we commit it. The multi-recent case
implies concurrent codex sessions are active; we can't tell which one (if
any) ended, so we defer to idle TTL to clean up genuinely-dead ones.
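
The decision table above can be sketched as a pure function. This is a minimal illustration, not the code in `session-start-commit.mjs`: the function name, the return shape, and the inclusive `<=` window comparison are all assumptions made for the example.

```javascript
// Hypothetical helper (not the plugin's real API): decide what SessionStart
// should do, given the parsed state objects found on disk.
const DEFAULT_ACTIVE_WINDOW_MS = 2 * 60 * 1000; // 2 min, per the design doc

function decideSessionStartAction(source, newSessionId, states, nowMs,
                                  windowMs = DEFAULT_ACTIVE_WINDOW_MS) {
  // Only `startup` and `clear` run the heuristic; `resume` never commits.
  if (source !== "startup" && source !== "clear") {
    return { action: "noop", target: null };
  }
  // Count state files touched recently, excluding the brand-new session.
  // Assumption: "within the window" is treated as inclusive.
  const recent = states.filter(
    (s) => s.codexSessionId !== newSessionId &&
           nowMs - s.lastUpdatedAt <= windowMs,
  );
  if (recent.length === 0) return { action: "noop", target: null };
  if (recent.length === 1) return { action: "commit", target: recent[0] };
  // Two or more recently active sessions: we cannot tell which one ended,
  // so defer to the idle TTL sweep.
  return { action: "skip", target: null };
}
```

Keeping the decision free of I/O makes each branch (0, 1, ≥2, and `resume`) easy to exercise with synthetic state objects.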

### 4. `SessionStart` source=`resume`: never commits

Short reconnects and `/resume` re-fire `SessionStart` for the same
`session_id`. Committing here would seal a still-active session. So
`resume` is a no-op for commit purposes.

### 5. Idle TTL sweep: fallback

State files whose `lastUpdatedAt` is older than `IDLE_TTL_MS` (default 30
min) get committed and cleared. Mental model: a session not touched for
30 min is "temporarily concluded"; if the user resumes later, they get a
fresh OV session for the new turns (memory will be split, but each chunk
gets extracted).

This covers:
- SIGTERM / Ctrl+C / `/exit` (no hook fires; state file rots)
- Crashes
- Mid-turn zouk reset where `Stop` got cancelled before bumping
  `lastUpdatedAt`
- The `≥2 recently-active` skip from rule 3

**Sweep trigger**: at the tail of `session-start-commit.mjs` only. We do
not sweep on every `Stop` because state-write-on-every-turn already gives
us the freshness signal we need; running the sweep once per session start
is the right cadence. The Stop hook contains a comment marking the option
to add sweep there if codex's session creation rate is low enough that
arbitrarily-orphaned state files accumulate.

**Known limitation**: if the user never starts another codex on this
machine, no sweep ever runs and the OV session stays open server-side
forever. Accepted. Future work could add an MCP tool
`openviking_commit_pending` so the model can commit explicitly.
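
The selection step of the idle sweep described in this section could look like the sketch below. The function name is hypothetical, and skipping states whose `ovSessionId` is already `null` is an assumption of this example (there is nothing left to commit for them); the real script may handle that case differently.

```javascript
// Hypothetical helper: pick the state objects the idle sweep should commit.
const DEFAULT_IDLE_TTL_MS = 30 * 60 * 1000; // 30 min, per the design doc

function selectIdleStates(states, nowMs, ttlMs = DEFAULT_IDLE_TTL_MS) {
  return states.filter(
    (s) =>
      // Assumption: a null ovSessionId means "already committed", so skip it.
      s.ovSessionId !== null &&
      // Strictly older than the TTL counts as idle.
      nowMs - s.lastUpdatedAt > ttlMs,
  );
}
```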

## Stop hook: append only, no commit

Every `Stop` reads `transcript_path`, slices to `[capturedTurnCount, end)`,
and appends each new user/assistant turn to the OV session for this codex
`session_id` (creating one on first append). State is updated:
`{ovSessionId, capturedTurnCount, lastUpdatedAt: now}`. Never commits.

## Edge cases handled

### Post-compact transcript shrink

Codex's `/compact` may rewrite or truncate `transcript_path`. After
compaction, if `allTurns.length < state.capturedTurnCount`, our slice
math underflows and we silently drop new turns. Defensive fix: when this
inequality is detected on `Stop`, reset `capturedTurnCount = 0` so the
next slice captures everything in the new transcript.
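
A minimal sketch of the slice with this defense applied follows. The helper name and return shape are hypothetical, and the sketch assumes the reset and the re-slice happen within the same `Stop` invocation; the actual `auto-capture.mjs` may sequence them differently.

```javascript
// Hypothetical helper: compute the turns a Stop hook still has to append.
function sliceNewTurns(allTurns, capturedTurnCount) {
  // If the transcript is now shorter than what we already captured, it was
  // rewritten by compaction: start over from index 0.
  const start = allTurns.length < capturedTurnCount ? 0 : capturedTurnCount;
  return {
    newTurns: allTurns.slice(start),
    // After appending, everything in the current transcript is captured.
    capturedTurnCount: allTurns.length,
  };
}
```

Without the guard, `allTurns.slice(capturedTurnCount)` on a shrunken transcript returns an empty array, which is exactly the silent drop described above.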

### Commit failure

When OV `/commit` returns non-2xx or times out, we currently log and treat
the result as null. We must NOT call `clearState` on failure; keep the
state file so the next sweep / SessionStart can retry. A transient OV
outage shouldn't lose a session's worth of memory.
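
The rule can be expressed as a small state transition. In this sketch the function name is hypothetical, and a timeout is modeled as a `null` status, which is an assumption of the example. It shows the "null out `ovSessionId`, keep `capturedTurnCount`" variant used after `PreCompact`.

```javascript
// Hypothetical helper: compute the state to persist after a /commit attempt.
// `httpStatus` is the response status code, or null if the request timed out
// or the server was unreachable (assumed convention for this sketch).
function stateAfterCommitAttempt(state, httpStatus) {
  const ok =
    typeof httpStatus === "number" && httpStatus >= 200 && httpStatus < 300;
  // Failure: return the state untouched so a later sweep can retry.
  if (!ok) return state;
  // Success: seal the OV session but keep the transcript cursor.
  return { ...state, ovSessionId: null };
}
```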

### Race: SIGTERM before Stop completes

Codex's tokio runtime cancels in-flight async tasks on SIGTERM, so the last
turn's `Stop` hook may be aborted before it bumps `lastUpdatedAt`. This
makes the state look older than it actually is. Consequence: that session
may fall outside the 2 min active window when the user respawns codex and
we can't commit it deterministically; idle TTL will catch it later.

### Commit-then-resume

After PreCompact (or idle sweep, or rule-3 commit) we set `ovSessionId =
null` but keep `capturedTurnCount`. The next `Stop` for the same codex
`session_id` opens a fresh OV session and starts appending from
`capturedTurnCount`. Memory ends up split across two OV sessions; each
gets extracted independently. Acceptable.

## State file schema

```json
{
  "codexSessionId": "0193af...",   // codex thread id
  "ovSessionId": "uuid-or-null",   // null means "committed, awaiting next Stop"
  "capturedTurnCount": 7,          // turns from transcript already appended
  "createdAt": 1715000000000,
  "lastUpdatedAt": 1715000300000
}
```

State files are atomic-write (tmpfile + rename) to survive crash mid-write.
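
A tmpfile + rename write can be sketched with Node's `fs/promises` API as follows. This is an illustration under assumptions, not the `saveState` in `session-state.mjs`: the function name, the id-sanitizing regex, and the use of dynamic `import()` (chosen only so the snippet loads as either ESM or CommonJS) are all choices made for this example.

```javascript
// Hypothetical sketch of an atomic state write: write to `<id>.json.tmp`,
// then rename over `<id>.json`. On POSIX filesystems a rename within one
// directory replaces the target atomically, so readers see either the old
// file or the new one, never a half-written file.
async function saveStateAtomic(dir, state) {
  const { mkdir, writeFile, rename } = await import("node:fs/promises");
  const { join } = await import("node:path");

  await mkdir(dir, { recursive: true });
  // Assumption: make the session id filesystem-safe by replacing anything
  // outside [A-Za-z0-9_-] with an underscore.
  const safeId = state.codexSessionId.replace(/[^A-Za-z0-9_-]/g, "_");
  const finalPath = join(dir, `${safeId}.json`);
  const tmpPath = `${finalPath}.tmp`;

  await writeFile(tmpPath, JSON.stringify(state), "utf8");
  await rename(tmpPath, finalPath);
  return finalPath;
}
```

Any code that lists the state directory should ignore `*.json.tmp` entries, since one may exist briefly between the write and the rename.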

## Configuration

Env var overrides for tuning without rebuilding:

| Var | Default | Purpose |
|---|---|---|
| `OPENVIKING_CODEX_STATE_DIR` | `~/.openviking/codex-plugin-state` | state file dir |
| `OPENVIKING_CODEX_ACTIVE_WINDOW_MS` | `120000` (2 min) | rule-3 active window |
| `OPENVIKING_CODEX_IDLE_TTL_MS` | `1800000` (30 min) | idle sweep TTL |
| `OPENVIKING_DEBUG` | `0` | enable hook debug log |
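
One way to read these overrides with the documented defaults is sketched below. The `readConfig` name and the returned field names are hypothetical, as are two behaviors the table does not specify: falling back to the default on a non-numeric or non-positive value, and treating only `OPENVIKING_DEBUG=1` as "enabled". Tilde expansion of the default state dir is left to the caller.

```javascript
// Hypothetical helper: resolve plugin settings from an env-like object.
function readConfig(env) {
  // Assumption: invalid or non-positive numbers fall back to the default.
  const positiveNumber = (value, fallback) => {
    const n = Number(value);
    return Number.isFinite(n) && n > 0 ? n : fallback;
  };
  return {
    stateDir: env.OPENVIKING_CODEX_STATE_DIR || "~/.openviking/codex-plugin-state",
    activeWindowMs: positiveNumber(env.OPENVIKING_CODEX_ACTIVE_WINDOW_MS, 120000),
    idleTtlMs: positiveNumber(env.OPENVIKING_CODEX_IDLE_TTL_MS, 1800000),
    // Assumption: only the literal string "1" turns debug logging on.
    debug: env.OPENVIKING_DEBUG === "1",
  };
}

// Typical use inside a hook script:
const config = readConfig(process.env);
```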

## Phase 2: resume context inject (not yet implemented)

When `SessionStart` source=`resume` fires for a codex `session_id` whose
state shows `ovSessionId = null` (already committed via idle TTL or
PreCompact), we have no live OV session to resume into. The model loses
continuity unless the most recent committed memories are surfaced.

Proposed flow:
1. Load state for the resumed `session_id`. If `ovSessionId` is non-null,
   no action: the session is still appendable.
2. Otherwise list `viking://session/<codex-session-id>/history/archive_*/`
   on the OV server, take the most recent.
3. Read its abstract (L0) / overview (L1).
4. Emit via `hookSpecificOutput.additionalContext` so codex injects the
   summary into the resumed turn.

Deferred because (a) it requires a new OV API call shape, (b) the failure
mode is acceptable in v0.3 (model just lacks continuity for one turn,
recovers via auto-recall), and (c) the core commit logic above must be
proven first.

## What changed vs v0.3.1

- `SessionStart` matcher widened from `"clear"` to `"clear|startup"` so the
  active-window heuristic runs on both /clear and /new (and zouk reset).
- `session-start-commit.mjs` switches commit logic from "all non-current"
  to active-window heuristic.
- Idle TTL sweep brought back, but only at the tail of
  `session-start-commit.mjs` (not every `Stop`). Default TTL 30 min.
- `auto-capture.mjs` Stop hook guards against post-compact transcript
  shrink (resets `capturedTurnCount` to 0 if `allTurns.length` < cached).
- All commit failure paths preserve state instead of clearing.
- All state writes go through tmpfile + rename for crash safety.

## Open questions / future work

- **Phase 2 resume context inject** (above).
- **MCP tool `openviking_commit_pending`**: explicit commit for the model
  to call, useful when user knows they're about to exit.
- **Subagent hook events**: kimicode has them, codex doesn't yet.
  When codex adds them, we should hook to keep subagent memory threads
  separate from main session.
- **Upstream `SessionEnd`**: rejected by OpenAI. If they reverse, idle
  TTL becomes redundant; replace with deterministic SessionEnd commit.

## Verified hook payload reference

```json
// SessionStart input (from codex-rs/hooks/schema/generated/session-start.command.input.schema.json)
{
  "session_id": "0193af...",
  "source": "startup" | "resume" | "clear",
  "cwd": "/path/to/cwd",
  "model": "gpt-5.5",
  "permission_mode": "default" | "acceptEdits" | "plan" | "dontAsk" | "bypassPermissions",
  "transcript_path": "/path/to/rollout.jsonl" | null,
  "hook_event_name": "SessionStart"
}

// Stop input
{
  "session_id": "0193af...",
  "turn_id": "turn-N",
  "transcript_path": "/path/to/rollout.jsonl",
  "last_assistant_message": "...",
  "stop_hook_active": false,
  "model": "gpt-5.5",
  "permission_mode": "default",
  "cwd": "/path/to/cwd",
  "hook_event_name": "Stop"
}

// PreCompact input
{
  "session_id": "0193af...",
  "transcript_path": "/path/to/rollout.jsonl",
  "trigger": "manual" | "auto",
  "cwd": "/path/to/cwd",
  "model": "gpt-5.5",
  "hook_event_name": "PreCompact"
}
```

Output schema for SessionStart / UserPromptSubmit supports
`hookSpecificOutput.additionalContext`. Stop / PreCompact only support
`{ continue, stopReason, suppressOutput, systemMessage }`; `{}` is a
valid no-op.
