Commit 4047338

✨ feat(app): add follow-imports health diagnostics to doctor and readiness check
- Add poll-catchup event tracking when notify mode safety poll consumes bytes
- Track cumulative poll-catchup counts and emit warning after threshold
- Persist follow-imports health snapshot to log directory on each report
- Expose last-known follow health through `doctor` command with stale detection
- Echo follow-imports health fields in `scripts/readiness-check` output
- Add WARN_FOLLOW_IMPORTS_POLL_CATCHUP and WARN_FOLLOW_IMPORTS_HEALTH_STALE codes
1 parent 9e8f5a5 commit 4047338

13 files changed

Lines changed: 836 additions & 37 deletions

README.md

Lines changed: 3 additions & 3 deletions
@@ -73,13 +73,13 @@ These commands are for operators, packagers, and CI.
 They are not MCP tools and are not the normal end-user interaction path.

 - `codex-mem doctor`
-Prints effective config plus runtime readiness and audit diagnostics.
+Prints effective config plus runtime readiness, audit diagnostics, and the last-known `follow-imports` watch-health snapshot when one has been written, including stale-snapshot detection for continuous follow mode.
 - `codex-mem doctor --json`
 Prints the same diagnostics in machine-readable JSON for automation or CI checks.
 - `codex-mem ingest-imports --source watcher_import [--input events.jsonl] [--json] [--continue-on-error] [--failed-output failed.jsonl] [--failed-manifest failed.json]`
 Imports newline-delimited watcher or relay note events into durable imported notes plus audit records, with optional partial-success handling plus retry-oriented failure exports.
 - `codex-mem follow-imports --source watcher_import --input events-a.jsonl [--input events-b.jsonl ...] [--state-file events-a.offset.json --state-file events-b.offset.json ...] [--watch-mode auto|notify|poll] [--poll-interval 5s] [--once] [--json]`
-Follows one or more watcher or relay JSONL files incrementally, prefers filesystem notifications with polling fallback by default, keeps one checkpoint per input, automatically retries watcher recovery in `auto` mode, and reports command-level watch state alongside per-input imported-note results.
+Follows one or more watcher or relay JSONL files incrementally, prefers filesystem notifications with polling fallback by default, keeps one checkpoint per input, automatically retries watcher recovery in `auto` mode, and reports command-level watch state plus poll-catchup/recovery events and warnings alongside per-input imported-note results.
 - `codex-mem migrate`
 Opens the configured SQLite database and applies embedded migrations.
 - `codex-mem serve`
@@ -151,7 +151,7 @@ See [onboarding-flows.md](docs/spec/appendices/onboarding-flows.md) for the full
 - MCP transport/tool availability

 Use `codex-mem doctor --json` when the output needs to be consumed by scripts.
-The combined readiness gate under `scripts/readiness-check` is for CI and maintainers, not end users.
+The combined readiness gate under `scripts/readiness-check` is for CI and maintainers, not end users. It now echoes the last-known `follow-imports` doctor fields as informational runtime summary lines for automation, without turning stale or degraded follow health into a hard startup/readiness failure by itself.

 For setup and integration failures, use the Go troubleshooting guide in [troubleshooting.md](docs/go/operator/troubleshooting.md).
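The README change above says `scripts/readiness-check` echoes the `follow-imports` doctor fields as flat summary lines. The commit page does not show the script itself, so the following is only a minimal sketch of how such flattening could work. It assumes the doctor JSON carries a `follow_imports` object (the other docs in this commit refer to `doctor.follow_imports`) and that each line uses a `key=value` form. The function name `flattenFollowImports` and the nested field names in the usage note are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// flattenFollowImports turns the follow_imports object of a doctor-style JSON
// document into sorted "doctor_follow_imports_<field>=<value>" lines.
// Assumptions: the key=value line format and the nested field names are
// illustrative and are not taken from the real readiness-check script.
func flattenFollowImports(doctorJSON []byte) ([]string, error) {
	var doc struct {
		FollowImports map[string]any `json:"follow_imports"`
	}
	if err := json.Unmarshal(doctorJSON, &doc); err != nil {
		return nil, fmt.Errorf("decode doctor JSON: %w", err)
	}

	// Sort the keys so the output is stable across runs.
	keys := make([]string, 0, len(doc.FollowImports))
	for k := range doc.FollowImports {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	lines := make([]string, 0, len(keys))
	for _, k := range keys {
		lines = append(lines, fmt.Sprintf("doctor_follow_imports_%s=%v", k, doc.FollowImports[k]))
	}
	return lines, nil
}
```

Calling `flattenFollowImports` on a document with a `follow_imports` object yields one line per field, and a document without that object yields no lines and no error, which matches the "informational, not a hard gate" behavior described above.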

docs/go/maintainer/development-tracker.md

Lines changed: 30 additions & 0 deletions
@@ -379,6 +379,36 @@ Current blockers:
 - In progress: none.
 - Blockers: none.
 - Next step: decide whether the next `follow-imports` slice should focus on explicitly detecting poll-caught dropped events during notify mode, or whether the current fallback-plus-recovery behavior is sufficient for operators.
+### 2026-03-16 Session Update
+
+- Completed: Added explicit notify safety-poll catchup observability to `follow-imports`. The runtime loop now distinguishes notify-event-triggered runs from poll-tick runs, and when notify mode remains active but a poll tick consumes appended bytes, the report emits a structured `watch_poll_catchup` event with consumed input and byte counts. This makes it visible when the polling safety net materially contributed to ingestion even though the watcher never fully fell back.
+- In progress: none.
+- Blockers: none.
+- Next step: decide whether the next import slice should escalate `watch_poll_catchup` repeated occurrences into stronger operator warnings/metrics, or whether the current event stream is enough.
+### 2026-03-16 Session Update
+
+- Completed: Escalated repeated notify safety-poll catchup into summary-level warnings. `follow-imports` now keeps cumulative `watch_poll_catchups` and `watch_poll_catchup_bytes` counters in runtime state, surfaces them on both single-input and aggregate reports, and emits `WARN_FOLLOW_IMPORTS_POLL_CATCHUP` once the same process has needed poll catchup at least three times. App coverage now verifies the counters, threshold warning, and text output fields.
+- In progress: none.
+- Blockers: none.
+- Next step: decide whether the next import slice should export these watch-health counters through `doctor` or another machine-readable diagnostics surface, or whether keeping them scoped to runtime follow reports is enough.
+### 2026-03-16 Session Update
+
+- Completed: Exposed last-known `follow-imports` watch health through `doctor`. Each emitted follow report now refreshes a `follow-imports.health.json` snapshot in the configured log directory, and `doctor` now reports whether that snapshot exists plus its last-known watch mode, fallback counts, poll-catchup counters, and follow-level warnings. App coverage now verifies both the empty-doctor case and the populated follow-health case.
+- In progress: none.
+- Blockers: none.
+- Next step: decide whether the next import slice should age or prune stale follow-health snapshots, or whether simple last-known state is sufficient for operators.
+### 2026-03-16 Session Update
+
+- Completed: Added stale-snapshot detection to the `doctor` follow-health view. Follow-health snapshots now persist whether they came from continuous mode and which poll interval they used, and `doctor` marks a snapshot stale when a continuous follow process has not refreshed it for roughly three poll intervals with a 30-second minimum freshness window. Stale snapshots now add `WARN_FOLLOW_IMPORTS_HEALTH_STALE`, and app coverage verifies both fresh and stale follow-health reporting.
+- In progress: none.
+- Blockers: none.
+- Next step: decide whether the next import slice should prune stale health files automatically, or whether operators should keep last-known stale state available until the next follow run overwrites it.
+### 2026-03-16 Session Update
+
+- Completed: Surfaced `doctor` follow-health into the broader `scripts/readiness-check` machine-readable summary without creating a second runtime source of truth. The readiness helper now echoes flat `doctor_follow_imports_*` lines for last-known follow status, staleness, watch mode, fallback/catchup counters, and warning codes straight from `doctor --json`; script-level tests cover both populated and missing follow-health cases; and maintainer/operator docs now call out that these lines are informational by default rather than a hard readiness gate.
+- In progress: none.
+- Blockers: none.
+- Next step: decide whether a later slice should add an explicit JSON summary mode for `scripts/readiness-check`, or whether the flat key/value output is enough for current automation consumers.

 ## Recommended Next Step
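The tracker entries above describe cumulative poll-catchup counters and a warning once the same process has needed catchup at least three times. The sketch below illustrates that bookkeeping under stated assumptions: the type `watchHealth`, its methods, and the constant names are hypothetical, and only the counter names, the threshold of three, and the warning code come from the text above.

```go
package main

// pollCatchupWarnThreshold mirrors the "at least three times" rule from the
// tracker entry; the constant name itself is an assumption.
const pollCatchupWarnThreshold = 3

// warnPollCatchup is the warning code named in the commit.
const warnPollCatchup = "WARN_FOLLOW_IMPORTS_POLL_CATCHUP"

// watchHealth is a hypothetical holder for the cumulative counters that the
// reports expose as watch_poll_catchups and watch_poll_catchup_bytes.
type watchHealth struct {
	WatchPollCatchups     int
	WatchPollCatchupBytes int64
}

// recordPollTick accounts for one poll tick that ran while notify mode was
// still active. It reports whether the tick counted as a catchup, which is
// the case only when the tick actually consumed appended bytes.
func (h *watchHealth) recordPollTick(consumedBytes int64) bool {
	if consumedBytes <= 0 {
		return false // idle tick: the safety poll found nothing to catch up on
	}
	h.WatchPollCatchups++
	h.WatchPollCatchupBytes += consumedBytes
	return true
}

// warnings returns the summary-level warning codes for the current counters.
func (h *watchHealth) warnings() []string {
	if h.WatchPollCatchups >= pollCatchupWarnThreshold {
		return []string{warnPollCatchup}
	}
	return nil
}
```

Keeping the counters on process-lifetime state, rather than per report, is what lets the warning fire on the third catchup even when each individual report only saw one.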

docs/go/maintainer/mcp-integration.md

Lines changed: 2 additions & 0 deletions
@@ -144,6 +144,8 @@ That combined check now covers:
 2. stdio MCP smoke test
 3. HTTP MCP smoke test

+The summary output from `scripts/readiness-check` also echoes the `doctor.follow_imports` fields as flat `doctor_follow_imports_*` lines so CI or local automation can inspect last-known runtime watch health from the existing sidecar without having to parse the full doctor JSON again. Those fields are informational by default and do not make the readiness helper fail on their own.
+
 ## Manual Client Checklist

 If you are wiring a real MCP client, confirm this order:

docs/go/operator/import-ingestion.md

Lines changed: 7 additions & 3 deletions
@@ -188,8 +188,8 @@ JSON mode returns the same summary plus per-line results, including the created
 When a line fails in `--continue-on-error` mode, that result entry includes a structured `error` payload instead.
 If `--failed-output` is set, the report also includes the resolved output path and how many failed lines were written there.
 If `--failed-manifest` is set, the report also includes the manifest path and how many failures were captured there.
-Single-input `follow-imports` reports the input path, checkpoint file, requested watch mode, active watch mode, fallback count, transition count, last fallback reason, any structured watch events since the previous emitted report, consumed offset, pending trailing bytes, whether the checkpoint was reset, the reset reason, truncation detection, and the nested batch report for whatever newly appended complete lines were imported during that poll.
-Multi-input `follow-imports` returns one aggregate report with command-level watch state, per-process watch events, total consumed and pending bytes, and one nested per-input report for each followed file.
+Single-input `follow-imports` reports the input path, checkpoint file, requested watch mode, active watch mode, fallback count, transition count, cumulative poll-catchup count and bytes, any warning summaries, any structured watch events since the previous emitted report, consumed offset, pending trailing bytes, whether the checkpoint was reset, the reset reason, truncation detection, and the nested batch report for whatever newly appended complete lines were imported during that poll.
+Multi-input `follow-imports` returns one aggregate report with command-level watch state, cumulative poll-catchup counters, warning summaries, per-process watch events, total consumed and pending bytes, and one nested per-input report for each followed file.

 ## Operational Notes

@@ -200,8 +200,12 @@ Multi-input `follow-imports` returns one aggregate report with command-level wat
 - In `auto` mode, if watcher setup fails or a running watcher later closes/errors, `follow-imports` falls back to polling and keeps retrying watcher setup on later poll intervals. When watcher setup succeeds again, the process switches back to notify mode instead of staying degraded forever.
 - In `notify` mode, watcher setup or runtime failures stop the command instead of silently switching to polling.
 - The follow-mode report now exposes both the requested watch mode and the currently active mode, so operators can tell when `auto` has fallen back to polling and how many fallbacks have happened in the current process.
-- Follow-mode reports now also emit structured watch events when the active mode changes, a fallback occurs, or auto mode successfully recovers from polling back to notify. In JSON mode these appear under `watch_events`; in text mode they are flattened as `watch_event_<n>_*` lines.
+- Follow-mode reports now also emit structured watch events when the active mode changes, a fallback occurs, auto mode successfully recovers from polling back to notify, or notify mode's safety poll is the thing that actually catches newly appended bytes. In JSON mode these appear under `watch_events`; in text mode they are flattened as `watch_event_<n>_*` lines.
 - A watch-state transition or fallback now forces one emitted `follow-imports` report even when the ingestion pass itself is otherwise idle, so long-lived operators can observe notify activation and fallback transitions without waiting for the next imported batch.
+- When a notify-mode safety poll catches appended bytes, `follow-imports` records a `watch_poll_catchup` event with consumed input and byte counts. Treat that as evidence that the polling safety net was materially useful, not necessarily proof that the platform dropped an fsnotify event.
+- `follow-imports` also keeps cumulative `watch_poll_catchups` and `watch_poll_catchup_bytes` counters for the lifetime of the process. Once poll catchup happens at least three times in the same process, the report adds a `WARN_FOLLOW_IMPORTS_POLL_CATCHUP` warning so operators and automation can treat notify mode as degraded even if it never fully falls back.
+- Each emitted `follow-imports` report also refreshes a last-known health sidecar under the normal log directory. `codex-mem doctor` reads that snapshot so operators can inspect the most recent follow-mode watch health even after the long-lived process has already exited.
+- For continuous follow mode, `doctor` now marks that sidecar as stale when it has not been refreshed for roughly three poll intervals, with a minimum freshness window of 30 seconds. Stale follow health adds `WARN_FOLLOW_IMPORTS_HEALTH_STALE` so operators can distinguish a healthy last-known state from an old snapshot left behind by a stopped process.
 - When multi-input follow mode shares `--failed-output` or `--failed-manifest` base paths, `codex-mem` derives per-input file names before adding the byte-range suffix so retry artifacts from different inputs do not overwrite each other.
 - Each event uses the same imported-note workflow as `memory_save_imported_note`.
 - Existing explicit memory wins over weaker imported duplicates in the same project.

docs/go/operator/release-readiness.md

Lines changed: 2 additions & 0 deletions
@@ -80,6 +80,8 @@ Confirm:
 - `exclusion_audit_ready=true`
 - `mcp_tool_count=11`

+If your deployment uses `follow-imports`, also inspect the echoed `doctor_follow_imports_*` lines from `go run ./scripts/readiness-check`. Those lines surface the last-known runtime watch-health snapshot from `doctor` for automation, but they remain informational unless your own release gate chooses to fail on stale or degraded follow state.
+
 ### 2. Test Suite

 Run:
