✨ feat(app): add follow-imports health diagnostics to doctor and readiness check
- Add poll-catchup event tracking when notify mode safety poll consumes bytes
- Track cumulative poll-catchup counts and emit warning after threshold
- Persist follow-imports health snapshot to log directory on each report
- Expose last-known follow health through `doctor` command with stale detection
- Echo follow-imports health fields in `scripts/readiness-check` output
- Add WARN_FOLLOW_IMPORTS_POLL_CATCHUP and WARN_FOLLOW_IMPORTS_HEALTH_STALE codes
README.md (3 additions & 3 deletions)
@@ -73,13 +73,13 @@ These commands are for operators, packagers, and CI.
 They are not MCP tools and are not the normal end-user interaction path.

 - `codex-mem doctor`
-Prints effective config plus runtime readiness and audit diagnostics.
+Prints effective config plus runtime readiness, audit diagnostics, and the last-known `follow-imports` watch-health snapshot when one has been written, including stale-snapshot detection for continuous follow mode.
 - `codex-mem doctor --json`
 Prints the same diagnostics in machine-readable JSON for automation or CI checks.
 Imports newline-delimited watcher or relay note events into durable imported notes plus audit records, with optional partial-success handling plus retry-oriented failure exports.
-Follows one or more watcher or relay JSONL files incrementally, prefers filesystem notifications with polling fallback by default, keeps one checkpoint per input, automatically retries watcher recovery in `auto` mode, and reports command-level watch state alongside per-input imported-note results.
+Follows one or more watcher or relay JSONL files incrementally, prefers filesystem notifications with polling fallback by default, keeps one checkpoint per input, automatically retries watcher recovery in `auto` mode, and reports command-level watch state plus poll-catchup/recovery events and warnings alongside per-input imported-note results.
 - `codex-mem migrate`
 Opens the configured SQLite database and applies embedded migrations.
 - `codex-mem serve`
@@ -151,7 +151,7 @@ See [onboarding-flows.md](docs/spec/appendices/onboarding-flows.md) for the full
 - MCP transport/tool availability

 Use `codex-mem doctor --json` when the output needs to be consumed by scripts.
-The combined readiness gate under `scripts/readiness-check` is for CI and maintainers, not end users.
+The combined readiness gate under `scripts/readiness-check` is for CI and maintainers, not end users. It now echoes the last-known `follow-imports` doctor fields as informational runtime summary lines for automation, without turning stale or degraded follow health into a hard startup/readiness failure by itself.

 For setup and integration failures, use the Go troubleshooting guide in [troubleshooting.md](docs/go/operator/troubleshooting.md).
docs/go/maintainer/development-tracker.md (30 additions & 0 deletions)
@@ -379,6 +379,36 @@ Current blockers:
 - In progress: none.
 - Blockers: none.
 - Next step: decide whether the next `follow-imports` slice should focus on explicitly detecting poll-caught dropped events during notify mode, or whether the current fallback-plus-recovery behavior is sufficient for operators.
+### 2026-03-16 Session Update
+
+- Completed: Added explicit notify safety-poll catchup observability to `follow-imports`. The runtime loop now distinguishes notify-event-triggered runs from poll-tick runs, and when notify mode remains active but a poll tick consumes appended bytes, the report emits a structured `watch_poll_catchup` event with consumed input and byte counts. This makes it visible when the polling safety net materially contributed to ingestion even though the watcher never fully fell back.
+- In progress: none.
+- Blockers: none.
+- Next step: decide whether the next import slice should escalate `watch_poll_catchup` repeated occurrences into stronger operator warnings/metrics, or whether the current event stream is enough.
+### 2026-03-16 Session Update
+
+- Completed: Escalated repeated notify safety-poll catchup into summary-level warnings. `follow-imports` now keeps cumulative `watch_poll_catchups` and `watch_poll_catchup_bytes` counters in runtime state, surfaces them on both single-input and aggregate reports, and emits `WARN_FOLLOW_IMPORTS_POLL_CATCHUP` once the same process has needed poll catchup at least three times. App coverage now verifies the counters, threshold warning, and text output fields.
+- In progress: none.
+- Blockers: none.
+- Next step: decide whether the next import slice should export these watch-health counters through `doctor` or another machine-readable diagnostics surface, or whether keeping them scoped to runtime follow reports is enough.
+### 2026-03-16 Session Update
+
+- Completed: Exposed last-known `follow-imports` watch health through `doctor`. Each emitted follow report now refreshes a `follow-imports.health.json` snapshot in the configured log directory, and `doctor` now reports whether that snapshot exists plus its last-known watch mode, fallback counts, poll-catchup counters, and follow-level warnings. App coverage now verifies both the empty-doctor case and the populated follow-health case.
+- In progress: none.
+- Blockers: none.
+- Next step: decide whether the next import slice should age or prune stale follow-health snapshots, or whether simple last-known state is sufficient for operators.
+### 2026-03-16 Session Update
+
+- Completed: Added stale-snapshot detection to the `doctor` follow-health view. Follow-health snapshots now persist whether they came from continuous mode and which poll interval they used, and `doctor` marks a snapshot stale when a continuous follow process has not refreshed it for roughly three poll intervals with a 30-second minimum freshness window. Stale snapshots now add `WARN_FOLLOW_IMPORTS_HEALTH_STALE`, and app coverage verifies both fresh and stale follow-health reporting.
+- In progress: none.
+- Blockers: none.
+- Next step: decide whether the next import slice should prune stale health files automatically, or whether operators should keep last-known stale state available until the next follow run overwrites it.
+### 2026-03-16 Session Update
+
+- Completed: Surfaced `doctor` follow-health into the broader `scripts/readiness-check` machine-readable summary without creating a second runtime source of truth. The readiness helper now echoes flat `doctor_follow_imports_*` lines for last-known follow status, staleness, watch mode, fallback/catchup counters, and warning codes straight from `doctor --json`; script-level tests cover both populated and missing follow-health cases; and maintainer/operator docs now call out that these lines are informational by default rather than a hard readiness gate.
+- In progress: none.
+- Blockers: none.
+- Next step: decide whether a later slice should add an explicit JSON summary mode for `scripts/readiness-check`, or whether the flat key/value output is enough for current automation consumers.
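The poll-catchup escalation described in the session updates above (cumulative `watch_poll_catchups` and `watch_poll_catchup_bytes` counters, plus `WARN_FOLLOW_IMPORTS_POLL_CATCHUP` after at least three catchups in one process) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `watchHealth` type and its method names are hypothetical, and ignoring zero-byte poll ticks is an assumption drawn from the phrase "a poll tick consumes appended bytes".

```go
package main

// pollCatchupWarnThreshold mirrors the "at least three times" rule from the
// tracker entry. The constant name itself is hypothetical.
const pollCatchupWarnThreshold = 3

// warnPollCatchup is the warning code named in the commit message.
const warnPollCatchup = "WARN_FOLLOW_IMPORTS_POLL_CATCHUP"

// watchHealth is a hypothetical stand-in for the follow-imports runtime state.
type watchHealth struct {
	PollCatchups     int   // how many poll ticks consumed bytes while notify mode stayed active
	PollCatchupBytes int64 // total bytes consumed by those poll ticks
}

// recordPollCatchup accounts for one notify-mode poll tick. Ticks that
// consumed nothing are ignored (assumption: idle ticks are not catchups).
func (h *watchHealth) recordPollCatchup(consumedBytes int64) {
	if consumedBytes <= 0 {
		return
	}
	h.PollCatchups++
	h.PollCatchupBytes += consumedBytes
}

// warnings returns the warning codes implied by the current counters.
func (h *watchHealth) warnings() []string {
	if h.PollCatchups >= pollCatchupWarnThreshold {
		return []string{warnPollCatchup}
	}
	return nil
}
```

Because the counters live in process state, restarting the follow process resets them, which matches the tracker's wording that the warning fires once "the same process" has needed catchup three times.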
docs/go/maintainer/mcp-integration.md (2 additions & 0 deletions)
@@ -144,6 +144,8 @@ That combined check now covers:
 2. stdio MCP smoke test
 3. HTTP MCP smoke test

+The summary output from `scripts/readiness-check` also echoes the `doctor.follow_imports` fields as flat `doctor_follow_imports_*` lines so CI or local automation can inspect last-known runtime watch health from the existing sidecar without having to parse the full doctor JSON again. Those fields are informational by default and do not make the readiness helper fail on their own.
+
 ## Manual Client Checklist

 If you are wiring a real MCP client, confirm this order:
docs/go/operator/import-ingestion.md (7 additions & 3 deletions)
@@ -188,8 +188,8 @@ JSON mode returns the same summary plus per-line results, including the created
 When a line fails in `--continue-on-error` mode, that result entry includes a structured `error` payload instead.
 If `--failed-output` is set, the report also includes the resolved output path and how many failed lines were written there.
 If `--failed-manifest` is set, the report also includes the manifest path and how many failures were captured there.
-Single-input `follow-imports` reports the input path, checkpoint file, requested watch mode, active watch mode, fallback count, transition count, last fallback reason, any structured watch events since the previous emitted report, consumed offset, pending trailing bytes, whether the checkpoint was reset, the reset reason, truncation detection, and the nested batch report for whatever newly appended complete lines were imported during that poll.
-Multi-input `follow-imports` returns one aggregate report with command-level watch state, per-process watch events, total consumed and pending bytes, and one nested per-input report for each followed file.
+Single-input `follow-imports` reports the input path, checkpoint file, requested watch mode, active watch mode, fallback count, transition count, cumulative poll-catchup count and bytes, any warning summaries, any structured watch events since the previous emitted report, consumed offset, pending trailing bytes, whether the checkpoint was reset, the reset reason, truncation detection, and the nested batch report for whatever newly appended complete lines were imported during that poll.
+Multi-input `follow-imports` returns one aggregate report with command-level watch state, cumulative poll-catchup counters, warning summaries, per-process watch events, total consumed and pending bytes, and one nested per-input report for each followed file.

 ## Operational Notes

@@ -200,8 +200,12 @@ Multi-input `follow-imports` returns one aggregate report with command-level wat
 - In `auto` mode, if watcher setup fails or a running watcher later closes/errors, `follow-imports` falls back to polling and keeps retrying watcher setup on later poll intervals. When watcher setup succeeds again, the process switches back to notify mode instead of staying degraded forever.
 - In `notify` mode, watcher setup or runtime failures stop the command instead of silently switching to polling.
 - The follow-mode report now exposes both the requested watch mode and the currently active mode, so operators can tell when `auto` has fallen back to polling and how many fallbacks have happened in the current process.
-- Follow-mode reports now also emit structured watch events when the active mode changes, a fallback occurs, or auto mode successfully recovers from polling back to notify. In JSON mode these appear under `watch_events`; in text mode they are flattened as `watch_event_<n>_*` lines.
+- Follow-mode reports now also emit structured watch events when the active mode changes, a fallback occurs, auto mode successfully recovers from polling back to notify, or notify mode's safety poll is the thing that actually catches newly appended bytes. In JSON mode these appear under `watch_events`; in text mode they are flattened as `watch_event_<n>_*` lines.
 - A watch-state transition or fallback now forces one emitted `follow-imports` report even when the ingestion pass itself is otherwise idle, so long-lived operators can observe notify activation and fallback transitions without waiting for the next imported batch.
+- When a notify-mode safety poll catches appended bytes, `follow-imports` records a `watch_poll_catchup` event with consumed input and byte counts. Treat that as evidence that the polling safety net was materially useful, not necessarily proof that the platform dropped an fsnotify event.
+- `follow-imports` also keeps cumulative `watch_poll_catchups` and `watch_poll_catchup_bytes` counters for the lifetime of the process. Once poll catchup happens at least three times in the same process, the report adds a `WARN_FOLLOW_IMPORTS_POLL_CATCHUP` warning so operators and automation can treat notify mode as degraded even if it never fully falls back.
+- Each emitted `follow-imports` report also refreshes a last-known health sidecar under the normal log directory. `codex-mem doctor` reads that snapshot so operators can inspect the most recent follow-mode watch health even after the long-lived process has already exited.
+- For continuous follow mode, `doctor` now marks that sidecar as stale when it has not been refreshed for roughly three poll intervals, with a minimum freshness window of 30 seconds. Stale follow health adds `WARN_FOLLOW_IMPORTS_HEALTH_STALE` so operators can distinguish a healthy last-known state from an old snapshot left behind by a stopped process.
 - When multi-input follow mode shares `--failed-output` or `--failed-manifest` base paths, `codex-mem` derives per-input file names before adding the byte-range suffix so retry artifacts from different inputs do not overwrite each other.
 - Each event uses the same imported-note workflow as `memory_save_imported_note`.
 - Existing explicit memory wins over weaker imported duplicates in the same project.
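The stale-sidecar rule in the operational notes above (roughly three poll intervals, with a 30-second minimum freshness window, for continuous follow mode only) reduces to a small calculation. The sketch below is one plausible reading, not the actual `doctor` implementation: the function and constant names are hypothetical, the strict `>` comparison at the boundary is an assumption (the docs only say "roughly"), and treating non-continuous snapshots as never stale is inferred from the rule being stated for continuous mode.

```go
package main

import "time"

const (
	// staleIntervalMultiple is the "roughly three poll intervals" factor.
	staleIntervalMultiple = 3
	// minFreshnessWindow is the documented 30-second floor.
	minFreshnessWindow = 30 * time.Second
)

// freshnessWindow returns how long a snapshot counts as fresh for a given
// poll interval: three intervals, but never less than 30 seconds.
func freshnessWindow(pollInterval time.Duration) time.Duration {
	window := staleIntervalMultiple * pollInterval
	if window < minFreshnessWindow {
		return minFreshnessWindow
	}
	return window
}

// snapshotIsStale reports whether a snapshot of the given age should be
// flagged. Only snapshots written by a continuous follow process can go
// stale (assumption: one-shot snapshots are simply "last known").
func snapshotIsStale(continuous bool, pollInterval, age time.Duration) bool {
	if !continuous {
		return false
	}
	return age > freshnessWindow(pollInterval)
}
```

Under these assumptions, a 5-second poll interval still gets the full 30-second window, while a 20-second interval yields a 60-second window. The floor keeps fast pollers from flapping between fresh and stale on brief scheduling delays.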
docs/go/operator/release-readiness.md (2 additions & 0 deletions)
@@ -80,6 +80,8 @@ Confirm:
 - `exclusion_audit_ready=true`
 - `mcp_tool_count=11`

+If your deployment uses `follow-imports`, also inspect the echoed `doctor_follow_imports_*` lines from `go run ./scripts/readiness-check`. Those lines surface the last-known runtime watch-health snapshot from `doctor` for automation, but they remain informational unless your own release gate chooses to fail on stale or degraded follow state.
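A release gate that chooses to act on the echoed `doctor_follow_imports_*` lines could filter them out of the readiness output along these lines. This is a hedged sketch only: it assumes the lines use the same `key=value` shape as `exclusion_audit_ready=true`, and the suffixes `stale` and `warning_codes` are hypothetical placeholders, since the exact field names are not listed here. Check the real script output before relying on them.

```go
package main

import (
	"bufio"
	"strings"
)

// followPrefix is the documented prefix of the echoed follow-health lines.
const followPrefix = "doctor_follow_imports_"

// followHealthFields collects every "doctor_follow_imports_<name>=<value>"
// line from readiness-check output into a map keyed by <name>.
// Assumption: the output is flat key=value text, one pair per line.
func followHealthFields(output string) map[string]string {
	fields := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if !strings.HasPrefix(line, followPrefix) {
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			continue // not a key=value line; skip rather than guess
		}
		fields[strings.TrimPrefix(key, followPrefix)] = value
	}
	return fields
}

// shouldFailGate is an opt-in stricter policy: fail when the snapshot is
// stale or any warning code is present. Both field names are hypothetical.
func shouldFailGate(fields map[string]string) bool {
	return fields["stale"] == "true" || fields["warning_codes"] != ""
}
```

Keeping the policy in the caller's own gate, rather than in the readiness helper, matches the documented default that these lines are informational.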