Commit 504c1f3

✨ feat(app): add partial-success mode and failure exports to ingest-imports
- Add --continue-on-error for partial-success batch processing
- Add --failed-output for exporting failed lines for retry
- Add --failed-manifest for structured JSON retry manifest
- Extract App.IngestImports for reusable embedded integration
- Update CLI and operator documentation
1 parent 7708622 commit 504c1f3

10 files changed

Lines changed: 1756 additions & 101 deletions

README.md

Lines changed: 6 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,7 @@ It stores structured notes and handoffs in SQLite, restores continuity across re
1010
- `serve-http` runs a native MCP HTTP server for remote or private deployment.
1111
- `doctor` reports config, database readiness, migration status, provenance coverage, and MCP tool availability.
1212
- AGENTS template installation is implemented for global and project workflows.
13-
- one-shot watcher/relay batch ingestion is available through `ingest-imports`.
13+
- watcher/relay import ingestion is available as one-shot batches through `ingest-imports` and as a checkpointed long-lived adapter through `follow-imports`.
1414

1515
Normative product docs live in [docs/spec/README.md](docs/spec/README.md).
1616
Go implementation docs now live under [docs/go/README.md](docs/go/README.md), grouped into user, operator, and maintainer directories.
@@ -24,7 +24,7 @@ Use the docs by audience:
2424
- [Operator docs](docs/go/operator/README.md)
2525
Client registration, deployment/readiness, packaging, and troubleshooting.
2626
- [Import ingestion guide](docs/go/operator/import-ingestion.md)
27-
JSONL batch ingestion for watcher and relay artifacts through `ingest-imports`.
27+
JSONL batch and checkpointed follow-mode ingestion for watcher and relay artifacts.
2828
- [Maintainer docs](docs/go/maintainer/README.md)
2929
Source-tree MCP integration, implementation planning, and development tracking.
3030

@@ -76,8 +76,10 @@ They are not MCP tools and are not the normal end-user interaction path.
7676
Prints effective config plus runtime readiness and audit diagnostics.
7777
- `codex-mem doctor --json`
7878
Prints the same diagnostics in machine-readable JSON for automation or CI checks.
79-
- `codex-mem ingest-imports --source watcher_import [--input events.jsonl] [--json]`
80-
Imports newline-delimited watcher or relay note events into durable imported notes plus audit records.
79+
- `codex-mem ingest-imports --source watcher_import [--input events.jsonl] [--json] [--continue-on-error] [--failed-output failed.jsonl] [--failed-manifest failed.json]`
80+
Imports newline-delimited watcher or relay note events into durable imported notes plus audit records, with optional partial-success handling plus retry-oriented failure exports.
81+
- `codex-mem follow-imports --source watcher_import --input events.jsonl [--state-file events.offset.json] [--poll-interval 5s] [--once] [--json]`
82+
Follows a watcher or relay JSONL file incrementally, checkpoints the last consumed offset, and reuses the same imported-note workflow for newly appended complete lines.
8183
- `codex-mem migrate`
8284
Opens the configured SQLite database and applies embedded migrations.
8385
- `codex-mem serve`

docs/go/maintainer/development-tracker.md

Lines changed: 30 additions & 0 deletions
@@ -313,6 +313,36 @@ Current blockers:
313313
- In progress: none.
314314
- Blockers: none.
315315
- Next step: decide whether `ingest-imports` should remain the main watcher/relay bridge for now or whether a more direct long-lived integration path is worth adding later.
316+
### 2026-03-16 Session Update
317+
318+
- Completed: Added `ingest-imports --continue-on-error` so watcher/relay batches can keep importing valid lines while collecting per-line decode/write failures in the text/JSON report. Default behavior remains fail-fast for compatibility, but partial-success mode now reports `status`, attempted/failed counts, and structured line errors; app coverage verifies partial success and the all-failed path.
319+
- In progress: none.
320+
- Blockers: none.
321+
- Next step: decide whether partial-success mode is enough for watcher/relay operators for now, or whether they also need richer retry/export behavior for failed lines.
322+
### 2026-03-16 Session Update
323+
324+
- Completed: Added `ingest-imports --failed-output <path>` for `--continue-on-error` batches so failed raw JSONL lines can be exported unchanged for later replay. The CLI report now includes the resolved failed-output path plus written count, and app coverage verifies both partial-success export and all-failed export behavior.
325+
- In progress: none.
326+
- Blockers: none.
327+
- Next step: decide whether failed-line export is enough for operators or whether the next slice should add a richer retry manifest with error metadata alongside the raw replay file.
328+
### 2026-03-16 Session Update
329+
330+
- Completed: Added `ingest-imports --failed-manifest <path>` so `--continue-on-error` batches can emit a JSON retry manifest with line numbers, error payloads, raw failed lines, and failed-output line numbers. The main report now surfaces the manifest path/count, and app coverage verifies manifest validation plus partial/all-failed export behavior.
331+
- In progress: none.
332+
- Blockers: none.
333+
- Next step: decide whether the operator path is now sufficient, or whether the next import slice should focus on a more direct watcher/relay integration instead of more CLI/reporting polish.
334+
### 2026-03-16 Session Update
335+
336+
- Completed: Extracted the import batch workflow into a reusable app-level entrypoint `(*App).IngestImports(...)` so future in-process watcher/relay integrations can reuse the same scope resolution, session creation, imported-note materialization, and failure-export behavior without shelling out to the CLI. The CLI command now delegates to that method, and app coverage verifies the embedded path directly.
337+
- In progress: none.
338+
- Blockers: none.
339+
- Next step: decide whether to build an actual in-tree watcher/relay adapter on top of `App.IngestImports`, or stop here and treat the reusable app method as sufficient integration scaffolding for now.
340+
### 2026-03-16 Session Update
341+
342+
- Completed: Added `follow-imports` as a checkpointed long-lived adapter on top of `App.IngestImports(...)`. The new command polls a JSONL file for newly appended complete lines, persists byte-offset state in a sidecar checkpoint file, resets cleanly on truncation, and derives per-batch failed-output / failed-manifest paths so operator retry artifacts are not overwritten.
343+
- In progress: none.
344+
- Blockers: none.
345+
- Next step: decide whether polling-based follow mode is sufficient for watcher/relay integration for now, or whether a later slice should add native filesystem notifications, rotation metadata, or multi-input fan-in.
316346

317347
## Recommended Next Step
318348

docs/go/maintainer/implementation-plan.md

Lines changed: 1 addition & 0 deletions
@@ -120,6 +120,7 @@ Responsibilities:
120120
- initialize services
121121
- register MCP tools
122122
- start server mode
123+
- expose reusable app-level workflows such as import ingestion for future in-process integrations
123124

124125
### `internal/config`
125126

docs/go/operator/README.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ Start here:
77
- [Client Examples](./client-examples.md)
88
Real MCP client registration examples for local stdio and remote HTTP.
99
- [Import Ingestion](./import-ingestion.md)
10-
JSONL batch ingestion for watcher or relay artifacts through `ingest-imports`.
10+
JSONL batch ingestion through `ingest-imports` plus checkpointed follow-mode ingestion through `follow-imports`.
1111
- [Release Readiness](./release-readiness.md)
1212
Packaging, readiness, and release checklist.
1313
- [Troubleshooting](./troubleshooting.md)

docs/go/operator/import-ingestion.md

Lines changed: 67 additions & 4 deletions
@@ -2,7 +2,7 @@
22

33
## Purpose
44

5-
This document explains how operators can use `codex-mem ingest-imports` to turn watcher or relay batches into durable imported notes plus import audit records.
5+
This document explains how operators can use `codex-mem ingest-imports` for one-shot batches and `codex-mem follow-imports` for long-lived incremental consumption of watcher or relay JSONL feeds.
66

77
Audience:
88

@@ -12,6 +12,7 @@ Audience:
1212
Use this when:
1313

1414
- you need a one-shot batch bridge into the imported-note workflow
15+
- you need a checkpointed long-lived bridge for a growing JSONL file
1516
- your upstream process can emit newline-delimited JSON events
1617

1718
Do not use this for:
@@ -21,6 +22,9 @@ Do not use this for:
2122

2223
## Command Shape
2324

25+
Use `ingest-imports` when you already have a bounded batch to replay.
26+
Use `follow-imports` when another process keeps appending to the same JSONL file and you want `codex-mem` to checkpoint progress between polling passes.
27+
2428
Minimal stdin example:
2529

2630
```powershell
@@ -33,12 +37,43 @@ Read from a file and print JSON:
3337
codex-mem.exe ingest-imports --source relay_import --input .\relay-events.jsonl --json
3438
```
3539

40+
Continue past bad lines and keep successful imports:
41+
42+
```powershell
43+
codex-mem.exe ingest-imports --source watcher_import --input .\events.jsonl --continue-on-error --json
44+
```
45+
46+
Export failed lines for retry after the batch finishes:
47+
48+
```powershell
49+
codex-mem.exe ingest-imports --source watcher_import --input .\events.jsonl --continue-on-error --failed-output .\failed-events.jsonl --json
50+
```
51+
52+
Export a machine-readable retry manifest alongside the raw failed lines:
53+
54+
```powershell
55+
codex-mem.exe ingest-imports --source watcher_import --input .\events.jsonl --continue-on-error --failed-output .\failed-events.jsonl --failed-manifest .\failed-events.json --json
56+
```
57+
58+
Follow a growing JSONL file once and checkpoint the consumed offset:
59+
60+
```powershell
61+
codex-mem.exe follow-imports --source watcher_import --input .\events.jsonl --once --json
62+
```
63+
64+
Run as a long-lived poller with an explicit checkpoint file:
65+
66+
```powershell
67+
codex-mem.exe follow-imports --source relay_import --input .\relay-events.jsonl --state-file .\relay-events.offset.json --poll-interval 10s
68+
```
69+
3670
Useful flags:
3771

3872
- `--source watcher_import|relay_import`
39-
Required. Declares the provenance source for every event in the batch.
73+
Required. Declares the provenance source for every event in the input stream.
4074
- `--input <path>`
41-
Optional. Reads JSONL from a file instead of stdin.
75+
Optional for `ingest-imports`. Reads JSONL from a file instead of stdin.
76+
Required for `follow-imports`.
4277
- `--cwd <path>`
4378
Optional. Resolves scope from a specific workspace root.
4479
- `--branch-name <name>`
@@ -49,6 +84,20 @@ Useful flags:
4984
Optional. Overrides the default ingestion session task summary.
5085
- `--json`
5186
Optional. Prints a structured report instead of line-oriented text output.
87+
- `--continue-on-error`
88+
`ingest-imports` only. Keeps scanning after per-line decode or import failures and returns a partial-success report when at least one event succeeds.
89+
- `--failed-output <path>`
90+
Optional. For `ingest-imports`, requires `--continue-on-error` and writes the original failed input lines to a JSONL file for manual fix-up or replay.
91+
For `follow-imports`, each polling batch derives a range-suffixed file from the provided base path so earlier failures are not overwritten.
92+
- `--failed-manifest <path>`
93+
Optional. For `ingest-imports`, requires `--continue-on-error` and writes a JSON manifest with per-line error metadata and raw failed input.
94+
For `follow-imports`, each polling batch derives a range-suffixed manifest path from the provided base path.
95+
- `--state-file <path>`
96+
`follow-imports` only. Optional. Stores the consumed byte offset checkpoint. Defaults to `<input>.offset.json`.
97+
- `--poll-interval <duration>`
98+
`follow-imports` only. Optional. Controls how often the input file is polled for appended complete lines. Defaults to `5s`.
99+
- `--once`
100+
`follow-imports` only. Optional. Runs one poll/ingest pass and exits instead of staying in the polling loop.
52101

53102
## Event Schema
54103

@@ -103,11 +152,15 @@ Text mode prints a compact summary such as:
103152

104153
```text
105154
ingest imports ok
155+
status=ok
106156
source=watcher_import
107157
input=stdin
108158
session_id=sess_20260316_001
109159
resolved_by=repo_remote
160+
continue_on_error=false
161+
attempted=2
110162
processed=2
163+
failed=0
111164
materialized=1
112165
suppressed=1
113166
note_deduplicated=0
@@ -116,10 +169,20 @@ warnings=1
116169
```
117170

118171
JSON mode returns the same summary plus per-line results, including the created or reused `note_id` and `import_id`.
172+
When a line fails in `--continue-on-error` mode, that result entry includes a structured `error` payload instead.
173+
If `--failed-output` is set, the report also includes the resolved output path and how many failed lines were written there.
174+
If `--failed-manifest` is set, the report also includes the manifest path and how many failures were captured there.
175+
`follow-imports` reports the input path, checkpoint file, consumed offset, pending trailing bytes, truncation detection, and the nested batch report for whatever newly appended complete lines were imported during that poll.
119176

120177
## Operational Notes
121178

122179
- `ingest-imports` starts one fresh session for the whole batch after resolving scope.
180+
- `follow-imports` starts one fresh session per consumed polling batch, not one session for the lifetime of the process.
123181
- Each event uses the same imported-note workflow as `memory_save_imported_note`.
124182
- Existing explicit memory wins over weaker imported duplicates in the same project.
125-
- The current implementation is fail-fast: the first invalid line stops the batch and returns an error.
183+
- The default implementation is fail-fast: the first invalid line stops the batch and returns an error.
184+
- `--continue-on-error` preserves successful lines, reports per-line failures, and still exits with an error if nothing in the batch imports successfully.
185+
- `--failed-output` writes the original failed JSONL lines without wrapping them, so operators can edit that file and replay it through the same command later.
186+
- `--failed-manifest` writes a structured JSON sidecar with line numbers, error codes, error messages, raw failed lines, and failed-output line numbers when available.
187+
- `follow-imports` only consumes complete newline-terminated lines. A partially written trailing line is left in place until a later poll sees its terminating newline.
188+
- If the followed input file is truncated or rotated to a smaller size, `follow-imports` resets its checkpoint to byte offset `0` and continues from the start of the new file contents.
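
The last two operational notes (consume only complete newline-terminated lines, and reset to offset `0` when the file shrinks) can be illustrated with a minimal Go sketch. This is not the code from this commit: `nextBatch` and its return values are hypothetical names chosen for illustration, and the sketch works on an in-memory byte slice rather than a real file and checkpoint sidecar.

```go
package main

import (
	"bytes"
	"fmt"
)

// nextBatch is a hypothetical helper, not the codex-mem implementation.
// data is the current content of the followed file and offset is the
// previously checkpointed byte offset. It returns the complete lines that
// are ready to ingest, the offset to checkpoint next, the number of
// trailing bytes that do not yet end in a newline, and whether the file
// was found to be smaller than the checkpoint (truncation or rotation).
func nextBatch(data []byte, offset int64) (lines []string, newOffset int64, pending int, truncated bool) {
	if offset > int64(len(data)) {
		// The file shrank below the checkpoint: start over from byte 0.
		offset = 0
		truncated = true
	}
	rest := data[offset:]
	end := bytes.LastIndexByte(rest, '\n')
	if end < 0 {
		// No terminated line yet: leave everything pending.
		return nil, offset, len(rest), truncated
	}
	// Split everything before the last newline into individual lines.
	for _, line := range bytes.Split(rest[:end], []byte("\n")) {
		// Tolerate CRLF line endings written by Windows producers.
		lines = append(lines, string(bytes.TrimRight(line, "\r")))
	}
	newOffset = offset + int64(end) + 1
	pending = len(rest) - end - 1
	return lines, newOffset, pending, truncated
}

func main() {
	// Two complete lines followed by a partially written third line.
	data := []byte("{\"a\":1}\n{\"b\":2}\n{\"c\":")
	lines, off, pending, truncated := nextBatch(data, 0)
	fmt.Printf("lines=%d offset=%d pending=%d truncated=%v\n", len(lines), off, pending, truncated)

	// The producer finishes the third line; only that line is new.
	data = append(data, []byte("3}\n")...)
	lines, off, pending, truncated = nextBatch(data, off)
	fmt.Printf("lines=%d offset=%d pending=%d truncated=%v\n", len(lines), off, pending, truncated)

	// The file is replaced by a shorter one; the stale offset is reset.
	lines, off, pending, truncated = nextBatch([]byte("{\"d\":4}\n"), off)
	fmt.Printf("lines=%d offset=%d pending=%d truncated=%v\n", len(lines), off, pending, truncated)
}
```

In this sketch the checkpoint only advances past the last newline seen, so a half-written trailing line is re-read whole on a later poll instead of being ingested in pieces. A size-based check like this one cannot detect a rotation to a file of equal or larger size, which is consistent with the tracker entry listing rotation metadata as possible later work.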
