✨ feat(imports): enrich audit-only import reporting and fixtures
- add audit-only summary counters and suppression-reason aggregates for ingest and follow workflows
- cover richer operator output with checked-in ingest/follow sample fixtures and shared test helpers
- update README, operator guidance, and the development tracker for the new reporting shape
-JSONL batch and checkpointed follow-mode ingestion for watcher and relay artifacts.
+JSONL batch and checkpointed follow-mode ingestion for watcher and relay artifacts, including checked-in `ingest-imports` and `follow-imports` sample outputs for richer audit-only reporting.
- [Maintainer docs](docs/go/maintainer/README.md)
Source-tree MCP integration, implementation planning, and development tracking.
@@ -76,10 +76,10 @@ They are not MCP tools and are not the normal end-user interaction path.
Prints effective config plus runtime readiness, audit diagnostics, and the last-known `follow-imports` watch-health snapshot when one has been written, including stale-snapshot detection for continuous follow mode.
- `codex-mem doctor --json`
Prints the same diagnostics in machine-readable JSON for automation or CI checks.
-Imports newline-delimited watcher or relay note events into durable imported notes plus audit records, with optional partial-success handling plus retry-oriented failure exports.
-Follows one or more watcher or relay JSONL files incrementally, prefers filesystem notifications with polling fallback by default, keeps one checkpoint per input, automatically retries watcher recovery in `auto` mode, and reports command-level watch state plus poll-catchup/recovery events and warnings alongside per-input imported-note results.
+Imports newline-delimited watcher or relay note events into durable imported notes plus audit records, or uses `--audit-only` to store only import-audit provenance while still applying the same privacy and explicit-memory precedence rules. Audit-only reports now distinguish new-note candidates from existing-note links and can aggregate suppression reasons.
+Follows one or more watcher or relay JSONL files incrementally, prefers filesystem notifications with polling fallback by default, keeps one checkpoint per input, automatically retries watcher recovery in `auto` mode, and reports command-level watch state plus poll-catchup/recovery events and warnings alongside per-input imported-note results or audit-only import records, including nested audit-only summary counters and suppression-reason counts.
Removes selected follow-imports checkpoint, retry-artifact, and stale-health artifacts. `--target-profile` can enable common cleanup target sets before you add path, age, dry-run, or `--summary-only` report filters.
docs/go/operator/import-ingestion.md (51 additions & 2 deletions)
@@ -25,6 +25,7 @@ Do not use this for:
Use `ingest-imports` when you already have a bounded batch to replay.
Use `follow-imports` when another process keeps appending to the same JSONL file and you want `codex-mem` to checkpoint progress between notification or polling passes.
`follow-imports` can now fan in multiple files by repeating `--input`.
+Add `--audit-only` to either import command when you want import-audit provenance without creating or reusing durable imported notes.
Use `audit-follow-imports` when you want a read-only hygiene report for pending checkpoint, retry-artifact, or stale-health cleanup work before deciding whether to run deletion.
Minimal stdin example:
@@ -39,6 +40,12 @@ Read from a file and print JSON:
Run as a long-lived poller with an explicit checkpoint file:
```powershell
@@ -159,6 +172,8 @@ Useful flags:
Optional. Overrides the default ingestion session task summary.
- `--json`
Optional. Prints a structured report instead of line-oriented text output.
+- `--audit-only`
+`ingest-imports` and `follow-imports` only. Optional. Evaluates each event against the same imported-note dedupe, privacy, and explicit-memory precedence rules, but writes only the import-audit record instead of materializing or reusing a durable note. The event schema stays the same so the audit-only path can answer whether the artifact would have been suppressed or linked to an existing note.
- `--continue-on-error`
`ingest-imports` only. Keeps scanning after per-line decode or import failures and returns a partial-success report when at least one event succeeds.
- `--failed-output <path>`
@@ -259,6 +274,13 @@ Behavior to expect from this batch:
- the first event creates an imported durable note plus an import audit record
- the second event creates only a suppressed import audit record
+When `--audit-only` is set for `ingest-imports` or `follow-imports`:
+
+- the same event schema is still required because the command evaluates imported-note precedence rather than dropping to raw `memory_save_import`
+- new non-suppressed artifacts create only import-audit records and leave `materialized=0`
+- imported duplicates can still link the created audit record to an existing imported note
+- stronger explicit memory still suppresses the import audit with `suppression_reason=explicit_memory_exists`
+
## Output Semantics
Text mode prints a compact summary such as:
@@ -282,6 +304,7 @@ warnings=1
```
JSON mode returns the same summary plus per-line results, including the created or reused `note_id` and `import_id`.
+When `--audit-only` is active, the report also includes `audit_only=true`, `materialized` stays `0`, `would_materialize` counts unsuppressed artifacts that would have created a new imported note, and `linked_existing_note` counts unsuppressed artifacts that only linked an already-imported durable note. Whenever any suppression happens, the JSON report also includes a `suppression_reasons` object keyed by normalized reason (for example `privacy_intent`, `explicit_memory_exists`, or the fallback `import_policy` bucket), and text mode flattens those same counts as `suppression_reason_<reason>=<count>`. Each per-line result can still surface `suppression_reason`, and `note_id` stays omitted for newly audited artifacts that would have taken the new-note path.
When a line fails in `--continue-on-error` mode, that result entry includes a structured `error` payload instead.
If `--failed-output` is set, the report also includes the resolved output path and how many failed lines were written there.
If `--failed-manifest` is set, the report also includes the manifest path and how many failures were captured there.
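
To make the audit-only report fields described above more concrete, here is a minimal Go sketch that decodes the summary counters and flattens `suppression_reasons` into the `suppression_reason_<reason>=<count>` text form. The struct name, helper names, the flat top-level JSON layout, and the sample payload are assumptions for illustration, not the project's actual types or captured output; only the field names come from this document.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// auditOnlyReport mirrors only the audit-only summary fields named in the
// docs. Hypothetical type: the real report carries more fields, and
// follow-imports nests these counters per input.
type auditOnlyReport struct {
	AuditOnly          bool           `json:"audit_only"`
	Materialized       int            `json:"materialized"`
	WouldMaterialize   int            `json:"would_materialize"`
	LinkedExistingNote int            `json:"linked_existing_note"`
	SuppressionReasons map[string]int `json:"suppression_reasons,omitempty"`
}

// parseAuditOnlyReport decodes a JSON report into the sketch struct.
func parseAuditOnlyReport(data []byte) (auditOnlyReport, error) {
	var r auditOnlyReport
	err := json.Unmarshal(data, &r)
	return r, err
}

// flattenSuppressionReasons renders each count as
// suppression_reason_<reason>=<count>, sorted by reason so the output
// order is deterministic.
func flattenSuppressionReasons(reasons map[string]int) []string {
	keys := make([]string, 0, len(reasons))
	for k := range reasons {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	lines := make([]string, 0, len(keys))
	for _, k := range keys {
		lines = append(lines, fmt.Sprintf("suppression_reason_%s=%d", k, reasons[k]))
	}
	return lines
}

func main() {
	// Hypothetical payload, not taken from a real run.
	sample := []byte(`{"audit_only":true,"materialized":0,"would_materialize":2,
		"linked_existing_note":1,
		"suppression_reasons":{"privacy_intent":1,"explicit_memory_exists":3}}`)

	report, err := parseAuditOnlyReport(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("audit_only=%t would_materialize=%d linked_existing_note=%d\n",
		report.AuditOnly, report.WouldMaterialize, report.LinkedExistingNote)
	for _, line := range flattenSuppressionReasons(report.SuppressionReasons) {
		fmt.Println(line)
	}
}
```

Sorting the reason keys is a choice made for this sketch so that repeated runs are easy to diff; the document does not state what order the real text output uses.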
@@ -291,18 +314,34 @@ Multi-input `follow-imports` returns one aggregate report with command-level wat
`audit-follow-imports` reports the same target-selection metadata and matched-versus-skipped counts as a read-only hygiene pass, plus whether `--summary-only` was active, whether the follow-health snapshot is present, when it was last updated, whether it is stale, and any warning summaries carried by that snapshot.
When `--summary-only` is set, the aggregate counts stay the same but the detailed checkpoint and retry-artifact path lists are omitted from both text and JSON output.
-Checked-in sample outputs for common cleanup flows live under [../../../internal/app/testdata](../../../internal/app/testdata/):
+Checked-in sample outputs for import and follow workflows live under [../../../internal/app/testdata](../../../internal/app/testdata/):
If a deliberate output change makes those fixtures drift, refresh the cleanup fixtures from the repository root through the test-only maintainer helper:
+If a deliberate output change makes those fixtures drift, refresh the ingest fixtures from the repository root through the test-only maintainer helper:
+
+```powershell
+$env:CODEX_MEM_REFRESH_INGEST_EXAMPLES = "all"
+go test ./internal/app -run TestRefreshIngestImportsExampleFixtures
@@ -338,6 +385,7 @@ go test ./internal/app -run "Test(Audit|Cleanup)FollowImportsExampleOutputsStayI
- `ingest-imports` starts one fresh session for the whole batch after resolving scope.
- `follow-imports` starts one fresh session per consumed polling batch, not one session for the lifetime of the process.
+- `--audit-only` keeps the same session, checkpoint, retry-export, and follow-health behavior as the materializing path so operators can switch between audit-only and imported-note materialization without learning a second ingestion flow.
- When `follow-imports` fans in multiple files, each input keeps its own checkpoint sidecar and each consumed input still starts its own ingestion session for that batch.
- In `auto` mode, `follow-imports` prefers filesystem notifications for lower latency and keeps the poll timer as a safety net in case a platform drops an event.
- In `auto` mode, if watcher setup fails or a running watcher later closes/errors, `follow-imports` falls back to polling and keeps retrying watcher setup on later poll intervals. When watcher setup succeeds again, the process switches back to notify mode instead of staying degraded forever.
@@ -365,6 +413,7 @@ go test ./internal/app -run "Test(Audit|Cleanup)FollowImportsExampleOutputsStayI
- When multi-input follow mode shares failed-output or failed-manifest bases, pass the same `--input` set to `cleanup-follow-imports` so it derives the same per-input filenames before scanning for range-suffixed artifacts.
- When multi-input follow mode shares `--failed-output` or `--failed-manifest` base paths, `codex-mem` derives per-input file names before adding the byte-range suffix so retry artifacts from different inputs do not overwrite each other.
- Each event uses the same imported-note workflow as `memory_save_imported_note`.
+- `--audit-only` intentionally still uses that imported-note workflow instead of the lower-level `memory_save_import` contract, because operators usually want privacy suppression, explicit-memory precedence, and imported-note dedupe to stay aligned between audit-only and materializing runs.
- Existing explicit memory wins over weaker imported duplicates in the same project.
- The default implementation is fail-fast: the first invalid line stops the batch and returns an error.
- `--continue-on-error` preserves successful lines, reports per-line failures, and still exits with an error if nothing in the batch imports successfully.