Commit a25dd47

✨ feat(imports): enrich audit-only import reporting and fixtures
- add audit-only summary counters and suppression-reason aggregates for ingest and follow workflows
- cover richer operator output with checked-in ingest/follow sample fixtures and shared test helpers
- update README, operator guidance, and the development tracker for the new reporting shape
1 parent b8daabd commit a25dd47

16 files changed

Lines changed: 1420 additions & 205 deletions

README.md

Lines changed: 5 additions & 5 deletions
@@ -24,7 +24,7 @@ Use the docs by audience:
2424
- [Operator docs](docs/go/operator/README.md)
2525
Client registration, deployment/readiness, packaging, and troubleshooting.
2626
- [Import ingestion guide](docs/go/operator/import-ingestion.md)
27-
JSONL batch and checkpointed follow-mode ingestion for watcher and relay artifacts.
27+
JSONL batch and checkpointed follow-mode ingestion for watcher and relay artifacts, including checked-in `ingest-imports` and `follow-imports` sample outputs for richer audit-only reporting.
2828
- [Maintainer docs](docs/go/maintainer/README.md)
2929
Source-tree MCP integration, implementation planning, and development tracking.
3030

@@ -76,10 +76,10 @@ They are not MCP tools and are not the normal end-user interaction path.
7676
Prints effective config plus runtime readiness, audit diagnostics, and the last-known `follow-imports` watch-health snapshot when one has been written, including stale-snapshot detection for continuous follow mode.
7777
- `codex-mem doctor --json`
7878
Prints the same diagnostics in machine-readable JSON for automation or CI checks.
79-
- `codex-mem ingest-imports --source watcher_import [--input events.jsonl] [--json] [--continue-on-error] [--failed-output failed.jsonl] [--failed-manifest failed.json]`
80-
Imports newline-delimited watcher or relay note events into durable imported notes plus audit records, with optional partial-success handling plus retry-oriented failure exports.
81-
- `codex-mem follow-imports --source watcher_import --input events-a.jsonl [--input events-b.jsonl ...] [--state-file events-a.offset.json --state-file events-b.offset.json ...] [--watch-mode auto|notify|poll] [--poll-interval 5s] [--once] [--json]`
82-
Follows one or more watcher or relay JSONL files incrementally, prefers filesystem notifications with polling fallback by default, keeps one checkpoint per input, automatically retries watcher recovery in `auto` mode, and reports command-level watch state plus poll-catchup/recovery events and warnings alongside per-input imported-note results.
79+
- `codex-mem ingest-imports --source watcher_import [--input events.jsonl] [--audit-only] [--json] [--continue-on-error] [--failed-output failed.jsonl] [--failed-manifest failed.json]`
80+
Imports newline-delimited watcher or relay note events into durable imported notes plus audit records, or uses `--audit-only` to store only import-audit provenance while still applying the same privacy and explicit-memory precedence rules. Audit-only reports now distinguish new-note candidates from existing-note links and can aggregate suppression reasons.
81+
- `codex-mem follow-imports --source watcher_import --input events-a.jsonl [--input events-b.jsonl ...] [--state-file events-a.offset.json --state-file events-b.offset.json ...] [--watch-mode auto|notify|poll] [--poll-interval 5s] [--once] [--audit-only] [--json]`
82+
Follows one or more watcher or relay JSONL files incrementally, prefers filesystem notifications with polling fallback by default, keeps one checkpoint per input, automatically retries watcher recovery in `auto` mode, and reports command-level watch state plus poll-catchup/recovery events and warnings alongside per-input imported-note results or audit-only import records, including nested audit-only summary counters and suppression-reason counts.
8383
- `codex-mem cleanup-follow-imports [--target-profile all|artifacts|state|retry|health] [...]`
8484
Removes selected follow-imports checkpoint, retry-artifact, and stale-health artifacts. `--target-profile` can enable common cleanup target sets before you add path, age, dry-run, or `--summary-only` report filters.
8585
- `codex-mem audit-follow-imports [--target-profile all|artifacts|state|retry|health] [...]`

docs/go/maintainer/development-tracker.md

Lines changed: 16 additions & 167 deletions
Large diffs are not rendered by default.

docs/go/operator/import-ingestion.md

Lines changed: 51 additions & 2 deletions
@@ -25,6 +25,7 @@ Do not use this for:
2525
Use `ingest-imports` when you already have a bounded batch to replay.
2626
Use `follow-imports` when another process keeps appending to the same JSONL file and you want `codex-mem` to checkpoint progress between notification or polling passes.
2727
`follow-imports` can now fan in multiple files by repeating `--input`.
28+
Add `--audit-only` to either import command when you want import-audit provenance without creating or reusing durable imported notes.
2829
Use `audit-follow-imports` when you want a read-only hygiene report for pending checkpoint, retry-artifact, or stale-health cleanup work before deciding whether to run deletion.
2930

3031
Minimal stdin example:
@@ -39,6 +40,12 @@ Read from a file and print JSON:
3940
codex-mem.exe ingest-imports --source relay_import --input .\relay-events.jsonl --json
4041
```
4142

43+
Store only import-audit records for the batch while preserving the same privacy and explicit-memory precedence checks:
44+
45+
```powershell
46+
codex-mem.exe ingest-imports --source watcher_import --input .\events.jsonl --audit-only --json
47+
```
48+
4249
Continue past bad lines and keep successful imports:
4350

4451
```powershell
@@ -63,6 +70,12 @@ Follow a growing JSONL file once and checkpoint the consumed offset:
6370
codex-mem.exe follow-imports --source watcher_import --input .\events.jsonl --once --json
6471
```
6572

73+
Follow the same stream in audit-only mode when another system should inspect imported provenance before materializing notes:
74+
75+
```powershell
76+
codex-mem.exe follow-imports --source watcher_import --input .\events.jsonl --once --audit-only --json
77+
```
78+
6679
Run as a long-lived poller with an explicit checkpoint file:
6780

6881
```powershell
@@ -159,6 +172,8 @@ Useful flags:
159172
Optional. Overrides the default ingestion session task summary.
160173
- `--json`
161174
Optional. Prints a structured report instead of line-oriented text output.
175+
- `--audit-only`
176+
`ingest-imports` and `follow-imports` only. Optional. Evaluates each event against the same imported-note dedupe, privacy, and explicit-memory precedence rules, but writes only the import-audit record instead of materializing or reusing a durable note. The event schema stays the same so the audit-only path can answer whether the artifact would have been suppressed or linked to an existing note.
162177
- `--continue-on-error`
163178
`ingest-imports` only. Keeps scanning after per-line decode or import failures and returns a partial-success report when at least one event succeeds.
164179
- `--failed-output <path>`
@@ -259,6 +274,13 @@ Behavior to expect from this batch:
259274
- the first event creates an imported durable note plus an import audit record
260275
- the second event creates only a suppressed import audit record
261276

277+
When `--audit-only` is set for `ingest-imports` or `follow-imports`:
278+
279+
- the same event schema is still required because the command evaluates imported-note precedence rather than dropping to raw `memory_save_import`
280+
- new non-suppressed artifacts create only import-audit records and leave `materialized=0`
281+
- imported duplicates can still link the created audit record to an existing imported note
282+
- stronger explicit memory still suppresses the import audit with `suppression_reason=explicit_memory_exists`
283+
262284
## Output Semantics
263285

264286
Text mode prints a compact summary such as:
@@ -282,6 +304,7 @@ warnings=1
282304
```
283305

284306
JSON mode returns the same summary plus per-line results, including the created or reused `note_id` and `import_id`.
307+
When `--audit-only` is active, the report also includes `audit_only=true`, `materialized` stays `0`, `would_materialize` counts unsuppressed artifacts that would have created a new imported note, and `linked_existing_note` counts unsuppressed artifacts that only linked an already-imported durable note. Whenever any suppression happens, the JSON report also includes a `suppression_reasons` object keyed by normalized reason (for example `privacy_intent`, `explicit_memory_exists`, or the fallback `import_policy` bucket), and text mode flattens those same counts as `suppression_reason_<reason>=<count>`. Each per-line result can still surface `suppression_reason`, and `note_id` stays omitted for newly audited artifacts that would have taken the new-note path.
285308
When a line fails in `--continue-on-error` mode, that result entry includes a structured `error` payload instead.
286309
If `--failed-output` is set, the report also includes the resolved output path and how many failed lines were written there.
287310
If `--failed-manifest` is set, the report also includes the manifest path and how many failures were captured there.
@@ -291,18 +314,34 @@ Multi-input `follow-imports` returns one aggregate report with command-level wat
291314
`audit-follow-imports` reports the same target-selection metadata and matched-versus-skipped counts as a read-only hygiene pass, plus whether `--summary-only` was active, whether the follow-health snapshot is present, when it was last updated, whether it is stale, and any warning summaries carried by that snapshot.
292315
When `--summary-only` is set, the aggregate counts stay the same but the detailed checkpoint and retry-artifact path lists are omitted from both text and JSON output.
293316

294-
Checked-in sample outputs for common cleanup flows live under [../../../internal/app/testdata](../../../internal/app/testdata/):
317+
Checked-in sample outputs for import and follow workflows live under [../../../internal/app/testdata](../../../internal/app/testdata/):
295318

319+
- [ingest-imports-audit-only-summary.txt](../../../internal/app/testdata/ingest-imports-audit-only-summary.txt)
320+
- [ingest-imports-audit-only-linked.json](../../../internal/app/testdata/ingest-imports-audit-only-linked.json)
321+
- [follow-imports-audit-only-single.txt](../../../internal/app/testdata/follow-imports-audit-only-single.txt)
322+
- [follow-imports-audit-only-multi.json](../../../internal/app/testdata/follow-imports-audit-only-multi.json)
296323
- [cleanup-follow-imports-daily-dry-run.txt](../../../internal/app/testdata/cleanup-follow-imports-daily-dry-run.txt)
297324
- [cleanup-follow-imports-filtered-cleanup.json](../../../internal/app/testdata/cleanup-follow-imports-filtered-cleanup.json)
298325
- [cleanup-follow-imports-target-profile-all.txt](../../../internal/app/testdata/cleanup-follow-imports-target-profile-all.txt)
299326
- [audit-follow-imports-daily-audit.txt](../../../internal/app/testdata/audit-follow-imports-daily-audit.txt)
300327
- [audit-follow-imports-filtered-audit.json](../../../internal/app/testdata/audit-follow-imports-filtered-audit.json)
301328
- [audit-follow-imports-target-profile-retry.json](../../../internal/app/testdata/audit-follow-imports-target-profile-retry.json)
302329

303-
If a deliberate output change makes those fixtures drift, refresh the cleanup fixtures from the repository root through the test-only maintainer helper:
330+
If a deliberate output change makes those fixtures drift, refresh the ingest fixtures from the repository root through the test-only maintainer helper:
331+
332+
```powershell
333+
$env:CODEX_MEM_REFRESH_INGEST_EXAMPLES = "all"
334+
go test ./internal/app -run TestRefreshIngestImportsExampleFixtures
335+
Remove-Item Env:CODEX_MEM_REFRESH_INGEST_EXAMPLES
336+
```
337+
338+
Refresh the cleanup fixtures the same way:
304339

305340
```powershell
341+
$env:CODEX_MEM_REFRESH_FOLLOW_IMPORT_EXAMPLES = "all"
342+
go test ./internal/app -run TestRefreshFollowImportsExampleFixtures
343+
Remove-Item Env:CODEX_MEM_REFRESH_FOLLOW_IMPORT_EXAMPLES
344+
306345
$env:CODEX_MEM_REFRESH_CLEANUP_EXAMPLES = "all"
307346
go test ./internal/app -run TestRefreshCleanupFollowImportsExampleFixtures
308347
Remove-Item Env:CODEX_MEM_REFRESH_CLEANUP_EXAMPLES
@@ -319,6 +358,14 @@ Remove-Item Env:CODEX_MEM_REFRESH_AUDIT_EXAMPLES
319358
If you only need one fixture while iterating on a specific report shape, pass a comma-separated fixture-name subset instead of `all`:
320359

321360
```powershell
361+
$env:CODEX_MEM_REFRESH_INGEST_EXAMPLES = "audit-only-linked-json"
362+
go test ./internal/app -run TestRefreshIngestImportsExampleFixtures
363+
Remove-Item Env:CODEX_MEM_REFRESH_INGEST_EXAMPLES
364+
365+
$env:CODEX_MEM_REFRESH_FOLLOW_IMPORT_EXAMPLES = "audit-only-single-text"
366+
go test ./internal/app -run TestRefreshFollowImportsExampleFixtures
367+
Remove-Item Env:CODEX_MEM_REFRESH_FOLLOW_IMPORT_EXAMPLES
368+
322369
$env:CODEX_MEM_REFRESH_CLEANUP_EXAMPLES = "filtered-cleanup-json"
323370
go test ./internal/app -run TestRefreshCleanupFollowImportsExampleFixtures
324371
Remove-Item Env:CODEX_MEM_REFRESH_CLEANUP_EXAMPLES
@@ -338,6 +385,7 @@ go test ./internal/app -run "Test(Audit|Cleanup)FollowImportsExampleOutputsStayI
338385

339386
- `ingest-imports` starts one fresh session for the whole batch after resolving scope.
340387
- `follow-imports` starts one fresh session per consumed polling batch, not one session for the lifetime of the process.
388+
- `--audit-only` keeps the same session, checkpoint, retry-export, and follow-health behavior as the materializing path so operators can switch between audit-only and imported-note materialization without learning a second ingestion flow.
341389
- When `follow-imports` fans in multiple files, each input keeps its own checkpoint sidecar and each consumed input still starts its own ingestion session for that batch.
342390
- In `auto` mode, `follow-imports` prefers filesystem notifications for lower latency and keeps the poll timer as a safety net in case a platform drops an event.
343391
- In `auto` mode, if watcher setup fails or a running watcher later closes/errors, `follow-imports` falls back to polling and keeps retrying watcher setup on later poll intervals. When watcher setup succeeds again, the process switches back to notify mode instead of staying degraded forever.
@@ -365,6 +413,7 @@ go test ./internal/app -run "Test(Audit|Cleanup)FollowImportsExampleOutputsStayI
365413
- When multi-input follow mode shares failed-output or failed-manifest bases, pass the same `--input` set to `cleanup-follow-imports` so it derives the same per-input filenames before scanning for range-suffixed artifacts.
366414
- When multi-input follow mode shares `--failed-output` or `--failed-manifest` base paths, `codex-mem` derives per-input file names before adding the byte-range suffix so retry artifacts from different inputs do not overwrite each other.
367415
- Each event uses the same imported-note workflow as `memory_save_imported_note`.
416+
- `--audit-only` intentionally still uses that imported-note workflow instead of the lower-level `memory_save_import` contract, because operators usually want privacy suppression, explicit-memory precedence, and imported-note dedupe to stay aligned between audit-only and materializing runs.
368417
- Existing explicit memory wins over weaker imported duplicates in the same project.
369418
- The default implementation is fail-fast: the first invalid line stops the batch and returns an error.
370419
- `--continue-on-error` preserves successful lines, reports per-line failures, and still exits with an error if nothing in the batch imports successfully.

internal/app/follow_import_example_fixtures_test.go

Lines changed: 17 additions & 17 deletions
@@ -13,17 +13,17 @@ import (
1313
"codex-mem/internal/domain/common"
1414
)
1515

16-
const cleanupFollowImportsExampleDirName = "testdata"
16+
const commandExampleDirName = "testdata"
1717

18-
type followImportsExampleFixture[T any] struct {
18+
type commandExampleFixture[T any] struct {
1919
Name string
2020
RelativePath string
2121
JSON bool
2222
Report T
2323
}
2424

25-
type cleanupFollowImportsExampleFixture = followImportsExampleFixture[cleanupFollowImportsReport]
26-
type auditFollowImportsExampleFixture = followImportsExampleFixture[auditFollowImportsReport]
25+
type cleanupFollowImportsExampleFixture = commandExampleFixture[cleanupFollowImportsReport]
26+
type auditFollowImportsExampleFixture = commandExampleFixture[auditFollowImportsReport]
2727

2828
func normalizeFollowImportsExampleName(value string) string {
2929
return strings.ToLower(strings.TrimSpace(value))
@@ -49,17 +49,17 @@ func parseFollowImportsExampleNames(raw string) ([]string, error) {
4949
return names, nil
5050
}
5151

52-
func selectFollowImportsExampleFixtures[T any](fixtures []followImportsExampleFixture[T], names []string, command string) ([]followImportsExampleFixture[T], error) {
52+
func selectCommandExampleFixtures[T any](fixtures []commandExampleFixture[T], names []string, command string) ([]commandExampleFixture[T], error) {
5353
if len(names) == 0 {
5454
return fixtures, nil
5555
}
5656

57-
byName := make(map[string]followImportsExampleFixture[T], len(fixtures))
57+
byName := make(map[string]commandExampleFixture[T], len(fixtures))
5858
for _, fixture := range fixtures {
5959
byName[normalizeFollowImportsExampleName(fixture.Name)] = fixture
6060
}
6161

62-
selected := make([]followImportsExampleFixture[T], 0, len(names))
62+
selected := make([]commandExampleFixture[T], 0, len(names))
6363
seen := make(map[string]struct{}, len(names))
6464
for _, name := range names {
6565
normalized := normalizeFollowImportsExampleName(name)
@@ -79,8 +79,8 @@ func selectFollowImportsExampleFixtures[T any](fixtures []followImportsExampleFi
7979
return selected, nil
8080
}
8181

82-
func writeFollowImportsExampleFixtures[T any](baseDir string, names []string, command string, fixtures []followImportsExampleFixture[T], render func(T, bool) ([]byte, error)) ([]string, error) {
83-
selected, err := selectFollowImportsExampleFixtures(fixtures, names, command)
82+
func writeCommandExampleFixtures[T any](baseDir string, names []string, command string, fixtures []commandExampleFixture[T], render func(T, bool) ([]byte, error)) ([]string, error) {
83+
selected, err := selectCommandExampleFixtures(fixtures, names, command)
8484
if err != nil {
8585
return nil, err
8686
}
@@ -102,13 +102,13 @@ func writeFollowImportsExampleFixtures[T any](baseDir string, names []string, co
102102
return writtenPaths, nil
103103
}
104104

105-
func listFollowImportsExamples[T any](fixtures []followImportsExampleFixture[T], w io.Writer) error {
105+
func listCommandExamples[T any](fixtures []commandExampleFixture[T], w io.Writer) error {
106106
for _, fixture := range fixtures {
107107
format := "text"
108108
if fixture.JSON {
109109
format = "json"
110110
}
111-
if _, err := fmt.Fprintf(w, "example=%s path=%s format=%s\n", fixture.Name, filepath.Join(cleanupFollowImportsExampleDirName, fixture.RelativePath), format); err != nil {
111+
if _, err := fmt.Fprintf(w, "example=%s path=%s format=%s\n", fixture.Name, filepath.Join(commandExampleDirName, fixture.RelativePath), format); err != nil {
112112
return err
113113
}
114114
}
@@ -429,11 +429,11 @@ func auditFollowImportsExampleFixtures() []auditFollowImportsExampleFixture {
429429
}
430430

431431
func selectCleanupFollowImportsExampleFixtures(names []string) ([]cleanupFollowImportsExampleFixture, error) {
432-
return selectFollowImportsExampleFixtures(cleanupFollowImportsExampleFixtures(), names, "cleanup-follow-imports")
432+
return selectCommandExampleFixtures(cleanupFollowImportsExampleFixtures(), names, "cleanup-follow-imports")
433433
}
434434

435435
func selectAuditFollowImportsExampleFixtures(names []string) ([]auditFollowImportsExampleFixture, error) {
436-
return selectFollowImportsExampleFixtures(auditFollowImportsExampleFixtures(), names, "audit-follow-imports")
436+
return selectCommandExampleFixtures(auditFollowImportsExampleFixtures(), names, "audit-follow-imports")
437437
}
438438

439439
func renderCleanupFollowImportsExample(report cleanupFollowImportsReport, jsonOutput bool) ([]byte, error) {
@@ -459,17 +459,17 @@ func renderAuditFollowImportsExample(report auditFollowImportsReport, jsonOutput
459459
}
460460

461461
func writeCleanupFollowImportsExampleFixtures(baseDir string, names []string) ([]string, error) {
462-
return writeFollowImportsExampleFixtures(baseDir, names, "cleanup-follow-imports", cleanupFollowImportsExampleFixtures(), renderCleanupFollowImportsExample)
462+
return writeCommandExampleFixtures(baseDir, names, "cleanup-follow-imports", cleanupFollowImportsExampleFixtures(), renderCleanupFollowImportsExample)
463463
}
464464

465465
func writeAuditFollowImportsExampleFixtures(baseDir string, names []string) ([]string, error) {
466-
return writeFollowImportsExampleFixtures(baseDir, names, "audit-follow-imports", auditFollowImportsExampleFixtures(), renderAuditFollowImportsExample)
466+
return writeCommandExampleFixtures(baseDir, names, "audit-follow-imports", auditFollowImportsExampleFixtures(), renderAuditFollowImportsExample)
467467
}
468468

469469
func listCleanupFollowImportsExamples(w io.Writer) error {
470-
return listFollowImportsExamples(cleanupFollowImportsExampleFixtures(), w)
470+
return listCommandExamples(cleanupFollowImportsExampleFixtures(), w)
471471
}
472472

473473
func listAuditFollowImportsExamples(w io.Writer) error {
474-
return listFollowImportsExamples(auditFollowImportsExampleFixtures(), w)
474+
return listCommandExamples(auditFollowImportsExampleFixtures(), w)
475475
}
