Commit b87150a

Add QIHSE compaction recovery coverage
john committed
1 parent b8a37a3 commit b87150a

3 files changed

Lines changed: 249 additions & 20 deletions

plans/qihse_persistence_layer.md

Lines changed: 6 additions & 6 deletions
@@ -13,9 +13,9 @@ Current landed state:
 - PR-2 is complete for the planned WAL structure: file-backed adds write ADD and COMMIT WAL records, records carry previous-record offsets, open replays committed batches newer than the snapshot, and writable open truncates torn or uncommitted WAL tails.
 - PR-3 read-only mmap candidate work now covers `vectors.qvec`, `metadata.qmeta`, validated `idmap.qid`, and validated direct mapping of `index.qidx` for clean snapshots.
 - PR-4 physical compaction is complete for the current row model: `compact()` rewrites the in-memory table, vector blob, and metadata arena with live rows only, then publishes the compact snapshot and regenerated `idmap.qid`/`vectors.qtri` through the existing atomic flush path.
-- PR-4 compaction fixtures now verify compact-after-mutation row/index/idmap/trinary behavior, high-ID idmap consistency, and stale/corrupt derived sidecar rebuild.
+- PR-4 compaction fixtures now verify compact-after-mutation row/index/idmap/trinary behavior, high-ID idmap consistency, stale/corrupt derived sidecar rebuild, stale `.tmp` file ignore-on-open behavior, and WAL mutation compaction clearing WAL without resurrecting pruned rows.
 - PR-5 search-path benchmark scaffolding is present: `make bench-trinary-search-path` compares full float32 DB search against DB-backed `vectors.qtri` candidate selection plus exact float32 rerank and reports recall/order/latency. Pure trinary storage remains out of scope.
-- Latest pushed checkpoint before this slice: `7df0e3e` on `codex/qihse-file-persistence`.
+- Latest pushed checkpoint before this slice: `b8a37a3` on `codex/qihse-file-persistence`.

 Resume commands:

@@ -45,15 +45,15 @@ trinary_search_path_bench rows=2048 dims=64 qtri_row_bytes=13 candidates=64 topk
 Current continuation:

 - PR-3: validate the newly landed `index.qidx` mmap path under more corruption and compatibility cases, then decide whether UMA should wrap mapped rows directly or keep the current vector DB-owned mapping path.
-- PR-4: add compaction crash/recovery fixtures around tmp files, manifest publication, and WAL-plus-compaction interactions. Public mutation APIs, mutation WAL replay/truncation, and physical compaction are present.
+- PR-4: add deeper manifest-publication crash fixtures if needed. Public mutation APIs, mutation WAL replay/truncation, physical compaction, stale temp-file handling coverage, and WAL-plus-compaction interaction coverage are present.
 - PR-5: use the search-path benchmark to drive optional vector DB trinary acceleration. Current benchmark proves recall/order on the synthetic fixture and exposes latency variance; it is not yet a production search-path optimization.
 - PR-6: add persisted anchor hints and optimizer statistics only as rebuildable, explicit-format sidecars.

 Recommended 3-agent split:

-- Agent 1 owns PR-4 compaction crash/recovery fixtures.
-- Agent 2 owns WAL-plus-compaction interaction tests and any recovery fixes they expose.
-- Agent 3 owns PR-5 optional search-path acceleration and broader recall/performance datasets.
+- Agent 1 owns any remaining manifest-publication crash fixtures.
+- Agent 2 owns PR-5 optional search-path acceleration wiring.
+- Agent 3 owns broader recall/performance datasets for trinary candidate generation.

 ## 1. Background

plans/qihse_persistence_pr0_pr1_enhancement.md

Lines changed: 12 additions & 14 deletions
@@ -29,9 +29,9 @@ Current implementation state:
 - PR-2 WAL/recovery hardening is implemented for ADD/COMMIT records, previous-record offsets, committed-batch replay, and writable torn-tail truncation.
 - PR-3 candidate work has started: read-only mmap mode maps `vectors.qvec`, `metadata.qmeta`, validated `idmap.qid`, and validated direct `index.qidx` rows for clean snapshots.
 - PR-4 physical compaction is present: delete/update/upsert API symbols write committed WAL records, replay committed mutation batches newer than the snapshot, writable open truncates torn/uncommitted mutation tails, and `compact()` rewrites live rows only before publishing the regenerated snapshot/sidecars.
-- PR-4 compaction fixture coverage now enforces row/index/live/idmap counts after physical pruning, high unsigned IDs, valid `vectors.qtri` after compact, and stale/corrupt derived sidecar rebuild.
+- PR-4 compaction fixture coverage now enforces row/index/live/idmap counts after physical pruning, high unsigned IDs, valid `vectors.qtri` after compact, stale/corrupt derived sidecar rebuild, stale `.tmp` file ignore-on-open behavior, and WAL mutation compaction clearing WAL without resurrecting pruned rows.
 - PR-5 search-path benchmark scaffolding is present: standalone tryte top-k exists with `make bench-trinary-codec`, DB-backed candidate generation plus exact float32 rerank exists with `make bench-trinary-db-candidate`, and `make bench-trinary-search-path` compares full float32 DB search against trinary candidates plus rerank with recall/order/latency reporting.
-- Latest pushed checkpoint before this slice: `7df0e3e` on `codex/qihse-file-persistence`.
+- Latest pushed checkpoint before this slice: `b8a37a3` on `codex/qihse-file-persistence`.
 - `qihse/qihse_vector_db.c` was restored after a disk-full truncation and now contains the native persistence implementation.
 - `qihse_vector_db_create(..., db_path)` opens a file-backed native database.
 - `qihse_vector_db_open()` supports ephemeral, file-copy, read-only, and read-only mmap modes.
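
The WAL behavior named in the PR-2 and PR-4 status lines above (replay only committed batches, truncate torn or uncommitted tails on writable open) can be illustrated with a small Python model. This is a sketch under stated assumptions, not the native QIHSE format: the record layout (`type`, `length`, `crc32`), the `encode` and `replay` helpers, and the use of CRC32 for torn-record detection are all hypothetical, and the sketch omits the previous-record offsets and snapshot-generation checks that the plan describes.

```python
import struct
import zlib

# Hypothetical record types and header layout, chosen only for illustration.
ADD, COMMIT = 1, 2
HEADER = struct.Struct("<BII")  # record type, payload length, CRC32 of payload


def encode(rec_type, payload=b""):
    """Serialize one WAL record as header + payload."""
    return HEADER.pack(rec_type, len(payload), zlib.crc32(payload)) + payload


def replay(wal):
    """Return (committed_payloads, valid_length) for a WAL byte string.

    ADD records are buffered until a COMMIT arrives. Anything after the
    last COMMIT (a torn record, or complete but uncommitted ADDs) is
    excluded, and valid_length is the offset a writable open would
    truncate the file to.
    """
    committed, pending = [], []
    offset = valid_end = 0
    while offset + HEADER.size <= len(wal):
        rec_type, length, crc = HEADER.unpack_from(wal, offset)
        start = offset + HEADER.size
        payload = wal[start:start + length]
        torn = len(payload) < length or zlib.crc32(payload) != crc
        if torn or rec_type not in (ADD, COMMIT):
            break  # stop at the first torn or unrecognized record
        offset = start + length
        if rec_type == ADD:
            pending.append(payload)
        else:
            committed.extend(pending)  # the batch becomes durable at COMMIT
            pending.clear()
            valid_end = offset
    return committed, valid_end
```

A writable open in this model would truncate the log to `valid_end`, so a later append can never attach a new COMMIT to stale uncommitted records.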
@@ -545,22 +545,22 @@ PR-1 is complete when:
 - Read-only open can search but cannot mutate.
 - Native persistence diagnostics report generation, storage mode, ID-map rebuild state, and trinary sidecar status.
 - Python has not implemented any storage format.
-- Writable mmap, production vector DB trinary acceleration, pure trinary storage, compaction crash fixtures, and anchor persistence are still deferred.
+- Writable mmap, production vector DB trinary acceleration, pure trinary storage, deeper manifest-publication crash fixtures, and anchor persistence are still deferred.

 ## Follow-On Phases

 After the current checkpoint:

 - PR-3: harden mmap compatibility and corruption tests now that read-only `vectors.qvec`, `metadata.qmeta`, `idmap.qid`, and `index.qidx` mapping are present for clean snapshots.
-- PR-4: public delete/update/upsert API behavior, mutation WAL replay/truncation, and physical tombstone compaction are implemented and covered by persistence tests. Compaction crash/recovery fixtures remain.
+- PR-4: public delete/update/upsert API behavior, mutation WAL replay/truncation, physical tombstone compaction, stale temp-file ignore behavior, and WAL-plus-compaction interactions are implemented and covered by persistence tests. Deeper manifest-publication crash fixtures can still be added.
 - PR-5: DB-backed candidate generation, exact rerank, and search-path benchmark scaffolding are present; production search-path acceleration, broader recall measurement, and optional pure trinary storage remain.
 - PR-6: optional persisted anchor hints and optimizer statistics as rebuildable sidecars.

 ## PR-4: Mutation and Compaction Plan

 PR-4 should make mutation explicit without changing QIHSE's program boundary. The public contract belongs to native QIHSE; Framewerx and other callers remain clients.

-Status: the public delete/update/upsert declarations are staged in `qihse/qihse_vector_db.h` and the native implementation is present in `qihse/qihse_vector_db.c`. Executable tests now cover delete, update, upsert, read-only rejection, compact-after-mutation search correctness, unflushed mutation WAL replay, and physical compact row pruning.
+Status: the public delete/update/upsert declarations are staged in `qihse/qihse_vector_db.h` and the native implementation is present in `qihse/qihse_vector_db.c`. Executable tests now cover delete, update, upsert, read-only rejection, compact-after-mutation search correctness, unflushed mutation WAL replay, physical compact row pruning, stale compact temp files, and WAL mutation compaction.

 ### Public API

@@ -704,23 +704,21 @@ The persistence test file carries a compile-safe TODO backlog for these cases. C

 Use this split when resuming the remaining plan with multiple agents. QIHSE remains its own native program; none of these tasks should introduce Framewerx-specific persistence behavior.

-Agent 1: PR-4 compaction crash fixtures.
+Agent 1: PR-4 manifest-publication crash fixtures.

-- Add tmp-file and manifest-publication crash/recovery fixtures around compact.
+- Add deeper manifest-publication crash/recovery fixtures around compact if the current atomic helper is extended to support injection.
 - Verify old generation remains authoritative when compact publication is incomplete.
 - Verify derived sidecars rebuild after compact interruption.
 - Keep compaction manual until stats-driven thresholds exist.

-Agent 2: PR-4 WAL-plus-compaction fixtures.
+Agent 2: PR-5 search-path trinary acceleration.

-- Add WAL-plus-compaction interaction tests.
-- Verify committed mutation WAL replay before compact, compact clearing checkpointed WAL, and no resurrection of pruned rows.
-- Fix any recovery bugs exposed by those tests.
-- Keep derived sidecar rebuild behavior explicit.
+- Wire the benchmarked qtri candidate path into optional vector DB search behavior.
+- Keep exact float32 rerank as the correctness boundary.
+- Preserve `vectors.qvec` as authoritative storage.

-Agent 3: PR-5 search-path trinary acceleration.
+Agent 3: PR-5 recall/performance dataset expansion.

-- Use the search-path benchmark to guide optional vector DB search acceleration.
 - Add broader recall/performance datasets beyond the synthetic fixture.
 - Keep reporting speed honestly; the current benchmark proves recall/order but not consistent speedup.
 - Keep `vectors.qvec` authoritative until pure trinary storage has recovery, migration, and recall tests.
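
The PR-5 items above keep exact float32 rerank as the correctness boundary while a coarse trinary pass only selects candidates. The Python sketch below illustrates that two-stage shape and a recall measurement. It is not the QIHSE implementation: `ternarize`, its `threshold`, the dot-product scoring, and every function name are assumptions made for illustration, and the trits are kept as plain integers rather than packed into the `vectors.qtri` byte layout.

```python
import random


def ternarize(vec, threshold=0.25):
    """Map each component to -1, 0, or +1 (the threshold is an assumption)."""
    return [1 if x > threshold else -1 if x < -threshold else 0 for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def exact_topk(rows, query, k):
    """Reference search: rank every row by its full-precision score."""
    order = sorted(range(len(rows)), key=lambda i: (-dot(rows[i], query), i))
    return order[:k]


def candidate_topk(rows, codes, query, candidates, k):
    """Coarse trinary pass picks candidates; exact scores decide the result."""
    q = ternarize(query)
    coarse = sorted(range(len(rows)), key=lambda i: (-dot(codes[i], q), i))
    shortlist = coarse[:candidates]
    # Rerank only the shortlist against the authoritative float vectors.
    reranked = sorted(shortlist, key=lambda i: (-dot(rows[i], query), i))
    return reranked[:k]


def recall_at_k(expected, actual):
    """Fraction of the exact top-k that the candidate path recovered."""
    return len(set(expected) & set(actual)) / len(expected)


if __name__ == "__main__":
    rng = random.Random(0)
    rows = [[rng.uniform(-1, 1) for _ in range(64)] for _ in range(2048)]
    codes = [ternarize(r) for r in rows]
    query = [rng.uniform(-1, 1) for _ in range(64)]
    truth = exact_topk(rows, query, 10)
    approx = candidate_topk(rows, codes, query, candidates=64, k=10)
    print("recall@10:", recall_at_k(truth, approx))
```

Because the final ordering always comes from the float vectors, widening `candidates` to the full row count reproduces the exact search; the trinary codes affect only which rows are considered, which is why recall is the quantity worth reporting alongside latency.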
