feat(vault): deletion deny-list store + Mark/Filter RPCs (#73) #75

Draft
jh-lee-cryptolab wants to merge 1 commit into main from issue-73-logical-delete-denylist

@jh-lee-cryptolab

Contributor

Context

Memories cannot be hard-deleted: enVector v1.2.2 has no per-vector delete (only delete_index). So we delete logically — Vault keeps a per-index deny-list of deleted item_ids and serves it. Clients filter. Vault side of #73.

TL;DR

Vault stores a per-index deny-list of deleted item_ids; clients consult it and drop deleted hits.

Summary

flowchart LR
    C[rune-mcp client] -- "MarkDeleted(ids)" --> V
    C -- "FilterDeleted(candidates)" --> V
    V[(Vault deny-list<br/>SSOT, per index)] -- "deleted subset" --> C
    V -. "never talks to" .-x E[enVector]
  • denylist.Store — file-backed, debounce-persisted. Per index: a set of item_ids + a monotonic version.
  • MarkDeleted (write) — unions ids into the deny-list. Idempotent; bumps version.
  • FilterDeleted (read) — returns the deleted subset of the given candidates. Cost is O(candidates), independent of total deny-list size.
  • Scopes: mark_deleted (admin only), filter_deleted (admin + member, because recall needs it).
  • item_id is uint64 — matches enVector's stable logical id (see Test plan for the stability check).
  • Config: tokens.deny_list_file, optional; defaults to deny_list.yml next to tokens_file. No change needed for existing deployments.
  • Interceptor — both new RPCs added to the validation method set + token-safety switch, with a ServiceDesc-derived test so a future RPC cannot silently skip the check.
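
For reviewers who want the store semantics in one place, here is a minimal in-memory sketch of the per-index set plus version described above. It is not the code in this PR: the type and function names (`Store`, `indexState`, `NewStore`) are illustrative, persistence is left out, and bumping the version on every MarkDeleted call (rather than only when a new id is added) is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// indexState is the deny-list of one index: a set of item_ids and a version.
type indexState struct {
	ids     map[uint64]struct{}
	version uint64
}

// Store is an in-memory sketch of the deny-list; file persistence is omitted.
type Store struct {
	mu      sync.RWMutex
	indexes map[string]*indexState
}

func NewStore() *Store {
	return &Store{indexes: make(map[string]*indexState)}
}

// MarkDeleted unions ids into the index's deny-list and returns the new
// version. Re-marking an id leaves the set unchanged. Assumption: the version
// is bumped on every call, even when no new id was added.
func (s *Store) MarkDeleted(index string, ids []uint64) uint64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	st, ok := s.indexes[index]
	if !ok {
		st = &indexState{ids: make(map[uint64]struct{})}
		s.indexes[index] = st
	}
	for _, id := range ids {
		st.ids[id] = struct{}{}
	}
	st.version++
	return st.version
}

// FilterDeleted returns the candidates that are on the index's deny-list,
// in candidate order. It does one map lookup per candidate, so the cost
// depends on len(candidates) and not on the size of the deny-list.
func (s *Store) FilterDeleted(index string, candidates []uint64) []uint64 {
	s.mu.RLock()
	defer s.mu.RUnlock()
	st, ok := s.indexes[index]
	if !ok {
		return nil
	}
	var deleted []uint64
	for _, id := range candidates {
		if _, hit := st.ids[id]; hit {
			deleted = append(deleted, id)
		}
	}
	return deleted
}

func main() {
	s := NewStore()
	s.MarkDeleted("team-a", []uint64{10, 11})
	s.MarkDeleted("team-a", []uint64{11, 12}) // 11 is already present
	fmt.Println(s.FilterDeleted("team-a", []uint64{12, 99, 10}))
	fmt.Println(s.FilterDeleted("team-b", []uint64{10})) // other index is isolated
}
```

The map lookup per candidate is what keeps the recall path at O(candidates) regardless of how many ids have been marked.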

Alternatives

  • Vault filters scores / talks to enVector — rejected. Keeps Vault pure (no enVector client, no score filtering) and respects the fixed metadata-decryption flow.
  • Key by (shard, row) — rejected. Physical position moves on compaction; item_id is the stable logical id.
  • Client-side sync of the full deny-list — deferred. This tool is low-QPS, so a per-query read-through is cheap and strongly consistent. The version field leaves room for a future sync RPC.

Test plan

  • mise run check green (gofmt + vet + race tests)
  • denylist store: mark/filter, idempotent union, per-index isolation, persist + reload, missing-file
  • handlers: invalid token, scope denied (member mark_deleted), mark→filter round-trip, member filter_deleted allowed
  • interceptor: ServiceDesc-derived completeness guard for vaultMethods
  • enVector item_id stability verified against es2-msa v1.2.2: item_id is an auto-increment PK; merge updates only shard_id (index_shardmap.go updateShardMapTable); no reindex/rebuild/counter-reset exists in the backend. The deny-list key is safe.
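
The idea behind the ServiceDesc-derived completeness guard can be sketched without pulling in gRPC. The `serviceDesc` and `methodDesc` types below are local stand-ins mirroring only the `ServiceName`, `Methods`, and `MethodName` fields of grpc-go's `grpc.ServiceDesc` and `grpc.MethodDesc`. The service name and the `FutureRPC` method are made up for illustration; the actual test in this PR may be structured differently.

```go
package main

import (
	"fmt"
	"sort"
)

// methodDesc and serviceDesc are stand-ins that mirror the fields of
// grpc.MethodDesc and grpc.ServiceDesc this check relies on.
type methodDesc struct {
	MethodName string
}

type serviceDesc struct {
	ServiceName string
	Methods     []methodDesc
}

// missingMethods returns every full method name ("/service/Method") declared
// by the descriptor that is absent from the guarded set.
func missingMethods(desc serviceDesc, guarded map[string]struct{}) []string {
	var missing []string
	for _, m := range desc.Methods {
		full := "/" + desc.ServiceName + "/" + m.MethodName
		if _, ok := guarded[full]; !ok {
			missing = append(missing, full)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Hypothetical descriptor: the service name and FutureRPC are invented.
	desc := serviceDesc{
		ServiceName: "vault.VaultService",
		Methods: []methodDesc{
			{MethodName: "MarkDeleted"},
			{MethodName: "FilterDeleted"},
			{MethodName: "FutureRPC"},
		},
	}
	// The interceptor's method set; FutureRPC was "forgotten" on purpose.
	vaultMethods := map[string]struct{}{
		"/vault.VaultService/MarkDeleted":   {},
		"/vault.VaultService/FilterDeleted": {},
	}
	// A real test would fail when this slice is non-empty.
	fmt.Println(missingMethods(desc, vaultMethods))
}
```

Because the list of methods comes from the descriptor rather than a hand-maintained copy, adding an RPC to the proto without registering it in the interceptor shows up as a test failure.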

Logical-delete SSOT for organizational memory. enVector has no per-vector
delete (only delete_index), so deletions are recorded as a per-index deny-list
of stable item_ids. Vault stores the deny-list; clients (rune-mcp) consult it
and filter out deleted hits. Vault never talks to enVector and never filters
scores itself.

- denylist.Store: file-backed, debounce-persisted, per-index set + monotonic
  version. MarkDeleted unions ids (idempotent, bumps version); FilterDeleted
  returns the deleted subset of candidates at O(candidates), independent of
  total deny-list size.
- gRPC: MarkDeleted (write, scope mark_deleted) and FilterDeleted (read, scope
  filter_deleted, granted to admin+member for the recall path). item_ids are
  uint64 to match enVector's stable item_id.
- Wiring: TokensConfig.deny_list_file (optional, defaults to deny_list.yml
  beside tokens_file); loaded and shut down in the daemon.
- Register both RPCs in the validation interceptor's method set + token-safety
  switch, with a ServiceDesc-derived test guarding future RPCs from omission.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jh-lee-cryptolab

Contributor Author

Storage scaling note

The deny-list is a single file-backed YAML (per-index sets of item_id, debounce + atomic write — same pattern as tokens.Store).

Confirmed sufficient up to ~100k entries with the single-file design — no embedded DB needed at this scale:

| Path | Cost at 100k |
| --- | --- |
| Memory (map[uint64]struct{}) | ~5–8 MB, negligible |
| FilterDeleted (recall hot path) | O(candidates), independent of deny-list size; recall does not slow down |
| MarkDeleted | O(ids in call); rare admin op |
| persist (full rewrite) | ~tens of ms, coalesced by 100ms debounce |
| startup load | ~50–150ms one-time |

The only inefficiency is rewriting the whole file on each persist, but deletes are infrequent (low write QPS) and debounce absorbs bursts, so it is not felt in practice.

Embedded DB (issue open question) is deferred — it only matters when writes are frequent AND volume reaches hundreds-of-thousands+. If that day comes, cheap next steps are per-index files (deny_list/<index>.yml) or an append-only log with periodic compaction.
