Code Agent Platform — Invariants Checklist (v5.3.2)

Updated to reflect debug CLI in Phase 1, context budget separation, concurrent query model, path normalization, InvalidationPlanner phasing, golden set sizing, and the following fixes from executive review: file move ordering (P0-1), BLOB(16) type consistency (P0-2), disambiguator in UNIQUE index with declaration-header hash (P0-3), uniform 500-node transaction bound (P0-4), multi-span content_hash definition (P0-5), edge invalidation reframed on node_id (P1-6), PRAGMA split for writer-only settings (P1-7), non-null project_id with synthetic root (P1-9), symbol-exact lookup chain clarification (P1-10), and grounded cancellation semantics (P1-12).

Consistency review (v5.2): symbol_disambiguator sole scheme with no file-path fallback, multi-span chunk_hash ordering by span_hash bytes (not span_id), node_identity_map DDL with BLOB(16) and disambiguator fields, multi-span-aware file deletion logic, skip-if-unchanged clarified for move handling, symbol channel overload handling, license correction (Apache 2.0), TS adapter alignment, and project detection collision logging.

Further fixes (v5.3): C# tree-sitter fallback includes signature for overload safety, TS identity scopes non-exported symbols by file_id, file deletion test split for single-span vs multi-span, transaction bound exception clarified for edge replace, symbol_disambiguator NOT NULL DEFAULT '', primary-span partial unique index, §6.2 phase labeling.

Consistency check fixes (v5.3.1): TS export_scope classification (package/module/file) replacing underspecified "exported vs non-exported" split; Plan §5 intro and §6.2 corrected for eager-embedding reality; node_spans.file_id FK action specified (ON DELETE CASCADE); file deletion operation order made unambiguous; C# Roslyn key reconciliation mechanism specified (in-place mutation); file node journaling explicitly required; retrieval channel renamed "qualified-name" (was "symbol-exact"); atomic edge replace scaling mitigation note added; file node eager embedding input specified; project_id assignment sequencing clarified.

Lazy-summary removal (v5.3.2): Removed all references to lazy summary generation, ensure_summary, summary_prompt_version_used, quality_flag, quality guards, and SummaryConfig (the feature was dropped from the codebase). Summary invalidation reframed as content-hash invariant. Phase 3 test assertions rewritten for EmbeddingProvider/vec_nodes reality.


1) Identity & Stability Invariants

  • Stable logical identity: symbol_key MUST be stable across moves when the symbol identity is unchanged (C# all symbols; TS package-scoped exports only — see export_scope below).
    • C#: derived from Roslyn symbol identity (kind + fully-qualified + signature). Phase 1 fallback (tree-sitter only): qualified_name + kind + parameter count + parameter types-as-written + generic arity (overload-safe).
    • TS: Each symbol is classified by export_scope (package | module | file):
      • package: reachable from a stable package entrypoint/barrel export surface. symbol_key = package_id + package-level export path + kind + normalized signature. Move-stable; file path MUST NOT appear in identity. export default is disambiguated by its unique package-level export path.
      • module: has export keyword but NOT reachable from package entrypoint. symbol_key = package_id + file_id + local export name + kind + normalized signature. NOT move-stable (includes file_id).
      • file (non-exported): no export keyword. Same derivation as module scope (includes file_id). NOT move-stable.
      • Phase 1: tree-sitter cannot trace the export graph; all exported symbols are classified as export_scope = module (conservative, includes file_id). Move-stability for TS is NOT achieved in Phase 1. Phase 2a: TS Language Service resolves re-exports; symbols reachable from package entrypoints are upgraded to export_scope = package via in-place symbol_key mutation on the existing node_id.
      • Collision handling via symbol_disambiguator (declaration-header hash — hashes the declaration line shape, not the body). File path MUST NOT be used as a secondary disambiguator for package-scoped exports; this is the sole disambiguation scheme to preserve move-stability.
  • Identity reconciliation (Phase 2a): When semantic enrichment produces a different symbol_key for the same symbol (C# Roslyn-derived keys replacing tree-sitter fallback keys, or TS export_scope upgrade from module→package), the system MUST mutate symbol_key in-place on the existing node_id to preserve cached data. Match by file location + symbol kind + name. If a UNIQUE conflict occurs, fall back to delete+create with a node_identity_map entry.
  • (Phase 2b) Move/rename preservation: When a move/rename is detected with high confidence, the system prefers reusing node_id. In Phase 1, renames are treated as hard-delete (with journal entry) + create.
  • Collisions: If symbol_key collides, the system MUST disambiguate without breaking move-stability, and MUST NOT "flip-flop" identities across runs. The full identity key is (language, project_id, symbol_key, symbol_disambiguator), enforced by a UNIQUE index.
  • File move preservation (Phase 1): Within a ChangeBatch, creates/modifications MUST be processed before deletes. The pipeline upserts by (language, project_id, symbol_key, symbol_disambiguator), updating node_spans to the new file_id/file_path, then processes deletes. This preserves node_id and cached data for pure file moves without requiring rename detection.
  • (Phase 2b) Edge identity preservation: When rename detection is added, edges MUST NOT be remapped by "guessing." Either reuse node_id or use explicit node_identity_map.
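The Phase 1 file-move rule above (creates/modifications before deletes, upsert by the full identity key) can be sketched with Python's sqlite3 module. This is an illustration only: the reduced table layout, the TEXT project_id, and the helper names upsert_symbol, delete_file, and apply_change_batch are assumptions, not the platform's actual schema or API.

```python
import sqlite3
import uuid

def open_db():
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")
    db.executescript("""
        CREATE TABLE nodes (
            node_id BLOB(16) PRIMARY KEY,
            language TEXT NOT NULL,
            project_id TEXT NOT NULL,              -- simplified type (assumption)
            symbol_key TEXT NOT NULL,
            symbol_disambiguator TEXT NOT NULL DEFAULT '',
            UNIQUE (language, project_id, symbol_key, symbol_disambiguator)
        );
        CREATE TABLE node_spans (
            node_id BLOB(16) NOT NULL REFERENCES nodes(node_id) ON DELETE CASCADE,
            file_path TEXT NOT NULL
        );
    """)
    return db

def upsert_symbol(db, identity, file_path):
    """Reuse node_id when the identity key already exists; otherwise create."""
    row = db.execute(
        "SELECT node_id FROM nodes WHERE language = ? AND project_id = ? "
        "AND symbol_key = ? AND symbol_disambiguator = ?", identity).fetchone()
    if row:
        node_id = row[0]
        # Single-span simplification: point the existing span at the new file.
        db.execute("UPDATE node_spans SET file_path = ? WHERE node_id = ?",
                   (file_path, node_id))
    else:
        node_id = uuid.uuid4().bytes
        db.execute("INSERT INTO nodes VALUES (?, ?, ?, ?, ?)", (node_id, *identity))
        db.execute("INSERT INTO node_spans VALUES (?, ?)", (node_id, file_path))
    return node_id

def delete_file(db, file_path):
    """Drop spans in the file, then delete nodes that have no spans left."""
    affected = [r[0] for r in db.execute(
        "SELECT DISTINCT node_id FROM node_spans WHERE file_path = ?", (file_path,))]
    db.execute("DELETE FROM node_spans WHERE file_path = ?", (file_path,))
    for node_id in affected:
        remaining = db.execute(
            "SELECT COUNT(*) FROM node_spans WHERE node_id = ?", (node_id,)).fetchone()[0]
        if remaining == 0:
            db.execute("DELETE FROM nodes WHERE node_id = ?", (node_id,))

def apply_change_batch(db, upserts, deleted_paths):
    """Process creates/modifications first, deletes second, in one transaction."""
    with db:
        ids = [upsert_symbol(db, identity, path) for identity, path in upserts]
        for path in deleted_paths:
            delete_file(db, path)
    return ids
```

Because the upsert has already moved the span to the new file by the time the old file's delete is processed, the node has no spans left in the deleted file and survives with its original node_id. Reversing the order would delete the node first and mint a new node_id on create.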

2) Invalidation & Recomputation Invariants

  • Two kinds of invalidation exist and both are required:
    1. Source-driven invalidation (file content / chunk_hash changes).
    2. Semantic-context invalidation (project/package config changes). Phase 1 detects and logs; Phase 2a acts on them.
  • InvalidationPlanner: Centralized component. Phase 1: source-driven invalidation. Phase 2a: adds semantic-context invalidation. Phase 2b: adds rename detection integration. Must be unit-testable against Section 6.6 decision matrix.
  • Semantic-context triggers (Phase 2a): Changes to *.sln, *.csproj, Directory.Build.props/targets, tsconfig*.json, package.json, lockfiles MUST trigger semantic scope recompute.
  • Atomic semantic edge replace (Phase 2a): Prior semantic edges MUST be deleted/replaced atomically (single write transaction) to prevent mixed-stale graphs.
  • Content-hash invariant: For multi-span nodes, nodes.chunk_hash stores a composite content_hash: SHA-256 of the concatenation of all node_spans.span_hash values, ordered by span_hash bytes ascending (not by span_id, which is insertion-order and unstable across reparse). chunk_hash is purely content-derived.
  • Embedding invalidation: Embeddings MUST be recomputed when embedding model id/dimensionality changes, or the text basis changes.
  • Parse-status truthfulness: If semantic enrichment is unavailable/failed, parse_status MUST reflect it.
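The content-hash invariant above can be sketched as follows. The composite rule (SHA-256 over span hashes sorted by their bytes) comes from the invariant itself; deriving each span_hash as SHA-256 of the span's UTF-8 text is an assumption made only so the example is runnable, and both function names are hypothetical.

```python
import hashlib

def span_hash(span_text: str) -> bytes:
    # Assumption: a span's hash is SHA-256 of its UTF-8 text (32 bytes).
    return hashlib.sha256(span_text.encode("utf-8")).digest()

def composite_chunk_hash(span_hashes) -> bytes:
    """SHA-256 of all span hashes concatenated in ascending byte order."""
    h = hashlib.sha256()
    for digest in sorted(span_hashes):  # bytes sort lexicographically
        h.update(digest)
    return h.digest()
```

Sorting by the hash bytes rather than by span_id makes the result independent of insertion order, so reparsing the same partial class in a different file order yields the same chunk_hash.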

3) MCP Tool Behavior Invariants

  • No implicit network calls (highest priority):
    • All MCP tools MUST be purely local and MUST NOT trigger network calls.
    • get_source_spans, get_symbol, read_file — always local SQLite or filesystem reads.
  • Offline correctness: Graph traversal, retrieval, source access, and cached embeddings all work offline. The engine makes no network calls.
  • File system sandboxing: All file system tools resolve paths relative to the repo root. Paths escaping the root are rejected. Symlinks outside the root are not followed.
  • No graph mutation via MCP: External clients cannot directly write to the graph. Only index_files triggers graph writes through the ingest pipeline.
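The file system sandboxing rule above can be illustrated with a small path check. This is a sketch under assumptions: resolve_in_repo and PathEscapeError are hypothetical names, and rejecting any path whose resolved target lies outside the root is one way to honor "symlinks outside the root are not followed".

```python
from pathlib import Path

class PathEscapeError(ValueError):
    """Hypothetical error type standing in for a path-escape rejection."""

def resolve_in_repo(repo_root, requested):
    """Resolve `requested` relative to the repo root, rejecting escapes."""
    root = Path(repo_root).resolve()
    # resolve() collapses ".." segments and follows symlinks, so a link that
    # points outside the root ends up outside and is rejected below.
    candidate = (root / requested).resolve()
    if candidate != root and root not in candidate.parents:
        raise PathEscapeError(f"path escapes repo root: {requested}")
    return candidate
```

Checking the resolved path, rather than scanning the input string for "..", also covers absolute paths and symlinked directories.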

4) Storage & Schema Invariants

  • Single-writer discipline: All writes through dedicated writer thread; readers use read-only pooled connections.
  • Foreign keys enforced: Every connection (reader and writer) MUST set PRAGMA foreign_keys = ON, PRAGMA mmap_size, and PRAGMA busy_timeout. WAL mode (journal_mode), synchronous, and wal_autocheckpoint are set by the writer thread (or DB creation path) only; read-only connections inherit WAL mode automatically.
  • Multi-span correctness: Every multi-span node has at least one node_spans row with is_primary = true.
  • Hard-delete with deletion journal: Node removal is a hard DELETE with ON DELETE CASCADE for edges. Before each delete, a snapshot is written to deletion_log for Phase 2b rename matching. This applies to all node types including file nodes — file nodes are journaled so Phase 2b file-level rename correlation can query the journal. All read paths are clean — no is_deleted filter anywhere.
  • File deletion operation order (transactional): When a file is deleted, the following steps execute within a single write transaction, in this exact order: (1) collect affected node_ids from node_spans; (2) delete node_spans rows for that file_id; (3) for each affected node: if zero spans remain → journal + hard-delete; if spans remain → reassign primary + update convenience fields; (4) journal + delete the file node last. The file node MUST NOT be deleted first because nodes.file_id ON DELETE SET NULL would null-out convenience fields before the writer can process affected nodes.
  • node_spans.file_id FK action: ON DELETE CASCADE. When a file node is deleted, associated node_spans rows are automatically removed. The explicit deletion algorithm (above) processes spans before the file node deletion, but the CASCADE acts as a safety net.
  • Journal sweep: Deletion log entries older than retention period (default 1 hour) are swept during idle time.
  • Embedding storage is simple: One row per node in sqlite-vec. Overwritten in-place. WAL ensures reader consistency. No versioning.
  • FTS/vector sync: For any node, FTS and vector representations MUST be consistent with the node's current state.
  • Migrations are monotonic: Newer on-disk schema refuses older binaries; forward migrations are transactional.
  • UUID storage: node_id, file_id, and edge endpoints stored as BLOB(16). All DDL examples (including node_spans) MUST use BLOB(16) for UUID columns and BLOB(32) for hash columns.
  • Path normalization: All stored paths are repo-relative POSIX. normalize_path is called at every system boundary and never mid-pipeline.
  • project_id is non-null: Nodes are assigned to a synthetic "repo-root" project per language until project detection completes. This ensures the UNIQUE index on (language, project_id, symbol_key, symbol_disambiguator) cannot be bypassed by NULL values (SQLite treats each NULL as distinct in UNIQUE constraints).
  • project_id assignment sequencing: Project detection (scanning for .csproj, package.json workspaces) MUST complete before symbol node indexing begins, so that nodes receive their correct project_id from the start. Late reassignment of project_id after initial indexing requires re-indexing affected symbols (since project_id participates in the UNIQUE constraint) and may cause transient constraint violations.
  • symbol_disambiguator is non-null: Defined as TEXT NOT NULL DEFAULT '' (empty string when no disambiguation is needed). This prevents SQLite's NULL-distinct-in-UNIQUE behavior from silently allowing duplicate identity keys.
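  • Primary-span uniqueness enforced: Exactly one is_primary = true per node in node_spans. The at-most-one half is enforced by a partial unique index: CREATE UNIQUE INDEX idx_node_spans_one_primary ON node_spans(node_id) WHERE is_primary = 1. The index alone cannot guarantee that a primary span exists; that half is the writer's responsibility (see Multi-span correctness above).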
  • Bounded write transactions: All node upsert write transactions are bounded to at most 500 nodes per transaction. This applies uniformly, including burst-recovery and large refactoring batches. The sole exception is atomic semantic edge replace (one transaction per affected project/package scope), which is bounded per scope rather than by a fixed node count.
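The NULL-handling and primary-span invariants above rest on documented SQLite behavior, which the following sketch demonstrates. The table names nullable_ids and strict_ids and the helper try_insert are hypothetical and exist only for this demonstration; the index definition is the one quoted in the checklist.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Nullable project_id: the UNIQUE constraint can be bypassed.
    CREATE TABLE nullable_ids (
        project_id TEXT,
        symbol_key TEXT NOT NULL,
        UNIQUE (project_id, symbol_key)
    );
    -- Non-null columns with an empty-string default: duplicates are rejected.
    CREATE TABLE strict_ids (
        project_id TEXT NOT NULL,
        symbol_key TEXT NOT NULL,
        symbol_disambiguator TEXT NOT NULL DEFAULT '',
        UNIQUE (project_id, symbol_key, symbol_disambiguator)
    );
    CREATE TABLE node_spans (
        span_id INTEGER PRIMARY KEY,
        node_id BLOB(16) NOT NULL,
        is_primary INTEGER NOT NULL DEFAULT 0
    );
    CREATE UNIQUE INDEX idx_node_spans_one_primary
        ON node_spans(node_id) WHERE is_primary = 1;
""")

def try_insert(sql, params):
    """Return True if the row was accepted, False on a constraint violation."""
    try:
        db.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

# Two rows with NULL project_id and the same key are both accepted.
null_first = try_insert("INSERT INTO nullable_ids VALUES (?, ?)", (None, "Foo.Bar"))
null_second = try_insert("INSERT INTO nullable_ids VALUES (?, ?)", (None, "Foo.Bar"))

# With a synthetic project and the '' default, the duplicate is rejected.
strict_sql = "INSERT INTO strict_ids (project_id, symbol_key) VALUES (?, ?)"
strict_first = try_insert(strict_sql, ("repo-root", "Foo.Bar"))
strict_second = try_insert(strict_sql, ("repo-root", "Foo.Bar"))

# The partial index allows many non-primary spans but only one primary per node.
node = bytes(16)
span_sql = "INSERT INTO node_spans (node_id, is_primary) VALUES (?, ?)"
primary_first = try_insert(span_sql, (node, 1))
primary_second = try_insert(span_sql, (node, 1))
secondary = try_insert(span_sql, (node, 0))
```

Note that the partial index rejects a second primary span but does not prevent a node from having zero primary spans; that condition has to be maintained by the write path.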

5) Retrieval Invariants

  • Context budget separation: Retrieval pipeline has retrieval.max_output_tokens (hard cap). MCP clients pass the desired limit when invoking retrieval tools. Retrieval does not know about conversation history or system prompts.
  • Cold-start awareness: Behavioral queries will have low vector recall until eager embeddings have been computed for the full codebase. Phase 4b eval should include behavioral queries to establish baseline.

6) Security Invariants

  • Safe mode is real (Phase 2a):
    • indexing.safe_mode = true: MUST NOT evaluate MSBuild, MUST NOT execute TS plugins. C# runs as syntactic_only.
    • Any relaxation MUST be explicit and user-approved.
  • NuGet restore separately gated: indexing.allow_nuget_restore (default false) even when safe mode is off.
  • No unsupported sandbox promises: Primary mitigation is safe mode, not filesystem sandboxing.
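The gating rules above can be read as a small decision function. The sketch below is an interpretation, not the platform's configuration code: IndexingConfig, Permissions, permissions_for, and the "semantic" mode label are hypothetical, and treating NuGet restore as blocked whenever safe mode is on is an assumption (the checklist only states the gate for the safe-mode-off case).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexingConfig:
    safe_mode: bool                    # indexing.safe_mode
    allow_nuget_restore: bool = False  # indexing.allow_nuget_restore (default false)

@dataclass(frozen=True)
class Permissions:
    evaluate_msbuild: bool
    execute_ts_plugins: bool
    nuget_restore: bool
    csharp_mode: str

def permissions_for(cfg: IndexingConfig) -> Permissions:
    if cfg.safe_mode:
        # Safe mode: no MSBuild evaluation, no TS plugins, C# is syntactic_only.
        # Assumption: restore is also blocked, since nothing is evaluated.
        return Permissions(False, False, False, "syntactic_only")
    # Safe mode off: evaluation proceeds, but restore keeps its own gate.
    return Permissions(True, True, cfg.allow_nuget_restore, "semantic")
```

Keeping the restore flag separate means that turning safe mode off never enables package restore as a side effect.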

7) Concurrency & Cancellation Invariants

  • Bounded queues: Write queue and child-process request queues bounded with deterministic backpressure.
  • Cancellation correctness: Superseded semantic requests cancellable via CancellationToken (tokio_util), propagated to rayon CPU tasks (periodic token checks) and child process RPC calls (request_id cancellation messages). The writer thread MUST check the token before COMMIT — if cancelled, the transaction is rolled back. MUST NOT commit partial results.
  • No long-lived write transactions: Bounded batch size (max 500 nodes/tx) for node upsert transactions to avoid WAL growth. Applies uniformly to all node upsert batch types. Exception: atomic semantic edge replace (Phase 2a) may exceed this bound because it operates on edges within a single project/package scope and must be atomic to prevent mixed-staleness. Edge replace transactions are bounded per project/package scope, not by a fixed node count.
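The 500-node bound and the check-before-COMMIT rule above can be sketched together. This is an illustration in Python, not the writer thread itself: threading.Event stands in for the CancellationToken, and the two-column nodes table and the names chunked and write_nodes are assumptions.

```python
import sqlite3
import threading

MAX_NODES_PER_TX = 500  # bound stated in the checklist

def chunked(items, size=MAX_NODES_PER_TX):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def write_nodes(db, rows, cancelled: threading.Event) -> int:
    """Upsert rows in bounded transactions; return the number committed."""
    committed = 0
    for batch in chunked(rows):
        db.execute("BEGIN")
        db.executemany(
            "INSERT OR REPLACE INTO nodes (node_id, symbol_key) VALUES (?, ?)", batch)
        # Check the token immediately before COMMIT; roll back if cancelled.
        if cancelled.is_set():
            db.execute("ROLLBACK")
            return committed
        db.execute("COMMIT")
        committed += len(batch)
    return committed

# isolation_level=None puts the connection in autocommit mode so that the
# explicit BEGIN/COMMIT/ROLLBACK statements above control the transaction.
db = sqlite3.connect(":memory:", isolation_level=None)
db.execute("CREATE TABLE nodes (node_id BLOB(16) PRIMARY KEY, symbol_key TEXT NOT NULL)")
```

The sketch only shows the per-transaction rule: a cancelled transaction leaves no rows behind. Whether batches committed before the cancellation count as partial results for a given request is a policy question this example does not settle.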

8) Embedding Quality Invariants

  • No constant boilerplate: Embedding uses only discriminative content.
  • File node embedding input: File nodes use file_path + file_name + language + import/export summary (top-N imported/exported symbol names) + file header doc comment. Must NOT collapse all file nodes into the same vector space region.
  • Normalization: All embeddings use the same model and consistent normalization.

9) Phase-Dependent Field Accuracy

  • reference_count: Phase 1 = approximate (tree-sitter edge count). Phase 2a = precise (Roslyn/TS findReferences).
  • is_public_api (TS): Phase 1 = direct exports only (export keyword). Phase 2a = includes re-exports via barrel file resolution.

Minimal Test Assertions by Phase

Phase 1

  • TS file move in Phase 1: all TS exported symbols are classified export_scope = module (tree-sitter cannot trace the export graph), so symbol_key includes file_id and is NOT move-stable. A pure file move is treated as delete+create for TS symbols in Phase 1 (cache loss, not correctness failure). Verify that two identically-named exported helpers in different files produce distinct identity keys (because file_id differs). Full identity key (language, project_id, symbol_key, symbol_disambiguator) is unique across files. Non-exported file-scoped symbols also include file_id in identity.
  • File move preserves node_id and cached embeddings (ChangeBatch processes creates before deletes, upserts by identity key, updates node_spans to new file).
  • get_source() and get_node() never trigger network calls.
  • File deletion hard-deletes all single-span nodes whose only span was in that file; deletion journal entries written for each. Multi-span nodes whose other spans survive are NOT deleted (see multi-span deletion test below). The file node itself is journaled and then hard-deleted last. Verify the operation order: collect affected nodes → delete spans → process nodes (journal+delete or reassign primary) → journal+delete file node. Verify the entire sequence executes within a single write transaction.
  • Hard-deleted nodes leave no orphaned edges, vec_nodes, or fts_nodes rows.
  • Journal sweep removes entries older than retention period.
  • Rename produces new node_id; old node hard-deleted with journal entry.
  • Multi-span correctness: partial class maintains one primary span. content_hash (chunk_hash) is SHA-256 of span hashes ordered by span_hash bytes ascending (not span_id). chunk_hash is purely content-derived.
  • File deletion with multi-span nodes: deleting a file that contains one span of a multi-span node removes only that span's node_spans row, reassigns primary span if needed, and does NOT delete the node. Deleting a file whose nodes have no remaining spans hard-deletes those nodes.
  • Semantic-context file changes detected and logged (no action until Phase 2a).
  • InvalidationPlanner: unit test against decision matrix for all (edge_type, change_type) pairs. Decision matrix includes node_id lifecycle rows (preserved → no action; replaced → CASCADE + re-establish), not symbol_key-based triggers.
  • Path normalization: Windows backslash paths → repo-relative POSIX at boundary.
  • Debug CLI: all commands produce valid JSON on sample project.
  • Backend integration: health check authenticates and completes a round-trip.
  • project_id is non-null: nodes without a detected project are assigned to a synthetic "repo-root" project per language. Project detection MUST complete before symbol node indexing begins. Collisions from multi-project repos with identically-named symbols MUST be detected and logged with an actionable warning.
  • node_spans DDL uses BLOB(16) for UUID columns and BLOB(32) for hash columns, consistent with all other schema definitions. node_spans.file_id FK uses ON DELETE CASCADE.
  • Write transactions bounded to ≤500 nodes/tx, including burst-recovery batches.
  • Cancellation: writer thread rolls back uncommitted transaction on cancellation token; no partial results persisted.
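The file deletion assertions above (single-span vs multi-span, operation order, single transaction) could be exercised against a fixture along these lines. The schema is a reduced stand-in: modelling file nodes as rows of nodes, the kind column, the deletion_log layout, choosing the lowest span_id as the new primary, and the helper names are all assumptions.

```python
import sqlite3

def open_db():
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")
    db.executescript("""
        CREATE TABLE nodes (
            node_id BLOB(16) PRIMARY KEY,
            kind TEXT NOT NULL,
            file_id BLOB(16) REFERENCES nodes(node_id) ON DELETE SET NULL
        );
        CREATE TABLE node_spans (
            span_id INTEGER PRIMARY KEY,
            node_id BLOB(16) NOT NULL REFERENCES nodes(node_id) ON DELETE CASCADE,
            file_id BLOB(16) NOT NULL REFERENCES nodes(node_id) ON DELETE CASCADE,
            is_primary INTEGER NOT NULL DEFAULT 0
        );
        CREATE UNIQUE INDEX idx_node_spans_one_primary
            ON node_spans(node_id) WHERE is_primary = 1;
        CREATE TABLE deletion_log (node_id BLOB(16) NOT NULL, kind TEXT NOT NULL);
    """)
    return db

def journal_and_delete(db, node_id):
    """Snapshot the node into the journal, then hard-delete it."""
    db.execute("INSERT INTO deletion_log (node_id, kind) "
               "SELECT node_id, kind FROM nodes WHERE node_id = ?", (node_id,))
    db.execute("DELETE FROM nodes WHERE node_id = ?", (node_id,))

def delete_file(db, file_id):
    with db:  # one write transaction for the whole sequence
        # (1) collect affected nodes
        affected = [r[0] for r in db.execute(
            "SELECT DISTINCT node_id FROM node_spans WHERE file_id = ?", (file_id,))]
        # (2) delete the file's spans
        db.execute("DELETE FROM node_spans WHERE file_id = ?", (file_id,))
        # (3) per node: journal + delete, or reassign primary
        for node_id in affected:
            spans = db.execute(
                "SELECT span_id, is_primary FROM node_spans "
                "WHERE node_id = ? ORDER BY span_id", (node_id,)).fetchall()
            if not spans:
                journal_and_delete(db, node_id)
                continue
            if not any(is_primary for _, is_primary in spans):
                db.execute("UPDATE node_spans SET is_primary = 1 WHERE span_id = ?",
                           (spans[0][0],))
            # Refresh the convenience field from the current primary span.
            db.execute(
                "UPDATE nodes SET file_id = (SELECT file_id FROM node_spans "
                "WHERE node_id = ? AND is_primary = 1) WHERE node_id = ?",
                (node_id, node_id))
        # (4) journal + delete the file node last
        journal_and_delete(db, file_id)
```

A test would seed two file nodes, one single-span node, and one node with spans in both files, call delete_file on the first file, and then check that the multi-span node survives with its remaining span promoted to primary while the journal holds the single-span node followed by the file node.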

Phase 2a

  • Context change invalidation: changing tsconfig.json or .csproj triggers semantic recompute even with zero source edits.
  • Atomic replace: no observable mixed old/new edge state during semantic recompute.
  • Safe mode enforcement: MSBuildWorkspace and TS plugins blocked when safe mode is on.
  • NuGet gating: with safe mode off and allow_nuget_restore = false, evaluation proceeds but restore is blocked.
  • reference_count upgraded to precise count from Roslyn/TS.
  • is_public_api for TS includes barrel file re-exports.
  • Edge invalidation reframe: when a target node's node_id is replaced (delete+create), ON DELETE CASCADE removes stale edges. Callers re-establish edges on next semantic pass. No ad-hoc remapping by symbol_key.
  • Identity reconciliation (C#): When Roslyn produces a different symbol_key than the tree-sitter fallback for the same symbol, verify that symbol_key is mutated in-place on the existing node_id (not delete+create). Verify cached embeddings are preserved. Verify that if the new key conflicts with an existing node, the system falls back to delete+create with a node_identity_map entry.
  • TS export_scope upgrade: After semantic enrichment resolves re-exports, verify that symbols reachable from package entrypoints are upgraded from export_scope = module to export_scope = package. Verify symbol_key is mutated in-place (file_id removed, package-level export path added) on the existing node_id. Verify that a subsequent pure file move preserves symbol_key for package-scoped exports. Verify that module-scoped exports (not reachable from entrypoints) retain file_id in identity.

Phase 2b

  • Chunk fingerprint: identical bodies / different names → high similarity (≥0.95).
  • Detected rename reuses node_id; edges automatically preserved.
  • Uncommitted rename (delete + create in debounce window) correctly correlated via deletion journal.
  • When node_id cannot be preserved, node_identity_map entry created with BLOB(16) UUID columns and both old/new symbol_disambiguator fields.

Phase 3

  • EmbeddingProvider trait produces deterministic embeddings; HashEmbeddingProvider used in tests.
  • vec_nodes table created by ensure_vec_nodes_table; embeddings stored as BLOB keyed by node_id BLOB(16).
  • load_sqlite_vec() registers sqlite-vec as a global auto-extension before any connection is opened.
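To make the determinism assertion above concrete, here is one way a hash-based test provider could behave. It is a Python sketch of the idea only: the class below borrows the name HashEmbeddingProvider for readability but is not the project's implementation, and the dimension, the SHA-256 expansion scheme, L2 normalization, and the little-endian float32 blob layout are all assumptions.

```python
import hashlib
import math
import struct

class HashEmbeddingProvider:
    """Deterministic pseudo-embeddings derived from a hash of the input text."""

    def __init__(self, dim: int = 8):
        self.dim = dim

    def embed(self, text: str) -> list:
        values = []
        counter = 0
        while len(values) < self.dim:
            digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
            # Each 32-byte digest yields eight unsigned 32-bit ints, mapped to [-1, 1].
            for (n,) in struct.iter_unpack(">I", digest):
                values.append(n / 0xFFFFFFFF * 2.0 - 1.0)
            counter += 1
        values = values[: self.dim]
        norm = math.sqrt(sum(v * v for v in values)) or 1.0
        return [v / norm for v in values]  # unit length (assumed normalization)

    def to_blob(self, vector) -> bytes:
        # Assumed storage layout: packed little-endian float32 values.
        return struct.pack(f"<{len(vector)}f", *vector)
```

Such vectors carry no semantic meaning; their only purpose is to give tests stable, reproducible inputs for the storage and retrieval paths without loading a real model.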

Phase 4b

  • Eval harness includes behavioral queries with expected lower baseline (cold-start gap).
  • Reference codebases: ≥1 C#, ≥1 TS/React, each >10K nodes.

Phase 5

  • All MCP tools return valid JSON responses.
  • File system tools reject paths escaping the repo root with path_escape error.
  • index_files routes through the ingest pipeline and updates the graph.
  • get_status returns accurate counts for indexed files, symbols, and embeddings.