fix(watch): purge embeddings before nodes to stop FK crash in rebuildFile#1182
fix(watch): purge embeddings before nodes to stop FK crash in rebuildFile#1182carlos-alm wants to merge 4 commits into
Conversation
…File The watcher's `rebuildFile` had its own `purgeAncillaryData` helper that missed the `embeddings` table. With `codegraph embed` populated and better-sqlite3's default `foreign_keys = ON`, the next `deleteNodes` fired `SQLITE_CONSTRAINT_FOREIGNKEY` (embeddings.node_id → nodes.id) and killed the watcher process. Switch to the shared `purgeFileData` helper used by the main builder — it purges embeddings, cfg_*, dataflow, complexity, node_metrics, ast_nodes, edges, and nodes in the correct order — and drop the now unused stmt prep. Wrap per-file `rebuildFile` calls in try/catch so any future SQLite or parse error logs+skips instead of crashing the long-running watcher. Closes #1176
|
Claude finished @carlos-alm's task —— View job Reviewing PR #1182: Foreign key constraint fix in watcher Todo List
Phase 1: Root Cause Analysis ✓Problem Being Solved: The watcher's Why This Approach: The fix correctly switches from the watcher's custom purge logic to the shared Risk Assessment: Low risk. This is a defensive fix that prevents crashes and uses existing, well-tested purge logic rather than duplicating/maintaining separate deletion code. Phase 2: Configuration Change Detection ✓No configuration changes detected. This PR makes no changes to:
Phase 3: Test Integrity Verification ✓New Regression Test:
Existing Tests: Test Quality: High-quality regression test that directly reproduces the user-reported failure scenario. Phase 4: Code Quality Review ✅Excellent Cleanup:
Code Quality: Very clean implementation with excellent comments explaining the FK ordering requirement and referencing issue #1176. No Security/Performance Concerns: Uses existing, vetted database operations. No command injection, SQL injection, or performance regressions. Phase 5: Backlog Compliance Assessment ✅Zero-dep: ✓ — No new runtime dependencies added. Uses existing SQLite operations. Foundation-aligned: ✓ — Fully compliant with FOUNDATION.md principles:
Problem-fit: 4/5 — Directly prevents watcher crashes that would disrupt AI agents during long-running sessions. While not directly addressing "lost AI" token waste, it ensures the core dependency tracking infrastructure remains stable and available. Breaking: No — This is a pure bug fix with no API changes. The Tier: 1 (Zero-dep ✓ + Foundation-aligned ✓) Root Cause AnalysisProblem Being Solved: Foreign key constraint violation crashes in watcher when embeddings exist Backlog Compliance
Critical ConcernsNone identified. This is a well-executed fix with excellent test coverage. Final Recommendation
|
Greptile SummaryThis PR fixes a
Confidence Score: 5/5Safe to merge — the fix correctly delegates purge ordering to the shared purgeFileData utility, and the new regression test directly verifies the FK-safe deletion sequence. The change is narrowly scoped to deleting the old hand-rolled purge path and wiring in the battle-tested shared helper. The FK-ordering bug is fully addressed, the regression test seeds a real embeddings row and checks the total count goes to zero (not just a JOIN that would miss orphans), and the prior review issues (instanceof narrowing, orphan-row assertion) are both reflected in the updated code. No files require special attention. Important Files Changed
Sequence DiagramsequenceDiagram
participant W as Watcher (processPendingFiles)
participant R as rebuildFile
participant P as purgeFileData
participant DB as SQLite DB
W->>R: rebuildFile(db, rootDir, filePath, stmts, ...)
R->>P: "purgeFileData(db, relPath, {purgeHashes:false})"
P->>DB: "DELETE FROM embeddings WHERE node_id IN (SELECT id FROM nodes WHERE file=?)"
P->>DB: DELETE FROM cfg_edges / cfg_blocks / dataflow / complexity / node_metrics / ast_nodes
P->>DB: "DELETE FROM edges WHERE source_id/target_id IN (SELECT id FROM nodes WHERE file=?)"
P->>DB: "DELETE FROM nodes WHERE file=?"
P-->>R: done (FK-safe)
R->>DB: INSERT new nodes and edges
R-->>W: RebuildResult
W->>W: push result to writeJournalAndChangeEvents
note over W,DB: On error: catch, warn+debug log, skip file - watcher continues
Reviews (4): Last reviewed commit: "Merge branch 'main' into fix/1176-watch-..." | Re-trigger Greptile |
| warn(`Failed to rebuild ${relPath}: ${(err as Error).message} — skipping`); | ||
| debug((err as Error).stack ?? String(err)); |
There was a problem hiding this comment.
If
err is a non-Error value (e.g., a plain string or null thrown by a third-party dependency), (err as Error).message evaluates to undefined, so the logged warning reads "Failed to rebuild …: undefined — skipping", which is hard to diagnose. Narrowing with instanceof produces a safe message in all cases.
| warn(`Failed to rebuild ${relPath}: ${(err as Error).message} — skipping`); | |
| debug((err as Error).stack ?? String(err)); | |
| const message = err instanceof Error ? err.message : String(err); | |
| warn(`Failed to rebuild ${relPath}: ${message} — skipping`); | |
| debug(err instanceof Error ? (err.stack ?? message) : String(err)); |
There was a problem hiding this comment.
Fixed in d2db316 — replaced (err as Error).message with instanceof Error narrowing so a non-Error throw logs a usable message instead of "undefined".
| const remaining = db | ||
| .prepare( | ||
| `SELECT COUNT(*) AS c FROM embeddings e | ||
| JOIN nodes n ON e.node_id = n.id | ||
| WHERE n.file = ?`, | ||
| ) | ||
| .get('shared/constants.js') as { c: number }; | ||
| expect(remaining.c).toBe(0); |
There was a problem hiding this comment.
Secondary assertion does not verify the seeded row was deleted
The query JOIN nodes n ON e.node_id = n.id WHERE n.file = ? counts embeddings that reference currently existing nodes for shared/constants.js. After rebuildFile the old node IDs are gone, so even if the embedding row were left as an orphan the JOIN would find nothing and the count would still be 0. A more reliable check is to count all rows in embeddings (there was exactly one seeded row) or use WHERE node_id = ? with the known seeded ID, so the assertion actually fails if the row survives as an orphan.
There was a problem hiding this comment.
Fixed in d2db316 — switched the assertion to SELECT COUNT(*) FROM embeddings (exactly one row was seeded), so it now catches the orphan-row regression the JOIN check missed.
…1182) The watcher's local getNodeId.get used a typed parameter list (name, kind, file, line) but IncrementalStmts declares get: (...params: unknown[]). TS2322 rejected the narrower form, which broke the build job and all downstream CI checks. Switch to (...params: unknown[]) and destructure inside the body so the implementation matches the interface contract.
Addresses Greptile P2 review feedback: - src/domain/graph/watcher.ts: replace `(err as Error).message` with `instanceof Error` narrowing so a non-Error throw (plain string, null, or any value from a third-party dependency) logs a usable message instead of "undefined". - tests/integration/watcher-fk-embeddings.test.ts: count all rows in `embeddings` directly instead of joining on `nodes.file`. The JOIN-based check would pass even if the seeded row survived as an orphan with a dangling node_id; counting the table itself catches that regression.
Codegraph Impact Analysis3 functions changed → 7 callers affected across 2 files
|
Summary
rebuildFilerolled its ownpurgeAncillaryDatahelper that purgedcfg_*,dataflow,complexity,node_metrics, andast_nodes— but notembeddings. Withcodegraph embedpopulated and better-sqlite3's defaultPRAGMA foreign_keys = ON, the subsequentDELETE FROM nodesfiredSQLITE_CONSTRAINT_FOREIGNKEY(embeddings.node_id→nodes.id) and killed the long-running watcher process.rebuildFileto the sharedpurgeFileDatahelper already used by the main builder — it deletes from all child tables (includingembeddings) in the correct FK-safe order, thenedges, thennodes. Drop the now-unuseddeleteEdgesForFile/deleteNodes/countEdgesForFileplumbing fromIncrementalStmtsand the watcher's statement prep.rebuildFilecalls inprocessPendingFileswith try/catch so any future SQLite/parse/filesystem error logs and skips instead of crashing the watcher. The watcher is a long-running session — one bad file should not bring it down.Test plan
tests/integration/watcher-fk-embeddings.test.tsreproduces the failure (seeds anembeddingsrow, enablesPRAGMA foreign_keys = ON, callsrebuildFile) and asserts no FK violation + that the embedding row is purged alongside the rebuilt file's nodes.tests/integration/watcher-rebuild.test.tsstill passes with the slimmerIncrementalStmtsshape.npm test— 2732 passed, 24 skipped, 0 failed.npm run lint— clean.Closes #1176