
v0.6.0: Write pipeline with sqlite-vec migration #7

Merged
devwhodevs merged 20 commits into main from worktree-v0.6-write-pipeline
Mar 25, 2026

Conversation

@devwhodevs
Owner

Summary

  • sqlite-vec migration: Replaced hnsw_rs with sqlite-vec for vector search — all data (metadata, chunks, vectors, FTS5, edges) unified in a single SQLite database. DB size dropped from 18MB to 9.6MB. HNSW rebuild eliminated — vectors are immediately queryable after insert.
  • Write pipeline: 5-step pipeline (content analysis → tag resolution → link discovery → folder placement → atomic write) with 6 new MCP tools: create, append, update_metadata, move_note, archive, unarchive
  • Tag registry: Fuzzy Levenshtein resolution against existing tags prevents duplicates
  • Link discovery: Auto-detects note names and aliases in content, converts to [[wikilinks]]
  • Folder placement: Type-based rules → precomputed semantic centroids → inbox fallback
  • Archive/unarchive: Soft-delete moves notes to 04-Archive/, removes from index. Indexer auto-excludes archive folder. Unarchive restores to original location and re-indexes.
  • Safety: All writes use BEGIN IMMEDIATE SQLite transactions + temp-file-then-rename. Mtime conflict detection prevents overwriting external edits. Crash recovery cleans up orphan .tmp files on startup.
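
The tag registry bullet above can be illustrated with a minimal sketch of fuzzy resolution. The function names, the lowercase normalization, and the distance threshold are assumptions for illustration; the PR does not show the registry's actual API or cutoff.

```rust
/// Classic dynamic-programming Levenshtein distance over Unicode scalar values.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] is the distance between the first i-1 chars of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        // cur[0] = i: deleting i chars from `a` yields the empty prefix of `b`.
        let mut cur = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            cur[j] = (prev[j] + 1).min(cur[j - 1] + 1).min(prev[j - 1] + cost);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Hypothetical helper: return the closest existing tag within `max_distance`,
/// or None so the caller can register the candidate as a new tag.
fn resolve_tag<'a>(
    candidate: &str,
    existing: &'a [&'a str],
    max_distance: usize,
) -> Option<&'a str> {
    let needle = candidate.to_lowercase();
    existing
        .iter()
        .map(|tag| (levenshtein(&needle, &tag.to_lowercase()), *tag))
        .filter(|(distance, _)| *distance <= max_distance)
        .min_by_key(|(distance, _)| *distance)
        .map(|(_, tag)| tag)
}
```

With this sketch, `resolve_tag("Rustlang", &["rust-lang", "python"], 2)` resolves to the existing `rust-lang` tag instead of creating a near-duplicate.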

Stats

| Metric         | v0.5.0  | v0.6.0     |
|----------------|---------|------------|
| Modules        | 14      | 21         |
| Tests          | 146     | 190        |
| Lines          | ~8,500  | ~10,800    |
| MCP tools      | 7       | 13         |
| Vector backend | hnsw_rs | sqlite-vec |

Test plan

  • 190 unit tests passing
  • 3 integration tests (create searchable, append indexed, conflict detection)
  • Clippy clean (-D warnings)
  • cargo fmt --check clean
  • Live-tested against personal vault: create → search → append → archive → unarchive cycle verified
  • CI (cargo test + clippy on macOS + Ubuntu)

🤖 Generated with Claude Code

devwhodevs and others added 20 commits March 25, 2026 10:14
Wrap sqlite-vec for vector search, replacing HNSW-based approach.
Provides init, insert, delete, search (with tombstone filtering),
and clear operations on a vec0 virtual table. Includes 5 unit tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All search code paths now use store.search_vec() instead of
HnswIndex::search(). The hnsw module remains but is unused — deletion
is deferred to Task 5.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove HnswIndex import and HNSW rebuild steps (11-12)
- Insert vectors into vec0 table during chunk write loop
- Delete from vec0 when files are deleted or changed
- Clear vec0 on full rebuild
- Use store.next_vector_id() instead of scanning all vectors
- Add folder centroid computation and storage after indexing
- Add folder_centroids table migration and upsert/get methods in Store

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…sqlite-vec

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds `migrate_vectors_to_vec0()` which copies BLOB vectors from
`chunks.vector` into the `chunks_vec` vec0 virtual table. Called from
`init()` after `init_vec_table()` so the virtual table is guaranteed
to exist. No-ops when vec0 is already populated or no BLOBs are present.
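
As a rough illustration of what such a migration has to handle, the sketch below converts between `f32` vectors and raw BLOB bytes and mirrors the no-op conditions described above. It assumes the BLOBs are packed little-endian `f32` values, which this PR does not confirm; all three function names are hypothetical and no database calls are shown.

```rust
/// Encode a vector as a raw little-endian f32 BLOB (assumed storage layout).
fn vector_to_blob(v: &[f32]) -> Vec<u8> {
    v.iter().flat_map(|x| x.to_le_bytes()).collect()
}

/// Decode a BLOB back into f32 values.
/// Returns None if the length is not a multiple of 4 bytes.
fn blob_to_vector(blob: &[u8]) -> Option<Vec<f32>> {
    if blob.len() % 4 != 0 {
        return None;
    }
    Some(
        blob.chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

/// Hypothetical guard mirroring the described no-op conditions:
/// migrate only when vec0 is empty and there are BLOB rows to copy.
fn should_migrate(vec0_rows: usize, blob_rows: usize) -> bool {
    vec0_rows == 0 && blob_rows > 0
}
```

Rejecting BLOBs with a bad length, rather than truncating them, keeps a corrupt row from silently producing a vector of the wrong dimension.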

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Scans note content for potential wikilink targets using exact filename
and alias matching. Supports case-insensitive search, word boundary
checking, existing wikilink skipping, and longest-match-first priority.
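
The matching rules listed above can be sketched in a few lines. This is an illustration under stated assumptions, not the module's actual code: `link_names` and `is_boundary` are hypothetical names, case folding is ASCII-only so byte offsets stay aligned, and the original casing of the matched text is kept inside the link.

```rust
/// True when the match at `start..end` is not glued to an alphanumeric neighbour.
fn is_boundary(text: &str, start: usize, end: usize) -> bool {
    let before = text[..start].chars().next_back();
    let after = text[end..].chars().next();
    !before.map_or(false, |c| c.is_alphanumeric())
        && !after.map_or(false, |c| c.is_alphanumeric())
}

/// Wrap occurrences of known note names or aliases in [[wikilinks]].
fn link_names(content: &str, names: &[&str]) -> String {
    // Longest names first, so "Jane Doe" wins over "Jane".
    let mut sorted: Vec<&str> = names.iter().copied().filter(|n| !n.is_empty()).collect();
    sorted.sort_by(|a, b| b.len().cmp(&a.len()));

    // ASCII lowercasing preserves byte offsets between `lower` and `content`.
    let lower = content.to_ascii_lowercase();
    let mut out = String::with_capacity(content.len());
    let mut i = 0;
    while i < content.len() {
        // Copy existing wikilinks verbatim instead of nesting new ones.
        if content[i..].starts_with("[[") {
            if let Some(end) = content[i..].find("]]") {
                out.push_str(&content[i..i + end + 2]);
                i += end + 2;
                continue;
            }
        }
        let matched = sorted.iter().find(|name| {
            let n = name.to_ascii_lowercase();
            lower[i..].starts_with(n.as_str()) && is_boundary(content, i, i + n.len())
        });
        match matched {
            Some(name) => {
                let end = i + name.len();
                out.push_str("[[");
                out.push_str(&content[i..end]);
                out.push_str("]]");
                i = end;
            }
            None => {
                // No match here: copy one character and advance.
                let ch = content[i..].chars().next().unwrap();
                out.push(ch);
                i += ch.len_utf8();
            }
        }
    }
    out
}
```

In this sketch, "Janet" is left alone when "Jane" is a known note, because the character after the match is alphanumeric.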

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three-strategy cascade: type-based rules (person/daily/workout + content
pattern detection) → semantic centroid matching against precomputed folder
embeddings → inbox fallback. 12 tests covering all strategies.
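
A minimal sketch of the cascade, assuming cosine similarity for the centroid match. The `Placement` enum, the folder names in the rule table, and the threshold parameter are hypothetical, and content pattern detection is left out.

```rust
#[derive(Debug, PartialEq)]
enum Placement {
    Rule(String),     // matched a type-based rule
    Semantic(String), // closest folder centroid cleared the threshold
    Inbox,            // nothing matched confidently
}

/// Cosine similarity; returns 0.0 when either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

fn place(
    note_type: Option<&str>,
    embedding: &[f32],
    centroids: &[(String, Vec<f32>)],
    threshold: f32,
) -> Placement {
    // Strategy 1: type-based rules (folder names are placeholders).
    let rule = match note_type {
        Some("person") => Some("People"),
        Some("daily") => Some("Daily"),
        Some("workout") => Some("Workouts"),
        _ => None,
    };
    if let Some(folder) = rule {
        return Placement::Rule(folder.to_string());
    }

    // Strategy 2: best-scoring precomputed folder centroid.
    let best = centroids
        .iter()
        .map(|(folder, centroid)| (folder, cosine(embedding, centroid)))
        .max_by(|a, b| a.1.total_cmp(&b.1));

    // Strategy 3: fall back to the inbox when the best score is too low.
    match best {
        Some((folder, score)) if score >= threshold => Placement::Semantic(folder.clone()),
        _ => Placement::Inbox,
    }
}
```

Ordering the strategies from cheapest and most certain to least certain means an explicit type never depends on embedding quality.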

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d move

Implements the writer module that ties together content analysis, tag
resolution, link discovery, folder placement, and atomic write+index.

- CreateNoteInput: 5-step pipeline (filename, tags, links, placement, write)
- AppendInput: append content with mtime conflict detection
- UpdateMetadataInput: frontmatter-only updates without re-chunking
- move_note: relocate files with store record updates
- All writes use temp+rename for atomicity with transaction rollback
- Pre-computes embeddings before holding DB lock
- Adds Store::resolve_file() for path/basename/#docid resolution
- Adds time crate for date formatting
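
The temp+rename write and the mtime check can be sketched with the standard library alone. This is an illustration only: `atomic_write` is a hypothetical name, the `.tmp` suffix is appended to the full file name, and the surrounding SQLite transaction and rollback are not shown.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Write `contents` to `path` via a sibling temp file and a rename.
/// If `expected_mtime` is given, refuse to write when the file changed on disk.
fn atomic_write(
    path: &Path,
    contents: &str,
    expected_mtime: Option<SystemTime>,
) -> io::Result<()> {
    // Conflict detection: compare the mtime seen at read time with the current one.
    if let Some(expected) = expected_mtime {
        let current = fs::metadata(path)?.modified()?;
        if current != expected {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                "mtime conflict: file changed on disk",
            ));
        }
    }

    // Build "<path>.tmp" next to the target so the rename stays on one filesystem.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = PathBuf::from(tmp_name);

    let mut file = fs::File::create(&tmp)?;
    file.write_all(contents.as_bytes())?;
    file.sync_all()?; // flush data to disk before the rename makes it visible

    fs::rename(&tmp, path)
}
```

Readers see either the old file or the new one, never a partial write; a crash between create and rename leaves only a stray `.tmp` file behind.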

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extends the MCP server with 4 write tools that expose the writer module
pipeline to Claude Code clients, completing the read-write tool surface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds `engraph write create` and `engraph write append` subcommands backed
by the writer module pipeline. Both support --content flag or stdin for
content input, with --json output mode.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Scans the vault for leftover `.md.tmp` files on both `engraph index` and
`engraph serve` startup, removing any that survived a previous crash mid-write.
Logs the count if any are removed.
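
A cleanup pass like this could look roughly as follows. `remove_orphan_tmp_files` is a hypothetical name, and the sketch ignores any exclusion rules the real scanner may apply.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively delete files ending in ".md.tmp" under `dir`
/// and return how many were removed.
fn remove_orphan_tmp_files(dir: &Path) -> io::Result<usize> {
    let mut removed = 0;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        // file_type() does not follow symlinks, which avoids walking symlink loops.
        if entry.file_type()?.is_dir() {
            removed += remove_orphan_tmp_files(&path)?;
        } else if path
            .file_name()
            .and_then(|name| name.to_str())
            .map_or(false, |name| name.ends_with(".md.tmp"))
        {
            fs::remove_file(&path)?;
            removed += 1;
        }
    }
    Ok(removed)
}
```

The returned count is what a caller would log when it is non-zero.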

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three #[ignore] tests covering create_note searchability, append index
update, and mtime conflict detection. Run with:
  cargo test --test write_pipeline -- --ignored

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove redundant tombstone writes from indexer (delete_vec handles it).
Replace tombstone loading in search with empty set. Fix clippy warning
in writer.rs. Apply cargo fmt across all modules. Bump version to 0.6.0.
Update CLAUDE.md with 19 modules, 190 tests, write pipeline docs, and
sqlite-vec architecture.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- archive: moves note to 04-Archive/, adds archived frontmatter, removes from index
- unarchive: restores to original location (via archived_from), re-indexes
- indexer auto-excludes archive folder during walks
- MCP tools: archive, unarchive (13 total tools now)
- CLI: engraph write archive/unarchive

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r, incremental centroids, orphan cleanup, tag queries

- Gap 1: Add suggestion field to PlacementResult; add ticket ID detection
  (BRE-XXXX/DRIFT-XXX), meeting note detection, decision type_hint
- Gap 2: Inject suggested_folder frontmatter when semantic placement finds
  a below-threshold match during inbox fallback
- Gap 3: Incrementally update folder centroids after each note creation
  (weighted merge with existing centroid)
- Gap 4: Add verify_index_integrity() to clean orphan DB entries for files
  that no longer exist on disk; called on index and serve startup
- Gap 5: Add agent_created_tags(), low_usage_tags(), stale_tags() queries
  to store for tag hygiene tooling
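
The incremental centroid update in Gap 3 can be read as a running mean. The sketch below assumes the centroid is the arithmetic mean of a folder's note embeddings and that a note count is stored alongside it; neither detail is confirmed by the commit message, and `merge_centroid` is a hypothetical name.

```rust
/// Fold one new embedding into a folder centroid as a weighted running mean:
/// new = (old * n + embedding) / (n + 1), where n is the number of notes so far.
fn merge_centroid(centroid: &mut [f32], count: &mut usize, embedding: &[f32]) {
    assert_eq!(centroid.len(), embedding.len(), "dimension mismatch");
    let n = *count as f32;
    for (c, e) in centroid.iter_mut().zip(embedding) {
        *c = (*c * n + *e) / (n + 1.0);
    }
    *count += 1;
}
```

Updating in place after each note creation avoids re-reading every vector in the folder, at the cost of some floating-point drift that a full reindex would reset.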

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@devwhodevs devwhodevs merged commit 6250f94 into main Mar 25, 2026
3 checks passed