feat: engraph v2.0 — hybrid search, smart chunking, vault profiles #1
Merged
devwhodevs merged 11 commits into main on Mar 24, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Port qmd's scored break-point chunking to Rust. Replaces heading-only splitting with a scoring system that finds optimal break points near the token target. Code fence protection prevents splitting inside code blocks. 15% overlap for context continuity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
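As a rough illustration of the break-point scoring described in the commit above, the sketch below scores line starts (headings high, blank lines lower, ordinary lines lowest), produces no candidates inside code fences, and picks the best-scoring candidate near a size target. It is not the engraph or qmd implementation: the score values, the quadratic distance penalty, the byte-based target (instead of tokens), and all names (`BreakPoint`, `base_score`, `find_break_points`, `pick_break`, `chunk`) are assumptions, and the 15% overlap is left out for brevity.

```rust
/// Three backticks, written with escapes so this sketch can live inside a fenced block.
const FENCE: &str = "\x60\x60\x60";

/// A line start where a chunk is allowed to end, with a base score.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BreakPoint {
    offset: usize,
    score: f64,
}

/// Hypothetical base scores: h1 = 100, h2 = 90, ..., blank line = 20, other line = 1.
fn base_score(line: &str) -> f64 {
    let t = line.trim();
    let hashes = t.chars().take_while(|&c| c == '#').count();
    if (1..=6).contains(&hashes) && t[hashes..].starts_with(' ') {
        110.0 - 10.0 * hashes as f64
    } else if t.is_empty() {
        20.0
    } else {
        1.0
    }
}

/// Collect candidate break points. Fence lines and lines inside a fence yield none,
/// so a chunk can never end in the middle of a code block.
fn find_break_points(text: &str) -> Vec<BreakPoint> {
    let mut points = Vec::new();
    let mut in_fence = false;
    let mut offset = 0;
    for line in text.split_inclusive('\n') {
        if line.trim().starts_with(FENCE) {
            in_fence = !in_fence;
        } else if !in_fence && offset > 0 {
            points.push(BreakPoint { offset, score: base_score(line) });
        }
        offset += line.len();
    }
    points
}

/// Pick the candidate within `window` bytes of `start + target` whose score,
/// after a quadratic distance penalty, is highest.
fn pick_break(points: &[BreakPoint], start: usize, target: usize, window: usize) -> Option<usize> {
    let ideal = start + target;
    points
        .iter()
        .filter(|p| p.offset > start && p.offset + window >= ideal && p.offset <= ideal + window)
        .map(|p| {
            let dist = p.offset.abs_diff(ideal) as f64 / window as f64;
            (p.offset, p.score * (1.0 - 0.7 * dist * dist))
        })
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(offset, _)| offset)
}

/// Split `text` into chunks of roughly `target` bytes (no overlap in this sketch).
/// `window` must be greater than zero.
fn chunk(text: &str, target: usize, window: usize) -> Vec<&str> {
    let points = find_break_points(text);
    let mut chunks = Vec::new();
    let mut start = 0;
    while text.len() - start > target + window {
        let end = pick_break(&points, start, target, window)
            // Nothing near the target: take the next candidate after it, or the end.
            .or_else(|| points.iter().map(|p| p.offset).find(|&o| o > start + target))
            .unwrap_or(text.len());
        chunks.push(&text[start..end]);
        start = end;
    }
    if start < text.len() {
        chunks.push(&text[start..]);
    }
    chunks
}
```

Because every candidate is a line start outside a fence, each chunk boundary is a valid UTF-8 boundary and a fenced block always stays whole, at the cost of occasionally overshooting the target when a code block is long.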
Every indexed file gets a deterministic #docid (SHA-256 of path, truncated to 6 hex chars). Shown in search results. Supports direct lookup via 'engraph get #abc123'. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
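To make the truncation concrete: 6 hex characters correspond to the first 3 bytes of the digest. The sketch below assumes the caller has already computed the 32-byte SHA-256 of the file path (for example with a crate such as `sha2`, which is not shown here), and the function names `docid_from_digest` and `parse_docid_ref` are hypothetical. Restricting references to lowercase hex is also an assumption.

```rust
/// Hex-encode the first 3 bytes of a 32-byte digest to get a 6-character docid.
/// The digest is assumed to be SHA-256 of the file path, computed elsewhere.
fn docid_from_digest(digest: &[u8; 32]) -> String {
    digest[..3].iter().map(|b| format!("{b:02x}")).collect()
}

/// Parse a user-supplied reference such as "#abc123" into a bare docid.
/// Returns None unless the input is '#' followed by exactly 6 lowercase hex digits.
fn parse_docid_ref(input: &str) -> Option<&str> {
    let id = input.strip_prefix('#')?;
    let valid = id.len() == 6 && id.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
    valid.then_some(id)
}
```

A 6-character hex ID carries 24 bits, so two paths can share a docid once a vault reaches a few thousand files; how engraph resolves such a collision is not stated in this PR.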
Add SQLite FTS5 virtual table as second search lane. Populated during indexing alongside vector embeddings. Supports exact keyword matches for ticket IDs, names, and dates that semantic search misses. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reciprocal Rank Fusion combines ranked results from HNSW semantic search and FTS5 keyword search. Supports lane weighting and --explain flag showing per-lane score contributions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
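For readers unfamiliar with Reciprocal Rank Fusion, a minimal sketch follows. Each lane contributes `weight / (k + rank)` for every document it returns (rank starting at 1), and the contributions are summed per document. The types `Lane` and `Fused`, the function `rrf_fuse`, and the value of `k` passed by the caller (60 is a common choice) are assumptions for illustration, not engraph's actual API. The per-lane contributions are kept so that something like an `--explain` view could print them.

```rust
use std::collections::HashMap;

/// One ranked result list, best match first.
struct Lane<'a> {
    name: &'a str,
    weight: f64,
    ranked: &'a [&'a str],
}

/// A fused result with its total score and the share each lane contributed.
#[derive(Debug, Clone, PartialEq)]
struct Fused {
    id: String,
    score: f64,
    contributions: Vec<(String, f64)>, // (lane name, contribution)
}

/// Weighted Reciprocal Rank Fusion: score(d) = sum over lanes of weight / (k + rank).
fn rrf_fuse(lanes: &[Lane<'_>], k: f64) -> Vec<Fused> {
    let mut by_id: HashMap<&str, Fused> = HashMap::new();
    for lane in lanes {
        for (i, &id) in lane.ranked.iter().enumerate() {
            let contribution = lane.weight / (k + (i + 1) as f64);
            let entry = by_id.entry(id).or_insert_with(|| Fused {
                id: id.to_string(),
                score: 0.0,
                contributions: Vec::new(),
            });
            entry.score += contribution;
            entry.contributions.push((lane.name.to_string(), contribution));
        }
    }
    let mut out: Vec<Fused> = by_id.into_values().collect();
    // Highest score first; ties broken by id so the order is deterministic.
    out.sort_by(|a, b| b.score.total_cmp(&a.score).then_with(|| a.id.cmp(&b.id)));
    out
}
```

A document that appears in both lanes outranks one that tops a single lane, which is the behavior that lets exact keyword hits from FTS5 lift results the semantic lane ranked lower.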
Auto-detects vault structure (PARA, folders, flat), wikilinks, frontmatter, tags. Generates vault.toml with detected settings. Interactive configure command for guided customization. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
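One plausible way to classify vault structure from a list of relative file paths is sketched below. The heuristic is entirely an assumption (engraph's real detection rules are not described in this PR): a vault counts as PARA when at least three of the four conventional top-level folder names appear, as folder-based when any top-level folder exists, and as flat otherwise. The names `Structure` and `detect_structure` are hypothetical.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum Structure {
    Para,
    Folders,
    Flat,
}

/// Classify a vault from relative file paths such as "01 Projects/plan.md".
/// Hypothetical heuristic: 3 or more PARA-style top-level folders means PARA.
fn detect_structure(paths: &[&str]) -> Structure {
    // Lowercased names of top-level directories (the part before the first '/').
    let top: HashSet<String> = paths
        .iter()
        .filter_map(|p| p.split_once('/').map(|(dir, _)| dir.to_lowercase()))
        .collect();
    // Substring match so prefixed names like "01 Projects" or "Archives" still count.
    let para_names = ["projects", "areas", "resources", "archive"];
    let hits = para_names
        .iter()
        .filter(|&&name| top.iter().any(|dir| dir.contains(name)))
        .count();
    if hits >= 3 {
        Structure::Para
    } else if !top.is_empty() {
        Structure::Folders
    } else {
        Structure::Flat
    }
}
```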
Extract embedding into a ModelBackend trait. Existing ONNX embedder implements the trait. Users can configure models in vault.toml and manage them via 'engraph models list/info'. Registry ships with known-good models. Prepare for future GGUF adapter. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
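The shape of such a trait might look like the sketch below. Only the trait name `ModelBackend` comes from the commit message; the method names (`name`, `dimensions`, `embed`), the toy `ByteBucketBackend`, and the helper `embed_all` are assumptions made to show how callers can depend on the trait rather than on a specific embedder.

```rust
/// Hypothetical shape of a pluggable embedding backend.
trait ModelBackend {
    fn name(&self) -> &str;
    fn dimensions(&self) -> usize;
    fn embed(&self, text: &str) -> Vec<f32>;
}

/// Toy stand-in for a real model: counts bytes into buckets, then L2-normalizes.
/// `dims` must be greater than zero.
struct ByteBucketBackend {
    dims: usize,
}

impl ModelBackend for ByteBucketBackend {
    fn name(&self) -> &str {
        "byte-bucket-toy"
    }

    fn dimensions(&self) -> usize {
        self.dims
    }

    fn embed(&self, text: &str) -> Vec<f32> {
        let mut v = vec![0.0f32; self.dims];
        for b in text.bytes() {
            v[b as usize % self.dims] += 1.0;
        }
        let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm > 0.0 {
            for x in &mut v {
                *x /= norm;
            }
        }
        v
    }
}

/// Indexing code only sees the trait object, so backends can be swapped freely.
fn embed_all(backend: &dyn ModelBackend, texts: &[&str]) -> Vec<Vec<f32>> {
    texts.iter().map(|t| backend.embed(t)).collect()
}
```

Under this design an ONNX embedder and a future GGUF adapter would each implement the same trait, and the indexer would not need to change when the configured model does.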
Version bump and integration for engraph v2.0:
- Smart chunking with break-point scoring (replaces heading-only splitting)
- 6-char docid system for quick file reference
- FTS5 full-text search lane (BM25 keyword matching)
- RRF fusion engine merging semantic + FTS5 results
- Vault profile auto-detection (PARA/Folders/Flat, Obsidian/Logseq)
- Pluggable ModelBackend trait for future model swapping
- All code formatted, clippy clean, 91 tests passing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Inline string literal into format string to avoid print_literal warning on newer clippy versions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rder

Two bugs found during real vault testing:

1. Smart chunker panicked on multi-byte UTF-8 chars (em dash, etc.) when byte offsets from break-point scoring landed inside multi-byte sequences. Fixed by snapping all byte offsets to valid char boundaries before slicing.
2. Schema migration failed on existing v0.1 databases: the SCHEMA constant tried to CREATE INDEX on the docid column before the migration added it. Moved index creation into the migration path so it runs after the column exists.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
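The first fix can be illustrated with the standard library's `str::is_char_boundary`. The helper below is a sketch of the snapping idea, not the code from this PR; the name `snap_to_char_boundary` and the choice to snap backwards (rather than forwards) are assumptions.

```rust
/// Move `idx` back to the nearest valid UTF-8 char boundary in `s`.
/// Indices past the end are clamped to `s.len()`.
fn snap_to_char_boundary(s: &str, mut idx: usize) -> usize {
    if idx >= s.len() {
        return s.len();
    }
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(idx) {
        idx -= 1;
    }
    idx
}
```

For the string `"a\u{2014}b"`, where the em dash occupies bytes 1 to 3, slicing at byte 2 would panic, while `&s[..snap_to_char_boundary(s, 2)]` yields `"a"`.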
Summary
engraph v2.0 upgrades the search engine from pure semantic search to a hybrid system with significantly improved relevance and new vault management capabilities.
What's new
- … `#abc123`) shown in search results for quick reference.
- `--explain` flag shows per-lane score breakdown.
- `engraph init` auto-detects vault structure (PARA/Folders/Flat), type (Obsidian/Logseq/Plain), wikilinks, frontmatter, and tags. Writes `vault.toml` for future configuration.
- `ModelBackend` trait enables future model swapping. `engraph models list/info` for model management. Registry ships with known-good models.

New CLI commands
Architecture changes
- … (`docid`, `fts`, `fusion`, `model`, `profile` added)
- `docid` column on `files`, `chunks_fts` FTS5 virtual table

Test plan
- … (`cargo test --lib`)
- … (`cargo clippy -- -D warnings`)
- … (`cargo fmt --check`)
- … (`cargo test --test integration --no-run`)
- `engraph index ~/vault && engraph search "test query" --explain`
- `engraph init ~/vault` on an Obsidian vault
- `engraph models list`

🤖 Generated with Claude Code