Fix HNSW bidirectional edge creation during insert/reconnect #846
Open
himanalot wants to merge 11 commits into
PREFILTER:
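- Applies filters during HNSW traversal (pre-filtering) instead of after the search (post-filtering)
- More efficient for selective filters: post-filtering can discard most of the top results and return fewer matches than the limit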
- Syntax: SearchV<Type>(vector, limit, PREFILTER(condition))
RebuildHNSWIndex:
- Rebuilds HNSW graph edges without re-generating embeddings
- Fixes disconnected graphs after delete/re-add operations
- POST /RebuildHNSWIndex returns {"status": "success", "vectors_rebuilt": N}
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
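The pre-filter/post-filter distinction above can be illustrated with a minimal single-layer sketch, assuming a simplified in-memory graph. The names Graph, search, and Cand are hypothetical, and this is not HelixDB's traversal code. It only shows why a filter applied during traversal can still fill the result set when a filter applied afterwards cannot.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::{BinaryHeap, HashSet};

/// Hypothetical single-layer graph: `vectors[i]` is node i's embedding and
/// `edges[i]` its neighbor list. This is not HelixDB's storage layout.
pub struct Graph {
    pub vectors: Vec<Vec<f32>>,
    pub edges: Vec<Vec<usize>>,
}

/// Squared Euclidean distance.
fn dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// (distance, node id), ordered by distance and then by id.
struct Cand(f32, usize);
impl PartialEq for Cand {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Cand {}
impl PartialOrd for Cand {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for Cand {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.total_cmp(&other.0).then(self.1.cmp(&other.1))
    }
}

/// Best-first search that keeps the `ef` closest nodes. With `filter = Some(f)`,
/// only nodes passing `f` enter the result set (pre-filtering). The traversal
/// still walks through non-matching nodes so the graph stays navigable.
pub fn search(
    g: &Graph,
    query: &[f32],
    entry: usize,
    ef: usize,
    filter: Option<&dyn Fn(usize) -> bool>,
) -> Vec<usize> {
    let passes = |i: usize| filter.map_or(true, |f| f(i));
    let mut visited = HashSet::from([entry]);
    let d0 = dist(query, &g.vectors[entry]);
    let mut frontier = BinaryHeap::from([Reverse(Cand(d0, entry))]); // min-heap
    let mut results: BinaryHeap<Cand> = BinaryHeap::new(); // max-heap, worst on top
    if passes(entry) {
        results.push(Cand(d0, entry));
    }
    while let Some(Reverse(Cand(d, node))) = frontier.pop() {
        // Stop once the closest unexplored node is worse than the worst result.
        if results.len() >= ef && d > results.peek().unwrap().0 {
            break;
        }
        for &n in &g.edges[node] {
            if !visited.insert(n) {
                continue;
            }
            let dn = dist(query, &g.vectors[n]);
            frontier.push(Reverse(Cand(dn, n)));
            if passes(n) && (results.len() < ef || dn < results.peek().unwrap().0) {
                results.push(Cand(dn, n));
                if results.len() > ef {
                    results.pop(); // drop the current worst result
                }
            }
        }
    }
    // Ascending by distance.
    results.into_sorted_vec().into_iter().map(|c| c.1).collect()
}
```

Consider a path graph 0-1-2-3-4-5 where node i sits at coordinate i, with query 0.0, ef = 2, and a filter that accepts ids >= 3. Post-filtering the unfiltered top 2 ([0, 1]) leaves nothing. The pre-filtered search keeps walking through rejected nodes and returns [3, 4].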
Changed HELIX_REPO_URL in helix-cli to pull from himanalot/helix-db instead of the official HelixDB/helix-db, so the CLI fetches a version that includes PREFILTER support and RebuildHNSWIndex without manual file copying.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- install.sh: REPO variable
- github_issue.rs: Issue submission URL
- update.rs: Releases API URL
- init.rs: Comment URL
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
PREFILTER was not working because properties were expanded into the HVector after the filter check. Filter closures calling val.get_property() therefore always received None, so every vector failed the filter.
Fix: Move expand_from_vector_without_data() before the filter check so that item.get_property() returns the correct values.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
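Here is a minimal sketch of the ordering bug. It assumes a hypothetical Item type whose properties stay None until they are loaded from a map that stands in for the properties store. Neither Item nor the two helper functions exist in HelixDB. Only the before/after ordering mirrors the commit.

```rust
use std::collections::HashMap;

type Props = HashMap<String, String>;

/// Hypothetical stand-in for an HVector: `properties` is `None`
/// until it is loaded from the properties store.
pub struct Item {
    pub id: u64,
    pub properties: Option<Props>,
}

impl Item {
    pub fn get_property(&self, key: &str) -> Option<&String> {
        self.properties.as_ref()?.get(key)
    }
}

/// Buggy order: the filter sees an unexpanded item, so every
/// `get_property` call inside it returns `None`.
pub fn filter_then_expand(
    mut item: Item,
    store: &HashMap<u64, Props>,
    filter: impl Fn(&Item) -> bool,
) -> Option<Item> {
    if !filter(&item) {
        return None;
    }
    item.properties = store.get(&item.id).cloned();
    Some(item)
}

/// Fixed order: expand first (the role `expand_from_vector_without_data()`
/// plays in the commit), then run the filter.
pub fn expand_then_filter(
    mut item: Item,
    store: &HashMap<u64, Props>,
    filter: impl Fn(&Item) -> bool,
) -> Option<Item> {
    item.properties = store.get(&item.id).cloned();
    if filter(&item) {
        Some(item)
    } else {
        None
    }
}
```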
Adds a new builtin endpoint that checks HNSW graph connectivity and identifies unreachable vectors. Supports two modes:
- quick: samples N vectors and verifies they appear in search results
- full: BFS traversal from the entry point to find disconnected vectors
Returns health status (healthy/degraded/broken), unreachable vector IDs, and diagnostic metrics.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
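The full mode's reachability check can be sketched as a plain breadth-first search over outgoing edges. This assumes an in-memory adjacency map keyed by vector id, which is a hypothetical stand-in for the HNSW edge store. Any vector that is never dequeued is unreachable from the entry point.

```rust
use std::collections::{BTreeSet, HashMap, HashSet, VecDeque};

/// Returns, in ascending order, the ids that cannot be reached from `entry`
/// by following outgoing edges. `edges` is a hypothetical in-memory
/// adjacency map in which every vector appears as a key.
pub fn unreachable_from(entry: u64, edges: &HashMap<u64, Vec<u64>>) -> Vec<u64> {
    let mut seen = HashSet::from([entry]);
    let mut queue = VecDeque::from([entry]);
    while let Some(node) = queue.pop_front() {
        for &next in edges.get(&node).map(Vec::as_slice).unwrap_or(&[]) {
            // `insert` returns true only the first time we see `next`.
            if seen.insert(next) {
                queue.push_back(next);
            }
        }
    }
    let all: BTreeSet<u64> = edges.keys().copied().collect();
    all.into_iter().filter(|id| !seen.contains(id)).collect()
}
```

For example, with edges 1→2, 2→1 and 3→1, vector 3 has an outgoing edge but no incoming one. A search from 1 therefore reports [3] as unreachable, which is the one-way-edge pattern behind the insert/reconnect bug.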
- Add PurgeOrphanVectors: finds and deletes vectors without corresponding DB nodes
- Improve RebuildHNSWIndex: better progress logging near the end; tracks reconnected vs. skipped vectors
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…nodes_db
The purge logic was incorrectly checking nodes_db for node existence, but AddV creates vectors in vector_properties_db, not in nodes_db. As a result, every vector was incorrectly identified as an orphan. The check now looks for the vector's properties in vector_properties_db: an orphan is a vector in the HNSW index with no corresponding properties entry. Also adds soft_deleted_count to the response to show vectors marked as deleted.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
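The corrected orphan definition amounts to a membership check against the properties store. Here is a minimal sketch that assumes hypothetical IndexEntry and OrphanReport types and a set of ids that have properties. The commit does not say whether soft-deleted entries should also count as orphans, so this sketch counts them separately.

```rust
use std::collections::HashSet;

/// Hypothetical view of one HNSW index entry.
pub struct IndexEntry {
    pub id: u128,
    pub deleted: bool,
}

#[derive(Debug, PartialEq)]
pub struct OrphanReport {
    pub orphans: Vec<u128>,
    pub soft_deleted_count: usize,
}

/// An orphan is a live index entry with no properties entry. Soft-deleted
/// entries are only counted, never reported as orphans (an assumption).
pub fn scan(entries: &[IndexEntry], has_properties: &HashSet<u128>) -> OrphanReport {
    let mut report = OrphanReport { orphans: Vec::new(), soft_deleted_count: 0 };
    for e in entries {
        if e.deleted {
            report.soft_deleted_count += 1;
        } else if !has_properties.contains(&e.id) {
            report.orphans.push(e.id);
        }
    }
    report
}
```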
…leanup
- Add hard_delete method to HNSW trait that completely removes vectors
from all databases (vector data, properties, and edges)
- Add purge_soft_deleted option to PurgeOrphanVectors endpoint
- When purge_soft_deleted=true, also hard-deletes soft-deleted vectors
- Useful for reclaiming space after deletions
Example usage:
- {"purge_soft_deleted": true}: hard-delete all soft-deleted vectors
- {"dry_run": true}: count without deleting
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
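Here is a sketch of the purge flow that uses hypothetical in-memory maps in place of the vector, properties, and edge databases. Store, hard_delete, and purge_soft_deleted are illustrative names here. In particular, removing the deleted id from other vectors' neighbor lists is an assumption and is not stated in the commit.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory stand-ins for the stores the commit names:
/// vector data, properties, and edges. Not HelixDB's actual databases.
#[derive(Default)]
pub struct Store {
    pub vectors: HashMap<u64, Vec<f32>>,
    pub properties: HashMap<u64, String>,
    pub edges: HashMap<u64, Vec<u64>>,
    pub soft_deleted: HashSet<u64>,
}

impl Store {
    /// Remove `id` from every store. Also stripping it from other
    /// vectors' edge lists is an assumption of this sketch.
    pub fn hard_delete(&mut self, id: u64) {
        self.vectors.remove(&id);
        self.properties.remove(&id);
        self.edges.remove(&id);
        for list in self.edges.values_mut() {
            list.retain(|&n| n != id);
        }
        self.soft_deleted.remove(&id);
    }

    /// Mirrors {"purge_soft_deleted": true}; with `dry_run` it only counts.
    pub fn purge_soft_deleted(&mut self, dry_run: bool) -> usize {
        let ids: Vec<u64> = self.soft_deleted.iter().copied().collect();
        if !dry_run {
            for id in &ids {
                self.hard_delete(*id);
            }
        }
        ids.len()
    }
}
```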
…nd codes
Checks for:
1. Duplicate embeddings - same vector data with different IDs (uses fingerprint of first N dims)
2. Duplicate codes - same code property appearing on multiple vectors
Usage: POST /HNSWDuplicateCheck {"fingerprint_dims": 32, "max_duplicates": 100}
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
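Here is a sketch of the embedding fingerprint check. It assumes vectors are available as (id, Vec<f32>) pairs and that max_duplicates caps the number of reported groups, which is an assumption because the commit does not define it. duplicate_groups is a hypothetical name, and the fingerprint here compares the exact bit patterns of the first fingerprint_dims components.

```rust
use std::collections::HashMap;

/// Groups ids whose first `fingerprint_dims` components are bit-identical.
/// Returns at most `max_duplicates` groups of size >= 2, sorted for determinism.
pub fn duplicate_groups(
    vectors: &[(u64, Vec<f32>)],
    fingerprint_dims: usize,
    max_duplicates: usize,
) -> Vec<Vec<u64>> {
    let mut groups: HashMap<Vec<u32>, Vec<u64>> = HashMap::new();
    for (id, v) in vectors {
        // Fingerprint: raw bits of the leading components.
        let fp: Vec<u32> = v.iter().take(fingerprint_dims).map(|x| x.to_bits()).collect();
        groups.entry(fp).or_default().push(*id);
    }
    let mut dups: Vec<Vec<u64>> = groups.into_values().filter(|g| g.len() > 1).collect();
    dups.sort();
    dups.truncate(max_duplicates);
    dups
}
```

Only the first fingerprint_dims components are compared, so two vectors that agree on that prefix but differ later are reported as duplicates. Comparing the full vectors would remove those false positives at a higher cost.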
When inserting or reconnecting a vector, the algorithm updates each neighbor's connection list. However, the new vector was never added to the candidate set before calling select_neighbors, so it could never be selected as a neighbor, resulting in one-way edges only. This caused vectors to become unreachable (especially the last batch during rebuild, since no subsequent insertions could fix them).
The fix adds the new vector to the candidate set before selection, ensuring it can be chosen as a neighbor and creating proper bidirectional edges.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
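Here is a minimal sketch of the neighbor-update step. It assumes a simplified select_neighbors that keeps the m closest candidates (HelixDB's heuristic may differ) and uses hypothetical function names. The sketch contrasts the buggy path, where the new vector is never a candidate, with the fixed path.

```rust
/// Squared Euclidean distance.
fn dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Simplified select_neighbors (an assumption): keep the `m` candidates
/// closest to `base`, breaking distance ties by id.
fn select_neighbors(base: &[f32], candidates: &[(u64, Vec<f32>)], m: usize) -> Vec<u64> {
    let mut scored: Vec<(f32, u64)> =
        candidates.iter().map(|(id, v)| (dist(base, v), *id)).collect();
    scored.sort_by(|a, b| a.0.total_cmp(&b.0).then(a.1.cmp(&b.1)));
    scored.into_iter().take(m).map(|(_, id)| id).collect()
}

/// Recompute one neighbor's connection list after `new_id` has linked to it.
/// `include_new = false` reproduces the bug; `true` applies the fix.
pub fn update_neighbor(
    neighbor: &[f32],
    current: &[(u64, Vec<f32>)],
    new_id: u64,
    new_vec: &[f32],
    m: usize,
    include_new: bool,
) -> Vec<u64> {
    let mut candidates = current.to_vec();
    if include_new {
        // The fix: the new vector must be a candidate to be selectable,
        // which is what makes the edge back to it possible.
        candidates.push((new_id, new_vec.to_vec()));
    }
    select_neighbors(neighbor, &candidates, m)
}
```

Consider a neighbor at 0.0 whose current connections sit at 5.0 and 6.0, with m = 2 and a new vector at 1.0. The buggy path keeps the old two ids and never links back. The fixed path selects the new vector first.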
  # Cross-platform installer for Helix CLI
- readonly REPO="HelixDB/helix-db"
+ readonly REPO="himanalot/helix-db"
Contributor
Repository changed from the official HelixDB/helix-db to the personal fork himanalot/helix-db; it should point to the main org repo.
Suggested change
- readonly REPO="himanalot/helix-db"
+ readonly REPO="HelixDB/helix-db"
  // Development flag - set to true when working on V2 locally
  const DEV_MODE: bool = cfg!(debug_assertions);
- const HELIX_REPO_URL: &str = "https://github.com/helixdb/helix-db.git";
+ const HELIX_REPO_URL: &str = "https://github.com/himanalot/helix-db.git";
Contributor
Repository URL changed to the personal fork; it should use the official helixdb/helix-db repository.
Suggested change
- const HELIX_REPO_URL: &str = "https://github.com/himanalot/helix-db.git";
+ const HELIX_REPO_URL: &str = "https://github.com/helixdb/helix-db.git";
  // For more information on how to write queries,
  // see the documentation at https://docs.helix-db.com
- // or checkout our GitHub at https://github.com/HelixDB/helix-db
+ // or checkout our GitHub at https://github.com/himanalot/helix-db
Contributor
GitHub URL changed to the personal fork; it should reference the official HelixDB organization.
Suggested change
- // or checkout our GitHub at https://github.com/himanalot/helix-db
+ // or checkout our GitHub at https://github.com/HelixDB/helix-db
  /// The base URL for creating new GitHub issues.
- pub const GITHUB_ISSUE_URL: &str = "https://github.com/helixdb/helix-db/issues/new";
+ pub const GITHUB_ISSUE_URL: &str = "https://github.com/himanalot/helix-db/issues/new";
Contributor
Issue URL changed to the personal fork; it should use the official HelixDB repository.
Suggested change
- pub const GITHUB_ISSUE_URL: &str = "https://github.com/himanalot/helix-db/issues/new";
+ pub const GITHUB_ISSUE_URL: &str = "https://github.com/helixdb/helix-db/issues/new";
  const CURRENT_VERSION: &str = env!("CARGO_PKG_VERSION");
- const GITHUB_API_URL: &str = "https://api.github.com/repos/helixdb/helix-db/releases/latest";
+ const GITHUB_API_URL: &str = "https://api.github.com/repos/himanalot/helix-db/releases/latest";
Contributor
GitHub API URL changed to the personal fork; it should query the official helixdb repository.
Suggested change
- const GITHUB_API_URL: &str = "https://api.github.com/repos/himanalot/helix-db/releases/latest";
+ const GITHUB_API_URL: &str = "https://api.github.com/repos/helixdb/helix-db/releases/latest";
Summary
Adds 3 lines in insert and 3 lines in reconnect_vector to include the new vector in the candidate set.
The Bug
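When inserting or reconnecting a vector:
- the algorithm updates each neighbor's connection list by calling select_neighbors
- the new vector was never in the candidate set passed to select_neighbors, so it could never be chosen, leaving one-way edges
The Fix
The new vector is added to the candidate set before select_neighbors is called, so neighbors can select it and the edges become bidirectional.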
Test plan
🤖 Generated with Claude Code
Greptile Overview
Greptile Summary
This PR fixes a critical HNSW graph connectivity bug and adds comprehensive diagnostic/maintenance tooling. The core fix ensures bidirectional edges by adding new vectors to the candidate set before neighbor selection (lines 662-665 and 743-746 in vector_core.rs). Without this, neighbors couldn't select the new vector, causing one-way edges and unreachable nodes.
Key Changes:
- Fixed bidirectional edge creation in the insert and reconnect_vector methods by adding 3 lines each
- Added hard_delete, reconnect_vector, and get_all_vector_ids methods to support index maintenance
- Added builtin endpoints HNSWDiagnostics, RebuildHNSWIndex, PurgeOrphanVectors, and HNSWDuplicateCheck
- … (utils.rs:134)
- Added a pre_filter parameter to the SearchV grammar rule
Critical Issues:
- Repository URLs changed from helixdb/helix-db to the personal fork himanalot/helix-db. These must be reverted before merging to avoid redirecting users to a personal fork for installs, updates, and issue reporting.
Verdict: The HNSW bug fix is solid and well-tested (0 unreachable vectors after rebuild vs. ~10 before). However, the repository URL changes are blockers that must be corrected.
Important Files Changed
Sequence Diagram
sequenceDiagram
    participant Client as New Vector
    participant HNSW as HNSW Insert/Reconnect
    participant Neighbor as Existing Neighbor
    participant SelectNeighbors as select_neighbors
    Note over Client,HNSW: Step 1: New vector connects to neighbors
    HNSW->>HNSW: Find k nearest neighbors
    HNSW->>HNSW: Set outgoing edges to neighbors
    Note over HNSW,SelectNeighbors: Step 2: Update each neighbor's connections
    loop For each neighbor
        HNSW->>Neighbor: Get current connections
        Note over HNSW,Neighbor: THE FIX: Add new vector to candidate set
        HNSW->>HNSW: new_vec_copy = query
        HNSW->>HNSW: Calculate distance to neighbor
        HNSW->>Neighbor: Push new_vec_copy to candidates
        HNSW->>SelectNeighbors: select_neighbors(neighbor, candidates)
        Note over SelectNeighbors: Now new vector CAN be selected<br/>as neighbor's connection
        SelectNeighbors-->>HNSW: Updated neighbor connections
        HNSW->>Neighbor: Set new connections (bidirectional!)
    end
    Note over Client,Neighbor: Result: Bidirectional edges established<br/>New vector is reachable from neighbors