Skip to content

Fix HNSW bidirectional edge creation during insert/reconnect#846

Open
himanalot wants to merge 11 commits into
HelixDB:mainfrom
himanalot:fix/hnsw-bidirectional-edges
Open

Fix HNSW bidirectional edge creation during insert/reconnect#846
himanalot wants to merge 11 commits into
HelixDB:mainfrom
himanalot:fix/hnsw-bidirectional-edges

Conversation

@himanalot
Copy link
Copy Markdown

@himanalot himanalot commented Jan 31, 2026

Summary

  • Fix bug where new vectors weren't added to candidate set during neighbor updates
  • This caused one-way edges, making vectors unreachable (especially last batch during rebuild)
  • Adds 3 lines in insert and 3 lines in reconnect_vector to include new vector in candidates

The Bug

When inserting or reconnecting a vector:

  1. Vector connects TO its neighbors (outgoing edges) ✓
  2. Each neighbor updates its connections via select_neighbors
  3. BUG: New vector was NOT in the candidate set → could never be selected → no incoming edges

The Fix

// Before selecting neighbors for e, add the new vector to candidates:
let mut new_vec_copy = query;
new_vec_copy.set_distance(new_vec_copy.distance_to(&e)?);
e_conns.push(new_vec_copy);

Test plan

  • Rebuilt HNSW index with 98,186 vectors
  • HNSWDiagnostics shows 0 unreachable (was ~10 before fix)
  • Search returns semantically correct results

🤖 Generated with Claude Code

Greptile Overview

Greptile Summary

This PR fixes a critical HNSW graph connectivity bug and adds comprehensive diagnostic/maintenance tooling. The core fix ensures bidirectional edges by adding new vectors to the candidate set before neighbor selection (lines 662-665 and 743-746 in vector_core.rs). Without this, neighbors couldn't select the new vector, causing one-way edges and unreachable nodes.

Key Changes:

  • Fixed bidirectional edge creation in both insert and reconnect_vector methods by adding 3 lines each
  • Added hard_delete, reconnect_vector, and get_all_vector_ids methods to support index maintenance
  • Added 4 new diagnostic/maintenance endpoints: HNSWDiagnostics, RebuildHNSWIndex, PurgeOrphanVectors, HNSWDuplicateCheck
  • Fixed filter property access by expanding properties before applying filters (utils.rs:134)
  • Added optional pre_filter parameter to SearchV grammar rule

Critical Issues:

  • All CLI files changed repository URLs from official helixdb/helix-db to personal fork himanalot/helix-db - these must be reverted before merging to avoid redirecting users to a personal fork for installs, updates, and issue reporting

Verdict: The HNSW bug fix is solid and well-tested (0 unreachable vectors after rebuild vs ~10 before). However, the repository URL changes are blockers that must be corrected.

Important Files Changed

Filename Overview
helix-cli/install.sh changed repository from HelixDB org to personal fork - must be reverted
helix-cli/src/commands/build.rs changed repository URL to personal fork - must be reverted
helix-cli/src/commands/init.rs changed GitHub URL in documentation to personal fork - must be reverted
helix-cli/src/github_issue.rs changed issue URL to personal fork - must be reverted
helix-cli/src/update.rs changed GitHub API URL to personal fork - must be reverted
helix-db/src/helix_engine/vector_core/vector_core.rs core HNSW bug fix - adds new vector to candidate set for bidirectional edges, plus new methods for hard_delete, reconnect_vector, and get_all_vector_ids
helix-db/src/helix_engine/vector_core/hnsw.rs added trait methods for hard_delete and reconnect_vector with proper documentation

Sequence Diagram

sequenceDiagram
    participant Client as New Vector
    participant HNSW as HNSW Insert/Reconnect
    participant Neighbor as Existing Neighbor
    participant SelectNeighbors as select_neighbors

    Note over Client,HNSW: Step 1: New vector connects to neighbors
    HNSW->>HNSW: Find k nearest neighbors
    HNSW->>HNSW: Set outgoing edges to neighbors

    Note over HNSW,SelectNeighbors: Step 2: Update each neighbor's connections
    loop For each neighbor
        HNSW->>Neighbor: Get current connections
        
        Note over HNSW,Neighbor: THE FIX: Add new vector to candidate set
        HNSW->>HNSW: new_vec_copy = query
        HNSW->>HNSW: Calculate distance to neighbor
        HNSW->>Neighbor: Push new_vec_copy to candidates
        
        HNSW->>SelectNeighbors: select_neighbors(neighbor, candidates)
        Note over SelectNeighbors: Now new vector CAN be selected<br/>as neighbor's connection
        SelectNeighbors-->>HNSW: Updated neighbor connections
        HNSW->>Neighbor: Set new connections (bidirectional!)
    end

    Note over Client,Neighbor: Result: Bidirectional edges established<br/>New vector is reachable from neighbors
Loading

himanalot and others added 11 commits January 28, 2026 02:52
PREFILTER:
- Enables filtering during HNSW traversal (pre-filtering) vs post-filtering
- More efficient for selective filters
- Syntax: SearchV<Type>(vector, limit, PREFILTER(condition))

RebuildHNSWIndex:
- Rebuilds HNSW graph edges without re-generating embeddings
- Fixes disconnected graphs after delete/re-add operations
- POST /RebuildHNSWIndex returns {"status": "success", "vectors_rebuilt": N}

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Changed HELIX_REPO_URL in helix-cli to pull from himanalot/helix-db
instead of the official HelixDB/helix-db. This includes PREFILTER
support and RebuildHNSWIndex without manual file copying.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- install.sh: REPO variable
- github_issue.rs: Issue submission URL
- update.rs: Releases API URL
- init.rs: Comment URL

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The PREFILTER was not working because properties were expanded into
the HVector AFTER the filter check. This meant filter closures calling
val.get_property() always received None, causing all vectors to fail
the filter.

Fix: Move expand_from_vector_without_data() before the filter check
so that item.get_property() returns the correct values.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds a new builtin endpoint that checks HNSW graph connectivity and
identifies unreachable vectors. Supports two modes:
- quick: samples N vectors and verifies they appear in search results
- full: BFS traversal from entry point to find disconnected vectors

Returns health status (healthy/degraded/broken), unreachable vector IDs,
and diagnostic metrics.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add PurgeOrphanVectors: finds and deletes vectors without corresponding DB nodes
- Improve RebuildHNSWIndex: better progress logging near end, tracks reconnected vs skipped

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…nodes_db

The purge logic was incorrectly checking nodes_db for node existence,
but AddV creates vectors in vector_properties_db, not nodes. This caused
all vectors to be incorrectly identified as orphans.

Now checks if vector has properties in vector_properties_db. An orphan is
a vector in HNSW index with no corresponding properties entry.

Also adds soft_deleted_count to response to show vectors marked as deleted.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…leanup

- Add hard_delete method to HNSW trait that completely removes vectors
  from all databases (vector data, properties, and edges)
- Add purge_soft_deleted option to PurgeOrphanVectors endpoint
- When purge_soft_deleted=true, also hard-deletes soft-deleted vectors
- Useful for reclaiming space after deletions

Example usage:
  {"purge_soft_deleted": true} - hard delete all soft-deleted vectors
  {"dry_run": true} - count without deleting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…nd codes

Checks for:
1. Duplicate embeddings - same vector data with different IDs (uses fingerprint of first N dims)
2. Duplicate codes - same code property appearing on multiple vectors

Usage: POST /HNSWDuplicateCheck {"fingerprint_dims": 32, "max_duplicates": 100}

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When inserting or reconnecting a vector, the algorithm updates each
neighbor's connection list. However, the new vector was never added
to the candidate set before calling select_neighbors, so it could
never be selected as a neighbor - resulting in one-way edges only.

This caused vectors to become unreachable (especially the last batch
during rebuild, since no subsequent insertions could fix them).

The fix adds the new vector to the candidate set before selection,
ensuring it can be chosen as a neighbor and creating proper
bidirectional edges.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7 files reviewed, 5 comments

Edit Code Review Agent Settings | Greptile

Comment thread helix-cli/install.sh
# Cross-platform installer for Helix CLI

readonly REPO="HelixDB/helix-db"
readonly REPO="himanalot/helix-db"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

repository changed from official HelixDB/helix-db to personal fork himanalot/helix-db - should point to main org repo

Suggested change
readonly REPO="himanalot/helix-db"
readonly REPO="HelixDB/helix-db"

// Development flag - set to true when working on V2 locally
const DEV_MODE: bool = cfg!(debug_assertions);
const HELIX_REPO_URL: &str = "https://github.com/helixdb/helix-db.git";
const HELIX_REPO_URL: &str = "https://github.com/himanalot/helix-db.git";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

repository URL changed to personal fork - should use official helixdb/helix-db repository

Suggested change
const HELIX_REPO_URL: &str = "https://github.com/himanalot/helix-db.git";
const HELIX_REPO_URL: &str = "https://github.com/helixdb/helix-db.git";

// For more information on how to write queries,
// see the documentation at https://docs.helix-db.com
// or checkout our GitHub at https://github.com/HelixDB/helix-db
// or checkout our GitHub at https://github.com/himanalot/helix-db
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GitHub URL changed to personal fork - should reference official HelixDB organization

Suggested change
// or checkout our GitHub at https://github.com/himanalot/helix-db
// or checkout our GitHub at https://github.com/HelixDB/helix-db


/// The base URL for creating new GitHub issues.
pub const GITHUB_ISSUE_URL: &str = "https://github.com/helixdb/helix-db/issues/new";
pub const GITHUB_ISSUE_URL: &str = "https://github.com/himanalot/helix-db/issues/new";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue URL changed to personal fork - should use official HelixDB repository

Suggested change
pub const GITHUB_ISSUE_URL: &str = "https://github.com/himanalot/helix-db/issues/new";
pub const GITHUB_ISSUE_URL: &str = "https://github.com/helixdb/helix-db/issues/new";

Comment thread helix-cli/src/update.rs

const CURRENT_VERSION: &str = env!("CARGO_PKG_VERSION");
const GITHUB_API_URL: &str = "https://api.github.com/repos/helixdb/helix-db/releases/latest";
const GITHUB_API_URL: &str = "https://api.github.com/repos/himanalot/helix-db/releases/latest";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GitHub API URL changed to personal fork - should query official helixdb repository

Suggested change
const GITHUB_API_URL: &str = "https://api.github.com/repos/himanalot/helix-db/releases/latest";
const GITHUB_API_URL: &str = "https://api.github.com/repos/helixdb/helix-db/releases/latest";

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant