fix: Neo4j vector search pre-filter and metadata handling #1359
Merged
CaralHsi merged 6 commits into MemTensor:dev-20260323-v2.0.11 on Mar 27, 2026
Made-with: Cursor
Description
Fix Neo4j vector search returning empty results in shared-database multi-tenant mode, along with several related metadata handling bugs.
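The failure mode can be reproduced without a database. The following is a minimal in-memory sketch, not the project's code: the node layout, the `user_name` values, and both helper functions are hypothetical and only illustrate why a global top-k followed by a tenant filter can come back empty, while filtering first does not.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def post_filter_search(nodes, query, user_name, k):
    # Roughly the shape of db.index.vector.queryNodes followed by WHERE:
    # take the global top-k first, then drop nodes of other tenants.
    ranked = sorted(nodes, key=lambda n: cosine(n["embedding"], query), reverse=True)
    return [n for n in ranked[:k] if n["user_name"] == user_name]


def pre_filter_search(nodes, query, user_name, k):
    # Roughly the shape of MATCH + WHERE + vector.similarity.cosine():
    # restrict to the tenant first, then rank only those candidates.
    candidates = [n for n in nodes if n["user_name"] == user_name]
    ranked = sorted(candidates, key=lambda n: cosine(n["embedding"], query), reverse=True)
    return ranked[:k]


query = [1.0, 0.0]

# 50 nodes owned by another tenant, all very close to the query vector.
others = [
    {"id": f"other-{i}", "user_name": "other", "embedding": [1.0, 0.001 * i]}
    for i in range(50)
]
# 3 nodes owned by the target tenant, noticeably less similar to the query.
target = [
    {"id": f"target-{i}", "user_name": "alice", "embedding": [0.5, 0.5 + 0.1 * i]}
    for i in range(3)
]
nodes = others + target

post_hits = post_filter_search(nodes, query, "alice", k=10)
pre_hits = pre_filter_search(nodes, query, "alice", k=10)

print(len(post_hits))  # the global top-10 holds only the other tenant's nodes
print([n["id"] for n in pre_hits])
```

With these made-up vectors, every slot in the global top-10 is taken by the other tenant, so the post-filter result is empty even though the target tenant has matching nodes.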
Core fix: Replace the post-filter approach (`db.index.vector.queryNodes` + `WHERE`) with Neo4j 5.18+ pre-filtering (`MATCH` + `WHERE` + `vector.similarity.cosine()`). This ensures filters are applied before similarity computation, so the target user's nodes are never excluded by a global top-k truncation.

Additional fixes:
- `metadata["sources"]` → `metadata.get("sources")` to prevent `KeyError` when `sources` is absent
- `_parse_node` sources deserialization: `[0] == "}"` → `[-1] == "}"` (the check looked at the first character instead of the last, so `json.loads` never executed)
- `neo4j_community.py` `add_node`: gracefully skip the vector DB insert when the embedding is `None` instead of raising `ValueError`
- `neo4j_community.py` `add_nodes_batch`: skip `VecDBItem` creation for nodes without embeddings

Related Issue (Required): Fixes #1360
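The metadata fixes follow a common defensive pattern. The sketch below is an illustration under assumptions, not the code from this PR: `parse_sources` and `nodes_with_embeddings` are hypothetical helpers, and the node dictionaries are a made-up layout. It shows a tolerant lookup of `sources`, a JSON-object check that inspects both the first and the last character, and skipping nodes that have no embedding instead of raising.

```python
import json


def parse_sources(raw_sources):
    """Deserialize entries that look like JSON objects; keep the rest unchanged.

    Hypothetical helper: accepts None so callers can pass metadata.get("sources").
    """
    parsed = []
    for item in raw_sources or []:
        # Check the first AND the last character. Comparing item[0] with "}"
        # can never match a JSON object, so json.loads would never run.
        if isinstance(item, str) and item and item[0] == "{" and item[-1] == "}":
            parsed.append(json.loads(item))
        else:
            parsed.append(item)
    return parsed


def nodes_with_embeddings(nodes):
    """Return only nodes that carry an embedding, skipping the others silently."""
    return [n for n in nodes if n.get("metadata", {}).get("embedding") is not None]


# Metadata without a "sources" key: .get() returns None instead of raising KeyError.
metadata = {"memory_type": "example"}
print(parse_sources(metadata.get("sources")))

# A mix of serialized and plain entries.
print(parse_sources(['{"type": "doc", "id": 1}', "plain text"]))

# Only the first node would be forwarded to the vector DB in this sketch.
batch = [
    {"id": "a", "metadata": {"embedding": [0.1, 0.2]}},
    {"id": "b", "metadata": {"embedding": None}},
    {"id": "c", "metadata": {}},
]
print([n["id"] for n in nodes_with_embeddings(batch)])
```

Whether malformed JSON inside the braces should be caught or allowed to raise is a separate design choice that this sketch leaves open.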
Type of change
How Has This Been Tested?
Unit tests (6 tests, mocked Neo4j driver):
- `TestVectorSearchPreFilter`: Verify the pre-filter path uses `MATCH` + `vector.similarity.cosine()` when WHERE clauses are present, and the ANN path uses `queryNodes` when there are no filters
- `TestSourcesKeyErrorRegression`: Verify `add_node` and `_parse_node` work without the `sources` key

Integration tests (3 tests, real Neo4j 5.18+):
- `TestNeo4jPreFilterIntegration`: Insert 50 nodes for other users + 3 for the target user, verify that search with a `user_name` filter returns all 3 target user nodes

Run: `pytest tests/graph_dbs/test_neo4j_vector_search.py -v`

Checklist
Reviewer Checklist