
Commit 2ac1787

Restructure get_similar_neurons with CALL subqueries
Reviewer flagged a cartesian-product risk in the previous version: chained OPTIONAL MATCHes for (xref, site), (channel, ri, templ), (technique), and (typ) compounded into N × M × K rows per n2 before RETURN DISTINCT collapsed them. On densely-typed or multi-aligned neurons this is wasteful, and `type` was being aggregated within the (channel, technique) grouping rather than once per n2.

Refactored each optional branch into a CALL subquery scoped to n2:

    CALL { WITH n2 OPTIONAL MATCH (n2)-[:INSTANCEOF]->(typ:Class) RETURN apoc.text.join([l IN collect(...) WHERE ...], '|') AS type }
    CALL { WITH n2 OPTIONAL MATCH (n2)-[rx:database_cross_reference]->(site:Site) WHERE site.is_data_source WITH rx, site LIMIT 1 RETURN rx, site }
    CALL { WITH n2 OPTIONAL MATCH (n2)<-[:depicts]-(channel)-[ri:in_register_with]->... OPTIONAL MATCH (channel)-[:is_specified_output_of]->(technique:Class) WITH ri, templ, technique LIMIT 1 RETURN ri, templ, technique }

`type` aggregates internally (always one row, so no LIMIT is needed). Cross-ref and alignment each pick a single representative, which matches what the v2 row needs and what prod's `apoc.cypher.run('... LIMIT 5'/'10')` pattern already does inside the XMI.

`WITH DISTINCT r, n2` upfront drops the c1/c2 cartesian from the INSTANCEOF anchors. RETURN no longer needs DISTINCT: the row key is guaranteed unique by construction.
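To make the N × M × K blowup concrete, here is a small Python sketch that models the two query shapes over in-memory lists instead of a Neo4j graph. Everything in it is illustrative: the neuron data, the helper names `optional`, `chained_rows` and `subquery_rows`, and the choice of "first element" to stand in for `LIMIT 1` (Cypher's `LIMIT 1` without `ORDER BY` makes no promise about which row it keeps). It counts the rows flowing through the pipeline for each n2 before any aggregation or DISTINCT.

```python
from itertools import product

# Hypothetical per-n2 neighbourhoods: each list stands in for the rows one
# OPTIONAL MATCH would produce. An empty list models "no match".
NEURONS = {
    "n2_a": {
        "xrefs": ["site1", "site2", "site3"],          # N = 3
        "alignments": [("tmpl1", "confocal"),          # M = 2
                       ("tmpl2", "confocal")],
        "types": ["adult neuron", "KC", "KC"],          # K = 3 (one duplicate label)
    },
    "n2_b": {"xrefs": [], "alignments": [], "types": []},
}


def optional(rows):
    """OPTIONAL MATCH semantics: no match still yields one null row."""
    return rows if rows else [None]


def chained_rows(n2):
    """Old shape: each chained OPTIONAL MATCH multiplies the rows for this n2."""
    d = NEURONS[n2]
    return list(product(optional(d["xrefs"]),
                        optional(d["alignments"]),
                        optional(d["types"])))


def subquery_rows(n2):
    """New shape: each CALL { } subquery contributes exactly one row for this n2."""
    d = NEURONS[n2]
    # type: DISTINCT non-empty labels joined with '|' (collect() skips nulls)
    labels = [l for l in dict.fromkeys(d["types"]) if l]
    type_ = "|".join(labels)
    # cross-ref / alignment: LIMIT 1 after OPTIONAL MATCH -> one row or null
    xref = optional(d["xrefs"])[0]
    alignment = optional(d["alignments"])[0]
    return [(xref, alignment, type_)]


for n2 in NEURONS:
    print(n2, len(chained_rows(n2)), len(subquery_rows(n2)))
# n2_a 18 1
# n2_b 1 1
```

A neuron with no cross-references, alignments or types still yields exactly one row in both shapes, because OPTIONAL MATCH returns nulls rather than dropping the row; the difference only shows up once a neuron has several matches on more than one branch.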
1 parent 66b8dc3 commit 2ac1787

1 file changed

Lines changed: 30 additions & 12 deletions

src/vfbquery/vfb_queries.py

@@ -2463,19 +2463,37 @@ def get_similar_neurons(neuron, similarity_score='NBLAST_score', return_datafram
 # matches v2 prod's `Type` column from SOLR's `types` collection
 # - template = `[symbol](short_form)` markdown of the alignment template
 # - technique = imaging technique label (channel -[:is_specified_output_of]-> Class)
-main_query = f"""MATCH (c1:Class)<-[:INSTANCEOF]-(n1)-[r:has_similar_morphology_to]-(n2)-[:INSTANCEOF]->(c2:Class)
+#
+# Each OPTIONAL branch is wrapped in a CALL subquery so the outer query
+# carries one row per n2 throughout. Without this, an n2 with N
+# cross-references × M alignments × K types would produce N×M×K rows
+# that DISTINCT then collapses at the end — wasteful, especially on
+# densely-typed neurons. Each subquery either aggregates (for `type`)
+# or LIMIT 1s (for the single representative cross-ref / alignment
+# the V2 row needs), so n2 stays the row key end-to-end.
+main_query = f"""MATCH (c1:Class)<-[:INSTANCEOF]-(n1:Individual)-[r:has_similar_morphology_to]-(n2:Individual)-[:INSTANCEOF]->(c2:Class)
 WHERE n1.short_form = '{neuron}' and exists(r.{similarity_score})
-WITH c1, n1, r, n2, c2
-OPTIONAL MATCH (n2)-[rx:database_cross_reference]->(site:Site)
-WHERE site.is_data_source
-WITH n2, r, c2, rx, site
-OPTIONAL MATCH (n2)<-[:depicts]-(channel:Individual)-[ri:in_register_with]->(:Template)-[:depicts]->(templ:Template)
-OPTIONAL MATCH (channel)-[:is_specified_output_of]->(technique:Class)
-WITH n2, r, c2, rx, site, channel, ri, templ, technique
-OPTIONAL MATCH (n2)-[:INSTANCEOF]->(typ:Class)
-WITH n2, r, rx, site, channel, ri, templ, technique,
-apoc.text.join([l IN collect(DISTINCT typ.label) WHERE l IS NOT NULL AND l <> ''], '|') AS type
-RETURN DISTINCT n2.short_form as id,
+WITH DISTINCT r, n2
+CALL {{
+WITH n2
+OPTIONAL MATCH (n2)-[:INSTANCEOF]->(typ:Class)
+RETURN apoc.text.join([l IN collect(DISTINCT typ.label) WHERE l IS NOT NULL AND l <> ''], '|') AS type
+}}
+CALL {{
+WITH n2
+OPTIONAL MATCH (n2)-[rx:database_cross_reference]->(site:Site)
+WHERE site.is_data_source
+WITH rx, site LIMIT 1
+RETURN rx, site
+}}
+CALL {{
+WITH n2
+OPTIONAL MATCH (n2)<-[:depicts]-(channel:Individual)-[ri:in_register_with]->(:Template)-[:depicts]->(templ:Template)
+OPTIONAL MATCH (channel)-[:is_specified_output_of]->(technique:Class)
+WITH ri, templ, technique LIMIT 1
+RETURN ri, templ, technique
+}}
+RETURN n2.short_form as id,
 apoc.text.format("[%s](%s)", [n2.label, n2.short_form]) AS name,
 r.{similarity_score}[0] AS score,
 apoc.text.join(coalesce(n2.uniqueFacets, []), '|') AS tags,
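Because `main_query` is a Python f-string, the new subqueries are written as `CALL {{ ... }}`: doubled braces come out as literal `{` and `}`, while `{neuron}` and `{similarity_score}` are still interpolated. The sketch below isolates that pattern in a hypothetical helper, `build_query_fragment`, trimmed to a single CALL branch and without the c1/c2 anchors; the helper name and the placeholder id `VFB_00000001` are illustrative, not part of vfbquery.

```python
def build_query_fragment(neuron: str, similarity_score: str = "NBLAST_score") -> str:
    """Hypothetical helper mirroring the f-string pattern in the diff.

    Single braces ({neuron}, {similarity_score}) are interpolated by Python;
    doubled braces ({{ and }}) are emitted as literal { and }, which is what
    Cypher's CALL { ... } subquery syntax needs.
    """
    return f"""MATCH (n1:Individual)-[r:has_similar_morphology_to]-(n2:Individual)
WHERE n1.short_form = '{neuron}' and exists(r.{similarity_score})
WITH DISTINCT r, n2
CALL {{
WITH n2
OPTIONAL MATCH (n2)-[rx:database_cross_reference]->(site:Site)
WHERE site.is_data_source
WITH rx, site LIMIT 1
RETURN rx, site
}}
RETURN n2.short_form as id, r.{similarity_score}[0] AS score"""


query = build_query_fragment("VFB_00000001")  # placeholder id
print(query)
```

Printing `query` shows single braces around the subquery body. Writing single braces inside the f-string instead would make Python try to parse the subquery body as an expression, which typically fails with a SyntaxError.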
