Commit b710092

TermsForPub: add Template / Imaging Technique / Thumbnail columns
Reviewer report: the "List all terms that reference [pub]" results table on v2-dev shows only Name | Type | Gross_Type, while v2 prod shows Name | Type | Gross_Type | Template_Space | Imaging_Technique | Images. The three missing columns are a parity regression from the SOLR enrichment step that was dropped during the VFBquery migration.

Pre-migration, the legacy XMI chain was:

  //@dataSources.0/@queries.29  Cypher: get primary.short_form
  //@dataSources.0/@queries.24  ProcessQuery: pass ids to SOLR
  //@dataSources.3/@queries.1   SOLR: fetch denormalised record

SOLR's individual record had template / imaging-technique / thumbnail fields baked in for any channel_image / anatomy_channel individual. When the chain collapsed to a single VFBquery call, the Cypher kept only id / name / tags / type and never gained the template / technique / thumbnail walk.

Mirror the AnatomyExpressedIn pattern (vfb_queries.py:2837-2870), adapted to TermsForPub. The walk is keyed off `primary` (the cited Individual), which acts as its own channel:

  OPTIONAL MATCH (primary)-[irw:in_register_with]->(template)-[:depicts]->(template_anat)
  OPTIONAL MATCH (primary)-[:is_specified_output_of]->(technique)

For channel_image primaries, the walks fire and the columns populate. For non-image primaries (datasets, EPs, anatomies), the OPTIONAL MATCHes return null; REPLACE strips the null-link sentinels (`[null](null)` for template, `[![null]( 'null')](null)` for thumbnail) to '', and coalesce blanks technique, so the template / technique / thumbnail cells render empty. This matches v2 prod's behaviour on the Wolff2018 dataset rows (columns declared, cells blank).

LIMIT is applied before the CALL subquery, so the multi-hop walk only runs on rows we actually return; this is the same idiom as AnatomyExpressedIn and TransgeneExpressionHere.
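To make the blanking concrete, here is a minimal Python emulation of how the template and thumbnail cells come out of the Cypher. It assumes apoc.text.format renders a null parameter as the literal string "null" (Java String.format behaviour) and that Cypher string `+` with a null operand yields null. The helpers `coalesce`, `cypher_concat`, `apoc_format`, `template_cell` and `thumbnail_cell` are hypothetical, written only for this illustration, and symbol handling is omitted from the thumbnail for brevity.

```python
from typing import Optional


def coalesce(*values):
    """Cypher coalesce(): the first non-null argument, else null (None)."""
    return next((v for v in values if v is not None), None)


def cypher_concat(*parts: Optional[str]) -> Optional[str]:
    """Cypher string '+': any null operand makes the whole result null."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)


def apoc_format(template: str, params: list) -> str:
    """Assumed apoc.text.format behaviour: a null parameter renders as 'null'."""
    return template % tuple("null" if p is None else p for p in params)


def template_cell(anat_symbol: Optional[str], anat_label: Optional[str],
                  anat_id: Optional[str]) -> str:
    """Emulates the `template` column: a markdown link, or '' when no template was found."""
    link = apoc_format("[%s](%s)", [coalesce(anat_symbol, anat_label), anat_id])
    return link.replace("[null](null)", "")


def thumbnail_cell(primary_label: Optional[str], primary_id: str,
                   anat_label: Optional[str], anat_id: Optional[str],
                   thumb_url: Optional[str]) -> str:
    """Emulates the `thumbnail` column (symbols omitted for brevity)."""
    alt = cypher_concat(coalesce(primary_label, "image"), " aligned to ", anat_label)
    url = coalesce(thumb_url, "").replace("thumbnailT.png", "thumbnail.png")
    target = cypher_concat(anat_id, ",", primary_id)
    cell = apoc_format("[![%s](%s '%s')](%s)", [alt, url, alt, target])
    # With no template walk, alt and target are null and url is '', giving the sentinel.
    return cell.replace("[![null]( 'null')](null)", "")


# Non-image primary (e.g. a DataSet row): both cells collapse to ''.
print(repr(template_cell(None, None, None)))  # ''
print(repr(thumbnail_cell("Wolff2018", "Wolff2018", None, None, None)))  # ''
```

The design relies on the sentinel being produced only when every walked value is null, so a partially populated row keeps its (imperfect) link rather than silently disappearing.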
Verified live against pdb on the reviewer-reported case (FBrf0240744, Wolff and Rubin 2018):

  Row: id=Wolff2018, name=[Splits targetting CX neurons, Wolff2018](Wolff2018), tags=DataSet, type='', template='', technique='', thumbnail=''
  Row: id=FlyLight2019Wolff2018, name=[Splits targetting CX neurons, Wolff2018](FlyLight2019Wolff2018), tags=DataSet, type='', template='', technique='', thumbnail=''

Both rows now return the full 7-column shape with empty trailing cells, in exact parity with the v2 prod screenshot.

Considered: routing through SOLR enrichment instead, mirroring the _owlery_query_to_results pattern (Cypher id-fetch -> SOLR batch lookup via a {!terms f=id} fq, same as the legacy XMI's queries.24 step). Benchmarked against pdb + solr.virtualflybrain.org on the four pubs with the highest non-Unattributed has_reference counts:

  pub          | nrefs | Cypher walk | Cypher+SOLR | speedup (walk / SOLR)
  FBrf0240744  |     2 |      129 ms |      206 ms | 0.63x
  FBrf0256843  |     9 |      132 ms |      251 ms | 0.53x
  FBrf0219813  |     3 |      130 ms |      246 ms | 0.53x
  FBrf0242477  |     4 |      131 ms |      255 ms | 0.51x

The Cypher walk wins because round-trip latency dominates at small N, and the SOLR approach adds a second HTTPS hop. SOLR's pre-computation only saves record compute (sub-ms either way), not record fetching.

Also, SOLR's anat_query field isn't populated on dataset ids (only on classes), so a single-field SOLR fetch would miss enrichment on the Wolff2018-shaped rows that triggered this report. Per-type field dispatch (anat_query for Class, anat_image_query for Individual image, no enrichment for DataSet) would widen the gap further.

If a future pub grows TermsForPub past ~100 refs, the trade-off flips; revisit then.

Patch bump 1.14.9 -> 1.14.10. The @with_solr_cache('terms_for_pub') buckets are invalidated implicitly by the function source-hash change (no bucket rename, per [feedback_cache_invalidation]).
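For comparison, the rejected alternative's batch lookup amounts to one extra SOLR request whose filter query uses the Terms Query Parser (`{!terms f=id}`), which matches any document whose `id` appears in a comma-separated list. This is a minimal sketch assuming a standard `/select` handler; `build_solr_batch_params` is a hypothetical helper, the field list is illustrative, and the SOLR core name and request path are left out because they are not given here.

```python
from urllib.parse import urlencode


def build_solr_batch_params(ids: list[str], fields: list[str]) -> dict[str, str]:
    """Hypothetical helper: parameters for one SOLR request fetching every id at once."""
    if not ids:
        raise ValueError("need at least one id")
    if any("," in i for i in ids):
        # The terms parser splits its value on commas by default.
        raise ValueError("ids must not contain commas")
    return {
        "q": "*:*",
        "fq": "{!terms f=id}" + ",".join(ids),
        "fl": ",".join(fields),
        "rows": str(len(ids)),
        "wt": "json",
    }


# Ids taken from the verification rows; this builds the request but does not send it.
params = build_solr_batch_params(["Wolff2018", "FlyLight2019Wolff2018"], ["id", "anat_query"])
query_string = urlencode(params)
print(query_string)
```

However cheap the SOLR side is, this request is a second HTTPS round trip on top of the Cypher id-fetch, which is the cost the benchmark table attributes the slowdown to.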
1 parent 80c43e6 commit b710092

2 files changed

Lines changed: 71 additions & 11 deletions


setup.py

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 
 here = path.abspath(path.dirname(__file__))
 
-__version__ = "1.14.9"
+__version__ = "1.14.10"
 
 # Get the long description from the README file
 with open(path.join(here, 'README.md')) as f:

src/vfbquery/vfb_queries.py

Lines changed: 70 additions & 10 deletions
@@ -5404,22 +5404,82 @@ def get_all_datasets(return_dataframe=True, limit: int = -1):
 # ===== Publication Query =====
 
 def get_terms_for_pub(pub_short_form: str, return_dataframe=True, limit: int = -1):
-    """List all terms that reference a publication."""
+    """List all terms that reference a publication.
+
+    v1.14.10: add Template_Space / Imaging_Technique / Images columns to
+    restore v2 prod parity. Pre-migration the chain was Cypher id-fetch
+    -> SOLR enrichment (queries.29 -> queries.24 -> dataSources.3/@queries.1
+    in the legacy XMI). SOLR's denormalised record had template / imaging-
+    technique / images baked in for image-bearing individuals. The
+    VFBquery rewrite kept only the Cypher id-fetch and dropped the
+    enrichment. Mirror the AnatomyExpressedIn CALL subquery pattern,
+    keyed off primary acting as its own channel for channel_image
+    primaries; non-image primaries (datasets, EPs, anatomies) leave
+    these cells empty — matches v2 prod which shows the columns blank
+    on dataset rows (e.g. Wolff2018).
+    """
     count_query = f"MATCH (:pub:Individual {{short_form:'{pub_short_form}'}})<-[:has_reference]-(primary:Individual) RETURN count(DISTINCT primary) AS count"
     count_results = vc.nc.commit_list([count_query])
     total_count = get_dict_cursor()(count_results)[0]['count'] if count_results else 0
-
-    main_query = f"""MATCH (:pub:Individual {{short_form:'{pub_short_form}'}})<-[:has_reference]-(primary:Individual)
+
+    # Apply LIMIT before the CALL subquery fires so the multi-hop walk
+    # only runs on the rows we actually return — same pattern as
+    # AnatomyExpressedIn / TransgeneExpressionHere.
+    limit_clause = f"LIMIT {limit}" if limit != -1 else ""
+    main_query = f"""
+    MATCH (:pub:Individual {{short_form:'{pub_short_form}'}})<-[:has_reference]-(primary:Individual)
     OPTIONAL MATCH (primary)-[:INSTANCEOF]->(typ:Class)
-    RETURN DISTINCT primary.short_form AS id, '[' + primary.label + '](https://v2.virtualflybrain.org/org.geppetto.frontend/geppetto?id=' + primary.short_form + ')' AS name, apoc.text.join(coalesce(primary.uniqueFacets, []), '|') AS tags, typ.label AS type"""
-    if limit != -1: main_query += f" LIMIT {limit}"
-
+    WITH DISTINCT primary, typ
+    ORDER BY primary.label
+    {limit_clause}
+    CALL {{
+        // primary is the channel itself when it's a channel_image —
+        // walk to its template alignment + imaging technique.
+        // For non-image primaries (dataset, EP, anatomy) these
+        // OPTIONAL MATCHes return null and the row's
+        // template / technique / thumbnail cells render empty,
+        // matching v2 prod's behaviour on dataset rows.
+        WITH primary
+        OPTIONAL MATCH (primary)-[irw:in_register_with]->(template:Individual)-[:depicts]->(template_anat:Individual)
+        OPTIONAL MATCH (primary)-[:is_specified_output_of]->(technique:Class)
+        WITH primary, template, template_anat, technique, irw
+        LIMIT 1
+        RETURN template, template_anat, technique, irw
+    }}
+    RETURN
+        primary.short_form AS id,
+        apoc.text.format("[%s](%s)", [primary.label, primary.short_form]) AS name,
+        apoc.text.join(coalesce(primary.uniqueFacets, []), '|') AS tags,
+        coalesce(typ.label, '') AS type,
+        REPLACE(apoc.text.format("[%s](%s)", [COALESCE(template_anat.symbol[0], template_anat.label), template_anat.short_form]), '[null](null)', '') AS template,
+        coalesce(technique.label, '') AS technique,
+        REPLACE(apoc.text.format("[![%s](%s '%s')](%s)", [COALESCE(primary.symbol[0], coalesce(primary.label, 'image')) + " aligned to " + COALESCE(template_anat.symbol[0], template_anat.label), REPLACE(COALESCE(irw.thumbnail[0], ''), 'thumbnailT.png', 'thumbnail.png'), COALESCE(primary.symbol[0], coalesce(primary.label, 'image')) + " aligned to " + COALESCE(template_anat.symbol[0], template_anat.label), template_anat.short_form + "," + primary.short_form]), "[![null]( 'null')](null)", "") AS thumbnail
+    """
+
     results = vc.nc.commit_list([main_query])
     df = pd.DataFrame.from_records(get_dict_cursor()(results))
-    if not df.empty: df = encode_markdown_links(df, ['name'])
-
-    if return_dataframe: return df
-    return {"headers": {"id": {"title": "ID", "type": "selection_id", "order": -1}, "name": {"title": "Term", "type": "markdown", "order": 0}, "tags": {"title": "Tags", "type": "tags", "order": 1}, "type": {"title": "Type", "type": "text", "order": 2}}, "rows": [{key: row[key] for key in ["id", "name", "tags", "type"]} for row in safe_to_dict(df, sort_by_id=False)], "count": total_count}
+    if not df.empty:
+        df = encode_markdown_links(df, ['name', 'template', 'thumbnail'])
+
+    if return_dataframe:
+        return df
+
+    return {
+        "headers": {
+            "id": {"title": "ID", "type": "selection_id", "order": -1},
+            "name": {"title": "Term", "type": "markdown", "order": 0},
+            "tags": {"title": "Tags", "type": "tags", "order": 1},
+            "type": {"title": "Type", "type": "text", "order": 2},
+            "template": {"title": "Template", "type": "markdown", "order": 3},
+            "technique": {"title": "Imaging Technique", "type": "text", "order": 4},
+            "thumbnail": {"title": "Thumbnail", "type": "markdown", "order": 9},
+        },
+        "rows": [
+            {k: row[k] for k in ["id", "name", "tags", "type", "template", "technique", "thumbnail"]}
+            for row in safe_to_dict(df, sort_by_id=False)
+        ],
+        "count": total_count,
+    }
 
 
 # ===== Complex Transgene Expression Query =====
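A caller that wants the new columns in display order can sort the returned `headers` by their `order` value; Thumbnail keeps `order` 9, so it still sorts last even though the numbering is not contiguous. The sketch below hard-codes a copy of the new headers block and a row whose values are copied from the verification rows in the commit message, rather than calling `get_terms_for_pub`, so it runs without a database; `ordered_columns` and `render_row` are hypothetical helpers.

```python
# Copy of the headers block returned by get_terms_for_pub after this commit.
HEADERS = {
    "id": {"title": "ID", "type": "selection_id", "order": -1},
    "name": {"title": "Term", "type": "markdown", "order": 0},
    "tags": {"title": "Tags", "type": "tags", "order": 1},
    "type": {"title": "Type", "type": "text", "order": 2},
    "template": {"title": "Template", "type": "markdown", "order": 3},
    "technique": {"title": "Imaging Technique", "type": "text", "order": 4},
    "thumbnail": {"title": "Thumbnail", "type": "markdown", "order": 9},
}

# A DataSet row shaped like the Wolff2018 verification output.
ROW = {
    "id": "Wolff2018",
    "name": "[Splits targetting CX neurons, Wolff2018](Wolff2018)",
    "tags": "DataSet",
    "type": "",
    "template": "",
    "technique": "",
    "thumbnail": "",
}


def ordered_columns(headers: dict) -> list[str]:
    """Column keys sorted by their 'order' value (hypothetical helper)."""
    return sorted(headers, key=lambda k: headers[k]["order"])


def render_row(row: dict, headers: dict) -> list[str]:
    """Project a row onto the header keys in display order; missing keys become ''."""
    return [row.get(k, "") for k in ordered_columns(headers)]


titles = [HEADERS[k]["title"] for k in ordered_columns(HEADERS)]
cells = render_row(ROW, HEADERS)
print(titles)
print(cells)
```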
