TermsForPub: add Template / Imaging Technique / Thumbnail columns

Robbie1977 · Robbie1977 · commit b71009275481 · 2026-06-03T09:56:43.000+01:00
Reviewer report: "List all terms that reference [pub]" results
table on v2-dev shows only Name | Type | Gross_Type. v2 prod shows
Name | Type | Gross_Type | Template_Space | Imaging_Technique |
Images. Three columns missing — parity regression from the SOLR
enrichment step that was dropped during the VFBquery migration.

Pre-migration the legacy XMI chain was:

  //@dataSources.0/@queries.29   Cypher: get primary.short_form
  //@dataSources.0/@queries.24   ProcessQuery: pass ids to SOLR
  //@dataSources.3/@queries.1    SOLR: fetch denormalised record

SOLR's individual record had template / imaging-technique /
thumbnail fields baked in for any channel_image / anatomy_channel
individual. When the chain collapsed to a single VFBquery call,
the Cypher kept only id / name / tags / type and never gained the
template/technique/thumbnail walk.

Mirror the AnatomyExpressedIn pattern (vfb_queries.py:2837-2870)
adapted to TermsForPub. Walk keyed off `primary` (the cited
Individual) acting as its own channel:

  OPTIONAL MATCH (primary)-[irw:in_register_with]->(template)-[:depicts]->(template_anat)
  OPTIONAL MATCH (primary)-[:is_specified_output_of]->(technique)

For channel_image primaries: walks fire, columns populate.
For non-image primaries (datasets, EPs, anatomies): OPTIONAL
MATCHes return null, REPLACE strips `[null](null)` -> '', so
template / technique / thumbnail cells render empty. Matches
v2 prod's behaviour on Wolff2018 dataset rows (cells declared
but blank).
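
The null-stripping step can be illustrated outside Cypher. This is a
minimal Python sketch, not code from VFBquery: `format_link` is a
hypothetical helper, and it assumes a null argument is rendered as the
literal text `null`, which is what the `[null](null)` REPLACE pattern
implies.

```python
def format_link(label, short_form):
    """Emulate REPLACE(apoc.text.format("[%s](%s)", [...]), '[null](null)', '')."""

    def render(value):
        # Assumption: a null argument comes out as the literal text "null".
        return "null" if value is None else str(value)

    link = "[%s](%s)" % (render(label), render(short_form))
    # Only the fully-null link is stripped; a half-null link is left as-is.
    return link.replace("[null](null)", "")
```

With both arguments null (the non-image case) the cell comes back as an
empty string; with both set it comes back as a normal markdown link.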

LIMIT applies before the CALL subquery so the multi-hop walk only
runs on rows we actually return — same idiom as AnatomyExpressedIn
and TransgeneExpressionHere.

Verified live against pdb on the reviewer-reported case
(FBrf0240744, Wolff and Rubin 2018):

  Row: id=Wolff2018, name=[Splits targetting CX neurons, Wolff2018](Wolff2018),
       tags=DataSet, type='', template='', technique='', thumbnail=''
  Row: id=FlyLight2019Wolff2018, name=[Splits targetting CX neurons,
       Wolff2018](FlyLight2019Wolff2018), tags=DataSet, type='',
       template='', technique='', thumbnail=''

Both rows now return the full 7-column shape with empty trailing
cells — exact parity with the v2 prod screenshot.

Considered: route through SOLR enrichment instead, mirroring the
_owlery_query_to_results pattern (Cypher id-fetch -> SOLR batch
lookup via {!terms f=id} fq, same as the legacy XMI's queries.24
step). Benchmarked against pdb + solr.virtualflybrain.org on the
four pubs with the highest non-Unattributed has_reference counts:

  pub             nrefs | Cypher walk | Cypher+SOLR | speedup
  FBrf0240744         2 |     129 ms  |   206 ms    |   0.63x
  FBrf0256843         9 |     132 ms  |   251 ms    |   0.53x
  FBrf0219813         3 |     130 ms  |   246 ms    |   0.53x
  FBrf0242477         4 |     131 ms  |   255 ms    |   0.51x

Cypher walk wins because round-trip latency dominates at small N
and the SOLR approach adds a second HTTPS hop. SOLR's pre-computed
records save per-record compute (sub-ms either way), not the time
spent fetching them.
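
The speedup column can be reproduced from the two timing columns. A
small sketch, assuming speedup is defined as Cypher-walk time divided
by Cypher+SOLR time (that definition is inferred from the numbers, and
the names below are illustrative only):

```python
# (cypher_walk_ms, cypher_plus_solr_ms) per pub, copied from the table above.
BENCH_MS = {
    "FBrf0240744": (129, 206),
    "FBrf0256843": (132, 251),
    "FBrf0219813": (130, 246),
    "FBrf0242477": (131, 255),
}


def speedup(walk_ms, solr_ms):
    """Ratio below 1.0 means the Cypher+SOLR route is slower than the walk."""
    return round(walk_ms / solr_ms, 2)


SPEEDUPS = {pub: speedup(*timings) for pub, timings in BENCH_MS.items()}
```

Every ratio is below 1.0, i.e. the SOLR route is slower on all four pubs.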

Also: SOLR's anat_query field isn't populated on dataset ids
(only on classes), so a single-field SOLR fetch would miss
enrichment on the Wolff2018-shape rows that triggered this
report. Per-type field dispatch (anat_query for Class,
anat_image_query for Individual image, no enrichment for
DataSet) would widen the gap.
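
For reference, the per-type dispatch described above might look
roughly like this. It is a sketch only: `solr_field_for` is a
hypothetical helper that was not written, and representing a term's
tags as a set of strings is an assumption.

```python
def solr_field_for(tags):
    """Pick the SOLR field to fetch for a term, or None for no enrichment."""
    # DataSet is checked first because datasets get no enrichment at all.
    if "DataSet" in tags:
        return None
    if "Class" in tags:
        return "anat_query"
    if "Individual" in tags:
        return "anat_image_query"
    # Unknown shapes fall through with no enrichment.
    return None
```

Each distinct field would need its own handling on the SOLR side, which
is the extra cost the paragraph above refers to.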

If a future pub grows TermsForPub past ~100 refs the trade-off
may flip in favour of the SOLR batch lookup. Revisit then.

Patch bump 1.14.9 -> 1.14.10. @with_solr_cache('terms_for_pub')
buckets invalidated implicitly by the function source-hash
change (no bucket rename per [feedback_cache_invalidation]).
diff --git a/setup.py b/setup.py
@@ -3,7 +3,7 @@
 
 here = path.abspath(path.dirname(__file__))
 
-__version__ = "1.14.9"
+__version__ = "1.14.10"
 
 # Get the long description from the README file
 with open(path.join(here, 'README.md')) as f:
diff --git a/src/vfbquery/vfb_queries.py b/src/vfbquery/vfb_queries.py
@@ -5404,22 +5404,82 @@ def get_all_datasets(return_dataframe=True, limit: int = -1):
 # ===== Publication Query =====
 
 def get_terms_for_pub(pub_short_form: str, return_dataframe=True, limit: int = -1):
-    """List all terms that reference a publication."""
+    """List all terms that reference a publication.
+
+    v1.14.10: add Template_Space / Imaging_Technique / Images columns to
+    restore v2 prod parity. Pre-migration the chain was Cypher id-fetch
+    -> SOLR enrichment (queries.29 -> queries.24 -> dataSources.3/@queries.1
+    in the legacy XMI). SOLR's denormalised record had template / imaging-
+    technique / images baked in for image-bearing individuals. The
+    VFBquery rewrite kept only the Cypher id-fetch and dropped the
+    enrichment. Mirror the AnatomyExpressedIn CALL subquery pattern,
+    keyed off primary acting as its own channel for channel_image
+    primaries; non-image primaries (datasets, EPs, anatomies) leave
+    these cells empty — matches v2 prod which shows the columns blank
+    on dataset rows (e.g. Wolff2018).
+    """
     count_query = f"MATCH (:pub:Individual {{short_form:'{pub_short_form}'}})<-[:has_reference]-(primary:Individual) RETURN count(DISTINCT primary) AS count"
     count_results = vc.nc.commit_list([count_query])
     total_count = get_dict_cursor()(count_results)[0]['count'] if count_results else 0
-    
-    main_query = f"""MATCH (:pub:Individual {{short_form:'{pub_short_form}'}})<-[:has_reference]-(primary:Individual)
+
+    # Apply LIMIT before the CALL subquery fires so the multi-hop walk
+    # only runs on the rows we actually return — same pattern as
+    # AnatomyExpressedIn / TransgeneExpressionHere.
+    limit_clause = f"LIMIT {limit}" if limit != -1 else ""
+    main_query = f"""
+        MATCH (:pub:Individual {{short_form:'{pub_short_form}'}})<-[:has_reference]-(primary:Individual)
         OPTIONAL MATCH (primary)-[:INSTANCEOF]->(typ:Class)
-        RETURN DISTINCT primary.short_form AS id, '[' + primary.label + '](https://v2.virtualflybrain.org/org.geppetto.frontend/geppetto?id=' + primary.short_form + ')' AS name, apoc.text.join(coalesce(primary.uniqueFacets, []), '|') AS tags, typ.label AS type"""
-    if limit != -1: main_query += f" LIMIT {limit}"
-    
+        WITH DISTINCT primary, typ
+        ORDER BY primary.label
+        {limit_clause}
+        CALL {{
+            // primary is the channel itself when it's a channel_image —
+            // walk to its template alignment + imaging technique.
+            // For non-image primaries (dataset, EP, anatomy) these
+            // OPTIONAL MATCHes return null and the row's
+            // template / technique / thumbnail cells render empty,
+            // matching v2 prod's behaviour on dataset rows.
+            WITH primary
+            OPTIONAL MATCH (primary)-[irw:in_register_with]->(template:Individual)-[:depicts]->(template_anat:Individual)
+            OPTIONAL MATCH (primary)-[:is_specified_output_of]->(technique:Class)
+            WITH primary, template, template_anat, technique, irw
+            LIMIT 1
+            RETURN template, template_anat, technique, irw
+        }}
+        RETURN
+            primary.short_form AS id,
+            apoc.text.format("[%s](%s)", [primary.label, primary.short_form]) AS name,
+            apoc.text.join(coalesce(primary.uniqueFacets, []), '|') AS tags,
+            coalesce(typ.label, '') AS type,
+            REPLACE(apoc.text.format("[%s](%s)", [COALESCE(template_anat.symbol[0], template_anat.label), template_anat.short_form]), '[null](null)', '') AS template,
+            coalesce(technique.label, '') AS technique,
+            REPLACE(apoc.text.format("[![%s](%s '%s')](%s)", [COALESCE(primary.symbol[0], coalesce(primary.label, 'image')) + " aligned to " + COALESCE(template_anat.symbol[0], template_anat.label), REPLACE(COALESCE(irw.thumbnail[0], ''), 'thumbnailT.png', 'thumbnail.png'), COALESCE(primary.symbol[0], coalesce(primary.label, 'image')) + " aligned to " + COALESCE(template_anat.symbol[0], template_anat.label), template_anat.short_form + "," + primary.short_form]), "[![null]( 'null')](null)", "") AS thumbnail
+    """
+
     results = vc.nc.commit_list([main_query])
     df = pd.DataFrame.from_records(get_dict_cursor()(results))
-    if not df.empty: df = encode_markdown_links(df, ['name'])
-    
-    if return_dataframe: return df
-    return {"headers": {"id": {"title": "ID", "type": "selection_id", "order": -1}, "name": {"title": "Term", "type": "markdown", "order": 0}, "tags": {"title": "Tags", "type": "tags", "order": 1}, "type": {"title": "Type", "type": "text", "order": 2}}, "rows": [{key: row[key] for key in ["id", "name", "tags", "type"]} for row in safe_to_dict(df, sort_by_id=False)], "count": total_count}
+    if not df.empty:
+        df = encode_markdown_links(df, ['name', 'template', 'thumbnail'])
+
+    if return_dataframe:
+        return df
+
+    return {
+        "headers": {
+            "id":        {"title": "ID",                "type": "selection_id", "order": -1},
+            "name":      {"title": "Term",              "type": "markdown",     "order":  0},
+            "tags":      {"title": "Tags",              "type": "tags",         "order":  1},
+            "type":      {"title": "Type",              "type": "text",         "order":  2},
+            "template":  {"title": "Template",          "type": "markdown",     "order":  3},
+            "technique": {"title": "Imaging Technique", "type": "text",         "order":  4},
+            "thumbnail": {"title": "Thumbnail",         "type": "markdown",     "order":  9},
+        },
+        "rows": [
+            {k: row[k] for k in ["id", "name", "tags", "type", "template", "technique", "thumbnail"]}
+            for row in safe_to_dict(df, sort_by_id=False)
+        ],
+        "count": total_count,
+    }
 
 
 # ===== Complex Transgene Expression Query =====