
Commit 5d7ed02

Owlery closure + ar.pub[0] walk for transgene + scRNAseq
Two fixes against the legacy XMI Cypher we inherited at migration time,
both verified by direct probe against pdb.

1. Owlery /subclasses closure (semantic correctness):

Audit of the pre-migration XMI (commit 998cc28d9^) showed 12
anatomy-rooted CompoundRefQueries whose first chain step was
dataSources.1 (Owlery /subclasses). Nine are already routed through
Owlery via _owlery_query_to_results (NeuronsPartHere, NeuronsSynaptic,
NeuronsPresynapticHere, NeuronsPostsynapticHere,
TractsNervesInnervatingHere, ComponentsOf, LineageClonesIn, PartsOf,
NeuronClassesFasciculatingHere) and SubclassesOf IS the closure itself.
Two skipped Owlery during the migration:

- get_transgene_expression_here: raw Cypher with direct INSTANCEOF
  only. Now resolves the subclass closure via Owlery /subclasses first
  (input class always included so leaves still match), then filters
  with `anat.short_form IN $closure`.
- get_anatomy_scrnaseq: v1.13.5 added a Cypher SUBCLASSOF*0.. walk
  which works for primitive class hierarchies but misses
  defined-class / equivalentTo inferences. Replaced with Owlery for
  consistency with get_instances (v1.12.8) and the other nine.

Both fall back to the input class alone if Owlery is unreachable; it is
better to return the leaf result than to fail outright. Same defensive
pattern as get_instances.

2. Pubs walk fix in get_transgene_expression_here (the user-visible bug
behind the empty Reference column on v2-dev):

Legacy XMI dataSources[0]/@queries.7 reads pub identifiers from
`ar.pub[0]`, an array PROPERTY on the overlaps/part_of RELATIONSHIP,
then resolves labels via

    OPTIONAL MATCH (pub:pub { short_form: p })

Our v1.10.x rewrite traversed a separate `:has_reference|pub` EDGE from
the Individual, which doesn't exist in the current Neo4j build. Every
row's pubs column came back empty.
Verified locally on pacemaker neuron (FBbt_00006048): the patched walk
now returns

    P{GAL4-per.BS}: FlyBase Curators 2017; Kaneko and Hall, 2000
    P{GAL4-tim.E}: Helfrich-Forster et al., 2007
    P{GSV6}GS10340: Richier et al., 2008
    P{cry-GAL4.E}: FlyBase Curators 2017

matching v2 prod's Reference column screenshot exactly.

Note on Template_Space / Imaging_Technique: empty for the same four
pacemaker-neuron EPs is by-data, not a bug. Direct probe of
`(ep)<-[:has_source|SUBCLASSOF|INSTANCEOF*]-(:Individual)` on
VFBexp_FBtp0001321 returns zero hits; those classic-era GAL4 EPs have
no aligned-image individuals in pdb. The same walk on MB364B
(VFBexp_FBtp0099464FBtp0117485) correctly resolves to VFB_00101567
(JRC2018Unisex) / VFB_00017894 (JFRC2). The current `:Template` label
target also works (templates carry both :Individual and :Template
labels), so no walk change needed there.

Minor bump (1.13.x → 1.14.0) to invalidate stale @with_solr_cache
buckets that carry the broken empty-pubs payloads from the previous
walk. The geppetto-vfb XMI is unchanged.

Reference: geppetto-vfb@998cc28d9^:model/vfb.xmi
    queryChain="//@dataSources.1/@queries.N ..." (Owlery first)
    dataSources[0]/@queries.7,10,12 (Neo4j Cypher with ar.pub[0])
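
The closure-with-fallback pattern from fix 1 appears in both patched functions. A minimal sketch of it as a standalone helper is shown here for illustration. `resolve_closure` and the `fetch_subclasses` callable are hypothetical names (in the commit the lookup is `vc.vfb.oc.get_subclasses`), the `FBbt_A` / `FBbt_B` identifiers are placeholders, and sorting the result is an addition made here for deterministic output, not something the commit does.

```python
from typing import Callable, Iterable, List, Optional


def resolve_closure(
    short_form: str,
    fetch_subclasses: Callable[[str], Optional[Iterable[str]]],
) -> List[str]:
    """Return the input class plus its subclasses, or the input alone on failure.

    `fetch_subclasses` is a hypothetical stand-in for the reasoner call.
    """
    try:
        subclasses = fetch_subclasses(short_form) or []
    except Exception:
        # Reasoner unreachable: degrade to the direct match on the input class.
        return [short_form]
    # Always include the input class so leaf classes still match.
    return sorted({short_form, *subclasses})


def _unreachable(_: str) -> List[str]:
    # Simulates the reasoner being down.
    raise ConnectionError("reasoner unreachable")


closure_ok = resolve_closure("FBbt_00006048", lambda sf: ["FBbt_A", "FBbt_B"])
closure_down = resolve_closure("FBbt_00006048", _unreachable)
closure_empty = resolve_closure("FBbt_00006048", lambda sf: None)
```

With the fallback, an outage narrows the result to the input class instead of raising, which is the trade-off the commit message describes.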
1 parent 372d6a1 commit 5d7ed02

2 files changed

Lines changed: 69 additions & 32 deletions


setup.py

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 
 here = path.abspath(path.dirname(__file__))
 
-__version__ = "1.12.1"
+__version__ = "1.14.0"
 
 # Get the long description from the README file
 with open(path.join(here, 'README.md')) as f:
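
The commit message says the version bump exists to invalidate stale `@with_solr_cache` buckets. How that decorator derives its keys is not shown in this diff, so the following is only a sketch of one way a version-scoped cache key could behave, assuming the package version is part of the key. `cache_key` is a hypothetical function, not the library's API.

```python
import hashlib
import json


def cache_key(version: str, query_name: str, params: dict) -> str:
    """Build a deterministic key that changes whenever the version changes.

    Hypothetical helper: the real @with_solr_cache key format is not shown
    in this commit.
    """
    payload = json.dumps(
        {"version": version, "query": query_name, "params": params},
        sort_keys=True,  # stable ordering so equal inputs give equal keys
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


params = {"anatomy_short_form": "FBbt_00006048"}
old_key = cache_key("1.13.5", "get_transgene_expression_here", params)
new_key = cache_key("1.14.0", "get_transgene_expression_here", params)
```

Under that assumption, entries written by the previous release are simply never looked up again, so no explicit purge is needed.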

src/vfbquery/vfb_queries.py

Lines changed: 68 additions & 31 deletions
@@ -4193,21 +4193,27 @@ def get_anatomy_scrnaseq(anatomy_short_form: str, return_dataframe=True, limit:
     """
 
     # `hasScRNAseq` on a parent class is set by the indexer when ANY
-    # subclass has a Cluster composed_primarily_of it, so the
-    # narrow MATCH (primary)<-[:composed_primarily_of]-(c:Cluster) pattern
+    # subclass has a Cluster composed_primarily_of it, so the narrow
+    # MATCH (primary)<-[:composed_primarily_of]-(c:Cluster) pattern
     # returns 0 on parent classes with scRNAseq-bearing subclasses
     # (e.g. pacemaker neuron FBbt_00006048: 0 direct, 3 via subclasses
-    # adult pacemaker neuron / LNv neuron / DN1 neuron). Walk SUBCLASSOF*0..
-    # so subclass-attached Clusters are included, matching the indexer's
-    # propagation semantics. SUBCLASSOF-only is safe here — Clusters bind
-    # to Classes via composed_primarily_of (not the leaf-only INSTANCEOF
-    # path that needed Owlery in get_instances v1.12.8); the class
-    # hierarchy in Neo4j is complete enough for this walk.
+    # adult pacemaker neuron / LNv neuron / DN1 neuron).
+    #
+    # v1.14.0: resolve the subclass closure via Owlery /subclasses
+    # (consistent with get_transgene_expression_here and get_instances
+    # v1.12.8). v1.13.5 used a Cypher SUBCLASSOF*0.. walk; that misses
+    # defined-class / equivalence inferences. Owlery is the right tool
+    # and matches the legacy XMI's first-step Owlery walk.
+    try:
+        owl_query = f"<{_short_form_to_iri(anatomy_short_form)}>"
+        subclass_ids = vc.vfb.oc.get_subclasses(query=owl_query, query_by_label=False)
+        anat_short_forms = list({anatomy_short_form, *(subclass_ids or [])})
+    except Exception:
+        anat_short_forms = [anatomy_short_form]
+
     count_query = f"""
-    MATCH (primary:Class:Anatomy)
-    WHERE primary.short_form = '{anatomy_short_form}'
-    WITH primary
-    MATCH (primary)<-[:SUBCLASSOF*0..]-(sub:Class)
+    MATCH (sub:Class)
+    WHERE sub.short_form IN {anat_short_forms!r}
     MATCH (sub)<-[:composed_primarily_of]-(c:Cluster)-[:has_source]->(ds:scRNAseq_DataSet)
     RETURN COUNT(DISTINCT c) AS total_count
     """
@@ -4216,12 +4222,15 @@ def get_anatomy_scrnaseq(anatomy_short_form: str, return_dataframe=True, limit:
     count_df = pd.DataFrame.from_records(get_dict_cursor()(count_results))
     total_count = count_df['total_count'][0] if not count_df.empty else 0
 
-    # Main query: get clusters with dataset and publication info
+    # Main query: get clusters with dataset and publication info.
+    # `primary` is preserved in the projection for parity with the
+    # pre-v1.14.0 shape; we re-fetch the input class as `primary`
+    # so callers expecting that field continue to work.
     main_query = f"""
     MATCH (primary:Class:Anatomy)
     WHERE primary.short_form = '{anatomy_short_form}'
-    WITH primary
-    MATCH (primary)<-[:SUBCLASSOF*0..]-(sub:Class)
+    MATCH (sub:Class)
+    WHERE sub.short_form IN {anat_short_forms!r}
     MATCH (sub)<-[:composed_primarily_of]-(c:Cluster)-[:has_source]->(ds:scRNAseq_DataSet)
     WITH DISTINCT primary, c, ds
     OPTIONAL MATCH (ds)-[:has_reference]->(p:pub)
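
The commit message writes the filter as `IN $closure`, while the hunks above interpolate the list into the query text with `{anat_short_forms!r}`. The sketch below shows what that `!r` conversion renders for a list of strings, plus an optional guard before interpolating. `cypher_in_list` is a hypothetical helper, `FBbt_EXAMPLE` is a placeholder identifier, and the assumption that short forms contain only letters, digits and underscores is made here for illustration; it is not stated in the commit.

```python
import re
from typing import Iterable

# Assumption for this sketch: short forms look like FBbt_00006048 or
# VFBexp_FBtp0001321, i.e. only letters, digits and underscores.
_SHORT_FORM = re.compile(r"[A-Za-z0-9_]+")


def cypher_in_list(short_forms: Iterable[str]) -> str:
    """Render short forms as a list literal suitable for `IN ...`.

    Hypothetical helper: validates each value, then relies on Python's
    repr() of a list of plain strings, which uses single-quoted items.
    """
    ids = sorted(set(short_forms))
    bad = [sf for sf in ids if not _SHORT_FORM.fullmatch(sf)]
    if bad:
        raise ValueError(f"unexpected short_form(s): {bad!r}")
    return repr(ids)


fragment = cypher_in_list(["FBbt_EXAMPLE", "FBbt_00006048"])
query = f"MATCH (sub:Class) WHERE sub.short_form IN {fragment} RETURN sub.short_form"
```

Passing the list as a query parameter would avoid string interpolation altogether; whether `vc.nc.commit_list` accepts parameters is not shown in this diff.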
@@ -5365,22 +5374,36 @@ def get_transgene_expression_here(anatomy_short_form: str, return_dataframe=True
     - Template_Space / Imaging_Technique / Images (one representative
       channel-image per ep, picked via CALL subquery with LIMIT 1)
 
-    Matches the prod XMI Cypher at
-    geppetto-vfb/master:model/vfb.xmi @dataSources.0/@queries.7 (the
-    SOLR-backed "Query for exp from anatomy with no warning" path),
-    flattened to the same column shape v1.10.1 introduced for the
-    SimilarMorphologyTo* siblings so the geppetto-vfb processor's
-    COL_HEADER_MAP maps everything cleanly.
+    v1.14.0: anatomy traversal now goes through Owlery /subclasses,
+    matching the legacy XMI's first-step Owlery walk
+    (//@dataSources.1/@queries.8 in the pre-migration chain). Without
+    this, a high-level anatomy class (e.g. pacemaker neuron, mushroom
+    body intrinsic neuron) returns 0 EPs because Individuals are typed
+    INSTANCEOF leaf subclasses, not the parent. Same closure pattern as
+    get_instances (v1.12.8).
 
     TODO: Expressed_in column. Prod surfaces it from the
     anatomy_channel_image[].anatomy.label list — one chip per
     representative image. Needs a small design decision on how to
     render multiple values in a flat string column; deferred to a
     follow-up so the rest of the columns can ship now.
     """
+    # Resolve the full subclass closure of the input anatomy class via
+    # Owlery. Owlery handles OWL inference (equivalence classes, defined
+    # classes, anonymous class expressions on the parent chain); the
+    # queried class itself is included so leaf classes still match.
+    try:
+        owl_query = f"<{_short_form_to_iri(anatomy_short_form)}>"
+        subclass_ids = vc.vfb.oc.get_subclasses(query=owl_query, query_by_label=False)
+        anat_short_forms = list({anatomy_short_form, *(subclass_ids or [])})
+    except Exception:
+        # If Owlery is unreachable, fall back to the input class alone —
+        # better to return the leaf-class result than to fail outright.
+        anat_short_forms = [anatomy_short_form]
+
     count_query = f"""
     MATCH (ep:Class:Expression_pattern)<-[ar:overlaps|part_of]-(:Individual)-[:INSTANCEOF]->(anat:Class)
-    WHERE anat.short_form = '{anatomy_short_form}'
+    WHERE anat.short_form IN {anat_short_forms!r}
     RETURN COUNT(DISTINCT ep) AS total_count
     """
     count_results = vc.nc.commit_list([count_query])
@@ -5391,21 +5414,35 @@ def get_transgene_expression_here(anatomy_short_form: str, return_dataframe=True
     # fire so we only enrich the rows we actually need. With 2,340
     # mushroom-body EPs and a 5-hop thumbnail join inside the CALL, the
     # naive "append LIMIT at the end" form ran for tens of seconds.
+    #
+    # v1.14.0 pubs walk now matches the legacy XMI Cypher at
+    # geppetto-vfb@998cc28d9^:model/vfb.xmi dataSources[0]/@queries.7:
+    # pub short_forms are stored as an ARRAY PROPERTY on the
+    # overlaps/part_of RELATIONSHIP (ar.pub[0]), not as a separate
+    # [:has_reference|pub] edge from the Individual. The previous
+    # walk traversed an edge that doesn't exist in the current
+    # Neo4j build, so pubs came back empty for every row.
     limit_clause = f"LIMIT {limit}" if limit != -1 else ""
     main_query = f"""
     MATCH (ep:Class:Expression_pattern)<-[ar:overlaps|part_of]-(:Individual)-[:INSTANCEOF]->(anat:Class)
-    WHERE anat.short_form = '{anatomy_short_form}'
-    WITH DISTINCT ep
+    WHERE anat.short_form IN {anat_short_forms!r}
+    WITH ep, collect(DISTINCT ar.pub[0]) AS pub_shorts
     ORDER BY ep.label
     {limit_clause}
     CALL {{
-        WITH ep
-        OPTIONAL MATCH (ep)<-[:overlaps|part_of]-(:Individual)-[:has_reference|pub]->(p:pub)
-        // Strip "Unattributed" pub labels — they're VFB's marker for an
-        // expression pattern with no citation, but rendered in the V2
-        // Reference column they look like a real citation. Match v2 prod
-        // behaviour which hides Unattributed entirely.
-        RETURN apoc.text.join([l IN collect(DISTINCT coalesce(p.label, p.short_form)) WHERE l IS NOT NULL AND l <> '' AND l <> 'Unattributed'], '; ') AS pubs
+        WITH pub_shorts
+        UNWIND pub_shorts AS p_sf
+        OPTIONAL MATCH (p:pub {{ short_form: p_sf }})
+        // Strip "Unattributed" pub labels — VFB's marker for an EP
+        // with no citation. Rendered in the V2 Reference column they
+        // look like a real citation. Match v2 prod behaviour which
+        // hides Unattributed entirely. Also drop NULL p (pub_short
+        // without a resolvable pub node) and empty labels.
+        WITH p WHERE p IS NOT NULL
+          AND coalesce(p.label, p.short_form) IS NOT NULL
+          AND coalesce(p.label, p.short_form) <> ''
+          AND coalesce(p.label, p.short_form) <> 'Unattributed'
+        RETURN apoc.text.join(collect(DISTINCT coalesce(p.label, p.short_form)), '; ') AS pubs
     }}
     CALL {{
         WITH ep
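
The filtering done inside the first CALL block can be restated in plain Python, which may help when checking expected Reference values by hand. This is only an illustrative sketch: `join_pub_labels` is a hypothetical function, the `FBrf_PLACEHOLDER_*` identifiers are made up, and it keeps first-seen order, whereas the order produced by `collect(DISTINCT ...)` in Cypher is not something this commit specifies.

```python
from typing import Iterable, List, Optional, Tuple

# A resolved pub is (label, short_form); None models a pub short_form
# that did not resolve to a pub node.
Pub = Optional[Tuple[Optional[str], str]]


def join_pub_labels(pubs: Iterable[Pub]) -> str:
    """Mirror the Cypher filter: coalesce(label, short_form), drop
    unresolved, empty and 'Unattributed' entries, de-duplicate, join."""
    kept: List[str] = []
    for pub in pubs:
        if pub is None:
            continue  # no resolvable pub node
        label, short_form = pub
        text = label if label is not None else short_form  # coalesce()
        if not text or text == "Unattributed":
            continue
        if text not in kept:  # DISTINCT
            kept.append(text)
    return "; ".join(kept)


joined = join_pub_labels(
    [
        ("Kaneko and Hall, 2000", "FBrf_PLACEHOLDER_1"),
        None,
        ("Unattributed", "Unattributed"),
        (None, "FBrf_PLACEHOLDER_2"),
        ("Kaneko and Hall, 2000", "FBrf_PLACEHOLDER_1"),
    ]
)
```

An expression pattern whose only pub is "Unattributed" therefore ends up with an empty string, which matches the behaviour the query comments describe.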
