
Commit ee60c62

get_instances: use Owlery for subclass closure, Neo4j for instance match
v1.12.6 replaced the direct-INSTANCEOF Cypher with a call to Owlery's /instances endpoint, but that endpoint only returns OWL-asserted instances. VFB stores Individual→Class membership as Neo4j INSTANCEOF edges rather than OWL ClassAssertion axioms, so Owlery's instance reasoning had nothing to work with: parent classes still returned 0 results even when queried by class IRI.

Switch to Owlery's /subclasses endpoint (the reasoner still gives us the OWL-correct subclass closure, including equivalence classes and defined classes), then match Individuals that are INSTANCEOF any class in that set via Neo4j. This is the legacy v2 XMI two-step pattern.

Verified live: FBbt_00100246 (MBON11, leaf) still returns its 14 images; FBbt_00007484 (mushroom body intrinsic neuron, parent) now gets all instances of Kenyon cell, γ Kenyon cell, etc. (dozens of images previously visible in SOLR but invisible to the API).
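
The two-step pattern described above can be sketched in isolation. This is a minimal, self-contained illustration and not the VFBquery implementation: the dictionaries `SUBCLASS_OF` and `INSTANCE_OF`, the three functions, and all identifiers in them are hypothetical stand-ins for Owlery's subclass reasoning and Neo4j's INSTANCEOF edges.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for the class hierarchy Owlery reasons over (child -> parent).
SUBCLASS_OF: Dict[str, str] = {
    "kenyon_cell": "mb_intrinsic_neuron",
    "gamma_kenyon_cell": "kenyon_cell",
}

# Hypothetical stand-in for Neo4j INSTANCEOF edges (individual -> asserted class).
INSTANCE_OF: Dict[str, str] = {
    "img_1": "kenyon_cell",
    "img_2": "gamma_kenyon_cell",
    "img_3": "mbon11",
}


def subclass_closure(cls: str, subclass_of: Dict[str, str]) -> Set[str]:
    """Step 1: the queried class plus all of its transitive subclasses."""
    closure = {cls}
    changed = True
    while changed:
        changed = False
        for child, parent in subclass_of.items():
            if parent in closure and child not in closure:
                closure.add(child)
                changed = True
    return closure


def direct_instances(cls: str) -> List[str]:
    """The old single-edge match: only individuals typed exactly as `cls`."""
    return sorted(i for i, c in INSTANCE_OF.items() if c == cls)


def instances_of(cls: str) -> List[str]:
    """Step 2: match individuals typed as any class in the closure."""
    classes = subclass_closure(cls, SUBCLASS_OF)
    return sorted(i for i, c in INSTANCE_OF.items() if c in classes)
```

With this toy data, `direct_instances("mb_intrinsic_neuron")` is empty, while `instances_of("mb_intrinsic_neuron")` finds both Kenyon-cell individuals; a leaf class such as `"mbon11"` gives the same result either way because the closure always contains the queried class.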
1 parent e273196 commit ee60c62

1 file changed

Lines changed: 36 additions & 33 deletions


src/vfbquery/vfb_queries.py

@@ -2069,44 +2069,43 @@ def get_instances(short_form: str, return_dataframe=True, limit: int = -1):
     """
 
     try:
-        # Step 1: ask Owlery for instance IDs matching the class expression.
-        # Owlery's reasoner handles the subclass closure via OWL inference,
-        # which is the canonical VFBquery idiom (same path used by the
-        # `Neurons*Here`, `ImagesThatDevelopFrom`, `TractsNervesInnervatingHere`
-        # etc. via `_owlery_query_to_results(..., query_instances=True)`).
+        # Step 1: ask Owlery for the SUBCLASS closure of the queried class.
+        # Owlery's `/subclasses` reasoner does the OWL inference (handles
+        # equivalence classes, defined classes, anonymous class expressions
+        # in the parent chain, etc.). The queried class itself is included
+        # so leaf classes still match.
         #
-        # Why this matters: the previous Cypher used
-        # `(i)-[:INSTANCEOF]->(p:Class {short_form: $id})` — a single-edge
-        # match that only sees individuals *directly* typed as the queried
-        # class. For any parent class (e.g. mushroom body intrinsic neuron
-        # FBbt_00007484, whose individuals are typed Kenyon cell etc.) the
-        # query returned 0 even though SOLR had dozens of image rows on
-        # file. Asking Owlery first gives us the same subclass-aware result
-        # the legacy v2 XMI chain produced, with proper OWL semantics
-        # (equivalence classes, defined classes, etc.).
+        # Why Owlery for subclasses (not instances) — VFB stores Individual→
+        # Class membership as Neo4j INSTANCEOF edges, NOT as OWL
+        # ClassAssertion axioms. Owlery has the class hierarchy but no
+        # individual assertions, so its `/instances` endpoint returns
+        # nothing for entities whose Individuals live only in Neo4j. We
+        # must do the instance match in Neo4j against the Owlery-derived
+        # subclass set. This mirrors the legacy v2 XMI two-step chain
+        # (Owlery subclasses → SOLR per-class lookup) without the SOLR
+        # intermediate.
+        #
+        # Why this matters: the previous Cypher used a single-edge
+        # `(i)-[:INSTANCEOF]->(p:Class {short_form: $id})` match — only
+        # individuals *directly* typed as the queried class were seen.
+        # For any parent class (e.g. mushroom body intrinsic neuron
+        # FBbt_00007484, whose individuals are typed Kenyon cell etc.)
+        # the query returned 0 even though SOLR had dozens of image rows
+        # on file.
         owl_query = f"<{_short_form_to_iri(short_form)}>"
-        instance_ids = vc.vfb.oc.get_instances(query=owl_query, query_by_label=False)
-        if not instance_ids:
-            if return_dataframe:
-                return pd.DataFrame()
-            return {
-                "headers": _get_instances_headers(),
-                "rows": [],
-                "count": 0,
-            }
-
-        # Step 2: fetch image metadata for those instances from Neo4j.
-        # Pattern: Individual ← depicts ← TemplateChannel → in_register_with → TemplateChannelTemplate → depicts → ActualTemplate
-        # The `parent` column now reports the *actual* class each instance
-        # is typed as (often a subclass of the queried class) rather than
-        # the queried class itself — more useful for v2 display.
-        total_count = len(instance_ids)
-
+        subclass_ids = vc.vfb.oc.get_subclasses(query=owl_query, query_by_label=False)
+        # Always include the queried class itself so leaf classes still match.
+        class_ids = list({short_form, *subclass_ids})
+
+        # Step 2: fetch image metadata for instances of any of those
+        # classes from Neo4j. The `parent` column reports the actual class
+        # each instance is typed as (often a leaf subclass of the queried
+        # class) — more useful for v2 display than echoing the queried id.
         query = f"""
         MATCH (i:Individual:has_image)-[:INSTANCEOF]->(p:Class),
               (i)<-[:depicts]-(tc:Individual)-[r:in_register_with]->(tct:Template)-[:depicts]->(templ:Template),
               (i)-[:has_source]->(ds:DataSet)
-        WHERE i.short_form IN {instance_ids!r}
+        WHERE p.short_form IN {class_ids!r}
         OPTIONAL MATCH (i)-[rx:database_cross_reference]->(site:Site)
         OPTIONAL MATCH (ds)-[:license|licence]->(lic:License)
         RETURN i.short_form as id,
@@ -2133,7 +2132,11 @@ def get_instances(short_form: str, return_dataframe=True, limit: int = -1):
 
         columns_to_encode = ['label', 'parent', 'source', 'source_id', 'template', 'dataset', 'license', 'thumbnail']
         df = encode_markdown_links(df, columns_to_encode)
-
+
+        # Total count is the row count returned by the Cypher (i.e. instances
+        # of the queried class or any of its Owlery-derived subclasses).
+        total_count = len(df)
+
         if return_dataframe:
             return df

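The `IN {class_ids!r}` filter in the diff works because the `repr` of a Python list of strings is also a valid Cypher list literal. The sketch below shows how such a clause renders. `build_class_filter` is a hypothetical helper, not part of VFBquery; the `None` handling is an assumption (the diff does not state what the subclass lookup returns when nothing matches); and `sorted` is used only to make the output deterministic, whereas the commit uses `list({...})`. `EXAMPLE_0000001` is an invented id.

```python
from typing import Iterable, Optional


def build_class_filter(short_form: str, subclass_ids: Optional[Iterable[str]]) -> str:
    """Render the WHERE clause for the queried class plus its subclasses.

    The queried class is always included, so a leaf class with no
    subclasses still produces a one-element list.
    """
    # Assumption: treat a None result from the subclass lookup as "no subclasses".
    class_ids = sorted({short_form, *(subclass_ids or [])})
    # repr() of a list of str, e.g. ['A', 'B'], is also a Cypher list literal.
    return f"WHERE p.short_form IN {class_ids!r}"


print(build_class_filter("FBbt_00100246", []))
# → WHERE p.short_form IN ['FBbt_00100246']
```

Interpolating values into the query text is only reasonable when the ids come from a trusted source such as the reasoner; passing the list as a query parameter is the usual alternative when they do not.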