Commit 3a2bfc8

Merge pull request #45 from VirtualFlyBrain/feature/cache-all-queries
Cache all handler-reachable queries (v1.19.0)
2 parents eb6bddf + 25b0b0e commit 3a2bfc8

6 files changed

Lines changed: 211 additions & 14 deletions

CACHING.md

Lines changed: 85 additions & 0 deletions
@@ -26,6 +26,91 @@ VFBquery uses a single-layer caching approach with SOLR:
 3. **Cache persistence**: Survives Python restarts and server reboots
 4. **Automatic expiration**: 3-month TTL matches VFB_connect behavior

+## Cache coverage (v1.19.0)
+
+As of v1.19.0 every query-result function reachable from the HA API handlers
+(`ha_api.py`) is served by the persistent SOLR cache, except a small set that
+is deliberately excluded (see below). Coverage is verified by a static sweep
+that traces each handler entry point through the `QUERY_TYPE_MAP` dispatch and
+the FlyBase/connectivity/hierarchy handlers; see `coverage_sweep.py`.
+
+Caching is applied in one of two layers, both of which the handler path goes
+through (`handler -> vfbquery.<fn> (patched to *_cached) -> _original`):
+
+- `@with_solr_cache('<bucket>')` on the original in `vfb_queries.py` (most
+  hierarchy / neuron-in-region / connectivity / image queries), or
+- `@with_solr_cache('<bucket>')` on the `*_cached` wrapper in
+  `cached_functions.py` (term_info, similarity, transcriptomics, datasets).
+
+A function counts as cached if either layer carries the decorator. Do not add
+the decorator at both layers for the same function: that causes double
+round-trips to the cache.
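The layering described above can be pictured with a small stand-alone sketch. It is not the VFBquery implementation: `with_cache` is a hypothetical stand-in for `@with_solr_cache` backed by a plain dict rather than SOLR, and `get_parts` with its id `EXAMPLE_0001` is invented for illustration. The point is only that the decorator sits at exactly one layer and the `*_cached` wrapper delegates.

```python
import functools

# Hypothetical in-memory stand-in for the persistent SOLR cache.
_STORE = {}
CALLS = {"backend": 0}


def with_cache(bucket):
    """Illustrative stand-in for @with_solr_cache: key on (bucket, first argument)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(term_id, *args, **kwargs):
            key = (bucket, term_id)
            if key not in _STORE:
                _STORE[key] = func(term_id, *args, **kwargs)
            return _STORE[key]
        return wrapper
    return decorator


# Layer 1: the decorator is placed on the original function.
@with_cache("parts_of")
def _original_get_parts(term_id):
    CALLS["backend"] += 1  # counts real backend computations
    return [f"{term_id}_part_{i}" for i in range(3)]


# Layer 2: the *_cached wrapper only delegates; it carries no second decorator.
def get_parts_cached(term_id):
    return _original_get_parts(term_id)


first = get_parts_cached("EXAMPLE_0001")
second = get_parts_cached("EXAMPLE_0001")
```

The second call is answered from the dict, so the backend counter stays at one.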
+
+New buckets added in v1.19.0: `cluster_expression`, `expression_cluster`,
+`scrnaseq_dataset_data`, `individual_neuron_inputs`, `similar_morphology`,
+`similar_morphology_part_of`, `similar_morphology_part_of_exp`,
+`similar_morphology_nb`, `similar_morphology_nb_exp`, `dataset_images`,
+`all_aligned_images`, `all_datasets`, `transgene_expression_here`,
+`related_anatomy`. The last five (`dataset_images`, `all_aligned_images`,
+`all_datasets`, `transgene_expression_here`, `related_anatomy`) are genuinely
+new: the wrapper did not list them before, so they were also added to its
+`expensive_query_types` and `dataframe_query_types`. For these, a limited
+request computes the full result once, caches it, and later limited requests
+are served by slicing the cached full
+result.
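A minimal sketch of the "compute once, slice later" behaviour, under stated assumptions: the dict `_FULL_RESULTS`, the function names, and the ten-row result are all hypothetical, and `limit=-1` is taken to mean "no limit", matching the `limit: int = -1` defaults in `cached_functions.py`.

```python
# Hypothetical cache: key -> full (unlimited) result.
_FULL_RESULTS = {}
COMPUTE_COUNT = 0


def _compute_full(term_id):
    """Stand-in for an expensive backend query that returns every row."""
    global COMPUTE_COUNT
    COMPUTE_COUNT += 1
    return [{"id": f"{term_id}_{i}"} for i in range(10)]


def query(term_id, limit=-1):
    """Serve limited requests by slicing a cached full result."""
    if term_id not in _FULL_RESULTS:
        # Even a limited request triggers the full computation on a miss.
        _FULL_RESULTS[term_id] = _compute_full(term_id)
    rows = _FULL_RESULTS[term_id]
    return rows if limit < 0 else rows[:limit]


top3 = query("EXAMPLE_0001", limit=3)
top5 = query("EXAMPLE_0001", limit=5)
everything = query("EXAMPLE_0001")
```

The trade-off is a slower first limited request in exchange for every later request, limited or not, being a cache hit.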
+
+### Cross-dataset connectivity (`query_connectivity`)
+
+`query_connectivity` takes five parameters (`upstream_type`,
+`downstream_type`, `weight`, `group_by_class`, `exclude_dbs`), so the default
+single-id `@with_solr_cache` key does not fit. It is persisted directly in
+`vfb_connectivity.py` under a composite key
+(`query_connectivity:{upstream}:{downstream}:{weight}:{group_by_class}:{exclude_dbs}`,
+hashed for a Solr-safe document id). The in-memory `ResultCache` and request
+coalescer in `ha_api.py` sit in front; this SOLR layer sits behind, so that a
+result computed on a cold miss survives restarts and is shared with the other
+containers. Graph post-processing (`post_fn`) stays in the handler and is
+never part of the cached payload. `force_refresh=true` on
+`/query_connectivity` drops both the
+in-memory entry and the SOLR document, then recomputes the result.
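As a rough illustration of the composite key, the sketch below builds the key string in the format quoted above and hashes it. The source only says the key is "hashed for a Solr-safe document id"; the choice of SHA-256 here is an assumption, as are the helper names `composite_key` and `solr_doc_id` and the example cell types.

```python
import hashlib


def composite_key(upstream, downstream, weight, group_by_class, exclude_dbs):
    """Build the readable composite key in the format quoted in the text."""
    return (f"query_connectivity:{upstream}:{downstream}:"
            f"{weight}:{group_by_class}:{exclude_dbs}")


def solr_doc_id(key):
    """Assumption: SHA-256, so the id contains only hexadecimal characters."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


key_a = composite_key("Kenyon cell", "mushroom body output neuron",
                      5, True, ["hb", "fafb"])
key_b = composite_key("Kenyon cell", "mushroom body output neuron",
                      10, True, ["hb", "fafb"])
id_a = solr_doc_id(key_a)
id_b = solr_doc_id(key_b)
```

Changing any one of the five parameters changes the key and therefore the document id, while spaces, colons, and brackets in the readable key never reach the id.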
+
+### Deliberately not cached
+
+- `get_similar_morphology_userdata`: keyed on a per-session user upload id;
+  the result is user/session-specific, so it is left to the in-memory L1
+  cache only.
+- `get_flybase_stocks`, `get_flybase_combo_pubs`, `find_stocks`,
+  `find_combo_publications`: backed by the FlyBase RDBMS, not Neo4j/Owlery;
+  out of scope for this offload.
+- `resolve_entity`, `resolve_combination`: thin resolvers over the already
+  cached `term_info`.
+- `list_connectome_datasets`: tiny static list; L1 cache is sufficient.
+- `get_hierarchy`: delegates its heavy work to the SOLR-cached
+  `get_parts_of` / `get_subclasses_of` and relies on Owlery's own
+  server-side cache, with the handler holding an in-memory composite-key
+  entry; persistent composite caching is a sensible follow-up but was left
+  out to keep this change focused.
+
+### Cache server
+
+The cache reads from and writes to the Solr core at `cache_url`, which
+defaults to the dedicated query-cache Solr:
+
+```
+http://vfbquerycache.virtualflybrain.org:80/solr/vfb_json
+```
+
+(`SolrResultCache.DEFAULT_CACHE_URL`). This is a separate, lightly loaded host
+from the ontology Solr (`solr.virtualflybrain.org`); it is reached on port 80
+because the Solr native port is firewalled externally. Override with the
+`VFBQUERY_SOLR_URL` environment variable (e.g. to point at a staging core for
+testing):
+
+```bash
+export VFBQUERY_SOLR_URL=http://localhost:8983/solr/vfb_json
+```
+
+Note: data reads in `vfb_queries.py` (term_info, painted domains, ontology
+label lookups, etc.) still go to `solr.virtualflybrain.org`; only the result
+*cache* moved. The two are independent.
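The resolution order can be sketched as a stand-alone function. It mirrors the `os.getenv('VFBQUERY_SOLR_URL', self.DEFAULT_CACHE_URL)` fallback shown in the `solr_result_cache.py` hunk of this commit, but `resolve_cache_url` and its `env` parameter are hypothetical names introduced here so the logic can be exercised without touching the real environment.

```python
import os

DEFAULT_CACHE_URL = "http://vfbquerycache.virtualflybrain.org:80/solr/vfb_json"


def resolve_cache_url(cache_url=None, env=None):
    """Explicit argument wins, then VFBQUERY_SOLR_URL, then the default."""
    env = os.environ if env is None else env
    if cache_url is not None:
        return cache_url
    return env.get("VFBQUERY_SOLR_URL", DEFAULT_CACHE_URL)


# No override: the dedicated query-cache Solr is used.
default_url = resolve_cache_url(env={})

# Environment override, e.g. a local staging core.
staging_url = resolve_cache_url(
    env={"VFBQUERY_SOLR_URL": "http://localhost:8983/solr/vfb_json"})

# An explicit argument takes precedence over the environment.
explicit_url = resolve_cache_url(
    "http://example.invalid/solr/core",
    env={"VFBQUERY_SOLR_URL": "http://localhost:8983/solr/vfb_json"})
```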
+
 ## Runtime Configuration

 Control caching behavior:

setup.py

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@

 here = path.abspath(path.dirname(__file__))

-__version__ = "1.18.0"
+__version__ = "1.19.0"

 # Get the long description from the README file
 with open(path.join(here, 'README.md')) as f:

src/vfbquery/cached_functions.py

Lines changed: 22 additions & 0 deletions
@@ -138,6 +138,7 @@ def get_similar_neurons_cached(neuron, similarity_score='NBLAST_score', return_d
     """
     return _original_get_similar_neurons(neuron=neuron, similarity_score=similarity_score, return_dataframe=return_dataframe, limit=limit)

+@with_solr_cache('similar_morphology')
 def get_similar_morphology_cached(neuron_short_form: str, return_dataframe=True, limit: int = -1, force_refresh: bool = False):
     """
     Enhanced get_similar_morphology with SOLR caching.
@@ -153,6 +154,7 @@ def get_similar_morphology_cached(neuron_short_form: str, return_dataframe=True,
     """
     return _original_get_similar_morphology(neuron_short_form=neuron_short_form, return_dataframe=return_dataframe, limit=limit)

+@with_solr_cache('similar_morphology_part_of')
 def get_similar_morphology_part_of_cached(neuron_short_form: str, return_dataframe=True, limit: int = -1, force_refresh: bool = False):
     """
     Enhanced get_similar_morphology_part_of with SOLR caching.
@@ -168,6 +170,7 @@ def get_similar_morphology_part_of_cached(neuron_short_form: str, return_datafra
     """
     return _original_get_similar_morphology_part_of(neuron_short_form=neuron_short_form, return_dataframe=return_dataframe, limit=limit)

+@with_solr_cache('similar_morphology_part_of_exp')
 def get_similar_morphology_part_of_exp_cached(expression_short_form: str, return_dataframe=True, limit: int = -1, force_refresh: bool = False):
     """
     Enhanced get_similar_morphology_part_of_exp with SOLR caching.
@@ -183,6 +186,7 @@ def get_similar_morphology_part_of_exp_cached(expression_short_form: str, return
     """
     return _original_get_similar_morphology_part_of_exp(expression_short_form=expression_short_form, return_dataframe=return_dataframe, limit=limit)

+@with_solr_cache('similar_morphology_nb')
 def get_similar_morphology_nb_cached(neuron_short_form: str, return_dataframe=True, limit: int = -1, force_refresh: bool = False):
     """
     Enhanced get_similar_morphology_nb with SOLR caching.
@@ -197,6 +201,7 @@ def get_similar_morphology_nb_cached(neuron_short_form: str, return_dataframe=Tr
     """
     return _original_get_similar_morphology_nb(neuron_short_form=neuron_short_form, return_dataframe=return_dataframe, limit=limit)

+@with_solr_cache('similar_morphology_nb_exp')
 def get_similar_morphology_nb_exp_cached(expression_short_form: str, return_dataframe=True, limit: int = -1, force_refresh: bool = False):
     """
     Enhanced get_similar_morphology_nb_exp with SOLR caching.
@@ -211,6 +216,9 @@ def get_similar_morphology_nb_exp_cached(expression_short_form: str, return_data
     """
     return _original_get_similar_morphology_nb_exp(expression_short_form=expression_short_form, return_dataframe=return_dataframe, limit=limit)

+# Deliberately not @with_solr_cache: the key is a per-session user upload id,
+# so the result is user/session-specific and not safe to share via the
+# persistent cache. Left to recompute (and to the in-memory L1 cache only).
 def get_similar_morphology_userdata_cached(upload_id: str, return_dataframe=True, limit: int = -1, force_refresh: bool = False):
     """
     Enhanced get_similar_morphology_userdata with SOLR caching.
@@ -295,6 +303,7 @@ def get_templates_cached(limit: int = -1, return_dataframe: bool = False, force_
     """
     return _original_get_templates(limit=limit, return_dataframe=return_dataframe, force_refresh=force_refresh)

+@with_solr_cache('related_anatomy')
 def get_related_anatomy_cached(template_short_form: str, limit: int = -1, return_dataframe: bool = False, force_refresh: bool = False):
     """
     Enhanced get_related_anatomy with SOLR caching.
@@ -348,6 +357,7 @@ def get_template_roi_tree_cached(template_short_form: str, return_dataframe: boo
     """
     return _original_get_template_roi_tree(template_short_form=template_short_form, return_dataframe=return_dataframe)

+@with_solr_cache('dataset_images')
 def get_dataset_images_cached(dataset_short_form: str, return_dataframe=True, limit: int = -1, force_refresh: bool = False):
     """
     Enhanced get_dataset_images with SOLR caching.
@@ -362,6 +372,7 @@ def get_dataset_images_cached(dataset_short_form: str, return_dataframe=True, li
     """
     return _original_get_dataset_images(dataset_short_form=dataset_short_form, return_dataframe=return_dataframe, limit=limit)

+@with_solr_cache('all_aligned_images')
 def get_all_aligned_images_cached(template_short_form: str, return_dataframe=True, limit: int = -1, force_refresh: bool = False):
     """
     Enhanced get_all_aligned_images with SOLR caching.
@@ -391,6 +402,7 @@ def get_aligned_datasets_cached(template_short_form: str, return_dataframe=True,
     """
     return _original_get_aligned_datasets(template_short_form=template_short_form, return_dataframe=return_dataframe, limit=limit)

+@with_solr_cache('all_datasets')
 def get_all_datasets_cached(return_dataframe=True, limit: int = -1, force_refresh: bool = False):
     """
     Enhanced get_all_datasets with SOLR caching.
@@ -404,10 +416,16 @@ def get_all_datasets_cached(return_dataframe=True, limit: int = -1, force_refres
     """
     return _original_get_all_datasets(return_dataframe=return_dataframe, limit=limit)

+@with_solr_cache('individual_neuron_inputs')
 def get_individual_neuron_inputs_cached(neuron_short_form: str, return_dataframe=True, limit: int = -1, summary_mode: bool = False, force_refresh: bool = False):
     """
     Enhanced get_individual_neuron_inputs with SOLR caching.

+    Note: the SOLR cache keys on the neuron id (and return_dataframe). The
+    REST path always calls with summary_mode=False, so the default key is
+    safe there; a non-default summary_mode is not part of the cache key, so
+    direct library callers that vary it should pass force_refresh.
+
     Args:
         neuron_short_form: Neuron short form
         return_dataframe: Whether to return DataFrame or list of dicts
@@ -484,6 +502,7 @@ def get_anatomy_scrnaseq_cached(anatomy_short_form: str, return_dataframe=True,
     """
     return _original_get_anatomy_scrnaseq(anatomy_short_form=anatomy_short_form, return_dataframe=return_dataframe, limit=limit)

+@with_solr_cache('cluster_expression')
 def get_cluster_expression_cached(cluster_short_form: str, return_dataframe=True, limit: int = -1, force_refresh: bool = False):
     """
     Enhanced get_cluster_expression with SOLR caching.
@@ -498,6 +517,7 @@ def get_cluster_expression_cached(cluster_short_form: str, return_dataframe=True
     """
     return _original_get_cluster_expression(cluster_short_form=cluster_short_form, return_dataframe=return_dataframe, limit=limit)

+@with_solr_cache('expression_cluster')
 def get_expression_cluster_cached(gene_short_form: str, return_dataframe=True, limit: int = -1, force_refresh: bool = False):
     """
     Enhanced get_expression_cluster with SOLR caching.
@@ -512,6 +532,7 @@ def get_expression_cluster_cached(gene_short_form: str, return_dataframe=True, l
     """
     return _original_get_expression_cluster(gene_short_form=gene_short_form, return_dataframe=return_dataframe, limit=limit)

+@with_solr_cache('scrnaseq_dataset_data')
 def get_scrnaseq_dataset_data_cached(dataset_short_form: str, return_dataframe=True, limit: int = -1, force_refresh: bool = False):
     """
     Enhanced get_scrnaseq_dataset_data with SOLR caching.
@@ -526,6 +547,7 @@ def get_scrnaseq_dataset_data_cached(dataset_short_form: str, return_dataframe=T
     """
     return _original_get_scrnaseq_dataset_data(dataset_short_form=dataset_short_form, return_dataframe=return_dataframe, limit=limit)

+@with_solr_cache('transgene_expression_here')
 def get_transgene_expression_here_cached(anatomy_short_form: str, return_dataframe=True, limit: int = -1, force_refresh: bool = False):
     """
     Enhanced get_transgene_expression_here with SOLR caching.

src/vfbquery/ha_api.py

Lines changed: 8 additions & 2 deletions
@@ -425,14 +425,15 @@ def _run_list_connectome_datasets():


 def _run_query_connectivity(upstream_type, downstream_type, weight,
-                            group_by_class, exclude_dbs):
+                            group_by_class, exclude_dbs, force_refresh=False):
     """Execute query_connectivity in a worker process."""
     return _vfb.query_connectivity(
         upstream_type=upstream_type,
         downstream_type=downstream_type,
         weight=weight,
         group_by_class=group_by_class,
         exclude_dbs=exclude_dbs,
+        force_refresh=force_refresh,
     )


@@ -962,6 +963,7 @@ async def handle_query_connectivity(request):
     else:
         exclude_dbs = ["hb", "fafb"]
     include_graph = request.query.get("include_graph", "false").lower() in ("true", "1", "yes")
+    force_refresh = request.query.get("force_refresh", "false").lower() in ("true", "1", "yes")

     post_fn = None
     if include_graph:
@@ -978,9 +980,13 @@ def post_fn(result):
             return result

     key = f"query_connectivity:{upstream}:{downstream}:{weight}:{group_by_class}:{exclude_dbs}"
+    # force_refresh=true drops the in-memory L1 entry so the recomputed result
+    # replaces it; the SOLR layer is invalidated inside query_connectivity.
+    if force_refresh:
+        request.app["result_cache"].invalidate(key)
     return await _dispatch_to_pool(
         request, key, _run_query_connectivity,
-        upstream, downstream, weight, group_by_class, exclude_dbs,
+        upstream, downstream, weight, group_by_class, exclude_dbs, force_refresh,
         post_fn=post_fn,
     )

src/vfbquery/solr_result_cache.py

Lines changed: 31 additions & 10 deletions
@@ -60,18 +60,26 @@ class SolrResultCache:
     will periodically probe Solr and re-enable itself when the service recovers.
     """

-    def __init__(self,
-                 cache_url: str = "https://solr.virtualflybrain.org/solr/vfb_json",
+    # Dedicated query-cache Solr, reachable on port 80 (the ontology Solr's
+    # native port is firewalled externally, so the cache must use this host).
+    DEFAULT_CACHE_URL = "http://vfbquerycache.virtualflybrain.org:80/solr/vfb_json"
+
+    def __init__(self,
+                 cache_url: str = None,
                  ttl_hours: int = 2160,  # 3 months like VFB_connect
                  max_result_size_mb: int = 10):
         """
         Initialize SOLR result cache
-
+
         Args:
-            cache_url: SOLR collection URL for caching
+            cache_url: SOLR collection URL for caching. Defaults to the
+                VFBQUERY_SOLR_URL env var if set, otherwise the dedicated
+                query-cache Solr (DEFAULT_CACHE_URL).
             ttl_hours: Time-to-live for cache entries in hours
             max_result_size_mb: Maximum result size to cache in MB
         """
+        if cache_url is None:
+            cache_url = os.getenv('VFBQUERY_SOLR_URL', self.DEFAULT_CACHE_URL)
         self.cache_url = cache_url
         self.ttl_hours = ttl_hours
         self.max_result_size_mb = max_result_size_mb
@@ -777,14 +785,19 @@ def wrapper(*args, **kwargs):

             # For expensive queries, we still only cache full results, but we handle limited requests
             # by slicing from cached full results
-            expensive_query_types = ['similar_neurons', 'similar_morphology', 'similar_morphology_part_of',
-                                     'similar_morphology_part_of_exp', 'similar_morphology_nb',
+            expensive_query_types = ['similar_neurons', 'similar_morphology', 'similar_morphology_part_of',
+                                     'similar_morphology_part_of_exp', 'similar_morphology_nb',
                                      'similar_morphology_nb_exp', 'similar_morphology_userdata',
-                                     'neurons_part_here', 'neurons_synaptic',
+                                     'neurons_part_here', 'neurons_synaptic',
                                      'neurons_presynaptic', 'neurons_postsynaptic',
                                      'expression_overlaps_here', 'anatomy_scrnaseq', 'aligned_datasets', 'terms_for_pub',
                                      'individual_neuron_inputs', 'cluster_expression', 'expression_cluster', 'scrnaseq_dataset_data',
-                                     'painted_domains', 'downstream_class_connectivity_query', 'upstream_class_connectivity_query']
+                                     'painted_domains', 'downstream_class_connectivity_query', 'upstream_class_connectivity_query',
+                                     # New buckets (v1.19.0): large, limit-sliced results; listing them here
+                                     # means a limited request computes the full result once, caches it, and
+                                     # serves later limited requests by slicing the cached full result.
+                                     'dataset_images', 'all_aligned_images', 'all_datasets',
+                                     'transgene_expression_here', 'related_anatomy']

             # For neuron_neuron_connectivity_query, only cache when all parameters are defaults
             if query_type == 'neuron_neuron_connectivity_query':
@@ -795,11 +808,16 @@ def wrapper(*args, **kwargs):
             # Extract term_id from first argument or kwargs
             term_id = args[0] if args else kwargs.get('short_form') or kwargs.get('term_id')

-            # For functions like get_templates that don't have a term_id, use query_type as cache key
+            # For functions that don't have a term_id, use a fixed cache key
+            # tied to the query_type (the result is a single global list).
             if not term_id:
                 if query_type == 'templates':
                     # Use a fixed cache key for templates since it doesn't take a term_id
                     term_id = 'all_templates'
+                elif query_type == 'all_datasets':
+                    # get_all_datasets has no id argument; the result is the full
+                    # dataset catalogue, so a single fixed key is correct.
+                    term_id = 'all_datasets'
                 else:
                     logger.warning(f"No term_id found for caching {query_type}")
                     return func(*args, **kwargs)
@@ -823,7 +841,10 @@ def wrapper(*args, **kwargs):
                                      'images_that_develop_from', 'expression_pattern_fragments', 'expression_overlaps_here',
                                      'anatomy_scrnaseq', 'aligned_datasets', 'terms_for_pub', 'individual_neuron_inputs',
                                      'cluster_expression', 'expression_cluster', 'scrnaseq_dataset_data', 'painted_domains',
-                                     'downstream_class_connectivity_query', 'upstream_class_connectivity_query']
+                                     'downstream_class_connectivity_query', 'upstream_class_connectivity_query',
+                                     # New buckets (v1.19.0): see expensive_query_types above.
+                                     'dataset_images', 'all_aligned_images', 'all_datasets',
+                                     'transgene_expression_here', 'related_anatomy']
             if query_type in dataframe_query_types:
                 return_dataframe = kwargs.get('return_dataframe', True)  # Default is True
                 cache_term_id = f"{cache_term_id}_dataframe_{return_dataframe}"
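Taken together, the two wrapper hunks above suggest a key derivation along the following lines. This is a simplified sketch, not the wrapper itself: how `cache_term_id` is initialised is not visible in the diff, so the sketch assumes it starts out equal to the term id, and the function name `cache_key` and the shortened `dataframe_query_types` tuple are hypothetical.

```python
# Fixed keys for functions that take no id argument (taken from the diff).
FIXED_KEYS = {"templates": "all_templates", "all_datasets": "all_datasets"}


def cache_key(query_type, term_id=None, return_dataframe=True,
              dataframe_query_types=("all_datasets", "dataset_images")):
    """Illustrative key derivation; returns None when no key can be formed."""
    if not term_id:
        term_id = FIXED_KEYS.get(query_type)
        if term_id is None:
            # The real wrapper logs a warning and calls the function uncached.
            return None
    key = term_id  # assumption: cache_term_id starts as the term id
    if query_type in dataframe_query_types:
        # DataFrame and list-of-dict results are stored under separate keys.
        key = f"{key}_dataframe_{return_dataframe}"
    return key


catalogue_key = cache_key("all_datasets")
images_key = cache_key("dataset_images", "EXAMPLE_DS", return_dataframe=False)
templates_key = cache_key("templates")
missing_key = cache_key("some_other_query")
```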
