Commit c4e7163

Refactor caching implementation to use SOLR-based caching throughout the codebase, enhancing performance and simplifying cache management.
1 parent 2ecdb96 commit c4e7163

4 files changed

Lines changed: 60 additions & 204 deletions

CACHING.md

Lines changed: 38 additions & 42 deletions
@@ -1,41 +1,43 @@
 # VFBquery Caching Guide
 
-VFBquery includes intelligent caching for optimal performance. Caching is **enabled by default** with production-ready settings.
+VFBquery includes intelligent SOLR-based caching for optimal performance. Caching is **enabled by default** with production-ready settings.
 
 ## Default Behavior
 
-VFBquery automatically enables caching when imported:
+VFBquery automatically enables SOLR caching when imported:
 
 ```python
 import vfbquery as vfb
 
-# Caching is already active with optimal settings:
+# SOLR caching is already active with optimal settings:
 # - 3-month cache duration
-# - 2GB memory cache with LRU eviction
-# - Persistent disk storage
+# - Persistent across sessions
 # - Zero configuration required
 
 result = vfb.get_term_info('FBbt_00003748') # Cached automatically
 ```
 
+## How It Works
+
+VFBquery uses a single-layer caching approach with SOLR:
+
+1. **First query**: Fetches data from Neo4j/Owlery and caches in SOLR
+2. **Subsequent queries**: Served directly from SOLR cache
+3. **Cache persistence**: Survives Python restarts and server reboots
+4. **Automatic expiration**: 3-month TTL matches VFB_connect behavior
+
 ## Runtime Configuration
 
-Adjust cache settings while your application is running:
+Control caching behavior:
 
 ```python
 import vfbquery as vfb
 
-# Modify cache duration
-vfb.set_cache_ttl(720) # 1 month
-vfb.set_cache_ttl(24) # 1 day
-
-# Adjust memory limits
-vfb.set_cache_memory_limit(512) # 512MB
-vfb.set_cache_max_items(5000) # 5K items
+# Clear specific cache entries
+vfb.clear_solr_cache('term_info', 'FBbt_00003748')
 
-# Toggle disk persistence
-vfb.disable_disk_cache() # Memory-only
-vfb.enable_disk_cache() # Restore persistence
+# Get SOLR cache statistics
+stats = vfb.get_solr_cache().get_cache_stats()
 ```
 
 ### Environment Control
@@ -48,15 +50,15 @@ export VFBQUERY_CACHE_ENABLED=false
 
 ## Performance Benefits
 
-VFBquery caching provides significant performance improvements:
+VFBquery SOLR caching provides significant performance improvements:
 
 ```python
 import vfbquery as vfb
 
-# First query: builds cache (~1-2 seconds)
+# First query: builds SOLR cache (~1-2 seconds)
 result1 = vfb.get_term_info('FBbt_00003748')
 
-# Subsequent queries: served from cache (<0.1 seconds)
+# Subsequent queries: served from SOLR cache (<0.1 seconds)
 result2 = vfb.get_term_info('FBbt_00003748') # 54,000x faster!
 ```
 
@@ -71,16 +73,11 @@ result2 = vfb.get_term_info('FBbt_00003748') # 54,000x faster!
 ```python
 import vfbquery as vfb
 
-# Get cache statistics
-stats = vfb.get_vfbquery_cache_stats()
-print(f"Hit rate: {stats['hit_rate_percent']}%")
-print(f"Memory used: {stats['memory_cache_size_mb']}MB")
-print(f"Cache items: {stats['memory_cache_items']}")
-
-# Get current configuration
-config = vfb.get_cache_config()
-print(f"TTL: {config['cache_ttl_hours']} hours")
-print(f"Memory limit: {config['memory_cache_size_mb']}MB")
+# Get SOLR cache statistics
+cache = vfb.get_solr_cache()
+stats = cache.get_cache_stats()
+print(f"Total cached items: {stats['total_documents']}")
+print(f"Cache size: {stats['total_size_mb']:.1f}MB")
 ```
 
 ## Usage Examples
@@ -90,21 +87,21 @@ print(f"Memory limit: {config['memory_cache_size_mb']}MB")
 ```python
 import vfbquery as vfb
 
-# Caching is enabled automatically with optimal defaults
-# Adjust only if your application has specific needs
+# SOLR caching is enabled automatically with optimal defaults
+# Cache persists across application restarts
 
-# Example: Long-running server with limited memory
-vfb.set_cache_memory_limit(512) # 512MB limit
-vfb.set_cache_ttl(168) # 1 week TTL
+# Example: Long-running server
+result = vfb.get_term_info('FBbt_00003748') # Fast on repeated runs
+instances = vfb.get_instances('FBbt_00003748') # Cached automatically
 ```
 
 ### Jupyter Notebooks
 
 ```python
 import vfbquery as vfb
 
-# Caching works automatically in notebooks
-# Data persists between kernel restarts
+# SOLR caching works automatically in notebooks
+# Data persists between kernel restarts and notebook sessions
 
 result = vfb.get_term_info('FBbt_00003748') # Fast on repeated runs
 instances = vfb.get_instances('FBbt_00003748') # Cached automatically
@@ -114,14 +111,13 @@ instances = vfb.get_instances('FBbt_00003748') # Cached automatically
 
 - **Dramatic Performance**: 54,000x speedup for repeated queries
 - **Zero Configuration**: Works out of the box with optimal settings
-- **Persistent Storage**: Cache survives Python restarts
-- **Memory Efficient**: LRU eviction prevents memory bloat
-- **Multi-layer Caching**: Optimizes SOLR queries, parsing, and results
+- **Persistent Storage**: SOLR cache survives Python restarts and server reboots
+- **Server-side Caching**: Shared across multiple processes/instances
 - **Production Ready**: 3-month TTL matches VFB_connect behavior
 
 ## Best Practices
 
-- **Monitor performance**: Use `get_vfbquery_cache_stats()` regularly
-- **Adjust for your use case**: Tune memory limits for long-running applications
-- **Consider data freshness**: Shorter TTL for frequently changing data
+- **Monitor performance**: Use SOLR cache statistics regularly
+- **Clear when needed**: Use `clear_solr_cache()` to force fresh data
+- **Consider data freshness**: SOLR cache TTL ensures data doesn't become stale
 - **Disable when needed**: Use environment variable if caching isn't desired
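
The four-step flow described in the new "How It Works" section of CACHING.md is a cache-aside pattern with a time-to-live. A minimal sketch of that pattern follows, assuming a plain dictionary in place of the SOLR collection. The names `with_ttl_cache`, `clear_cache`, and `_STORE` are hypothetical; this is not the `with_solr_cache` decorator from `solr_result_cache.py`, whose implementation this diff does not show.

```python
import functools
import time
from typing import Any, Callable, Dict, Tuple

# 90 days, the same span as the 2160-hour default removed in this commit.
DEFAULT_TTL_SECONDS = 90 * 24 * 60 * 60

# Hypothetical stand-in for the SOLR collection:
# (query_type, term_id, extra-args repr) -> (stored_at, value)
_STORE: Dict[Tuple[str, str, str], Tuple[float, Any]] = {}


def with_ttl_cache(query_type: str,
                   ttl_seconds: float = DEFAULT_TTL_SECONDS,
                   clock: Callable[[], float] = time.time):
    """Cache-aside decorator: look up first, fetch and store on a miss."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(term_id: str, *args, **kwargs):
            extras = repr((args, sorted(kwargs.items())))
            key = (query_type, term_id, extras)
            entry = _STORE.get(key)
            if entry is not None and clock() - entry[0] < ttl_seconds:
                return entry[1]                     # fresh hit: serve from cache
            value = func(term_id, *args, **kwargs)  # miss or expired: call the backend
            if value is not None:
                _STORE[key] = (clock(), value)      # store only non-empty results
            return value
        return wrapper
    return decorator


def clear_cache(query_type: str, term_id: str) -> bool:
    """Remove every entry for one term; True if anything was removed."""
    stale = [k for k in _STORE if k[:2] == (query_type, term_id)]
    for k in stale:
        del _STORE[k]
    return bool(stale)
```

Decorating a lookup function with `@with_ttl_cache("term_info")` makes the second call for the same ID skip the backend until the entry expires or `clear_cache("term_info", term_id)` is called. The injectable `clock` exists only so expiry can be exercised in tests without waiting.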

src/test/test_query_performance.py

Lines changed: 3 additions & 7 deletions
@@ -53,13 +53,9 @@ class QueryPerformanceTest(unittest.TestCase):
 
     @classmethod
     def setUpClass(cls):
-        """Enable caching for performance tests"""
-        # Import caching module
-        from vfbquery import cache_enhancements
-
-        # Enable caching to speed up repeated queries
-        cache_enhancements.enable_vfbquery_caching()
-        print("\n🔥 Caching enabled for performance tests")
+        """Set up for performance tests"""
+        # SOLR caching is enabled by default
+        print("\n🔥 SOLR caching enabled for performance tests")
 
     def setUp(self):
         """Set up test data"""

src/vfbquery/__init__.py

Lines changed: 8 additions & 34 deletions
@@ -1,50 +1,24 @@
 from .vfb_queries import *
 from .solr_result_cache import get_solr_cache
 
-# Caching enhancements (optional import - don't break if dependencies missing)
+# SOLR-based caching (simplified single-layer approach)
 try:
-    from .cache_enhancements import (
-        enable_vfbquery_caching,
-        disable_vfbquery_caching,
-        clear_vfbquery_cache,
-        get_vfbquery_cache_stats,
-        set_cache_ttl,
-        set_cache_memory_limit,
-        set_cache_max_items,
-        enable_disk_cache,
-        disable_disk_cache,
-        get_cache_config,
-        CacheConfig
-    )
     from .cached_functions import (
         get_term_info_cached,
-        get_instances_cached,
-        patch_vfbquery_with_caching,
-        unpatch_vfbquery_caching
+        get_instances_cached
     )
     __caching_available__ = True
-
-    # Enable caching by default with 3-month TTL and 2GB memory cache
+
+    # Enable SOLR caching by default with 3-month TTL
     import os
-
+
     # Check if caching should be disabled via environment variable
     cache_disabled = os.getenv('VFBQUERY_CACHE_ENABLED', 'true').lower() in ('false', '0', 'no', 'off')
-
+
     if not cache_disabled:
-        # Enable caching with VFB_connect-like defaults
-        enable_vfbquery_caching(
-            cache_ttl_hours=2160, # 3 months (90 days)
-            memory_cache_size_mb=2048, # 2GB memory cache
-            max_items=10000, # Max 10k items as safeguard
-            disk_cache_enabled=True # Persistent across sessions
-        )
-
-        # Automatically patch existing functions for transparent caching
-        patch_vfbquery_with_caching()
-
-        print("VFBquery: Caching enabled by default (3-month TTL, 2GB memory)")
+        print("VFBquery: SOLR caching enabled by default (3-month TTL)")
         print(" Disable with: export VFBQUERY_CACHE_ENABLED=false")
-
+
 except ImportError:
     __caching_available__ = False
     print("VFBquery: Caching not available (dependencies missing)")
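
The opt-out check in `__init__.py` treats `false`, `0`, `no`, and `off` (case-insensitive) as disabling values and anything else, including an unset variable, as enabled. The sketch below isolates that rule in a hypothetical helper, `cache_disabled_from_env`. The helper name and its `environ` parameter are assumptions added for testability; the variable name, default, and tuple of values are taken from the diff.

```python
import os
from typing import Mapping, Optional

# Values that switch caching off, as listed in the diff.
_DISABLING_VALUES = ('false', '0', 'no', 'off')


def cache_disabled_from_env(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when VFBQUERY_CACHE_ENABLED holds one of the disabling values."""
    env = os.environ if environ is None else environ
    return env.get('VFBQUERY_CACHE_ENABLED', 'true').lower() in _DISABLING_VALUES
```

The expression in the diff does not strip whitespace, so a value such as `'false '` still counts as enabled; the sketch keeps that behavior. In the post-commit `__init__.py`, the `if not cache_disabled:` branch only prints a message, so whether the variable is honored elsewhere (for example inside `solr_result_cache`) cannot be determined from this diff.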

src/vfbquery/cached_functions.py

Lines changed: 11 additions & 121 deletions
@@ -6,7 +6,7 @@
 """
 
 from typing import Dict, Any, Optional
-from .cache_enhancements import cache_result, get_cache
+from .solr_result_cache import with_solr_cache
 
 
 def is_valid_term_info_result(result):
@@ -45,40 +45,20 @@ def is_valid_term_info_result(result):
 from .vfb_queries import (
     get_term_info as _original_get_term_info,
     get_instances as _original_get_instances,
-    vfb_solr,
-    term_info_parse_object as _original_term_info_parse_object,
-    fill_query_results as _original_fill_query_results
+    vfb_solr
 )
 
-@cache_result("solr_search", "solr_cache_enabled")
+@with_solr_cache("solr_search")
 def cached_solr_search(query: str):
     """Cached version of SOLR search."""
     return vfb_solr.search(query)
 
-@cache_result("term_info_parse", "term_info_cache_enabled")
-def cached_term_info_parse_object(results, short_form: str):
-    """Cached version of term_info_parse_object."""
-    return _original_term_info_parse_object(results, short_form)
-
-@cache_result("query_results", "query_result_cache_enabled")
-def cached_fill_query_results(term_info: Dict[str, Any]):
-    """Cached version of fill_query_results."""
-    return _original_fill_query_results(term_info)
-
-@cache_result("get_instances", "query_result_cache_enabled")
-def cached_get_instances(short_form: str, return_dataframe=True, limit: int = -1):
-    """Cached version of get_instances."""
-    return _original_get_instances(short_form, return_dataframe, limit)
-
+@with_solr_cache("term_info")
 def get_term_info_cached(short_form: str, preview: bool = False):
     """
-    Enhanced get_term_info with multi-layer caching.
+    Enhanced get_term_info with SOLR caching.
 
-    This version uses caching at multiple levels:
-    1. Final result caching (entire term_info response)
-    2. SOLR query result caching
-    3. Term info parsing caching
-    4. Query result caching
+    This version caches complete term_info responses in SOLR for fast retrieval.
 
     Args:
         short_form: Term short form (e.g., 'FBbt_00003748')
@@ -87,104 +67,14 @@ def get_term_info_cached(short_form: str, preview: bool = False):
     Returns:
         Term info dictionary or None if not found
     """
-    cache = get_cache()
-
-    # Check for complete result in cache first
-    cache_key = cache._generate_cache_key("term_info_complete", short_form, preview)
-    cached_result = cache.get(cache_key)
-    print(f"DEBUG: Cache lookup for {short_form}: {'HIT' if cached_result is not None else 'MISS'}")
-    if cached_result is not None:
-        # Validate that cached result has essential fields
-        if not is_valid_term_info_result(cached_result):
-            print(f"DEBUG: Cached result incomplete for {short_form}, falling back to original function")
-            print(f"DEBUG: cached_result keys: {list(cached_result.keys()) if cached_result else 'None'}")
-            print(f"DEBUG: cached_result Id: {cached_result.get('Id', 'MISSING') if cached_result else 'None'}")
-            print(f"DEBUG: cached_result Name: {cached_result.get('Name', 'MISSING') if cached_result else 'None'}")
-
-            # Fall back to original function and cache the complete result
-            fallback_result = _original_get_term_info(short_form, preview)
-            if is_valid_term_info_result(fallback_result):
-                print(f"DEBUG: Fallback successful, caching complete result for {short_form}")
-                cache.set(cache_key, fallback_result)
-            return fallback_result
-        else:
-            print(f"DEBUG: Using valid cached result for {short_form}")
-            return cached_result
-
-    parsed_object = None
-    try:
-        # Use cached SOLR search
-        results = cached_solr_search('id:' + short_form)
-
-        # Use cached term info parsing
-        parsed_object = cached_term_info_parse_object(results, short_form)
-
-        if parsed_object:
-            # Use cached query result filling (skip if queries would fail)
-            if parsed_object.get('Queries') and len(parsed_object['Queries']) > 0:
-                try:
-                    term_info = cached_fill_query_results(parsed_object)
-                    if term_info:
-                        # Validate result before caching
-                        if term_info.get('Id') and term_info.get('Name'):
-                            # Cache the complete result
-                            cache.set(cache_key, term_info)
-                            return term_info
-                        else:
-                            print(f"Query result for {short_form} is incomplete, falling back to original function...")
-                            return _original_get_term_info(short_form, preview)
-                    else:
-                        print("Failed to fill query preview results!")
-                        # Validate result before caching
-                        if parsed_object.get('Id') and parsed_object.get('Name'):
-                            # Cache the complete result
-                            cache.set(cache_key, parsed_object)
-                            return parsed_object
-                        else:
-                            print(f"Parsed object for {short_form} is incomplete, falling back to original function...")
-                            return _original_get_term_info(short_form, preview)
-                except Exception as e:
-                    print(f"Error filling query results (continuing without query data): {e}")
-                    # Validate result before caching
-                    if is_valid_term_info_result(parsed_object):
-                        cache.set(cache_key, parsed_object)
-                        return parsed_object
-                    else:
-                        print(f"DEBUG: Exception case - parsed object incomplete for {short_form}, falling back to original function")
-                        fallback_result = _original_get_term_info(short_form, preview)
-                        if is_valid_term_info_result(fallback_result):
-                            cache.set(cache_key, fallback_result)
-                        return fallback_result
-            else:
-                # No queries to fill, validate result before caching
-                if parsed_object.get('Id') and parsed_object.get('Name'):
-                    # Cache and return parsed object directly
-                    cache.set(cache_key, parsed_object)
-                    return parsed_object
-                else:
-                    print(f"DEBUG: No queries case - parsed object incomplete for {short_form}, falling back to original function...")
-                    fallback_result = _original_get_term_info(short_form, preview)
-                    if is_valid_term_info_result(fallback_result):
-                        cache.set(cache_key, fallback_result)
-                    return fallback_result
-        else:
-            print(f"No valid term info found for ID '{short_form}'")
-            return None
-
-    except Exception as e:
-        print(f"Error in cached get_term_info: {type(e).__name__}: {e}")
-        # Fall back to original function if caching fails
-        return _original_get_term_info(short_form, preview)
+    return _original_get_term_info(short_form, preview)
 
+@with_solr_cache("instances")
 def get_instances_cached(short_form: str, return_dataframe=True, limit: int = -1):
     """
-    Enhanced get_instances with caching.
+    Enhanced get_instances with SOLR caching.
 
-    This cached version can provide dramatic speedup for repeated queries,
-    especially useful for:
-    - UI applications with repeated browsing
-    - Data analysis workflows
-    - Testing and development
+    This cached version provides dramatic speedup for repeated queries.
 
     Args:
         short_form: Class short form
@@ -194,7 +84,7 @@ def get_instances_cached(short_form: str, return_dataframe=True, limit: int = -1
     Returns:
         Instances data (DataFrame or formatted dict based on return_dataframe)
     """
-    return cached_get_instances(short_form, return_dataframe, limit)
+    return _original_get_instances(short_form, return_dataframe, limit)
 
 # Convenience function to replace original functions
 def patch_vfbquery_with_caching():
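
One behavioral difference in `cached_functions.py` is worth noting. The removed `get_term_info_cached` body checked each result (through `is_valid_term_info_result` or inline `Id`/`Name` tests) before calling `cache.set`, while the new body returns `_original_get_term_info(...)` directly and leaves any validation to `with_solr_cache`, which is not part of this diff. The sketch below shows the validate-before-cache idea in isolation, assuming a dictionary store and two hypothetical names, `has_id_and_name` and `cache_if_valid`. It reproduces neither the old nor the new implementation.

```python
import functools
from typing import Any, Callable, Dict, Optional


def has_id_and_name(result: Optional[Dict[str, Any]]) -> bool:
    """Hypothetical validity check: both 'Id' and 'Name' must be present and truthy."""
    return bool(result) and bool(result.get('Id')) and bool(result.get('Name'))


def cache_if_valid(store: Dict[str, Any], is_valid: Callable[[Any], bool]):
    """Cache a function's result only when is_valid(result) is True."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(short_form: str):
            if short_form in store:
                return store[short_form]   # previously validated result
            result = func(short_form)
            if is_valid(result):
                store[short_form] = result  # incomplete results are never stored
            return result
        return wrapper
    return decorator
```

With this shape, an incomplete response is still returned to the caller but is fetched again on the next call, so a transient backend failure does not get pinned in the cache for the full TTL.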
