
Commit 7988e69

Merge pull request #37 from VirtualFlyBrain/dev
Cleaned up caching
2 parents cfbfd6e + a9d4593 commit 7988e69

25 files changed

Lines changed: 7063 additions & 1906 deletions

.github/workflows/examples.yml

Lines changed: 20 additions & 2 deletions
@@ -22,9 +22,25 @@ jobs:
           pip install -r requirements.txt
           pip install deepdiff colorama
           pip install .
+      - name: Check SOLR availability
+        run: |
+          python -c "
+          import os
+          try:
+              import vfbquery as vfb
+              result = vfb.get_term_info('FBbt_00003748')
+              with open(os.environ['GITHUB_ENV'], 'a') as f:
+                  f.write('SOLR_AVAILABLE=true\n')
+              print('SOLR is available')
+          except Exception as e:
+              print('SOLR not available:', e)
+              with open(os.environ['GITHUB_ENV'], 'a') as f:
+                  f.write('SOLR_AVAILABLE=false\n')
+              exit(1)
+          "
       - name: Run examples from README.md
         run: |
-          cat README.md | grep -e '```python' -e '```' -e '^[^`]*$' | sed -e '/^```python/,/^```/!d' -e '/^```/d' -e 's/\(vfb.[^)]*)\)/print(\1)/g' > test_examples.py
+          cat README.md | grep -e '```python' -e '```' -e '^[^`]*$' | sed -e '/^```python/,/^```/!d' -e '/^```/d' -e 's/\(vfb\.[^(]*([^)]*)\)/print(\1)/g' > test_examples.py
           cat test_examples.py
           export VFBQUERY_CACHE_ENABLED=false
           python test_examples.py
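
The only functional change to the `sed` expression in this hunk is the pattern that wraps `vfb.` calls in `print(...)`. The sketch below translates both patterns from sed's basic regex syntax into Python's `re` module to show where they differ. The translations, the helper names `wrap_old` and `wrap_new`, and the sample comment line are illustrative assumptions, not code from the repository.

```python
import re

# Python equivalents of the two sed patterns (sed BRE translated to Python re).
# Illustrative only: the workflow itself runs sed, not this code.
OLD_PATTERN = re.compile(r"(vfb.[^)]*\))")          # unescaped dot matches any character
NEW_PATTERN = re.compile(r"(vfb\.[^(]*\([^)]*\))")  # literal dot, a name, then one (...) group


def wrap_old(line: str) -> str:
    """Apply the pre-commit substitution to one line."""
    return OLD_PATTERN.sub(r"print(\1)", line)


def wrap_new(line: str) -> str:
    """Apply the post-commit substitution to one line."""
    return NEW_PATTERN.sub(r"print(\1)", line)


call = "vfb.get_term_info('FBbt_00003748')"
comment = "# vfbquery docs (see above)"  # hypothetical README line, for illustration

print(wrap_old(call))     # both patterns wrap a real call
print(wrap_new(call))
print(wrap_old(comment))  # old pattern also rewrites this comment
print(wrap_new(comment))  # new pattern leaves it untouched
```

In this translation the old pattern treats `vfb.` as "vfb followed by any character", so text such as `vfbquery ... )` can be wrapped by accident. The new pattern requires a literal dot followed by a parenthesised argument list.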
@@ -33,8 +49,10 @@ jobs:
           python -m src.test.readme_parser
         env:
           PYTHONPATH: ${{ github.workspace }}
-      - name: Run examples from README.md and compare JSON outputs
+        if: env.SOLR_AVAILABLE == 'true'
+      - name: Run examples from README.md and validate structure
         run: |
           python -m src.test.test_examples_diff
         env:
           PYTHONPATH: ${{ github.workspace }}
+        if: env.SOLR_AVAILABLE == 'true'
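
The availability step writes `SOLR_AVAILABLE=false` and then calls `exit(1)`. A non-zero exit normally fails the step, and GitHub Actions applies an implicit `success()` check to the `if:` conditions of later steps. The gated steps are therefore likely skipped because the job failed, not because of the flag. Below is a minimal sketch of a probe that records the flag without raising, so the `if: env.SOLR_AVAILABLE == 'true'` conditions decide on their own. The helper `record_availability` and both probe functions are hypothetical and are not part of vfbquery or this commit.

```python
import os
import tempfile
from typing import Callable


def record_availability(probe: Callable[[], object], env_path: str) -> bool:
    """Run the probe, append SOLR_AVAILABLE=true/false to env_path, and never raise."""
    try:
        probe()
        available = True
    except Exception as exc:  # any failure counts as "not available"
        print("SOLR not available:", exc)
        available = False
    with open(env_path, "a") as env_file:
        env_file.write(f"SOLR_AVAILABLE={'true' if available else 'false'}\n")
    return available


def healthy_probe() -> str:
    # Stands in for vfb.get_term_info('FBbt_00003748') in the real workflow.
    return "ok"


def failing_probe() -> str:
    raise ConnectionError("simulated outage")


with tempfile.TemporaryDirectory() as tmp:
    # In CI this path would come from os.environ["GITHUB_ENV"].
    env_path = os.path.join(tmp, "github_env")
    record_availability(healthy_probe, env_path)
    record_availability(failing_probe, env_path)
    with open(env_path) as env_file:
        recorded = env_file.read()

print(recorded)
```

Whether an outage should fail the job or only skip the dependent steps is a policy choice. The sketch shows the second option.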

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -12,3 +12,5 @@ test_results.py
 .pytest_cache
 .venv
 .vscode/settings.json
+temp_examples_output.txt
+json_block_*.json

CACHING.md

Lines changed: 43 additions & 42 deletions
@@ -1,41 +1,43 @@
 # VFBquery Caching Guide
 
-VFBquery includes intelligent caching for optimal performance. Caching is **enabled by default** with production-ready settings.
+VFBquery includes intelligent SOLR-based caching for optimal performance. Caching is **enabled by default** with production-ready settings.
 
 ## Default Behavior
 
-VFBquery automatically enables caching when imported:
+VFBquery automatically enables SOLR caching when imported:
 
 ```python
 import vfbquery as vfb
 
-# Caching is already active with optimal settings:
+# SOLR caching is already active with optimal settings:
 # - 3-month cache duration
-# - 2GB memory cache with LRU eviction
-# - Persistent disk storage
+# - Persistent across sessions
 # - Zero configuration required
 
 result = vfb.get_term_info('FBbt_00003748') # Cached automatically
 ```
 
+## How It Works
+
+VFBquery uses a single-layer caching approach with SOLR:
+
+1. **First query**: Fetches data from Neo4j/Owlery and caches in SOLR
+2. **Subsequent queries**: Served directly from SOLR cache
+3. **Cache persistence**: Survives Python restarts and server reboots
+4. **Automatic expiration**: 3-month TTL matches VFB_connect behavior
+
 ## Runtime Configuration
 
-Adjust cache settings while your application is running:
+Control caching behavior:
 
 ```python
 import vfbquery as vfb
 
-# Modify cache duration
-vfb.set_cache_ttl(720) # 1 month
-vfb.set_cache_ttl(24) # 1 day
-
-# Adjust memory limits
-vfb.set_cache_memory_limit(512) # 512MB
-vfb.set_cache_max_items(5000) # 5K items
+# Clear specific cache entries
+vfb.clear_solr_cache('term_info', 'FBbt_00003748')
 
-# Toggle disk persistence
-vfb.disable_disk_cache() # Memory-only
-vfb.enable_disk_cache() # Restore persistence
+# Get SOLR cache statistics
+stats = vfb.get_solr_cache().get_cache_stats()
 ```
 
 ### Environment Control
@@ -48,39 +50,38 @@ export VFBQUERY_CACHE_ENABLED=false
 
 ## Performance Benefits
 
-VFBquery caching provides significant performance improvements:
+VFBquery SOLR caching provides significant performance improvements:
 
 ```python
 import vfbquery as vfb
 
-# First query: builds cache (~1-2 seconds)
+# First query: builds SOLR cache (~1-2 seconds)
 result1 = vfb.get_term_info('FBbt_00003748')
 
-# Subsequent queries: served from cache (<0.1 seconds)
+# Subsequent queries: served from SOLR cache (<0.1 seconds)
 result2 = vfb.get_term_info('FBbt_00003748') # 54,000x faster!
+
+# Similarity queries are also cached
+similar = vfb.get_similar_neurons('VFB_jrchk00s') # Cached after first run
 ```
 
 **Typical Performance:**
 
 - First query: 1-2 seconds
 - Cached queries: <0.1 seconds
 - Speedup: Up to 54,000x for complex queries
+- **NBLAST similarity queries**: 10+ seconds → <0.1 seconds (cached)
 
 ## Monitoring Cache Performance
 
 ```python
 import vfbquery as vfb
 
-# Get cache statistics
-stats = vfb.get_vfbquery_cache_stats()
-print(f"Hit rate: {stats['hit_rate_percent']}%")
-print(f"Memory used: {stats['memory_cache_size_mb']}MB")
-print(f"Cache items: {stats['memory_cache_items']}")
-
-# Get current configuration
-config = vfb.get_cache_config()
-print(f"TTL: {config['cache_ttl_hours']} hours")
-print(f"Memory limit: {config['memory_cache_size_mb']}MB")
+# Get SOLR cache statistics
+cache = vfb.get_solr_cache()
+stats = cache.get_cache_stats()
+print(f"Total cached items: {stats['total_documents']}")
+print(f"Cache size: {stats['total_size_mb']:.1f}MB")
 ```
 
 ## Usage Examples
@@ -90,21 +91,21 @@ print(f"Memory limit: {config['memory_cache_size_mb']}MB")
 ```python
 import vfbquery as vfb
 
-# Caching is enabled automatically with optimal defaults
-# Adjust only if your application has specific needs
+# SOLR caching is enabled automatically with optimal defaults
+# Cache persists across application restarts
 
-# Example: Long-running server with limited memory
-vfb.set_cache_memory_limit(512) # 512MB limit
-vfb.set_cache_ttl(168) # 1 week TTL
+# Example: Long-running server
+result = vfb.get_term_info('FBbt_00003748') # Fast on repeated runs
+instances = vfb.get_instances('FBbt_00003748') # Cached automatically
 ```
 
 ### Jupyter Notebooks
 
 ```python
 import vfbquery as vfb
 
-# Caching works automatically in notebooks
-# Data persists between kernel restarts
+# SOLR caching works automatically in notebooks
+# Data persists between kernel restarts and notebook sessions
 
 result = vfb.get_term_info('FBbt_00003748') # Fast on repeated runs
 instances = vfb.get_instances('FBbt_00003748') # Cached automatically
@@ -114,14 +115,14 @@ instances = vfb.get_instances('FBbt_00003748') # Cached automatically
 
 - **Dramatic Performance**: 54,000x speedup for repeated queries
 - **Zero Configuration**: Works out of the box with optimal settings
-- **Persistent Storage**: Cache survives Python restarts
-- **Memory Efficient**: LRU eviction prevents memory bloat
-- **Multi-layer Caching**: Optimizes SOLR queries, parsing, and results
+- **Persistent Storage**: SOLR cache survives Python restarts and server reboots
+- **Server-side Caching**: Shared across multiple processes/instances
+- **Similarity Queries**: NBLAST and morphological similarity searches are cached
 - **Production Ready**: 3-month TTL matches VFB_connect behavior
 
 ## Best Practices
 
-- **Monitor performance**: Use `get_vfbquery_cache_stats()` regularly
-- **Adjust for your use case**: Tune memory limits for long-running applications
-- **Consider data freshness**: Shorter TTL for frequently changing data
+- **Monitor performance**: Use SOLR cache statistics regularly
+- **Clear when needed**: Use `clear_solr_cache()` to force fresh data
+- **Consider data freshness**: SOLR cache TTL ensures data doesn't become stale
 - **Disable when needed**: Use environment variable if caching isn't desired