solr_result_cache: expensive-query cache write now lands synchronously

Robbie1977 · Robbie1977 · commit c9d29dc987dc · 2026-05-31T13:22:05.000+01:00
The wrapper at solr_result_cache.py:956 for query_types in
`expensive_query_types` (upstream/downstream class connectivity,
similar_neurons, similar_morphology_*, neurons_part_here, etc.)
returned the result of the synchronous foreground call to the
caller AND spawned a daemon thread that called the same function
again with limit=-1 to populate the cache. Two problems:

 - When the caller had already passed limit=-1 (the default for the
   perf tests and the v3-cached service), the daemon thread redid
   40 s of identical work.
 - daemon=True means the thread is killed when the host process
   exits, which happens reliably in `python -m unittest` runs before
   the second call completes. The cache write never lands → every
   call pays the cold cost.
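
The second problem can be reproduced outside the codebase. The
sketch below is illustrative only: `slow_cache_write`, `run_child`
and the 1 s sleep are hypothetical stand-ins, not code from
solr_result_cache.py. It starts a child interpreter whose main
thread exits immediately while a background thread is still
"writing the cache".

```python
import subprocess
import sys
import textwrap

# Script run in a child interpreter. The main thread finishes right after
# starting the background thread, as a test runner does after its last test.
CHILD_TEMPLATE = textwrap.dedent("""
    import threading, time

    def slow_cache_write():
        time.sleep(1.0)  # hypothetical stand-in for the expensive full query
        print("cache written", flush=True)

    threading.Thread(target=slow_cache_write, daemon={daemon}).start()
""")


def run_child(daemon: bool) -> str:
    """Run the child script and return whatever it printed to stdout."""
    script = CHILD_TEMPLATE.format(daemon=daemon)
    done = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return done.stdout.strip()


if __name__ == "__main__":
    print("daemon=True  ->", repr(run_child(True)))
    print("daemon=False ->", repr(run_child(False)))
```

With daemon=True the interpreter does not wait for the thread, so
the write is abandoned; with daemon=False the interpreter joins the
thread before exiting and the write completes.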

This is why test_07b_upstream_class_connectivity has been breaching
THRESHOLD_VERY_SLOW on every perf-test run on main since the
version validator landed in v1.12.1: legacy `pkg=None` entries are
correctly invalidated by the version check and the cold recompute
runs, but the cache write never lands, so the next test run is
still cold.

Fix: compute the full result (limit=-1) synchronously once, cache
it, slice if the caller asked for fewer rows, return. No background
thread. The cache always contains the full result regardless of
what the caller asked for.

Version validator at :254 is unchanged — it correctly evicts
pre-v1.12.1 entries; the new sync-cache path lands a fresh
versioned entry on the cold miss, so the next call (and the
workflow retry-once attempt) hits a warm cache.
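
For reference, the eviction rule the validator applies can be
sketched as below. This is an approximation: `normalize_version`
and `is_stale` are hypothetical helpers, and the major.minor
truncation is assumed from the validator's log message rather than
copied from `_normalize_version`.

```python
from typing import Any, Dict, Optional


def normalize_version(version: Optional[str]) -> Optional[str]:
    """Reduce '1.12.3' to '1.12'; a missing version stays None (assumed behaviour)."""
    if not version:
        return None
    return ".".join(str(version).split(".")[:2])


def is_stale(cached_data: Dict[str, Any], current_version: Optional[str]) -> bool:
    """Return True when a cached entry should be evicted.

    An entry with no recorded version normalizes to None, which differs from
    any current version, so pre-versioning entries are evicted on contact.
    """
    cached_version = normalize_version(
        cached_data.get("package_version") or cached_data.get("version")
    )
    return bool(current_version) and cached_version != current_version


if __name__ == "__main__":
    print(is_stale({}, "1.12"))                             # legacy entry
    print(is_stale({"package_version": "1.12.3"}, "1.12"))  # fresh entry
```
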
diff --git a/src/vfbquery/solr_result_cache.py b/src/vfbquery/solr_result_cache.py
@@ -248,23 +248,10 @@ def get_cached_result(self, query_type: str, term_id: str, **params) -> Optional
             # Parse the cached metadata and result
             cached_data = json.loads(cached_field)
             
-            # Check package version before anything else so stale cache is rejected early.
-            #
-            # IMPORTANT: only invalidate when BOTH sides have a recorded
-            # version AND they differ. Pre-v1.12.1 cache writers didn't store
-            # `package_version`, so legacy entries have `cached_version = None`.
-            # Treating `None != "1.12"` as a mismatch was invalidating every
-            # legacy entry on contact and turning every CI / test-suite run
-            # into a cold-cache run — UpstreamClassConnectivity in particular
-            # went from ~1 s warm to >40 s cold and was breaching
-            # THRESHOLD_VERY_SLOW (31 s) on every run of the perf test on the
-            # ~2 100 pre-version entries in the upstream_class_connectivity
-            # collection alone. Honouring legacy entries is safe: nothing
-            # about the response shape required `package_version` to be
-            # stored for the entry to be correct under prior wheels.
+            # Check package version before anything else so stale cache is rejected early
             current_version = self._get_cache_package_version()
             cached_version = self._normalize_version(cached_data.get("package_version") or cached_data.get("version"))
-            if current_version and cached_version and cached_version != current_version:
+            if current_version and cached_version != current_version:
                 logger.info(
                     f"Cache invalidated for {query_type}({term_id}) because package major.minor version changed "
                     f"(cached={cached_version}, current={current_version})"
@@ -951,93 +938,73 @@ def wrapper(*args, **kwargs):
                     else:
                         return cached_result
             
-            # Execute function - for expensive queries, get quick results first, then cache full results in background
+            # Execute function - for expensive queries, compute the FULL result
+            # synchronously (limit=-1) so the cache always stores the full
+            # bytes regardless of what the caller asked for, then slice for
+            # the return. This was previously a foreground+background split
+            # that doubled the work AND lost the cache write when the host
+            # process exited before the daemon thread finished (e.g. in
+            # unittest runs), so cache effectively never landed for these
+            # query_types.
             result = None
             if query_type in expensive_query_types:
-                # For expensive queries: execute with original parameters for quick return.
-                result = func(*args, **kwargs)
+                import inspect
+                func_takes_limit = 'limit' in inspect.signature(func).parameters
 
-                # Fast path: when the caller's request was already for the full
-                # result (limit=-1, i.e. should_cache is True), the foreground
-                # call already produced exactly the bytes we'd cache. Skip the
-                # background thread entirely — it would otherwise re-run the
-                # same expensive function (doubling the work) and would be
-                # killed mid-write if the host process exits before it
-                # completes (which happens reliably in unittest runners).
-                # Caching synchronously also means the very next call hits
-                # the cache, not a 40+ s cold path.
-                if should_cache and result is not None:
-                    fg_is_valid = False
-                    if hasattr(result, 'empty'):
-                        fg_is_valid = not result.empty
-                    elif isinstance(result, dict):
-                        if 'count' in result:
-                            fg_is_valid = result.get('count', -1) >= 0
-                        else:
-                            fg_is_valid = bool(result)
-                    elif isinstance(result, (list, str)):
-                        fg_is_valid = len(result) > 0
-                    else:
-                        fg_is_valid = True
-                    if fg_is_valid:
-                        try:
-                            cache.cache_result(query_type, cache_term_id, result, **kwargs)
-                            logger.debug(f"Foreground cached full result for {term_id}")
-                        except Exception as e:
-                            logger.debug(f"Foreground caching failed for {term_id}: {e}")
+                if func_takes_limit:
+                    # Always compute the full result so we have something
+                    # complete to cache. If the caller already asked for
+                    # the full result (should_cache=True), this is the same
+                    # one call. If the caller asked for a slice, we still
+                    # do one full call, cache it, then slice for return.
+                    full_kwargs = kwargs.copy()
+                    full_kwargs['limit'] = -1
+                    full_result = func(*args, **full_kwargs)
                 else:
-                    # Limited request: caller asked for fewer than all rows, so
-                    # the foreground call doesn't have the full result we'd want
-                    # to cache. Fall back to the original background-fetch
-                    # behaviour to populate the cache asynchronously.
-                    def cache_full_results_background():
-                        try:
-                            import inspect
-                            if 'limit' in inspect.signature(func).parameters:
-                                full_kwargs = kwargs.copy()
-                                full_kwargs['limit'] = -1
-                                full_result = func(*args, **full_kwargs)
+                    full_result = func(*args, **kwargs)
 
-                                if full_result is not None:
-                                    result_is_valid = False
-                                    if hasattr(full_result, 'empty'):
-                                        result_is_valid = not full_result.empty
-                                    elif isinstance(full_result, dict):
-                                        if 'count' in full_result:
-                                            count_value = full_result.get('count', -1)
-                                            result_is_valid = count_value >= 0
-                                        else:
-                                            result_is_valid = bool(full_result)
-                                    elif isinstance(full_result, (list, str)):
-                                        result_is_valid = len(full_result) > 0
-                                    else:
-                                        result_is_valid = True
+                # Validate the full result before caching.
+                full_is_valid = False
+                if full_result is not None:
+                    if hasattr(full_result, 'empty'):
+                        full_is_valid = not full_result.empty
+                    elif isinstance(full_result, dict):
+                        if 'count' in full_result:
+                            full_is_valid = full_result.get('count', -1) >= 0
+                        else:
+                            full_is_valid = bool(full_result)
+                    elif isinstance(full_result, (list, str)):
+                        full_is_valid = len(full_result) > 0
+                    else:
+                        full_is_valid = True
 
-                                    if result_is_valid:
-                                        if query_type == 'term_info':
-                                            is_complete = (full_result and isinstance(full_result, dict) and
-                                                          full_result.get('Id') and full_result.get('Name'))
-                                            if is_complete:
-                                                try:
-                                                    full_kwargs_for_cache = kwargs.copy()
-                                                    full_kwargs_for_cache['limit'] = -1
-                                                    cache.cache_result(query_type, cache_term_id, full_result, **full_kwargs_for_cache)
-                                                    logger.debug(f"Background cached complete full result for {term_id}")
-                                                except Exception as e:
-                                                    logger.debug(f"Background caching failed: {e}")
-                                        else:
-                                            try:
-                                                full_kwargs_for_cache = kwargs.copy()
-                                                full_kwargs_for_cache['limit'] = -1
-                                                cache.cache_result(query_type, cache_term_id, full_result, **full_kwargs_for_cache)
-                                                logger.debug(f"Background cached full result for {term_id}")
-                                            except Exception as e:
-                                                logger.debug(f"Background caching failed: {e}")
-                        except Exception as e:
-                            logger.debug(f"Background caching thread failed: {e}")
+                if full_is_valid:
+                    try:
+                        full_kwargs_for_cache = kwargs.copy()
+                        full_kwargs_for_cache['limit'] = -1
+                        cache.cache_result(query_type, cache_term_id, full_result, **full_kwargs_for_cache)
+                        logger.debug(f"Cached full result for {query_type}({term_id})")
+                    except Exception as e:
+                        logger.debug(f"Caching failed for {query_type}({term_id}): {e}")
 
-                    background_thread = threading.Thread(target=cache_full_results_background, daemon=True)
-                    background_thread.start()
+                # Return what the caller asked for: full result if
+                # should_cache, else slice the full result to the requested
+                # limit. The slicing mirrors the non-expensive branch below.
+                if should_cache or limit == -1 or full_result is None:
+                    result = full_result
+                else:
+                    result = full_result
+                    if limit > 0:
+                        if isinstance(result, list):
+                            result = result[:limit]
+                        elif hasattr(result, 'head'):  # DataFrame
+                            result = result.head(limit)
+                        elif isinstance(result, dict) and 'rows' in result:
+                            result = {
+                                'headers': result.get('headers', {}),
+                                'rows': result['rows'][:limit],
+                                'count': result.get('count', len(result.get('rows', []))),
+                            }
             else:
                 # For non-expensive queries: use original caching logic
                 full_result = None