Skip to content

Fix Discover query vector cache bounds#2091

Merged
JSv4 merged 2 commits into
mainfrom
codex/propose-fix-for-discover-query-cache-vulnerability
Jun 29, 2026
Merged

Fix Discover query vector cache bounds#2091
JSv4 merged 2 commits into
mainfrom
codex/propose-fix-for-discover-query-cache-vulnerability

Conversation

@JSv4

@JSv4 JSv4 commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator

Motivation

  • The process-global LRU used the raw textSearch string as a cache key without any length bound, allowing attacker-controlled large strings to be retained in worker memory and enabling a remote memory-DoS when a default embedder is configured.
  • The cache also stored failed (None) embeddings, which could pin attacker inputs in the LRU until eviction.

Description

  • Add a new constant DISCOVER_TEXT_SEARCH_MAX_LENGTH = 512 to bound accepted Discover textSearch input. (opencontractserver/constants/search.py)
  • Introduce _normalise_text_search() to strip and reject empty or overlong searches, and route all five Discover resolvers to call it before performing text or semantic work. (config/graphql/discover_queries.py)
  • Change the query-vector cache wrapper to raise _UncacheableQueryVector when embedding fails instead of returning/caching None, and update _semantic_ids() to catch that exception and fall back to text-only results. (config/graphql/discover_queries.py)
  • Add regression tests asserting that an overlong query returns an empty result before any embedding call and that failed embeddings do not populate the LRU cache. (opencontractserver/tests/test_discover_search_graphql.py)

Testing

  • python -m py_compile config/graphql/discover_queries.py opencontractserver/constants/search.py opencontractserver/tests/test_discover_search_graphql.py succeeded (syntax check passed).
  • git diff --check produced no whitespace/patch errors.
  • pytest opencontractserver/tests/test_discover_search_graphql.py could not be executed in this environment due to a missing django dependency (ModuleNotFoundError: No module named 'django'), so full test-suite execution was not performed here.

Codex Task

@claude

claude Bot commented Jun 29, 2026

Copy link
Copy Markdown

Review: Fix Discover query vector cache bounds

This PR correctly addresses two real security issues: unbounded user-controlled strings being retained as LRU cache keys, and failed embedding calls being cached as None (pinning the query string in worker memory until LRU eviction). The approach — input normalisation at the resolver layer plus converting the cache wrapper to raise-on-failure — is sound. A few issues worth addressing before merge:


1. Stale return-type annotation on _cached_query_vector — maintenance hazard

config/graphql/discover_queries.py, line 149

The function signature still reads -> Optional[list], but after this PR the function never returns None — it either returns a list or raises _UncacheableQueryVector. Any future caller that copies the old call pattern:

vector = _cached_query_vector(query_text, embedder_path)
if not vector:   # dead branch — None can never reach here
    return []

would silently swallow the uncaught exception as a 500 error to GraphQL instead of the intended text-only graceful fallback. Update the annotation to -> list so mypy and future readers see the real contract.


2. Missing changelog fragment — CLAUDE.md requirement

changelog.d/

CLAUDE.md mandates a changelog.d/<slug>.security.md (or .fixed.md) fragment for any security fix. No 2091* file appears in changelog.d/. The closest existing fragments (1919-discover-followup.changed.md, 1919-discover-followup.fixed.md) cover the prior caching refactor, not this DoS guard.

Add something like changelog.d/2091-discover-cache-dos.security.md with a bullet describing the vulnerability and fix.


3. Protection lives only at call sites, not at the resource boundary — altitude concern

config/graphql/discover_queries.py, line 119 (_normalise_text_search) and line 149 (_cached_query_vector)

The 512-char cap is enforced inside each of the five resolver methods. _cached_query_vector itself accepts an unbounded query_text. If a sixth resolver is added later (or any internal caller reuses _semantic_ids directly), the protection is silently absent — the long string becomes an LRU cache key with no warning.

The minimum fix is a # PRECONDITION: query_text must have passed _normalise_text_search docstring on both _semantic_ids and _cached_query_vector. A stronger fix is a defensive assert len(query_text) <= DISCOVER_TEXT_SEARCH_MAX_LENGTH inside _cached_query_vector that makes a violation loud rather than silent.


4. test_overlong_query_returns_empty_before_searching verifies the semantic arm only, not the text arm

opencontractserver/tests/test_discover_search_graphql.py, line 307

The test patches _query_vector and asserts it was never called, which proves the semantic arm is bypassed. It does not assert that the text arm (_text_ids) was also skipped. If a future refactor moves the length check to after the _text_ids call (guarding only the LRU), the test would still pass while the unbounded DB text-predicate DoS vector is silently reintroduced. Adding patch("config.graphql.discover_queries._text_ids") alongside the existing mock and asserting that too was not called would close this gap.


Minor: test_overlong_query_returns_empty_before_searching assertion is outside the with block

opencontractserver/tests/test_discover_search_graphql.py, line 310

        with patch("config.graphql.discover_queries._query_vector") as query_vector:
            result = self.graphene_client.execute(...)
        self.assertIsNone(result.get("errors"), ...)
        self.assertEqual(...)
        query_vector.assert_not_called()   # <-- outside the with block

query_vector.assert_not_called() runs after the patch has been deactivated. This works correctly in practice (mock objects retain their call history), but if the resolver ever spawns an async or deferred post-execute callback that calls _query_vector, those calls would not be captured by the inner mock after the with exits. Move lines 308–310 inside the with block to make the assertion scope explicit.


Overall the security intent is correct and the test coverage for the new behaviour is a good start. Addressing the stale type annotation and the missing changelog fragment are the two must-haves before merge; the other points are improvements worth discussing.

@codecov

codecov Bot commented Jun 29, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@JSv4 JSv4 merged commit f025fd6 into main Jun 29, 2026
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant