Skip to content

LCORE-1446: Add support for metadata filters in Solr vector search#1581

Merged
tisnik merged 1 commit into
lightspeed-core:mainfrom
anik120:vector-io-filters
May 18, 2026
Merged

LCORE-1446: Add support for metadata filters in Solr vector search#1581
tisnik merged 1 commit into
lightspeed-core:mainfrom
anik120:vector-io-filters

Conversation

@anik120
Copy link
Copy Markdown
Contributor

@anik120 anik120 commented Apr 23, 2026

Description

Updates the Solr vector search integration to support the new Filter API introduced in llama-stack 0.6.0, enabling metadata-based filtering of RAG results using comparison and compound filters.

Filter format examples:
Simple: {"filters": {"type": "eq", "key": "platform", "value": "openshift"}}
Compound: {"filters": {"type": "and", "filters": [...]}}

Note: This change requires lightspeed-providers with solr_vector_io filter support, introduced in lightspeed-core/lightspeed-providers#119

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added structured metadata filtering with comparison operators (eq, ne, in, nin) and compound filters (and, or) for vector search queries.
    • Maintained backward compatibility with legacy filter formats.
  • Documentation

    • Updated guides and API documentation with metadata filtering examples and per-request filter syntax.
    • Documented how static and dynamic filters combine using AND logic across all search modes.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 23, 2026

Walkthrough

This PR adds support for structured metadata filtering in Solr queries. The implementation extracts nested filter objects from request payloads, routes them to a separate parameters key, documents the feature in request schemas and API specs, provides user guidance with examples, and validates behavior with new unit tests.

Changes

Structured Solr filter support

Layer / File(s) Summary
Request model schema updates
src/models/common/query.py
SolrVectorSearchRequest.filters field metadata expanded with descriptions of structured comparison operators (eq, ne, in, nin) and examples showing both structured and legacy filter formats.
Query parameter extraction logic
src/utils/vector_search.py
Core _build_query_params function detects nested "filters" key in solr.filters dict, maps it to top-level params["filters"], routes remaining keys to params["solr"], and preserves legacy behavior when the nested key is absent; docstring updated to document the structured format.
API documentation and examples
docs/openapi.json
OpenAPI schema "Filters" object description updated to document structured metadata operators and accept legacy fq payloads; examples expanded to show structured filter objects (with eq, in, and operators) and legacy format.
User guides and examples
docs/okp_guide.md
OKP guide Step 4 adds static product filtering example via okp.chunk_filter_query and new "Dynamic Metadata Filtering" section describing per-request structured filter syntax, supported operators, behavioral rules (AND combination with static filters, string escaping), and multiple curl examples for simple equality, in lists, and nested and/or compounds.
Unit tests for filter extraction
tests/unit/utils/test_vector_search.py
New _build_query_params test cases validate legacy filters.fq passthrough to params["solr"], structured filters.filters extraction to top-level params["filters"], mixed structured+other Solr parameters, and compound type: "and" preservation; includes minor formatting adjustments to existing mode-related tests.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically identifies the main change: adding metadata filter support to Solr vector search, matching the primary modifications across implementation, tests, and documentation.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/models/requests.py (1)

127-150: ⚠️ Potential issue | 🟡 Minor

Update the canonical filters description to match the new routing.

filters is no longer always passed through as params["solr"]; structured metadata filters are extracted to top-level params["filters"] in src/utils/vector_search.py. Keeping this doc/field description stale conflicts with the new examples.

Proposed doc update
 class SolrVectorSearchRequest(BaseModel):
     """LCORE Solr inline RAG options for ``vector_io.query`` (mode and provider filters).
 
     Attributes:
         mode: Solr vector_io search mode. When omitted, the server default (hybrid) is used.
-        filters: Solr provider filter payload passed through as params['solr'].
+        filters: Solr provider filter payload. Structured metadata filters under
+            a nested ``filters`` key are sent to top-level ``params["filters"]``;
+            remaining legacy Solr options are passed under ``params["solr"]``.
@@
     filters: Optional[dict[str, Any]] = Field(
         None,
-        description="Solr provider filter payload passed through as params['solr'].",
+        description=(
+            "Solr provider filter payload. Structured metadata filters under "
+            "a nested 'filters' key are sent as top-level vector_io filters; "
+            "remaining legacy Solr options are passed under params['solr']."
+        ),
         examples=[{"fq": ["product:*openshift*", "product_version:*4.16*"]}],
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/models/requests.py` around lines 127 - 150, Update the Field description
for the filters attribute in the LCORE Solr inline RAG options (the filters
field defined in this model used by vector_io.query) to reflect the new routing:
explain that structured metadata filters are extracted to top-level
params["filters"] while remaining Solr-specific payloads are kept under
params["solr"], and adjust the example text accordingly so it no longer states
that filters are always passed through as params["solr"]; reference the
extraction logic in src/utils/vector_search.py when rewording to make the
distinction clear.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/models/requests.py`:
- Around line 290-312: Reformat the unformatted nested dict/list literals in the
Solr examples block (the entries containing "mode": "hybrid", "mode":
"semantic", and the {"filters": {"fq": ["product:*openshift*"]}} entry) to
comply with Black style; run Black (or your project's pre-commit formatting) on
src/models/requests.py so the nested dictionaries and lists are reformatted
consistently (wrap long lines, align brackets and commas) and commit the
reflowed code.

In `@src/utils/vector_search.py`:
- Line 90: Reformat the long dict comprehension into Black-friendly multi-line
form: break the expression that assigns remaining_filters from
solr.filters.items() across lines so Black won't reflow it; locate the
assignment to remaining_filters (the comprehension over solr.filters.items())
and split the dict literal, the for clause, and the if clause onto separate
lines consistent with Black's preferred formatting.
- Around line 61-68: Update the docstring for the function that builds the
parameter dictionary for vector_io.query (the code handling the solr argument)
to clarify that only structured filters (the structured Solr request: mode and
structured filters) are lifted to the top level, while legacy/unnested filter
payloads remain nested under params["solr"]; explicitly mention the solr
argument and that legacy filters are preserved under params["solr"] so callers
know which filters are moved vs retained.

In `@tests/unit/utils/test_vector_search.py`:
- Around line 83-136: Reformat the inline
SolrVectorSearchRequest.model_validate(...) test fixtures in
tests/unit/utils/test_vector_search.py to Black-compliant multi-line style with
trailing commas (for each dict passed to
SolrVectorSearchRequest.model_validate), then re-run Black/flake formatting;
focus on the fixtures in test_with_filters_and_other_solr_params,
test_with_compound_filter, and the earlier test block that calls
_build_query_params(solr=solr) so the dict literals are split across lines,
keys/values have trailing commas, and the file passes Black. Use the same symbol
references (SolrVectorSearchRequest.model_validate and _build_query_params) to
locate and update the offending calls.

---

Outside diff comments:
In `@src/models/requests.py`:
- Around line 127-150: Update the Field description for the filters attribute in
the LCORE Solr inline RAG options (the filters field defined in this model used
by vector_io.query) to reflect the new routing: explain that structured metadata
filters are extracted to top-level params["filters"] while remaining
Solr-specific payloads are kept under params["solr"], and adjust the example
text accordingly so it no longer states that filters are always passed through
as params["solr"]; reference the extraction logic in src/utils/vector_search.py
when rewording to make the distinction clear.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7227812e-62f3-4284-997b-ba1189f40da7

📥 Commits

Reviewing files that changed from the base of the PR and between e59b4b9 and b6f480c.

📒 Files selected for processing (3)
  • src/models/requests.py
  • src/utils/vector_search.py
  • tests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: radon
  • GitHub Check: Pylinter
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: integration_tests (3.13)
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: build-pr
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Import FastAPI dependencies with: from fastapi import APIRouter, HTTPException, Request, status, Depends
Import Llama Stack client with: from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for parameters and return types in functions
Use typing_extensions.Self for model validators in Pydantic models
Use modern union type syntax str | int instead of Union[str, int]
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use standard log levels with clear purposes: debug, info, warning, error
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with @abstractmethod decorators
Use @model_validator and @field_validator for Pydantic model validation
Complete type annotations for all class attributes; use specific types, not Any
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections

Files:

  • src/utils/vector_search.py
  • src/models/requests.py
  • tests/unit/utils/test_vector_search.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models extend ConfigurationBase for config, BaseModel for data models

Files:

  • src/utils/vector_search.py
  • src/models/requests.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest; pytest is the standard for this project
Use pytest-mock for AsyncMock objects in tests
Use marker pytest.mark.asyncio for async tests
Unit tests require 60% coverage, integration tests 10%

Files:

  • tests/unit/utils/test_vector_search.py
🧠 Learnings (4)
📚 Learning: 2026-03-17T11:34:53.242Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1335
File: docs/config.md:411-411
Timestamp: 2026-03-17T11:34:53.242Z
Learning: In the lightspeed-stack project (`src/models/config.py`, `docs/config.md`), the internal Solr filter `is_chunk:true` (defined as `SOLR_CHUNK_FILTER_QUERY` in `src/constants.py`) is always injected by the system for OKP searches and is intentionally hidden from users. The `chunk_filter_query` field in `OkpConfiguration` is user-facing and additive-only, but the documentation must NOT mention the internal `is_chunk:true` behavior — this is a deliberate design decision by the maintainers.

Applied to files:

  • src/utils/vector_search.py
  • src/models/requests.py
  • tests/unit/utils/test_vector_search.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/requests.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/requests.py
📚 Learning: 2026-04-15T18:54:07.540Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:769-773
Timestamp: 2026-04-15T18:54:07.540Z
Learning: In lightspeed-core/lightspeed-stack request models, keep schema-level size limits defined as inline numeric literals (e.g., Pydantic Field max_length values used for validation) rather than extracting them into constants.py. Only extract values intended as configurable runtime defaults (e.g., DEFAULT_* settings like header/file upload sizes). Do not flag inline numeric literals used directly in Pydantic Field constraints or field validators as an extraction-to-constants issue.

Applied to files:

  • src/models/requests.py
🪛 GitHub Actions: Black
src/utils/vector_search.py

[error] 1-1: Black --check would reformat this file.

src/models/requests.py

[error] 1-1: Black --check would reformat this file.

tests/unit/utils/test_vector_search.py

[error] 1-1: Black --check would reformat this file.

🔇 Additional comments (1)
src/models/requests.py (1)

777-779: ResponsesRequest doc update looks aligned.

The docstring now advertises structured and legacy Solr filter support for the Responses API request model.

Comment thread src/models/requests.py Outdated
Comment thread src/utils/vector_search.py
Comment thread src/utils/vector_search.py Outdated
Comment thread tests/unit/utils/test_vector_search.py Outdated
@anik120 anik120 force-pushed the vector-io-filters branch from b6f480c to 3b5b567 Compare April 23, 2026 16:36
@anik120 anik120 changed the title Add support for metadata filters in Solr vector search LCORE-1446: Add support for metadata filters in Solr vector search Apr 23, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
src/models/requests.py (1)

290-312: ⚠️ Potential issue | 🟡 Minor

Run Black on the Solr examples.

CI still reports src/models/requests.py would be reformatted; this example block has missing trailing commas and a long inline dict/list literal.

Black-style formatting
             {
                 "mode": "hybrid",
                 "filters": {
                     "filters": {
                         "type": "eq",
                         "key": "platform",
-                        "value": "openshift"
+                        "value": "openshift",
                     }
-                }
+                },
             },
             {
                 "mode": "semantic",
                 "filters": {
                     "filters": {
                         "type": "and",
                         "filters": [
                             {"type": "eq", "key": "platform", "value": "openshift"},
-                            {"type": "in", "key": "version", "value": ["4.14", "4.15", "4.16"]}
-                        ]
+                            {
+                                "type": "in",
+                                "key": "version",
+                                "value": ["4.14", "4.15", "4.16"],
+                            },
+                        ],
                     }
-                }
+                },
             },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/models/requests.py` around lines 290 - 312, The example Solr query
literals in src/models/requests.py (the inline dict/list block containing the
hybrid/semantic/filter entries) are not Black-formatted — add the missing
trailing commas and reformat the long inline dict/list across multiple lines to
satisfy Black; specifically update the three example entries (the dict with
"mode": "hybrid", the dict with "mode": "semantic" and nested "filters" list,
and the final {"filters": {"fq": ["product:*openshift*"]}} entry) so each nested
dict/list element has proper trailing commas and line breaks, then run Black (or
apply Black formatting) to the file.
src/utils/vector_search.py (2)

61-68: ⚠️ Potential issue | 🟡 Minor

Clarify that only structured filters are extracted.

Legacy Solr filters remain under params["solr"], so the return docs should not imply every filter moves to top-level params["filters"].

Docstring update
     Args:
         solr: Optional structured Solr request (mode and filters from the API).
             - mode: Solr search mode (semantic, hybrid, lexical)
-            - filters: Solr filter payload, may contain structured metadata filters
+            - filters: Solr filter payload; nested structured metadata filters are
+              extracted, while legacy Solr filters remain under ``params["solr"]``
 
     Returns:
-        Parameter dictionary for ``vector_io.query`` with extracted filters at top level.
+        Parameter dictionary for ``vector_io.query``. Structured metadata filters are
+        placed at top level; legacy Solr filters remain under ``params["solr"]``.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/vector_search.py` around lines 61 - 68, Update the docstring for
the function that builds parameters for vector_io.query to make clear that only
structured Solr filters (the `solr["filters"]` structured payload) are extracted
into top-level `params["filters"]`, while legacy/other Solr filters remain
nested under `params["solr"]`; explicitly mention the `solr` argument,
`params["filters"]`, and `params["solr"]` to avoid implying all filters are
lifted to top level.

90-90: ⚠️ Potential issue | 🟡 Minor

Format the dict comprehension with Black.

CI reports this file would be reformatted; split the long comprehension before merging.

Black-style formatting
-            remaining_filters = {k: v for k, v in solr.filters.items() if k != "filters"}
+            remaining_filters = {
+                k: v for k, v in solr.filters.items() if k != "filters"
+            }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/vector_search.py` at line 90, The dict comprehension assigning
remaining_filters is too long for Black; reformat it into a multi-line
comprehension (wrap the comprehension in parentheses and place the for/if parts
on separate indented lines) so Black won't reformat it in CI—e.g., break the
expression that builds remaining_filters from solr.filters across multiple lines
and align the key/value pair, for-loop, and conditional onto their own lines
while preserving the same logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/utils/test_vector_search.py`:
- Around line 109-151: Update the two tests
test_with_filters_and_other_solr_params and test_with_compound_filter to assert
the full structured filter payload returned by _build_query_params (not just
type/key/length); for the first test, verify params["filters"] equals the full
dict {"type":"in","key":"version","value":["4.14","4.15"]} and that
params["solr"] equals {"custom_param":"value"} and k is default, and for the
second test assert params["filters"] equals the full nested compound dict (type
"and" with its "filters" list containing the exact eq and ne dicts) and that
"solr" is absent; use the SolrVectorSearchRequest-created expected structures to
compare deep equality.

---

Duplicate comments:
In `@src/models/requests.py`:
- Around line 290-312: The example Solr query literals in src/models/requests.py
(the inline dict/list block containing the hybrid/semantic/filter entries) are
not Black-formatted — add the missing trailing commas and reformat the long
inline dict/list across multiple lines to satisfy Black; specifically update the
three example entries (the dict with "mode": "hybrid", the dict with "mode":
"semantic" and nested "filters" list, and the final {"filters": {"fq":
["product:*openshift*"]}} entry) so each nested dict/list element has proper
trailing commas and line breaks, then run Black (or apply Black formatting) to
the file.

In `@src/utils/vector_search.py`:
- Around line 61-68: Update the docstring for the function that builds
parameters for vector_io.query to make clear that only structured Solr filters
(the `solr["filters"]` structured payload) are extracted into top-level
`params["filters"]`, while legacy/other Solr filters remain nested under
`params["solr"]`; explicitly mention the `solr` argument, `params["filters"]`,
and `params["solr"]` to avoid implying all filters are lifted to top level.
- Line 90: The dict comprehension assigning remaining_filters is too long for
Black; reformat it into a multi-line comprehension (wrap the comprehension in
parentheses and place the for/if parts on separate indented lines) so Black
won't reformat it in CI—e.g., break the expression that builds remaining_filters
from solr.filters across multiple lines and align the key/value pair, for-loop,
and conditional onto their own lines while preserving the same logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f394fddb-4d56-45f0-a386-7dc57aa59c34

📥 Commits

Reviewing files that changed from the base of the PR and between b6f480c and 3b5b567.

📒 Files selected for processing (3)
  • src/models/requests.py
  • src/utils/vector_search.py
  • tests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: radon
  • GitHub Check: mypy
  • GitHub Check: check_dependencies
  • GitHub Check: build-pr
  • GitHub Check: Pylinter
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Import FastAPI dependencies with: from fastapi import APIRouter, HTTPException, Request, status, Depends
Import Llama Stack client with: from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for parameters and return types in functions
Use typing_extensions.Self for model validators in Pydantic models
Use modern union type syntax str | int instead of Union[str, int]
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use standard log levels with clear purposes: debug, info, warning, error
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with @abstractmethod decorators
Use @model_validator and @field_validator for Pydantic model validation
Complete type annotations for all class attributes; use specific types, not Any
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections

Files:

  • src/utils/vector_search.py
  • tests/unit/utils/test_vector_search.py
  • src/models/requests.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models extend ConfigurationBase for config, BaseModel for data models

Files:

  • src/utils/vector_search.py
  • src/models/requests.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest; pytest is the standard for this project
Use pytest-mock for AsyncMock objects in tests
Use marker pytest.mark.asyncio for async tests
Unit tests require 60% coverage, integration tests 10%

Files:

  • tests/unit/utils/test_vector_search.py
🧠 Learnings (5)
📚 Learning: 2026-03-17T11:34:53.242Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1335
File: docs/config.md:411-411
Timestamp: 2026-03-17T11:34:53.242Z
Learning: In the lightspeed-stack project (`src/models/config.py`, `docs/config.md`), the internal Solr filter `is_chunk:true` (defined as `SOLR_CHUNK_FILTER_QUERY` in `src/constants.py`) is always injected by the system for OKP searches and is intentionally hidden from users. The `chunk_filter_query` field in `OkpConfiguration` is user-facing and additive-only, but the documentation must NOT mention the internal `is_chunk:true` behavior — this is a deliberate design decision by the maintainers.

Applied to files:

  • src/utils/vector_search.py
  • tests/unit/utils/test_vector_search.py
  • src/models/requests.py
📚 Learning: 2026-02-23T14:11:46.950Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-188
Timestamp: 2026-02-23T14:11:46.950Z
Learning: The query request validator in the Responses API flow requires that `query_request.model` and `query_request.provider` must either both be specified or both be absent. The concatenated format (e.g., `model="provider/model"` with `provider=None`) is not permitted by the validator.

Applied to files:

  • src/models/requests.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/requests.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/requests.py
📚 Learning: 2026-04-15T18:54:07.540Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:769-773
Timestamp: 2026-04-15T18:54:07.540Z
Learning: In lightspeed-core/lightspeed-stack request models, keep schema-level size limits defined as inline numeric literals (e.g., Pydantic Field max_length values used for validation) rather than extracting them into constants.py. Only extract values intended as configurable runtime defaults (e.g., DEFAULT_* settings like header/file upload sizes). Do not flag inline numeric literals used directly in Pydantic Field constraints or field validators as an extraction-to-constants issue.

Applied to files:

  • src/models/requests.py
🪛 GitHub Actions: Black
src/utils/vector_search.py

[error] 1-1: Black --check failed: file would be reformatted by Black. Run 'uv tool run black --write src/utils/vector_search.py' (or 'uv tool run black --write .').

src/models/requests.py

[error] 1-1: Black --check failed: file would be reformatted by Black. Run 'uv tool run black --write src/models/requests.py' (or 'uv tool run black --write .').

🔇 Additional comments (4)
src/models/requests.py (2)

285-287: Solr filter description looks aligned.

This now documents structured comparison filters while preserving legacy filter-only compatibility.


777-779: Responses API Solr docs look good.

The request docstring now reflects structured metadata filters and legacy compatibility.

tests/unit/utils/test_vector_search.py (2)

70-107: Good coverage for legacy and simple structured filters.

These cases verify the key behavioral split: legacy fq remains under params["solr"], while structured metadata filters move to top-level params["filters"].


163-180: Mode plus legacy filter tests look good.

These keep coverage for request mode overrides and default mode behavior while preserving legacy Solr filter passthrough.

Comment on lines +109 to +151
def test_with_filters_and_other_solr_params(self) -> None:
"""Test parameters with both filters and other solr-specific params."""
solr = SolrVectorSearchRequest.model_validate(
{
"filters": {
"filters": {
"type": "in",
"key": "version",
"value": ["4.14", "4.15"],
},
"custom_param": "value",
},
},
)
params = _build_query_params(solr=solr)

assert params["solr"] == {"filter": "value"}
# Filters extracted to top-level
assert params["filters"]["type"] == "in"
assert params["filters"]["key"] == "version"
# Other params remain under solr key
assert params["solr"] == {"custom_param": "value"}
assert params["k"] == constants.SOLR_VECTOR_SEARCH_DEFAULT_K

def test_with_compound_filter(self) -> None:
"""Test parameters with compound AND filter."""
solr = SolrVectorSearchRequest.model_validate(
{
"filters": {
"filters": {
"type": "and",
"filters": [
{"type": "eq", "key": "platform", "value": "openshift"},
{"type": "ne", "key": "status", "value": "archived"},
],
},
},
},
)
params = _build_query_params(solr=solr)

assert params["filters"]["type"] == "and"
assert len(params["filters"]["filters"]) == 2
assert "solr" not in params
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Assert the full structured filter payload.

The mixed and compound tests should verify values/nested filters are preserved, not only type, key, and list length.

Strengthen assertions
         # Filters extracted to top-level
         assert params["filters"]["type"] == "in"
         assert params["filters"]["key"] == "version"
+        assert params["filters"]["value"] == ["4.14", "4.15"]
         # Other params remain under solr key
         assert params["solr"] == {"custom_param": "value"}
@@
 
         assert params["filters"]["type"] == "and"
-        assert len(params["filters"]["filters"]) == 2
+        assert params["filters"]["filters"] == [
+            {"type": "eq", "key": "platform", "value": "openshift"},
+            {"type": "ne", "key": "status", "value": "archived"},
+        ]
         assert "solr" not in params
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def test_with_filters_and_other_solr_params(self) -> None:
"""Test parameters with both filters and other solr-specific params."""
solr = SolrVectorSearchRequest.model_validate(
{
"filters": {
"filters": {
"type": "in",
"key": "version",
"value": ["4.14", "4.15"],
},
"custom_param": "value",
},
},
)
params = _build_query_params(solr=solr)
assert params["solr"] == {"filter": "value"}
# Filters extracted to top-level
assert params["filters"]["type"] == "in"
assert params["filters"]["key"] == "version"
# Other params remain under solr key
assert params["solr"] == {"custom_param": "value"}
assert params["k"] == constants.SOLR_VECTOR_SEARCH_DEFAULT_K
def test_with_compound_filter(self) -> None:
"""Test parameters with compound AND filter."""
solr = SolrVectorSearchRequest.model_validate(
{
"filters": {
"filters": {
"type": "and",
"filters": [
{"type": "eq", "key": "platform", "value": "openshift"},
{"type": "ne", "key": "status", "value": "archived"},
],
},
},
},
)
params = _build_query_params(solr=solr)
assert params["filters"]["type"] == "and"
assert len(params["filters"]["filters"]) == 2
assert "solr" not in params
def test_with_filters_and_other_solr_params(self) -> None:
"""Test parameters with both filters and other solr-specific params."""
solr = SolrVectorSearchRequest.model_validate(
{
"filters": {
"filters": {
"type": "in",
"key": "version",
"value": ["4.14", "4.15"],
},
"custom_param": "value",
},
},
)
params = _build_query_params(solr=solr)
# Filters extracted to top-level
assert params["filters"]["type"] == "in"
assert params["filters"]["key"] == "version"
assert params["filters"]["value"] == ["4.14", "4.15"]
# Other params remain under solr key
assert params["solr"] == {"custom_param": "value"}
assert params["k"] == constants.SOLR_VECTOR_SEARCH_DEFAULT_K
def test_with_compound_filter(self) -> None:
"""Test parameters with compound AND filter."""
solr = SolrVectorSearchRequest.model_validate(
{
"filters": {
"filters": {
"type": "and",
"filters": [
{"type": "eq", "key": "platform", "value": "openshift"},
{"type": "ne", "key": "status", "value": "archived"},
],
},
},
},
)
params = _build_query_params(solr=solr)
assert params["filters"]["type"] == "and"
assert params["filters"]["filters"] == [
{"type": "eq", "key": "platform", "value": "openshift"},
{"type": "ne", "key": "status", "value": "archived"},
]
assert "solr" not in params
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/utils/test_vector_search.py` around lines 109 - 151, Update the
two tests test_with_filters_and_other_solr_params and test_with_compound_filter
to assert the full structured filter payload returned by _build_query_params
(not just type/key/length); for the first test, verify params["filters"] equals
the full dict {"type":"in","key":"version","value":["4.14","4.15"]} and that
params["solr"] equals {"custom_param":"value"} and k is default, and for the
second test assert params["filters"] equals the full nested compound dict (type
"and" with its "filters" list containing the exact eq and ne dicts) and that
"solr" is absent; use the SolrVectorSearchRequest-created expected structures to
compare deep equality.

Comment thread src/models/requests.py Outdated
"filters": {
"type": "eq",
"key": "platform",
"value": "openshift"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will fail. the correct value should be the full product name:

"openshift container platform"

the filter field values are a bit aggressive.

Currently, "openshift" also does not work with the dynamic filters.

Comment thread src/models/requests.py Outdated
"filters": {
"type": "eq",
"key": "platform",
"value": "openshift"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will fail. the correct value should be the full product name:

"openshift container platform"

the filter field values are a bit aggressive.

Currently, "openshift" also does not work with the dynamic filters.

@Anxhela21
Copy link
Copy Markdown
Contributor

@anik120 PTAL left a comment and please also add documentation - we also need to update the okp_guide.md to include examples for how to use dynamic filters. And please fix failing black test.

@anik120 anik120 force-pushed the vector-io-filters branch from 3b5b567 to 4ef8fef Compare May 12, 2026 20:16
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
tests/unit/utils/test_vector_search.py (2)

132-151: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Verify the full nested filter structure.

The test checks type and list length but doesn't verify the actual nested filter objects.

🧪 Strengthen the assertion
         assert params["filters"]["type"] == "and"
-        assert len(params["filters"]["filters"]) == 2
+        assert params["filters"]["filters"] == [
+            {"type": "eq", "key": "platform", "value": "openshift"},
+            {"type": "ne", "key": "status", "value": "archived"},
+        ]
         assert "solr" not in params
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/utils/test_vector_search.py` around lines 132 - 151, The test
test_with_compound_filter currently only checks the top-level "type" and length;
update it to assert the actual nested filter objects produced by
_build_query_params when given a SolrVectorSearchRequest: after building params
call, verify params["filters"]["filters"][0] and params["filters"]["filters"][1]
match the expected dicts ({"type":"eq","key":"platform","value":"openshift"} and
{"type":"ne","key":"status","value":"archived"}) and still assert "solr" not in
params; locate these symbols in the test file (test_with_compound_filter,
SolrVectorSearchRequest, _build_query_params, params) and add the two assertions
accordingly.

109-130: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Strengthen assertion to verify the full filter value.

The test checks type and key but doesn't verify that value contains the expected list ["4.14", "4.15"].

🧪 Strengthen the assertion
         # Filters extracted to top-level
         assert params["filters"]["type"] == "in"
         assert params["filters"]["key"] == "version"
+        assert params["filters"]["value"] == ["4.14", "4.15"]
         # Other params remain under solr key
         assert params["solr"] == {"custom_param": "value"}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/utils/test_vector_search.py` around lines 109 - 130, The test
`test_with_filters_and_other_solr_params` currently asserts the extracted filter
`type` and `key` but misses validating the actual filter values; update the
assertions in this test (which uses `SolrVectorSearchRequest` and calls
`_build_query_params`) to assert that `params["filters"]["value"]` equals the
expected list ["4.14", "4.15"] while keeping the existing checks for `type`,
`key`, `solr`, and `k` (the latter referencing
`constants.SOLR_VECTOR_SEARCH_DEFAULT_K`).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@tests/unit/utils/test_vector_search.py`:
- Around line 132-151: The test test_with_compound_filter currently only checks
the top-level "type" and length; update it to assert the actual nested filter
objects produced by _build_query_params when given a SolrVectorSearchRequest:
after building params call, verify params["filters"]["filters"][0] and
params["filters"]["filters"][1] match the expected dicts
({"type":"eq","key":"platform","value":"openshift"} and
{"type":"ne","key":"status","value":"archived"}) and still assert "solr" not in
params; locate these symbols in the test file (test_with_compound_filter,
SolrVectorSearchRequest, _build_query_params, params) and add the two assertions
accordingly.
- Around line 109-130: The test `test_with_filters_and_other_solr_params`
currently asserts the extracted filter `type` and `key` but misses validating
the actual filter values; update the assertions in this test (which uses
`SolrVectorSearchRequest` and calls `_build_query_params`) to assert that
`params["filters"]["value"]` equals the expected list ["4.14", "4.15"] while
keeping the existing checks for `type`, `key`, `solr`, and `k` (the latter
referencing `constants.SOLR_VECTOR_SEARCH_DEFAULT_K`).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: bb037a90-d31d-45b7-afed-048a7bfd5156

📥 Commits

Reviewing files that changed from the base of the PR and between 3b5b567 and 4ef8fef.

📒 Files selected for processing (4)
  • docs/okp_guide.md
  • src/models/requests.py
  • src/utils/vector_search.py
  • tests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (3)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

  • src/models/requests.py
  • src/utils/vector_search.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models must use @model_validator and @field_validator for validation and complete type annotations for all attributes, avoiding Any type

Files:

  • src/models/requests.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

  • tests/unit/utils/test_vector_search.py
🧠 Learnings (3)
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/requests.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/requests.py
📚 Learning: 2026-04-15T18:54:07.540Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:769-773
Timestamp: 2026-04-15T18:54:07.540Z
Learning: In lightspeed-core/lightspeed-stack request models, keep schema-level size limits defined as inline numeric literals (e.g., Pydantic Field max_length values used for validation) rather than extracting them into constants.py. Only extract values intended as configurable runtime defaults (e.g., DEFAULT_* settings like header/file upload sizes). Do not flag inline numeric literals used directly in Pydantic Field constraints or field validators as an extraction-to-constants issue.

Applied to files:

  • src/models/requests.py
🔇 Additional comments (11)
src/models/requests.py (2)

285-321: LGTM!


785-787: LGTM!

src/utils/vector_search.py (2)

63-67: LGTM!


82-99: LGTM!

tests/unit/utils/test_vector_search.py (3)

70-83: LGTM!


85-107: LGTM!


164-180: LGTM!

docs/okp_guide.md (4)

124-130: LGTM!


134-151: LGTM!


153-242: LGTM!


244-250: LGTM!

@anik120 anik120 force-pushed the vector-io-filters branch 3 times, most recently from 50d73f8 to d55e2ce Compare May 12, 2026 20:54
@anik120
Copy link
Copy Markdown
Contributor Author

anik120 commented May 12, 2026

@Anxhela21 addressed all your comments (also had to rebase on top of main). PTAL, thanks!

@anik120 anik120 force-pushed the vector-io-filters branch from d55e2ce to 5efa2cd Compare May 15, 2026 17:09
Comment thread docs/okp_guide.md Outdated
"filters": {
"type": "in",
"key": "product",
"value": ["openshift_container_platform", "ansible_automation_platform", "rhel"]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rhel is not a valid filter either, I believe it needs to be red_hat_enterprise_linux
here is the complete list: https://mimir.corp.redhat.com/rag/products/

Copy link
Copy Markdown
Contributor

@Anxhela21 Anxhela21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left one more comment. Otherwise looks good. Thank you.

LGTM

Updates the Solr vector search integration to support the new Filter API
introduced in llama-stack 0.6.0, enabling metadata-based filtering of RAG
results using comparison and compound filters.

Filter format examples:
    Simple: {"filters": {"type": "eq", "key": "platform", "value": "openshift"}}
    Compound: {"filters": {"type": "and", "filters": [...]}}

Note: This change requires lightspeed-providers with solr_vector_io filter support,
introduced in lightspeed-core/lightspeed-providers#119
@anik120 anik120 force-pushed the vector-io-filters branch from 5efa2cd to f0732e2 Compare May 18, 2026 15:07
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/utils/vector_search.py (1)

114-120: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix malformed _build_query_params docstring (duplicate Returns and ambiguous behavior).

On Lines 114-120, the docstring has two Returns: sections and still reads as if filters are broadly extracted to top level. Consolidate to a single Returns: block and state that only nested structured filters are lifted, while legacy filters remain under params["solr"].

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/vector_search.py` around lines 114 - 120, Docstring for
_build_query_params is malformed and ambiguous; update it to have a single
Returns: section that clearly explains the function returns a dict suitable for
vector_io.query and that only nested structured filters are lifted to top-level
keys while legacy/flat filters remain under params["solr"]. Edit the
_build_query_params docstring to remove the duplicate Returns block, describe
the returned structure (top-level params plus params["solr"] for legacy
filters), and explicitly note the behavior for "mode" and "filters" extraction
so callers understand which filters are moved vs preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/openapi.json`:
- Line 19419: Update the Solr provider description string for params['solr'] to
mention logical/compound operators (e.g., and, or, not) in addition to the
comparison operators (eq, ne, in, nin) so it accurately reflects that structured
metadata filters can include compound logical operators as shown in the second
example; edit the description value that currently references "Supports
structured metadata filters (eq, ne, in, nin comparison operators)" to append or
replace with wording that explicitly includes compound/logical operators like
"and", "or", "not".

In `@src/models/common/query.py`:
- Around line 85-89: The description for the Solr provider filter (the
description assigned to params['solr'] in the model) only mentions comparison
operators; update that description string to also mention compound operators
such as "and" and "or" (and that nested/compound filters are supported) so it
matches the examples later (e.g., nested filters and compound expressions).
Modify the description tuple where "Solr provider filter payload passed through
as params['solr']" is defined to explicitly list both comparison operators (eq,
ne, in, nin) and compound operators (and, or).

---

Duplicate comments:
In `@src/utils/vector_search.py`:
- Around line 114-120: Docstring for _build_query_params is malformed and
ambiguous; update it to have a single Returns: section that clearly explains the
function returns a dict suitable for vector_io.query and that only nested
structured filters are lifted to top-level keys while legacy/flat filters remain
under params["solr"]. Edit the _build_query_params docstring to remove the
duplicate Returns block, describe the returned structure (top-level params plus
params["solr"] for legacy filters), and explicitly note the behavior for "mode"
and "filters" extraction so callers understand which filters are moved vs
preserved.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6af69bce-5cf7-48de-928c-0cd114950792

📥 Commits

Reviewing files that changed from the base of the PR and between 4ef8fef and f0732e2.

📒 Files selected for processing (4)
  • docs/okp_guide.md
  • docs/openapi.json
  • src/models/common/query.py
  • src/utils/vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: spectral
  • GitHub Check: Pylinter
  • GitHub Check: build-pr
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (2)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

  • src/utils/vector_search.py
  • src/models/common/query.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models must use @model_validator and @field_validator for validation and complete type annotations for all attributes, avoiding Any type

Files:

  • src/models/common/query.py
🧠 Learnings (2)
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/common/query.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/common/query.py
🔇 Additional comments (2)
docs/openapi.json (1)

19421-19453: LGTM!

docs/okp_guide.md (1)

151-157: LGTM!

Also applies to: 162-278

Comment thread docs/openapi.json
],
"title": "Filters",
"description": "Solr provider filter payload passed through as params['solr'].",
"description": "Solr provider filter payload passed through as params['solr']. Supports structured metadata filters (eq, ne, in, nin comparison operators). Legacy filter-only objects (e.g. fq) are still accepted.",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | 💤 Low value

Consider mentioning compound operators in the description.

The description lists comparison operators (eq, ne, in, nin) but the second example demonstrates a compound and filter. For completeness, consider adding logical operators to the description.

📝 Suggested description update
-                        "description": "Solr provider filter payload passed through as params['solr']. Supports structured metadata filters (eq, ne, in, nin comparison operators). Legacy filter-only objects (e.g. fq) are still accepted.",
+                        "description": "Solr provider filter payload passed through as params['solr']. Supports structured metadata filters (eq, ne, in, nin comparison operators) and compound filters (and, or). Legacy filter-only objects (e.g. fq) are still accepted.",
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"description": "Solr provider filter payload passed through as params['solr']. Supports structured metadata filters (eq, ne, in, nin comparison operators). Legacy filter-only objects (e.g. fq) are still accepted.",
"description": "Solr provider filter payload passed through as params['solr']. Supports structured metadata filters (eq, ne, in, nin comparison operators) and compound filters (and, or). Legacy filter-only objects (e.g. fq) are still accepted.",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/openapi.json` at line 19419, Update the Solr provider description string
for params['solr'] to mention logical/compound operators (e.g., and, or, not) in
addition to the comparison operators (eq, ne, in, nin) so it accurately reflects
that structured metadata filters can include compound logical operators as shown
in the second example; edit the description value that currently references
"Supports structured metadata filters (eq, ne, in, nin comparison operators)" to
append or replace with wording that explicitly includes compound/logical
operators like "and", "or", "not".

Comment on lines +85 to +89
description=(
"Solr provider filter payload passed through as params['solr']. "
"Supports structured metadata filters (eq, ne, in, nin comparison operators). "
"Legacy filter-only objects (e.g. fq) are still accepted."
),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Include compound operators in the field description for consistency.

On Line 87, the description lists only comparison operators, but the examples (Line 99+) already document compound filters (and, nested filters). Please explicitly mention compound operators (and/or) here to avoid client confusion.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/models/common/query.py` around lines 85 - 89, The description for the
Solr provider filter (the description assigned to params['solr'] in the model)
only mentions comparison operators; update that description string to also
mention compound operators such as "and" and "or" (and that nested/compound
filters are supported) so it matches the examples later (e.g., nested filters
and compound expressions). Modify the description tuple where "Solr provider
filter payload passed through as params['solr']" is defined to explicitly list
both comparison operators (eq, ne, in, nin) and compound operators (and, or).

Copy link
Copy Markdown
Contributor

@tisnik tisnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tisnik tisnik merged commit fd4acfa into lightspeed-core:main May 18, 2026
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants