Skip to content

LCORE-1444: Solr enrichment not working in library mode#1335

Merged
are-ces merged 1 commit into
lightspeed-core:mainfrom
are-ces:fix-solr-enrichment
Mar 18, 2026
Merged

LCORE-1444: Solr enrichment not working in library mode#1335
are-ces merged 1 commit into
lightspeed-core:mainfrom
are-ces:fix-solr-enrichment

Conversation

@are-ces

@are-ces are-ces commented Mar 17, 2026

Copy link
Copy Markdown
Contributor

Description

Changes:

  • The solr enrichment script runs also when running lightspeed-stack in library mode
  • Hid is_chunk:true filtering attribute from the user

The is_chunk:true value should always be included in the OKP Solr filters, thus should not be the responsibility of the user to provide this filter.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude

Related Tickets & Documents

  • Related Issue # LCORE-1444
  • Closes # LCORE-1444

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  1. Configure OKP in the lightspeed config
  2. Start Solr on the right port 8983
  3. Run in library mode
  4. Test a query, make sure that Solr chunks are returned

I performed such testing, configuring also BYOK rag

Summary by CodeRabbit

Release Notes

  • New Features

    • Chunk filtering for OKP searches is now optional and customizable. Users can specify custom Solr boolean syntax filters via configuration, or leave it empty to rely on default filtering behavior.
  • Documentation

    • Updated configuration and guide documentation to reflect changes in chunk filtering behavior and provide examples of custom filter syntax.

@coderabbitai

coderabbitai Bot commented Mar 17, 2026

Copy link
Copy Markdown
Contributor

Walkthrough

This PR refactors chunk filtering logic in the OKP (Okapi) RAG configuration. The chunk_filter_query field changes from a required field with default "is_chunk:true" to optional with default None, with a new internal constant managing the default filter. Configuration enrichment logic is restructured to derive OKP context from RAG configuration instead of a separate flag.

Changes

Cohort / File(s) Summary
Configuration Model
src/models/config.py, src/constants.py
chunk_filter_query in OkpConfiguration changed from required string defaulting to "is_chunk:true" to optional string with None default. New constant SOLR_CHUNK_FILTER_QUERY added for internal filter management.
Documentation
docs/config.md, docs/rag_guide.md, examples/lightspeed-stack-byok-okp-rag.yaml
Updated descriptions and examples to reflect chunk_filter_query as an optional additive filter rather than a default constraint. Removed references to static is_chunk:true default and switched to generic filter syntax examples.
Enrichment Logic
src/llama_stack_configuration.py
enrich_solr signature changed from (ls_config, solr_config) to (ls_config, rag_config, okp_config). OKP enablement now derived from RAG config inline/tool lists. Chunk filter derivation moved from solr config to okp config, with internal is_chunk:true filter applied by default when no user filter provided.
Client Configuration
src/client.py
_enrich_library_config method signature simplified: removed byok_rag parameter. Enrichment now always applied unconditionally, sourcing BYOK RAG and OKP data from global configuration instead of parameters.
Test Updates
tests/unit/models/config/test_dump_configuration.py, tests/unit/models/config/test_rag_configuration.py, tests/unit/test_llama_stack_configuration.py
Test expectations updated to expect chunk_filter_query as None by default. New tests added for default chunk filter behavior and user-provided filter conjunction logic. Existing enrichment tests refactored to use OKP RAG configuration structure.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title clearly and specifically describes the main bug fix: ensuring Solr enrichment works in library mode, which directly addresses the core issue being resolved.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

CodeRabbit can approve the review once all CodeRabbit's comments are resolved.

Enable the reviews.request_changes_workflow setting to automatically approve the review once all CodeRabbit's comments are resolved.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/llama_stack_configuration.py (1)

418-468: ⚠️ Potential issue | 🟠 Major

Existing Solr providers are never updated with the new chunk filter.

If okp_solr already exists, the code skips provider creation and never writes the new chunk_filter_query, so internal/default filtering can be missed.

Proposed fix
     existing_providers = [
         p.get("provider_id") for p in ls_config["providers"]["vector_io"]
     ]
-    if constants.SOLR_PROVIDER_ID not in existing_providers:
+    if constants.SOLR_PROVIDER_ID not in existing_providers:
@@
         logger.info("Added OKP provider to providers/vector_io")
+    else:
+        for provider in ls_config["providers"]["vector_io"]:
+            if provider.get("provider_id") == constants.SOLR_PROVIDER_ID:
+                provider.setdefault("config", {})
+                provider["config"].setdefault("chunk_window_config", {})
+                provider["config"]["chunk_window_config"][
+                    "chunk_filter_query"
+                ] = chunk_filter_query
+                logger.info("Updated OKP provider chunk_filter_query")
+                break
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llama_stack_configuration.py` around lines 418 - 468, The code only
appends a Solr provider when constants.SOLR_PROVIDER_ID is missing, so existing
providers never get their chunk_filter_query updated; modify the branch that
checks existing_providers (and the else branch when constants.SOLR_PROVIDER_ID
is present) to locate the existing provider object in
ls_config["providers"]["vector_io"] whose "provider_id" ==
constants.SOLR_PROVIDER_ID and update its
["config"]["chunk_window_config"]["chunk_filter_query"] = chunk_filter_query
(and create the nested dicts if missing), and emit a logger.info/log.debug
message indicating the provider was updated; keep the creation flow for the new
provider unchanged.
🧹 Nitpick comments (2)
tests/unit/models/config/test_rag_configuration.py (1)

83-87: Add a regression test that rejects user-supplied is_chunk:*.

Given the new contract, add a negative test to prevent future regressions where users can inject the reserved internal clause.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/models/config/test_rag_configuration.py` around lines 83 - 87, Add
a negative regression test that ensures OkpConfiguration rejects user-supplied
internal clause "is_chunk:*": create a new test (e.g.,
test_reject_is_chunk_clause) that attempts to construct
OkpConfiguration(chunk_filter_query="is_chunk:*" or "foo AND is_chunk:true") and
asserts that this raises the validation error (ValueError or the specific
exception raised by the configuration validation). Reference the existing
test_custom_chunk_filter_query and the OkpConfiguration class/constructor to
locate where to add the test.
tests/unit/test_llama_stack_configuration.py (1)

501-525: Good test coverage for the new chunk filter behavior.

The tests correctly validate the core functionality: default internal filter application and AND conjunction with user-provided filters.

Consider adding an edge case test to verify behavior when a user inadvertently includes is_chunk:true in their filter. Depending on the implementation, this could result in a duplicated filter like "is_chunk:true AND is_chunk:true".

📝 Optional: Additional edge case test
def test_enrich_solr_user_chunk_filter_query_with_duplicate_internal_filter() -> None:
    """Test enrich_solr handles user filter that already contains is_chunk:true."""
    ls_config: dict[str, Any] = {}
    enrich_solr(ls_config, _OKP_RAG_CONFIG, {"chunk_filter_query": "is_chunk:true AND product:ansible"})

    provider = next(
        p for p in ls_config["providers"]["vector_io"] if p["provider_id"] == "okp_solr"
    )
    # Verify expected behavior - document whether duplication is acceptable or if it should be deduplicated
    chunk_filter = provider["config"]["chunk_window_config"]["chunk_filter_query"]
    assert "is_chunk:true" in chunk_filter
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_llama_stack_configuration.py` around lines 501 - 525, Add an
edge-case unit test named
test_enrich_solr_user_chunk_filter_query_with_duplicate_internal_filter that
calls enrich_solr with a user chunk_filter_query that already contains
"is_chunk:true" (e.g., "is_chunk:true AND product:ansible"), then locate the
okp_solr provider and assert the
provider["config"]["chunk_window_config"]["chunk_filter_query"] contains
"is_chunk:true" but does not contain a duplicated "is_chunk:true AND
is_chunk:true" sequence; this documents expected behavior and prevents
regressions in enrich_solr's chunk filter composition.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/config.md`:
- Line 411: Update the docs for the chunk_filter_query field to explicitly state
that the system always injects the internal filter is_chunk:true and that
chunk_filter_query is additive-only (user-provided conditions are ANDed with the
injected is_chunk:true filter); mention the field name chunk_filter_query and
the internal flag is_chunk:true so readers know the system-enforced behavior and
that they cannot override or remove is_chunk:true via chunk_filter_query.

In `@docs/openapi.json`:
- Line 4797: The GET and POST operations for the /a2a path share the same
operationId ("handle_a2a_jsonrpc_a2a_post"), which breaks OpenAPI consumers;
update the GET operation's operationId to a unique name (for example
"handle_a2a_jsonrpc_a2a_get") so each operationId is distinct, ensuring you
change the operationId field on the /a2a GET entry rather than the POST entry
and keep the rest of the GET operation intact.

In `@examples/lightspeed-stack-byok-okp-rag.yaml`:
- Around line 69-71: Add a short explicit note next to the chunk_filter_query
example clarifying that the system always appends is_chunk:true internally, so
users should only supply additional Solr filters and must not include any
is_chunk:* clause themselves; reference the chunk_filter_query config key and
the is_chunk:true field so reviewers can locate the example to update.

In `@src/client.py`:
- Around line 79-91: yaml.safe_load can return None (or non-dict) which causes
enrich_byok_rag and enrich_solr to fail; after loading ls_config, check if it's
a mapping (isinstance(ls_config, dict)), and if not log a warning and replace it
with an empty dict (ls_config = {}) before calling enrich_byok_rag and
enrich_solr so the enrichment helpers always receive a dict; keep existing
logger usage and the same variables (ls_config, configuration, enrich_byok_rag,
enrich_solr).

In `@src/llama_stack_configuration.py`:
- Around line 404-408: The composed Solr chunk_filter_query currently
concatenates constants.SOLR_CHUNK_FILTER_QUERY and user_filter without grouping,
which breaks boolean grouping when user_filter contains OR; update the
composition of chunk_filter_query (the variable chunk_filter_query where
constants.SOLR_CHUNK_FILTER_QUERY and user_filter are combined) to wrap
user_filter in parentheses before the AND, e.g. use "(" + user_filter + ")" or
an f-string that places user_filter inside parentheses so the entire
user-provided clause is grouped: keep the same conditional logic (if user_filter
else constants.SOLR_CHUNK_FILTER_QUERY) but ensure user_filter is parenthesized.

In `@src/models/config.py`:
- Around line 1743-1748: The chunk_filter_query Field currently allows
user-supplied predicates like "is_chunk:..." which conflicts with the internal
always-on filter; add a validation step for chunk_filter_query (e.g., a pydantic
`@validator` for the chunk_filter_query field, name it
validate_chunk_filter_query) that rejects any value containing the reserved
predicate "is_chunk" (case-insensitive, e.g. detect patterns like
"is_chunk\s*:") and raise a ValueError with a clear message asking users to
remove is_chunk predicates; ensure the validator returns the original value when
valid.

---

Outside diff comments:
In `@src/llama_stack_configuration.py`:
- Around line 418-468: The code only appends a Solr provider when
constants.SOLR_PROVIDER_ID is missing, so existing providers never get their
chunk_filter_query updated; modify the branch that checks existing_providers
(and the else branch when constants.SOLR_PROVIDER_ID is present) to locate the
existing provider object in ls_config["providers"]["vector_io"] whose
"provider_id" == constants.SOLR_PROVIDER_ID and update its
["config"]["chunk_window_config"]["chunk_filter_query"] = chunk_filter_query
(and create the nested dicts if missing), and emit a logger.info/log.debug
message indicating the provider was updated; keep the creation flow for the new
provider unchanged.

---

Nitpick comments:
In `@tests/unit/models/config/test_rag_configuration.py`:
- Around line 83-87: Add a negative regression test that ensures
OkpConfiguration rejects user-supplied internal clause "is_chunk:*": create a
new test (e.g., test_reject_is_chunk_clause) that attempts to construct
OkpConfiguration(chunk_filter_query="is_chunk:*" or "foo AND is_chunk:true") and
asserts that this raises the validation error (ValueError or the specific
exception raised by the configuration validation). Reference the existing
test_custom_chunk_filter_query and the OkpConfiguration class/constructor to
locate where to add the test.

In `@tests/unit/test_llama_stack_configuration.py`:
- Around line 501-525: Add an edge-case unit test named
test_enrich_solr_user_chunk_filter_query_with_duplicate_internal_filter that
calls enrich_solr with a user chunk_filter_query that already contains
"is_chunk:true" (e.g., "is_chunk:true AND product:ansible"), then locate the
okp_solr provider and assert the
provider["config"]["chunk_window_config"]["chunk_filter_query"] contains
"is_chunk:true" but does not contain a duplicated "is_chunk:true AND
is_chunk:true" sequence; this documents expected behavior and prevents
regressions in enrich_solr's chunk filter composition.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bb781fd9-73b3-4fb1-863b-8e42ae90bbdb

📥 Commits

Reviewing files that changed from the base of the PR and between a491022 and b9137bc.

📒 Files selected for processing (12)
  • docs/config.md
  • docs/openapi.json
  • docs/openapi.md
  • docs/rag_guide.md
  • examples/lightspeed-stack-byok-okp-rag.yaml
  • src/client.py
  • src/constants.py
  • src/llama_stack_configuration.py
  • src/models/config.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/models/config/test_rag_configuration.py
  • tests/unit/test_llama_stack_configuration.py

Comment thread docs/config.md
Comment thread docs/openapi.json Outdated
Comment thread examples/lightspeed-stack-byok-okp-rag.yaml
Comment thread src/client.py
Comment on lines 79 to +91
with open(input_config_path, "r", encoding="utf-8") as f:
ls_config = yaml.safe_load(f)
except (OSError, yaml.YAMLError) as e:
logger.warning("Failed to read llama-stack config: %s", e)
return input_config_path

enrich_byok_rag(ls_config, byok_rag)
config = configuration.configuration

# Enrichment: BYOK RAG
enrich_byok_rag(ls_config, [b.model_dump() for b in config.byok_rag])

# Enrichment: Solr - enabled when "okp" appears in either inline or tool list
enrich_solr(ls_config, config.rag.model_dump(), config.okp.model_dump())

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

What does yaml.safe_load from PyYAML return for an empty YAML document?

💡 Result:

For an empty YAML stream/document (e.g., "" or a file containing only whitespace/comments), yaml.safe_load(...) returns None. [1], [2]


🏁 Script executed:

# Find enrichment helper functions
fd "\.py$" src/ | xargs grep -l "def enrich_byok_rag\|def enrich_solr"

Repository: lightspeed-core/lightspeed-stack

Length of output: 108


🏁 Script executed:

# Get the content around the enrichment calls to understand context
rg "def enrich_byok_rag|def enrich_solr" -A 15

Repository: lightspeed-core/lightspeed-stack

Length of output: 2310


🏁 Script executed:

# Check if ls_config is ever checked for None elsewhere in the file
rg "ls_config" src/client.py -B 2 -A 2

Repository: lightspeed-core/lightspeed-stack

Length of output: 924


Guard against non-dict YAML payloads before enrichment.

yaml.safe_load returns None for empty or comment-only YAML; enrichment helpers expect a mapping and will crash with TypeError when accessing dict operations on None.

Proposed fix
         try:
             with open(input_config_path, "r", encoding="utf-8") as f:
                 ls_config = yaml.safe_load(f)
         except (OSError, yaml.YAMLError) as e:
             logger.warning("Failed to read llama-stack config: %s", e)
             return input_config_path
+        if not isinstance(ls_config, dict):
+            logger.warning(
+                "Invalid llama-stack config format in %s: expected mapping at root",
+                input_config_path,
+            )
+            return input_config_path
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/client.py` around lines 79 - 91, yaml.safe_load can return None (or
non-dict) which causes enrich_byok_rag and enrich_solr to fail; after loading
ls_config, check if it's a mapping (isinstance(ls_config, dict)), and if not log
a warning and replace it with an empty dict (ls_config = {}) before calling
enrich_byok_rag and enrich_solr so the enrichment helpers always receive a dict;
keep existing logger usage and the same variables (ls_config, configuration,
enrich_byok_rag, enrich_solr).

Comment thread src/llama_stack_configuration.py
Comment thread src/models/config.py

@tisnik tisnik left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not update openapi.md, it's generation is broken now. Simply remove it from the PR

- The solr enrichment script runs also when running lightspeed-stack in library mode
- Hid is_chunk:true filtering attribute from the user
@are-ces are-ces force-pushed the fix-solr-enrichment branch from b9137bc to cf588c1 Compare March 17, 2026 14:49

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
src/models/config.py (1)

1743-1748: ⚠️ Potential issue | 🟠 Major

Reject reserved is_chunk:* predicates in chunk_filter_query.

The field is now hidden from users in the schema, but it still accepts user-supplied is_chunk:* clauses. That breaks the additive-only contract and lets callers override or contradict the system-managed chunk filter.

Proposed fix
 class OkpConfiguration(ConfigurationBase):
@@
     chunk_filter_query: Optional[str] = Field(
         default=None,
         title="OKP chunk filter query",
         description="Additional OKP filter query applied to every OKP search request. "
         "Use Solr boolean syntax, e.g. 'product:ansible AND product:*openshift*'.",
     )
+
+    `@field_validator`("chunk_filter_query")
+    `@classmethod`
+    def validate_chunk_filter_query(cls, value: Optional[str]) -> Optional[str]:
+        if value is None:
+            return value
+        if re.search(r"\bis_chunk\s*:", value, flags=re.IGNORECASE):
+            raise ValueError(
+                "Do not include 'is_chunk:*' in okp.chunk_filter_query; it is applied internally."
+            )
+        return value

Based on learnings, is_chunk:true is always injected internally and chunk_filter_query is intentionally additive-only and hidden from users.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/models/config.py` around lines 1743 - 1748, The chunk_filter_query field
currently allows user-supplied "is_chunk:*" predicates which must be prohibited;
add a Pydantic validator for chunk_filter_query in the Config model (use the
existing Field declaration for chunk_filter_query as the identifier) that
inspects the string (case-insensitive) and raises a ValueError if any "is_chunk"
predicate is present (e.g. match patterns like (?i)\bis_chunk\s*:), so user
input cannot include or override is_chunk:true/false; keep the field hidden in
the schema as-is and provide a clear error message when validation fails.
src/client.py (1)

76-91: ⚠️ Potential issue | 🟠 Major

Guard non-mapping YAML roots before calling the enrichment helpers.

PyYAML documents that safe_load() returns None when the stream has no documents. An empty or comment-only library config will therefore hit enrich_byok_rag() / enrich_solr() as a non-dict and fail on the first mapping operation instead of following the existing fallback path. (pyyaml.org)

Proposed fix
         try:
             with open(input_config_path, "r", encoding="utf-8") as f:
                 ls_config = yaml.safe_load(f)
         except (OSError, yaml.YAMLError) as e:
             logger.warning("Failed to read llama-stack config: %s", e)
             return input_config_path
+
+        if not isinstance(ls_config, dict):
+            logger.warning(
+                "Invalid llama-stack config format in %s: expected mapping at root",
+                input_config_path,
+            )
+            return input_config_path
 
         config = configuration.configuration
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/client.py` around lines 76 - 91, The loaded YAML root (ls_config from
yaml.safe_load) may be None or a non-mapping, so before calling enrich_byok_rag
or enrich_solr in _enrich_library_config, guard against non-dict roots (e.g., if
ls_config is None or not isinstance(ls_config, dict)) and follow the existing
fallback path (log a warning and return input_config_path or otherwise avoid
calling enrich_byok_rag/enrich_solr). Locate _enrich_library_config and add the
check immediately after safe_load and before enrich_byok_rag/enrich_solr to
prevent passing a non-mapping to those helpers.
src/llama_stack_configuration.py (1)

403-408: ⚠️ Potential issue | 🟠 Major

Wrap the user clause before AND-ing it into the Solr filter.

Solr's standard parser uses parentheses to control boolean logic, and Lucene's parser docs likewise call out parentheses as the way to set precedence inside boolean clauses. Without grouping user_filter, any OR inside it can escape the intended AND (...) scope and broaden the final chunk filter. (solr.apache.org)

Proposed fix
     chunk_filter_query = (
-        f"{constants.SOLR_CHUNK_FILTER_QUERY} AND {user_filter}"
+        f"{constants.SOLR_CHUNK_FILTER_QUERY} AND ({user_filter})"
         if user_filter
         else constants.SOLR_CHUNK_FILTER_QUERY
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llama_stack_configuration.py` around lines 403 - 408, The current
construction of chunk_filter_query concatenates
constants.SOLR_CHUNK_FILTER_QUERY and user_filter directly, which allows any ORs
in user_filter to escape the intended AND scope; update the logic that builds
chunk_filter_query (where user_filter and constants.SOLR_CHUNK_FILTER_QUERY are
used) to wrap the user_filter in parentheses before joining (i.e., use "(" +
user_filter + ")" when user_filter is present) so the resulting Solr filter
preserves correct boolean precedence.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/llama_stack_configuration.py`:
- Line 560: The call enrich_solr(ls_config, config.get("rag", {}),
config.get("okp", {})) can pass None if the YAML explicitly sets rag or okp to
null; normalize those values before calling enrich_solr by replacing
config.get("rag", {}) and config.get("okp", {}) with expressions that coerce
falsy/None to empty dicts (e.g., rag = config.get("rag") or {}; okp =
config.get("okp") or {}) and then call enrich_solr(ls_config, rag, okp) so
enrich_solr always receives dicts and avoids AttributeError.

---

Duplicate comments:
In `@src/client.py`:
- Around line 76-91: The loaded YAML root (ls_config from yaml.safe_load) may be
None or a non-mapping, so before calling enrich_byok_rag or enrich_solr in
_enrich_library_config, guard against non-dict roots (e.g., if ls_config is None
or not isinstance(ls_config, dict)) and follow the existing fallback path (log a
warning and return input_config_path or otherwise avoid calling
enrich_byok_rag/enrich_solr). Locate _enrich_library_config and add the check
immediately after safe_load and before enrich_byok_rag/enrich_solr to prevent
passing a non-mapping to those helpers.

In `@src/llama_stack_configuration.py`:
- Around line 403-408: The current construction of chunk_filter_query
concatenates constants.SOLR_CHUNK_FILTER_QUERY and user_filter directly, which
allows any ORs in user_filter to escape the intended AND scope; update the logic
that builds chunk_filter_query (where user_filter and
constants.SOLR_CHUNK_FILTER_QUERY are used) to wrap the user_filter in
parentheses before joining (i.e., use "(" + user_filter + ")" when user_filter
is present) so the resulting Solr filter preserves correct boolean precedence.

In `@src/models/config.py`:
- Around line 1743-1748: The chunk_filter_query field currently allows
user-supplied "is_chunk:*" predicates which must be prohibited; add a Pydantic
validator for chunk_filter_query in the Config model (use the existing Field
declaration for chunk_filter_query as the identifier) that inspects the string
(case-insensitive) and raises a ValueError if any "is_chunk" predicate is
present (e.g. match patterns like (?i)\bis_chunk\s*:), so user input cannot
include or override is_chunk:true/false; keep the field hidden in the schema
as-is and provide a clear error message when validation fails.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b0339113-5cf3-41b1-b966-80bab1497113

📥 Commits

Reviewing files that changed from the base of the PR and between b9137bc and cf588c1.

📒 Files selected for processing (10)
  • docs/config.md
  • docs/rag_guide.md
  • examples/lightspeed-stack-byok-okp-rag.yaml
  • src/client.py
  • src/constants.py
  • src/llama_stack_configuration.py
  • src/models/config.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/models/config/test_rag_configuration.py
  • tests/unit/test_llama_stack_configuration.py
🚧 Files skipped from review as they are similar to previous changes (7)
  • examples/lightspeed-stack-byok-okp-rag.yaml
  • tests/unit/models/config/test_dump_configuration.py
  • docs/config.md
  • docs/rag_guide.md
  • src/constants.py
  • tests/unit/models/config/test_rag_configuration.py
  • tests/unit/test_llama_stack_configuration.py

enrich_solr(
ls_config, {"enabled": okp_enabled, "chunk_filter_query": chunk_filter_query}
)
enrich_solr(ls_config, config.get("rag", {}), config.get("okp", {}))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Normalize nullable rag / okp sections before passing them down.

dict.get(key, {}) only uses the fallback when the key is missing. If the YAML explicitly sets rag or okp to null, this call passes None into enrich_solr(), and the first .get() there raises AttributeError.

Proposed fix
-    enrich_solr(ls_config, config.get("rag", {}), config.get("okp", {}))
+    enrich_solr(ls_config, config.get("rag") or {}, config.get("okp") or {})
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
enrich_solr(ls_config, config.get("rag", {}), config.get("okp", {}))
enrich_solr(ls_config, config.get("rag") or {}, config.get("okp") or {})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llama_stack_configuration.py` at line 560, The call
enrich_solr(ls_config, config.get("rag", {}), config.get("okp", {})) can pass
None if the YAML explicitly sets rag or okp to null; normalize those values
before calling enrich_solr by replacing config.get("rag", {}) and
config.get("okp", {}) with expressions that coerce falsy/None to empty dicts
(e.g., rag = config.get("rag") or {}; okp = config.get("okp") or {}) and then
call enrich_solr(ls_config, rag, okp) so enrich_solr always receives dicts and
avoids AttributeError.

@tisnik tisnik left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tisnik tisnik requested review from Jared-Sprague and mwcz March 17, 2026 17:18
@Jared-Sprague

Copy link
Copy Markdown
Contributor

LGTM!

@are-ces are-ces merged commit 55d38af into lightspeed-core:main Mar 18, 2026
24 checks passed
@are-ces are-ces mentioned this pull request Mar 18, 2026
19 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants