Skip to content

LCORE-2310: Added non-streaming agent query utilities#1880

Merged
tisnik merged 2 commits into
lightspeed-core:mainfrom
asimurka:agent_query_utils
Jun 10, 2026
Merged

LCORE-2310: Added non-streaming agent query utilities#1880
tisnik merged 2 commits into
lightspeed-core:mainfrom
asimurka:agent_query_utils

Conversation

@asimurka

@asimurka asimurka commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Description

Adds utility functions for non-streaming query served by LLS pydantic-ai agent provider. For now, utilities are disconnected from query handler - that will be added in a separate PR.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Cursor

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Improved mapping of model/provider failures into user-facing HTTP errors
    • Turn summaries now include LLM text, tool calls/results, provider response IDs, and token usage
    • Token and call metrics recorded by provider, model, and endpoint
    • Moderation refusals are persisted into conversations and return a minimal summary
  • Tests

    • Unit tests for finish-reason mapping, token extraction, summary construction, moderation handling, and error-to-HTTP conversions

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

Adds non-streaming agent helpers to map inference errors to LCS error responses, extract token usage, build TurnSummary from agent runs (including tool calls), and orchestrate non-streaming agent execution with moderation handling. Unit tests exercise success, moderation-block, and error mappings.

Changes

Non-streaming agent query helpers

Layer / File(s) Summary
Error handling and finish-reason resolution
src/utils/agents/query.py, tests/unit/utils/agents/test_query.py
AgentFinishReason enum and mapping functions translate pydantic-ai AgentRunError, Llama Stack APIStatusError/APIConnectionError, and runtime errors into AbstractErrorResponse variants; finish-reason extraction handles stop, length, content_filter, missing reasons, and provider cancelled.
Token usage and turn summary aggregation
src/utils/agents/query.py, tests/unit/utils/agents/test_query.py
extract_agent_token_usage converts RunUsage into TokenCounter and records provider/model/endpoint metrics. build_turn_summary_from_agent_run aggregates LLM text and native tool call/return parts into TurnSummary, attaches provider response id and token usage.
Agent response orchestration
src/utils/agents/query.py, tests/unit/utils/agents/test_query.py
retrieve_agent_response persists moderation refusal when blocked and returns a minimal summary; otherwise runs the agent non-streaming, maps provider/runtime failures to HTTPException, derives vector-store/RAG ids from tools/config, and returns the complete TurnSummary.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • jrobertboos
  • tisnik
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly summarizes the main change: adding non-streaming agent query utility functions, which aligns with the changeset that introduces 328 lines of new agent utilities and supporting tests.
Docstring Coverage ✅ Passed Docstring coverage is 93.10% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@asimurka asimurka requested review from jrobertboos and tisnik June 9, 2026 07:26
@asimurka asimurka mentioned this pull request Jun 9, 2026
19 tasks

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/utils/agents/query.py`:
- Around line 298-300: Update the Raises section of the docstring in
src/utils/agents/query.py to use clear, grammatical wording: replace the phrase
"On moderation is not applicable" with something like "HTTPException: If
moderation is not applicable for the input" and keep the second clause as
"HTTPException: on agent or provider failure" (or combine into a single line
that reads "HTTPException: If moderation is not applicable for the input, or on
agent/provider failure") so the meaning is explicit and grammatical; edit the
docstring where the Raises block mentions HTTPException to reflect this clearer
phrasing.
- Around line 83-98: Update the docstring to match the implementation: clarify
that non-context-length RuntimeError exceptions are converted to a generic
internal error response (InternalServerErrorResponse.generic()) rather than
re-raised. In the exception-handling function that checks AgentRunError,
APIStatusError, APIConnectionError and RuntimeError via
is_context_length_error(), change the docstring text to state that
non-context-length RuntimeError values return
InternalServerErrorResponse.generic() (or list the actual behavior), referencing
the RuntimeError branch and InternalServerErrorResponse.generic() so future
readers see the implemented behavior.
- Around line 149-152: In get_agent_finish_reason, avoid directly constructing
AgentFinishReason(response.finish_reason) because response.finish_reason can be
"tool_call" (or other unknown values) which will raise ValueError; update the
function to either explicitly check for "tool_call" on response.finish_reason
and return AgentFinishReason.ERROR, or wrap the enum conversion in a try/except
ValueError that returns AgentFinishReason.ERROR as a safe fallback, referencing
response.finish_reason and AgentFinishReason to locate the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 193e0be0-68c0-4c67-b19f-e11b7be241e0

📥 Commits

Reviewing files that changed from the base of the PR and between aa29526 and 0386bad.

📒 Files selected for processing (2)
  • src/utils/agents/query.py
  • tests/unit/utils/agents/test_query.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: mypy
  • GitHub Check: Pyright
  • GitHub Check: spectral
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: build-pr
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: Pylinter
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (2)
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

  • tests/unit/utils/agents/test_query.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

  • src/utils/agents/query.py
🔇 Additional comments (12)
src/utils/agents/query.py (7)

1-57: LGTM!


60-68: LGTM!


101-135: LGTM!


154-182: LGTM!


184-219: LGTM!


222-276: LGTM!


315-315: Avoid unsafe cast: ensure responses_params.input is str before cast(str, responses_params.input)
ResponseInput is str | list[ResponseItem] (src/models/common/responses/types.py), so cast(str, ...) is only safe if this non-streaming path guarantees string input earlier; add an isinstance(..., str) guard (or normalize list input) before calling await agent.run(...).

tests/unit/utils/agents/test_query.py (5)

1-131: LGTM!


133-181: LGTM!


183-243: LGTM!


358-487: LGTM!


276-290: Ensure the test actually exercises ModelResponse.native_tool_calls: build_turn_summary_from_agent_run iterates message.native_tool_calls (src/utils/agents/query.py lines 262-264), but the test builds ModelResponse by placing NativeToolCallPart/NativeToolReturnPart into parts (tests/unit/utils/agents/test_query.py lines 276-290). This only tests the native-tools path if ModelResponse.native_tool_calls is populated from parts; otherwise, the test should construct/set native_tool_calls explicitly (or assert behavior that depends on native-tool processing).

Comment thread src/utils/agents/query.py Outdated
Comment thread src/utils/agents/query.py
Comment on lines +149 to +152
if response.finish_reason is None:
return AgentFinishReason.ERROR
return AgentFinishReason(response.finish_reason)

@coderabbitai coderabbitai Bot Jun 9, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check pydantic-ai FinishReason definition to confirm possible values
rg -n "FinishReason" --type py -A 5

Repository: lightspeed-core/lightspeed-stack

Length of output: 7310


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show the relevant enum + get_agent_finish_reason implementation
sed -n '40,180p' src/utils/agents/query.py

# Show the unit tests for get_agent_finish_reason
sed -n '120,220p' tests/unit/utils/agents/test_query.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 8661


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate imports/definitions around ModelResponse / FinishReason usage
rg -n "from .* import .*ModelResponse|ModelResponse" src/utils/agents/query.py src/utils -S

# Find how finish_reason is typed/used elsewhere
rg -n "\"tool_calls\"|\"tool_output\"|finish_reason\s*:" src -S

# Show utils.pydantic_ai module (likely where ModelResponse/FinishReason are re-exported)
fd -a -t f "pydantic_ai.py" src/utils -x echo
fd -a -t f "pydantic_ai*" src/utils -x echo

Repository: lightspeed-core/lightspeed-stack

Length of output: 1254


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect utils/pydantic_ai.py for re-exports/types that define ModelResponse / FinishReason
sed -n '1,220p' src/utils/pydantic_ai.py

# Find any references to "finish_reason" across the repo
rg -n "finish_reason" src -S

# Inspect pydantic-ai ModelResponse type hints only if vendored/aliased in this repo;
# otherwise, we can at least see whether code ever expects tool_calls/tool_output.
rg -n "\"tool_calls\"|\"tool_output\"" src -S

Repository: lightspeed-core/lightspeed-stack

Length of output: 5568


🏁 Script executed:

#!/bin/bash
set -euo pipefail

python3 - <<'PY'
import inspect
import typing
try:
    import pydantic_ai
    from pydantic_ai.messages import ModelResponse
except Exception as e:
    print("Failed to import pydantic_ai / ModelResponse:", e)
    raise

print("pydantic_ai version:", getattr(pydantic_ai, "__version__", "unknown"))
print("ModelResponse:", ModelResponse)

# Print annotated fields for ModelResponse
ann = getattr(ModelResponse, "__annotations__", None)
print("ModelResponse.__annotations__ keys:", sorted(list(ann.keys())) if ann else None)

# Try to find finish_reason field annotation
finish_ann = None
if ann and "finish_reason" in ann:
    finish_ann = ann["finish_reason"]
print("finish_reason annotation:", finish_ann)

# Try to resolve/inspect any enums/literals involved in finish_reason
try:
    from pydantic_ai.messages import FinishReason
    print("FinishReason:", FinishReason)
    print("FinishReason members/values:")
    if hasattr(FinishReason, "__members__"):
        print(list(FinishReason.__members__.items())[:50])
    else:
        print("Non-enum type:", FinishReason)
except Exception as e:
    print("Could not import FinishReason from pydantic_ai.messages:", e)

# If finish_reason is a Literal, show its args
if finish_ann is not None:
    origin = typing.get_origin(finish_ann)
    args = typing.get_args(finish_ann)
    print("finish_reason origin:", origin)
    print("finish_reason args (first 30):", list(args)[:30])
PY

Repository: lightspeed-core/lightspeed-stack

Length of output: 294


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find dependency pins for pydantic-ai / pydantic_ai
ls -la
fd -a -t f "pyproject.toml|requirements*.txt|uv.lock|poetry.lock|Pipfile.lock" . 2>/dev/null || true

rg -n "pydantic[-_]ai|pydantic_ai" -S pyproject.toml requirements*.txt poetry.lock uv.lock Pipfile.lock 2>/dev/null || true

# Also show the relevant dependency section if pyproject exists
if [ -f pyproject.toml ]; then
  sed -n '1,200p' pyproject.toml
fi

Repository: lightspeed-core/lightspeed-stack

Length of output: 9409


🌐 Web query:

pydantic-ai 1.105.0 FinishReason tool_calls tool_output

💡 Result:

In Pydantic AI, FinishReason, tool_calls, and tool_output are core concepts related to how an agent manages model responses and tool execution. FinishReason The FinishReason is an enumeration that describes why a model stopped generating a response, normalized to OpenTelemetry values [1]. It includes the following possible values: 'stop', 'length', 'content_filter', 'tool_call', and 'error' [1]. Developers can use the finish_reason attribute on a ModelResponse to inspect why a model ended its generation, for instance, to implement fallback logic if a response was truncated due to length limits or a content filter [2]. Tool Calls and Tool Output Pydantic AI distinguishes between function tools (which perform actions) and output tools (which produce the final result of a run) [3][4]. 1. Tool Calls: When a model decides to use a tool, it emits a tool call. If the model emits a tool call, the finish_reason for that response is typically 'tool_call' [1][5]. 2. Tool Output: This is the data returned after executing a tool. In the default "Tool Output" mode, the JSON schema for output types is provided to the model as a special output tool [6][7]. 3. Managing Execution: Recent updates, such as those in PR #5339 (which influenced behavior leading up to v1.105.0), refined how these are handled [8][9]. Specifically: - The end_strategy parameter (e.g., 'graceful', 'exhaustive') controls the execution flow when a model emits both function tools and an output tool in the same response [4][7]. - 'Graceful' ensures function tools are executed even if a final result is found, while 'exhaustive' ensures all tools—including output tools—are executed [4]. - ToolOutput can be used as a marker class to customize output tools [6][7]. Pydantic AI v1.105.0 was released on June 2, 2026, and focused on features like on-demand loading and Grok model support [8]. The concepts of FinishReason and tool management remain central to the framework's operational flow [1][6].

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "get_agent_finish_reason\\(|finish_reason=.*tool" tests src -S

Repository: lightspeed-core/lightspeed-stack

Length of output: 882


🏁 Script executed:

#!/bin/bash
set -euo pipefail
sed -n '200,280p' src/utils/agents/query.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 2918


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "\"tool_call\"|\"tool_calls\"|\"tool_output\"|tool_output|tool_calls" src tests -S

Repository: lightspeed-core/lightspeed-stack

Length of output: 28367


🌐 Web query:

pydantic-ai FinishReason enum values stop length content_filter tool_call error ModelResponse.finish_reason

💡 Result:

In Pydantic AI, the FinishReason type is a TypeAlias representing the reason a model stopped generating a response [1]. It is defined as a Literal string type with the following valid values [1]: stop: The model reached a natural stop point or a provided stop sequence [1][2]. length: The maximum number of tokens specified in the request was reached [1][2]. content_filter: The generation was stopped due to a content filter flag [1][2]. tool_call: The model initiated a tool call [1]. error: The generation ended due to an error (e.g., failure or cancellation) [1][3][4]. The ModelResponse class includes a finish_reason attribute (of type FinishReason | None) that records this information [1]. This field is intended to normalize diverse provider-specific finish reasons into a standardized format aligned with OpenTelemetry conventions [1][5]. Individual model providers (such as OpenAI, Anthropic, and xAI) implement internal mapping logic to convert their native API response finish reasons into these standardized Pydantic AI FinishReason values [3][6][7][8].

Citations:


Potential ValueError when finish_reason is tool_call (or otherwise unknown)

get_agent_finish_reason() constructs AgentFinishReason(response.finish_reason) without a guard. pydantic-ai’s FinishReason includes "tool_call", which is not a member of the local AgentFinishReason enum, so this will raise ValueError instead of mapping to an LCS error.

Add explicit handling for tool_call (or catch ValueError and map unknown finish reasons to a safe fallback like AgentFinishReason.ERROR) rather than directly converting to the enum.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/agents/query.py` around lines 149 - 152, In
get_agent_finish_reason, avoid directly constructing
AgentFinishReason(response.finish_reason) because response.finish_reason can be
"tool_call" (or other unknown values) which will raise ValueError; update the
function to either explicitly check for "tool_call" on response.finish_reason
and return AgentFinishReason.ERROR, or wrap the enum conversion in a try/except
ValueError that returns AgentFinishReason.ERROR as a safe fallback, referencing
response.finish_reason and AgentFinishReason to locate the change.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From pydantic-ai openai processor:

_CHAT_FINISH_REASON_MAP: dict[
    Literal['stop', 'length', 'tool_calls', 'content_filter', 'function_call'], FinishReason
] = {
    'stop': 'stop',
    'length': 'length',
    'tool_calls': 'tool_call',
    'content_filter': 'content_filter',
    'function_call': 'tool_call',
}

_RESPONSES_FINISH_REASON_MAP: dict[Literal['max_output_tokens', 'content_filter'] | ResponseStatus, FinishReason] = {
    'max_output_tokens': 'length',
    'content_filter': 'content_filter',
    'completed': 'stop',
    'cancelled': 'error',
    'failed': 'error',
}

OpenResponses cannot result in tool_call finish reason.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asimurka, thanks for clarifying — you're right. Since this code targets the OpenAI Responses API (not Chat Completions), _RESPONSES_FINISH_REASON_MAP applies, and "tool_call" is not a possible finish reason from that path. All values it can emit — stop, length, content_filter, error — are covered by AgentFinishReason. My concern doesn't apply here.


✏️ Learnings added
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1880
File: src/utils/agents/query.py:149-152
Timestamp: 2026-06-09T07:36:38.363Z
Learning: In `src/utils/agents/query.py` (lightspeed-core/lightspeed-stack), `get_agent_finish_reason` and `AgentFinishReason` enum are designed specifically for the pydantic-ai OpenAI **Responses API** (not Chat Completions). The Responses API `_RESPONSES_FINISH_REASON_MAP` only produces `stop`, `length`, `content_filter`, and `error` as finish reasons — `tool_call` cannot occur from this API path. Therefore, a `ValueError` from `AgentFinishReason(response.finish_reason)` is not a concern in this context.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment thread src/utils/agents/query.py
@asimurka asimurka force-pushed the agent_query_utils branch from 0386bad to dd950ae Compare June 9, 2026 09:45

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/utils/agents/query.py`:
- Around line 74-86: Update the function docstrings in src/utils/agents/query.py
to follow the repo's Google-style docstring headings: replace "Args:" with
"Parameters:", ensure each docstring includes "Returns:" and "Raises:" sections
(and "Attributes:" for any classes), and keep the same descriptions and
parameter names/typing; specifically update the docstring for the top helper
that "Map agent run failures..." and the other helpers at the reported ranges
(lines ~104-112, 136-143, 156-164, 187-196, 228-242, 283-298) so they
consistently use "Parameters:", "Returns:", and "Raises:" headings and conform
to the project's docstring conventions.
- Around line 254-257: The loop over run_result.new_messages() currently
overwrites state.turn_summary.llm_response for each ModelResponse with text;
instead, append each message.text to preserve all fragments: when encountering a
ModelResponse in the loop, check state.turn_summary.llm_response (which may be
None/empty) and concatenate the new message.text (with a separator like a space
or newline) so earlier assistant text is not lost; update the code paths that
set llm_response to perform accumulation rather than assignment.
- Around line 311-313: The code casts responses_params.input to str before
calling agent.run but prepare_input() always returns a str and
retrieve_agent_response() has no production callers, so tighten the contract:
update retrieve_agent_response()'s signature to expect input: str (instead of
str | list[ResponseItem]) or add a lightweight runtime assertion inside
retrieve_agent_response() (e.g., assert isinstance(responses_params.input, str)
or raise TypeError) and then remove the cast at the call site where
build_agent(...) and agent.run(...) are invoked; reference
prepare_responses_params, prepare_input, retrieve_agent_response, build_agent,
and agent.run when making this change so the type narrowing/guard is applied
consistently.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 14fcd2e8-5a1a-4dfc-9846-6c5a167f47ce

📥 Commits

Reviewing files that changed from the base of the PR and between 0386bad and dd950ae.

📒 Files selected for processing (2)
  • src/utils/agents/query.py
  • tests/unit/utils/agents/test_query.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (2)
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

  • tests/unit/utils/agents/test_query.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

  • src/utils/agents/query.py

Comment thread src/utils/agents/query.py
Comment thread src/utils/agents/query.py
Comment thread src/utils/agents/query.py

@tisnik tisnik left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment thread src/utils/agents/query.py Outdated
RuntimeError: Re-raised when ``exc`` is a non-agent ``RuntimeError`` that is
not a recognized context-length failure.
"""
if isinstance(exc, AgentRunError):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be rewritten using pattern matching, but it's a nitpick

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/utils/agents/query.py (1)

211-218: ⚠️ Potential issue | 🟠 Major

Fix llm_calls_total undercount for multi-step agent runs
extract_agent_token_usage() sets TokenCounter.llm_calls = max(usage.requests, 1), but it calls recording.record_llm_call() only once; record_llm_call() increments metrics.llm_calls_total...inc() by +1 with no parameter for the actual count. If RunUsage.requests > 1, llm_calls_total will underreport vs the returned token_usage.llm_calls. Make record_llm_call accept a count (or loop) and pass token_counter.llm_calls.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/agents/query.py` around lines 211 - 218, The code records token
usage once but always increments llm_calls_total by 1, causing undercount when
extract_agent_token_usage sets TokenCounter.llm_calls > 1; update
recording.record_llm_call to accept a count parameter (e.g., count:int) or
support being called multiple times, then call it with token_counter.llm_calls
instead of a single call; ensure record_llm_token_usage still records tokens
once and record_llm_call uses the passed count to increment
metrics.llm_calls_total by that amount (or loop internally), updating the
record_llm_call signature and all call sites accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/utils/agents/query.py`:
- Around line 211-218: The code records token usage once but always increments
llm_calls_total by 1, causing undercount when extract_agent_token_usage sets
TokenCounter.llm_calls > 1; update recording.record_llm_call to accept a count
parameter (e.g., count:int) or support being called multiple times, then call it
with token_counter.llm_calls instead of a single call; ensure
record_llm_token_usage still records tokens once and record_llm_call uses the
passed count to increment metrics.llm_calls_total by that amount (or loop
internally), updating the record_llm_call signature and all call sites
accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a988433f-7e32-4643-82bc-297dc1e3d1ce

📥 Commits

Reviewing files that changed from the base of the PR and between dd950ae and c3d5f18.

📒 Files selected for processing (1)
  • src/utils/agents/query.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: radon
  • GitHub Check: list_outdated_dependencies
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: Pylinter
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

  • src/utils/agents/query.py
🔇 Additional comments (2)
src/utils/agents/query.py (2)

74-86: Still using Args: headings instead of the repo-required Parameters: sections.

This new module still hasn't been aligned with the required Google-style docstring headings. As per coding guidelines, "Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes."

Also applies to: 106-114, 138-145, 158-166, 189-198, 230-244, 285-300

Source: Coding guidelines


256-259: Assistant text is still overwritten instead of accumulated.

If run_result.new_messages() yields multiple text-bearing ModelResponses, earlier fragments are dropped and the summary only keeps the last one.

@jrobertboos jrobertboos left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but some of the e2e tests are failing, is that expected?

@asimurka

Copy link
Copy Markdown
Contributor Author

LGTM but some of the e2e tests are failing, is that expected?

Retried job passed, failing test was unrelated anyway.

@tisnik tisnik merged commit 2bfd657 into lightspeed-core:main Jun 10, 2026
36 of 37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants