LCORE-1270: E2e tests responses tools by asimurka · Pull Request #1414 · lightspeed-core/lightspeed-stack

asimurka · 2026-03-27T15:52:58Z

Description

This PR adds E2E tests for tool choices in responses endpoint

Type of change

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

Tests
- Added end-to-end coverage for Responses API behavior around tool selection, verifying when tool invocations occur or not, that response text contains expected fragments, and that token metrics change across scenarios.

coderabbitai · 2026-03-27T15:53:15Z

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

Adds end-to-end Gherkin scenarios and new Behave step helpers to test responses endpoint tool_choice behaviors, asserting HTTP 200, presence/absence of tool-invocation output items, expected output text fragments, and token metric changes across multiple tool-choice configurations.

Changes

Cohort / File(s)	Summary
Responses endpoint tool-choice E2E scenarios `tests/e2e/features/responses.feature`	Added ~169 lines of Gherkin scenarios covering `tool_choice` variations (`"none"`, `"auto"`, `"required"`, explicit `{"type":"file_search"}`) and `allowed_tools` filters; assertions include HTTP 200, existence/non-existence of tool invocation output item types (e.g., `file_search_call`), expected output text fragments when file search is used, and token metric increases.
Test validation helpers and steps `tests/e2e/features/steps/llm_query_response.py`	Added `_RESPONSE_TOOL_OUTPUT_ITEM_TYPES` constant and `_collect_output_item_types(response_body: dict[str, Any]) -> list[str]`; added Behave `@then` steps to assert: no tool invocation item types present, presence/absence of a specific `output[*].type`, presence of one of multiple types, and substring fragment checks against `response_json["output_text"]`. Additional guards ensure `context.response` is present and parse Responses JSON structure.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Linked Issues check	⚠️ Warning	The PR implements most stated objectives: tool_choice modes (none, auto, required) are tested, tool invocation presence/absence is asserted, output_text fragments for file_search are verified, and token metrics are captured. However, reviewers note a missing scenario for allowed_tools name-based filtering of specific MCP tool names.	Add test scenario(s) verifying that allowed_tools can filter MCP invocations by specific MCP tool names in auto and required modes, as recommended in the review guidance.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'LCORE-1270: E2e tests responses tools' directly references the issue number and clearly indicates the PR adds end-to-end tests for tool choices in the responses endpoint.
Out of Scope Changes check	✅ Passed	All changes are scoped to end-to-end tests for the responses endpoint's tool choice behavior. The additions to the feature file and test step helpers directly support the PR objectives without unrelated modifications.
Docstring Coverage	✅ Passed	Docstring coverage is 97.92% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

✨ Simplify code

Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

asimurka · 2026-03-27T15:58:19Z

Do not close, needs rebase at the top of #1412

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/responses.md`:
- Around line 283-284: Update the docs to state that the `allowed_tools` filter
is applied to both request-supplied `tools` and the implicit tool set produced
by `prepare_tools()` (including the LCORE tool set), i.e., `allowed_tools`
filters the prepared/implicit tools when the request omits `tools` as well as
the explicit `tools` array; clarify that `mode` remains `"auto"` or `"required"`
and `tools` is a list of key-valued filters that apply to all candidate tools,
including LCORE.

In `@src/utils/responses.py`:
- Around line 493-505: The helper mcp_strip_name_from_allowlist_entries
currently removes "name" for any entry with type "mcp", which can turn
{"type":"mcp","name":"foo"} without a server_label into a permissive
{"type":"mcp"}; change the logic so you only remove the "name" when the entry is
an MCP and also contains a "server_label" (i.e., check new_entry.get("type") ==
"mcp" and new_entry.get("server_label") is not None) so malformed/narrow MCP
filters without server_label are left intact (and thus handled/ignored by
group_mcp_tools_by_server/filter_tools_by_allowed_entries) instead of widening
the allowlist.
- Around line 1515-1517: The early return when tool_choice ==
ToolChoiceMode.none is currently returning None for all three outputs and clears
explicit vector_store_ids; change the behavior so disabling tools only disables
tool-related outputs (e.g., tool config and tool inputs) but preserves and
returns the original vector_store_ids so build_rag_context still receives
file_search.vector_store_ids; locate the check against ToolChoiceMode.none
(variable tool_choice, enum ToolChoiceMode) and adjust the returned tuple to
keep the third value (vector_store_ids) instead of None.

In `@tests/e2e/features/responses.feature`:
- Around line 555-575: Update the scenario's request body so the model is forced
to attempt a file search: inside the JSON passed to the step "When I use
\"responses\" to ask question with authorization header" (the request that
contains "tool_choice" and "instructions"), change the "instructions" value to
explicitly require the file_search tool (e.g., include a sentence like "You MUST
use the file_search tool to answer this question.") so the negative check on
"responses output should not include an item with type \"file_search_call\""
actually verifies that the allowlist filtering (the "tool_choice" -> "tools":
[{"type":"mcp"}]) prevented the model from using file_search.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f40a15ef-64e0-4d60-8dbd-d81eebc4eb50

📥 Commits

Reviewing files that changed from the base of the PR and between e184541 and c7b90c3.

📒 Files selected for processing (5)

docs/responses.md
src/utils/responses.py
tests/e2e/features/responses.feature
tests/e2e/features/steps/llm_query_response.py
tests/unit/utils/test_responses.py

tisnik · 2026-03-28T22:11:34Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶 1406 - Partially compliant Compliant requirements: Add end-to-end tests for tool configuration and restrictions in the responses endpoint Cover `tool_choice` modes none, auto, required Assert tool invocation items presence/absence Verify `output_text` fragments for file_search scenarios Capture token metrics in tests Non-compliant requirements: End-to-end tests for `allowed_tools` filtering of MCP tools by specific `name` are missing Requires further human verification: (none)
⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review Missing Test There is no end-to-end scenario verifying that `allowed_tools` with specific MCP tool `name` values restricts MCP invocations to those names. Without it, the MCP name-filtering behavior added in code isn’t validated in e2e tests. Scenario: Check if responses endpoint with allowed tools in automatic mode answers knowledge question using file search Given The system is in default state And I set the Authorization header to Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6Ikpva And I capture the current token metrics When I use "responses" to ask question with authorization header """ { "input": "What is the title of the article from Paul?", "model": "{PROVIDER}/{MODEL}", "stream": false, "instructions": "You are an assistant. You MUST use the file_search tool to answer. Answer in lowercase.", "tool_choice": { "type": "allowed_tools", "mode": "auto", "tools": [{"type": "file_search"}] } } """ Then The status code of the response is 200 And The responses output should include an item with type "file_search_call" And The responses output_text should contain following fragments \| Fragments in LLM response \| \| great work \| And The token metrics should have increased Scenario: Check if responses endpoint with allowed tools in required mode invokes file search for a basic question Given The system is in default state And I set the Authorization header to Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6Ikpva And I capture the current token metrics When I use "responses" to ask question with authorization header """ { "input": "Hello world!", "model": "{PROVIDER}/{MODEL}", "stream": false, "tool_choice": { "type": "allowed_tools", "mode": "required", "tools": [{"type": "file_search"}] } } """ Then The status code of the response is 200 And The responses output should include an item with type "file_search_call" And The token metrics should have increased

radofuchs

LGTM

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/e2e/features/responses.feature`:
- Around line 555-598: Tests currently only filter MCP by type and miss
scenarios that whitelist a specific MCP tool name; add two new cases (auto and
required) under the existing responses.feature scenarios that use tool_choice
with type "allowed_tools" and tools set to
[{"type":"mcp","name":"<concrete_tool_name>"}] (replace <concrete_tool_name>
with a real MCP tool identifier used in the suite) and assert that only
invocations for that named MCP occur (for auto: responses output should include
only the specified MCP invocation and no other MCP/file_search_call items; for
required: responses output should contain no other tool invocation item types if
the name is invalid). Make sure to reference the same request structure keys
("input", "model", "tool_choice") and reuse the existing assertions about status
200 and token metric increases.

In `@tests/e2e/features/steps/llm_query_response.py`:
- Around line 234-253: The test check_fragments_in_responses_output_text
currently looks for a non-existent "output_text" field; update it to use the
Responses aggregated text field "n": ensure you check that "n" exists in
response_json (replace the "output_text" existence assert), set output_text =
response_json["n"], and then run the same fragment membership assertions against
that value in the loop inside check_fragments_in_responses_output_text.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7a436334-319c-43a7-804b-ca5e9ac3c8f3

📥 Commits

Reviewing files that changed from the base of the PR and between c7b90c3 and ba99db0.

📒 Files selected for processing (2)

tests/e2e/features/responses.feature
tests/e2e/features/steps/llm_query_response.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request

🧰 Additional context used

📓 Path-based instructions (4)

tests/e2e/features/**/*.feature

📄 CodeRabbit inference engine (AGENTS.md)

Use behave (BDD) framework with Gherkin feature files for end-to-end tests

Files:

tests/e2e/features/responses.feature

**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use absolute imports for internal modules from the Lightspeed Core Stack (e.g., from authentication import get_auth_dependency)

Files:

tests/e2e/features/steps/llm_query_response.py

tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Maintain test coverage of at least 60% for unit tests and 10% for integration tests

Files:

tests/e2e/features/steps/llm_query_response.py

tests/e2e/features/steps/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Define E2E test step implementations in tests/e2e/features/steps/ directory

Files:

tests/e2e/features/steps/llm_query_response.py

🧠 Learnings (2)

📚 Learning: 2026-02-25T07:46:39.608Z

Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:39.608Z
Learning: In the lightspeed-stack codebase, src/models/requests.py uses OpenAIResponseInputTool as Tool while src/models/responses.py uses OpenAIResponseTool as Tool. This type difference is intentional - input tools and output/response tools have different schemas in llama-stack-api.

Applied to files:

tests/e2e/features/responses.feature

📚 Learning: 2025-08-25T09:05:18.350Z

Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 445
File: tests/e2e/features/health.feature:50-53
Timestamp: 2025-08-25T09:05:18.350Z
Learning: The step definition for ignoring specific fields in response body comparisons is implemented in tests/e2e/features/steps/common_http.py at line 175 as `then('The body of the response, ignoring the "{field}" field, is the following')`.

Applied to files:

tests/e2e/features/steps/llm_query_response.py

🔇 Additional comments (2)

tests/e2e/features/responses.feature (1)

432-554: Good tool_choice mode coverage expansion.

These scenarios add solid end-to-end checks for none/auto/required, explicit file-search selection, output item-type assertions, response fragment checks, and token-metric movement.

tests/e2e/features/steps/llm_query_response.py (1)

16-84: Nice addition of reusable Responses output-type assertions.

The shared collector + dedicated @then steps improve readability and scenario intent for tool invocation checks.

tisnik

LGTM

asimurka marked this pull request as draft March 27, 2026 15:53

asimurka changed the title ~~E2e tests responses tools~~ LCORE-1270: E2e tests responses tools Mar 27, 2026

coderabbitai Bot reviewed Mar 27, 2026

View reviewed changes

Comment thread docs/responses.md

Comment thread src/utils/responses.py

Comment thread src/utils/responses.py Outdated

Comment thread tests/e2e/features/responses.feature Outdated

asimurka force-pushed the e2e_tests_responses_tools branch from c7b90c3 to 0b1f3ba Compare March 27, 2026 16:38

tisnik added the Review effort 4/5 label Mar 28, 2026

Added e2e tests for tool choices in responses endpoint

28227a1

asimurka force-pushed the e2e_tests_responses_tools branch 2 times, most recently from ae2d522 to 7265e3a Compare March 31, 2026 14:20

Adjust tools resolution

ba99db0

asimurka force-pushed the e2e_tests_responses_tools branch from 7265e3a to ba99db0 Compare March 31, 2026 14:20

asimurka marked this pull request as ready for review March 31, 2026 14:21

asimurka requested a review from radofuchs March 31, 2026 14:21

radofuchs approved these changes Mar 31, 2026

View reviewed changes

coderabbitai Bot reviewed Mar 31, 2026

View reviewed changes

Comment thread tests/e2e/features/responses.feature

Comment thread tests/e2e/features/steps/llm_query_response.py

tisnik approved these changes Mar 31, 2026

View reviewed changes

black fix

506a86f

tisnik merged commit 18b9a10 into lightspeed-core:main Mar 31, 2026
19 of 23 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

LCORE-1270: E2e tests responses tools#1414

LCORE-1270: E2e tests responses tools#1414
tisnik merged 3 commits into
lightspeed-core:mainfrom
asimurka:e2e_tests_responses_tools

asimurka commented Mar 27, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Mar 27, 2026 •

edited

Loading

Review failed

❌ Failed checks (1 warning)

Uh oh!

asimurka commented Mar 27, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tisnik commented Mar 28, 2026

Uh oh!

radofuchs left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

tisnik left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

asimurka commented Mar 27, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Tools used to create PR

Related Tickets & Documents

Checklist before requesting a review

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Mar 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

asimurka commented Mar 27, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tisnik commented Mar 28, 2026

PR Reviewer Guide 🔍

Uh oh!

radofuchs left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

tisnik left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

asimurka commented Mar 27, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 27, 2026 •

edited

Loading