LCORE-1270: E2e tests responses tools#1414
Conversation
|
Caution Review failedPull request was closed or merged during review WalkthroughAdds end-to-end Gherkin scenarios and new Behave step helpers to test Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Do not close, needs rebase at the top of #1412 |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/responses.md`:
- Around line 283-284: Update the docs to state that the `allowed_tools` filter
is applied to both request-supplied `tools` and the implicit tool set produced
by `prepare_tools()` (including the LCORE tool set), i.e., `allowed_tools`
filters the prepared/implicit tools when the request omits `tools` as well as
the explicit `tools` array; clarify that `mode` remains `"auto"` or `"required"`
and `tools` is a list of key-valued filters that apply to all candidate tools,
including LCORE.
In `@src/utils/responses.py`:
- Around line 493-505: The helper mcp_strip_name_from_allowlist_entries
currently removes "name" for any entry with type "mcp", which can turn
{"type":"mcp","name":"foo"} without a server_label into a permissive
{"type":"mcp"}; change the logic so you only remove the "name" when the entry is
an MCP and also contains a "server_label" (i.e., check new_entry.get("type") ==
"mcp" and new_entry.get("server_label") is not None) so malformed/narrow MCP
filters without server_label are left intact (and thus handled/ignored by
group_mcp_tools_by_server/filter_tools_by_allowed_entries) instead of widening
the allowlist.
- Around line 1515-1517: The early return when tool_choice ==
ToolChoiceMode.none is currently returning None for all three outputs and clears
explicit vector_store_ids; change the behavior so disabling tools only disables
tool-related outputs (e.g., tool config and tool inputs) but preserves and
returns the original vector_store_ids so build_rag_context still receives
file_search.vector_store_ids; locate the check against ToolChoiceMode.none
(variable tool_choice, enum ToolChoiceMode) and adjust the returned tuple to
keep the third value (vector_store_ids) instead of None.
In `@tests/e2e/features/responses.feature`:
- Around line 555-575: Update the scenario's request body so the model is forced
to attempt a file search: inside the JSON passed to the step "When I use
\"responses\" to ask question with authorization header" (the request that
contains "tool_choice" and "instructions"), change the "instructions" value to
explicitly require the file_search tool (e.g., include a sentence like "You MUST
use the file_search tool to answer this question.") so the negative check on
"responses output should not include an item with type \"file_search_call\""
actually verifies that the allowlist filtering (the "tool_choice" -> "tools":
[{"type":"mcp"}]) prevented the model from using file_search.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f40a15ef-64e0-4d60-8dbd-d81eebc4eb50
📒 Files selected for processing (5)
docs/responses.mdsrc/utils/responses.pytests/e2e/features/responses.featuretests/e2e/features/steps/llm_query_response.pytests/unit/utils/test_responses.py
c7b90c3 to
0b1f3ba
Compare
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
ae2d522 to
7265e3a
Compare
7265e3a to
ba99db0
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/e2e/features/responses.feature`:
- Around line 555-598: Tests currently only filter MCP by type and miss
scenarios that whitelist a specific MCP tool name; add two new cases (auto and
required) under the existing responses.feature scenarios that use tool_choice
with type "allowed_tools" and tools set to
[{"type":"mcp","name":"<concrete_tool_name>"}] (replace <concrete_tool_name>
with a real MCP tool identifier used in the suite) and assert that only
invocations for that named MCP occur (for auto: responses output should include
only the specified MCP invocation and no other MCP/file_search_call items; for
required: responses output should contain no other tool invocation item types if
the name is invalid). Make sure to reference the same request structure keys
("input", "model", "tool_choice") and reuse the existing assertions about status
200 and token metric increases.
In `@tests/e2e/features/steps/llm_query_response.py`:
- Around line 234-253: The test check_fragments_in_responses_output_text
currently looks for a non-existent "output_text" field; update it to use the
Responses aggregated text field "n": ensure you check that "n" exists in
response_json (replace the "output_text" existence assert), set output_text =
response_json["n"], and then run the same fragment membership assertions against
that value in the loop inside check_fragments_in_responses_output_text.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7a436334-319c-43a7-804b-ca5e9ac3c8f3
📒 Files selected for processing (2)
tests/e2e/features/responses.featuretests/e2e/features/steps/llm_query_response.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (4)
tests/e2e/features/**/*.feature
📄 CodeRabbit inference engine (AGENTS.md)
Use behave (BDD) framework with Gherkin feature files for end-to-end tests
Files:
tests/e2e/features/responses.feature
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use absolute imports for internal modules from the Lightspeed Core Stack (e.g.,
from authentication import get_auth_dependency)
Files:
tests/e2e/features/steps/llm_query_response.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Maintain test coverage of at least 60% for unit tests and 10% for integration tests
Files:
tests/e2e/features/steps/llm_query_response.py
tests/e2e/features/steps/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Define E2E test step implementations in
tests/e2e/features/steps/directory
Files:
tests/e2e/features/steps/llm_query_response.py
🧠 Learnings (2)
📚 Learning: 2026-02-25T07:46:39.608Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:39.608Z
Learning: In the lightspeed-stack codebase, src/models/requests.py uses OpenAIResponseInputTool as Tool while src/models/responses.py uses OpenAIResponseTool as Tool. This type difference is intentional - input tools and output/response tools have different schemas in llama-stack-api.
Applied to files:
tests/e2e/features/responses.feature
📚 Learning: 2025-08-25T09:05:18.350Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 445
File: tests/e2e/features/health.feature:50-53
Timestamp: 2025-08-25T09:05:18.350Z
Learning: The step definition for ignoring specific fields in response body comparisons is implemented in tests/e2e/features/steps/common_http.py at line 175 as `then('The body of the response, ignoring the "{field}" field, is the following')`.
Applied to files:
tests/e2e/features/steps/llm_query_response.py
🔇 Additional comments (2)
tests/e2e/features/responses.feature (1)
432-554: Good tool_choice mode coverage expansion.These scenarios add solid end-to-end checks for
none/auto/required, explicit file-search selection, output item-type assertions, response fragment checks, and token-metric movement.tests/e2e/features/steps/llm_query_response.py (1)
16-84: Nice addition of reusable Responses output-type assertions.The shared collector + dedicated
@thensteps improve readability and scenario intent for tool invocation checks.
Description
This PR adds E2E tests for tool choices in responses endpoint
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit