LCORE-2311: Fix Skills by jrobertboos · Pull Request #1972 · lightspeed-core/lightspeed-stack

jrobertboos · 2026-06-22T17:58:50Z

Description

Fix Llama Stack conversation continuation by preventing conflicting parameters (conversation and previous_response_id) from being sent together. On continuation turns, messages are trimmed to only new input and previous_response_id is disabled, allowing Llama Stack to reconstruct prior history from the conversation ID. This fix applies to both streaming and non-streaming request paths.

Type of change

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

Assisted-by: Claude Opus 4.6
Generated by: N/A

Related Tickets & Documents

Related Issue # LCORE-2311
Closes # LCORE-2311

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Ran the service locally with Llama Stack backend
Tested non-streaming query requests to verify the request() override correctly trims messages and disables previous_response_id on continuation turns
Tested streaming query requests to verify request_stream() applies the same conversation continuation fix before calling the Responses API
Verified that first-turn requests (no prior ModelResponse) pass through unmodified
Confirmed Llama Stack no longer rejects requests with conflicting conversation + previous_response_id parameters

Summary by CodeRabbit

Bug Fixes
- Fixed Llama Stack conversation continuation behavior to properly handle message history when resuming conversations, ensuring consistent behavior between streaming and non-streaming requests.

coderabbitai · 2026-06-22T17:59:04Z

Walkthrough

LlamaStackResponsesModel gains a _prepare_conversation_continuation helper and a new request override. Both request and request_stream now invoke the helper to trim messages to those after the last ModelResponse with a provider_response_id and to remove openai_previous_response_id from settings when extra_body["conversation"] is present. ModelResponse is added to imports to support the detection logic.

Changes

LlamaStack conversation continuation preprocessing

Layer / File(s)	Summary
Continuation helper, `request` override, and streaming wiring `src/pydantic_ai_lightspeed/llamastack/_model.py`	Adds `ModelResponse` to imports; implements `_prepare_conversation_continuation` that trims the message list to entries after the last `ModelResponse` with a `provider_response_id` and strips `openai_previous_response_id` from settings; adds a `request` override that calls the helper then delegates to `super().request(...)`; wires the same helper call into `request_stream` before the Responses API call.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

lightspeed-core/lightspeed-stack#1936: Modifies request_stream in LlamaStackResponsesModel for streaming event-order filtering, the same method that this PR also changes to add conversation continuation preprocessing.

Suggested reviewers

asimurka
tisnik

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The PR title 'LCORE-2311: Fix Skills' is vague and does not clearly describe the main change. While it references a ticket ID, the phrase 'Fix Skills' is too generic and doesn't convey that this is about Llama Stack conversation continuation behavior or message trimming.	Provide a more descriptive title that clearly indicates the fix is for Llama Stack conversation continuation, such as 'LCORE-2311: Fix Llama Stack conversation continuation parameter conflicts' or 'LCORE-2311: Trim messages and disable previous_response_id on conversation continuation turns'.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

✨ Simplify code

Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/pydantic_ai_lightspeed/llamastack/_model.py`:
- Around line 191-221: The docstrings for the request() method,
_prepare_conversation_continuation() method, and request_stream() method do not
follow Google Python docstring conventions. Add the required sections to each
docstring: a Parameters section documenting each method argument, a Returns
section describing the return value and its type, and a Raises section if the
method raises any exceptions. Ensure the docstrings are complete and properly
formatted according to Google Python style guidelines.
- Line 202: The `request()` method at line 202 is not forwarding the
`run_context` parameter to the parent implementation call via
`super().request()`. Similarly, the `request_stream()` method at lines 273-274
is not forwarding `run_context` when calling `_responses_create()`. Update both
methods to include `run_context` as an argument when delegating to parent
methods: add `run_context=run_context` to the `super().request()` call in the
`request()` method, and pass `run_context` to the `_responses_create()` method
in `request_stream()` if that method accepts it as a parameter.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0dc72a39-8804-4f28-a420-4d0ef868e8ed

📥 Commits

Reviewing files that changed from the base of the PR and between 9ff72ff and 43e90db.

📒 Files selected for processing (1)

src/pydantic_ai_lightspeed/llamastack/_model.py

📜 Review details

⏰ Context from checks skipped due to timeout. (2)

GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-0-6-on-pull-request

🧰 Additional context used

📓 Path-based instructions (1)

src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

src/pydantic_ai_lightspeed/llamastack/_model.py

🔇 Additional comments (2)

src/pydantic_ai_lightspeed/llamastack/_model.py (2)

29-29: LGTM!

222-243: LGTM!

Also applies to: 268-271

coderabbitai · 2026-06-23T16:39:29Z

+        """Non-streaming request with Llama Stack conversation continuation fix.
+
+        Llama Stack rejects requests containing both ``conversation`` and
+        ``previous_response_id``.  On continuation turns (where a prior
+        ``ModelResponse`` exists), we trim messages to only the new input and
+        disable ``previous_response_id`` so that only ``conversation`` is sent.
+        This ensures all responses are persisted to the conversation.
+        """
+        messages, model_settings = self._prepare_conversation_continuation(
+            messages, model_settings
+        )
+        return await super().request(messages, model_settings, model_request_parameters)
+
+    def _prepare_conversation_continuation(
+        self,
+        messages: list[ModelMessage],
+        model_settings: ModelSettings | None,
+    ) -> tuple[list[ModelMessage], ModelSettings | None]:
+        """Trim messages and disable previous_response_id for conversation continuations.
+
+        Llama Stack rejects requests with both ``previous_response_id`` and
+        ``conversation``. When ``conversation`` is in ``extra_body`` and there's
+        already a ModelResponse in the history (a continuation turn), we:
+
+        1. Trim messages to only those AFTER the last ModelResponse (new input only)
+        2. Disable ``openai_previous_response_id`` so pydantic-ai won't resolve one
+
+        This means Llama Stack receives ``conversation`` (for persistence) plus only
+        the new input items. Llama Stack reconstructs prior history from the
+        conversation and appends the new input correctly.
+        """


📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Add the required Google docstring sections to the new/updated methods.

The new request() and _prepare_conversation_continuation() docstrings, and the updated request_stream() docstring, do not include the required parameter/return/raise sections.

As per coding guidelines, “Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes.”

Also applies to: 253-266

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/pydantic_ai_lightspeed/llamastack/_model.py` around lines 191 - 221, The docstrings for the request() method, _prepare_conversation_continuation() method, and request_stream() method do not follow Google Python docstring conventions. Add the required sections to each docstring: a Parameters section documenting each method argument, a Returns section describing the return value and its type, and a Raises section if the method raises any exceptions. Ensure the docstrings are complete and properly formatted according to Google Python style guidelines.

Source: Coding guidelines

coderabbitai · 2026-06-23T16:39:29Z

+        messages, model_settings = self._prepare_conversation_continuation(
+            messages, model_settings
+        )
+        return await super().request(messages, model_settings, model_request_parameters)


🎯 Functional Correctness | 🟠 Major

❓ Verification inconclusive

🏁 Script executed:

#!/bin/bash # Description: Confirm OpenAIResponsesModel.request accepts run_context in the dependency version under review. python - <<'PY' import inspect from pydantic_ai.models.openai import OpenAIResponsesModel signature = inspect.signature(OpenAIResponsesModel.request) print(signature) assert "run_context" in signature.parameters PY

Repository: lightspeed-core/lightspeed-stack

Length of output: 294

🏁 Script executed:

#!/bin/bash # Explore repository structure git ls-files | grep -E "(pyproject|setup|requirements)" | head -20

Repository: lightspeed-core/lightspeed-stack

Length of output: 476

🏁 Script executed:

#!/bin/bash # Find and read the file in question find . -name "_model.py" -path "*/llamastack/*" | head -5

Repository: lightspeed-core/lightspeed-stack

Length of output: 220

🏁 Script executed:

#!/bin/bash # Get more context on the repository structure fd -t f "\.py$" src/pydantic_ai_lightspeed/llamastack/ | head -20

Repository: lightspeed-core/lightspeed-stack

Length of output: 371

🏁 Script executed:

#!/bin/bash # Read the _model.py file to understand the context cat -n src/pydantic_ai_lightspeed/llamastack/_model.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 15057

🏁 Script executed:

#!/bin/bash # Check the pyproject.toml for pydantic_ai version/dependency info head -100 pyproject.toml && echo "..." && tail -100 pyproject.toml

Repository: lightspeed-core/lightspeed-stack

Length of output: 5570

🏁 Script executed:

#!/bin/bash # Search for run_context usage patterns in the codebase to understand if it's important rg "run_context" --type py -B 2 -A 2 | head -80

Repository: lightspeed-core/lightspeed-stack

Length of output: 170

🏁 Script executed:

#!/bin/bash # Check if there are any test files that might reveal the expected behavior find . -name "*test*" -type f -path "*llamastack*" | head -10

Repository: lightspeed-core/lightspeed-stack

Length of output: 299

🏁 Script executed:

#!/bin/bash # Look for documentation or comments about run_context in the OpenAI model context rg "RunContext\|run_context" src/pydantic_ai_lightspeed/ --type py

Repository: lightspeed-core/lightspeed-stack

Length of output: 170

🏁 Script executed:

#!/bin/bash # Search more broadly in the entire codebase for run_context without filtering rg "run_context" --type py src/

Repository: lightspeed-core/lightspeed-stack

Length of output: 576

🏁 Script executed:

#!/bin/bash # Check the test files to see if they use run_context cat -n ./tests/unit/pydantic_ai_lightspeed/llamastack/test_provider.py | head -100

Repository: lightspeed-core/lightspeed-stack

Length of output: 4722

🏁 Script executed:

#!/bin/bash # Look at how request() is called elsewhere in the codebase rg "\.request\(" --type py -B 1 -A 1 src/ | head -50

Repository: lightspeed-core/lightspeed-stack

Length of output: 414

🏁 Script executed:

#!/bin/bash # Check what the request_stream method does with run_context and _responses_create signature ast-grep outline src/pydantic_ai_lightspeed/llamastack/_model.py --view expanded

Repository: lightspeed-core/lightspeed-stack

Length of output: 834

🏁 Script executed:

#!/bin/bash # Search for _responses_create calls to see the pattern of parameter passing rg "_responses_create" --type py -B 2 -A 2 src/

Repository: lightspeed-core/lightspeed-stack

Length of output: 631

🏁 Script executed:

#!/bin/bash # Check if there are any comments or documentation about the base class contract rg "OpenAIResponsesModel" --type py -B 3 -A 3 src/pydantic_ai_lightspeed/llamastack/

Repository: lightspeed-core/lightspeed-stack

Length of output: 1908

🏁 Script executed:

#!/bin/bash # Check the comment "unused-argument" to see if there's any pattern or explanation rg "unused-argument" --type py -B 5 -A 5 src/pydantic_ai_lightspeed/llamastack/

Repository: lightspeed-core/lightspeed-stack

Length of output: 2202

🏁 Script executed:

#!/bin/bash # Look for any integration tests that might call these methods with run_context find . -name "*.py" -path "*/test*" -exec grep -l "LlamaStackResponsesModel\|request.*run_context" {} \;

Repository: lightspeed-core/lightspeed-stack

Length of output: 170

🏁 Script executed:

#!/bin/bash # Check if there's a base class method signature we can infer from parent class imports rg "from pydantic_ai" src/pydantic_ai_lightspeed/llamastack/_model.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 525

🏁 Script executed:

#!/bin/bash # Search for any comments or documentation about run_context in pydantic_ai usage rg "run_context" --type py -B 5 -A 5 src/ tests/

Repository: lightspeed-core/lightspeed-stack

Length of output: 4151

🏁 Script executed:

#!/bin/bash # Check the git history or any PR comments related to this file git log --oneline src/pydantic_ai_lightspeed/llamastack/_model.py | head -10

Repository: lightspeed-core/lightspeed-stack

Length of output: 199

🏁 Script executed:

#!/bin/bash # Look for usage patterns in request_stream - does it forward run_context anywhere? sed -n '246,310p' src/pydantic_ai_lightspeed/llamastack/_model.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 2860

🏁 Script executed:

#!/bin/bash # Check if there's a requirements file or lockfile that pins pydantic-ai version cat pyproject.toml | grep -A 2 -B 2 "pydantic-ai"

Repository: lightspeed-core/lightspeed-stack

Length of output: 284

🏁 Script executed:

#!/bin/bash # Look for any other usages of pydantic_ai request signature in the codebase rg "super()\s*\.request\(" --type py -B 3 -A 3

Repository: lightspeed-core/lightspeed-stack

Length of output: 170

🏁 Script executed:

#!/bin/bash # Check if request_stream also has the same issue with run_context not being forwarded rg "request_stream" --type py -B 2 -A 10 src/pydantic_ai_lightspeed/llamastack/ | grep -A 20 "async def request_stream"

Repository: lightspeed-core/lightspeed-stack

Length of output: 1254

🏁 Script executed:

#!/bin/bash # Check pydantic-ai documentation or source to understand run_context usage # Since we can't install it, let's see if there are any comments about why run_context is marked as unused grep -n "unused-argument" src/pydantic_ai_lightspeed/llamastack/_model.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 301

🏁 Script executed:

#!/bin/bash # Check if there are any comments in the class or methods about the design decision sed -n '176,210p' src/pydantic_ai_lightspeed/llamastack/_model.py | cat -n

Repository: lightspeed-core/lightspeed-stack

Length of output: 2035

🏁 Script executed:

#!/bin/bash # Check the behavior more carefully - look at what the base class might need from run_context # by examining how it's documented in the docstring grep -A 20 "def request" src/pydantic_ai_lightspeed/llamastack/_model.py | head -30

Repository: lightspeed-core/lightspeed-stack

Length of output: 1571

🏁 Script executed:

#!/bin/bash # Check if there's any indication in pydantic-ai imports about versioning or documentation rg "pydantic.ai|pydantic_ai" pyproject.toml

Repository: lightspeed-core/lightspeed-stack

Length of output: 231

Forward run_context when delegating to parent methods.

Both request() and request_stream() accept run_context, but neither forwards it when calling the parent implementation or helper methods, which could cause context loss if the base OpenAIResponsesModel methods consume it.

🐛 Proposed fixes

For request() at line 202:

- return await super().request(messages, model_settings, model_request_parameters) + return await super().request( + messages, + model_settings, + model_request_parameters, + run_context=run_context, + )

For request_stream() at lines 273–274, if _responses_create accepts run_context:

response = await self._responses_create( - messages, True, model_settings_cast, model_request_parameters + messages, True, model_settings_cast, model_request_parameters, run_context )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/pydantic_ai_lightspeed/llamastack/_model.py` at line 202, The `request()` method at line 202 is not forwarding the `run_context` parameter to the parent implementation call via `super().request()`. Similarly, the `request_stream()` method at lines 273-274 is not forwarding `run_context` when calling `_responses_create()`. Update both methods to include `run_context` as an argument when delegating to parent methods: add `run_context=run_context` to the `super().request()` call in the `request()` method, and pass `run_context` to the `_responses_create()` method in `request_stream()` if that method accepts it as a parameter.

potential fix

afab5ad

jrobertboos force-pushed the fix-skills branch from 949a8de to afab5ad Compare June 23, 2026 00:05

jrobertboos added 2 commits June 22, 2026 20:19

fixed style

4a35644

fix streaming_query

43e90db

jrobertboos marked this pull request as ready for review June 23, 2026 16:33

coderabbitai Bot reviewed Jun 23, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LCORE-2311: Fix Skills#1972

LCORE-2311: Fix Skills#1972
jrobertboos wants to merge 3 commits into
lightspeed-core:mainfrom
jrobertboos:fix-skills

jrobertboos commented Jun 22, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 22, 2026 •

edited

Loading

❌ Failed checks (1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Uh oh!

coderabbitai Bot Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jrobertboos commented Jun 22, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Tools used to create PR

Related Tickets & Documents

Checklist before requesting a review

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jrobertboos commented Jun 22, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 22, 2026 •

edited

Loading