Skip to content

LCORE-2311: Fix Skills#1972

Open
jrobertboos wants to merge 3 commits into
lightspeed-core:mainfrom
jrobertboos:fix-skills
Open

LCORE-2311: Fix Skills#1972
jrobertboos wants to merge 3 commits into
lightspeed-core:mainfrom
jrobertboos:fix-skills

Conversation

@jrobertboos

@jrobertboos jrobertboos commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Description

Fix Llama Stack conversation continuation by preventing conflicting parameters (conversation and previous_response_id) from being sent together. On continuation turns, messages are trimmed to only new input and previous_response_id is disabled, allowing Llama Stack to reconstruct prior history from the conversation ID. This fix applies to both streaming and non-streaming request paths.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude Opus 4.6
  • Generated by: N/A

Related Tickets & Documents

  • Related Issue # LCORE-2311
  • Closes # LCORE-2311

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Ran the service locally with Llama Stack backend
  • Tested non-streaming query requests to verify the request() override correctly trims messages and disables previous_response_id on continuation turns
  • Tested streaming query requests to verify request_stream() applies the same conversation continuation fix before calling the Responses API
  • Verified that first-turn requests (no prior ModelResponse) pass through unmodified
  • Confirmed Llama Stack no longer rejects requests with conflicting conversation + previous_response_id parameters

Summary by CodeRabbit

  • Bug Fixes
    • Fixed Llama Stack conversation continuation behavior to properly handle message history when resuming conversations, ensuring consistent behavior between streaming and non-streaming requests.

@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

LlamaStackResponsesModel gains a _prepare_conversation_continuation helper and a new request override. Both request and request_stream now invoke the helper to trim messages to those after the last ModelResponse with a provider_response_id and to remove openai_previous_response_id from settings when extra_body["conversation"] is present. ModelResponse is added to imports to support the detection logic.

Changes

LlamaStack conversation continuation preprocessing

Layer / File(s) Summary
Continuation helper, request override, and streaming wiring
src/pydantic_ai_lightspeed/llamastack/_model.py
Adds ModelResponse to imports; implements _prepare_conversation_continuation that trims the message list to entries after the last ModelResponse with a provider_response_id and strips openai_previous_response_id from settings; adds a request override that calls the helper then delegates to super().request(...); wires the same helper call into request_stream before the Responses API call.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • lightspeed-core/lightspeed-stack#1936: Modifies request_stream in LlamaStackResponsesModel for streaming event-order filtering, the same method that this PR also changes to add conversation continuation preprocessing.

Suggested reviewers

  • asimurka
  • tisnik
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The PR title 'LCORE-2311: Fix Skills' is vague and does not clearly describe the main change. While it references a ticket ID, the phrase 'Fix Skills' is too generic and doesn't convey that this is about Llama Stack conversation continuation behavior or message trimming. Provide a more descriptive title that clearly indicates the fix is for Llama Stack conversation continuation, such as 'LCORE-2311: Fix Llama Stack conversation continuation parameter conflicts' or 'LCORE-2311: Trim messages and disable previous_response_id on conversation continuation turns'.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@jrobertboos jrobertboos marked this pull request as ready for review June 23, 2026 16:33

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/pydantic_ai_lightspeed/llamastack/_model.py`:
- Around line 191-221: The docstrings for the request() method,
_prepare_conversation_continuation() method, and request_stream() method do not
follow Google Python docstring conventions. Add the required sections to each
docstring: a Parameters section documenting each method argument, a Returns
section describing the return value and its type, and a Raises section if the
method raises any exceptions. Ensure the docstrings are complete and properly
formatted according to Google Python style guidelines.
- Line 202: The `request()` method at line 202 is not forwarding the
`run_context` parameter to the parent implementation call via
`super().request()`. Similarly, the `request_stream()` method at lines 273-274
is not forwarding `run_context` when calling `_responses_create()`. Update both
methods to include `run_context` as an argument when delegating to parent
methods: add `run_context=run_context` to the `super().request()` call in the
`request()` method, and pass `run_context` to the `_responses_create()` method
in `request_stream()` if that method accepts it as a parameter.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0dc72a39-8804-4f28-a420-4d0ef868e8ed

📥 Commits

Reviewing files that changed from the base of the PR and between 9ff72ff and 43e90db.

📒 Files selected for processing (1)
  • src/pydantic_ai_lightspeed/llamastack/_model.py
📜 Review details
⏰ Context from checks skipped due to timeout. (2)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-0-6-on-pull-request
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

  • src/pydantic_ai_lightspeed/llamastack/_model.py
🔇 Additional comments (2)
src/pydantic_ai_lightspeed/llamastack/_model.py (2)

29-29: LGTM!


222-243: LGTM!

Also applies to: 268-271

Comment on lines +191 to +221
"""Non-streaming request with Llama Stack conversation continuation fix.

Llama Stack rejects requests containing both ``conversation`` and
``previous_response_id``. On continuation turns (where a prior
``ModelResponse`` exists), we trim messages to only the new input and
disable ``previous_response_id`` so that only ``conversation`` is sent.
This ensures all responses are persisted to the conversation.
"""
messages, model_settings = self._prepare_conversation_continuation(
messages, model_settings
)
return await super().request(messages, model_settings, model_request_parameters)

def _prepare_conversation_continuation(
self,
messages: list[ModelMessage],
model_settings: ModelSettings | None,
) -> tuple[list[ModelMessage], ModelSettings | None]:
"""Trim messages and disable previous_response_id for conversation continuations.

Llama Stack rejects requests with both ``previous_response_id`` and
``conversation``. When ``conversation`` is in ``extra_body`` and there's
already a ModelResponse in the history (a continuation turn), we:

1. Trim messages to only those AFTER the last ModelResponse (new input only)
2. Disable ``openai_previous_response_id`` so pydantic-ai won't resolve one

This means Llama Stack receives ``conversation`` (for persistence) plus only
the new input items. Llama Stack reconstructs prior history from the
conversation and appends the new input correctly.
"""

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Add the required Google docstring sections to the new/updated methods.

The new request() and _prepare_conversation_continuation() docstrings, and the updated request_stream() docstring, do not include the required parameter/return/raise sections.

As per coding guidelines, “Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes.”

Also applies to: 253-266

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/pydantic_ai_lightspeed/llamastack/_model.py` around lines 191 - 221, The
docstrings for the request() method, _prepare_conversation_continuation()
method, and request_stream() method do not follow Google Python docstring
conventions. Add the required sections to each docstring: a Parameters section
documenting each method argument, a Returns section describing the return value
and its type, and a Raises section if the method raises any exceptions. Ensure
the docstrings are complete and properly formatted according to Google Python
style guidelines.

Source: Coding guidelines

messages, model_settings = self._prepare_conversation_continuation(
messages, model_settings
)
return await super().request(messages, model_settings, model_request_parameters)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major

❓ Verification inconclusive

🏁 Script executed:

#!/bin/bash
# Description: Confirm OpenAIResponsesModel.request accepts run_context in the dependency version under review.

python - <<'PY'
import inspect
from pydantic_ai.models.openai import OpenAIResponsesModel

signature = inspect.signature(OpenAIResponsesModel.request)
print(signature)
assert "run_context" in signature.parameters
PY

Repository: lightspeed-core/lightspeed-stack

Length of output: 294


🏁 Script executed:

#!/bin/bash
# Explore repository structure
git ls-files | grep -E "(pyproject|setup|requirements)" | head -20

Repository: lightspeed-core/lightspeed-stack

Length of output: 476


🏁 Script executed:

#!/bin/bash
# Find and read the file in question
find . -name "_model.py" -path "*/llamastack/*" | head -5

Repository: lightspeed-core/lightspeed-stack

Length of output: 220


🏁 Script executed:

#!/bin/bash
# Get more context on the repository structure
fd -t f "\.py$" src/pydantic_ai_lightspeed/llamastack/ | head -20

Repository: lightspeed-core/lightspeed-stack

Length of output: 371


🏁 Script executed:

#!/bin/bash
# Read the _model.py file to understand the context
cat -n src/pydantic_ai_lightspeed/llamastack/_model.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 15057


🏁 Script executed:

#!/bin/bash
# Check the pyproject.toml for pydantic_ai version/dependency info
head -100 pyproject.toml && echo "..." && tail -100 pyproject.toml

Repository: lightspeed-core/lightspeed-stack

Length of output: 5570


🏁 Script executed:

#!/bin/bash
# Search for run_context usage patterns in the codebase to understand if it's important
rg "run_context" --type py -B 2 -A 2 | head -80

Repository: lightspeed-core/lightspeed-stack

Length of output: 170


🏁 Script executed:

#!/bin/bash
# Check if there are any test files that might reveal the expected behavior
find . -name "*test*" -type f -path "*llamastack*" | head -10

Repository: lightspeed-core/lightspeed-stack

Length of output: 299


🏁 Script executed:

#!/bin/bash
# Look for documentation or comments about run_context in the OpenAI model context
rg "RunContext\|run_context" src/pydantic_ai_lightspeed/ --type py

Repository: lightspeed-core/lightspeed-stack

Length of output: 170


🏁 Script executed:

#!/bin/bash
# Search more broadly in the entire codebase for run_context without filtering
rg "run_context" --type py src/

Repository: lightspeed-core/lightspeed-stack

Length of output: 576


🏁 Script executed:

#!/bin/bash
# Check the test files to see if they use run_context
cat -n ./tests/unit/pydantic_ai_lightspeed/llamastack/test_provider.py | head -100

Repository: lightspeed-core/lightspeed-stack

Length of output: 4722


🏁 Script executed:

#!/bin/bash
# Look at how request() is called elsewhere in the codebase
rg "\.request\(" --type py -B 1 -A 1 src/ | head -50

Repository: lightspeed-core/lightspeed-stack

Length of output: 414


🏁 Script executed:

#!/bin/bash
# Check what the request_stream method does with run_context and _responses_create signature
ast-grep outline src/pydantic_ai_lightspeed/llamastack/_model.py --view expanded

Repository: lightspeed-core/lightspeed-stack

Length of output: 834


🏁 Script executed:

#!/bin/bash
# Search for _responses_create calls to see the pattern of parameter passing
rg "_responses_create" --type py -B 2 -A 2 src/

Repository: lightspeed-core/lightspeed-stack

Length of output: 631


🏁 Script executed:

#!/bin/bash
# Check if there are any comments or documentation about the base class contract
rg "OpenAIResponsesModel" --type py -B 3 -A 3 src/pydantic_ai_lightspeed/llamastack/

Repository: lightspeed-core/lightspeed-stack

Length of output: 1908


🏁 Script executed:

#!/bin/bash
# Check the comment "unused-argument" to see if there's any pattern or explanation
rg "unused-argument" --type py -B 5 -A 5 src/pydantic_ai_lightspeed/llamastack/

Repository: lightspeed-core/lightspeed-stack

Length of output: 2202


🏁 Script executed:

#!/bin/bash
# Look for any integration tests that might call these methods with run_context
find . -name "*.py" -path "*/test*" -exec grep -l "LlamaStackResponsesModel\|request.*run_context" {} \;

Repository: lightspeed-core/lightspeed-stack

Length of output: 170


🏁 Script executed:

#!/bin/bash
# Check if there's a base class method signature we can infer from parent class imports
rg "from pydantic_ai" src/pydantic_ai_lightspeed/llamastack/_model.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 525


🏁 Script executed:

#!/bin/bash
# Search for any comments or documentation about run_context in pydantic_ai usage
rg "run_context" --type py -B 5 -A 5 src/ tests/

Repository: lightspeed-core/lightspeed-stack

Length of output: 4151


🏁 Script executed:

#!/bin/bash
# Check the git history or any PR comments related to this file
git log --oneline src/pydantic_ai_lightspeed/llamastack/_model.py | head -10

Repository: lightspeed-core/lightspeed-stack

Length of output: 199


🏁 Script executed:

#!/bin/bash
# Look for usage patterns in request_stream - does it forward run_context anywhere?
sed -n '246,310p' src/pydantic_ai_lightspeed/llamastack/_model.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 2860


🏁 Script executed:

#!/bin/bash
# Check if there's a requirements file or lockfile that pins pydantic-ai version
cat pyproject.toml | grep -A 2 -B 2 "pydantic-ai"

Repository: lightspeed-core/lightspeed-stack

Length of output: 284


🏁 Script executed:

#!/bin/bash
# Look for any other usages of pydantic_ai request signature in the codebase
rg "super()\s*\.request\(" --type py -B 3 -A 3

Repository: lightspeed-core/lightspeed-stack

Length of output: 170


🏁 Script executed:

#!/bin/bash
# Check if request_stream also has the same issue with run_context not being forwarded
rg "request_stream" --type py -B 2 -A 10 src/pydantic_ai_lightspeed/llamastack/ | grep -A 20 "async def request_stream"

Repository: lightspeed-core/lightspeed-stack

Length of output: 1254


🏁 Script executed:

#!/bin/bash
# Check pydantic-ai documentation or source to understand run_context usage
# Since we can't install it, let's see if there are any comments about why run_context is marked as unused
grep -n "unused-argument" src/pydantic_ai_lightspeed/llamastack/_model.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 301


🏁 Script executed:

#!/bin/bash
# Check if there are any comments in the class or methods about the design decision
sed -n '176,210p' src/pydantic_ai_lightspeed/llamastack/_model.py | cat -n

Repository: lightspeed-core/lightspeed-stack

Length of output: 2035


🏁 Script executed:

#!/bin/bash
# Check the behavior more carefully - look at what the base class might need from run_context
# by examining how it's documented in the docstring
grep -A 20 "def request" src/pydantic_ai_lightspeed/llamastack/_model.py | head -30

Repository: lightspeed-core/lightspeed-stack

Length of output: 1571


🏁 Script executed:

#!/bin/bash
# Check if there's any indication in pydantic-ai imports about versioning or documentation
rg "pydantic.ai|pydantic_ai" pyproject.toml

Repository: lightspeed-core/lightspeed-stack

Length of output: 231


Forward run_context when delegating to parent methods.

Both request() and request_stream() accept run_context, but neither forwards it when calling the parent implementation or helper methods, which could cause context loss if the base OpenAIResponsesModel methods consume it.

🐛 Proposed fixes

For request() at line 202:

-        return await super().request(messages, model_settings, model_request_parameters)
+        return await super().request(
+            messages,
+            model_settings,
+            model_request_parameters,
+            run_context=run_context,
+        )

For request_stream() at lines 273–274, if _responses_create accepts run_context:

         response = await self._responses_create(
-            messages, True, model_settings_cast, model_request_parameters
+            messages, True, model_settings_cast, model_request_parameters, run_context
         )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/pydantic_ai_lightspeed/llamastack/_model.py` at line 202, The `request()`
method at line 202 is not forwarding the `run_context` parameter to the parent
implementation call via `super().request()`. Similarly, the `request_stream()`
method at lines 273-274 is not forwarding `run_context` when calling
`_responses_create()`. Update both methods to include `run_context` as an
argument when delegating to parent methods: add `run_context=run_context` to the
`super().request()` call in the `request()` method, and pass `run_context` to
the `_responses_create()` method in `request_stream()` if that method accepts it
as a parameter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant