RHIDP-12952: persist interrupted conversation by Jdubrick · Pull Request #1971 · lightspeed-core/lightspeed-stack

Jdubrick · 2026-06-22T14:03:00Z

Description

When query interruption was first added, it replaced the entire interrupted portion of the conversation with the interrupt message. This change keeps the partially completed message instead, after repairing anything the interruption left broken (unclosed code fences, HTML, tables, etc.).

Type of change

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

Assisted-by: Claude (Cursor)
Generated by: Claude (Cursor)

Related Tickets & Documents

Related Issue https://redhat.atlassian.net/browse/RHIDP-12952
Closes https://redhat.atlassian.net/browse/RHIDP-12952

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

Release Notes

New Features
- Interrupted streaming responses now preserve already-generated partial text and emit a sequential interruption update.
- The interruption output automatically repairs incomplete Markdown/HTML so the final message renders cleanly.
Bug Fixes
- Improved persistence and display of interruption content, including correct reconstruction from streamed deltas and consistent chunk ordering.
Updates
- Updated the interruption notice to: “Response stopped by the user.”
Tests
- Expanded coverage for interruption behavior, Markdown repair, and chunk-id sequencing.

coderabbitai · 2026-06-22T14:03:22Z

Walkthrough

The PR replaces the static interrupted-response message with a mechanism that accumulates streamed token deltas into TurnSummary.partial_tokens, tracks the next SSE chunk index in TurnSummary.next_chunk_id, repairs any unclosed Markdown via a new close_open_markdown utility, and emits the resulting suffix as a final token event on asyncio.CancelledError in both the agent and deprecated endpoint streaming paths.

Changes

Structured Interrupted Response from Partial Tokens

Layer / File(s)	Summary
TurnSummary fields and Markdown repair utility `src/models/common/turn_summary.py`, `src/utils/markdown_repair.py`, `tests/unit/utils/test_markdown_repair.py`	`TurnSummary` gains `partial_tokens: list[str]` and `next_chunk_id: int`. New `markdown_repair.py` exposes `close_open_markdown(text)`, which returns the minimal suffix to close any open code fence and block-level HTML tags; fully tested across fence variants, HTML tag nesting, and edge cases.
`build_interrupted_response` and persistence `src/constants.py`, `src/utils/stream_interrupts.py`, `tests/unit/utils/test_stream_interrupts.py`	`build_interrupted_response(partial_tokens)` joins tokens, repairs Markdown, and returns `(full_text, suffix)`. `_on_interrupt` and `persist_interrupted_turn` now use `turn_summary.llm_response` (set to `full_text`) instead of the static constant. `INTERRUPTED_RESPONSE_MESSAGE` wording updated. Tests validate response construction with unclosed fences and empty token lists.
Agent streaming: accumulation and cancellation suffix `src/utils/agents/streaming.py`, `tests/unit/utils/agents/test_streaming.py`	`_process_token` appends token text to `turn_summary.partial_tokens` and sets `turn_summary.next_chunk_id`; `AgentRunResultEvent` handler also sets `next_chunk_id`. On `CancelledError`, `generate_agent_response` calls `build_interrupted_response`, persists with `full_text`, and emits a `TokenStreamPayload` carrying `suffix` at `next_chunk_id`. New tests validate monotonic chunk IDs, repaired response content, and cancellation before any tokens are emitted.
Endpoint cancellation path and delta tracking `src/app/endpoints/streaming_query.py`, `tests/unit/app/endpoints/test_streaming_query.py`	`generate_response` cancellation handler replaced to call `build_interrupted_response` and emit `LLM_TOKEN_EVENT` with the suffix. `response_generator` sets `turn_summary.next_chunk_id` at kickoff, each delta, and completion; deltas are also appended to `turn_summary.partial_tokens`. Tests expect token events during cancellation and verify persisted `llm_response` matches computed indicator.
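The walkthrough describes `build_interrupted_response(partial_tokens)` as joining the streamed deltas, repairing Markdown, and returning `(full_text, suffix)`. A minimal sketch of that contract follows, assuming a fence-only `close_open_markdown` (block-level HTML repair is omitted) and assuming the suffix consists only of repair text; whether the real function also appends `INTERRUPTED_RESPONSE_MESSAGE` is not stated above, so the sketch leaves it out. The fence pattern is an assumption based on CommonMark, not the PR's actual code.

```python
import re

# Assumed fence syntax (CommonMark): up to three spaces of indentation, then a
# run of at least three backticks or tildes, then an optional info string.
_FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})(.*)$")


def close_open_markdown(text: str) -> str:
    """Return the suffix that closes a code fence left open in *text*.

    Sketch only: block-level HTML repair is omitted here.

    Args:
        text: Partial Markdown generated before the stream was interrupted.

    Returns:
        An empty string when no fence is open; otherwise a closing fence that
        reuses the opening fence's character and length.
    """
    fence_char, fence_len = "", 0
    for line in text.split("\n"):
        match = _FENCE_RE.match(line)
        if match is None:
            continue
        run, rest = match.group(1), match.group(2)
        if not fence_char:
            fence_char, fence_len = run[0], len(run)  # opening fence
        elif run[0] == fence_char and len(run) >= fence_len and not rest.strip(" \t"):
            fence_char, fence_len = "", 0  # valid closing fence
    if not fence_char:
        return ""
    # Make sure the closing fence starts on its own line.
    prefix = "" if text.endswith("\n") else "\n"
    return f"{prefix}{fence_char * fence_len}\n"


def build_interrupted_response(partial_tokens: list[str]) -> tuple[str, str]:
    """Join streamed deltas and append whatever is needed to close open Markdown.

    Args:
        partial_tokens: Token deltas already sent to the client.

    Returns:
        A (full_text, suffix) tuple: full_text is what gets persisted, and
        suffix is the only part the client has not seen yet.
    """
    partial_text = "".join(partial_tokens)
    suffix = close_open_markdown(partial_text)
    return partial_text + suffix, suffix
```

Returning the suffix separately lets the streaming path emit just the repair text as one final token event, while the persisted turn stores the joined text plus that same suffix.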

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant generate_response / generate_agent_response
  participant _process_token / response_generator
  participant build_interrupted_response
  participant close_open_markdown
  participant persist_interrupted_turn

  rect rgba(100, 149, 237, 0.5)
    note over _process_token / response_generator: Normal streaming
    _process_token / response_generator-->>Client: token SSE events (chunk_id 0, 1, 2…)
    _process_token / response_generator->>generate_response / generate_agent_response: appends partial_tokens, sets next_chunk_id
  end

  rect rgba(220, 100, 100, 0.5)
    note over generate_response / generate_agent_response: asyncio.CancelledError
    generate_response / generate_agent_response->>build_interrupted_response: partial_tokens
    build_interrupted_response->>close_open_markdown: joined text
    close_open_markdown-->>build_interrupted_response: repair suffix
    build_interrupted_response-->>generate_response / generate_agent_response: (full_text, emit_suffix)
    generate_response / generate_agent_response->>persist_interrupted_turn: llm_response = full_text
    generate_response / generate_agent_response-->>Client: final token SSE event (emit_suffix, next_chunk_id)
  end
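The cancellation path in the diagram can be simulated end to end with plain asyncio. This is a sketch under stated assumptions: `TurnSummarySketch`, `stream_tokens`, and `main` are hypothetical stand-ins, and the Markdown repair is a deliberately naive fence count rather than the PR's `close_open_markdown`. It shows tokens being recorded in `partial_tokens`, `next_chunk_id` being advanced, and the `CancelledError` handler emitting one final event at that ID before re-raising.

```python
import asyncio
from dataclasses import dataclass, field

# Hypothetical tokens: the stream is stopped while a bash code block is still open.
TOKENS = ["Step 1:\n", "```bash\n", "ls -la"]


@dataclass
class TurnSummarySketch:
    """Hypothetical stand-in for TurnSummary with only the fields used here."""

    partial_tokens: list[str] = field(default_factory=list)
    next_chunk_id: int = 0
    llm_response: str = ""


async def stream_tokens(summary: TurnSummarySketch, sent: list[dict]) -> None:
    """Stream token events; on cancellation, emit one final repair event."""
    try:
        for chunk_id, token in enumerate(TOKENS):
            summary.partial_tokens.append(token)
            summary.next_chunk_id = chunk_id + 1
            sent.append({"id": chunk_id, "token": token})
            await asyncio.sleep(0)  # suspension point where a cancel can land
        await asyncio.sleep(3600)  # the model would keep generating here
    except asyncio.CancelledError:
        partial = "".join(summary.partial_tokens)
        # Naive repair for illustration only: close a fence if one is left open.
        suffix = "\n```\n" if partial.count("```") % 2 else ""
        summary.llm_response = partial + suffix  # what would be persisted
        sent.append({"id": summary.next_chunk_id, "token": suffix})
        raise  # re-raise so the task still ends up cancelled


async def main() -> tuple[list[dict], TurnSummarySketch]:
    """Let every token reach the client, then cancel as if the user hit stop."""
    summary = TurnSummarySketch()
    sent: list[dict] = []
    task = asyncio.create_task(stream_tokens(summary, sent))
    while len(sent) < len(TOKENS):
        await asyncio.sleep(0)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return sent, summary
```

Running `asyncio.run(main())` yields events with IDs 0 through 3, where the last event carries only the closing fence, and `llm_response` holds the partial text plus that fence.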

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

lightspeed-core/lightspeed-stack#1870: Modifies the same deprecated generate_response/response_generator utilities in streaming_query.py and their agent stream-event wiring.
lightspeed-core/lightspeed-stack#1918: Changes the stream-interruption flow in streaming_query.py to delegate to stream_interrupts helpers (persist_interrupted_turn, interrupt callback), which this PR further extends.
lightspeed-core/lightspeed-stack#1919: Touches agent streaming SSE/interruption handling and persistence in utils/agents/streaming.py, the same file updated here.

Suggested reviewers

tisnik
jrobertboos

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'persist interrupted conversation' clearly and directly describes the main feature being implemented across the changeset.
Docstring Coverage	✅ Passed	Docstring coverage is 89.47% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/app/endpoints/streaming_query.py`:
- Around line 637-640: The build_interrupted_response call at line 637 relies
solely on turn_summary.partial_tokens, which may be empty or incomplete if
cancellation occurs after response.output_text.done has populated
turn_summary.llm_response but before all deltas are processed. Modify the
build_interrupted_response call to use turn_summary.llm_response as a fallback
when partial_tokens is empty, ensuring that model output is not lost when
interrupted responses are reconstructed and persisted.

In `@src/utils/markdown_repair.py`:
- Around line 75-90: In the fence closing logic (the elif condition checking
`char == fence_char and len(matched_group) >= fence_len`), add validation to
ensure that any trailing content after the fence marker contains only whitespace
characters (spaces and tabs). Extract the remainder of the line after the
matched fence group and check that it either doesn't exist or contains only
whitespace using a string method like strip() or a regex check. Only allow the
fence to close if this whitespace validation passes, otherwise treat the line as
regular content inside the code block.
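The two inline suggestions above can be sketched as small helpers. Both names, `tokens_for_interrupt` and `closes_fence`, are hypothetical and not part of the PR. The first follows the reviewer's wording literally (fall back to `llm_response` only when no deltas were recorded). The second follows the CommonMark closing-fence rule: same character, at least as long as the opening run, at most three spaces of indentation, and only spaces or tabs after it.

```python
import re


def tokens_for_interrupt(partial_tokens: list[str], llm_response: str) -> list[str]:
    """Choose which text to feed into build_interrupted_response.

    Args:
        partial_tokens: Deltas accumulated from response.output_text.delta events.
        llm_response: Text stored on the turn summary, possibly empty.

    Returns:
        The recorded deltas, else a one-element list with llm_response, else [].
    """
    if partial_tokens:
        return list(partial_tokens)
    return [llm_response] if llm_response else []


# Assumed fence syntax: up to three spaces of indentation, then a backtick or tilde run.
_FENCE_RUN_RE = re.compile(r"^ {0,3}(`+|~+)")


def closes_fence(line: str, fence_char: str, fence_len: int) -> bool:
    """Return True only if *line* is a valid closing fence.

    Args:
        line: A line inside an open fenced code block.
        fence_char: The opening fence character, a backtick or a tilde.
        fence_len: The length of the opening fence run.

    Returns:
        True when the run uses the same character, is at least as long as the
        opening run, and is followed only by spaces or tabs.
    """
    match = _FENCE_RUN_RE.match(line)
    if match is None:
        return False
    run = match.group(1)
    remainder = line[match.end():]
    return run[0] == fence_char and len(run) >= fence_len and remainder.strip(" \t") == ""
```

With this check, a line such as "```python" inside an open block is treated as content rather than as a closing fence.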

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 093e2b47-86b6-41de-a11e-9ff652c597b9

📥 Commits

Reviewing files that changed from the base of the PR and between 9ff72ff and 6aeea11.

📒 Files selected for processing (10)

src/app/endpoints/streaming_query.py
src/constants.py
src/models/common/turn_summary.py
src/utils/agents/streaming.py
src/utils/markdown_repair.py
src/utils/stream_interrupts.py
tests/unit/app/endpoints/test_streaming_query.py
tests/unit/utils/agents/test_streaming.py
tests/unit/utils/test_markdown_repair.py
tests/unit/utils/test_stream_interrupts.py

📜 Review details

⏰ Context from checks skipped due to timeout. (12)

GitHub Check: E2E: library mode / ci / group 3
GitHub Check: E2E: server mode / ci / group 3
GitHub Check: E2E: library mode / ci / group 2
GitHub Check: E2E: server mode / ci / group 2
GitHub Check: E2E: server mode / ci / group 1
GitHub Check: E2E: library mode / ci / group 1
GitHub Check: build-pr
GitHub Check: E2E Tests for Lightspeed Evaluation job
GitHub Check: integration_tests (3.13)
GitHub Check: integration_tests (3.12)
GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-0-6-on-pull-request

🧰 Additional context used

📓 Path-based instructions (5)

src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

src/constants.py
src/models/common/turn_summary.py
src/utils/markdown_repair.py
src/utils/stream_interrupts.py
src/app/endpoints/streaming_query.py
src/utils/agents/streaming.py

src/constants.py

📄 CodeRabbit inference engine (AGENTS.md)

Use constants.py for shared constants with descriptive comments and type hints using Final[type]

Files:

src/constants.py

src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models must use @model_validator and @field_validator for validation and complete type annotations for all attributes, avoiding Any type

Files:

src/models/common/turn_summary.py

tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

tests/unit/utils/test_markdown_repair.py
tests/unit/app/endpoints/test_streaming_query.py
tests/unit/utils/agents/test_streaming.py
tests/unit/utils/test_stream_interrupts.py

src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/app/**/*.py: FastAPI dependencies: Import from fastapi module for APIRouter, HTTPException, Request, status, Depends
Use FastAPI HTTPException with appropriate status codes for API endpoints and handle APIConnectionError from Llama Stack

Files:

src/app/endpoints/streaming_query.py

🧠 Learnings (3)

📚 Learning: 2026-01-12T10:58:40.230Z

Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

src/models/common/turn_summary.py

📚 Learning: 2026-02-25T07:46:33.545Z

Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

src/models/common/turn_summary.py

📚 Learning: 2026-04-06T20:18:07.852Z

Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

src/app/endpoints/streaming_query.py

🔇 Additional comments (7)

src/models/common/turn_summary.py (1)

117-126: LGTM!

src/constants.py (1)

15-15: LGTM!

src/utils/stream_interrupts.py (1)

23-23: LGTM!

Also applies to: 219-239, 277-277, 286-286, 368-369

tests/unit/utils/test_stream_interrupts.py (1)

8-21: LGTM!

Also applies to: 49-49, 71-72, 100-100, 169-193

src/utils/agents/streaming.py (1)

28-28: LGTM!

Also applies to: 68-68, 201-217, 358-364, 415-415

tests/unit/utils/agents/test_streaming.py (1)

67-68: LGTM!

Also applies to: 720-722, 813-813, 966-1106

tests/unit/app/endpoints/test_streaming_query.py (1)

54-54: LGTM!

Also applies to: 74-75, 1385-1385, 1394-1400

Jdubrick · 2026-06-22T19:09:15Z

/cc @tisnik

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/app/endpoints/streaming_query.py (1)

774-783: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Advance next_chunk_id before yielding each SSE chunk.

Because response_generator suspends at yield, cancellation immediately after generate_response re-yields a token can skip Lines 782-783 or 808-809. The cancellation handler then emits the interruption suffix with the stale turn_summary.next_chunk_id, duplicating the ID of the just-emitted token. Move the ID advancement before each yield while preserving the event’s current ID.

Proposed fix

         if event_type == "response.content_part.added":
+            event_id = chunk_id
+            chunk_id += 1
+            turn_summary.next_chunk_id = chunk_id
             yield stream_event(
                 {
-                    "id": chunk_id,
+                    "id": event_id,
                     "token": "",
                 },
                 LLM_TOKEN_EVENT,
                 media_type,
             )
-            chunk_id += 1
-            turn_summary.next_chunk_id = chunk_id
 
         # Text streaming - emit token delta
         elif event_type == "response.output_text.delta":
             delta_chunk = cast(TextDeltaChunk, chunk)
             text_parts.append(delta_chunk.delta)
             turn_summary.partial_tokens.append(delta_chunk.delta)
+            event_id = chunk_id
+            chunk_id += 1
+            turn_summary.next_chunk_id = chunk_id
             yield stream_event(
                 {
-                    "id": chunk_id,
+                    "id": event_id,
                     "token": delta_chunk.delta,
                 },
                 LLM_TOKEN_EVENT,
                 media_type,
             )
-            chunk_id += 1
-            turn_summary.next_chunk_id = chunk_id
@@
         elif event_type == "response.completed":
             latest_response_object = cast(
                 OpenAIResponseObject,
                 getattr(chunk, "response"),  # noqa: B009
             )
             turn_summary.llm_response = turn_summary.llm_response or "".join(text_parts)
             # Capture structured output items for compacted-mode turn storage
             # (LCORE-1572), so the persisted turn keeps non-text output items
             # rather than being flattened to the response text.
             turn_summary.output_items = list(latest_response_object.output or [])
+            event_id = chunk_id
+            chunk_id += 1
+            turn_summary.next_chunk_id = chunk_id
             yield stream_event(
                 {
-                    "id": chunk_id,
+                    "id": event_id,
                     "token": turn_summary.llm_response,
                 },
                 LLM_TURN_COMPLETE_EVENT,
                 media_type,
             )
-            chunk_id += 1
-            turn_summary.next_chunk_id = chunk_id

Also applies to: 800-809, 889-898

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/app/endpoints/streaming_query.py` around lines 774 - 783, The ID
advancement in the stream_event yielding blocks happens after the yield
statement, which can cause the increment to be skipped during cancellation. Move
the `chunk_id += 1` and `turn_summary.next_chunk_id = chunk_id` statements to
occur before each `yield stream_event()` call while preserving the event's ID
field to use the current chunk_id value at the time of emission. This pattern
appears in three locations: the stream_event yielding block around lines
774-783, the similar block around lines 800-809, and the block around lines
889-898. Save the current chunk_id value before incrementing, use the saved
value in the event's ID field, and increment the counters before yielding to
ensure the ID advancement cannot be skipped by cancellation.
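Why the reordering matters can be shown with a synchronous generator standing in for the async `response_generator`. Calling `close()` on a generator suspended at `yield` raises `GeneratorExit` at that point, so statements after the `yield` never run; this approximates a generator that is never resumed after its consumer is cancelled. `Summary`, `increment_after_yield`, `increment_before_yield`, and `interrupt_after_first_event` are hypothetical names for this sketch.

```python
from collections.abc import Callable, Iterator
from dataclasses import dataclass


@dataclass
class Summary:
    """Hypothetical stand-in for TurnSummary; only the counter matters here."""

    next_chunk_id: int = 0


def increment_after_yield(tokens: list[str], summary: Summary) -> Iterator[dict]:
    """Original ordering: the ID is advanced only when the consumer resumes us."""
    chunk_id = 0
    for token in tokens:
        yield {"id": chunk_id, "token": token}
        chunk_id += 1  # never reached if the generator is not resumed
        summary.next_chunk_id = chunk_id


def increment_before_yield(tokens: list[str], summary: Summary) -> Iterator[dict]:
    """Proposed ordering: reserve the ID, publish next_chunk_id, then yield."""
    chunk_id = 0
    for token in tokens:
        event_id = chunk_id
        chunk_id += 1
        summary.next_chunk_id = chunk_id
        yield {"id": event_id, "token": token}


def interrupt_after_first_event(
    make_stream: Callable[[list[str], Summary], Iterator[dict]], tokens: list[str]
) -> tuple[int, int]:
    """Deliver one event, stop the generator, and report (sent ID, next ID)."""
    summary = Summary()
    stream = make_stream(tokens, summary)
    first = next(stream)  # the client receives chunk 0
    stream.close()  # GeneratorExit is raised at the suspended yield
    return first["id"], summary.next_chunk_id
```

With the original ordering, `interrupt_after_first_event(increment_after_yield, ["a", "b"])` returns `(0, 0)`, so an interruption suffix would reuse ID 0; the proposed ordering returns `(0, 1)`.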

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/utils/markdown_repair.py`:
- Around line 30-50: The function `_process_html_tags` currently mutates its
`html_stack` parameter in-place using `.pop()` and `.append()` operations.
Refactor this function to return the updated stack instead of modifying the
parameter directly. Change the function signature to return a list (the updated
html_stack) and update all callers to capture the returned value rather than
relying on side effects. This applies to both the main function definition and
any other similar patterns mentioned at lines 77-79.
- Around line 31-36: The docstring for the function starting at line 31 uses
"Parameters:" section header, but Google Python docstring conventions require
"Args:" instead. Update the docstring header from "Parameters:" to "Args:" in
the function at line 31-36, and apply the same fix to the other docstring
mentioned at lines 53-64. Additionally, review both docstrings to ensure they
include all required Google format sections (Args, Returns, Raises) where
applicable based on what each function actually does.

In `@src/utils/stream_interrupts.py`:
- Around line 219-232: The docstring for the build_interrupted_response function
uses "Parameters:" and "Returns:" sections, but the repository standard requires
Google-style docstring format with "Args", "Returns", and "Raises" sections.
Update the docstring by renaming the "Parameters:" section to "Args:" to match
the required convention. Additionally, add a "Raises:" section if the function
can raise any exceptions during execution, following the repository's Google
Python docstring conventions.

---

Outside diff comments:
In `@src/app/endpoints/streaming_query.py`:
- Around line 774-783: The ID advancement in the stream_event yielding blocks
happens after the yield statement, which can cause the increment to be skipped
during cancellation. Move the `chunk_id += 1` and `turn_summary.next_chunk_id =
chunk_id` statements to occur before each `yield stream_event()` call while
preserving the event's ID field to use the current chunk_id value at the time of
emission. This pattern appears in three locations: the stream_event yielding
block around lines 774-783, the similar block around lines 800-809, and the
block around lines 889-898. Save the current chunk_id value before incrementing,
use the saved value in the event's ID field, and increment the counters before
yielding to ensure the ID advancement cannot be skipped by cancellation.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 187ae3b8-ac25-444d-befd-f5f266b56a23

📥 Commits

Reviewing files that changed from the base of the PR and between 6aeea11 and 13718e0.

📒 Files selected for processing (10)

src/app/endpoints/streaming_query.py
src/constants.py
src/models/common/turn_summary.py
src/utils/agents/streaming.py
src/utils/markdown_repair.py
src/utils/stream_interrupts.py
tests/unit/app/endpoints/test_streaming_query.py
tests/unit/utils/agents/test_streaming.py
tests/unit/utils/test_markdown_repair.py
tests/unit/utils/test_stream_interrupts.py

📜 Review details

⏰ Context from checks skipped due to timeout. (14)

GitHub Check: unit_tests (3.13)
GitHub Check: unit_tests (3.12)
GitHub Check: build-pr
GitHub Check: E2E Tests for Lightspeed Evaluation job
GitHub Check: integration_tests (3.13)
GitHub Check: integration_tests (3.12)
GitHub Check: E2E: library mode / ci / group 2
GitHub Check: E2E: server mode / ci / group 3
GitHub Check: E2E: server mode / ci / group 2
GitHub Check: E2E: server mode / ci / group 1
GitHub Check: E2E: library mode / ci / group 1
GitHub Check: E2E: library mode / ci / group 3
GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-0-6-on-pull-request
GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request

🔇 Additional comments (13)

src/models/common/turn_summary.py (1)

117-126: LGTM!

tests/unit/utils/test_markdown_repair.py (1)

1-175: LGTM!

src/constants.py (1)

15-15: LGTM!

src/utils/stream_interrupts.py (1)

23-23: LGTM!

Also applies to: 277-287, 368-369

tests/unit/utils/test_stream_interrupts.py (1)

8-21: LGTM!

Also applies to: 49-49, 71-73, 100-100, 169-193

src/utils/agents/streaming.py (3)

28-28: LGTM!

Also applies to: 68-68

201-217: LGTM!

358-364: LGTM!

Also applies to: 415-415

tests/unit/utils/agents/test_streaming.py (2)

67-68: LGTM!

Also applies to: 720-722, 813-813

966-1106: LGTM!

src/app/endpoints/streaming_query.py (1)

124-124: LGTM!

Also applies to: 637-652

tests/unit/app/endpoints/test_streaming_query.py (2)

54-54: LGTM!

Also applies to: 74-75

1385-1400: LGTM!

coderabbitai · 2026-06-23T17:29:34Z

+def _process_html_tags(line: str, html_stack: list[str]) -> None:
+    """Update *html_stack* with block-level HTML open/close tags found in *line*.
+
+    Parameters:
+        line: A single line of text to scan for HTML tags.
+        html_stack: Mutable stack tracking open block-level tags.
+    """
+    for tag_match in _TAG_RE.finditer(line):
+        is_closing = tag_match.group(1) == "/"
+        tag_name = tag_match.group(2).lower()
+        is_self_closing = tag_match.group(4) == "/"
+
+        if tag_name not in BLOCK_HTML_TAGS or is_self_closing:
+            continue
+
+        if is_closing:
+            if html_stack and html_stack[-1] == tag_name:
+                html_stack.pop()
+        else:
+            html_stack.append(tag_name)
+


📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Avoid in-place mutation of function parameters in _process_html_tags.

_process_html_tags currently mutates html_stack directly. Return an updated stack instead so the helper remains side-effect explicit and easier to reason about.

As per coding guidelines, "Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters."

Proposed refactor

-def _process_html_tags(line: str, html_stack: list[str]) -> None:
+def _process_html_tags(line: str, html_stack: list[str]) -> list[str]:
@@
-    for tag_match in _TAG_RE.finditer(line):
+    updated_stack = [*html_stack]
+    for tag_match in _TAG_RE.finditer(line):
@@
-        if is_closing:
-            if html_stack and html_stack[-1] == tag_name:
-                html_stack.pop()
+        if is_closing:
+            if updated_stack and updated_stack[-1] == tag_name:
+                updated_stack.pop()
         else:
-            html_stack.append(tag_name)
+            updated_stack.append(tag_name)
+    return updated_stack
@@
-        _process_html_tags(line, html_stack)
+        html_stack = _process_html_tags(line, html_stack)

Also applies to: 77-79

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/markdown_repair.py` around lines 30 - 50, The function `_process_html_tags` currently mutates its `html_stack` parameter in-place using `.pop()` and `.append()` operations. Refactor this function to return the updated stack instead of modifying the parameter directly. Change the function signature to return a list (the updated html_stack) and update all callers to capture the returned value rather than relying on side effects. This applies to both the main function definition and any other similar patterns mentioned at lines 77-79.

Source: Coding guidelines

coderabbitai · 2026-06-23T17:29:34Z

+    """Update *html_stack* with block-level HTML open/close tags found in *line*.
+
+    Parameters:
+        line: A single line of text to scan for HTML tags.
+        html_stack: Mutable stack tracking open block-level tags.
+    """


📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Align new function docstrings with required Google docstring sections.

The new docstrings should use the repository’s required Google format (Args, Returns, Raises) for consistency.

As per coding guidelines, "Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes."

Also applies to: 53-64

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/markdown_repair.py` around lines 31 - 36, The docstring for the function starting at line 31 uses a "Parameters:" section header, but Google Python docstring conventions require "Args:" instead. Update the docstring header from "Parameters:" to "Args:" in the function at lines 31-36, and apply the same fix to the other docstring mentioned at lines 53-64. Additionally, review both docstrings to ensure they include all required Google format sections (Args, Returns, Raises) where applicable based on what each function actually does.

Source: Coding guidelines
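To see which section headers a docstring actually uses before and after applying this finding, one option is to scan the cleaned text returned by `inspect.getdoc`. `docstring_sections` and the two sample functions are hypothetical illustrations, not repository tooling: `legacy_style` copies the current `Parameters:` docstring, and `google_style` uses `Args:`/`Returns:`.

```python
import inspect
import re

# Hypothetical helper. "Parameters" is matched too, so docstrings that still
# use the older header show up in the result.
_SECTION_RE = re.compile(
    r"^[ \t]*(Args|Parameters|Returns|Yields|Raises|Attributes):[ \t]*$",
    re.MULTILINE,
)


def docstring_sections(obj: object) -> list[str]:
    """Return the section headers found in *obj*'s docstring, in order."""
    doc = inspect.getdoc(obj) or ""  # getdoc strips the common indentation
    return _SECTION_RE.findall(doc)


def legacy_style(line: str, html_stack: list[str]) -> None:
    """Update *html_stack* with block-level HTML open/close tags found in *line*.

    Parameters:
        line: A single line of text to scan for HTML tags.
        html_stack: Mutable stack tracking open block-level tags.
    """


def google_style(line: str, html_stack: list[str]) -> list[str]:
    """Return *html_stack* updated with block-level tags found in *line*.

    Args:
        line: A single line of text to scan for HTML tags.
        html_stack: Currently open block-level tags; not modified.

    Returns:
        A new list of open block-level tags.
    """
    return [*html_stack]


# docstring_sections(legacy_style) -> ["Parameters"]
# docstring_sections(google_style) -> ["Args", "Returns"]
```

Both header spellings are detected on purpose, since the quoted guideline names `Parameters` while the finding asks for `Args`; for CI, a docstring linter configured for the Google convention is the more robust option.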

coderabbitai · 2026-06-23T17:29:34Z

+def build_interrupted_response(partial_tokens: list[str]) -> tuple[str, str]:
+    """Build the final interrupted response text from accumulated tokens.
+
+    Joins partial tokens, repairs any open markdown constructs, and appends
+    an italicized interruption indicator.
+
+    Parameters:
+        partial_tokens: List of text deltas accumulated during streaming.
+
+    Returns:
+        A tuple of (full_response_text, suffix_to_emit) where full_response_text
+        is the complete message for persistence and suffix_to_emit is the new
+        content to send as a final SSE token event.
+    """


📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Use required Google-style docstring sections in build_interrupted_response.

Please update the new helper docstring to the required convention (Args, Returns, Raises) for consistency with repository standards.

As per coding guidelines, "Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/stream_interrupts.py` around lines 219 - 232, The docstring for the build_interrupted_response function uses "Parameters:" and "Returns:" sections, but the repository standard requires Google-style docstring format with "Args", "Returns", and "Raises" sections. Update the docstring by renaming the "Parameters:" section to "Args:" to match the required convention. Additionally, add a "Raises:" section if the function can raise any exceptions during execution, following the repository's Google Python docstring conventions.

Source: Coding guidelines
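The docstring's contract (join the deltas, repair open Markdown, append an italic notice, return both the persisted text and the suffix to stream) can be sketched as follows, already using `Args:`/`Returns:`. This is a simplified stand-in, not the PR's implementation: the hypothetical `_unclosed_fence` helper only repairs unclosed code fences, using the CommonMark closing rule (same fence character, at least as long as the opener, no info string), whereas the PR describes also repairing HTML, tables and other constructs. The blank line before the notice is an assumption; the notice wording comes from the PR summary.

```python
import re
from typing import Optional

# Notice wording from the PR summary; the surrounding formatting is assumed.
INTERRUPTED_NOTICE = "Response stopped by the user."

# Fence line: up to 3 leading spaces, then 3+ backticks or 3+ tildes.
_FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def _unclosed_fence(text: str) -> Optional[str]:
    """Return the fence marker still open at the end of *text*, or None.

    Hypothetical simplification: ignores fences nested in lists or blockquotes.
    """
    open_fence: Optional[str] = None
    for line in text.splitlines():
        match = _FENCE_RE.match(line)
        if match is None:
            continue
        marker = match.group(1)
        if open_fence is None:
            open_fence = marker  # opening fence; may carry an info string
        elif (
            marker[0] == open_fence[0]
            and len(marker) >= len(open_fence)
            and line.strip() == marker  # closing fences take no info string
        ):
            open_fence = None
    return open_fence


def build_interrupted_response_sketch(partial_tokens: list[str]) -> tuple[str, str]:
    """Build the persisted text and the final SSE suffix for an interrupted stream.

    Args:
        partial_tokens: Text deltas accumulated before the interruption.

    Returns:
        A tuple (full_response_text, suffix_to_emit); the suffix is everything
        appended after the already-streamed partial text.
    """
    partial_text = "".join(partial_tokens)
    repair = ""
    fence = _unclosed_fence(partial_text)
    if fence is not None:
        # A closing fence must start on its own line.
        repair = ("" if partial_text.endswith("\n") else "\n") + fence
    suffix = f"{repair}\n\n*{INTERRUPTED_NOTICE}*"
    return partial_text + suffix, suffix


full_text, suffix = build_interrupted_response_sketch(["Here:\n~~~py", "thon\nprint(1)"])
# suffix == "\n~~~\n\n*Response stopped by the user.*"
```

No `Raises:` section is included because the sketch raises nothing itself. Returning the suffix separately lets the client append it to what it already rendered, while the full text is what gets persisted.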

coderabbitai Bot reviewed Jun 22, 2026

Comment thread src/app/endpoints/streaming_query.py

Comment thread src/utils/markdown_repair.py

Jdubrick added 6 commits June 23, 2026 13:23

add markdown repair function and tests

0114dd0

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>

add storing of partial text for interrupt handling

215308f

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>

add renderable token to interrupted event streams

2a5f579

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>

track chunk id for interrupted response

485e520

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>

update interrupted query response message

4fda381

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>

update fence closure to match common mark spec

13718e0

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>

Jdubrick force-pushed the interrupt-message-persistence branch from 969f5d6 to 13718e0 on June 23, 2026 17:23

coderabbitai Bot reviewed Jun 23, 2026

RHIDP-12952: persist interrupted conversation #1971

Jdubrick wants to merge 6 commits into lightspeed-core:main from Jdubrick:interrupt-message-persistence

Jdubrick commented Jun 22, 2026 (edited by coderabbitai Bot)

coderabbitai Bot commented Jun 22, 2026 (edited)
