Bump websockets from 15.0.1 to 16.0 by dependabot[bot] · Pull Request #4 · tisnik/lightspeed-stack

dependabot · 2026-04-28T00:05:17Z

Bumps websockets from 15.0.1 to 16.0.

Release notes

16.0

See https://websockets.readthedocs.io/en/stable/project/changelog.html for details.

Commits

d4303a5 Release version 16.0.
851bcd7 Bump pypa/cibuildwheel from 3.3.0 to 3.3.1
740c8d3 Temporarily remove the trio implementation.
92ea055 Add missing changelog entry.
ba74244 Document bug fix.
9410483 Pin sphinx to avoid error in sphinxcontrib-trio.
8e4d408 Document asyncio's TLS read buffer.
cb3500b Stop referring to the asyncio implementation as new.
6563a9c The threading implementation supports max_queue.
9f17e92 Clarify that protocol_mutex protects pending_pings.
Additional commits viewable in compare view

Bumps [websockets](https://github.com/python-websockets/websockets) from 15.0.1 to 16.0.
- [Release notes](https://github.com/python-websockets/websockets/releases)
- [Commits](python-websockets/websockets@15.0.1...16.0)

---
updated-dependencies:
- dependency-name: websockets
  dependency-version: '16.0'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

dependabot · 2026-05-13T13:10:08Z

Looks like websockets is up-to-date now, so this is no longer needed.

…ghtspeed-core#1796)

* LCORE-1572: add conversation compaction and wire it into /v1/query

Introduce runtime conversation compaction (Option A): once a conversation approaches the model's context window, lightspeed-stack summarizes older turns and owns the LLM context itself instead of letting Llama Stack reload the full history.

- src/utils/conversation_compaction.py: apply_compaction() async generator and apply_compaction_blocking() wrapper. Holds a per-conversation lock (R11), estimates tokens (LCORE-1569), partitions and summarizes old turns (LCORE-1570), writes the summary into the conversation as a marker item, and rebuilds the request as explicit input (summaries + recent verbatim turns + new query). Marker items track the boundary; the conversation_id is preserved and the full history stays in Llama Stack items for audit.
- models/common/responses/responses_api_params.py: omit_conversation flag so the conversation parameter is dropped from the request body in compacted mode while remaining on the object for identity.
- configuration.py: AppConfig.compaction accessor.
- app/endpoints/query.py: apply compaction after preparing params; in compacted mode store the completed turn against the original user query (the conversation parameter is no longer sent, so Llama Stack does not persist the turn automatically).

Background: the spec's original marker-keeps-conversation-parameter approach was found unimplementable on llama-stack 0.6.0, which always reloads the full conversation history when the conversation parameter is set. This restores the spike's original explicit-input approach.

* LCORE-1572: unit tests for conversation compaction core and /v1/query

Cover marker detection and boundary selection, explicit-input assembly, the trigger threshold, the disabled / no-context-window / existing-marker / triggered paths of apply_compaction, the streaming CompactionStartedEvent ordering, and compacted-turn storage.

* LCORE-1572: apply conversation compaction in the A2A endpoint

The A2A executor uses the same prepare_responses_params + Responses API flow as /v1/query and persists conversation_id for multi-turn contexts, so it accumulates context and must compact too.

- Run apply_compaction_blocking before responses.create (A2A is not a browser SSE stream, so no progress event is emitted).
- In compacted mode, persist the completed turn from the response.completed stream event, since the conversation parameter is no longer sent and Llama Stack therefore does not store the turn automatically.

* LCORE-1572: apply conversation compaction in the streaming_query endpoint

Stream /v1/streaming_query through the compaction-aware path only when the conversation actually compacts, so non-compacting requests are unaffected (byte-for-byte the existing flow, including HTTP error handling).

- conversation_compaction: add needs_compaction_path(), a cheap pre-stream predicate (no LLM, no lock) that is true only when the conversation already has a summary marker or would trigger a new compaction.
- streaming_query: when the predicate is true, stream via the new generate_response_with_compaction(), which emits the compaction progress event before the summarization LLM call (R12) and creates the response inside the stream, surfacing create-time errors as SSE error events. generate_response gains emit_start/compacted parameters and, in compacted mode, appends the completed turn to the conversation (the conversation parameter is not sent, so Llama Stack does not store it automatically).
- a2a: silence too-many-lines after the earlier compaction wiring.

* LCORE-1572: tests for the streaming compaction gate

Cover needs_compaction_path: disabled, existing-marker, over-threshold, and under-threshold — the gate that keeps non-compacting requests on the unchanged streaming path.

* LCORE-1572: apply conversation compaction in the /v1/responses endpoint

/v1/responses is the OpenAI-compatible Responses API, so compaction is silent: no custom SSE event is injected (preserving wire compatibility) and create-time error handling is unchanged. Summarization runs before the response is created, on both the streaming and non-streaming paths.

- responses_endpoint_handler: run apply_compaction_blocking before the streaming/non-streaming split, gated to stateful single-conversation requests (store=True, a conversation present, no previous_response_id).
- ResponsesContext: carry compacted_original_input so the finalization can store the turn against the original user input.
- _append_previous_response_turn: generalized to also append the turn in compacted mode (the conversation parameter is dropped, so Llama Stack does not store the turn automatically) using the original input.

* LCORE-1572: tests for /v1/responses compacted-turn storage

Verify _append_previous_response_turn stores the turn against the original input in compacted mode, and stores nothing when store is disabled.

* LCORE-1572: update spec doc to the as-built compaction design

Revise R10, R12, the architecture flow, the changed-request-flow section, and the implementation guidance to match what was built: in compacted mode lightspeed-stack builds explicit input and omits the Llama Stack conversation parameter (which always reloads full history), preserving conversation_id and the full item history. Record the redesign and the four affected endpoints (query, streaming_query, A2A, /v1/responses) in a new Changelog section.

* LCORE-1572: fix needs_compaction_path docstring (pydocstyle D400)

* LCORE-1572: build compacted input as typed messages (silence Pydantic warning)

The explicit compacted input was assembled as plain dicts, which produced PydanticSerializationUnexpectedValue warnings when ResponsesApiParams was dumped (its input field is typed ResponseInput).
Build the summary, recent verbatim, and query items as typed OpenAIResponseMessage objects instead. Verified end-to-end against a live stack: the serializer warning is gone and compaction still triggers, preserves conversation identity, and recalls earlier context correctly.

* LCORE-1572: raise instead of assert on the drained compaction result

apply_compaction_blocking asserted that the generator yielded a result. Under python -O asserts are stripped, so the guard would vanish and a None result could propagate to callers. Replace it with an explicit None check that raises RuntimeError. Clears a GitHub code-scanning (CodeQL) "use of assert" finding. The repository's Bandit configuration skips B101, so this only surfaced via code scanning, not the Bandit CI job.

* LCORE-1572: wire persisted recursive fold (R3) via the summary cache

Make the conversation summary cache the preferred source of truth for compaction summaries and the home of the persisted recursive fold.

- apply_compaction / apply_compaction_blocking gain cache + user_id + skip_user_id_check. Summaries are read from the cache (get_summaries) and each new chunk is written to it (store_summary); the Llama Stack marker texts remain an authoritative fallback when no persisting cache is configured (marker-only mode, additive summaries, no fold).
- When the persisted summaries themselves exceed the threshold, they are folded via recursively_resummarize and the fold is persisted with replace_summaries, so it is computed once and reused rather than recomputed per request (R3).
- configured_conversation_cache() resolves the configured cache (or None) for the endpoints.
- Wired into /v1/query, /v1/streaming_query, and /v1/responses. The A2A executor stays marker-only: it has no resolved user_id for the (user_id, conversation_id) cache key.

Adds 7 unit tests: cache-preferred reads, store-on-compaction, fold trigger and persistence, no-fold-without-cache, marker fallback, and the cache resolver.

* LCORE-1572: address CodeRabbit review — list-form input tokens + clarity rename

- Count tokens for list-form ResponseInput (e.g. /v1/responses), not only the string form, so compaction is not skipped on large item-list inputs that could otherwise still hit HTTP 413. Adds _estimate_response_input_tokens and a regression test.
- Rename CompactionResult.summarized to compacted: the flag means "served in compacted / explicit-input mode" (set whenever the conversation has any summary, reused or fresh), not "a summary was created this request". The old name caused reviewer confusion about turn-persistence gating, which is correct as written.

* LCORE-1572: persist compacted streaming turns with structure (CodeRabbit #4)

In compacted mode the streaming endpoint persisted the completed turn as flattened strings via append_turn_to_conversation, dropping attachments and non-text output items, and double-storing for shield-blocked requests. Persist the structured turn instead:

- Capture the response's structured output items onto TurnSummary.output_items (set at response.completed, and to the refusal item on a shield block).
- generate_response now takes original_input and persists via store_compacted_turn with the original input plus structured output items, matching the /v1/query and A2A paths.
- The shield-blocked branch no longer stores the turn when the conversation parameter was omitted (compacted mode); generate_response stores it once with the correct original input, avoiding the duplicate refusal turn.

Adds tests for the structured compacted persistence and the shield dedup (compacted and non-compacted).

* LCORE-1572: do not initialize the conversation cache when compaction is disabled

configured_conversation_cache() is evaluated eagerly as a call argument in the query endpoint, so it ran on every request and accessed configuration.conversation_cache unconditionally — forcing the (SQLite) cache to initialize even when compaction is disabled.
On configurations whose cache file could not be opened that raised and returned HTTP 500, which failed the e2e suites (where compaction is off). Return None without touching the cache when compaction is disabled; the cache is only used by compaction on this path. Adds a regression test.

* LCORE-1572: address CodeRabbit round 2 (compacted-mode persistence edges)

Follow-ups to the streaming-persistence work, all for non-happy-path terminals in compacted mode (conversation parameter omitted), so the persisted turn uses the original user input + structured output rather than the explicit rewrite or flattened strings:

- /v1/responses: shield-blocked turns persist against compacted_original_input, not api_params.input (the explicit rewrite).
- streaming: interrupted (CancelledError) turns thread original_input through the interrupt callback and persist structured items, fixing the wrong-input storage and the cast(str, input) break on list inputs.
- streaming: capture output_items on response.failed / response.incomplete terminals too, not only response.completed, so compacted persistence keeps partial output.
- TurnSummary.output_items typed as list[OpenAIResponseOutput] instead of list[Any].

Also documents that disabling compaction mid-conversation on an already-compacted conversation reverts it to full-history replay (unsupported transition); the enabled flag stays a full off-switch (CodeRabbit E, declined by design).

Adds unit tests for the blocked /responses path, the interrupted compacted path, and output_items capture on a failed terminal.

* LCORE-1572: document the disable-after-compaction limitation in the spec doc (CodeRabbit E)

* LCORE-1572: document as-built divergences in spec doc (cache source-of-truth, persisted fold)

The spec still described the earlier design (cache as a parallel/best-effort layer, markers as the summary source).
Update Summary storage, Additive summarization, and Changed request flow to the as-built design, and add a Changelog entry: the cache is the preferred source of truth for summaries (marker texts as fallback + audit/boundary), the recursive fold is persisted via replace_summaries (in-memory fold rejected), A2A is marker-only, and the enabled flag stays a full off-switch.

* LCORE-1572: fix line-too-long (C0301) in interrupted-turn test docstring

* LCORE-1572: harden disabled-cache regression test to fail on eager cache access (CodeRabbit)

* LCORE-1572: ref-count per-conversation lock + extract apply_compaction helpers (review)

Addresses two inline review nits from tisnik on the LCORE-1572 PR.

Per-conversation lock cleanup (R11): Replace the bare ``dict[str, asyncio.Lock]`` registry with a ref-counted ``_LockEntry`` and an ``@asynccontextmanager`` helper guarded by a registry mutex. Entries are removed once the last waiter exits, so the registry no longer grows unbounded with the set of conversation_ids ever seen by the process. Adds tests for serialization, deletion-after-last-release, entry-kept-while-waiters-queued, and cleanup-on-cancellation.

apply_compaction refactor: Extract five helpers — ``_load_compaction_state``, ``_estimate_total_tokens``, ``_persist_new_summary_chunk``, ``_maybe_persist_fold``, ``_compacted_result`` — leaving the orchestrating generator linear and roughly one screen long. The state-loading, token-estimation, persistence-side-effects, and result-building concerns are now each named and individually testable.

* LCORE-1572: tighten typed-item handling in compaction helpers (review)

Addresses asimurka's review nit about the dual dict-or-model branches in ``_verbatim_input_message`` and the surrounding token-estimator helpers. Llama Stack's ``client.conversations.items.list`` returns items as typed Pydantic models (the ``ItemListResponse`` discriminated union).
The dict branches in ``is_message_item``, ``extract_message_text``, ``estimate_conversation_tokens``, ``format_conversation_for_summary`` and ``_verbatim_input_message`` were defensive code for a shape that never arrives from production code paths — they only kept the dict-using test fixtures alive. Drop the dict branches and tighten the docstrings to state the typed-item contract. Update the compaction test fixtures (``_msg``, ``_marker``) to return ``OpenAIResponseMessage`` instances instead of dicts. Remove the token-estimator and compaction tests that explicitly asserted dict-shape acceptance; replace with single tests verifying that dicts are now ignored.

* LCORE-1572: soften R12 doc on silent /v1/responses compaction (review)

Addresses asimurka's review note: emitting a compaction event on the ``/v1/responses`` endpoint would itself be spec-compliant under the OpenResponses extension-events convention, so framing silent compaction as a forced choice for "wire compatibility" overstated the constraint. Reword R12 and the changelog entry to acknowledge the spec-compliant option and to frame silent as the *initial* choice, kept to preserve drop-in compatibility with clients written against the upstream OpenAI Responses API; emitting the event on this endpoint is left open as a follow-up. Lightspeed's own clients can already use ``/v1/streaming_query`` to receive the event.
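
For readers unfamiliar with the ref-counted per-conversation lock mentioned in the commit message above (R11), here is a minimal sketch of the general pattern. It is not the code from the referenced commit: the names `LockEntry`, `ConversationLocks`, `hold`, and `demo` are hypothetical, and using a `threading.Lock` as the registry mutex is an assumption made to keep the sketch short.

```python
import asyncio
import threading
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager
from dataclasses import dataclass, field


@dataclass
class LockEntry:
    """An asyncio lock plus the number of tasks holding or waiting for it."""

    lock: asyncio.Lock = field(default_factory=asyncio.Lock)
    refs: int = 0


class ConversationLocks:
    """Hypothetical registry that drops an entry once nobody references it."""

    def __init__(self) -> None:
        self._entries: dict[str, LockEntry] = {}
        # Assumption: a plain mutex held only for short dict updates.
        self._mutex = threading.Lock()

    @asynccontextmanager
    async def hold(self, conversation_id: str) -> AsyncIterator[None]:
        # Register interest before waiting, so the entry survives while queued.
        with self._mutex:
            entry = self._entries.setdefault(conversation_id, LockEntry())
            entry.refs += 1
        try:
            async with entry.lock:
                yield
        finally:
            # Runs on normal exit, on errors, and on cancellation while waiting.
            with self._mutex:
                entry.refs -= 1
                if entry.refs == 0:
                    del self._entries[conversation_id]

    def __len__(self) -> int:
        return len(self._entries)


async def demo() -> tuple[list[str], int]:
    """Run two workers on one conversation; return event order and registry size."""
    locks = ConversationLocks()
    order: list[str] = []

    async def worker(name: str) -> None:
        async with locks.hold("conv-1"):
            order.append(f"{name}:start")
            await asyncio.sleep(0)  # give the other worker a chance to run
            order.append(f"{name}:end")

    await asyncio.gather(worker("a"), worker("b"))
    return order, len(locks)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the two workers are serialized on the same conversation, and the registry is empty again after the last one exits, which is the property the commit message describes as keeping the registry from growing with every conversation_id ever seen.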

dependabot Bot added the dependencies (Pull requests that update a dependency file) and python:uv (Pull requests that update python:uv code) labels Apr 28, 2026

github-actions Bot added the title needs formatting label Apr 28, 2026

dependabot Bot force-pushed the dependabot/uv/websockets-16.0 branch from 6ff6c86 to 2ffbc59 May 2, 2026 09:06

dependabot Bot force-pushed the dependabot/uv/websockets-16.0 branch from 2ffbc59 to 563908e May 5, 2026 09:07

dependabot Bot closed this May 13, 2026

dependabot Bot deleted the dependabot/uv/websockets-16.0 branch May 13, 2026 13:10

dependabot[bot] wants to merge 1 commit into main from dependabot/uv/websockets-16.0
