Bump websockets from 15.0.1 to 16.0 #4
Closed
dependabot[bot] wants to merge 1 commit into
Force-pushed from 6ff6c86 to 2ffbc59.
Bumps [websockets](https://github.com/python-websockets/websockets) from 15.0.1 to 16.0.
- [Release notes](https://github.com/python-websockets/websockets/releases)
- [Commits](python-websockets/websockets@15.0.1...16.0)

---
updated-dependencies:
- dependency-name: websockets
  dependency-version: '16.0'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Force-pushed from 2ffbc59 to 563908e.
Author
Looks like websockets is up-to-date now, so this is no longer needed.
tisnik pushed a commit that referenced this pull request on Jun 9, 2026
…ghtspeed-core#1796)

* LCORE-1572: add conversation compaction and wire it into /v1/query

Introduce runtime conversation compaction (Option A): once a conversation approaches the model's context window, lightspeed-stack summarizes older turns and owns the LLM context itself instead of letting Llama Stack reload the full history.

- src/utils/conversation_compaction.py: apply_compaction() async generator and apply_compaction_blocking() wrapper. Holds a per-conversation lock (R11), estimates tokens (LCORE-1569), partitions and summarizes old turns (LCORE-1570), writes the summary into the conversation as a marker item, and rebuilds the request as explicit input (summaries + recent verbatim turns + new query). Marker items track the boundary; the conversation_id is preserved and the full history stays in Llama Stack items for audit.
- models/common/responses/responses_api_params.py: omit_conversation flag so the conversation parameter is dropped from the request body in compacted mode while remaining on the object for identity.
- configuration.py: AppConfig.compaction accessor.
- app/endpoints/query.py: apply compaction after preparing params; in compacted mode store the completed turn against the original user query (the conversation parameter is no longer sent, so Llama Stack does not persist the turn automatically).

Background: the spec's original marker-keeps-conversation-parameter approach was found unimplementable on llama-stack 0.6.0, which always reloads the full conversation history when the conversation parameter is set. This restores the spike's original explicit-input approach.

* LCORE-1572: unit tests for conversation compaction core and /v1/query

Cover marker detection and boundary selection, explicit-input assembly, the trigger threshold, the disabled / no-context-window / existing-marker / triggered paths of apply_compaction, the streaming CompactionStartedEvent ordering, and compacted-turn storage.

* LCORE-1572: apply conversation compaction in the A2A endpoint

The A2A executor uses the same prepare_responses_params + Responses API flow as /v1/query and persists conversation_id for multi-turn contexts, so it accumulates context and must compact too.

- Run apply_compaction_blocking before responses.create (A2A is not a browser SSE stream, so no progress event is emitted).
- In compacted mode, persist the completed turn from the response.completed stream event, since the conversation parameter is no longer sent and Llama Stack therefore does not store the turn automatically.

* LCORE-1572: apply conversation compaction in the streaming_query endpoint

Stream /v1/streaming_query through the compaction-aware path only when the conversation actually compacts, so non-compacting requests are unaffected (byte-for-byte the existing flow, including HTTP error handling).

- conversation_compaction: add needs_compaction_path(), a cheap pre-stream predicate (no LLM, no lock) that is true only when the conversation already has a summary marker or would trigger a new compaction.
- streaming_query: when the predicate is true, stream via the new generate_response_with_compaction(), which emits the compaction progress event before the summarization LLM call (R12) and creates the response inside the stream, surfacing create-time errors as SSE error events. generate_response gains emit_start/compacted parameters and, in compacted mode, appends the completed turn to the conversation (the conversation parameter is not sent, so Llama Stack does not store it automatically).
- a2a: silence too-many-lines after the earlier compaction wiring.

* LCORE-1572: tests for the streaming compaction gate

Cover needs_compaction_path: disabled, existing-marker, over-threshold, and under-threshold. This is the gate that keeps non-compacting requests on the unchanged streaming path.

* LCORE-1572: apply conversation compaction in the /v1/responses endpoint

/v1/responses is the OpenAI-compatible Responses API, so compaction is silent: no custom SSE event is injected (preserving wire compatibility) and create-time error handling is unchanged. Summarization runs before the response is created, on both the streaming and non-streaming paths.

- responses_endpoint_handler: run apply_compaction_blocking before the streaming/non-streaming split, gated to stateful single-conversation requests (store=True, a conversation present, no previous_response_id).
- ResponsesContext: carry compacted_original_input so the finalization can store the turn against the original user input.
- _append_previous_response_turn: generalized to also append the turn in compacted mode (the conversation parameter is dropped, so Llama Stack does not store the turn automatically) using the original input.

* LCORE-1572: tests for /v1/responses compacted-turn storage

Verify _append_previous_response_turn stores the turn against the original input in compacted mode, and stores nothing when store is disabled.

* LCORE-1572: update spec doc to the as-built compaction design

Revise R10, R12, the architecture flow, the changed-request-flow section, and the implementation guidance to match what was built: in compacted mode lightspeed-stack builds explicit input and omits the Llama Stack conversation parameter (which always reloads full history), preserving conversation_id and the full item history. Record the redesign and the four affected endpoints (query, streaming_query, A2A, /v1/responses) in a new Changelog section.

* LCORE-1572: fix needs_compaction_path docstring (pydocstyle D400)

* LCORE-1572: build compacted input as typed messages (silence Pydantic warning)

The explicit compacted input was assembled as plain dicts, which produced PydanticSerializationUnexpectedValue warnings when ResponsesApiParams was dumped (its input field is typed ResponseInput).
Build the summary, recent verbatim, and query items as typed OpenAIResponseMessage objects instead.

Verified end-to-end against a live stack: the serializer warning is gone and compaction still triggers, preserves conversation identity, and recalls earlier context correctly.

* LCORE-1572: raise instead of assert on the drained compaction result

apply_compaction_blocking asserted that the generator yielded a result. Under python -O asserts are stripped, so the guard would vanish and a None result could propagate to callers. Replace it with an explicit None check that raises RuntimeError.

Clears a GitHub code-scanning (CodeQL) "use of assert" finding. The repository's Bandit configuration skips B101, so this only surfaced via code scanning, not the Bandit CI job.

* LCORE-1572: wire persisted recursive fold (R3) via the summary cache

Make the conversation summary cache the preferred source of truth for compaction summaries and the home of the persisted recursive fold.

- apply_compaction / apply_compaction_blocking gain cache + user_id + skip_user_id_check. Summaries are read from the cache (get_summaries) and each new chunk is written to it (store_summary); the Llama Stack marker texts remain an authoritative fallback when no persisting cache is configured (marker-only mode, additive summaries, no fold).
- When the persisted summaries themselves exceed the threshold, they are folded via recursively_resummarize and the fold is persisted with replace_summaries, so it is computed once and reused rather than recomputed per request (R3).
- configured_conversation_cache() resolves the configured cache (or None) for the endpoints.
- Wired into /v1/query, /v1/streaming_query, and /v1/responses. The A2A executor stays marker-only: it has no resolved user_id for the (user_id, conversation_id) cache key.

Adds 7 unit tests: cache-preferred reads, store-on-compaction, fold trigger and persistence, no-fold-without-cache, marker fallback, and the cache resolver.

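The assert-to-raise change described in the commit message above can be illustrated with a minimal sketch. All names here (`ProgressEvent`, `Result`, `generate`, `drain`) are hypothetical stand-ins, not the actual lightspeed-stack code; the sketch only assumes a generator that yields progress events followed by a final result, and a wrapper that drains it.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Optional, Union


@dataclass
class ProgressEvent:
    """Hypothetical stand-in for a progress event emitted while working."""
    message: str


@dataclass
class Result:
    """Hypothetical stand-in for the final result of the generator."""
    compacted: bool


async def generate(yield_result: bool) -> AsyncIterator[Union[ProgressEvent, Result]]:
    """Hypothetical generator: emits a progress event, then (usually) a result."""
    yield ProgressEvent("started")
    if yield_result:
        yield Result(compacted=True)


async def drain(events: AsyncIterator[Union[ProgressEvent, Result]]) -> Result:
    """Consume the generator and return its final result."""
    result: Optional[Result] = None
    async for event in events:
        if isinstance(event, Result):
            result = event
    # An `assert result is not None` here would be stripped under `python -O`,
    # letting None reach the caller. An explicit check survives optimization.
    if result is None:
        raise RuntimeError("generator finished without yielding a result")
    return result
```

With this shape, `asyncio.run(drain(generate(True)))` returns the result object, while `asyncio.run(drain(generate(False)))` raises `RuntimeError` regardless of whether the interpreter runs with `-O`.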
* LCORE-1572: address CodeRabbit review - list-form input tokens + clarity rename

- Count tokens for list-form ResponseInput (e.g. /v1/responses), not only the string form, so compaction is not skipped on large item-list inputs that could otherwise still hit HTTP 413. Adds _estimate_response_input_tokens and a regression test.
- Rename CompactionResult.summarized to compacted: the flag means "served in compacted / explicit-input mode" (set whenever the conversation has any summary, reused or fresh), not "a summary was created this request". The old name caused reviewer confusion about turn-persistence gating, which is correct as written.

* LCORE-1572: persist compacted streaming turns with structure (CodeRabbit #4)

In compacted mode the streaming endpoint persisted the completed turn as flattened strings via append_turn_to_conversation, dropping attachments and non-text output items, and double-storing for shield-blocked requests. Persist the structured turn instead:

- Capture the response's structured output items onto TurnSummary.output_items (set at response.completed, and to the refusal item on a shield block).
- generate_response now takes original_input and persists via store_compacted_turn with the original input plus structured output items, matching the /v1/query and A2A paths.
- The shield-blocked branch no longer stores the turn when the conversation parameter was omitted (compacted mode); generate_response stores it once with the correct original input, avoiding the duplicate refusal turn.

Adds tests for the structured compacted persistence and the shield dedup (compacted and non-compacted).

* LCORE-1572: do not initialize the conversation cache when compaction is disabled

configured_conversation_cache() is evaluated eagerly as a call argument in the query endpoint, so it ran on every request and accessed configuration.conversation_cache unconditionally, forcing the (SQLite) cache to initialize even when compaction is disabled.
On configurations whose cache file could not be opened, that raised and returned HTTP 500, which failed the e2e suites (where compaction is off). Return None without touching the cache when compaction is disabled; the cache is only used by compaction on this path. Adds a regression test.

* LCORE-1572: address CodeRabbit round 2 (compacted-mode persistence edges)

Follow-ups to the streaming-persistence work, all for non-happy-path terminals in compacted mode (conversation parameter omitted), so the persisted turn uses the original user input + structured output rather than the explicit rewrite or flattened strings:

- /v1/responses: shield-blocked turns persist against compacted_original_input, not api_params.input (the explicit rewrite).
- streaming: interrupted (CancelledError) turns thread original_input through the interrupt callback and persist structured items, fixing the wrong-input storage and the cast(str, input) break on list inputs.
- streaming: capture output_items on response.failed / response.incomplete terminals too, not only response.completed, so compacted persistence keeps partial output.
- TurnSummary.output_items typed as list[OpenAIResponseOutput] instead of list[Any].

Also documents that disabling compaction mid-conversation on an already-compacted conversation reverts it to full-history replay (unsupported transition); the enabled flag stays a full off-switch (CodeRabbit E, declined by design).

Adds unit tests for the blocked /responses path, the interrupted compacted path, and output_items capture on a failed terminal.

* LCORE-1572: document the disable-after-compaction limitation in the spec doc (CodeRabbit E)

* LCORE-1572: document as-built divergences in spec doc (cache source-of-truth, persisted fold)

The spec still described the earlier design (cache as a parallel/best-effort layer, markers as the summary source).
Update Summary storage, Additive summarization, and Changed request flow to the as-built design, and add a Changelog entry: the cache is the preferred source of truth for summaries (marker texts as fallback + audit/boundary), the recursive fold is persisted via replace_summaries (in-memory fold rejected), A2A is marker-only, and the enabled flag stays a full off-switch.

* LCORE-1572: fix line-too-long (C0301) in interrupted-turn test docstring

* LCORE-1572: harden disabled-cache regression test to fail on eager cache access (CodeRabbit)

* LCORE-1572: ref-count per-conversation lock + extract apply_compaction helpers (review)

Addresses two inline review nits from tisnik on the LCORE-1572 PR.

Per-conversation lock cleanup (R11): Replace the bare ``dict[str, asyncio.Lock]`` registry with a ref-counted ``_LockEntry`` and an ``@asynccontextmanager`` helper guarded by a registry mutex. Entries are removed once the last waiter exits, so the registry no longer grows unbounded with the set of conversation_ids ever seen by the process. Adds tests for serialization, deletion-after-last-release, entry-kept-while-waiters-queued, and cleanup-on-cancellation.

apply_compaction refactor: Extract five helpers (``_load_compaction_state``, ``_estimate_total_tokens``, ``_persist_new_summary_chunk``, ``_maybe_persist_fold``, ``_compacted_result``), leaving the orchestrating generator linear and roughly one screen long. The state-loading, token-estimation, persistence-side-effects, and result-building concerns are now each named and individually testable.

* LCORE-1572: tighten typed-item handling in compaction helpers (review)

Addresses asimurka's review nit about the dual dict-or-model branches in ``_verbatim_input_message`` and the surrounding token-estimator helpers.

Llama Stack's ``client.conversations.items.list`` returns items as typed Pydantic models (the ``ItemListResponse`` discriminated union).
The dict branches in ``is_message_item``, ``extract_message_text``, ``estimate_conversation_tokens``, ``format_conversation_for_summary`` and ``_verbatim_input_message`` were defensive code for a shape that never arrives from production code paths; they only kept the dict-using test fixtures alive.

Drop the dict branches and tighten the docstrings to state the typed-item contract. Update the compaction test fixtures (``_msg``, ``_marker``) to return ``OpenAIResponseMessage`` instances instead of dicts. Remove the token-estimator and compaction tests that explicitly asserted dict-shape acceptance; replace with single tests verifying that dicts are now ignored.

* LCORE-1572: soften R12 doc on silent /v1/responses compaction (review)

Addresses asimurka's review note: emitting a compaction event on the ``/v1/responses`` endpoint would itself be spec-compliant under the OpenResponses extension-events convention, so framing silent compaction as a forced choice for "wire compatibility" overstated the constraint.

Reword R12 and the changelog entry to acknowledge the spec-compliant option and to frame silent as the *initial* choice, kept to preserve drop-in compatibility with clients written against the upstream OpenAI Responses API; emitting the event on this endpoint is left open as a follow-up. Lightspeed's own clients can already use ``/v1/streaming_query`` to receive the event.
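The ref-counted per-conversation lock described in the commit message above might look roughly like the following sketch. It is not the code from the referenced commit: `LockEntry`, `ConversationLocks`, `hold`, and `demo` are hypothetical names, and using a `threading.Lock` as the registry mutex is an assumption made here so that cleanup in `finally` never awaits.

```python
import asyncio
import threading
from contextlib import asynccontextmanager
from dataclasses import dataclass, field
from typing import AsyncIterator


@dataclass
class LockEntry:
    """A lock plus the number of tasks currently holding or waiting on it."""
    lock: asyncio.Lock = field(default_factory=asyncio.Lock)
    refs: int = 0


class ConversationLocks:
    """Hypothetical registry of per-conversation locks with ref-counted cleanup."""

    def __init__(self) -> None:
        self._entries: dict[str, LockEntry] = {}
        # Assumption: a plain mutex; the critical sections never await.
        self._mutex = threading.Lock()

    @asynccontextmanager
    async def hold(self, conversation_id: str) -> AsyncIterator[None]:
        # Register interest before waiting, so the entry stays alive
        # while any task is queued on its lock.
        with self._mutex:
            entry = self._entries.get(conversation_id)
            if entry is None:
                entry = self._entries[conversation_id] = LockEntry()
            entry.refs += 1
        try:
            async with entry.lock:
                yield
        finally:
            # Runs on normal exit, on error, and on cancellation.
            with self._mutex:
                entry.refs -= 1
                if entry.refs == 0:
                    del self._entries[conversation_id]

    def __len__(self) -> int:
        return len(self._entries)


async def demo() -> tuple[list[str], int]:
    """Two tasks contend for the same conversation; returns event order and registry size."""
    locks = ConversationLocks()
    order: list[str] = []

    async def turn(name: str) -> None:
        async with locks.hold("conv-1"):
            order.append(f"{name}:enter")
            await asyncio.sleep(0)  # give the other task a chance to run
            order.append(f"{name}:exit")

    await asyncio.gather(turn("a"), turn("b"))
    return order, len(locks)
```

Running `asyncio.run(demo())` serializes the two tasks on the same conversation and leaves the registry empty afterwards, which is the property the commit message describes: entries disappear once the last waiter exits instead of accumulating per conversation_id.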
Bumps websockets from 15.0.1 to 16.0.
Release notes
Sourced from websockets's releases.
Commits
- d4303a5 Release version 16.0.
- 851bcd7 Bump pypa/cibuildwheel from 3.3.0 to 3.3.1
- 740c8d3 Temporarily remove the trio implementation.
- 92ea055 Add missing changelog entry.
- ba74244 Document bug fix.
- 9410483 Pin sphinx to avoid error in sphinxcontrib-trio.
- 8e4d408 Document asyncio's TLS read buffer.
- cb3500b Stop referring to the asyncio implementation as new.
- 6563a9c The threading implementation supports max_queue.
- 9f17e92 Clarify that protocol_mutex protects pending_pings.