Skip to content

Bump websockets from 15.0.1 to 16.0#4

Closed
dependabot[bot] wants to merge 1 commit into
mainfrom
dependabot/uv/websockets-16.0
Closed

Bump websockets from 15.0.1 to 16.0#4
dependabot[bot] wants to merge 1 commit into
mainfrom
dependabot/uv/websockets-16.0

Conversation

@dependabot

@dependabot dependabot Bot commented on behalf of github Apr 28, 2026

Copy link
Copy Markdown

Bumps websockets from 15.0.1 to 16.0.

Release notes

Sourced from websockets's releases.

16.0

See https://websockets.readthedocs.io/en/stable/project/changelog.html for details.

Commits
  • d4303a5 Release version 16.0.
  • 851bcd7 Bump pypa/cibuildwheel from 3.3.0 to 3.3.1
  • 740c8d3 Temporarily remove the trio implementation.
  • 92ea055 Add missing changelog entry.
  • ba74244 Document bug fix.
  • 9410483 Pin sphinx to avoid error in sphinxcontrib-trio.
  • 8e4d408 Document asyncio's TLS read buffer.
  • cb3500b Stop referring to the asyncio implementation as new.
  • 6563a9c The threading implementation supports max_queue.
  • 9f17e92 Clarify that protocol_mutex protects pending_pings.
  • Additional commits viewable in compare view

@dependabot dependabot Bot added dependencies Pull requests that update a dependency file python:uv Pull requests that update python:uv code labels Apr 28, 2026
@dependabot dependabot Bot force-pushed the dependabot/uv/websockets-16.0 branch from 6ff6c86 to 2ffbc59 Compare May 2, 2026 09:06
Bumps [websockets](https://github.com/python-websockets/websockets) from 15.0.1 to 16.0.
- [Release notes](https://github.com/python-websockets/websockets/releases)
- [Commits](python-websockets/websockets@15.0.1...16.0)

---
updated-dependencies:
- dependency-name: websockets
  dependency-version: '16.0'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot Bot force-pushed the dependabot/uv/websockets-16.0 branch from 2ffbc59 to 563908e Compare May 5, 2026 09:07
@dependabot @github

dependabot Bot commented on behalf of github May 13, 2026

Copy link
Copy Markdown
Author

Looks like websockets is up-to-date now, so this is no longer needed.

@dependabot dependabot Bot closed this May 13, 2026
@dependabot dependabot Bot deleted the dependabot/uv/websockets-16.0 branch May 13, 2026 13:10
tisnik pushed a commit that referenced this pull request Jun 9, 2026
…ghtspeed-core#1796)

* LCORE-1572: add conversation compaction and wire it into /v1/query

Introduce runtime conversation compaction (Option A): once a conversation
approaches the model's context window, lightspeed-stack summarizes older
turns and owns the LLM context itself instead of letting Llama Stack reload
the full history.

- src/utils/conversation_compaction.py: apply_compaction() async generator
  and apply_compaction_blocking() wrapper. Holds a per-conversation lock
  (R11), estimates tokens (LCORE-1569), partitions and summarizes old turns
  (LCORE-1570), writes the summary into the conversation as a marker item,
  and rebuilds the request as explicit input (summaries + recent verbatim
  turns + new query). Marker items track the boundary; the conversation_id
  is preserved and the full history stays in Llama Stack items for audit.
- models/common/responses/responses_api_params.py: omit_conversation flag so
  the conversation parameter is dropped from the request body in compacted
  mode while remaining on the object for identity.
- configuration.py: AppConfig.compaction accessor.
- app/endpoints/query.py: apply compaction after preparing params; in
  compacted mode store the completed turn against the original user query
  (the conversation parameter is no longer sent, so Llama Stack does not
  persist the turn automatically).

Background: the spec's original marker-keeps-conversation-parameter approach
was found unimplementable on llama-stack 0.6.0, which always reloads the full
conversation history when the conversation parameter is set. This restores
the spike's original explicit-input approach.

* LCORE-1572: unit tests for conversation compaction core and /v1/query

Cover marker detection and boundary selection, explicit-input assembly, the
trigger threshold, the disabled / no-context-window / existing-marker /
triggered paths of apply_compaction, the streaming CompactionStartedEvent
ordering, and compacted-turn storage.

* LCORE-1572: apply conversation compaction in the A2A endpoint

The A2A executor uses the same prepare_responses_params + Responses API
flow as /v1/query and persists conversation_id for multi-turn contexts, so
it accumulates context and must compact too.

- Run apply_compaction_blocking before responses.create (A2A is not a
  browser SSE stream, so no progress event is emitted).
- In compacted mode, persist the completed turn from the response.completed
  stream event, since the conversation parameter is no longer sent and Llama
  Stack therefore does not store the turn automatically.

* LCORE-1572: apply conversation compaction in the streaming_query endpoint

Stream /v1/streaming_query through the compaction-aware path only when the
conversation actually compacts, so non-compacting requests are unaffected
(byte-for-byte the existing flow, including HTTP error handling).

- conversation_compaction: add needs_compaction_path(), a cheap pre-stream
  predicate (no LLM, no lock) that is true only when the conversation already
  has a summary marker or would trigger a new compaction.
- streaming_query: when the predicate is true, stream via the new
  generate_response_with_compaction(), which emits the compaction progress
  event before the summarization LLM call (R12) and creates the response
  inside the stream, surfacing create-time errors as SSE error events.
  generate_response gains emit_start/compacted parameters and, in compacted
  mode, appends the completed turn to the conversation (the conversation
  parameter is not sent, so Llama Stack does not store it automatically).
- a2a: silence too-many-lines after the earlier compaction wiring.

* LCORE-1572: tests for the streaming compaction gate

Cover needs_compaction_path: disabled, existing-marker, over-threshold, and
under-threshold — the gate that keeps non-compacting requests on the
unchanged streaming path.

* LCORE-1572: apply conversation compaction in the /v1/responses endpoint

/v1/responses is the OpenAI-compatible Responses API, so compaction is
silent: no custom SSE event is injected (preserving wire compatibility) and
create-time error handling is unchanged. Summarization runs before the
response is created, on both the streaming and non-streaming paths.

- responses_endpoint_handler: run apply_compaction_blocking before the
  streaming/non-streaming split, gated to stateful single-conversation
  requests (store=True, a conversation present, no previous_response_id).
- ResponsesContext: carry compacted_original_input so the finalization can
  store the turn against the original user input.
- _append_previous_response_turn: generalized to also append the turn in
  compacted mode (the conversation parameter is dropped, so Llama Stack does
  not store the turn automatically) using the original input.

* LCORE-1572: tests for /v1/responses compacted-turn storage

Verify _append_previous_response_turn stores the turn against the original
input in compacted mode, and stores nothing when store is disabled.

* LCORE-1572: update spec doc to the as-built compaction design

Revise R10, R12, the architecture flow, the changed-request-flow section, and
the implementation guidance to match what was built: in compacted mode
lightspeed-stack builds explicit input and omits the Llama Stack conversation
parameter (which always reloads full history), preserving conversation_id and
the full item history. Record the redesign and the four affected endpoints
(query, streaming_query, A2A, /v1/responses) in a new Changelog section.

* LCORE-1572: fix needs_compaction_path docstring (pydocstyle D400)

* LCORE-1572: build compacted input as typed messages (silence Pydantic warning)

The explicit compacted input was assembled as plain dicts, which produced
PydanticSerializationUnexpectedValue warnings when ResponsesApiParams was
dumped (its input field is typed ResponseInput). Build the summary, recent
verbatim, and query items as typed OpenAIResponseMessage objects instead.

Verified end-to-end against a live stack: the serializer warning is gone and
compaction still triggers, preserves conversation identity, and recalls
earlier context correctly.

* LCORE-1572: raise instead of assert on the drained compaction result

apply_compaction_blocking asserted that the generator yielded a result. Under
python -O asserts are stripped, so the guard would vanish and a None result
could propagate to callers. Replace it with an explicit None check that raises
RuntimeError.

Clears a GitHub code-scanning (CodeQL) "use of assert" finding. The repository's
Bandit configuration skips B101, so this only surfaced via code scanning, not
the Bandit CI job.

* LCORE-1572: wire persisted recursive fold (R3) via the summary cache

Make the conversation summary cache the preferred source of truth for
compaction summaries and the home of the persisted recursive fold.

- apply_compaction / apply_compaction_blocking gain cache + user_id +
  skip_user_id_check. Summaries are read from the cache (get_summaries) and each
  new chunk is written to it (store_summary); the Llama Stack marker texts remain
  an authoritative fallback when no persisting cache is configured (marker-only
  mode, additive summaries, no fold).
- When the persisted summaries themselves exceed the threshold, they are folded
  via recursively_resummarize and the fold is persisted with replace_summaries,
  so it is computed once and reused rather than recomputed per request (R3).
- configured_conversation_cache() resolves the configured cache (or None) for
  the endpoints.
- Wired into /v1/query, /v1/streaming_query, and /v1/responses. The A2A executor
  stays marker-only: it has no resolved user_id for the (user_id, conversation_id)
  cache key.

Adds 7 unit tests: cache-preferred reads, store-on-compaction, fold trigger and
persistence, no-fold-without-cache, marker fallback, and the cache resolver.

* LCORE-1572: address CodeRabbit review — list-form input tokens + clarity rename

- Count tokens for list-form ResponseInput (e.g. /v1/responses), not only the
  string form, so compaction is not skipped on large item-list inputs that
  could otherwise still hit HTTP 413. Adds _estimate_response_input_tokens and a
  regression test.
- Rename CompactionResult.summarized to compacted: the flag means "served in
  compacted / explicit-input mode" (set whenever the conversation has any
  summary, reused or fresh), not "a summary was created this request". The old
  name caused reviewer confusion about turn-persistence gating, which is correct
  as written.

* LCORE-1572: persist compacted streaming turns with structure (CodeRabbit #4)

In compacted mode the streaming endpoint persisted the completed turn as
flattened strings via append_turn_to_conversation, dropping attachments and
non-text output items, and double-storing for shield-blocked requests. Persist
the structured turn instead:

- Capture the response's structured output items onto TurnSummary.output_items
  (set at response.completed, and to the refusal item on a shield block).
- generate_response now takes original_input and persists via store_compacted_turn
  with the original input plus structured output items, matching the /v1/query
  and A2A paths.
- The shield-blocked branch no longer stores the turn when the conversation
  parameter was omitted (compacted mode); generate_response stores it once with
  the correct original input, avoiding the duplicate refusal turn.

Adds tests for the structured compacted persistence and the shield dedup
(compacted and non-compacted).

* LCORE-1572: do not initialize the conversation cache when compaction is disabled

configured_conversation_cache() is evaluated eagerly as a call argument in the
query endpoint, so it ran on every request and accessed
configuration.conversation_cache unconditionally — forcing the (SQLite) cache to
initialize even when compaction is disabled. On configurations whose cache file
could not be opened that raised and returned HTTP 500, which failed the e2e
suites (where compaction is off). Return None without touching the cache when
compaction is disabled; the cache is only used by compaction on this path.

Adds a regression test.

* LCORE-1572: address CodeRabbit round 2 (compacted-mode persistence edges)

Follow-ups to the streaming-persistence work, all for non-happy-path terminals in
compacted mode (conversation parameter omitted), so the persisted turn uses the
original user input + structured output rather than the explicit rewrite or
flattened strings:

- /v1/responses: shield-blocked turns persist against compacted_original_input,
  not api_params.input (the explicit rewrite).
- streaming: interrupted (CancelledError) turns thread original_input through the
  interrupt callback and persist structured items, fixing the wrong-input storage
  and the cast(str, input) break on list inputs.
- streaming: capture output_items on response.failed / response.incomplete
  terminals too, not only response.completed, so compacted persistence keeps
  partial output.
- TurnSummary.output_items typed as list[OpenAIResponseOutput] instead of list[Any].

Also documents that disabling compaction mid-conversation on an already-compacted
conversation reverts it to full-history replay (unsupported transition); the
enabled flag stays a full off-switch (CodeRabbit E, declined by design).

Adds unit tests for the blocked /responses path, the interrupted compacted path,
and output_items capture on a failed terminal.

* LCORE-1572: document the disable-after-compaction limitation in the spec doc (CodeRabbit E)

* LCORE-1572: document as-built divergences in spec doc (cache source-of-truth, persisted fold)

The spec still described the earlier design (cache as a parallel/best-effort
layer, markers as the summary source). Update Summary storage, Additive
summarization, and Changed request flow to the as-built design, and add a
Changelog entry: the cache is the preferred source of truth for summaries (marker
texts as fallback + audit/boundary), the recursive fold is persisted via
replace_summaries (in-memory fold rejected), A2A is marker-only, and the
enabled flag stays a full off-switch.

* LCORE-1572: fix line-too-long (C0301) in interrupted-turn test docstring

* LCORE-1572: harden disabled-cache regression test to fail on eager cache access (CodeRabbit)

* LCORE-1572: ref-count per-conversation lock + extract apply_compaction helpers (review)

Addresses two inline review nits from tisnik on the LCORE-1572 PR.

Per-conversation lock cleanup (R11):
  Replace the bare ``dict[str, asyncio.Lock]`` registry with a ref-counted
  ``_LockEntry`` and an ``@asynccontextmanager`` helper guarded by a registry
  mutex. Entries are removed once the last waiter exits, so the registry no
  longer grows unbounded with the set of conversation_ids ever seen by the
  process. Adds tests for serialization, deletion-after-last-release,
  entry-kept-while-waiters-queued, and cleanup-on-cancellation.

apply_compaction refactor:
  Extract five helpers — ``_load_compaction_state``, ``_estimate_total_tokens``,
  ``_persist_new_summary_chunk``, ``_maybe_persist_fold``, ``_compacted_result``
  — leaving the orchestrating generator linear and roughly one screen long.
  The state-loading, token-estimation, persistence-side-effects, and result-
  building concerns are now each named and individually testable.

* LCORE-1572: tighten typed-item handling in compaction helpers (review)

Addresses asimurka's review nit about the dual dict-or-model branches in
``_verbatim_input_message`` and the surrounding token-estimator helpers.

Llama Stack's ``client.conversations.items.list`` returns items as typed
Pydantic models (the ``ItemListResponse`` discriminated union). The dict
branches in ``is_message_item``, ``extract_message_text``,
``estimate_conversation_tokens``, ``format_conversation_for_summary`` and
``_verbatim_input_message`` were defensive code for a shape that never
arrives from production code paths — they only kept the dict-using test
fixtures alive.

Drop the dict branches and tighten the docstrings to state the typed-item
contract. Update the compaction test fixtures (``_msg``, ``_marker``) to
return ``OpenAIResponseMessage`` instances instead of dicts. Remove the
token-estimator and compaction tests that explicitly asserted dict-shape
acceptance; replace with single tests verifying that dicts are now ignored.

* LCORE-1572: soften R12 doc on silent /v1/responses compaction (review)

Addresses asimurka's review note: emitting a compaction event on the
``/v1/responses`` endpoint would itself be spec-compliant under the
OpenResponses extension-events convention, so framing silent compaction as a
forced choice for "wire compatibility" overstated the constraint. Reword R12
and the changelog entry to acknowledge the spec-compliant option and to frame
silent as the *initial* choice, kept to preserve drop-in compatibility with
clients written against the upstream OpenAI Responses API; emitting the event
on this endpoint is left open as a follow-up. Lightspeed's own clients can
already use ``/v1/streaming_query`` to receive the event.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dependencies Pull requests that update a dependency file python:uv Pull requests that update python:uv code title needs formatting

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants