
Implement wire-byte stability (proposal 0047) #145

Merged: chris-colinsky merged 2 commits into main from feature/0047-wire-byte-stability on Jun 10, 2026

@chris-colinsky (Member)

Summary

  • Adds _canonicalize_dict_keys(value) — recursive dict-key sort, list-order preserved per spec Q5 — applied at every user-supplied-dict boundary in the OpenAI provider's wire body: tool.parameters (with the surrounding function record per Q5), response_format.json_schema.schema, RuntimeConfig extras, and the JSON encoding of tool_call.arguments. A top-level belt-and-suspenders pass over the assembled body catches anything the per-field passes miss.
  • Equivalent OA inputs now produce byte-identical wire output regardless of dict insertion order, so prefix-cache hits land consistently against vLLM (--enable-prefix-caching), OpenAI hosted prompt caching, llama.cpp, and other APC-implementing servers.
  • Pinned by 8 new unit tests on the provider (per-boundary permutation tests, list-order-preserved counter-test, body-level top-key sort assertion, image content-block stability, direct canonicalizer unit test) and 2 substring-stability tests on the prompt-management path (TextPrompt + ChatPrompt variants).
  • A new docs/concepts/prompts.md section, "Prefix-cache friendly authoring (APC)", explains the contract, OA's role, the spec's five informative authoring patterns, and a vLLM debugging callout per Q3 ack.

Closes proposal 0047 end-to-end

| Piece | Where it landed |
| --- | --- |
| Response.usage.cached_tokens / cache_creation_tokens fields, sourced by the OpenAI provider from prompt_tokens_details | v0.12.0 |
| OTel observer emits openarmature.llm.cache_read.input_tokens / openarmature.llm.cache_creation.input_tokens | v0.12.0 |
| §8 intra-impl wire-byte canonicalization | this PR (v0.13.0) |
| §13 cross-variable substring stability | this PR (verified by tests; satisfied by the existing Jinja2 strict-undefined path) |

conformance.toml flips [proposals."0047"] from not-yet to implemented, with since = "0.13.0"; its leading-comment block describes the three-piece deliverable.

Behavior change worth flagging

tool_call.arguments JSON encoding now uses sort_keys=True. Functionally equivalent (parses to the same dict) but byte-different from the previous insertion-order encoding. Anything downstream comparing encoded-string equality across cycles would observe the shift.
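To make the behavior change concrete, here is a sketch using only the standard library's json.dumps. The default separators are an assumption; the PR does not say which separators the provider passes:

```python
import json

# Two callers build the same tool-call arguments in different orders.
args_a = {"city": "Paris", "units": "metric"}
args_b = {"units": "metric", "city": "Paris"}

# Previous behavior (insertion order): the encoded strings differ.
old_a = json.dumps(args_a)
old_b = json.dumps(args_b)

# New behavior (sort_keys=True): the encoded strings are byte-identical.
new_a = json.dumps(args_a, sort_keys=True)
new_b = json.dumps(args_b, sort_keys=True)

# Parsing gives the same dict either way, so consumers that compare
# parsed values rather than raw strings are unaffected.
same_after_parse = json.loads(old_b) == json.loads(new_b)
```

Note that old_a already happens to be in sorted order, so only callers whose insertion order was unsorted (like args_b) see their encoded string change.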

Scope

Chat Completions endpoint only. The OpenAI Responses API endpoint and the Anthropic / Gemini wire-format mappings stay deferred (no Python consumer today). The Anthropic fixture llm-provider/055-anthropic-wire-byte-stability stays parser-deferred per Q1 ack.

Test plan

  • uv run pytest tests/ — 1235 passed (was 1232; +3 from the review round, +5 from the initial implementation).
  • uv run pyright clean.
  • uv run ruff check + ruff format --check clean.
  • Wire-byte tests: uv run pytest tests/unit/test_llm_provider.py -k wire_byte -v.
  • Substring stability: uv run pytest tests/unit/test_prompts.py -k cross_variable_substring -v.

Add intra-impl wire-byte stability to the OpenAI provider so
equivalent OA inputs produce byte-identical wire output regardless
of dict insertion order. A new ``_canonicalize_dict_keys`` helper
recursively sorts dict keys at every nesting level while preserving
caller-supplied array ordering (the spec's split: object keys are
sorted, array order is caller-controlled).

The helper applies at four user-supplied-dict boundaries: tool
definitions (the ``function`` record top-level plus the parameters
JSON Schema), ``response_format.json_schema.schema``, RuntimeConfig
extras, and the JSON encoding of ``tool_call.arguments``. A top-level
belt-and-suspenders pass over the assembled body catches
anything the per-field passes miss.

Closes proposal 0047 end-to-end: pieces 1 and 2 (Response.usage
cache fields sourced from prompt_tokens_details + OTel observer
emits the cache attributes) landed in v0.12.0; this is piece 3.
Prompt-management §13 cross-variable substring stability is
satisfied by the existing Jinja2 strict-undefined render path on
both TextPrompt and ChatPrompt; pinned by new tests.
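As we read §13, the property is that text a template renders outside a variable's slot does not change when only that variable's value changes, so a shared prefix stays byte-identical across requests. OA renders with Jinja2 strict-undefined; the sketch below swaps in the standard library's string.Template purely to stay dependency-free, and the prompt text and helper names are hypothetical:

```python
from string import Template

# Hypothetical prompt: a static prefix, then a per-request slot at the end.
PROMPT = Template(
    "You are a support assistant for $product.\n"
    "Answer in at most three sentences.\n"
    "User question: $question"
)


def render(**values: str) -> str:
    # substitute() raises KeyError on a missing variable instead of
    # silently rendering an empty string, similar to Jinja2's StrictUndefined.
    return PROMPT.substitute(**values)


def prefix_before(rendered: str, marker: str) -> str:
    # The text up to the per-request slot: the part a prefix cache can reuse.
    return rendered[: rendered.index(marker)]


first = render(product="Acme", question="How do I reset my password?")
second = render(product="Acme", question="Where is my order?")
```

The two renders differ only after "User question:", so everything before that marker is a candidate for a prefix-cache hit. A missing variable fails loudly rather than shifting the bytes of the prefix.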

A new ``docs/concepts/prompts.md`` section explains APC, what OA
handles for users (wire-byte canonicalization, deterministic
rendering), what users own (the spec's five informative authoring
patterns), and a vLLM debugging callout for the cache-attribute-not-appearing
case (server-side ``--enable-prefix-caching`` plus
``--enable-prompt-tokens-details``).
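When debugging that case, it can help to inspect the raw usage block first. The hypothetical helper below (not part of OA's API) reads usage.prompt_tokens_details.cached_tokens from a Chat Completions response parsed into a dict, and keeps "field absent" (None) separate from "reported, but no cache hit" (0). On vLLM, an absent field is consistent with the server-side flags above not being set:

```python
from typing import Any, Optional


def cached_prompt_tokens(response_json: dict[str, Any]) -> Optional[int]:
    """Return usage.prompt_tokens_details.cached_tokens, or None if absent.

    None means the server did not report the field at all, which points at
    server configuration; 0 means it was reported but nothing was reused,
    which points at prompt layout or cache eviction.
    """
    usage = response_json.get("usage") or {}
    details = usage.get("prompt_tokens_details")
    if not details:
        # Missing key, JSON null, or an empty object: nothing was reported.
        return None
    return details.get("cached_tokens")
```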

Scope is the Chat Completions endpoint only. The OpenAI Responses
API endpoint and the Anthropic / Gemini wire-format mappings are
deferred (no Python consumer today).

Behavior change worth flagging: ``tool_call.arguments`` JSON
encoding now uses ``sort_keys=True``. Functionally equivalent
(parses to the same dict) but byte-different from the previous
insertion-order encoding.

Copilot AI review requested due to automatic review settings on June 9, 2026 23:58

Copilot AI left a comment

Pull request overview

Implements proposal 0047’s determinism requirements by making the OpenAI Chat Completions request body byte-stable for equivalent inputs (regardless of dict insertion order), and pins prompt-management §13 cross-variable substring stability with unit tests.

Changes:

  • Add recursive dict-key canonicalization in the OpenAI provider (including tool_call.arguments JSON string encoding) and apply a final full-body canonicalization pass.
  • Add wire-byte stability tests covering tool schemas, response schemas, runtime extras, tool-call arguments, list-order preservation, and image content blocks.
  • Add prompt substring-stability tests and documentation describing APC/prefix-cache-friendly authoring.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/openarmature/llm/providers/openai.py | Canonicalizes user-supplied dicts and tool-call argument encoding to produce byte-stable request bodies. |
| tests/unit/test_llm_provider.py | Adds unit tests asserting wire-byte stability across dict permutations and preserving list order. |
| tests/unit/test_prompts.py | Adds tests pinning cross-variable substring stability for text and chat prompts. |
| docs/concepts/prompts.md | Documents APC/prefix-cache-friendly authoring guidance and debugging notes. |
| conformance.toml | Marks proposal 0047 as implemented since 0.13.0 and documents the deliverables. |
| CHANGELOG.md | Adds an unreleased changelog entry describing proposal 0047 completion. |

Copilot comment threads: CHANGELOG.md (outdated), conformance.toml.

Two dead-pointer fixes flagged by Copilot, both review-round-rename
casualties:

1. CHANGELOG entry referenced ``_canonicalize_json_schema``; the
   helper was renamed to ``_canonicalize_dict_keys`` because it
   canonicalizes every user-supplied dict on the wire, not just
   JSON Schemas.

2. ``conformance.toml`` 0047 leading-comment block pointed at
   ``test_cross_variable_substring_stability``; that test got
   split into ``..._text_prompt`` and ``..._chat_prompt`` when
   coverage extended to the ChatPrompt variant.

@chris-colinsky merged commit a6b6f26 into main on Jun 10, 2026
6 checks passed
@chris-colinsky deleted the feature/0047-wire-byte-stability branch on June 10, 2026 00:08