Implement wire-byte stability (proposal 0047) #145
Merged
Add intra-impl wire-byte stability to the OpenAI provider so equivalent OA inputs produce byte-identical wire output regardless of dict insertion order.

A new ``_canonicalize_dict_keys`` helper recursively sorts dict keys at every nesting level while preserving caller-supplied array ordering (the spec's split: object keys are sorted, array order is caller-controlled). The helper applies at four user-supplied-dict boundaries: tool definitions (the ``function`` record top-level plus the parameters JSON Schema), ``response_format.json_schema.schema``, RuntimeConfig extras, and the JSON encoding of ``tool_call.arguments``. A top-level belt-and-suspenders pass over the assembled body catches anything the per-field passes miss.

Closes proposal 0047 end-to-end: pieces 1 and 2 (Response.usage cache fields sourced from prompt_tokens_details + OTel observer emits the cache attributes) landed in v0.12.0; this is piece 3.

Prompt-management §13 cross-variable substring stability is satisfied by the existing Jinja2 strict-undefined render path on both TextPrompt and ChatPrompt; pinned by new tests.

A new ``docs/concepts/prompts.md`` section explains APC, what OA handles for users (wire-byte canonicalization, deterministic rendering), what users own (the spec's five informative authoring patterns), and a vLLM debugging callout for the cache-attribute-not-appearing case (server-side ``--enable-prefix-caching`` plus ``--enable-prompt-tokens-details``).

Scope is the Chat Completions endpoint only. The OpenAI Responses API endpoint and the Anthropic / Gemini wire-format mappings are deferred (no Python consumer today).

Behavior change worth flagging: ``tool_call.arguments`` JSON encoding now uses ``sort_keys=True``. Functionally equivalent (parses to the same dict) but byte-different from the previous insertion-order encoding.
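The recursive key sort described above can be sketched roughly as follows. This is an illustration only, not the helper from this PR: the name `canonicalize_dict_keys`, the sample schemas, and the assumption that all keys are strings (as in JSON objects) are mine.

```python
import json
from typing import Any


def canonicalize_dict_keys(value: Any) -> Any:
    """Return a copy of `value` with dict keys sorted at every nesting level.

    Hypothetical sketch: assumes JSON-like data whose dict keys are strings.
    """
    if isinstance(value, dict):
        # Python dicts keep insertion order, so inserting in sorted key
        # order fixes the order the JSON encoder will emit.
        return {key: canonicalize_dict_keys(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        # Array order is caller-controlled: keep it, but still
        # canonicalize any dicts nested inside the elements.
        return [canonicalize_dict_keys(item) for item in value]
    return value


# Two equivalent schemas that differ only in dict insertion order.
schema_a = {
    "type": "object",
    "properties": {"b": {"type": "string"}, "a": {"type": "integer"}},
    "required": ["b", "a"],
}
schema_b = {
    "required": ["b", "a"],
    "properties": {"a": {"type": "integer"}, "b": {"type": "string"}},
    "type": "object",
}

wire_a = json.dumps(canonicalize_dict_keys(schema_a)).encode("utf-8")
wire_b = json.dumps(canonicalize_dict_keys(schema_b)).encode("utf-8")
```

Under these assumptions `wire_a` and `wire_b` are the same bytes, while the `required` list keeps the order the caller supplied.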
Pull request overview
Implements proposal 0047’s determinism requirements by making the OpenAI Chat Completions request body byte-stable for equivalent inputs (regardless of dict insertion order), and pins prompt-management §13 cross-variable substring stability with unit tests.
Changes:
- Add recursive dict-key canonicalization in the OpenAI provider (including `tool_call.arguments` JSON string encoding) and apply a final full-body canonicalization pass.
- Add wire-byte stability tests covering tool schemas, response schemas, runtime extras, tool-call arguments, list-order preservation, and image content blocks.
- Add prompt substring-stability tests and documentation describing APC/prefix-cache-friendly authoring.
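A wire-byte stability test of the kind listed above might look like the sketch below. It does not reproduce the tests in `tests/unit/test_llm_provider.py`: `encode_body` is a hypothetical stand-in for the provider's request encoding, built on the standard library's `json.dumps(sort_keys=True)`, which sorts nested object keys and leaves list order alone.

```python
import itertools
import json


def encode_body(body: dict) -> bytes:
    # Hypothetical stand-in for the provider's request encoding.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def test_wire_bytes_ignore_dict_insertion_order() -> None:
    items = [
        ("type", "object"),
        ("properties", {"q": {"type": "string"}}),
        ("required", ["q"]),
    ]
    # Build the same schema under every possible key insertion order.
    encodings = {
        encode_body({"parameters": dict(perm)})
        for perm in itertools.permutations(items)
    }
    # All permutations must collapse to a single byte string.
    assert len(encodings) == 1


def test_wire_bytes_preserve_list_order() -> None:
    # List order is caller-controlled, so reordering a list changes the bytes.
    first = encode_body({"required": ["a", "b"]})
    second = encode_body({"required": ["b", "a"]})
    assert first != second
```

Checking every permutation keeps the test independent of whichever insertion order a fixture happens to use.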
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/openarmature/llm/providers/openai.py` | Canonicalizes user-supplied dicts and tool-call argument encoding to produce byte-stable request bodies. |
| `tests/unit/test_llm_provider.py` | Adds unit tests asserting wire-byte stability across dict permutations and preserving list order. |
| `tests/unit/test_prompts.py` | Adds tests pinning cross-variable substring stability for text and chat prompts. |
| `docs/concepts/prompts.md` | Documents APC/prefix-cache-friendly authoring guidance and debugging notes. |
| `conformance.toml` | Marks proposal 0047 as implemented since 0.13.0 and documents the deliverables. |
| `CHANGELOG.md` | Adds an unreleased changelog entry describing proposal 0047 completion. |
Two dead-pointer fixes flagged by Copilot, both review-round-rename casualties:

1. CHANGELOG entry referenced ``_canonicalize_json_schema``; the helper was renamed to ``_canonicalize_dict_keys`` because it canonicalizes every user-supplied dict on the wire, not just JSON Schemas.
2. ``conformance.toml`` 0047 leading-comment block pointed at ``test_cross_variable_substring_stability``; that test got split into ``..._text_prompt`` and ``..._chat_prompt`` when coverage extended to the ChatPrompt variant.
Summary
- `_canonicalize_dict_keys(value)` (recursive dict-key sort, list order preserved per spec Q5) is applied at every user-supplied-dict boundary in the OpenAI provider's wire body: `tool.parameters` (with the surrounding `function` record per Q5), `response_format.json_schema.schema`, `RuntimeConfig` extras, and the JSON encoding of `tool_call.arguments`. A top-level belt-and-suspenders pass over the assembled body catches anything the per-field passes miss.
- […] `--enable-prefix-caching`), OpenAI hosted prompt caching, llama.cpp, and other APC-implementing servers.
- `docs/concepts/prompts.md` section "Prefix-cache friendly authoring (APC)" explains the contract, OA's role, the spec's five informative authoring patterns, and a vLLM debugging callout per Q3 ack.

Closes proposal 0047 end-to-end

- `Response.usage.cached_tokens` / `cache_creation_tokens` fields + OpenAI provider sources from `prompt_tokens_details`
- `openarmature.llm.cache_read.input_tokens` / `openarmature.llm.cache_creation.input_tokens`
- `conformance.toml` flips `[proposals."0047"]` from `not-yet` to `implemented since = "0.13.0"`; leading-comment block describes the three-piece deliverable.

Behavior change worth flagging

`tool_call.arguments` JSON encoding now uses `sort_keys=True`. Functionally equivalent (parses to the same dict) but byte-different from the previous insertion-order encoding. Anything downstream comparing encoded-string equality across cycles would observe the shift.

Scope

Chat Completions endpoint only. The OpenAI Responses API endpoint and the Anthropic / Gemini wire-format mappings stay deferred (no Python consumer today). Anthropic fixture `llm-provider/055-anthropic-wire-byte-stability` stays parser-deferred per Q1 ack.

Test plan

- `uv run pytest tests/`: 1235 passed (was 1232; +3 from the review round, +5 from the initial implementation).
- `uv run pyright` clean.
- `uv run ruff check` + `ruff format --check` clean.
- `uv run pytest tests/unit/test_llm_provider.py -k wire_byte -v`.
- `uv run pytest tests/unit/test_prompts.py -k cross_variable_substring -v`.
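To make the `tool_call.arguments` behavior change concrete, here is a small standard-library illustration. The argument names and values are made up for the example; only the `sort_keys=True` switch comes from this PR.

```python
import json

# Hypothetical tool-call arguments, inserted in non-alphabetical order.
arguments = {"query": "weather", "city": "Oslo"}

# Previous behavior: keys are emitted in insertion order.
insertion_order = json.dumps(arguments)

# New behavior: keys are emitted in sorted order.
sorted_keys = json.dumps(arguments, sort_keys=True)

# The two strings differ byte-for-byte but decode to the same dict,
# so comparing parsed values is unaffected by the change.
same_value = json.loads(insertion_order) == json.loads(sorted_keys)
same_bytes = insertion_order == sorted_keys
```

Downstream code that compares the encoded strings directly would see `same_bytes` as `False` here; comparing the parsed dicts instead (`same_value`) avoids depending on key order.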