Implement wire-byte stability (proposal 0047) by chris-colinsky · Pull Request #145 · LunarCommand/openarmature-python

chris-colinsky · 2026-06-09T23:58:42Z

Summary

Adds _canonicalize_dict_keys(value) — recursive dict-key sort, list-order preserved per spec Q5 — applied at every user-supplied-dict boundary in the OpenAI provider's wire body: tool.parameters (with the surrounding function record per Q5), response_format.json_schema.schema, RuntimeConfig extras, and the JSON encoding of tool_call.arguments. A top-level belt-and-suspenders pass over the assembled body catches anything the per-field passes miss.
Equivalent OA inputs now produce byte-identical wire output regardless of dict insertion order, so prefix-cache hits land consistently against vLLM (--enable-prefix-caching), OpenAI hosted prompt caching, llama.cpp, and other APC-implementing servers.
Pinned by 8 new unit tests on the provider (per-boundary permutation tests, list-order-preserved counter-test, body-level top-key sort assertion, image content-block stability, direct canonicalizer unit test) and 2 substring-stability tests on the prompt-management path (TextPrompt + ChatPrompt variants).
A new docs/concepts/prompts.md section, "Prefix-cache friendly authoring (APC)", explains the contract, OA's role, the spec's five informative authoring patterns, and a vLLM debugging callout (per the Q3 ack).
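The summary describes the canonicalizer only in prose. Below is a minimal sketch of a helper with the same contract (dict keys sorted at every level, list order preserved), assuming JSON-style values with string keys. The name `canonicalize_dict_keys` is hypothetical and this is not the PR's actual implementation.

```python
import json
from typing import Any


def canonicalize_dict_keys(value: Any) -> Any:
    """Return a copy of value with dict keys sorted at every nesting level.

    Lists keep the caller's element order (only dicts inside them are
    canonicalized); scalars are returned unchanged. Assumes string keys.
    """
    if isinstance(value, dict):
        return {key: canonicalize_dict_keys(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        return [canonicalize_dict_keys(item) for item in value]
    return value


# Two equivalent schemas built in different insertion orders.
schema_a = {
    "type": "object",
    "required": ["city", "days"],
    "properties": {"days": {"type": "integer"}, "city": {"type": "string"}},
}
schema_b = {
    "properties": {"city": {"type": "string"}, "days": {"type": "integer"}},
    "required": ["city", "days"],
    "type": "object",
}

wire_a = json.dumps(canonicalize_dict_keys(schema_a))
wire_b = json.dumps(canonicalize_dict_keys(schema_b))
print(wire_a == wire_b)  # True
```

Note that `required` is a list, so it is left alone: had the two schemas listed `"days"` and `"city"` in different orders, the bytes would differ, which is the object-keys-sorted / array-order-caller-controlled split the spec asks for.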

Closes proposal 0047 end-to-end

Piece	Where it landed
`Response.usage.cached_tokens` / `cache_creation_tokens` fields + OpenAI provider sources from `prompt_tokens_details`	v0.12.0
OTel observer emits `openarmature.llm.cache_read.input_tokens` / `openarmature.llm.cache_creation.input_tokens`	v0.12.0
§8 intra-impl wire-byte canonicalization	this PR (v0.13.0)
§13 cross-variable substring stability	this PR (verified by tests; satisfied by existing Jinja2 strict-undefined path)
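The first table row says the provider sources the cache fields from `prompt_tokens_details`. A minimal sketch of reading `cached_tokens` out of an OpenAI-style `usage` object follows; `read_cached_tokens` is a hypothetical helper, not OA's `Response.usage` code, and `cache_creation_tokens` is not covered.

```python
from typing import Any, Optional


def read_cached_tokens(usage: dict[str, Any]) -> Optional[int]:
    """Pull the prefix-cache hit count out of an OpenAI-style usage object.

    Returns None when the server omits prompt_tokens_details entirely, so a
    caller can tell "not reported" apart from "reported as zero".
    """
    details = usage.get("prompt_tokens_details")
    if not isinstance(details, dict):
        return None
    cached = details.get("cached_tokens")
    return cached if isinstance(cached, int) else None


print(read_cached_tokens({"prompt_tokens": 100, "prompt_tokens_details": {"cached_tokens": 64}}))  # 64
print(read_cached_tokens({"prompt_tokens": 100}))  # None
```

The `None` case corresponds to the debugging callout described in this PR: a vLLM server that is not reporting prompt-token details produces no cache figure at all, which is different from a reported miss of zero.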

conformance.toml flips [proposals."0047"] from not-yet to implemented (since = "0.13.0"); its leading-comment block describes the three-piece deliverable.

Behavior change worth flagging

tool_call.arguments JSON encoding now uses sort_keys=True. This is functionally equivalent (it parses to the same dict) but byte-different from the previous insertion-order encoding. Any downstream code that compares the encoded strings for equality across cycles will observe the shift.
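A small stdlib-only illustration of the flagged change: the same arguments built in two insertion orders encode differently without `sort_keys`, identically with it, and parse to the same dict either way. It uses default `json.dumps` separators; whether the provider passes any other encoder options is an assumption left open here.

```python
import json

# The same tool-call arguments, built in two different insertion orders.
args_first = {"city": "Paris", "units": "metric"}
args_second = {"units": "metric", "city": "Paris"}

# Previous behavior (insertion order): the encodings differ byte-for-byte.
old_a = json.dumps(args_first)
old_b = json.dumps(args_second)

# New behavior (sort_keys=True): the encodings are identical.
new_a = json.dumps(args_first, sort_keys=True)
new_b = json.dumps(args_second, sort_keys=True)

print(old_a == old_b)  # False
print(new_a == new_b)  # True
# Either encoding still parses back to the same dict.
print(json.loads(old_b) == json.loads(new_b))  # True
```

Only callers whose insertion order was not already sorted see different bytes: here `old_a` equals `new_a`, while `old_b` does not.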

Scope

Chat Completions endpoint only. The OpenAI Responses API endpoint and the Anthropic / Gemini wire-format mappings stay deferred (no Python consumer today). Anthropic fixture llm-provider/055-anthropic-wire-byte-stability stays parser-deferred per the Q1 ack.

Test plan

uv run pytest tests/ — 1235 passed (was 1232; +3 from the review round, +5 from the initial implementation).
uv run pyright clean.
uv run ruff check + ruff format --check clean.
Wire-byte tests: uv run pytest tests/unit/test_llm_provider.py -k wire_byte -v.
Substring stability: uv run pytest tests/unit/test_prompts.py -k cross_variable_substring -v.
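The plan names per-boundary permutation tests and a list-order counter-test without showing one. Below is a hypothetical pytest-style sketch of that shape: rebuild a tool schema in every key order with `itertools.permutations` and require exactly one encoding. As a stand-in for the provider's encoder it uses `json.dumps(..., sort_keys=True)`, which sorts keys of nested dicts as well while never reordering lists. Test names and body shape are assumptions, not the PR's tests.

```python
import itertools
import json
from typing import Any


def encode_body(body: dict[str, Any]) -> bytes:
    # Stand-in encoder: sort_keys=True sorts every dict's keys (at any depth)
    # but never reorders list elements.
    return json.dumps(body, sort_keys=True).encode("utf-8")


def test_tool_parameters_wire_bytes_stable() -> None:
    properties = {
        "city": {"type": "string"},
        "units": {"type": "string", "enum": ["metric", "imperial"]},
        "days": {"type": "integer"},
    }
    encodings = set()
    for order in itertools.permutations(properties):
        # Same schema content, different insertion order on each pass.
        permuted = {name: properties[name] for name in order}
        tool = {
            "type": "function",
            "function": {
                "name": "forecast",
                "parameters": {"type": "object", "properties": permuted},
            },
        }
        encodings.add(encode_body({"model": "m", "tools": [tool]}))
    # 3! = 6 permutations must collapse to one distinct byte string.
    assert len(encodings) == 1


def test_list_order_preserved() -> None:
    # Counter-test: reordering an array is a real change and must alter bytes.
    first = encode_body({"enum": ["metric", "imperial"]})
    second = encode_body({"enum": ["imperial", "metric"]})
    assert first != second
```

The counter-test guards against an over-eager canonicalizer that sorts lists too, which would silently reorder things like message arrays and break caller intent.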

Add intra-impl wire-byte stability to the OpenAI provider so equivalent OA inputs produce byte-identical wire output regardless of dict insertion order.

A new ``_canonicalize_dict_keys`` helper recursively sorts dict keys at every nesting level while preserving caller-supplied array ordering (the spec's split: object keys are sorted, array order is caller-controlled). The helper applies at four user-supplied-dict boundaries: tool definitions (the ``function`` record top-level plus the parameters JSON Schema), ``response_format.json_schema.schema``, RuntimeConfig extras, and the JSON encoding of ``tool_call.arguments``. A top-level belt-and-suspenders pass over the assembled body catches anything the per-field passes miss.

Closes proposal 0047 end-to-end: pieces 1 and 2 (Response.usage cache fields sourced from prompt_tokens_details, and the OTel observer emitting the cache attributes) landed in v0.12.0; this is piece 3. Prompt-management §13 cross-variable substring stability is satisfied by the existing Jinja2 strict-undefined render path on both TextPrompt and ChatPrompt, and is pinned by new tests.

A new ``docs/concepts/prompts.md`` section explains APC, what OA handles for users (wire-byte canonicalization, deterministic rendering), what users own (the spec's five informative authoring patterns), and a vLLM debugging callout for the case where the cache attributes do not appear (server-side ``--enable-prefix-caching`` plus ``--enable-prompt-tokens-details``).

Scope is the Chat Completions endpoint only. The OpenAI Responses API endpoint and the Anthropic / Gemini wire-format mappings are deferred (no Python consumer today).

Behavior change worth flagging: ``tool_call.arguments`` JSON encoding now uses ``sort_keys=True``. This is functionally equivalent (it parses to the same dict) but byte-different from the previous insertion-order encoding.
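To show where the four boundaries and the final pass sit relative to each other, here is a hypothetical sketch of assembling a Chat Completions body. Every function name, the `"result"` schema name, and the merge strategy for extras are illustrative assumptions, not the provider's code.

```python
import json
from typing import Any, Optional


def canonicalize_dict_keys(value: Any) -> Any:
    # Sort dict keys at every level; keep list order as the caller supplied it.
    if isinstance(value, dict):
        return {key: canonicalize_dict_keys(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        return [canonicalize_dict_keys(item) for item in value]
    return value


def encode_tool_arguments(arguments: dict[str, Any]) -> str:
    # Boundary 4: tool_call.arguments travels as a JSON *string*, so it is
    # canonicalized at encode time with sort_keys=True.
    return json.dumps(arguments, sort_keys=True)


def build_chat_body(
    model: str,
    messages: list[dict[str, Any]],
    tools: Optional[list[dict[str, Any]]] = None,
    json_schema: Optional[dict[str, Any]] = None,
    extras: Optional[dict[str, Any]] = None,
) -> bytes:
    body: dict[str, Any] = {"model": model, "messages": messages}
    if tools:
        # Boundary 1: the whole function record, including its parameters schema.
        body["tools"] = [
            {"type": "function", "function": canonicalize_dict_keys(tool)}
            for tool in tools
        ]
    if json_schema is not None:
        # Boundary 2: response_format.json_schema.schema ("result" is a placeholder).
        body["response_format"] = {
            "type": "json_schema",
            "json_schema": {"name": "result", "schema": canonicalize_dict_keys(json_schema)},
        }
    if extras:
        # Boundary 3: caller-supplied runtime extras merged into the body.
        body.update(canonicalize_dict_keys(extras))
    # Belt-and-suspenders: one more pass over the assembled body, which also
    # sorts the top-level keys before serialization.
    return json.dumps(canonicalize_dict_keys(body)).encode("utf-8")
```

In this sketch the final pass re-sorts every dict anyway, so the per-boundary calls are redundant for key order; the boundary that genuinely needs its own treatment is `tool_call.arguments`, because once it is a string the body-level pass cannot reach inside it.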

Copilot

Pull request overview

Implements proposal 0047’s determinism requirements by making the OpenAI Chat Completions request body byte-stable for equivalent inputs (regardless of dict insertion order), and pins prompt-management §13 cross-variable substring stability with unit tests.

Changes:

Add recursive dict-key canonicalization in the OpenAI provider (including tool_call.arguments JSON string encoding) and apply a final full-body canonicalization pass.
Add wire-byte stability tests covering tool schemas, response schemas, runtime extras, tool-call arguments, list-order preservation, and image content blocks.
Add prompt substring-stability tests and documentation describing APC/prefix-cache-friendly authoring.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.


File	Description
`src/openarmature/llm/providers/openai.py`	Canonicalizes user-supplied dicts and tool-call argument encoding to produce byte-stable request bodies.
`tests/unit/test_llm_provider.py`	Adds unit tests asserting wire-byte stability across dict permutations and preserving list order.
`tests/unit/test_prompts.py`	Adds tests pinning cross-variable substring stability for text and chat prompts.
`docs/concepts/prompts.md`	Documents APC/prefix-cache-friendly authoring guidance and debugging notes.
`conformance.toml`	Marks proposal 0047 as implemented since 0.13.0 and documents the deliverables.
`CHANGELOG.md`	Adds an unreleased changelog entry describing proposal 0047 completion.


Two dead-pointer fixes flagged by Copilot, both casualties of renames during the review round:

1. The CHANGELOG entry referenced ``_canonicalize_json_schema``; the helper was renamed to ``_canonicalize_dict_keys`` because it canonicalizes every user-supplied dict on the wire, not just JSON Schemas.
2. The ``conformance.toml`` 0047 leading-comment block pointed at ``test_cross_variable_substring_stability``; that test was split into ``..._text_prompt`` and ``..._chat_prompt`` when coverage was extended to the ChatPrompt variant.

Copilot AI review requested due to automatic review settings June 9, 2026 23:58

Copilot started reviewing on behalf of chris-colinsky June 9, 2026 23:58

Copilot AI reviewed Jun 10, 2026


Comment thread on CHANGELOG.md (outdated)

Comment thread on conformance.toml

chris-colinsky merged commit a6b6f26 into main Jun 10, 2026
6 checks passed

chris-colinsky deleted the feature/0047-wire-byte-stability branch June 10, 2026 00:08

chris-colinsky merged 2 commits into main from feature/0047-wire-byte-stability

2 participants
