You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
* Harden OpenAIProvider readiness probe
Add a ``readiness_probe`` constructor kwarg accepting "models",
"chat_completions", or "both", and flip the default from the older
catalog-only ``GET /v1/models`` probe to a new ``POST /v1/chat/
completions`` probe with ``max_tokens=1``.
Motivation: OpenAI-compatible proxies can return 200 on the catalog
endpoint while rejecting completions (Bifrost is the field-reported
case), so the previous probe reported ready while every real call
failed. The new default actually exercises the inference wire path,
so that failure class surfaces at preflight. Non-200 chat-probe
responses route through ``classify_http_error`` so canonical error
categories surface consistently. Catalog-only behavior remains opt-in
for cost-sensitive cloud callers.
Conformance harness picks ``readiness_probe`` mode from the fixture's
mocked ``health_endpoint.path`` so fixture 007's catalog semantics
keep working without spec changes.
* Tighten readiness probe per CoPilot review
Three findings on PR #109:
1. Runtime guard for ``readiness_probe``. The Literal type is a static
hint; an unknown string would silently no-op both dispatch branches
in ``ready()`` and report ready. Validate in ``__init__`` against a
module-level frozenset and raise ValueError.
2. Route ``_probe_models`` non-200 responses through
``classify_http_error``. Previously hard-coded 401/403 to
ProviderAuthentication and everything-else to ProviderUnavailable,
missing ProviderRateLimit (429), ProviderModelNotLoaded (503+marker),
and ProviderInvalidModel (404+marker). The docstring's
mode-independence claim is now true.
3. Validate ``_probe_chat_completions`` 200 response shape. A proxy
answering 200 with an error payload or non-OpenAI-shape JSON
previously passed the probe. Mirror ``_do_complete``'s parse +
``_parse_response(payload, None, None)`` step.
Adds five new tests covering: invalid mode at construction, catalog
probe 429 → ProviderRateLimit, catalog probe 503+marker →
ProviderModelNotLoaded, chat probe 200 with error payload, and chat
probe 200 with non-JSON body.
Copy file name to clipboardExpand all lines: CHANGELOG.md
+6Lines changed: 6 additions & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -4,6 +4,12 @@ All notable changes to `openarmature-python` are documented in this file.
4
4
5
5
The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The package follows [Semantic Versioning](https://semver.org/); pre-1.0 minor bumps may carry behavioral changes per [spec governance](https://github.com/LunarCommand/openarmature-spec/blob/main/GOVERNANCE.md).
6
6
7
+
## [Unreleased]
8
+
9
+
### Changed (breaking)
10
+
11
+
- **`OpenAIProvider.ready()` default probe flipped to `chat_completions`.** A new constructor kwarg `readiness_probe: Literal["models", "chat_completions", "both"]` selects which wire path `ready()` exercises; the default is now the chat-completions path (`POST /v1/chat/completions` with `max_tokens=1`), which actually exercises the inference path. The previous catalog-only behavior is still available as `readiness_probe="models"`, and `readiness_probe="both"` runs catalog then chat for the strongest signal. Motivation: OpenAI-compatible proxies (Bifrost and similar) can return 200 on `GET /v1/models` while rejecting `POST /v1/chat/completions`, leaving the catalog probe green while every real call fails. The new default surfaces that class of failure at preflight rather than at first inference. Non-200 chat-probe responses route through `classify_http_error`, so the canonical error categories (`provider_authentication`, `provider_unavailable`, `provider_invalid_model`, etc.) surface consistently. Callers that depended on the catalog-only behavior (cost-sensitive cloud setups where every `ready()` would now bill prompt tokens) can opt back in by passing `readiness_probe="models"`.
12
+
7
13
## [0.11.0] — 2026-06-01
8
14
9
15
Observability + prompt-management release. The pinned spec advances from v0.27.1 to v0.38.0, absorbing eight accepted proposals (0039-0046). Two headlines: (1) the Langfuse observer grows native `trace.input` / `trace.output` sourcing with caller hooks (0043) and the per-async-context augmentation boundary becomes lineage-aware for nested fan-out / parallel-branches topologies (0045); (2) prompt-management gains a Chat-prompt variant alongside the existing Text-prompt (0046) and `LangfusePromptBackend` lands for both Langfuse text and chat prompts. Caller-supplied `invocation_id` (0039), mid-invocation open-span metadata update (0040), three reserved-key surfaces (0041 + 0042), and the parallel-branches OTel dispatch span (0044) round out the cycle.
Copy file name to clipboardExpand all lines: docs/agent/non-obvious-shapes.md
+4Lines changed: 4 additions & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -80,6 +80,10 @@ A common shape is "after this LLM call, route to either a JSON-extraction node o
80
80
81
81
When the branches operate on different sub-shapes of state — e.g., one path is "extract JSON, then validate" while another is "dispatch tools, loop until done, then summarize" — encapsulate each as a `SubgraphNode` and route from the LLM node to the right subgraph. Each subgraph has its own state schema (projected from the parent), its own entry node, and its own internal topology. The parent graph becomes a switchboard with a few edges; the complexity lives one layer down where it composes cleanly.
82
82
83
+
### `OpenAIProvider.ready()` exercises `chat/completions` by default; opt back into the catalog-only probe for cost-sensitive callers
84
+
85
+
`OpenAIProvider(..., readiness_probe=...)` accepts `"chat_completions"` (default), `"models"`, or `"both"`. The default issues `POST /v1/chat/completions` with a `max_tokens=1` body so a green `ready()` actually proves the inference wire path works, not just that the catalog endpoint answers. The motivating failure class: OpenAI-compatible proxies (Bifrost is the field-reported case) that return 200 on `GET /v1/models` while 405'ing the completions endpoint — the previous catalog-only default reported ready and every real call broke. The `"models"` opt-in is the old behavior, useful for cost-sensitive cloud callers where every `ready()` would otherwise bill one prompt's worth of tokens. `"both"` runs catalog then chat — strongest signal at double the cost. Non-200 responses on either probe route through `classify_http_error`, so the canonical error categories (`ProviderAuthentication`, `ProviderUnavailable`, `ProviderInvalidModel`, etc.) surface consistently regardless of which probe ran.
86
+
83
87
### Be explicit with `tool_choice`; don't trust the provider's default
84
88
85
89
`Provider.complete(messages, tools, tool_choice=...)` accepts `"auto"`, `"required"`, `"none"`, or a `ForceTool(name=...)` record. When you omit `tool_choice`, the OpenAI provider's own default applies — usually `"auto"` when `tools` is non-empty, but documented per-provider. A pipeline that wants deterministic tool-calling (a routing node that MUST produce a tool call, a guarded LLM call that MUST NOT call tools) should pin `tool_choice` explicitly rather than relying on the provider default.
Copy file name to clipboardExpand all lines: src/openarmature/AGENTS.md
+4Lines changed: 4 additions & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -943,6 +943,10 @@ A common shape is "after this LLM call, route to either a JSON-extraction node o
943
943
944
944
When the branches operate on different sub-shapes of state — e.g., one path is "extract JSON, then validate" while another is "dispatch tools, loop until done, then summarize" — encapsulate each as a `SubgraphNode` and route from the LLM node to the right subgraph. Each subgraph has its own state schema (projected from the parent), its own entry node, and its own internal topology. The parent graph becomes a switchboard with a few edges; the complexity lives one layer down where it composes cleanly.
945
945
946
+
### `OpenAIProvider.ready()` exercises `chat/completions` by default; opt back into the catalog-only probe for cost-sensitive callers
947
+
948
+
`OpenAIProvider(..., readiness_probe=...)` accepts `"chat_completions"` (default), `"models"`, or `"both"`. The default issues `POST /v1/chat/completions` with a `max_tokens=1` body so a green `ready()` actually proves the inference wire path works, not just that the catalog endpoint answers. The motivating failure class: OpenAI-compatible proxies (Bifrost is the field-reported case) that return 200 on `GET /v1/models` while 405'ing the completions endpoint — the previous catalog-only default reported ready and every real call broke. The `"models"` opt-in is the old behavior, useful for cost-sensitive cloud callers where every `ready()` would otherwise bill one prompt's worth of tokens. `"both"` runs catalog then chat — strongest signal at double the cost. Non-200 responses on either probe route through `classify_http_error`, so the canonical error categories (`ProviderAuthentication`, `ProviderUnavailable`, `ProviderInvalidModel`, etc.) surface consistently regardless of which probe ran.
949
+
946
950
### Be explicit with `tool_choice`; don't trust the provider's default
947
951
948
952
`Provider.complete(messages, tools, tool_choice=...)` accepts `"auto"`, `"required"`, `"none"`, or a `ForceTool(name=...)` record. When you omit `tool_choice`, the OpenAI provider's own default applies — usually `"auto"` when `tools` is non-empty, but documented per-provider. A pipeline that wants deterministic tool-calling (a routing node that MUST produce a tool call, a guarded LLM call that MUST NOT call tools) should pin `tool_choice` explicitly rather than relying on the provider default.
0 commit comments