Skip to content

Commit ddc99f2

Browse files
feat(prompts): prompt-management core (proposal 0017) (#45)
* feat(prompts): error classes and category constants Establishes the prompt-management subpackage with the three canonical error categories from spec §10: - PromptNotFound (non-transient): no prompt matches (name, label). - PromptRenderError (non-transient): undefined variable, template parse error, or variable-coercion failure. - PromptStoreUnavailable (transient): backend infrastructure failure (network, I/O, vendor API). Exports PROMPT_TRANSIENT_CATEGORIES mirroring the TRANSIENT_CATEGORIES frozenset in openarmature.llm.errors, so retry-middleware classifiers can identify transient prompt-management failures by category. * feat(prompts): Prompt, PromptResult, PromptGroup types Pydantic models for the prompt-management capability shapes from spec §3, §4, and §9. Prompt carries the raw template source string plus identity metadata (name, version, label, template_hash, fetched_at, optional metadata). The raw-string representation keeps Prompt serializable and engine-agnostic; compilation happens on render. PromptResult propagates identity from the source Prompt and carries the rendered messages list (compatible with openarmature.llm.Message and directly consumable by Provider.complete()), the variables used, rendered_hash, and rendered_at. PromptGroup wraps an ordered N>=2 sequence of PromptResult instances with a stable group_name. The validator rejects empty and single-member groups per §9 (single-prompt tagging is already served by per-prompt observability attributes). Hashing helpers compute SHA-256 over UTF-8 bytes (template) and over a canonical JSON serialization with sort_keys + minimal separators (rendered). Both prefixed with 'sha256:' so future algorithm changes are self-describing. * feat(prompts): PromptBackend protocol, PromptManager, jinja2 dep PromptBackend is a runtime-checkable Protocol with a single async fetch(name, label) method, matching the openarmature.llm.Provider pattern. The docstring restates the §5 contract: reentrant, no mutation, raises PromptNotFound / PromptStoreUnavailable, and the rule that cached results MUST preserve the original fetched_at. PromptManager composes one or more PromptBackends and exposes: - fetch: §8 fallback semantics. First successful fetch wins; PromptNotFound STOPS the chain (logical absence MUST NOT silently substitute); PromptStoreUnavailable continues to the next backend; all-exhausted raises PromptStoreUnavailable with the last unavailable chained as __cause__. WARN-level log on each fallback per §8. - render: synchronous string transform via Jinja2 with StrictUndefined per §7. Produces a single UserMessage in v1 (multi-message decomposition deferred). UndefinedError and TemplateError both map to PromptRenderError carrying the prompt's identity + the variables + a description. Pydantic ValidationError on the UserMessage(content=rendered_text) construction (empty-string render case) also maps to PromptRenderError per §10's 'variable's value not coercible' framing. - get: convenience equivalent to render(await fetch(...), variables). Adds jinja2>=3.1 to runtime dependencies. * feat(prompts): FilesystemPromptBackend + OTel attribute propagation FilesystemPromptBackend reads prompts from <root>/<label>/<name>.j2. The subdirectory-per-label layout keeps name-collisions across labels distinct without prefix-escape concerns. version is derived from the first 12 hex chars of the template_hash so two file contents map deterministically to two distinct versions without needing a sidecar metadata file (spec §3 lets backends pick any stable identifier). The docstring notes that future caching backends MUST preserve the original fetched_at on returned Prompts per spec §3. Adds the context-variable propagation mechanism for spec §11 LLM-call span attributes: - openarmature.prompts.context module exposes with_active_prompt(result) and with_active_prompt_group(group) context managers plus current_prompt_result() / current_prompt_group() inspectors. - OTelObserver._on_llm_event reads the two ContextVars at LLM- call span start and surfaces: openarmature.prompt.name openarmature.prompt.version openarmature.prompt.label openarmature.prompt.template_hash openarmature.prompt.rendered_hash openarmature.prompt.group_name - Nesting is innermost-wins (matches Python's natural ContextVar token-stacking behavior; spec §11 doesn't mandate a policy). The attribute names match spec §11's normative list. The mechanism (context variables) is one of the two example mechanisms §11 names; bundling it now keeps the §11 surface discoverable from the moment prompt-management lands. * test(conformance): prompt-management harness and 12 fixtures Adds prompt-management as the fifth conformance capability: - harness/prompt_management.py — typed YAML models for the new fixture shape (backends + manager + calls with target / operation / capture_as, plus per-call and top-level expected blocks for raises / result_equivalence / prompt_group / rendered_hash_equal / rendered_hash_different). - harness/fixtures.py — PromptManagementFixture added to the discriminated union; the discriminator recognizes top-level 'backends:' (without 'mock_provider:') as the prompt-management shape. - harness/loader.py — 'prompt-management' added to CAPABILITIES so test_fixture_parsing.py discovers and parses the new fixtures. test_prompt_management.py drives all 12 spec fixtures (001-fetch-success through 012-prompt-result-rendered-hash-stability) against the real PromptManager + a MockPromptBackend that implements the protocol with optional simulate_unavailable + preloaded prompts + a call_count for fixtures that assert fallback chain visits. All 12 fixtures pass. * test(unit): prompts subpackage + OTel attribute propagation Adds tests/unit/test_prompts.py (25 tests) covering gaps the conformance fixtures don't exercise directly: - error categories match spec §10 strings; PROMPT_TRANSIENT_CATEGORIES contains only prompt_store_unavailable. - error attribute carriage (PromptNotFound name/label/backend, PromptRenderError name/version/label/variables/description). - template_hash / rendered_hash determinism, prefix, and length; divergence for different inputs. - Prompt extra-field rejection; PromptGroup 0/1-member rejection and 2+ acceptance. - PromptManager construction (zero-backend rejection). - Empty-string render output boundary wrap (the spec-agent's concern about Jinja2 cleanly rendering '' but UserMessage rejecting empty content — verified to surface as PromptRenderError). - Identity-field propagation from Prompt to PromptResult on render. - FilesystemPromptBackend disk I/O: success path, missing file raises PromptNotFound, OSError that isn't FileNotFoundError raises PromptStoreUnavailable. - Context-var propagation: with_active_prompt / _prompt_group set + reset, innermost-wins nesting, async-task visibility. - PromptManager fallback gaps: first-match short-circuits later backends; render returns a UserMessage carrying the rendered text. Adds two OTel observer tests under tests/unit/test_observability_otel.py: - Active prompt + active prompt group propagates the six openarmature.prompt.* span attributes (name, version, label, template_hash, rendered_hash, group_name) on the openarmature.llm.complete span. - Without an active prompt, the LLM-call span carries no openarmature.prompt.* attributes. * docs: prompts concept page, API reference, changelog docs/concepts/prompts.md walks through the prompt-management capability: the fetch + render split (and why both, not just get()), Prompt identity fields, strict-by-default variables, composite-backend fallback (PromptStoreUnavailable continues, PromptNotFound stops), the three error categories, PromptGroup for tracing related prompts, observability propagation via with_active_prompt and the six normative openarmature.prompt.* attributes, determinism + content-addressed caching, a minimal example, and what's out of scope (vendor backends, versioning workflows, cache invalidation, multi-message decomposition). docs/reference/prompts.md is an mkdocstrings autodoc page in the same shape as docs/reference/llm.md. mkdocs.yml gains the two new pages in the Concepts and Reference nav sections. CHANGELOG.md adds two entries under [Unreleased]: - the new openarmature.prompts subpackage with PromptManager, the three error categories, FilesystemPromptBackend, and the jinja2>=3.1 runtime dependency. - the observability propagation surface in openarmature.prompts.context plus the OTel observer wiring. * fix: CoPilot review pass on PR #45 - manager.py: hoist Jinja2 Environment to module-level singleton (stateless config; thread-safe for compile + render; avoids re-parsing config on every render call), keep the autoescape-disabled-by-design comment. - errors.py: PromptStoreUnavailable carries optional name / label / backends_tried for operator diagnosability; PromptManager's aggregate raise populates backends_tried with the ordered list of consulted backends. PromptRenderError docstring documents spec §10's non-transient mandate. - backends/filesystem.py: widen the version-prefix length from 12 to 16 hex chars (~64 bits; birthday-paradox boundary at ~4B templates), document the rationale + the wider-prefix / alternative-identifier guidance for higher-scale backends. Also carries name / label on PromptStoreUnavailable raises. - observability/otel/observer.py: hoist prompts.context import to module top-level (no longer optional; cost off the per-event hot path). - harness/fixtures.py: tighten the prompt-management discriminator from `backends:` alone to `backends:` co-occurring with `calls:` AND absence of graph-shape keys; avoids silently misrouting future fixtures that introduce a backends list for some other purpose. - test_prompt_management.py: lift per-call call-count assertions out of the raises branch so they apply on both success and error paths; add internal-consistency check that a fixture's fields_must_match and fields_may_differ sets don't overlap. - test_prompts.py: mock Path.read_text for the OSError-routing test instead of relying on platform-dependent NotADirectoryError behavior; update the version-prefix length assertion to match the widened 16-char prefix. * docs(prompts): drop em dashes from concept page Memory rule: no em dashes in user-facing copy. Reworded the new docs/concepts/prompts.md to use colons, semicolons, parens, or sentence restructuring in place of em dashes. * docs: drop em dashes from llms.md, model-providers/{index,authoring}.md Sweep of leftover em dashes from PR-1/PR-2 docs that slipped past the no-em-dashes-in-user-facing-copy rule. Same substitutions as the prompts.md cleanup (colons, semicolons, parens, or sentence restructuring). * fix: CoPilot review round-2 pass on PR #45 - CHANGELOG.md: update 12 → 16 hex chars to match the widened FilesystemPromptBackend.version derivation. - prompt.py: PromptResult.messages gains Field(min_length=1) so the spec §4 'Ordered non-empty sequence' mandate is enforced at the type boundary, not just by the construction path. - errors.py: PromptStoreUnavailable gains an optional causes list[BaseException] attribute carrying per-backend exceptions index-aligned to backends_tried. - manager.py: aggregate raise populates causes with the per-backend exceptions in fallback order, while keeping the __cause__ chain pointing at the last unavailable for stack-trace continuity. - manager.py: PromptManager carries a per-instance dict[str, jinja2.Template] keyed by template_hash. Render consults the cache and only re-parses on miss. Unbounded for v1 (typical apps have O(10) prompts; an LRU follow-on can land if benchmarks show memory pressure). template_hash is content-derived, so cache invalidation is automatic when a backend returns updated content. - test_prompts.py: new tests for empty-messages rejection and for the compiled-template cache hit behavior. * fix: CoPilot review round-3 pass on PR #45 - harness/prompt_management.py: fix misleading comment on FixtureExpectedRaises.carries (secondary_backend_call_count is a sibling field on FixtureExpectedPerCall, not inside carries). - manager.py: replace 'assert causes' with an explicit 'if not causes: raise RuntimeError(...)' guard so the invariant holds under 'python -O' (asserts stripped) and surfaces as a clear RuntimeError rather than an opaque IndexError if a future change ever silently swallows an exception in the fallback loop. - test_prompts.py: rewrite the active-prompt-in-nested-async-function test to spawn via asyncio.create_task so it actually exercises context-copy across the task boundary, matching the function name's implied claim. The previous form's await ran in the same context where ContextVar propagation is trivially expected.
1 parent 5f6f1e1 commit ddc99f2

26 files changed

Lines changed: 2113 additions & 14 deletions

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,8 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The
88

99
### Added
1010

11+
- **Prompt-management capability (proposal 0017, introduced in spec v0.15.0).** New `openarmature.prompts` subpackage. `PromptManager` composes one or more `PromptBackend`s, exposes `fetch` / `render` / `get`, applies the §8 fallback semantics (`prompt_store_unavailable` continues to the next backend; `prompt_not_found` stops the chain), and renders templates with Jinja2's `StrictUndefined` per §7. `Prompt` / `PromptResult` / `PromptGroup` are Pydantic models matching spec §3 / §4 / §9. Three error categories (`PromptNotFound`, `PromptRenderError`, `PromptStoreUnavailable`) with `PROMPT_TRANSIENT_CATEGORIES` exported for retry-middleware classifiers. `FilesystemPromptBackend` is the minimum local-filesystem reference backend (layout: `<root>/<label>/<name>.j2`; `version` derived from the first 16 hex chars of `template_hash`). New runtime dependency: `jinja2>=3.1`.
12+
- **`openarmature.prompts.context` — observability propagation per spec §11.** `with_active_prompt(result)` and `with_active_prompt_group(group)` context managers + `current_prompt_result()` / `current_prompt_group()` inspectors. When the OTel observer is active and an LLM call fires inside `with_active_prompt`, the `openarmature.llm.complete` span carries the normative `openarmature.prompt.*` attributes (`name`, `version`, `label`, `template_hash`, `rendered_hash`, `group_name`). Nesting is innermost-wins.
1113
- **Image content blocks for user messages (proposal 0015, introduced in spec v0.13.0).** `UserMessage.content` now accepts `str | list[ContentBlock]`. The block surface introduces `TextBlock`, `ImageBlock`, `ImageSourceURL`, `ImageSourceInline`, and the `ContentBlock` / `ImageSource` discriminated unions over the block / source `type` field. `ImageBlock` carries a `media_type` (required for inline sources; ignored for URL sources; typed as `str | None` so callers MAY pass any `image/*` type the bound model supports) and an optional `detail` hint (`"auto"` / `"low"` / `"high"`; `None` default omits the field from the wire so providers apply their own default). System, assistant, and tool messages stay text-string-only; image inputs are user-only in v1.
1214
- **`OpenAIProvider` content-array wire mapping.** When `UserMessage.content` is a content-block sequence, the wire body uses OpenAI's `content` array per §8.1.1. `TextBlock → {type: "text", text}`. `ImageBlock` with a URL source maps to `{type: "image_url", image_url: {url, detail?}}`. `ImageBlock` with an inline source constructs an RFC 2397 `data:<media_type>;base64,<base64_data>` URI and goes through the same `image_url` entry shape. Inline bytes pass through unchanged — no inspection, transcoding, or re-encoding.
1315
- **New error category `ProviderUnsupportedContentBlock` (non-transient).** Raised when the bound model rejects a content block type / media variant. Distinct from `ProviderInvalidRequest` (which covers spec-shape malformation): this category surfaces a *capability* mismatch, letting callers route differently (e.g., fall back to a multimodal-capable provider) without overloading the malformed-request category. Carries `block_type` ("image" / "audio" / "video") and `reason` (provider's human-readable message) when those are recoverable from the rejection. `OpenAIProvider` detects content rejection via HTTP 400 bodies — heuristic on `error.code` (known set: `image_content_not_supported`, `unsupported_image_media_type`, `audio_content_not_supported`, etc.), `error.type` (`image_parse_error`), and `error.message` ("does not support" + image/audio/video).

docs/concepts/llms.md

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -196,7 +196,7 @@ response post-receive against the supplied schema; strict is a
196196
wire-level optimization, not a correctness requirement.
197197

198198
`strict_mode_supported(schema)` (exported from `openarmature.llm`)
199-
performs the deep recursive check. The heuristic is conservative
199+
performs the deep recursive check. The heuristic is conservative:
200200
anything not on the list below trips to `strict: false`:
201201

202202
- Top-level schema is `type: "object"`.
@@ -240,7 +240,7 @@ A text block is the array-form equivalent of a text-string message:
240240
text block is normatively equivalent to one with `content="describe
241241
this"`.
242242

243-
An image block carries one source URL or inline base64 plus an
243+
An image block carries one source (URL or inline base64) plus an
244244
optional `detail` hint:
245245

246246
```python
@@ -302,7 +302,7 @@ fidelity: `"auto"`, `"low"`, or `"high"`. The class default is `None`,
302302
which **omits the field from the wire** and lets the provider apply
303303
its own default (conceptually `"auto"`). Setting `detail="auto"`
304304
explicitly on the spec block forces the wire to carry an explicit
305-
`"auto"`usually unnecessary, since the provider's default is the
305+
`"auto"`, usually unnecessary since the provider's default is the
306306
same value.
307307

308308
### When the model can't handle the block
@@ -324,12 +324,12 @@ provider on this category) compose cleanly against it.
324324
"audio", "video") and `reason` (the provider's human-readable
325325
message) when those are recoverable from the rejection.
326326

327-
`OpenAIProvider` detects content rejection via the response body
327+
`OpenAIProvider` detects content rejection via the response body:
328328
HTTP 400 with an error code like `image_content_not_supported` or a
329329
message like "does not support image inputs." Pre-send capability
330330
checks (failing fast before the wire trip when you know the model
331331
doesn't support images) live above the provider as userland
332-
middleware the provider doesn't ship a static model-capability
332+
middleware; the provider doesn't ship a static model-capability
333333
catalog.
334334

335335
## Routing on parsed fields

docs/concepts/prompts.md

Lines changed: 278 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,278 @@
1+
# Prompts
2+
3+
Named, versioned, content-addressed prompts. OpenArmature's
4+
prompt-management capability separates *fetching* a template
5+
from *rendering* it, lets you compose multiple backends with
6+
explicit fallback, and propagates prompt identity to your
7+
observability backend so trace UIs can pivot on the prompt
8+
that produced a call.
9+
10+
Skip ahead to [a minimal example](#a-minimal-example) if you
11+
want code first.
12+
13+
## The two halves: fetch and render
14+
15+
A `PromptBackend` knows how to find a template by `name` and
16+
`label`; nothing more. A `PromptManager` composes one or more
17+
backends and adds rendering on top:
18+
19+
```python
20+
from openarmature.prompts import PromptManager, FilesystemPromptBackend
21+
22+
manager = PromptManager(FilesystemPromptBackend("./prompts"))
23+
24+
# Fetch returns a Prompt (the raw template + identity metadata).
25+
prompt = await manager.fetch("greeting", "production")
26+
27+
# Render applies variables and returns a PromptResult (the
28+
# rendered messages plus a content-addressed identity).
29+
result = manager.render(prompt, {"user": "Alice"})
30+
31+
# Or do both in one shot:
32+
result = await manager.get("greeting", "production", {"user": "Alice"})
33+
```
34+
35+
Why two operations instead of one? Three reasons:
36+
37+
- **Inspect templates without binding variables.** Schema
38+
validation, prompt diffing, tooling that walks the prompt
39+
catalogue.
40+
- **Cache templates separately from rendered output.** The
41+
fetch step is the I/O step; rendering is pure local
42+
computation.
43+
- **Render the same template with different variables in
44+
tight loops.** Map-reduce over chunks, batch evaluation,
45+
fan-out fixtures.
46+
47+
The convenience `get()` operation gives you the single-call
48+
shape when you want it without removing the separability.
49+
50+
## Prompt identity
51+
52+
Every `Prompt` carries five identity fields:
53+
54+
- `name`: your stable identifier (`"greeting"`).
55+
- `version`: the backend's version string. Implementation-defined:
56+
a backend MAY use semver, monotonic integers, content
57+
hashes, git short-SHAs, or any stable identifier. The
58+
filesystem backend derives it from the template content
59+
hash.
60+
- `label`: the slot the prompt was fetched from
61+
(`"production"`, `"latest"`, `"variant-a"`). The label is
62+
part of the query.
63+
- `template_hash`: SHA-256 of the raw template source.
64+
Two prompts with different content always have different
65+
hashes.
66+
- `fetched_at`: when the prompt was fetched. Cached
67+
backends preserve the original fetch time, not the
68+
cache-hit time.
69+
70+
The `name + version + label` triple identifies the prompt;
71+
the `template_hash` lets you tell two prompts apart by
72+
*content*, which matters when a vendor backend serves
73+
different content under the same `latest` label over time.
74+
75+
A `PromptResult` propagates all of those, plus:
76+
77+
- `rendered_hash`: SHA-256 over the rendered messages.
78+
Same template + same variables → same hash. This is the
79+
cache-key value a memoization layer wants.
80+
- `messages`: the rendered output as an LLM-ready
81+
`list[Message]`. Directly consumable by
82+
`Provider.complete()`.
83+
- `variables`: what was applied. Audit-trail friendly.
84+
- `rendered_at`: when the render happened. Distinct from
85+
`fetched_at`.
86+
87+
## Strict variables by default
88+
89+
A template that references a variable not in the mapping
90+
raises `PromptRenderError`:
91+
92+
```python
93+
prompt = await manager.fetch("greeting", "production") # "Hello, {{ user }}! Today is {{ day }}."
94+
manager.render(prompt, {"user": "Alice"}) # raises: "day" is undefined
95+
```
96+
97+
This is intentional. Silently substituting empty strings for
98+
missing variables masks bugs: a typo'd variable name produces
99+
a working-but-wrong prompt, often invisibly. If you need
100+
lenient behavior, wrap your variables in your own defaulting
101+
layer before passing them to `render()`.
102+
103+
The Python implementation uses Jinja2's `StrictUndefined`.
104+
105+
## Composite backends and fallback
106+
107+
A manager constructed with multiple backends consults them in
108+
order. The fallback rule distinguishes infrastructure failure
109+
from logical absence:
110+
111+
```python
112+
from openarmature.prompts import PromptManager
113+
from openarmature_langfuse import LangfusePromptBackend # hypothetical sibling
114+
115+
manager = PromptManager(
116+
LangfusePromptBackend(api_key=...),
117+
FilesystemPromptBackend("./prompts"), # local fallback
118+
)
119+
```
120+
121+
- **`PromptStoreUnavailable` from a backend → try the next.**
122+
Network's down, vendor API is 5xx-ing, filesystem hiccupped,
123+
so the manager falls back. This is the "Langfuse is degraded,
124+
use the local copy" case.
125+
- **`PromptNotFound` from a backend → STOP the chain.** The
126+
error propagates. This is the "operator deliberately deleted
127+
the prompt from Langfuse to retire it" case; falling back here
128+
would silently resurface a stale local copy under a name the
129+
operator wanted gone.
130+
- **All backends `PromptStoreUnavailable` → manager raises
131+
`PromptStoreUnavailable`.** Everything's down.
132+
133+
The two error categories have different operational
134+
meanings; the manager keeps them separated.
135+
136+
## Errors
137+
138+
Three categories cover every failure mode:
139+
140+
| Error | When | Transient |
141+
| ------------------------- | ------------------------------------------------------------------- | --------- |
142+
| `PromptNotFound` | No prompt matches `(name, label)` in any backend (after §8 rules) | No |
143+
| `PromptRenderError` | Undefined variable, template parse error, coercion failure | No |
144+
| `PromptStoreUnavailable` | Backend infrastructure failure (network, I/O, vendor API) | Yes |
145+
146+
`PROMPT_TRANSIENT_CATEGORIES` is exported as a frozenset for
147+
retry-middleware classifiers, matching the pattern
148+
`openarmature.llm` uses with its `TRANSIENT_CATEGORIES`.
149+
150+
## PromptGroup: tracing related prompts together
151+
152+
A `PromptGroup` is a structural grouping of two or more
153+
`PromptResult` instances under a stable `group_name`. The
154+
group itself doesn't execute anything; it gives observability
155+
a shared name to render related calls under.
156+
157+
```python
158+
from openarmature.prompts import PromptGroup, with_active_prompt_group
159+
160+
classify = await manager.get("classify", variables={"input": user_query})
161+
answer = await manager.get("answer", variables={"input": user_query, ...})
162+
163+
group = PromptGroup(group_name="classifier_chain", members=[classify, answer])
164+
with with_active_prompt_group(group):
165+
# Every LLM call in this scope carries
166+
# openarmature.prompt.group_name="classifier_chain".
167+
classification = await provider.complete(classify.messages, ...)
168+
final = await provider.complete(answer.messages, ...)
169+
```
170+
171+
Canonical patterns the primitive covers:
172+
173+
- **Multi-stage classification**: `[coarse, fine, answer]`.
174+
- **RAG with reranking**: `[query_rewrite, retrieve, rerank, answer]`.
175+
- **Self-correction loops**: `[generate, critique, revise]`.
176+
- **Map-reduce over chunks**: `[chunk_classify_1..N, synthesize]`.
177+
178+
The N=2 case ("classifier + follow-up") is the simplest;
179+
larger groups work under the same primitive. The group rejects
180+
empty and single-member shapes; single-prompt tagging is
181+
already served by the per-prompt observability attributes
182+
below.
183+
184+
## Observability propagation
185+
186+
When an LLM call fires inside `with_active_prompt(result)` (or
187+
`with_active_prompt_group(group)`), the OTel observer surfaces
188+
six normative attributes on the `openarmature.llm.complete`
189+
span:
190+
191+
- `openarmature.prompt.name`
192+
- `openarmature.prompt.version`
193+
- `openarmature.prompt.label`
194+
- `openarmature.prompt.template_hash`
195+
- `openarmature.prompt.rendered_hash`
196+
- `openarmature.prompt.group_name`
197+
198+
Pattern:
199+
200+
```python
201+
result = await manager.get("greeting", "production", {"user": "Alice"})
202+
with with_active_prompt(result):
203+
response = await provider.complete(result.messages, ...)
204+
```
205+
206+
Trace UIs can then pivot on `prompt.name`, filter on
207+
`prompt.template_hash` to find every call that used a given
208+
template version, or surface `prompt.group_name` to group
209+
related calls into a single workflow view.
210+
211+
Nesting is innermost-wins. If you activate a result inside
212+
another active result, the inner one wins for the duration
213+
of the inner block.
214+
215+
## Determinism and content-addressed caching
216+
217+
`render` is deterministic: same `Prompt`, same `variables`
218+
bytewise-identical `messages` and `rendered_hash` across
219+
calls. This is the cache-key contract: `rendered_hash`
220+
gives a downstream memoization layer the right equivalence
221+
relation for free.
222+
223+
Templates MAY reference user-supplied variables that capture
224+
nondeterministic values (`now=datetime.utcnow()`); the
225+
determinism contract applies to the render operation given
226+
fixed inputs, not to user-supplied variable content.
227+
228+
## A minimal example
229+
230+
```python
231+
import asyncio
232+
from pathlib import Path
233+
234+
from openarmature.prompts import FilesystemPromptBackend, PromptManager
235+
236+
237+
async def main() -> None:
238+
manager = PromptManager(FilesystemPromptBackend(Path("./prompts")))
239+
result = await manager.get(
240+
"greeting",
241+
"production",
242+
variables={"user": "Alice"},
243+
)
244+
print(result.messages[0].content) # rendered text
245+
print(result.rendered_hash) # cache key
246+
247+
248+
asyncio.run(main())
249+
```
250+
251+
The filesystem backend layout is
252+
`<root>/<label>/<name>.j2`; for the example above,
253+
`./prompts/production/greeting.j2`.
254+
255+
## What's out of scope (for now)
256+
257+
- **Specific vendor backends**: Langfuse, PromptLayer, etc.,
258+
ship as sibling packages (`openarmature-langfuse`, …). The
259+
core ships the protocol + a filesystem reference.
260+
- **Prompt versioning workflows**: how versions are assigned,
261+
promoted, pinned. Per project. The spec defines the
262+
`version` field; the discipline is yours.
263+
- **Cache invalidation policies**: `template_hash` and
264+
`rendered_hash` are the keys; the cache itself is a
265+
separate concern.
266+
- **Prompt linting / evaluation**: quality checks belong to
267+
separate tools (or the future eval capability).
268+
- **Multi-message render decomposition**: v1 emits a single
269+
`UserMessage` carrying the rendered text. If you need
270+
`system + user` splits, construct the messages list
271+
manually outside `render()` for now.
272+
273+
## Where to next
274+
275+
- **[Model Providers](../model-providers/index.md)**:
276+
what to pass `result.messages` into.
277+
- **[API reference: `openarmature.prompts`](../reference/prompts.md)**:
278+
the full public surface.

docs/model-providers/authoring.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -69,7 +69,7 @@ class MyProvider:
6969
response_schema: dict[str, Any] | type[BaseModel] | None = None,
7070
) -> Response:
7171
# response_schema is part of the Protocol; a skeleton provider
72-
# MUST NOT silently ignore it callers expect either
72+
# MUST NOT silently ignore it: callers expect either
7373
# Response.parsed populated or a StructuredOutputInvalid raise.
7474
# Until the wire path is implemented, raise
7575
# ProviderInvalidRequest when response_schema is set. A
@@ -206,8 +206,8 @@ of:
206206
`ImageSourceInline`) are stable across providers; only the wire
207207
shape differs. Provider authors targeting non-multimodal models
208208
MUST surface `ProviderUnsupportedContentBlock` when the request
209-
carries blocks the bound model can't serve pre-send or
210-
post-receive per §7.
209+
carries blocks the bound model can't serve (pre-send or
210+
post-receive per §7).
211211
- **Structured output.** Threading `response_schema` through the
212212
request body (native `response_format` if the underlying wire
213213
supports it; prompt-augmentation fallback otherwise) and validating

docs/model-providers/index.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -89,7 +89,7 @@ in the LLMs concept page for the multimodal contract; see
8989

9090
`OpenAIProvider` detects unsupported-content-block rejections via
9191
the response body (HTTP 400 with an error code or message indicating
92-
content rejection) a post-receive mapping rather than a static
92+
content rejection): a post-receive mapping rather than a static
9393
pre-send capability check. Pre-send protection is a userland
9494
middleware pattern when callers know the bound model's capabilities
9595
up front.

docs/reference/prompts.md

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
# openarmature.prompts
2+
3+
::: openarmature.prompts
4+
options:
5+
show_root_heading: false
6+
show_source: false
7+
heading_level: 2

mkdocs.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -95,6 +95,7 @@ nav:
9595
- Composition: concepts/composition.md
9696
- Fan-out: concepts/fan-out.md
9797
- LLMs: concepts/llms.md
98+
- Prompts: concepts/prompts.md
9899
- Observability: concepts/observability.md
99100
- Checkpointing: concepts/checkpointing.md
100101
- Model Providers:
@@ -104,6 +105,7 @@ nav:
104105
- reference/index.md
105106
- openarmature.graph: reference/graph.md
106107
- openarmature.llm: reference/llm.md
108+
- openarmature.prompts: reference/prompts.md
107109
- openarmature.checkpoint: reference/checkpoint.md
108110
- openarmature.observability: reference/observability.md
109111

pyproject.toml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -25,6 +25,7 @@ dependencies = [
2525
"pydantic>=2.7",
2626
"httpx>=0.27",
2727
"jsonschema>=4.0",
28+
"jinja2>=3.1",
2829
]
2930

3031
[project.optional-dependencies]

0 commit comments

Comments
 (0)