docs: prompts concept page, API reference, changelog

chris-colinsky · chris-colinsky · commit ad2e89638236 · 2026-05-15T19:07:22.000-07:00
docs/concepts/prompts.md walks through the prompt-management
capability: the fetch + render split (and why both, not just
get()), Prompt identity fields, strict-by-default variables,
composite-backend fallback (PromptStoreUnavailable continues,
PromptNotFound stops), the three error categories, PromptGroup
for tracing related prompts, observability propagation via
with_active_prompt and the six normative openarmature.prompt.*
attributes, determinism + content-addressed caching, a minimal
example, and what's out of scope (vendor backends, versioning
workflows, cache invalidation, multi-message decomposition).

docs/reference/prompts.md is an mkdocstrings autodoc page in
the same shape as docs/reference/llm.md.

mkdocs.yml gains the two new pages in the Concepts and
Reference nav sections.

CHANGELOG.md adds two entries under [Unreleased]:

- the new openarmature.prompts subpackage with PromptManager,
  the three error categories, FilesystemPromptBackend, and the
  jinja2>=3.1 runtime dependency.
- the observability propagation surface in
  openarmature.prompts.context plus the OTel observer wiring.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -8,6 +8,8 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The
 
 ### Added
 
+- **Prompt-management capability (proposal 0017, introduced in spec v0.15.0).** New `openarmature.prompts` subpackage. `PromptManager` composes one or more `PromptBackend`s, exposes `fetch` / `render` / `get`, applies the §8 fallback semantics (`prompt_store_unavailable` continues to the next backend; `prompt_not_found` stops the chain), and renders templates with Jinja2's `StrictUndefined` per §7. `Prompt` / `PromptResult` / `PromptGroup` are Pydantic models matching spec §3 / §4 / §9. Three error categories (`PromptNotFound`, `PromptRenderError`, `PromptStoreUnavailable`) with `PROMPT_TRANSIENT_CATEGORIES` exported for retry-middleware classifiers. `FilesystemPromptBackend` is the minimum local-filesystem reference backend (layout: `<root>/<label>/<name>.j2`; `version` derived from the first 12 chars of `template_hash`). New runtime dependency: `jinja2>=3.1`.
+- **`openarmature.prompts.context` — observability propagation per spec §11.** `with_active_prompt(result)` and `with_active_prompt_group(group)` context managers + `current_prompt_result()` / `current_prompt_group()` inspectors. When the OTel observer is active and an LLM call fires inside `with_active_prompt`, the `openarmature.llm.complete` span carries the normative `openarmature.prompt.*` attributes (`name`, `version`, `label`, `template_hash`, `rendered_hash`, `group_name`). Nesting is innermost-wins.
 - **Image content blocks for user messages (proposal 0015, introduced in spec v0.13.0).** `UserMessage.content` now accepts `str | list[ContentBlock]`. The block surface introduces `TextBlock`, `ImageBlock`, `ImageSourceURL`, `ImageSourceInline`, and the `ContentBlock` / `ImageSource` discriminated unions over the block / source `type` field. `ImageBlock` carries a `media_type` (required for inline sources; ignored for URL sources; typed as `str | None` so callers MAY pass any `image/*` type the bound model supports) and an optional `detail` hint (`"auto"` / `"low"` / `"high"`; `None` default omits the field from the wire so providers apply their own default). System, assistant, and tool messages stay text-string-only; image inputs are user-only in v1.
 - **`OpenAIProvider` content-array wire mapping.** When `UserMessage.content` is a content-block sequence, the wire body uses OpenAI's `content` array per §8.1.1. `TextBlock → {type: "text", text}`. `ImageBlock` with a URL source maps to `{type: "image_url", image_url: {url, detail?}}`. `ImageBlock` with an inline source constructs an RFC 2397 `data:<media_type>;base64,<base64_data>` URI and goes through the same `image_url` entry shape. Inline bytes pass through unchanged — no inspection, transcoding, or re-encoding.
 - **New error category `ProviderUnsupportedContentBlock` (non-transient).** Raised when the bound model rejects a content block type / media variant. Distinct from `ProviderInvalidRequest` (which covers spec-shape malformation): this category surfaces a *capability* mismatch, letting callers route differently (e.g., fall back to a multimodal-capable provider) without overloading the malformed-request category. Carries `block_type` ("image" / "audio" / "video") and `reason` (provider's human-readable message) when those are recoverable from the rejection. `OpenAIProvider` detects content rejection via HTTP 400 bodies — heuristic on `error.code` (known set: `image_content_not_supported`, `unsupported_image_media_type`, `audio_content_not_supported`, etc.), `error.type` (`image_parse_error`), and `error.message` ("does not support" + image/audio/video).
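
The first new changelog entry states that `FilesystemPromptBackend` derives `version` from the first 12 characters of `template_hash`. A minimal sketch of that derivation follows. It assumes the hash is the lowercase hex SHA-256 digest of the UTF-8 template source (the encoding and hex form are assumptions), and `template_identity` is a hypothetical helper, not part of `openarmature.prompts`.

```python
import hashlib


def template_identity(template_source: str) -> tuple[str, str]:
    """Return (template_hash, version) for a raw template string."""
    # Assumption: the hash covers the UTF-8 bytes of the source, hex-encoded.
    template_hash = hashlib.sha256(template_source.encode("utf-8")).hexdigest()
    # Per the changelog entry: version is the first 12 chars of template_hash.
    version = template_hash[:12]
    return template_hash, version


template_hash, version = template_identity("Hello, {{ user }}!")
print(len(template_hash), len(version), template_hash.startswith(version))
```

Under these assumptions, editing a template file changes both values at once, so a content change can never hide behind an unchanged version string.
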
diff --git a/docs/concepts/prompts.md b/docs/concepts/prompts.md
@@ -0,0 +1,287 @@
+# Prompts
+
+Named, versioned, content-addressed prompts. OpenArmature's
+prompt-management capability separates *fetching* a template
+from *rendering* it, lets you compose multiple backends with
+explicit fallback, and propagates prompt identity to your
+observability backend so trace UIs can pivot on the prompt
+that produced a call.
+
+Skip ahead to [a minimal example](#a-minimal-example) if you
+want code first.
+
+## The two halves: fetch and render
+
+A `PromptBackend` knows how to find a template by `name` and
+`label`; nothing more. A `PromptManager` composes one or more
+backends and adds rendering on top:
+
+```python
+from openarmature.prompts import PromptManager, FilesystemPromptBackend
+
+manager = PromptManager(FilesystemPromptBackend("./prompts"))
+
+# Fetch returns a Prompt (the raw template + identity metadata).
+prompt = await manager.fetch("greeting", "production")
+
+# Render applies variables and returns a PromptResult (the
+# rendered messages plus a content-addressed identity).
+result = manager.render(prompt, {"user": "Alice"})
+
+# Or do both in one shot:
+result = await manager.get("greeting", "production", {"user": "Alice"})
+```
+
+Why two operations instead of one? Three reasons:
+
+- **Inspect templates without binding variables.** Schema
+  validation, prompt diffing, tooling that walks the prompt
+  catalogue.
+- **Cache templates separately from rendered output.** The
+  fetch step is the I/O step; rendering is pure local
+  computation.
+- **Render the same template with different variables in
+  tight loops.** Map-reduce over chunks, batch evaluation,
+  fan-out fixtures.
+
+The convenience `get()` operation gives you the single-call
+shape when you want it, without giving up the separation.
+
+## Prompt identity
+
+Every `Prompt` carries five identity fields:
+
+- `name` — your stable identifier (`"greeting"`).
+- `version` — the backend's version string. Implementation-defined:
+  a backend MAY use semver, monotonic integers, content
+  hashes, git short-SHAs, or any stable identifier. The
+  filesystem backend derives it from the template content
+  hash.
+- `label` — the slot the prompt was fetched from
+  (`"production"`, `"latest"`, `"variant-a"`). The label is
+  part of the query.
+- `template_hash` — SHA-256 of the raw template source.
+  Two prompts with different content have different hashes
+  (barring a SHA-256 collision).
+- `fetched_at` — when the prompt was fetched. Cached
+  backends preserve the original fetch time, not the
+  cache-hit time.
+
+The `name + version + label` triple identifies the prompt;
+the `template_hash` lets you tell two prompts apart by
+*content*, which matters when a vendor backend serves
+different content under the same `latest` label over time.
+
+A `PromptResult` propagates all of those, plus:
+
+- `rendered_hash` — SHA-256 over the rendered messages.
+  Same template + same variables → same hash. This is the
+  cache-key value a memoization layer wants.
+- `messages` — the rendered output as an LLM-ready
+  `list[Message]`. Directly consumable by
+  `Provider.complete()`.
+- `variables` — what was applied. Audit-trail friendly.
+- `rendered_at` — when the render happened. Distinct from
+  `fetched_at`.
+
+## Strict variables by default
+
+A template that references a variable not in the mapping
+raises `PromptRenderError`:
+
+```python
+prompt = await manager.fetch("greeting", "production")  # "Hello, {{ user }}! Today is {{ day }}."
+manager.render(prompt, {"user": "Alice"})  # raises — "day" is undefined
+```
+
+This is intentional. Silently substituting empty strings for
+missing variables masks bugs: a typo'd variable name produces
+a working-but-wrong prompt, often invisibly. If you need
+lenient behavior, wrap your variables in your own defaulting
+layer before passing them to `render()`.
+
+The Python implementation uses Jinja2's `StrictUndefined`.
+
+## Composite backends and fallback
+
+A manager constructed with multiple backends consults them in
+order. The fallback rule distinguishes infrastructure failure
+from logical absence:
+
+```python
+from openarmature.prompts import FilesystemPromptBackend, PromptManager
+from openarmature_langfuse import LangfusePromptBackend  # hypothetical sibling
+
+manager = PromptManager(
+    LangfusePromptBackend(api_key=...),
+    FilesystemPromptBackend("./prompts"),  # local fallback
+)
+```
+
+- **`PromptStoreUnavailable` from a backend → try the next.**
+  Network's down, vendor API is 5xx-ing, filesystem hiccupped —
+  the manager falls back. This is the "Langfuse is degraded,
+  use the local copy" case.
+- **`PromptNotFound` from a backend → STOP the chain.** The
+  error propagates. This is the "operator deliberately
+  deleted the prompt from Langfuse to retire it" case —
+  falling back here would silently resurface a stale local
+  copy under a name the operator wanted gone.
+- **All backends `PromptStoreUnavailable` → manager raises
+  `PromptStoreUnavailable`.** Everything's down.
+
+The two error categories have different operational
+meanings; the manager keeps them separated.
+
+## Errors
+
+Three categories cover every failure mode:
+
+| Error                     | When                                                                | Transient |
+| ------------------------- | ------------------------------------------------------------------- | --------- |
+| `PromptNotFound`          | No prompt matches `(name, label)` in any backend (after §8 rules)   | No        |
+| `PromptRenderError`       | Undefined variable, template parse error, coercion failure          | No        |
+| `PromptStoreUnavailable`  | Backend infrastructure failure (network, I/O, vendor API)           | Yes       |
+
+`PROMPT_TRANSIENT_CATEGORIES` is exported as a frozenset for
+retry-middleware classifiers — the same pattern
+`openarmature.llm` uses with its `TRANSIENT_CATEGORIES`.
+
+## PromptGroup — tracing related prompts together
+
+A `PromptGroup` is a structural grouping of two or more
+`PromptResult` instances under a stable `group_name`. The
+group itself doesn't execute anything; it gives observability
+a shared name to render related calls under.
+
+```python
+from openarmature.prompts import PromptGroup, with_active_prompt_group
+
+classify = await manager.get("classify", variables={"input": user_query})
+answer = await manager.get("answer", variables={"input": user_query, ...})
+
+group = PromptGroup(group_name="classifier_chain", members=[classify, answer])
+with with_active_prompt_group(group):
+    # Every LLM call in this scope carries
+    # openarmature.prompt.group_name="classifier_chain".
+    classification = await provider.complete(classify.messages, ...)
+    final = await provider.complete(answer.messages, ...)
+```
+
+Canonical patterns the primitive covers:
+
+- **Multi-stage classification** — `[coarse, fine, answer]`.
+- **RAG with reranking** — `[query_rewrite, retrieve, rerank, answer]`.
+- **Self-correction loops** — `[generate, critique, revise]`.
+- **Map-reduce over chunks** — `[chunk_classify_1..N, synthesize]`.
+
+The N=2 case ("classifier + follow-up") is the simplest;
+larger groups work under the same primitive. The group rejects
+empty and single-member shapes — single-prompt tagging is
+already served by the per-prompt observability attributes
+below.
+
+## Observability propagation
+
+When an LLM call fires inside `with_active_prompt(result)` (or
+`with_active_prompt_group(group)`), the OTel observer surfaces
+six normative attributes on the `openarmature.llm.complete`
+span:
+
+- `openarmature.prompt.name`
+- `openarmature.prompt.version`
+- `openarmature.prompt.label`
+- `openarmature.prompt.template_hash`
+- `openarmature.prompt.rendered_hash`
+- `openarmature.prompt.group_name`
+
+Pattern:
+
+```python
+result = await manager.get("greeting", "production", {"user": "Alice"})
+with with_active_prompt(result):
+    response = await provider.complete(result.messages, ...)
+```
+
+Trace UIs can then pivot on `prompt.name`, filter on
+`prompt.template_hash` to find every call that used a given
+template version, or surface `prompt.group_name` to group
+related calls into a single workflow view.
+
+Nesting is innermost-wins. If you activate a result inside
+another active result, the inner one wins for the duration
+of the inner block.
+
+## Determinism and content-addressed caching
+
+`render` is deterministic: same `Prompt`, same `variables` →
+bytewise-identical `messages` and `rendered_hash` across
+calls. This is the cache-key contract — `rendered_hash`
+gives a downstream memoization layer the right equivalence
+relation for free.
+
+Templates MAY reference user-supplied variables that capture
+nondeterministic values (`now=datetime.now(timezone.utc)`); the
+determinism contract applies to the render operation given
+fixed inputs, not to user-supplied variable content.
+
+## A minimal example
+
+```python
+import asyncio
+from pathlib import Path
+
+from openarmature.prompts import (
+    FilesystemPromptBackend,
+    PromptManager,
+    with_active_prompt,
+)
+
+
+async def main() -> None:
+    manager = PromptManager(FilesystemPromptBackend(Path("./prompts")))
+    result = await manager.get(
+        "greeting",
+        "production",
+        variables={"user": "Alice"},
+    )
+    print(result.messages[0].content)         # rendered text
+    print(result.rendered_hash)               # cache key
+    # Run an LLM call inside the active-prompt context so the
+    # OTel observer can surface prompt.* span attributes.
+    # with with_active_prompt(result):
+    #     response = await provider.complete(result.messages)
+    _ = with_active_prompt  # marker for the snippet above
+
+
+asyncio.run(main())
+```
+
+The filesystem backend layout is
+`<root>/<label>/<name>.j2` — for the example above,
+`./prompts/production/greeting.j2`.
+
+## What's out of scope (for now)
+
+- **Specific vendor backends** — Langfuse, PromptLayer, etc.,
+  ship as sibling packages (`openarmature-langfuse`, …). The
+  core ships the protocol + a filesystem reference.
+- **Prompt versioning workflows** — how versions are assigned,
+  promoted, pinned. Per project. The spec defines the
+  `version` field; the discipline is yours.
+- **Cache invalidation policies** — `template_hash` and
+  `rendered_hash` are the keys; the cache itself is a
+  separate concern.
+- **Prompt linting / evaluation** — quality checks belong to
+  separate tools (or the future eval capability).
+- **Multi-message render decomposition** — v1 emits a single
+  `UserMessage` carrying the rendered text. If you need
+  `system + user` splits, construct the messages list
+  manually outside `render()` for now.
+
+## Where to next
+
+- **[Model Providers](../model-providers/index.md)** —
+  what to pass `result.messages` into.
+- **[API reference: `openarmature.prompts`](../reference/prompts.md)** —
+  the full public surface.
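
The fallback rule in the "Composite backends and fallback" section of the page above can be sketched without the library. Everything here is a hypothetical stand-in: the two exception classes only mirror the names of the real error categories, and `DictBackend` and `fetch_with_fallback` are illustrative helpers, not the `PromptManager` implementation.

```python
import asyncio


class PromptNotFound(Exception):
    """Stand-in: the backend is reachable but has no such prompt."""


class PromptStoreUnavailable(Exception):
    """Stand-in: the backend itself failed (network, I/O, vendor API)."""


class DictBackend:
    """Hypothetical in-memory backend keyed by (name, label)."""

    def __init__(self, templates: dict, up: bool = True) -> None:
        self.templates = templates
        self.up = up

    async def fetch(self, name: str, label: str) -> str:
        if not self.up:
            raise PromptStoreUnavailable(f"backend down while fetching {name!r}")
        try:
            return self.templates[(name, label)]
        except KeyError:
            raise PromptNotFound(f"{name!r} @ {label!r}") from None


async def fetch_with_fallback(backends: list, name: str, label: str) -> str:
    last_error = None
    for backend in backends:
        try:
            return await backend.fetch(name, label)
        except PromptStoreUnavailable as exc:
            last_error = exc  # infrastructure failure: try the next backend
        # PromptNotFound is deliberately not caught, so it stops the chain.
    raise PromptStoreUnavailable("all backends unavailable") from last_error


local = DictBackend({("greeting", "production"): "Hello, {{ user }}!"})
degraded = DictBackend({}, up=False)  # simulates an unreachable vendor store
retired = DictBackend({})  # reachable, but the prompt was deleted

# Degraded primary: the chain falls back to the local copy.
print(asyncio.run(fetch_with_fallback([degraded, local], "greeting", "production")))

# Reachable primary without the prompt: the error propagates, no fallback.
try:
    asyncio.run(fetch_with_fallback([retired, local], "greeting", "production"))
except PromptNotFound as exc:
    print("not found:", exc)
```

Only the infrastructure error is caught inside the loop. That keeps a deliberately retired prompt from resurfacing out of a stale local copy.
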
diff --git a/docs/reference/prompts.md b/docs/reference/prompts.md
@@ -0,0 +1,7 @@
+# openarmature.prompts
+
+::: openarmature.prompts
+    options:
+      show_root_heading: false
+      show_source: false
+      heading_level: 2
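
The surface this reference page documents includes `with_active_prompt` and `current_prompt_result()`, whose nesting the changelog describes as innermost-wins. One way such behaviour could be built is with the standard library's `contextvars`. The sketch below is an assumption about the mechanism, not the library's code: `active_prompt_scope` and `current_active_prompt` are hypothetical names, and a plain string stands in for a `PromptResult`.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical storage slot; the real mechanism is not shown in this diff.
_active_prompt = contextvars.ContextVar("active_prompt", default=None)


@contextmanager
def active_prompt_scope(result):
    """Make `result` the active prompt for the duration of the block."""
    token = _active_prompt.set(result)
    try:
        yield
    finally:
        # Restore whatever was active before this block, including None.
        _active_prompt.reset(token)


def current_active_prompt():
    """Return the innermost active prompt, or None outside any scope."""
    return _active_prompt.get()


with active_prompt_scope("outer"):
    with active_prompt_scope("inner"):
        print(current_active_prompt())  # inner
    print(current_active_prompt())  # outer
print(current_active_prompt())  # None
```

A context variable also keeps the active prompt isolated per asyncio task, which matters when several LLM calls run concurrently.
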
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -95,6 +95,7 @@ nav:
     - Composition: concepts/composition.md
     - Fan-out: concepts/fan-out.md
     - LLMs: concepts/llms.md
+    - Prompts: concepts/prompts.md
     - Observability: concepts/observability.md
     - Checkpointing: concepts/checkpointing.md
   - Model Providers:
@@ -104,6 +105,7 @@ nav:
     - reference/index.md
     - openarmature.graph: reference/graph.md
     - openarmature.llm: reference/llm.md
+    - openarmature.prompts: reference/prompts.md
     - openarmature.checkpoint: reference/checkpoint.md
     - openarmature.observability: reference/observability.md