|
| 1 | +# Prompts |
| 2 | + |
| 3 | +Named, versioned, content-addressed prompts. OpenArmature's |
| 4 | +prompt-management capability separates *fetching* a template |
| 5 | +from *rendering* it, lets you compose multiple backends with |
| 6 | +explicit fallback, and propagates prompt identity to your |
| 7 | +observability backend so trace UIs can pivot on the prompt |
| 8 | +that produced a call. |
| 9 | + |
| 10 | +Skip ahead to [a minimal example](#a-minimal-example) if you |
| 11 | +want code first. |
| 12 | + |
| 13 | +## The two halves: fetch and render |
| 14 | + |
| 15 | +A `PromptBackend` knows how to find a template by `name` and |
| 16 | +`label`; nothing more. A `PromptManager` composes one or more |
| 17 | +backends and adds rendering on top: |
| 18 | + |
| 19 | +```python |
| 20 | +from openarmature.prompts import PromptManager, FilesystemPromptBackend |
| 21 | + |
| 22 | +manager = PromptManager(FilesystemPromptBackend("./prompts")) |
| 23 | + |
| 24 | +# Fetch returns a Prompt (the raw template + identity metadata). |
| 25 | +prompt = await manager.fetch("greeting", "production") |
| 26 | + |
| 27 | +# Render applies variables and returns a PromptResult (the |
| 28 | +# rendered messages plus a content-addressed identity). |
| 29 | +result = manager.render(prompt, {"user": "Alice"}) |
| 30 | + |
| 31 | +# Or do both in one shot: |
| 32 | +result = await manager.get("greeting", "production", {"user": "Alice"}) |
| 33 | +``` |
| 34 | + |
| 35 | +Why two operations instead of one? Three reasons: |
| 36 | + |
| 37 | +- **Inspect templates without binding variables.** Schema |
| 38 | + validation, prompt diffing, tooling that walks the prompt |
| 39 | + catalogue. |
| 40 | +- **Cache templates separately from rendered output.** The |
| 41 | + fetch step is the I/O step; rendering is pure local |
| 42 | + computation. |
| 43 | +- **Render the same template with different variables in |
| 44 | + tight loops.** Map-reduce over chunks, batch evaluation, |
| 45 | + fan-out fixtures. |
| 46 | + |
| 47 | +The convenience `get()` operation gives you the single-call |
| 48 | +shape when you want it without removing the separability. |
| 49 | + |
| 50 | +## Prompt identity |
| 51 | + |
| 52 | +Every `Prompt` carries five identity fields: |
| 53 | + |
| 54 | +- `name`: your stable identifier (`"greeting"`). |
| 55 | +- `version`: the backend's version string. Implementation-defined: |
| 56 | + a backend MAY use semver, monotonic integers, content |
| 57 | + hashes, git short-SHAs, or any stable identifier. The |
| 58 | + filesystem backend derives it from the template content |
| 59 | + hash. |
| 60 | +- `label`: the slot the prompt was fetched from |
| 61 | + (`"production"`, `"latest"`, `"variant-a"`). The label is |
| 62 | + part of the query. |
| 63 | +- `template_hash`: SHA-256 of the raw template source. |
| 64 | + Two prompts with different content always have different |
| 65 | + hashes. |
| 66 | +- `fetched_at`: when the prompt was fetched. Cached |
| 67 | + backends preserve the original fetch time, not the |
| 68 | + cache-hit time. |
| 69 | + |
| 70 | +The `name + version + label` triple identifies the prompt; |
| 71 | +the `template_hash` lets you tell two prompts apart by |
| 72 | +*content*, which matters when a vendor backend serves |
| 73 | +different content under the same `latest` label over time. |
| 74 | + |
| 75 | +A `PromptResult` propagates all of those, plus: |
| 76 | + |
| 77 | +- `rendered_hash`: SHA-256 over the rendered messages. |
| 78 | + Same template + same variables → same hash. This is the |
| 79 | + cache-key value a memoization layer wants. |
| 80 | +- `messages`: the rendered output as an LLM-ready |
| 81 | + `list[Message]`. Directly consumable by |
| 82 | + `Provider.complete()`. |
| 83 | +- `variables`: what was applied. Audit-trail friendly. |
| 84 | +- `rendered_at`: when the render happened. Distinct from |
| 85 | + `fetched_at`. |
| 86 | + |
| 87 | +## Strict variables by default |
| 88 | + |
| 89 | +A template that references a variable not in the mapping |
| 90 | +raises `PromptRenderError`: |
| 91 | + |
| 92 | +```python |
| 93 | +prompt = await manager.fetch("greeting", "production") # "Hello, {{ user }}! Today is {{ day }}." |
| 94 | +manager.render(prompt, {"user": "Alice"}) # raises: "day" is undefined |
| 95 | +``` |
| 96 | + |
| 97 | +This is intentional. Silently substituting empty strings for |
| 98 | +missing variables masks bugs: a typo'd variable name produces |
| 99 | +a working-but-wrong prompt, often invisibly. If you need |
| 100 | +lenient behavior, wrap your variables in your own defaulting |
| 101 | +layer before passing them to `render()`. |
| 102 | + |
| 103 | +The Python implementation uses Jinja2's `StrictUndefined`. |
| 104 | + |
| 105 | +## Composite backends and fallback |
| 106 | + |
| 107 | +A manager constructed with multiple backends consults them in |
| 108 | +order. The fallback rule distinguishes infrastructure failure |
| 109 | +from logical absence: |
| 110 | + |
| 111 | +```python |
| 112 | +from openarmature.prompts import PromptManager |
| 113 | +from openarmature_langfuse import LangfusePromptBackend # hypothetical sibling |
| 114 | + |
| 115 | +manager = PromptManager( |
| 116 | + LangfusePromptBackend(api_key=...), |
| 117 | + FilesystemPromptBackend("./prompts"), # local fallback |
| 118 | +) |
| 119 | +``` |
| 120 | + |
| 121 | +- **`PromptStoreUnavailable` from a backend → try the next.** |
| 122 | + Network's down, vendor API is 5xx-ing, filesystem hiccupped, |
| 123 | + so the manager falls back. This is the "Langfuse is degraded, |
| 124 | + use the local copy" case. |
| 125 | +- **`PromptNotFound` from a backend → STOP the chain.** The |
| 126 | + error propagates. This is the "operator deliberately deleted |
| 127 | + the prompt from Langfuse to retire it" case; falling back here |
| 128 | + would silently resurface a stale local copy under a name the |
| 129 | + operator wanted gone. |
| 130 | +- **All backends `PromptStoreUnavailable` → manager raises |
| 131 | + `PromptStoreUnavailable`.** Everything's down. |
| 132 | + |
| 133 | +The two error categories have different operational |
| 134 | +meanings; the manager keeps them separated. |
| 135 | + |
| 136 | +## Errors |
| 137 | + |
| 138 | +Three categories cover every failure mode: |
| 139 | + |
| 140 | +| Error | When | Transient | |
| 141 | +| ------------------------- | ------------------------------------------------------------------- | --------- | |
| 142 | +| `PromptNotFound` | No prompt matches `(name, label)` in any backend (after §8 rules) | No | |
| 143 | +| `PromptRenderError` | Undefined variable, template parse error, coercion failure | No | |
| 144 | +| `PromptStoreUnavailable` | Backend infrastructure failure (network, I/O, vendor API) | Yes | |
| 145 | + |
| 146 | +`PROMPT_TRANSIENT_CATEGORIES` is exported as a frozenset for |
| 147 | +retry-middleware classifiers, matching the pattern |
| 148 | +`openarmature.llm` uses with its `TRANSIENT_CATEGORIES`. |
| 149 | + |
| 150 | +## PromptGroup: tracing related prompts together |
| 151 | + |
| 152 | +A `PromptGroup` is a structural grouping of two or more |
| 153 | +`PromptResult` instances under a stable `group_name`. The |
| 154 | +group itself doesn't execute anything; it gives observability |
| 155 | +a shared name to render related calls under. |
| 156 | + |
| 157 | +```python |
| 158 | +from openarmature.prompts import PromptGroup, with_active_prompt_group |
| 159 | + |
| 160 | +classify = await manager.get("classify", variables={"input": user_query}) |
| 161 | +answer = await manager.get("answer", variables={"input": user_query, ...}) |
| 162 | + |
| 163 | +group = PromptGroup(group_name="classifier_chain", members=[classify, answer]) |
| 164 | +with with_active_prompt_group(group): |
| 165 | + # Every LLM call in this scope carries |
| 166 | + # openarmature.prompt.group_name="classifier_chain". |
| 167 | + classification = await provider.complete(classify.messages, ...) |
| 168 | + final = await provider.complete(answer.messages, ...) |
| 169 | +``` |
| 170 | + |
| 171 | +Canonical patterns the primitive covers: |
| 172 | + |
| 173 | +- **Multi-stage classification**: `[coarse, fine, answer]`. |
| 174 | +- **RAG with reranking**: `[query_rewrite, retrieve, rerank, answer]`. |
| 175 | +- **Self-correction loops**: `[generate, critique, revise]`. |
| 176 | +- **Map-reduce over chunks**: `[chunk_classify_1..N, synthesize]`. |
| 177 | + |
| 178 | +The N=2 case ("classifier + follow-up") is the simplest; |
| 179 | +larger groups work under the same primitive. The group rejects |
| 180 | +empty and single-member shapes; single-prompt tagging is |
| 181 | +already served by the per-prompt observability attributes |
| 182 | +below. |
| 183 | + |
| 184 | +## Observability propagation |
| 185 | + |
| 186 | +When an LLM call fires inside `with_active_prompt(result)` (or |
| 187 | +`with_active_prompt_group(group)`), the OTel observer surfaces |
| 188 | +six normative attributes on the `openarmature.llm.complete` |
| 189 | +span: |
| 190 | + |
| 191 | +- `openarmature.prompt.name` |
| 192 | +- `openarmature.prompt.version` |
| 193 | +- `openarmature.prompt.label` |
| 194 | +- `openarmature.prompt.template_hash` |
| 195 | +- `openarmature.prompt.rendered_hash` |
| 196 | +- `openarmature.prompt.group_name` |
| 197 | + |
| 198 | +Pattern: |
| 199 | + |
| 200 | +```python |
| 201 | +result = await manager.get("greeting", "production", {"user": "Alice"}) |
| 202 | +with with_active_prompt(result): |
| 203 | + response = await provider.complete(result.messages, ...) |
| 204 | +``` |
| 205 | + |
| 206 | +Trace UIs can then pivot on `prompt.name`, filter on |
| 207 | +`prompt.template_hash` to find every call that used a given |
| 208 | +template version, or surface `prompt.group_name` to group |
| 209 | +related calls into a single workflow view. |
| 210 | + |
| 211 | +Nesting is innermost-wins. If you activate a result inside |
| 212 | +another active result, the inner one wins for the duration |
| 213 | +of the inner block. |
| 214 | + |
| 215 | +## Determinism and content-addressed caching |
| 216 | + |
| 217 | +`render` is deterministic: same `Prompt`, same `variables` → |
| 218 | +bytewise-identical `messages` and `rendered_hash` across |
| 219 | +calls. This is the cache-key contract: `rendered_hash` |
| 220 | +gives a downstream memoization layer the right equivalence |
| 221 | +relation for free. |
| 222 | + |
| 223 | +Templates MAY reference user-supplied variables that capture |
| 224 | +nondeterministic values (`now=datetime.utcnow()`); the |
| 225 | +determinism contract applies to the render operation given |
| 226 | +fixed inputs, not to user-supplied variable content. |
| 227 | + |
| 228 | +## A minimal example |
| 229 | + |
| 230 | +```python |
| 231 | +import asyncio |
| 232 | +from pathlib import Path |
| 233 | + |
| 234 | +from openarmature.prompts import FilesystemPromptBackend, PromptManager |
| 235 | + |
| 236 | + |
| 237 | +async def main() -> None: |
| 238 | + manager = PromptManager(FilesystemPromptBackend(Path("./prompts"))) |
| 239 | + result = await manager.get( |
| 240 | + "greeting", |
| 241 | + "production", |
| 242 | + variables={"user": "Alice"}, |
| 243 | + ) |
| 244 | + print(result.messages[0].content) # rendered text |
| 245 | + print(result.rendered_hash) # cache key |
| 246 | + |
| 247 | + |
| 248 | +asyncio.run(main()) |
| 249 | +``` |
| 250 | + |
| 251 | +The filesystem backend layout is |
| 252 | +`<root>/<label>/<name>.j2`; for the example above, |
| 253 | +`./prompts/production/greeting.j2`. |
| 254 | + |
| 255 | +## What's out of scope (for now) |
| 256 | + |
| 257 | +- **Specific vendor backends**: Langfuse, PromptLayer, etc., |
| 258 | + ship as sibling packages (`openarmature-langfuse`, …). The |
| 259 | + core ships the protocol + a filesystem reference. |
| 260 | +- **Prompt versioning workflows**: how versions are assigned, |
| 261 | + promoted, pinned. Per project. The spec defines the |
| 262 | + `version` field; the discipline is yours. |
| 263 | +- **Cache invalidation policies**: `template_hash` and |
| 264 | + `rendered_hash` are the keys; the cache itself is a |
| 265 | + separate concern. |
| 266 | +- **Prompt linting / evaluation**: quality checks belong to |
| 267 | + separate tools (or the future eval capability). |
| 268 | +- **Multi-message render decomposition**: v1 emits a single |
| 269 | + `UserMessage` carrying the rendered text. If you need |
| 270 | + `system + user` splits, construct the messages list |
| 271 | + manually outside `render()` for now. |
| 272 | + |
| 273 | +## Where to next |
| 274 | + |
| 275 | +- **[Model Providers](../model-providers/index.md)**: |
| 276 | + what to pass `result.messages` into. |
| 277 | +- **[API reference: `openarmature.prompts`](../reference/prompts.md)**: |
| 278 | + the full public surface. |
0 commit comments