diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
Lines changed: 12 additions & 0 deletions
diff --git a/AGENTS.md b/AGENTS.md
Lines changed: 15 additions & 3 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 1 addition & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 10 additions & 0 deletions
diff --git a/docs/agent/non-obvious-shapes.md b/docs/agent/non-obvious-shapes.md
Lines changed: 111 additions & 0 deletions
diff --git a/docs/agent/tldr.md b/docs/agent/tldr.md
Lines changed: 3 additions & 0 deletions
@@ -26,6 +26,18 @@ jobs:
           # Conformance fixtures live in the openarmature-spec submodule.
           submodules: recursive
 
+      - name: Fetch submodule tags
+        # actions/checkout's submodule clone is shallow and doesn't
+        # carry tags. ``scripts/build_agents_md.py`` asserts the
+        # submodule HEAD is AT a ``v*`` tag (``git tag --points-at
+        # HEAD``; refuses to bundle draft spec text or text from
+        # a commit between two release tags);
+        # ``tests/test_agents_md_drift.py`` runs that assertion.
+        # Fetch tag refs only — the HEAD commit is already present
+        # from the submodule checkout, and we don't need history
+        # beyond what tags point at.
+        run: git -C openarmature-spec fetch --tags
+
       - name: Install uv
         uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b  # v8.1.0
         with:
 
@@ -1,8 +1,20 @@
 # AGENTS.md
 
-Orientation for coding agents working in this repo. `README.md` covers what
-the project is and how to use it; this file covers things that aren't
-obvious from reading the code.
+Orientation for coding agents working in **this repo** — i.e., agents
+contributing to openarmature itself. `README.md` covers what the project
+is and how to use it; this file covers things that aren't obvious from
+reading the code.
+
+> **Two AGENTS.md files in this project. Different audiences.**
+>
+> - This file (`./AGENTS.md`, at the repo root) — for agents working on
+>   the openarmature codebase. Package layout, test layout, tooling,
+>   spec-submodule discipline, commit conventions.
+> - `src/openarmature/AGENTS.md` (shipped in the wheel) — for agents
+>   working in user codebases that depend on openarmature. Capability
+>   contracts, common patterns, non-obvious shapes, example index.
+>   Generated by `scripts/build_agents_md.py` from canonical sources;
+>   committed and CI-drift-checked.
 
 ## Spec is the source of truth
 
 
@@ -8,6 +8,7 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The
 
 ### Added
 
+- **Bundled agent documentation at `openarmature/AGENTS.md`.** The wheel now ships a generated `AGENTS.md` file at the installed package root, agent-discoverable via `python -c "import openarmature; print(openarmature.__path__[0] + '/AGENTS.md')"`. Sections include a TL;DR, capability summaries pulled from the pinned spec submodule's §1 (Purpose) + §2 (Concepts), the patterns docs, hand-written non-obvious-shapes recipes, and a one-line example index. Generator lives at `scripts/build_agents_md.py`; the committed file is CI-drift-checked by `tests/test_agents_md_drift.py`. The submodule pin discipline (build refuses unless the submodule HEAD is AT a `v*` tag via `git tag --points-at HEAD`) prevents draft (untagged) spec text — or text from a commit between two release tags — from leaking into a release bundle. Adopting projects can point their own `AGENTS.md` / `CLAUDE.md` at this path so agent sessions in their codebase find it automatically.
 - **`FanOutInstanceProgress.result_is_error` field** (proposal 0027, accepted in spec v0.21.0). Explicit boolean discriminator on each per-instance entry in `CheckpointRecord.fan_out_progress` — `True` for `collect`-mode error contributions (roll forward into `errors_field`), `False` for success contributions (roll forward into `target_field`). The engine reads the explicit field on resume rather than inferring routing from `result`'s shape; the previous structural heuristic (`_looks_like_error_record`) is removed. Backward-compat path on load: pre-0027 records that omit the key default to `False`.
 - **Strict `CheckpointRecordInvalid` on fan-out count drift** (proposal 0029, accepted in spec v0.22.0). When the resumed run's resolved instance count differs from the saved `fan_out_progress` entry's `instance_count`, the engine raises `CheckpointRecordInvalid` before any fan-out instance work runs on the resumed path. Replaces the pre-0029 pad/truncate behavior which silently dropped `completed` contributions on shrink (breaking §10.11.1's exactly-once guarantee) and dispatched unsaved work on grow.
 - **`tool_choice` parameter on `Provider.complete()`** (proposal 0025, accepted in spec v0.20.0). Optional discriminated-union value constraining the model's tool-calling behavior — one of `"auto"`, `"required"`, `"none"`, or a `ForceTool(name=...)` record. Validation runs pre-send: `"required"` and `ForceTool` both demand non-empty `tools`, and `ForceTool.name` must appear in the supplied list; violations raise `ProviderInvalidRequest` (§7's existing category — no new error category). When `tool_choice` is `None` (the default) the wire field is omitted and the provider's own default applies, preserving pre-0025 behavior exactly. The `OpenAIProvider` maps the spec shape onto OpenAI's wire shape per §8.1.1 (the `ForceTool.type="tool"` renames to wire `type="function"`).
 
@@ -195,3 +195,13 @@ A few things to notice:
 - **API reference**: auto-generated from docstrings. [openarmature.ai/reference](https://openarmature.ai/reference/)
 - **Examples**: ten runnable demos with walk-throughs. [openarmature.ai/examples](https://openarmature.ai/examples/) (source at [./examples/](./examples/))
 - **Spec**: behavioral contract this implementation conforms to. [LunarCommand/openarmature-spec](https://github.com/LunarCommand/openarmature-spec)
+
+## For AI agents
+
+If you're an AI agent working in code that uses openarmature, read the bundled agent docs before editing:
+
+```bash
+python -c "import openarmature; print(openarmature.__path__[0] + '/AGENTS.md')"
+```
+
+The file ships with the package and covers capability contracts, common patterns, non-obvious shapes, and an example index. Adopting projects can point their own `AGENTS.md` / `CLAUDE.md` at this path so agent sessions in their codebase find it automatically.
@@ -0,0 +1,111 @@
+## Non-obvious shapes
+
+Recipes that aren't deducible from the API surface alone. The primitives docs tell you what's possible; this section tells you what's smart.
+
+### Declare a non-clobbering reducer on accumulator list fields
+
+State fields default to `last_write_wins` — each node's write replaces the prior value for that field. For scalar fields (`status: str`, `count: int`) that's usually what you want. For list fields that accumulate contributions across multiple nodes (`messages: list[Message]`, `events: list[Event]`, `results: list[Result]`), it's the wrong default — every node's contribution silently clobbers everything before it.
+
+Declare `append` (or another non-clobbering reducer) at the state class:
+
+```python
+from typing import Annotated
+from pydantic import Field
+from openarmature.graph import State, append
+
+class WorkflowState(State):
+    messages: Annotated[list[Message], append] = Field(default_factory=list)
+    events: Annotated[list[Event], append] = Field(default_factory=list)
+    final_status: str = "pending"   # last_write_wins is fine here
+```
+
+The failure mode without `append` is silent and easy to misdiagnose: the final state shows only the last node's contribution to the list, with no error. This is a common source of the "why is my accumulator empty?" question. `merge` is the equivalent for `dict[str, V]` fields that accumulate keys across nodes.
+
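To make the clobbering failure concrete, here is a minimal, framework-free sketch. `apply_update`, `last_write_wins`, and `append` below are hypothetical stand-ins written for this illustration; they only mimic the reducer behavior described above and are not openarmature's implementation.

```python
from typing import Any, Callable, Dict, List

Reducer = Callable[[Any, Any], Any]


def last_write_wins(old: Any, new: Any) -> Any:
    # The default described above: the new value replaces the old one.
    return new


def append(old: List[Any], new: List[Any]) -> List[Any]:
    # Non-clobbering: keep prior contributions and add the new ones.
    return [*old, *new]


def apply_update(state: Dict[str, Any], update: Dict[str, Any],
                 reducers: Dict[str, Reducer]) -> Dict[str, Any]:
    # Merge one node's partial update into a copy of the state,
    # using the field's reducer when one is declared.
    merged = dict(state)
    for key, value in update.items():
        reducer = reducers.get(key, last_write_wins)
        merged[key] = reducer(state[key], value) if key in state else value
    return merged


def run(reducers: Dict[str, Reducer]) -> Dict[str, Any]:
    # Two nodes each contribute one message.
    state: Dict[str, Any] = {"messages": []}
    for update in ({"messages": ["from node a"]}, {"messages": ["from node b"]}):
        state = apply_update(state, update, reducers)
    return state


clobbered = run({})                        # no reducer declared
accumulated = run({"messages": append})    # append declared for the list field
```

With no reducer declared, only the second node's message survives and nothing raises; with `append`, both contributions are kept.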
+### Branch on `Response.finish_reason` before reading `message.content`
+
+After `await provider.complete(messages, tools=[...])` returns, the shape of `Response` varies by `finish_reason`:
+
+- `finish_reason == "stop"` — assistant produced a content response. `message.content` carries the text; `message.tool_calls` is empty.
+- `finish_reason == "tool_calls"` — assistant emitted tool calls. `message.tool_calls` carries the list; `message.content` is typically empty (model didn't say anything beyond the tool calls).
+- `finish_reason == "length"` / `"content_filter"` / `"error"` — completion was cut off or refused; `message.content` may be partial or empty.
+
+Post-LLM logic that reads `message.content` without checking `finish_reason` misses the entire tool-calling path:
+
+```python
+response = await provider.complete(messages, tools=tools)
+
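+if response.finish_reason == "tool_calls":
+    # Keep the assistant turn that requested the tools in the history.
+    # (Assumes `response.message` can be appended to `messages` as-is;
+    # OpenAI-style APIs expect it before the tool results.)
+    messages.append(response.message)
+    # Dispatch each tool call, append ToolMessage responses, re-call complete()
+    for tc in response.message.tool_calls:
+        result = dispatch_tool(tc.name, tc.arguments)
+        messages.append(ToolMessage(content=result, tool_call_id=tc.id))
+    response = await provider.complete(messages, tools=tools)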
+elif response.finish_reason == "stop":
+    handle_text(response.message.content)
+else:
+    handle_error_or_partial(response)
+```
+
+The discriminator is one branch; missing it gives you empty data on tool-call responses and silently wrong behavior on truncations.
+
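The single-round snippet above extends naturally to a bounded loop. The following is a sketch under stated assumptions: `Message`, `ToolCall`, `Response`, and `FakeProvider` are local stand-ins defined here so the example runs on its own. They are not openarmature's types, and the real message and tool-call shapes may differ.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ToolCall:
    id: str
    name: str
    arguments: Dict[str, Any]


@dataclass
class Message:
    role: str
    content: str = ""
    tool_calls: List[ToolCall] = field(default_factory=list)
    tool_call_id: Optional[str] = None


@dataclass
class Response:
    finish_reason: str
    message: Message


class FakeProvider:
    """Scripted stand-in: one tool-call response, then a text answer."""

    def __init__(self) -> None:
        self._script = [
            Response("tool_calls", Message(
                "assistant",
                tool_calls=[ToolCall("c1", "add", {"a": 2, "b": 3})])),
            Response("stop", Message("assistant", content="2 + 3 = 5")),
        ]

    async def complete(self, messages, tools=None) -> Response:
        return self._script.pop(0)


TOOLS = {"add": lambda a, b: a + b}


async def run_turn(provider, messages: List[Message], max_rounds: int = 4) -> str:
    for _ in range(max_rounds):
        response = await provider.complete(messages, tools=list(TOOLS))
        if response.finish_reason == "tool_calls":
            # Record the assistant turn, then one tool message per call.
            messages.append(response.message)
            for tc in response.message.tool_calls:
                result = TOOLS[tc.name](**tc.arguments)
                messages.append(
                    Message("tool", content=str(result), tool_call_id=tc.id))
            continue  # ask the model again with the tool results attached
        if response.finish_reason == "stop":
            return response.message.content
        # "length", "content_filter", "error": content may be partial or empty.
        raise RuntimeError(f"incomplete response: {response.finish_reason}")
    raise RuntimeError("tool loop did not converge")


history = [Message("user", content="What is 2 + 3?")]
answer = asyncio.run(run_turn(FakeProvider(), history))
```

The `max_rounds` bound is a design choice of this sketch: it keeps a model that never stops calling tools from looping forever.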
+### `disable_llm_payload` defaults to `True` — flip it for LLM-aware observability backends
+
+The `OTelObserver` (and any spec-conformant observer reading LLM events) defaults `disable_llm_payload: bool = True` per spec §5.5's "default-off by privacy" framing. Without flipping the flag, LLM spans carry GenAI semconv attributes (token counts, model name, finish reason) but NOT the message payload (input messages, response content, request extras).
+
+That's the right default for general OpenArmature use — payloads may contain PII the user hasn't audited, and storage cost grows with prompt size. But it's the WRONG default if you're wiring up an LLM-aware observability backend (Langfuse, Phoenix, Honeycomb's LLM lens) that renders the message stream as part of its generation view. Backends will show "empty" generations and you'll wonder why.
+
+Flip the flag once at observer construction:
+
+```python
+from openarmature.observability import OTelObserver
+
+observer = OTelObserver(
+    span_processor=your_exporter,
+    disable_llm_payload=False,   # opt in to message-payload attributes
+)
+compiled.attach_observer(observer)
+```
+
+The companion `disable_genai_semconv` flag defaults to `False` — GenAI semconv attributes emit by default since they're how LLM-aware backends render anything at all. Don't flip that one unless you're routing GenAI emission through a different layer.
+
+### Use the bundled `FilesystemCheckpointer` or `SQLiteCheckpointer`, not a hand-rolled serializer
+
+The temptation when persisting graph state is to `json.dumps(state.model_dump())` and write to a file. Don't. The shipped Checkpointer backends handle every contract `openarmature.checkpoint.Checkpointer` defines — round-trip integrity, `parent_states` for inner-save resume, fan-out progress tracking, schema-version migration, listing by `correlation_id`, `CheckpointRecordInvalid` on shape drift. A hand-rolled serializer that "works" on the happy path silently fails the moment a fan-out crash leaves an in-flight save record, and you'll be debugging it for hours before realizing the bundled backend exists.
+
+If your storage requirement isn't local disk (`FilesystemCheckpointer`) or local SQLite (`SQLiteCheckpointer` — also supports `:memory:` and arbitrary file paths), implement the `Checkpointer` Protocol against your backend rather than wrapping state serialization yourself. Custom backends inherit the spec's correctness contract for free.
+
+### Subgraphs > conditional-edge spaghetti when branches don't share state
+
+A common shape is "after this LLM call, route to either a JSON-extraction node or a tool-dispatch node depending on `finish_reason`." The naive solution is two conditional edges from the LLM node, one to each downstream. That works for two branches; it scales poorly past three.
+
+When the branches operate on different sub-shapes of state — e.g., one path is "extract JSON, then validate" while another is "dispatch tools, loop until done, then summarize" — encapsulate each as a `SubgraphNode` and route from the LLM node to the right subgraph. Each subgraph has its own state schema (projected from the parent), its own entry node, and its own internal topology. The parent graph becomes a switchboard with a few edges; the complexity lives one layer down where it composes cleanly.
+
+### Be explicit with `tool_choice`; don't trust the provider's default
+
+`Provider.complete(messages, tools, tool_choice=...)` accepts `"auto"`, `"required"`, `"none"`, or a `ForceTool(name=...)` record. When you omit `tool_choice`, the wire field is omitted and the provider's own default applies; for OpenAI that is `"auto"` when `tools` is non-empty, and other providers document their own. A pipeline that wants deterministic tool-calling (a routing node that MUST produce a tool call, a guarded LLM call that MUST NOT call tools) should pin `tool_choice` explicitly rather than relying on the provider default.
+
+Pre-send validation catches the three §5 failure modes (`required` with empty tools, `ForceTool` with empty tools, `ForceTool.name` not in tools) and raises `ProviderInvalidRequest` before the HTTP call. Not all providers honor `tool_choice`, so confirm with your provider's docs; `OpenAIProvider` implements the OpenAI-compatible mapping.
+
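The three pre-send failure modes can be sketched as a small standalone validator. Everything here is an assumption made for illustration: `validate_tool_choice`, the local `ForceTool` dataclass, and `InvalidRequest` (standing in for `ProviderInvalidRequest`) are hypothetical and do not reproduce the library's internals.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass(frozen=True)
class ForceTool:
    name: str


class InvalidRequest(ValueError):
    """Local stand-in for the ProviderInvalidRequest category."""


ToolChoice = Union[str, ForceTool]


def validate_tool_choice(tool_names: Sequence[str],
                         tool_choice: Optional[ToolChoice]) -> None:
    if tool_choice is None:
        return  # omitted: the provider's own default applies
    if isinstance(tool_choice, ForceTool):
        if not tool_names:
            raise InvalidRequest("ForceTool requires a non-empty tools list")
        if tool_choice.name not in tool_names:
            raise InvalidRequest(f"unknown tool {tool_choice.name!r}")
        return
    if tool_choice not in ("auto", "required", "none"):
        raise InvalidRequest(f"unsupported tool_choice {tool_choice!r}")
    if tool_choice == "required" and not tool_names:
        raise InvalidRequest("'required' needs a non-empty tools list")


def rejected(tool_names: Sequence[str], tool_choice: Optional[ToolChoice]) -> bool:
    # Convenience wrapper for the checks below: True if validation fails.
    try:
        validate_tool_choice(tool_names, tool_choice)
    except InvalidRequest:
        return True
    return False


checks = {
    "required_without_tools": rejected([], "required"),
    "force_without_tools": rejected([], ForceTool("search")),
    "force_unknown_name": rejected(["lookup"], ForceTool("search")),
    "force_known_name": rejected(["search"], ForceTool("search")),
    "omitted": rejected([], None),
}
```

Running the validation before any network call means a misconfigured routing node fails fast and locally rather than surfacing as a provider-side error.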
+### Always `await graph.drain()` in short-lived processes; supply a `timeout` if observers might hang
+
+`CompiledGraph.invoke()` returns when the graph reaches END or raises; observer events are dispatched onto a per-invocation queue and delivered by a background worker. The graph's execution loop never awaits observer processing. In a long-running service this is invisible — the worker drains naturally. In a CLI, script, or serverless function, the process exits before the worker finishes, and any late observer events (typically the last node's `completed` event plus any `checkpoint_saved` events) get dropped.
+
+Always call `await graph.drain()` before the short-lived process exits. If your observer set includes anything that might hang (a metrics observer with a flaky network endpoint, an OTel exporter behind a slow OTLP collector), supply a `timeout`:
+
+```python
+summary = await graph.drain(timeout=5.0)
+if summary.timeout_reached:
+    log.warning("drain truncated: %d events undelivered", summary.undelivered_count)
+```
+
+The compiled graph stays usable for subsequent invocations after a timed-out drain — workers are cancelled cleanly, no partial state leaks.
+
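Why late events get dropped is easier to see in a toy model. The sketch below uses only `asyncio` from the standard library; the queue, the worker, and the event names are illustrative assumptions and are not how openarmature implements delivery or `drain()`.

```python
import asyncio
from typing import List, Tuple


async def main() -> Tuple[List[str], List[str], bool]:
    queue: asyncio.Queue = asyncio.Queue()
    delivered: List[str] = []

    async def worker() -> None:
        # Background delivery: the producer never awaits this.
        while True:
            event = await queue.get()
            await asyncio.sleep(0.01)  # simulate a slow observer
            delivered.append(event)
            queue.task_done()

    task = asyncio.create_task(worker())

    # The "invocation": events are enqueued, then control returns at once.
    for name in ("started", "completed", "checkpoint_saved"):
        queue.put_nowait(name)
    delivered_at_return = list(delivered)  # snapshot: nothing delivered yet

    # The "drain": wait for the queue to empty, but give up after a timeout.
    try:
        await asyncio.wait_for(queue.join(), timeout=5.0)
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True

    # Shut the worker down cleanly before the loop closes.
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return delivered_at_return, delivered, timed_out


before, after, timed_out = asyncio.run(main())
```

If the process exited at the snapshot point, all three events would be lost; waiting on the queue with a bounded timeout is what lets them land without risking an indefinite hang.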
+### Three exception hierarchies; know which one your code catches
+
+`openarmature` exceptions split across three sibling hierarchies:
+
+- `RuntimeGraphError` (in `openarmature.graph`) — node execution failures: `NodeException`, `RoutingError`, `EdgeException`, `ReducerError`, `StateValidationError`. Each has a `category` string matching the spec's canonical error categories.
+- `CheckpointError` (in `openarmature.checkpoint`) — persistence failures: `CheckpointNotFound`, `CheckpointSaveFailed`, `CheckpointRecordInvalid`, `CheckpointStateMigrationMissing`, `CheckpointStateMigrationFailed`, `CheckpointStateMigrationChainAmbiguous`.
+- `LlmProviderError` (in `openarmature.llm`) — provider call failures: `ProviderAuthentication`, `ProviderInvalidRequest`, `ProviderInvalidResponse`, `ProviderInvalidModel`, `ProviderModelNotLoaded`, `ProviderRateLimit`, `ProviderUnavailable`, `ProviderUnsupportedContentBlock`, `StructuredOutputInvalid`.
+
+Catching `Exception` works but is too broad; catching one hierarchy misses the other two. If you want to branch on category strings (e.g., for retry logic), catch the relevant base — `RuntimeGraphError` covers all five spec runtime categories, `LlmProviderError` covers all nine provider categories, `CheckpointError` covers all six checkpoint categories. The `TRANSIENT_CATEGORIES` frozenset in `openarmature.llm` enumerates which provider categories are retriable.
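
Branching on category strings for retry logic might look like the sketch below. The classes are local stand-ins that reuse the names listed above so the example runs without the package; the `category` attribute on provider errors, the category string values, and the contents of the local `TRANSIENT_CATEGORIES` set are all assumptions made for illustration.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class LlmProviderError(Exception):
    category = "provider_error"  # placeholder value


class ProviderRateLimit(LlmProviderError):
    category = "provider_rate_limit"  # placeholder value


class ProviderAuthentication(LlmProviderError):
    category = "provider_authentication"  # placeholder value


# Local stand-in; the real frozenset is described as living in openarmature.llm.
TRANSIENT_CATEGORIES = frozenset({"provider_rate_limit"})


def call_with_retry(fn: Callable[[], T], attempts: int = 3) -> T:
    # Catch the hierarchy base, then branch on the category string.
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except LlmProviderError as exc:
            if exc.category not in TRANSIENT_CATEGORIES or attempt == attempts:
                raise
    raise RuntimeError("attempts must be at least 1")


calls: Dict[str, int] = {"flaky": 0, "auth": 0}


def flaky() -> str:
    calls["flaky"] += 1
    if calls["flaky"] < 3:
        raise ProviderRateLimit("slow down")
    return "ok"


def bad_auth() -> str:
    calls["auth"] += 1
    raise ProviderAuthentication("bad key")


result = call_with_retry(flaky)
try:
    call_with_retry(bad_auth)
    auth_raised = False
except ProviderAuthentication:
    auth_raised = True
```

Catching only `LlmProviderError` here is deliberate: graph and checkpoint failures propagate untouched, so a retry wrapper around a provider call cannot mask a persistence bug.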
@@ -0,0 +1,3 @@
+OpenArmature is a workflow framework for LLM pipelines and tool-calling agents — typed state, compile-time topology checks, observability, and crash-safe checkpoints baked into a graph engine. The graph layer has no concept of LLMs or tools; the same primitives drive deterministic ETL pipelines and tool-calling agents alike. Nodes return partial updates; the engine merges into a frozen state snapshot. Behavior is defined by [openarmature-spec](https://openarmature.org/capabilities/) and verified by conformance fixtures; this package is the reference Python implementation.
+
+**What OpenArmature is NOT:** not a chat framework (no built-in messages channel), not an LLM SDK (Provider is the abstraction layer; OpenAIProvider is the canonical impl), not a state-management library (state is per-invocation, not application-wide), not an evaluation framework (deferred to `openarmature-eval`).