Commit 6b5cc13

Bundle agent-discoverable AGENTS.md in the wheel (agent-docs A2) (#72)
* Add bundled AGENTS.md generator + drift test

The wheel now ships a generated AGENTS.md at the installed package root (openarmature/AGENTS.md) for AI agents working in code that uses openarmature. Sections, in order: version-stamped self-reference header, TL;DR, capability summaries (spec §1+§2 of graph-engine / pipeline-utilities / llm-provider / observability / prompt-management), patterns from docs/patterns/*.md, hand-written non-obvious-shapes recipes, example index (one-liners + paths inside the source tree), discovery footer.

Generator at scripts/build_agents_md.py reads from the pinned spec submodule via `git show <sha>:spec/...` (not the working tree) and refuses to regenerate unless the submodule HEAD is reachable from a v* tag; this closes the "release ships a bundle pinned to draft spec text" failure mode. Spec text and patterns are pulled verbatim; the non-obvious-shapes file (docs/agent/non-obvious-shapes.md) and the TL;DR (docs/agent/tldr.md) are hand-curated by python.

tests/test_agents_md_drift.py regenerates in-memory and diffs against the committed src/openarmature/AGENTS.md, failing the suite when the bundle is stale relative to its sources. Run on every PR via the standard pytest invocation.

Bundle is 908 lines / 46KB at v0.22.1: dense enough to be authoritative without ballooning past useful agent context budgets.

* Add agent-docs discovery pointers + CHANGELOG entry

src/openarmature/__init__.py module docstring first line points at the bundled AGENTS.md so the file shows in standard IDE hover (Pylance / Pyright on `import openarmature`). README gets a "For AI agents" section with the discovery one-liner adopters can point their own AGENTS.md / CLAUDE.md at.

Repo-root AGENTS.md gains a disambiguating note distinguishing the two AGENTS.md files in the project: this one orients agents working ON openarmature; the bundled src/openarmature/AGENTS.md orients agents in user codebases that USE openarmature.

CHANGELOG Unreleased section gains an Added entry covering the bundle + discovery surface + drift check + the submodule-pin discipline that prevents draft spec text from leaking into a release bundle.

* CI: fetch submodule tags before pytest

actions/checkout's submodule clone is shallow and doesn't carry tag refs. scripts/build_agents_md.py asserts the spec submodule HEAD is reachable from a v* tag (refuses to bundle draft spec text); tests/test_agents_md_drift.py runs that assertion and was failing on CI because the tags weren't fetched. git fetch --tags pulls just the tag refs into the existing shallow clone; no extra history is needed since the HEAD commit is already present from the submodule checkout.

* Address PR #72 review: generator hardening + bundle hygiene

CodeQL / github-code-quality flagged implicit string concatenation in three list literals in scripts/build_agents_md.py (_capability_summaries, _patterns, _example_index). Switched to explicit + concatenation across all three sites: same intent (multi-line string assembly for readability), no possibly-missing-comma ambiguity.

Copilot flagged four bundle-quality issues:

1. _assert_pin_at_tag used reverse lexicographic sort, which gets multi-digit semver wrong (v0.10.0 < v0.9.0 lexicographically). Switched to git tag --sort=-version:refname for native version-aware ordering. No new Python deps.
2. Extracted spec sections retained ## 1. Purpose headings that were higher-level than the wrapping ### Capability: header. _extract_sections_1_2 now demotes ATX headings by two levels so the bundled markdown maintains a clean hierarchy.
3. Patterns were inlined verbatim with their original # headings (multiple H1s under the bundle) and relative ../concepts/...md / ../examples/...md links (broken in the installed wheel). New _transform_pattern_content demotes ATX headings by two levels and rewrites relative doc-tree links to absolute openarmature.ai/<section>/<name>/ URLs via _PATTERN_LINK_RE.
4. Bundle header / TLDR / discovery footer used openarmature.ai/capabilities/ for spec links, which isn't a real URL (.ai serves python docs; .org serves spec). Normalized those three sites to openarmature.org/capabilities/. The pattern files' existing .org URLs stay correct.

* Fix drift trailing newline + absorb 3 non-obvious-shapes entries

CI drift test failed because the generator emitted a trailing `\n\n` (``_discovery_footer`` ends with `\n` + the final `+ "\n"`), but the committed file got normalized to a single trailing `\n` by local pre-commit / editor settings. The generator now does ``"\n\n".join(sections).rstrip() + "\n"`` so output ends with exactly one final newline regardless of section-internal trailing whitespace.

Absorbed spec's three strong-candidate non-obvious-shapes entries from the discuss-agent-discoverable-docs 07 review (option a: fold into PR #72 before merge rather than a follow-on):

1. Declare a non-clobbering reducer on accumulator list fields: highest "silent foot-gun" risk per spec ranking. Default ``last_write_wins`` silently clobbers prior contributions when multiple nodes append to a list. The fix is one ``Annotated[..., append]`` declaration on the state field; the diagnostic is non-obvious because the API doesn't complain.
2. Branch on ``Response.finish_reason`` before reading ``message.content``: broadly applicable to any tool-calling pipeline. `tool_calls` finish reason carries the calls in ``message.tool_calls`` and leaves ``message.content`` empty; post-LLM logic reading ``content`` first misses the entire tool-calling path.
3. ``disable_llm_payload`` defaults to ``True``: LLM-aware observability backends (Langfuse, Phoenix, Honeycomb LLM lens) need the flag flipped at observer construction. Per spec §5.5's "default-off by privacy" framing; the right default for general OA use, wrong default if you're wiring LLM-specific telemetry.

Bundle grew from 908 to 973 lines, 46KB to 50KB. Still well within the 1500-2500-line target. Three new test_docs_examples illustrative snippets (the code samples in each entry) skip per the existing concept-page convention.

* Address PR #72 round 2: wording precision + pattern intra-links

Three "reachable from" → "AT" wording fixes (build_agents_md.py module docstring, .github/workflows/ci.yml comment, CHANGELOG.md entry). The strict check (git tag --points-at HEAD) is the load-bearing invariant: a wheel built from a commit between two release tags would silently ship as if it were the prior tag. Wording now matches implementation; the check is unchanged.

Module docstring's section-4 description previously said patterns were inlined "verbatim", which became stale when _transform_pattern_content landed earlier in this PR. Updated to describe both transforms (heading demotion + relative-link rewrite).

_transform_pattern_content gains a second regex, _PATTERN_INTRA_LINK_RE, that rewrites pattern-to-pattern bare-name .md references to in-document anchors (e.g., `(bypass-if-output-exists.md)` → `(#bypass-if-output-exists)`). The demoted H3 heading slug matches the filename slug, so the anchor resolves cleanly in the bundled single-file context. Closes the broken pattern cross-link in tool-dispatch-as-node's "Cross-references" section.
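
The markdown transforms and the trailing-newline fix described in this commit message can be sketched roughly as below. This is an illustration only, not the code in scripts/build_agents_md.py: the function names, the three regexes, and the `https://openarmature.ai/<section>/<name>/` URL shape are assumptions reconstructed from the message text.

```python
import re

# Hypothetical patterns: the real _PATTERN_LINK_RE and
# _PATTERN_INTRA_LINK_RE in scripts/build_agents_md.py may differ.
ATX_HEADING_RE = re.compile(r"^(#{1,6})(?=\s)")
RELATIVE_LINK_RE = re.compile(r"\(\.\./(concepts|examples)/([\w-]+)\.md\)")
INTRA_LINK_RE = re.compile(r"\(([\w-]+)\.md\)")
FENCE = "`" * 3  # markdown code-fence marker


def demote_headings(markdown: str, levels: int = 2) -> str:
    """Demote ATX headings by `levels`, leaving fenced code untouched."""
    out = []
    in_fence = False
    for line in markdown.split("\n"):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        elif not in_fence:
            match = ATX_HEADING_RE.match(line)
            if match:
                depth = min(len(match.group(1)) + levels, 6)  # markdown stops at H6
                line = "#" * depth + line[match.end(1):]
        out.append(line)
    return "\n".join(out)


def rewrite_links(markdown: str) -> str:
    """Make doc-tree links survive being inlined into one bundled file."""
    # ../concepts/foo.md -> absolute docs URL (relative paths break in a wheel).
    markdown = RELATIVE_LINK_RE.sub(r"(https://openarmature.ai/\1/\2/)", markdown)
    # sibling-pattern.md -> in-document anchor matching the demoted heading slug.
    return INTRA_LINK_RE.sub(r"(#\1)", markdown)


def assemble(sections: list[str]) -> str:
    """Join sections so the output ends with exactly one newline."""
    return "\n\n".join(sections).rstrip() + "\n"
```

Tracking the fence state matters here: a `# comment` line inside a Python sample would otherwise be rewritten as if it were a heading.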
1 parent d7be34c commit 6b5cc13

10 files changed

Lines changed: 1558 additions & 4 deletions

File tree

.github/workflows/ci.yml

Lines changed: 12 additions & 0 deletions
@@ -26,6 +26,18 @@ jobs:
2626
# Conformance fixtures live in the openarmature-spec submodule.
2727
submodules: recursive
2828

29+
- name: Fetch submodule tags
30+
# actions/checkout's submodule clone is shallow and doesn't
31+
# carry tags. ``scripts/build_agents_md.py`` asserts the
32+
# submodule HEAD is AT a ``v*`` tag (``git tag --points-at
33+
# HEAD``; refuses to bundle draft spec text or text from
34+
# a commit between two release tags);
35+
# ``tests/test_agents_md_drift.py`` runs that assertion.
36+
# Fetch tag refs only — the HEAD commit is already present
37+
# from the submodule checkout, and we don't need history
38+
# beyond what tags point at.
39+
run: git -C openarmature-spec fetch --tags
40+
2941
- name: Install uv
3042
uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0
3143
with:
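
The pin check that this CI step supports could look something like the sketch below. It is not the actual `_assert_pin_at_tag`: the function names are hypothetical, and it assumes release tags have the plain `vMAJOR.MINOR.PATCH` shape. It also shows why a numeric sort key is needed once a version component reaches two digits.

```python
import re
import subprocess

# Assumption: release tags look like v0.22.1 with no pre-release suffix.
VERSION_TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def version_key(tag: str) -> tuple[int, int, int]:
    """Numeric sort key, so v0.10.0 orders after v0.9.0."""
    match = VERSION_TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a release tag: {tag!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch))


def release_tag_at_head(points_at_output: str) -> str:
    """Pick the release tag from `git tag --points-at HEAD` output."""
    tags = [t for t in points_at_output.split() if VERSION_TAG_RE.match(t)]
    if not tags:
        raise RuntimeError("submodule HEAD is not at a v* tag; refusing to bundle")
    return max(tags, key=version_key)


def tags_pointing_at_head(repo_dir: str) -> str:
    """Run git in `repo_dir` and return the raw tag listing for HEAD."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "tag", "--points-at", "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

In a shallow submodule clone without fetched tags, `tags_pointing_at_head` returns an empty string and the check fails, which is the CI failure the step above addresses.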

AGENTS.md

Lines changed: 15 additions & 3 deletions
@@ -1,8 +1,20 @@
11
# AGENTS.md
22

3-
Orientation for coding agents working in this repo. `README.md` covers what
4-
the project is and how to use it; this file covers things that aren't
5-
obvious from reading the code.
3+
Orientation for coding agents working in **this repo** — i.e., agents
4+
contributing to openarmature itself. `README.md` covers what the project
5+
is and how to use it; this file covers things that aren't obvious from
6+
reading the code.
7+
8+
> **Two AGENTS.md files in this project. Different audiences.**
9+
>
10+
> - This file (`./AGENTS.md`, at the repo root) — for agents working on
11+
> the openarmature codebase. Package layout, test layout, tooling,
12+
> spec-submodule discipline, commit conventions.
13+
> - `src/openarmature/AGENTS.md` (shipped in the wheel) — for agents
14+
> working in user codebases that depend on openarmature. Capability
15+
> contracts, common patterns, non-obvious shapes, example index.
16+
> Generated by `scripts/build_agents_md.py` from canonical sources;
17+
> committed and CI-drift-checked.
618
719
## Spec is the source of truth
820

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The
88

99
### Added
1010

11+
- **Bundled agent documentation at `openarmature/AGENTS.md`.** The wheel now ships a generated `AGENTS.md` file at the installed package root, agent-discoverable via `python -c "import openarmature; print(openarmature.__path__[0] + '/AGENTS.md')"`. Sections include a TL;DR, capability summaries pulled from the pinned spec submodule's §1 (Purpose) + §2 (Concepts), the patterns docs, hand-written non-obvious-shapes recipes, and a one-line example index. Generator lives at `scripts/build_agents_md.py`; the committed file is CI-drift-checked by `tests/test_agents_md_drift.py`. The submodule pin discipline (build refuses unless the submodule HEAD is AT a `v*` tag via `git tag --points-at HEAD`) prevents draft (untagged) spec text — or text from a commit between two release tags — from leaking into a release bundle. Adopting projects can point their own `AGENTS.md` / `CLAUDE.md` at this path so agent sessions in their codebase find it automatically.
1112
- **`FanOutInstanceProgress.result_is_error` field** (proposal 0027, accepted in spec v0.21.0). Explicit boolean discriminator on each per-instance entry in `CheckpointRecord.fan_out_progress`: `True` for `collect`-mode error contributions (roll forward into `errors_field`), `False` for success contributions (roll forward into `target_field`). The engine reads the explicit field on resume rather than inferring routing from `result`'s shape; the previous structural heuristic (`_looks_like_error_record`) is removed. Backward-compat path on load: pre-0027 records that omit the key default to `False`.
1213
- **Strict `CheckpointRecordInvalid` on fan-out count drift** (proposal 0029, accepted in spec v0.22.0). When the resumed run's resolved instance count differs from the saved `fan_out_progress` entry's `instance_count`, the engine raises `CheckpointRecordInvalid` before any fan-out instance work runs on the resumed path. Replaces the pre-0029 pad/truncate behavior which silently dropped `completed` contributions on shrink (breaking §10.11.1's exactly-once guarantee) and dispatched unsaved work on grow.
1314
- **`tool_choice` parameter on `Provider.complete()`** (proposal 0025, accepted in spec v0.20.0). Optional discriminated-union value constraining the model's tool-calling behavior — one of `"auto"`, `"required"`, `"none"`, or a `ForceTool(name=...)` record. Validation runs pre-send: `"required"` and `ForceTool` both demand non-empty `tools`, and `ForceTool.name` must appear in the supplied list; violations raise `ProviderInvalidRequest` (§7's existing category — no new error category). When `tool_choice` is `None` (the default) the wire field is omitted and the provider's own default applies, preserving pre-0025 behavior exactly. The `OpenAIProvider` maps the spec shape onto OpenAI's wire shape per §8.1.1 (the `ForceTool.type="tool"` renames to wire `type="function"`).
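
The drift check named in the bundled-documentation entry above amounts to "regenerate, then diff against the committed file". A minimal sketch of that comparison follows; it is not the contents of `tests/test_agents_md_drift.py`, and `bundle_drift` is a hypothetical helper name.

```python
import difflib


def bundle_drift(regenerated: str, committed: str) -> list[str]:
    """Return unified-diff lines; an empty list means the bundle is current."""
    return list(
        difflib.unified_diff(
            committed.splitlines(keepends=True),
            regenerated.splitlines(keepends=True),
            fromfile="committed AGENTS.md",
            tofile="regenerated AGENTS.md",
        )
    )
```

A test built on this would assert that the list is empty and print the diff on failure, so a stale bundle shows exactly which source changed. Even a single extra trailing newline counts as drift.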

README.md

Lines changed: 10 additions & 0 deletions
@@ -195,3 +195,13 @@ A few things to notice:
195195
- **API reference**: auto-generated from docstrings. [openarmature.ai/reference](https://openarmature.ai/reference/)
196196
- **Examples**: ten runnable demos with walk-throughs. [openarmature.ai/examples](https://openarmature.ai/examples/) (source at [./examples/](./examples/))
197197
- **Spec**: behavioral contract this implementation conforms to. [LunarCommand/openarmature-spec](https://github.com/LunarCommand/openarmature-spec)
198+
199+
## For AI agents
200+
201+
If you're an AI agent working in code that uses openarmature, read the bundled agent docs before editing:
202+
```bash
python -c "import openarmature; print(openarmature.__path__[0] + '/AGENTS.md')"
```
206+
207+
The file ships with the package and covers capability contracts, common patterns, non-obvious shapes, and an example index. Adopting projects can point their own `AGENTS.md` / `CLAUDE.md` at this path so agent sessions in their codebase find it automatically.
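
For tooling that wants the same path without importing the package, a sketch using only the standard library is shown below. The helper name `bundled_agents_md` is hypothetical and not part of openarmature.

```python
import importlib.util
from pathlib import Path


def bundled_agents_md(package: str = "openarmature") -> Path:
    """Return where AGENTS.md would sit inside an installed package."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError(f"{package!r} is not an installed package")
    # First search location is the package directory on disk.
    return Path(next(iter(spec.submodule_search_locations))) / "AGENTS.md"
```

The function only computes the path; callers should still check `path.is_file()`, since older releases of a package would not ship the file.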

docs/agent/non-obvious-shapes.md

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
1+
## Non-obvious shapes
2+
3+
Recipes that aren't deducible from the API surface alone. The primitives docs tell you what's possible; this section tells you what's smart.
4+
5+
### Declare a non-clobbering reducer on accumulator list fields
6+
7+
State fields default to `last_write_wins` — each node's write replaces the prior value for that field. For scalar fields (`status: str`, `count: int`) that's usually what you want. For list fields that accumulate contributions across multiple nodes (`messages: list[Message]`, `events: list[Event]`, `results: list[Result]`), it's the wrong default — every node's contribution silently clobbers everything before it.
8+
9+
Declare `append` (or another non-clobbering reducer) at the state class:
10+
```python
from typing import Annotated
from pydantic import Field
from openarmature.graph import State, append

# Message and Event are application-defined models.
class WorkflowState(State):
    messages: Annotated[list[Message], append] = Field(default_factory=list)
    events: Annotated[list[Event], append] = Field(default_factory=list)
    final_status: str = "pending"  # last_write_wins is fine here
```
21+
22+
The failure mode without `append` is silent and easy to misdiagnose: the final state shows only the last node's contribution to the list, with no error. It is a common source of "why is my accumulator empty?" questions. `merge` is the equivalent for `dict[str, V]` fields that accumulate keys across nodes.
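
To see the clobbering in isolation, here is a toy merge loop in plain Python. It does not use openarmature and is not how the engine is implemented; `last_write_wins` and `append` are stand-ins that only share their names with the real reducers, and `merge_update` / `apply_updates` are hypothetical helpers.

```python
from typing import Any, Callable

Reducer = Callable[[Any, Any], Any]


def last_write_wins(old: Any, new: Any) -> Any:
    """Stand-in default reducer: the newest write replaces the old value."""
    return new


def append(old: list, new: list) -> list:
    """Stand-in accumulating reducer: keep prior items, add the new ones."""
    return [*old, *new]


def merge_update(state: dict, update: dict, reducers: dict[str, Reducer]) -> dict:
    """Merge one node's partial update into a new state dict."""
    merged = dict(state)
    for key, value in update.items():
        reducer = reducers.get(key, last_write_wins)
        merged[key] = reducer(state[key], value)
    return merged


def apply_updates(state: dict, updates: list[dict], reducers: dict[str, Reducer]) -> dict:
    """Apply each node's update in order, as a linear graph would."""
    for update in updates:
        state = merge_update(state, update, reducers)
    return state
```

With two updates that each write `messages`, an empty `reducers` mapping leaves only the second node's list, while `{"messages": append}` keeps both. Neither run raises, which is why the problem is easy to miss.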
23+
24+
### Branch on `Response.finish_reason` before reading `message.content`
25+
26+
After `await provider.complete(messages, tools=[...])` returns, the shape of `Response` varies by `finish_reason`:
27+
28+
- `finish_reason == "stop"` — assistant produced a content response. `message.content` carries the text; `message.tool_calls` is empty.
29+
- `finish_reason == "tool_calls"` — assistant emitted tool calls. `message.tool_calls` carries the list; `message.content` is typically empty (model didn't say anything beyond the tool calls).
30+
- `finish_reason == "length"` / `"content_filter"` / `"error"` — completion was cut off or refused; `message.content` may be partial or empty.
31+
32+
Post-LLM logic that reads `message.content` without checking `finish_reason` misses the entire tool-calling path:
33+
```python
response = await provider.complete(messages, tools=tools)

if response.finish_reason == "tool_calls":
    # Dispatch each tool call, append ToolMessage responses, re-call complete()
    for tc in response.message.tool_calls:
        result = dispatch_tool(tc.name, tc.arguments)
        messages.append(ToolMessage(content=result, tool_call_id=tc.id))
    response = await provider.complete(messages, tools=tools)
elif response.finish_reason == "stop":
    handle_text(response.message.content)
else:
    handle_error_or_partial(response)
```
48+
49+
The discriminator is one branch; missing it gives you empty data on tool-call responses and silently wrong behavior on truncations.
50+
51+
### `disable_llm_payload` defaults to `True` — flip it for LLM-aware observability backends
52+
53+
The `OTelObserver` (and any spec-conformant observer reading LLM events) defaults `disable_llm_payload: bool = True` per spec §5.5's "default-off by privacy" framing. Without flipping the flag, LLM spans carry GenAI semconv attributes (token counts, model name, finish reason) but NOT the message payload (input messages, response content, request extras).
54+
55+
That's the right default for general OpenArmature use — payloads may contain PII the user hasn't audited, and storage cost grows with prompt size. But it's the WRONG default if you're wiring up an LLM-aware observability backend (Langfuse, Phoenix, Honeycomb's LLM lens) that renders the message stream as part of its generation view. Backends will show "empty" generations and you'll wonder why.
56+
57+
Flip the flag once at observer construction:
58+
```python
from openarmature.observability import OTelObserver

observer = OTelObserver(
    span_processor=your_exporter,
    disable_llm_payload=False,  # opt in to message-payload attributes
)
compiled.attach_observer(observer)
```
68+
69+
The companion `disable_genai_semconv` flag defaults to `False` — GenAI semconv attributes emit by default since they're how LLM-aware backends render anything at all. Don't flip that one unless you're routing GenAI emission through a different layer.
70+
71+
### Use the bundled `FilesystemCheckpointer` or `SQLiteCheckpointer`, not a hand-rolled serializer
72+
73+
The temptation when persisting graph state is to `json.dumps(state.model_dump())` and write to a file. Don't. The shipped Checkpointer backends handle every contract `openarmature.checkpoint.Checkpointer` defines — round-trip integrity, `parent_states` for inner-save resume, fan-out progress tracking, schema-version migration, listing by `correlation_id`, `CheckpointRecordInvalid` on shape drift. A hand-rolled serializer that "works" on the happy path silently fails the moment a fan-out crash leaves an in-flight save record, and you'll be debugging it for hours before realizing the bundled backend exists.
74+
75+
If your storage requirement isn't local disk (`FilesystemCheckpointer`) or local SQLite (`SQLiteCheckpointer` — also supports `:memory:` and arbitrary file paths), implement the `Checkpointer` Protocol against your backend rather than wrapping state serialization yourself. Custom backends inherit the spec's correctness contract for free.
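
One concrete way the hand-rolled approach loses round-trip integrity, independent of openarmature: plain `json` changes or rejects several ordinary Python values. The sketch below uses bare dicts rather than a real state object, and `naive_round_trip` is a hypothetical name for the tempting shortcut.

```python
import json
from datetime import datetime, timezone


def naive_round_trip(snapshot: dict) -> dict:
    """The tempting approach: dump to JSON text and load it back."""
    return json.loads(json.dumps(snapshot))


snapshot = {"attempts": {1: "ok", 2: "retry"}, "window": (0, 10)}
restored = naive_round_trip(snapshot)
# Integer keys come back as strings and the tuple comes back as a list,
# so the restored value no longer equals what was saved.

try:
    naive_round_trip({"started_at": datetime.now(timezone.utc)})
    datetime_survived = True
except TypeError:
    # json.dumps has no default encoding for datetime objects.
    datetime_survived = False
```

These are only the type-level problems; the resume-related contracts listed above (fan-out progress, `parent_states`, migrations) are a separate layer that a plain dump does not address at all.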
76+
77+
### Subgraphs > conditional-edge spaghetti when branches don't share state
78+
79+
A common shape is "after this LLM call, route to either a JSON-extraction node or a tool-dispatch node depending on `finish_reason`." The naive solution is two conditional edges from the LLM node, one to each downstream. That works for two branches; it scales poorly past three.
80+
81+
When the branches operate on different sub-shapes of state — e.g., one path is "extract JSON, then validate" while another is "dispatch tools, loop until done, then summarize" — encapsulate each as a `SubgraphNode` and route from the LLM node to the right subgraph. Each subgraph has its own state schema (projected from the parent), its own entry node, and its own internal topology. The parent graph becomes a switchboard with a few edges; the complexity lives one layer down where it composes cleanly.
82+
83+
### Be explicit with `tool_choice`; don't trust the provider's default
84+
85+
`Provider.complete(messages, tools, tool_choice=...)` accepts `"auto"`, `"required"`, `"none"`, or a `ForceTool(name=...)` record. When you omit `tool_choice`, the wire field is omitted and the provider's own default applies; for OpenAI that is usually `"auto"` when `tools` is non-empty, but the default is documented per provider. A pipeline that wants deterministic tool-calling (a routing node that MUST produce a tool call, a guarded LLM call that MUST NOT call tools) should pin `tool_choice` explicitly rather than relying on the provider default.
86+
87+
Pre-send validation catches the three §5 failure modes (`required` with empty tools, `ForceTool` with empty tools, `ForceTool.name` not in tools) and raises `ProviderInvalidRequest` before the HTTP call. Not all providers honor `tool_choice` — confirm with your provider's docs — but the OpenAI-compatible mapping is in `OpenAIProvider`.
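
The three pre-send failure modes can be expressed as a small standalone check. This is a sketch of the rule as described here, not openarmature's validator: `ForceTool` and `InvalidRequest` below are local stand-ins for the real `ForceTool` and `ProviderInvalidRequest`, and `validate_tool_choice` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Literal, Sequence, Union


@dataclass(frozen=True)
class ForceTool:
    """Stand-in for the real ForceTool record."""
    name: str


class InvalidRequest(ValueError):
    """Stand-in for ProviderInvalidRequest."""


ToolChoice = Union[Literal["auto", "required", "none"], ForceTool, None]


def validate_tool_choice(tool_choice: ToolChoice, tool_names: Sequence[str]) -> None:
    """Raise before any network call if tool_choice cannot be satisfied."""
    if tool_choice is None or tool_choice in ("auto", "none"):
        return  # nothing to enforce
    if tool_choice == "required":
        if not tool_names:
            raise InvalidRequest('tool_choice="required" needs a non-empty tools list')
        return
    if isinstance(tool_choice, ForceTool):
        if not tool_names:
            raise InvalidRequest("ForceTool needs a non-empty tools list")
        if tool_choice.name not in tool_names:
            raise InvalidRequest(f"ForceTool.name {tool_choice.name!r} is not in tools")
        return
    raise InvalidRequest(f"unsupported tool_choice: {tool_choice!r}")
```

Failing before the HTTP call means a misconfigured routing node surfaces as a local error instead of a provider-side rejection.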
88+
89+
### Always `await graph.drain()` in short-lived processes; supply a `timeout` if observers might hang
90+
91+
`CompiledGraph.invoke()` returns when the graph reaches END or raises; observer events are dispatched onto a per-invocation queue and delivered by a background worker. The graph's execution loop never awaits observer processing. In a long-running service this is invisible — the worker drains naturally. In a CLI, script, or serverless function, the process exits before the worker finishes, and any late observer events (typically the last node's `completed` event plus any `checkpoint_saved` events) get dropped.
92+
93+
Always call `await graph.drain()` before the short-lived process exits. If your observer set includes anything that might hang (a metrics observer with a flaky network endpoint, an OTel exporter behind a slow OTLP collector), supply a `timeout`:
94+
```python
summary = await graph.drain(timeout=5.0)
if summary.timeout_reached:
    log.warning("drain truncated: %d events undelivered", summary.undelivered_count)
```
100+
101+
The compiled graph stays usable for subsequent invocations after a timed-out drain — workers are cancelled cleanly, no partial state leaks.
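
The queue-plus-worker arrangement and a timed drain can be modelled in a few lines of plain `asyncio`. This is a toy model, not openarmature's implementation: `EventPump` is a hypothetical class, and `DrainSummary` only mirrors the two field names used in the snippet above.

```python
import asyncio
import contextlib
from dataclasses import dataclass


@dataclass
class DrainSummary:
    """Stand-in mirroring the fields used above."""
    timeout_reached: bool
    undelivered_count: int


class EventPump:
    """Queue events for delivery by a background worker task."""

    def __init__(self, deliver):
        self._deliver = deliver  # async callable taking one event
        self._queue = asyncio.Queue()
        self._pending = 0
        self._worker = None

    def emit(self, event) -> None:
        # Never awaits delivery: the producer keeps running.
        if self._worker is None:
            self._worker = asyncio.get_running_loop().create_task(self._run())
        self._pending += 1
        self._queue.put_nowait(event)

    async def _run(self) -> None:
        while True:
            event = await self._queue.get()
            await self._deliver(event)
            self._pending -= 1
            self._queue.task_done()

    async def drain(self, timeout=None) -> DrainSummary:
        timed_out = False
        try:
            await asyncio.wait_for(self._queue.join(), timeout)
        except asyncio.TimeoutError:
            timed_out = True
        undelivered = self._pending
        if self._worker is not None:
            self._worker.cancel()
            with contextlib.suppress(asyncio.CancelledError):
                await self._worker
            self._worker = None
        # Start from a clean queue so the pump stays usable.
        self._queue = asyncio.Queue()
        self._pending = 0
        return DrainSummary(timed_out, undelivered)
```

If the program returned right after `emit` without calling `drain`, the event loop would shut down with events still queued, which is the dropped-event case described above. A production version would also isolate exceptions raised by `deliver`; this sketch omits that.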
102+
103+
### Three exception hierarchies; know which one your code catches
104+
105+
`openarmature` exceptions split across three sibling hierarchies:
106+
107+
- `RuntimeGraphError` (in `openarmature.graph`) — node execution failures: `NodeException`, `RoutingError`, `EdgeException`, `ReducerError`, `StateValidationError`. Each has a `category` string matching the spec's canonical error categories.
108+
- `CheckpointError` (in `openarmature.checkpoint`) — persistence failures: `CheckpointNotFound`, `CheckpointSaveFailed`, `CheckpointRecordInvalid`, `CheckpointStateMigrationMissing`, `CheckpointStateMigrationFailed`, `CheckpointStateMigrationChainAmbiguous`.
109+
- `LlmProviderError` (in `openarmature.llm`) — provider call failures: `ProviderAuthentication`, `ProviderInvalidRequest`, `ProviderInvalidResponse`, `ProviderInvalidModel`, `ProviderModelNotLoaded`, `ProviderRateLimit`, `ProviderUnavailable`, `ProviderUnsupportedContentBlock`, `StructuredOutputInvalid`.
110+
111+
Catching `Exception` works but is too broad; catching one hierarchy misses the other two. If you want to branch on category strings (e.g., for retry logic), catch the relevant base — `RuntimeGraphError` covers all five spec runtime categories, `LlmProviderError` covers all nine provider categories, `CheckpointError` covers all six checkpoint categories. The `TRANSIENT_CATEGORIES` frozenset in `openarmature.llm` enumerates which provider categories are retriable.
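
Branching on category for retries might look like the sketch below. The classes are local stand-ins that reuse the real names for readability, and the category strings and the contents of `TRANSIENT_CATEGORIES` are invented for illustration; check `openarmature.llm` for the actual values.

```python
class LlmProviderError(Exception):
    """Stand-in base class; real categories may be spelled differently."""
    category = "provider_error"


class ProviderRateLimit(LlmProviderError):
    category = "provider_rate_limit"  # assumed string


class ProviderAuthentication(LlmProviderError):
    category = "provider_authentication"  # assumed string


# Assumed contents; the real frozenset lives in openarmature.llm.
TRANSIENT_CATEGORIES = frozenset({"provider_rate_limit", "provider_unavailable"})


def call_with_retry(call, attempts: int = 3):
    """Retry only provider errors whose category is marked transient."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except LlmProviderError as exc:
            retriable = exc.category in TRANSIENT_CATEGORIES
            if not retriable or attempt == attempts:
                raise
```

Because the handler names only `LlmProviderError`, errors from the graph and checkpoint hierarchies pass straight through it, which is the behaviour the paragraph above warns about when a single hierarchy is caught.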

docs/agent/tldr.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
OpenArmature is a workflow framework for LLM pipelines and tool-calling agents — typed state, compile-time topology checks, observability, and crash-safe checkpoints baked into a graph engine. The graph layer has no concept of LLMs or tools; the same primitives drive deterministic ETL pipelines and tool-calling agents alike. Nodes return partial updates; the engine merges into a frozen state snapshot. Behavior is defined by [openarmature-spec](https://openarmature.org/capabilities/) and verified by conformance fixtures; this package is the reference Python implementation.
2+
3+
**What OpenArmature is NOT:** not a chat framework (no built-in messages channel), not an LLM SDK (Provider is the abstraction layer; OpenAIProvider is the canonical impl), not a state-management library (state is per-invocation, not application-wide), not an evaluation framework (deferred to `openarmature-eval`).
