LunarCommand · chris-colinsky · Jun 1, 2026 · Jun 1, 2026 · Jun 1, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -13,6 +13,8 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The
 
 ### Added
 
+- **Chat-with-multimodal example.** `examples/11-chat-with-multimodal/` demonstrates `ChatPrompt` + `PlaceholderSegment` (proposal 0046) end-to-end: a four-turn lunar-mission Q&A conversation with conversation memory threaded through state, one mid-conversation turn attaching a photograph via `ImageURLBlockTemplate`, the agent processing the multimodal turn naturally without changing the chat-history shape. Complementary to example 09 (tool use); chat history threading and tool calling are separate primitives.
+- **`docs/examples/index.md` catalog now lists example 10.** A pre-existing gap (the Langfuse-observability example was missing from the catalog) caught and fixed alongside the example 11 entry.
 - **PyPI + spec-version shields on the docs homepage.** `docs/index.md` now carries dynamic shields for the published PyPI version and the pinned spec version, sourced from `img.shields.io`. Both auto-update on every publish or spec bump; no maintenance burden. Mirrors the same shield URLs the README already uses.
 - **vLLM production deployment notes.** `docs/model-providers/vllm.md` grows a "Production deployment" section covering the `VLLM_HTTP_TIMEOUT_KEEP_ALIVE` gotcha (vLLM's stock 5s uvicorn keep-alive lapses pooled OA-side httpx connections and surfaces as `ProviderUnavailable`; widen to roughly 300s), a systemd unit skeleton, and the three throughput knobs that interact with OA's shared connection pool (`--max-model-len`, `--max-num-seqs`, `--gpu-memory-utilization`). The existing "Tool calling" section grows a `--tool-call-parser` family table verified against vLLM's docs (Llama 3.x / Llama 4 / Mistral / Hermes / Qwen3 / DeepSeek V3 / GPT-OSS), plus explicit "not supported here" callouts for Anthropic / Gemini (proprietary cloud) and mainstream Gemma (no vLLM parser).
 - **Three new patterns docs.** `docs/patterns/state-migration-on-resume.md`, `docs/patterns/caller-supplied-trace-identifiers.md`, and `docs/patterns/observer-state-reconciliation.md` graduate the corresponding entries from `docs/agent/non-obvious-shapes.md` into full pattern recipes with code snippets and "when this is right / when it isn't" guidance. The programmatic patterns API (`openarmature.patterns.list()` / `get(name)`) grows from 4 to 7 entries.

diff --git a/docs/examples/11-chat-with-multimodal.md b/docs/examples/11-chat-with-multimodal.md
@@ -0,0 +1,160 @@
+# 11 - Chat with multi-turn memory and a multimodal turn
+
+A lunar-mission Q&A assistant that maintains conversation context
+across four turns. One mid-conversation turn includes an attached
+photograph (Apollo 11 Lunar Module on the lunar surface): the user
+asks about it, the agent processes the multimodal turn naturally
+without changing the chat-history shape.
+
+## Overview
+
+The user has a four-turn conversation with the assistant. Turns 1,
+2, and 4 are text-only; turn 3 attaches a photograph and asks the
+agent to describe it. Throughout the conversation, the agent
+maintains memory: turn 2 references "it" from turn 1, turn 4
+references "the LM you described" from turn 3.
+
+The whole thing rides on one `ChatPrompt` template:
+
+- A `ContentSegment(role="system", ...)` holds the assistant's
+  persona and response style.
+- A `PlaceholderSegment(placeholder="history")` is the slot where
+  the caller injects the prior conversation.
+- A trailing `ContentSegment(role="user", ...)` carries the current
+  turn's question. For text-only turns its `content` is a string;
+  for the multimodal turn its `content` is a list of content-block
+  templates (`TextBlockTemplate` + `ImageURLBlockTemplate`).
+
+Chat history lives on state as `Annotated[list[Message], append]`.
+After each turn the `respond` node appends two messages to history
+(the rendered user turn + the assistant response), and the next
+turn's `render()` injects the grown history into the placeholder.
+
+## What it teaches
+
+- [`ChatPrompt`](../concepts/prompts.md) with
+  [`ContentSegment`](../concepts/prompts.md) and
+  [`PlaceholderSegment`](../concepts/prompts.md) (proposal 0046,
+  spec v0.38.0). The placeholder is how multi-turn chat history
+  shapes get injected at render time.
+- The same chat template can carry an
+  [`ImageURLBlockTemplate`](../concepts/prompts.md) when the
+  current user turn includes an image. The `content` field on the
+  user `ContentSegment` switches between a single `str` (text-only)
+  and a `list[ContentBlockTemplate]` (multimodal); the system and
+  placeholder segments are identical across both shapes.
+- [`PromptManager.render(prompt, placeholders={"history":
+  state.history})`](../reference/prompts.md) injects the message
+  list at the placeholder slot. An empty list is valid (first-turn
+  case); the rendered messages become just
+  `[system, current_user_turn]` with no prior history.
+- Multi-turn memory threaded through state via the `append`
+  reducer. Each `respond` call appends `[current_user_message,
+  assistant_response]` to history; reading history on the next turn
+  produces the running conversation.
+- The graph is a single `respond` node with a conditional edge that
+  loops back to itself until the script-supplied user turns are
+  exhausted, then routes to `END`. The cycle is
+  [`respond → respond → respond → … → END`](../concepts/graphs.md).
+- Complementary to [example 09 (tool use)](09-tool-use.md): chat
+  history threading and tool calling are separate primitives.
+  Example 09 shows the LLM emitting tool calls and the framework
+  dispatching them; this example shows how the prompt-management
+  layer composes a multi-turn conversation. A production chat agent
+  often combines both.
+
+## How to run
+
+```bash
+uv sync --group examples --all-extras
+
+# Clean conversation output only (default).
+LLM_API_KEY=sk-... uv run python examples/11-chat-with-multimodal/main.py
+
+# With OTel JSON spans streaming to stderr alongside the chat.
+LLM_API_KEY=sk-... uv run python examples/11-chat-with-multimodal/main.py --traces
+```
+
+`LLM_MODEL` must point at a vision-capable model. The default
+(`gpt-4o-mini`) qualifies. For a different image, set `IMAGE_URL`
+to any publicly-reachable image URL.
+
+The conversation streams to stdout as each turn completes (a small
+visual delay between turns lets the human reader follow along). The
+`--traces` flag opts in to the OTel observer with a console
+exporter; without it the chat runs without any observer attached.
+Example 03 owns the observer-hooks story end-to-end; this example's
+headline is the chat shape, not the observability wiring.
+
+The demo is illustrative only: it runs four pre-scripted user turns
+sequentially in one process. A real chat-server runtime would
+manage one invocation per turn with the chat history persisted
+across sessions (e.g., via a checkpointer keyed on session_id);
+that's [example 08 (checkpointing)](08-checkpointing-and-migration.md)'s
+territory, combined with this one's chat shape.
+
+## The graph
+
+```mermaid
+flowchart TD
+  start([start])
+  respond[respond]
+  stop([end])
+
+  start --> respond
+  respond -->|more user turns scripted| respond
+  respond -->|user turns exhausted| stop
+```
+
+`route_after_respond` returns `"respond"` while
+`state.next_turn_index < len(state.user_turns)` and `END` otherwise.
+Each loop iteration renders the current chat template, calls the
+LLM, and updates state.
+
+## Reading the output
+
+```
+=== openarmature chat-with-multimodal demo ===
+Image URL: https://upload.wikimedia.org/...
+Scripted turns: 4
+
+--- Turn 0 ---
+USER:      What was the primary objective of Apollo 11?
+ASSISTANT: The primary objective of Apollo 11 was to perform a
+manned lunar landing and safely return the crew to Earth ...
+
+--- Turn 1 ---
+USER:      And what year did it launch?
+ASSISTANT: Apollo 11 launched on July 16, 1969.
+
+--- Turn 2 [+image] ---
+USER:      I have a photograph of the Lunar Module. What's
+distinctive about its design?
+ASSISTANT: The Apollo Lunar Module had a distinctive two-stage,
+spider-like configuration ...
+
+--- Turn 3 ---
+USER:      Given what you described about the LM, was that design
+reused on later Apollo missions?
+ASSISTANT: Yes, the same basic LM design was used on Apollo 12
+through 17 ...
+
+=== history length: 8 messages (4 user/assistant turns) ===
+```
+
+- **Turn 1 builds on turn 0** without you having to re-mention
+  Apollo 11. The history placeholder injected the prior `[user_0,
+  assistant_0]` pair, so the model sees the question "what year did
+  it launch" in context.
+- **Turn 2 is the multimodal one** (`[+image]` tag in the trace).
+  The user `ContentSegment` for this turn carries
+  `[TextBlockTemplate(text=...), ImageURLBlockTemplate(url=...)]`
+  instead of a plain string; the model receives both blocks in one
+  user message and answers about the image.
+- **Turn 3 references "the LM you described"** from turn 2. The
+  history at this point contains all six prior messages (system is
+  not in history; it comes from the template every render). The
+  model carries the multimodal context forward without you having
+  to re-attach the image.
+- **History length 8 = 4 (user, assistant) pairs.** No system
+  message in history; the template adds it on every render.
diff --git a/docs/examples/index.md b/docs/examples/index.md
@@ -43,6 +43,14 @@ in the repo.
 - [**09 - Tool use**](09-tool-use.md). Lunar-mission assistant that
   calls local Python tools to answer questions mixing fact recall and
   physics arithmetic.
+- [**10 - Langfuse observability**](10-langfuse-observability.md).
+  Send LLM-call observability natively to Langfuse with a prompt-
+  linkage demonstration on a mission-briefing Q&A pipeline.
+- [**11 - Chat with multimodal**](11-chat-with-multimodal.md). Four-
+  turn lunar-mission conversation with conversation memory threaded
+  through `ChatPrompt` + `PlaceholderSegment`. One turn attaches a
+  photograph; the agent processes it without changing the chat
+  shape.
 
 ## Configuration