Commit 8519ba1

Add chat-with-multimodal example (#116)
* Add chat-with-multimodal example

  New examples/11-chat-with-multimodal/ demonstrates the headline of proposal 0046 (ChatPrompt + PlaceholderSegment, shipped in v0.11.0) end-to-end:

  - ChatPrompt with ContentSegment (system + user) and PlaceholderSegment for chat-history injection.
  - Multi-turn conversation memory threaded through state via the ``append`` reducer; each turn's render() sees the full prior history through the placeholder slot.
  - Multimodal turn: one of four scripted turns attaches a NASA public-domain Apollo 16 LM "Orion" photograph via ImageURLBlockTemplate. Same chat template, only the trailing user ContentSegment's content shape changes (string vs content-blocks list); system + placeholder segments are identical across both shapes.
  - Error handling at the invoke() boundary: try/except NodeException in main(), inspect exc.__cause__ for LlmProviderError to surface the canonical category string. Comments name the other two legitimate handler locations (RetryMiddleware, node-internal try/except).
  - Streaming transcript: each turn prints from inside the respond node body so the conversation arrives as the graph executes rather than waiting for invoke() to return. A 0.5s ``_TURN_DELAY_S`` pacing constant lets the reader follow along.
  - ``--traces`` argparse flag (default off) opts in to the OTel observer with a console exporter. Without it the chat runs without any observer attached; with it, JSON spans stream to stderr alongside the conversation on stdout.

  Complementary to example 09 (tool calling); chat history threading and tool calling are separate primitives. Example 03 owns the observer-hooks story, so this example points readers there for the observability details rather than re-teaching them.

  Bonus pickup: docs/examples/index.md was missing example 10 from its catalog before this PR. Caught alongside the example 11 entry and added.

* Fix three docstring inconsistencies per PR review

  Three CoPilot findings on PR #116:

  1. main.py docstring referenced PlaceholderSegment(name="history") but the actual field is named ``placeholder``. The code's chat-template construction uses ``placeholder=`` correctly; only the docstring narrative was wrong. A reader copy-pasting from the docstring would have hit a Pydantic ValidationError.
  2. Walk-through doc's intro paragraph still said "Apollo 11 Lunar Module" after the image URL swap to the Apollo 16 "Orion" shot. The code's docstring + inline comment got updated when the URL changed but the walk-through intro was missed.
  3. Sample "Reading the output" block showed an ``upload.wikimedia.org`` image URL the example explicitly warns against. Updated to the actual default ``images-assets.nasa.gov`` URL so the sample matches a real run.
1 parent dc13b71 commit 8519ba1

6 files changed

Lines changed: 704 additions & 0 deletions

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -13,6 +13,8 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The

 ### Added

+- **Chat-with-multimodal example.** `examples/11-chat-with-multimodal/` demonstrates `ChatPrompt` + `PlaceholderSegment` (proposal 0046) end-to-end: a four-turn lunar-mission Q&A conversation with conversation memory threaded through state, one mid-conversation turn attaching a photograph via `ImageURLBlockTemplate`, the agent processing the multimodal turn naturally without changing the chat-history shape. Complementary to example 09 (tool use); chat history threading and tool calling are separate primitives.
+- **`docs/examples/index.md` catalog now lists example 10.** A pre-existing gap (the Langfuse-observability example was missing from the catalog) caught and fixed alongside the example 11 entry.
 - **PyPI + spec-version shields on the docs homepage.** `docs/index.md` now carries dynamic shields for the published PyPI version and the pinned spec version, sourced from `img.shields.io`. Both auto-update on every publish or spec bump; no maintenance burden. Mirrors the same shield URLs the README already uses.
 - **vLLM production deployment notes.** `docs/model-providers/vllm.md` grows a "Production deployment" section covering the `VLLM_HTTP_TIMEOUT_KEEP_ALIVE` gotcha (vLLM's stock 5s uvicorn keep-alive lapses pooled OA-side httpx connections and surfaces as `ProviderUnavailable`; widen to roughly 300s), a systemd unit skeleton, and the three throughput knobs that interact with OA's shared connection pool (`--max-model-len`, `--max-num-seqs`, `--gpu-memory-utilization`). The existing "Tool calling" section grows a `--tool-call-parser` family table verified against vLLM's docs (Llama 3.x / Llama 4 / Mistral / Hermes / Qwen3 / DeepSeek V3 / GPT-OSS), plus explicit "not supported here" callouts for Anthropic / Gemini (proprietary cloud) and mainstream Gemma (no vLLM parser).
 - **Three new patterns docs.** `docs/patterns/state-migration-on-resume.md`, `docs/patterns/caller-supplied-trace-identifiers.md`, and `docs/patterns/observer-state-reconciliation.md` graduate the corresponding entries from `docs/agent/non-obvious-shapes.md` into full pattern recipes with code snippets and "when this is right / when it isn't" guidance. The programmatic patterns API (`openarmature.patterns.list()` / `get(name)`) grows from 4 to 7 entries.
Lines changed: 160 additions & 0 deletions
@@ -0,0 +1,160 @@
# 11 - Chat with multi-turn memory and a multimodal turn

A lunar-mission Q&A assistant that maintains conversation context
across four turns. One mid-conversation turn includes an attached
photograph (Apollo 16 Lunar Module "Orion" on the lunar surface):
the user asks about it, and the agent processes the multimodal turn
naturally without changing the chat-history shape.

## Overview

The user has a four-turn conversation with the assistant. Counting
from 1, turns 1, 2, and 4 are text-only; turn 3 attaches a photograph
and asks the agent to describe it. (The program output labels the
same turns `Turn 0` through `Turn 3`.) Throughout the conversation,
the agent maintains memory: turn 2 references "it" from turn 1, and
turn 4 references "the LM you described" from turn 3.

The whole thing rides on one `ChatPrompt` template:

- A `ContentSegment(role="system", ...)` holds the assistant's
persona and response style.
- A `PlaceholderSegment(placeholder="history")` is the slot where
the caller injects the prior conversation.
- A trailing `ContentSegment(role="user", ...)` carries the current
turn's question. For text-only turns its `content` is a string;
for the multimodal turn its `content` is a list of content-block
templates (`TextBlockTemplate` + `ImageURLBlockTemplate`).

Chat history lives on state as `Annotated[list[Message], append]`.
After each turn the `respond` node appends two messages to history
(the rendered user turn + the assistant response), and the next
turn's `render()` injects the grown history into the placeholder.
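
The template mechanics described above can be sketched in plain Python. This is a minimal stand-in, not the openarmature API: `Message`, `Segment`, `Placeholder`, and `render_chat` are hypothetical names chosen for illustration, and the library's real `ChatPrompt`, segment classes, and `render()` may differ in signature and behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    role: str
    content: str


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for a fixed-role template segment."""
    role: str
    content: str  # may contain str.format-style variables


@dataclass(frozen=True)
class Placeholder:
    """Hypothetical stand-in for a named slot filled with messages at render time."""
    name: str


def render_chat(template, variables, placeholders):
    """Expand a template into a flat message list."""
    messages = []
    for part in template:
        if isinstance(part, Placeholder):
            # Splice the caller-supplied messages in at the slot position.
            messages.extend(placeholders.get(part.name, []))
        else:
            messages.append(Message(part.role, part.content.format(**variables)))
    return messages


TEMPLATE = [
    Segment("system", "You are a concise lunar-mission assistant."),
    Placeholder("history"),
    Segment("user", "{question}"),
]

# First turn: an empty history renders to [system, current user turn].
first = render_chat(
    TEMPLATE, {"question": "What was the primary objective of Apollo 11?"}, {"history": []}
)

# Later turn: prior messages land between the system and current user message.
history = [
    Message("user", "What was the primary objective of Apollo 11?"),
    Message("assistant", "A crewed lunar landing and safe return."),
]
second = render_chat(
    TEMPLATE, {"question": "And what year did it launch?"}, {"history": history}
)

print([m.role for m in first])
print([m.role for m in second])
```

The system segment never enters `history` in this sketch; it is re-emitted from the template on every render, which matches the behavior the walk-through describes.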

## What it teaches

- [`ChatPrompt`](../concepts/prompts.md) with
[`ContentSegment`](../concepts/prompts.md) and
[`PlaceholderSegment`](../concepts/prompts.md) (proposal 0046,
spec v0.38.0). The placeholder is the slot through which multi-turn
chat history is injected at render time.
- The same chat template can carry an
[`ImageURLBlockTemplate`](../concepts/prompts.md) when the
current user turn includes an image. The `content` field on the
user `ContentSegment` switches between a single `str` (text-only)
and a `list[ContentBlockTemplate]` (multimodal); the system and
placeholder segments are identical across both shapes.
- [`PromptManager.render(prompt, placeholders={"history":
state.history})`](../reference/prompts.md) injects the message
list at the placeholder slot. An empty list is valid (first-turn
case); the rendered messages become just
`[system, current_user_turn]` with no prior history.
- Multi-turn memory threaded through state via the `append`
reducer. Each `respond` call appends `[current_user_message,
assistant_response]` to history; reading history on the next turn
produces the running conversation.
- The graph is a single `respond` node with a conditional edge that
loops back to itself until the script-supplied user turns are
exhausted, then routes to `END`. The cycle is
[`respond → respond → respond → … → END`](../concepts/graphs.md).
- Complementary to [example 09 (tool use)](09-tool-use.md): chat
history threading and tool calling are separate primitives.
Example 09 shows the LLM emitting tool calls and the framework
dispatching them; this example shows how the prompt-management
layer composes a multi-turn conversation. A production chat agent
often combines both.
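
The switch between a `str` and a list of content blocks can be illustrated with the OpenAI-style chat message shape, where a multimodal user message carries `text` and `image_url` parts. This is a sketch under assumptions: `build_user_message` is a hypothetical helper, the URL is a placeholder, and how `TextBlockTemplate` / `ImageURLBlockTemplate` are actually translated into a provider payload is not shown in this example.

```python
from typing import Optional


def build_user_message(text: str, image_url: Optional[str] = None) -> dict:
    """Return a user message whose content is a str or a list of content blocks."""
    if image_url is None:
        # Text-only turn: content is a plain string.
        return {"role": "user", "content": text}
    # Multimodal turn: content is a list of typed blocks inside one user message.
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


text_turn = build_user_message("And what year did it launch?")
image_turn = build_user_message(
    "I have a photograph of the Lunar Module. What's distinctive about its design?",
    image_url="https://example.com/lunar-module.jpg",  # placeholder URL, not the example's default
)

print(type(text_turn["content"]).__name__)
print([block["type"] for block in image_turn["content"]])
```

Only the user message's `content` changes shape; the role stays `user` in both cases, which is why the surrounding system and history messages need no special handling.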

## How to run

```bash
uv sync --group examples --all-extras

# Clean conversation output only (default).
LLM_API_KEY=sk-... uv run python examples/11-chat-with-multimodal/main.py

# With OTel JSON spans streaming to stderr alongside the chat.
LLM_API_KEY=sk-... uv run python examples/11-chat-with-multimodal/main.py --traces
```

`LLM_MODEL` must point at a vision-capable model. The default
(`gpt-4o-mini`) qualifies. For a different image, set `IMAGE_URL`
to any publicly reachable image URL.

The conversation streams to stdout as each turn completes (a 0.5 s
pacing delay between turns, `_TURN_DELAY_S`, lets the human reader
follow along). The `--traces` flag opts in to the OTel observer with
a console exporter; without it the chat runs without any observer
attached. Example 03 owns the observer-hooks story end-to-end; this
example's headline is the chat shape, not the observability wiring.
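
The run-time switches described above could be combined roughly as follows. This is a sketch, not the example's actual `main.py`: `parse_config` is a hypothetical helper; only the `--traces` flag, the `LLM_MODEL` and `IMAGE_URL` variables, and the `gpt-4o-mini` default come from the text.

```python
import argparse


def parse_config(argv, env):
    """Combine the --traces flag with the LLM_MODEL / IMAGE_URL variables."""
    parser = argparse.ArgumentParser(description="chat-with-multimodal demo (sketch)")
    # store_true makes the flag default to False, so tracing is opt-in.
    parser.add_argument(
        "--traces",
        action="store_true",
        help="attach the OTel observer and print spans to stderr",
    )
    args = parser.parse_args(argv)
    return {
        "traces": args.traces,
        "model": env.get("LLM_MODEL", "gpt-4o-mini"),
        # None here stands for "use the example's built-in default image".
        "image_url": env.get("IMAGE_URL"),
    }


default_run = parse_config([], {})
traced_run = parse_config(["--traces"], {"LLM_MODEL": "my-vision-model"})

print(default_run)
print(traced_run)
```

A real script would pass `sys.argv[1:]` and `os.environ` instead of the literal arguments used here; taking them as parameters keeps the function easy to test.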

The demo is illustrative only: it runs four pre-scripted user turns
sequentially in one process. A real chat-server runtime would
manage one invocation per turn, with the chat history persisted
across sessions (e.g., via a checkpointer keyed on `session_id`).
That persistence is the subject of
[example 08 (checkpointing)](08-checkpointing-and-migration.md); a
production service would combine it with this example's chat shape.

## The graph

```mermaid
flowchart TD
start([start])
respond[respond]
stop([end])

start --> respond
respond -->|more user turns scripted| respond
respond -->|user turns exhausted| stop
```

`route_after_respond` returns `"respond"` while
`state.next_turn_index < len(state.user_turns)` and `END` otherwise.
Each loop iteration renders the current chat template, calls the
LLM, and updates state.
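
The self-loop and the `append` behavior can be simulated without the framework. In this sketch, `State`, `fake_llm`, `respond`, `run`, and the `END` sentinel are hypothetical stand-ins; only the routing condition and the field names `next_turn_index`, `user_turns`, and `history` come from the text, and the real graph engine applies reducers itself rather than through a hand-written loop.

```python
from dataclasses import dataclass, field

END = "__end__"  # stand-in sentinel; the library's END constant may differ


@dataclass
class State:
    user_turns: list
    next_turn_index: int = 0
    history: list = field(default_factory=list)


def fake_llm(messages):
    """Placeholder for a real model call: reports how many messages it saw."""
    return f"(reply after seeing {len(messages)} messages)"


def respond(state):
    """Answer the current scripted turn and return a partial state update."""
    user_msg = ("user", state.user_turns[state.next_turn_index])
    reply = ("assistant", fake_llm(state.history + [user_msg]))
    return {"history": [user_msg, reply], "next_turn_index": state.next_turn_index + 1}


def route_after_respond(state):
    """Loop back to respond while scripted turns remain, else finish."""
    return "respond" if state.next_turn_index < len(state.user_turns) else END


def run(state):
    """Drive the single-node loop; assumes at least one scripted turn."""
    node = "respond"
    while node != END:
        update = respond(state)
        # "append" reducer: extend the existing list instead of overwriting it.
        state.history = state.history + update["history"]
        state.next_turn_index = update["next_turn_index"]
        node = route_after_respond(state)
    return state


final = run(State(user_turns=["q0", "q1", "q2", "q3"]))
print(len(final.history))  # 8: four (user, assistant) pairs
```

Because each pass adds exactly two messages, four scripted turns leave eight messages in `history`, the same count the sample output reports.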

## Reading the output

```
=== openarmature chat-with-multimodal demo ===
Image URL: https://images-assets.nasa.gov/image/as16-113-18334/...
Scripted turns: 4

--- Turn 0 ---
USER: What was the primary objective of Apollo 11?
ASSISTANT: The primary objective of Apollo 11 was to perform a
manned lunar landing and safely return the crew to Earth ...

--- Turn 1 ---
USER: And what year did it launch?
ASSISTANT: Apollo 11 launched on July 16, 1969.

--- Turn 2 [+image] ---
USER: I have a photograph of the Lunar Module. What's
distinctive about its design?
ASSISTANT: The Apollo Lunar Module had a distinctive two-stage,
spider-like configuration ...

--- Turn 3 ---
USER: Given what you described about the LM, was that design
reused on later Apollo missions?
ASSISTANT: Yes, the same basic LM design was used on Apollo 12
through 17 ...

=== history length: 8 messages (4 user/assistant turns) ===
```

- **Turn 1 builds on turn 0** without you having to re-mention
Apollo 11. The history placeholder injected the prior `[user_0,
assistant_0]` pair, so the model sees the question "what year did
it launch" in context.
- **Turn 2 is the multimodal one** (`[+image]` tag in the transcript).
The user `ContentSegment` for this turn carries
`[TextBlockTemplate(text=...), ImageURLBlockTemplate(url=...)]`
instead of a plain string; the model receives both blocks in one
user message and answers about the image.
- **Turn 3 references "the LM you described"** from turn 2. The
history at this point contains all six prior messages (system is
not in history; it comes from the template every render). The
model carries the multimodal context forward without you having
to re-attach the image.
- **History length 8 = 4 (user, assistant) pairs.** No system
message in history; the template adds it on every render.

docs/examples/index.md

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -43,6 +43,14 @@ in the repo.
 - [**09 - Tool use**](09-tool-use.md). Lunar-mission assistant that
 calls local Python tools to answer questions mixing fact recall and
 physics arithmetic.
+- [**10 - Langfuse observability**](10-langfuse-observability.md).
+Send LLM-call observability natively to Langfuse with a prompt-
+linkage demonstration on a mission-briefing Q&A pipeline.
+- [**11 - Chat with multimodal**](11-chat-with-multimodal.md). Four-
+turn lunar-mission conversation with conversation memory threaded
+through `ChatPrompt` + `PlaceholderSegment`. One turn attaches a
+photograph; the agent processes it without changing the chat
+shape.

 ## Configuration
