Real LLM tool-calling against a live SuperDoc document. The user types a natural-language request; the model picks a tool; the browser executes it against the Document API; the document updates. The OPENAI_API_KEY never leaves the server.
- `src/tool.ts`: the Document API wrapper. One function: `addFootnoteCitation(api, { sourceText })`. Wraps `selection.current` + `footnotes.insert` + `footnotes.list` and returns a typed receipt.
- `src/agent.ts`: the tool-use loop. `runAgentTurn` posts the user message, dispatches any `tool_calls` the model returns to local handlers, sends results back, and emits events to the UI. SDK-agnostic: it speaks the Chat Completions message shape but doesn't import any provider SDK.
- `src/App.tsx`: the UI. Mounts SuperDoc, captures the user prompt, binds `addFootnoteCitation` to the live `editor.doc` as a handler, calls `runAgentTurn`, renders chat rows.
- `server.mjs`: the proxy. Declares the tool schema (`strict: true`, `parallel_tool_calls: false`), forwards turn requests to `openai.chat.completions.create`, returns the assistant message untouched.
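To make the "typed receipt" idea concrete, here is a minimal sketch of what a wrapper like `addFootnoteCitation` could look like. The `DocApiLike` interface, its method signatures, and the `FootnoteReceipt` shape are assumptions made for illustration only; they are not the real SuperDoc Document API, so check `src/tool.ts` for the actual calls.

```typescript
// Hypothetical stand-in for the slice of the Document API the tool touches.
// Method names follow the list above; signatures and return shapes are assumed.
interface DocApiLike {
  selection: { current(): { from: number; to: number } | null };
  footnotes: {
    insert(input: { at: number; text: string }): { id: string };
    list(): Array<{ id: string; text: string }>;
  };
}

// Discriminated union: callers must check `ok` before reading the details.
type FootnoteReceipt =
  | { ok: true; footnoteId: string; footnoteCount: number }
  | { ok: false; error: string };

function addFootnoteCitation(
  api: DocApiLike,
  args: { sourceText: string },
): FootnoteReceipt {
  const sourceText = args.sourceText.trim();
  if (sourceText.length === 0) {
    return { ok: false, error: "sourceText must not be empty" };
  }
  const selection = api.selection.current();
  if (selection === null) {
    return { ok: false, error: "no active selection; click into the document first" };
  }
  // Insert at the end of the selection, then re-list to confirm the insert landed.
  const { id } = api.footnotes.insert({ at: selection.to, text: sourceText });
  const all = api.footnotes.list();
  if (!all.some((note) => note.id === id)) {
    return { ok: false, error: "footnote was not found after insert" };
  }
  return { ok: true, footnoteId: id, footnoteCount: all.length };
}
```

Returning a receipt instead of throwing gives the model a structured result it can read on the next turn, whether the insert succeeded or not.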
```
Browser ──POST /api/turn──▶ Node proxy ──▶ OpenAI
   ▲                             │
   │        message or        ◀──┘
   │        tool_calls
   ▼
editor.doc.*  (tool execution lives here)
   │
   └── POST /api/turn with the tool result, loop until the model returns text
```

| Surface | Owns |
|---|---|
| Server | API key, model client, tool schema |
| Browser | Editor, Document API, tool impl |
The browser owns tool execution because `editor.doc` lives there. The server has no editor. So the server runs the model conversation; the browser runs the document.
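The split above can be sketched as a loop that runs in the browser and only knows how to talk to a turn endpoint. This is a minimal sketch, not the contents of `src/agent.ts`: the `PostTurn` callback (standing in for `POST /api/turn`), the `maxIterations` cap, and the event shapes are assumptions. The message and `tool_calls` shapes follow the Chat Completions format.

```typescript
// Message shapes follow the Chat Completions format; only the fields used here are typed.
type ToolCall = { id: string; type: "function"; function: { name: string; arguments: string } };
type AssistantMessage = { role: "assistant"; content: string | null; tool_calls?: ToolCall[] };
type Message =
  | { role: "user"; content: string }
  | AssistantMessage
  | { role: "tool"; tool_call_id: string; content: string };

// Assumed abstractions: PostTurn stands in for POST /api/turn, Handler for a browser-side tool.
type PostTurn = (messages: Message[]) => Promise<AssistantMessage>;
type Handler = (args: unknown) => unknown | Promise<unknown>;
type AgentEvent =
  | { kind: "tool"; name: string; ok: boolean }
  | { kind: "text"; content: string };

async function runAgentTurn(
  userText: string,
  handlers: Record<string, Handler>,
  postTurn: PostTurn,
  emit: (event: AgentEvent) => void,
  maxIterations = 5, // assumed cap so a misbehaving model cannot loop forever
): Promise<Message[]> {
  const messages: Message[] = [{ role: "user", content: userText }];
  for (let i = 0; i < maxIterations; i++) {
    const reply = await postTurn(messages);
    messages.push(reply);
    const calls = reply.tool_calls ?? [];
    if (calls.length === 0) {
      // Plain text means the model is done; surface it and stop.
      if (reply.content) emit({ kind: "text", content: reply.content });
      return messages;
    }
    for (const call of calls) {
      const name = call.function.name;
      let result: unknown;
      let ok = true;
      try {
        // Own-property check so names like "toString" never resolve to a handler.
        const handler = Object.prototype.hasOwnProperty.call(handlers, name)
          ? handlers[name]
          : undefined;
        if (!handler) throw new Error(`unknown tool: ${name}`);
        result = await handler(JSON.parse(call.function.arguments));
      } catch (err) {
        ok = false;
        result = { error: err instanceof Error ? err.message : String(err) };
      }
      emit({ kind: "tool", name, ok });
      // Every tool call needs a matching tool message before the next model turn.
      messages.push({ role: "tool", tool_call_id: call.id, content: JSON.stringify(result ?? null) });
    }
  }
  throw new Error(`no final answer after ${maxIterations} iterations`);
}
```

Injecting `postTurn` keeps the loop free of any provider SDK and of `fetch` itself, so it can be exercised with a fake endpoint in tests.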
To add a tool:

- Add a handler in `src/App.tsx`'s `handlers` map.
- Mirror the JSON schema in `server.mjs`'s `TOOLS` array.

That's it. The loop and dispatch in src/agent.ts are tool-agnostic.
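As a hedged illustration of those two steps, the sketch below adds a second, hypothetical tool named `countFootnotes`. The tool name, the descriptions, the handler bodies, and the `missingHandlers` helper are invented for the example, and an in-memory array stands in for the live `editor.doc`. The schema layout (`type: "function"`, `strict`, `additionalProperties: false`) follows the Chat Completions function-tool format.

```typescript
// Minimal type for a Chat Completions function tool; only the fields used here.
type ToolSchema = {
  type: "function";
  function: {
    name: string;
    description: string;
    strict: boolean;
    parameters: {
      type: "object";
      properties: Record<string, { type: string; description?: string }>;
      required: string[];
      additionalProperties: false;
    };
  };
};

// server.mjs side: the existing tool plus one hypothetical addition.
const TOOLS: ToolSchema[] = [
  {
    type: "function",
    function: {
      name: "addFootnoteCitation",
      description: "Insert a footnote citing the given source at the current selection.",
      strict: true,
      parameters: {
        type: "object",
        properties: { sourceText: { type: "string", description: "Citation text." } },
        required: ["sourceText"],
        additionalProperties: false,
      },
    },
  },
  {
    type: "function",
    function: {
      name: "countFootnotes", // hypothetical tool, not part of the example repo
      description: "Return the number of footnotes in the document.",
      strict: true,
      parameters: { type: "object", properties: {}, required: [], additionalProperties: false },
    },
  },
];

// src/App.tsx side: handlers keyed by the same names.
// An in-memory array stands in for the live editor.doc in this sketch.
const footnotes: string[] = [];
const handlers: Record<string, (args: Record<string, unknown>) => unknown> = {
  addFootnoteCitation: (args) => {
    footnotes.push(String(args["sourceText"]));
    return { ok: true };
  },
  countFootnotes: () => ({ ok: true, count: footnotes.length }),
};

// Hypothetical helper: tool names the model can call but the browser cannot run.
function missingHandlers(tools: ToolSchema[], registry: Record<string, unknown>): string[] {
  return tools
    .map((tool) => tool.function.name)
    .filter((name) => !Object.prototype.hasOwnProperty.call(registry, name));
}
```

A check like `missingHandlers(TOOLS, handlers)` is one way to catch the two sides drifting apart, since the schema and the handler live in different files.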
```bash
cp .env.example .env   # then add your OPENAI_API_KEY
pnpm install
pnpm dev               # Node proxy + Vite, run together
```

Open http://localhost:5181. Click into the paragraph, then send a message like:

> Add a footnote citing Doe's 2024 cloud reliability paper.

The chat shows: user → used `addFootnoteCitation` · ok → one-line assistant confirmation. The doc shows the superscript marker.

- Non-streaming: each `/api/turn` call is request/response. For a streaming-token UX layered on top of tool calls, swap to the Responses API or SSE per-event delivery.
- Each `send` starts a fresh tool loop; prior turns are not preserved. For multi-turn conversations, lift `messages` into app state and append rather than replace.
- For production, add auth, rate limiting, a stricter iteration cap, and reject tool calls that aren't in the registry.
- See `examples/ai/streaming` for SSE token streaming into a document (no tool use).
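Picking up the production note above about rejecting tool calls that aren't in the registry, one possible guard is sketched below. The `checkToolCall` name and the `Checked` result type are assumptions for illustration; only the `tool_calls` entry shape comes from the Chat Completions format.

```typescript
// Shape of one entry in an assistant message's tool_calls array.
type ToolCall = { id: string; type: "function"; function: { name: string; arguments: string } };

// Either a validated call that is safe to dispatch, or a reason to refuse it.
type Checked =
  | { ok: true; name: string; args: Record<string, unknown> }
  | { ok: false; error: string };

function checkToolCall(call: ToolCall, registry: ReadonlySet<string>): Checked {
  const name = call.function.name;
  // Refuse anything the app did not explicitly register.
  if (!registry.has(name)) {
    return { ok: false, error: `tool not in registry: ${name}` };
  }
  // Arguments arrive as a JSON string and may be malformed.
  let parsed: unknown;
  try {
    parsed = JSON.parse(call.function.arguments);
  } catch {
    return { ok: false, error: `arguments for ${name} are not valid JSON` };
  }
  // Function tools take a JSON object; reject arrays, null, and primitives.
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, error: `arguments for ${name} must be a JSON object` };
  }
  return { ok: true, name, args: parsed as Record<string, unknown> };
}
```

A rejected call can still be answered with a `tool` message carrying the error text, so the model sees why the call failed instead of the loop stalling.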