Commit 2dd74db: Merge branch 'main' into fix-issue-689
2 parents d0afbf4 + b02b195

5 files changed: +282 −5 lines
src/blog/tanstack-ai-code-mode.md

Lines changed: 277 additions & 0 deletions
---
title: 'Code Mode: Let Your AI Write Programs, Not Just Call Tools'
published: 2026-04-08
excerpt: One tool call at a time is the bottleneck. TanStack AI Code Mode lets the LLM write and execute TypeScript programs in secure sandboxes, composing your tools with loops, conditionals, and Promise.all in a single shot.
authors:
  - Jack Herrington
  - Alem Tuzlak
  - Tanner Linsley
---

![TanStack AI Code Mode](/blog-assets/tanstack-ai-code-mode/header.jpg)

There are three things we know about LLMs and tools:

1. **Tool calling is slow and expensive.** Every tool call means a round-trip. The calls themselves, their parameters, and their return values all bloat the context window, which inflates the token count on every subsequent request. Even when calls can be parallelized, the overhead compounds.
2. **LLMs are hilariously bad at math.** Ask a model to sum a column of numbers from three API responses and you'll get a confident but probably incorrect answer.
3. **Frontier LLMs are very good at writing TypeScript.** They have an enormous amount of real-world TypeScript in their training data. They know how to `Promise.all`, `.reduce()`, `.filter()`, and handle async control flow.

Put these together and they point in one direction: let the LLM do what it's good at (writing TypeScript), and let a runtime handle what it's bad at (execution, math, orchestration).

That's exactly what **Code Mode** is. Code Mode gives the LLM an `execute_typescript` tool. Instead of orchestrating tools one at a time, the model writes a short TypeScript program that composes your tools with loops, conditionals, and data transformations, then executes it in a secure sandbox. One call in, one structured result out.

If you've tried to connect an LLM to your APIs, you know the pain. The model has no concept of N+1 problems. It will happily fetch a list of 50 users, then make 50 individual requests to get each user's profile. It doesn't batch. It doesn't parallelize. It doesn't know how to aggregate results without burning a reasoning step per intermediate value.

Code Mode hands those problems to TypeScript. The model writes `Promise.all` to batch API calls. It uses `.reduce()` to aggregate instead of asking the LLM to reason about each value. Math happens in the JavaScript runtime, not in the model's token prediction. **The N+1 problem disappears. The arithmetic is correct. Every time.**

This is a game changer for any application that wants to put an LLM in front of real APIs.

## Prior Art: Why Code Execution Matters

We're not the first to recognize this pattern.

**Anthropic's computer use and tool-use research** has shown that LLMs are dramatically more capable when they can write and execute code rather than being limited to predefined tool signatures. Their work on Claude's analysis tool demonstrated that giving models a code sandbox reduces error rates and increases task completion on complex multi-step problems.

**Cloudflare coined the term "Code Mode"** in [their September 2025 blog post](https://blog.cloudflare.com/code-mode/) by Kenton Varda and Sunil Pai. Their insight was simple: LLMs are better at writing code to call APIs than at calling APIs directly. Sunil Pai has been a driving force in this space, pushing the idea that agents should generate and execute TypeScript against typed SDKs rather than making individual tool calls. Cloudflare's Dynamic Workers pioneered running that untrusted code at the edge in V8 isolates, and their execution model (send code to an isolated runtime, bridge specific capabilities in, get a result back) is exactly what we adopted. Code Mode supports Cloudflare Workers as a first-class isolate driver.

What we've done is turn this pattern into a composable, model-agnostic tool that plugs into any TanStack AI chat pipeline. Code Mode works with any model that can write TypeScript and reason well. Use it with OpenAI, Anthropic, Gemini, Groq, xAI, Ollama, or any other provider through our adapter system. You don't need to build the sandbox infrastructure, the type stub generation, or the system prompt engineering. Define your tools, pick a driver, pick your model, and the LLM gets a TypeScript sandbox with typed access to your entire tool surface.

## How It Works

Code Mode is modeled as a single tool called `execute_typescript`. When the LLM decides it needs to compose multiple operations, it writes TypeScript code and passes it to this tool. The code runs inside an isolate: a sandboxed environment with no access to the host file system, network, or process.

Your existing tools become `external_*` functions inside the sandbox. If you have a `fetchWeather` tool and a `searchProducts` tool, the sandbox sees `external_fetchWeather()` and `external_searchProducts()`, fully typed, fully async, fully isolated.

### The N+1 example

Consider an e-commerce app where the user asks: _"What are the top 5 best-selling products, and what's the average rating for each?"_

Without Code Mode, the LLM calls `getTopProducts`, waits for the result, then calls `getProductRatings` five separate times (one per product), waits for each result, then tries to compute averages in its head. That's six tool calls, six round-trips, and the averages are likely wrong because the model is doing mental math on floating-point numbers.

With Code Mode, the LLM writes this:

```typescript
const top = await external_getTopProducts({ limit: 5 })

const ratings = await Promise.all(
  top.products.map((p) => external_getProductRatings({ productId: p.id })),
)

return top.products.map((product, i) => {
  const scores = ratings[i].ratings.map((r) => r.score)
  const avg = scores.reduce((sum, s) => sum + s, 0) / scores.length
  return {
    name: product.name,
    sales: product.totalSales,
    averageRating: Math.round(avg * 100) / 100,
  }
})
```

One tool call. Five API fetches in parallel. Math computed in JavaScript, not in the model. The averages are exactly right. The context window savings compound fast: every round-trip you eliminate is hundreds of tokens you don't spend.

### Why math matters

LLMs predict tokens. They don't execute arithmetic. When the model says "the average is 4.37," it's pattern-matching, not computing. Code Mode moves all calculation into a real runtime. Sums, averages, percentages, currency conversions, date math: all of it runs as actual JavaScript. The model decides _what_ to compute. The sandbox computes it correctly.

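As a toy illustration (the scores below are invented for the example), this is the kind of arithmetic that moves out of the model and into the runtime:

```typescript
// Hypothetical rating scores. The point: the runtime, not the model,
// does the arithmetic, so the result is deterministic and exact.
const scores = [4.5, 4.2, 3.9, 4.8, 4.1]
const avg = scores.reduce((sum, s) => sum + s, 0) / scores.length
const rounded = Math.round(avg * 100) / 100
console.log(rounded) // 4.3
```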
## Setting It Up

### Install

```bash
pnpm add @tanstack/ai @tanstack/ai-code-mode zod
```

Pick an isolate driver:

```bash
# Node.js: fastest, V8 isolates via isolated-vm
pnpm add @tanstack/ai-isolate-node

# QuickJS WASM: no native deps, works in browsers and edge runtimes
pnpm add @tanstack/ai-isolate-quickjs

# Cloudflare Workers: run on the edge
pnpm add @tanstack/ai-isolate-cloudflare
```

### Define tools

Same `toolDefinition()` API you already use. Nothing changes here:

```typescript
import { toolDefinition } from '@tanstack/ai'
import { z } from 'zod'

const fetchWeather = toolDefinition({
  name: 'fetchWeather',
  description: 'Get current weather for a city',
  inputSchema: z.object({ location: z.string() }),
  outputSchema: z.object({
    temperature: z.number(),
    condition: z.string(),
  }),
}).server(async ({ location }) => {
  const res = await fetch(
    `https://api.weather.example/v1?city=${encodeURIComponent(location)}`,
  )
  return res.json()
})
```

### Create Code Mode and use it with `chat()`

```typescript
import { chat } from '@tanstack/ai'
import { openaiText } from '@tanstack/ai-openai'
import { createCodeMode } from '@tanstack/ai-code-mode'
import { createNodeIsolateDriver } from '@tanstack/ai-isolate-node'

const { tool, systemPrompt } = createCodeMode({
  driver: createNodeIsolateDriver(),
  tools: [fetchWeather],
  timeout: 30_000,
})

const result = await chat({
  adapter: openaiText('gpt-5.4'),
  systemPrompts: ['You are a helpful assistant.', systemPrompt],
  tools: [tool],
  messages,
})
```

`createCodeMode` returns two things: the `execute_typescript` tool and a system prompt containing typed function stubs for every tool you passed in. The model sees exact input/output types, so it generates correct calls without guessing parameter shapes. TypeScript annotations are stripped automatically before execution.

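To picture what the model codes against inside the sandbox, here is a rough, self-contained mock of that surface. The stub below is a stand-in for the bridged tool, not the actual generated output:

```typescript
// Illustrative mock only: in a real sandbox, external_fetchWeather bridges
// back to the server-side fetchWeather tool. Here it returns fixed data.
type WeatherInput = { location: string }
type WeatherOutput = { temperature: number; condition: string }

const external_fetchWeather = async (
  _input: WeatherInput,
): Promise<WeatherOutput> => ({ temperature: 21, condition: 'sunny' })

// Model-written code can now call it with full type checking:
const weather = await external_fetchWeather({ location: 'Oslo' })
console.log(weather.condition) // sunny
```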
## Three Sandbox Runtimes

Code Mode is runtime-agnostic. All three drivers implement the same `IsolateDriver` interface. Swap them without changing any other code.

| Driver                            | Best for                    | Native deps     | Browser |
| --------------------------------- | --------------------------- | --------------- | ------- |
| `@tanstack/ai-isolate-node`       | Server-side Node.js         | Yes (C++ addon) | No      |
| `@tanstack/ai-isolate-quickjs`    | Browsers, edge, portability | None (WASM)     | Yes     |
| `@tanstack/ai-isolate-cloudflare` | Edge on Cloudflare          | None            | N/A     |

The **Node driver** uses V8 isolates via `isolated-vm` for maximum performance: JIT compilation, no serialization overhead beyond tool call boundaries. The **QuickJS driver** compiles to WASM and runs anywhere JavaScript runs, including the browser. The **Cloudflare driver** sends code to a deployed Worker and bridges tool calls back to your server via HTTP round-trips.

Each execution creates a fresh sandbox context. Configurable timeouts and memory limits prevent runaway scripts. The sandbox is destroyed after every call.

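As a mental model, the driver contract is small: take code plus the bridged functions, return a result. The sketch below is an illustrative assumption, not the published `IsolateDriver` interface, and it runs code in-process, so it is emphatically NOT safe for untrusted code:

```typescript
// Toy sketch of a driver contract. Real drivers run the code in an isolate.
type ExternalFns = Record<string, (input: any) => Promise<any>>

interface IsolateDriverSketch {
  execute(code: string, externals: ExternalFns, timeoutMs: number): Promise<unknown>
}

const toyDriver: IsolateDriverSketch = {
  async execute(code, externals, timeoutMs) {
    const names = Object.keys(externals)
    // Wrap the model's code in an async function so it can use await/return.
    const fn = new Function(...names, `return (async () => { ${code} })()`)
    let timer: ReturnType<typeof setTimeout> | undefined
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new Error('execution timed out')), timeoutMs)
    })
    try {
      return await Promise.race([fn(...names.map((n) => externals[n])), timeout])
    } finally {
      clearTimeout(timer) // don't leave the timeout pending after a result
    }
  },
}
```

Swapping drivers then amounts to swapping which `execute` implementation receives the code, while everything else in the pipeline stays put.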
## Skills: Persistent, Reusable Code Libraries

Code Mode is powerful on its own. But what if your LLM could get smarter over time?

Right now the model rewrites the same logic every time. If it figures out a good way to fetch and rank NPM packages, that knowledge disappears when the conversation ends.

**Skills** fix this. The `@tanstack/ai-code-mode-skills` package lets the LLM save working code as a named, typed, persistent skill. On future requests, relevant skills are loaded from storage and exposed as first-class tools. The LLM calls them directly without rewriting the logic.

### The lifecycle

1. The LLM writes code via `execute_typescript` to solve a problem
2. The code works, so the LLM calls `register_skill` to save it with a name, description, and schemas
3. On the next conversation, the system loads skill metadata and uses a cheap, fast model to select relevant skills
4. Selected skills appear as direct tools the LLM can call, no sandbox needed
5. Execution stats are tracked, and skills earn trust through successful runs

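A hypothetical sketch of steps 1 and 2 can make this concrete. Both the registration shape and the `external_fetchNpmStats` tool below are invented for illustration; the real `register_skill` schema may differ:

```typescript
// Invented registry standing in for skill storage (illustration only).
const registeredSkills = new Map<string, { description: string; code: string }>()

function registerSkill(name: string, description: string, code: string) {
  registeredSkills.set(name, { description, code })
}

// Step 2: code that worked in the sandbox gets saved under a name...
registerSkill(
  'rankNpmPackages',
  'Fetch npm download counts and rank packages',
  `const stats = await Promise.all(
     input.packages.map((p) => external_fetchNpmStats({ name: p })))
   return stats.sort((a, b) => b.weeklyDownloads - a.weeklyDownloads)`,
)

// ...and steps 3-5 can later surface it as a direct tool without a rewrite.
console.log(registeredSkills.has('rankNpmPackages')) // true
```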
### Two integration paths

**High-level**: `codeModeWithSkills()` handles everything: skill selection via a cheap LLM call, tool registry assembly, and system prompt generation.

```typescript
import { chat } from '@tanstack/ai'
import { codeModeWithSkills } from '@tanstack/ai-code-mode-skills'
import { createFileSkillStorage } from '@tanstack/ai-code-mode-skills/storage'
import { createNodeIsolateDriver } from '@tanstack/ai-isolate-node'
import { openaiText } from '@tanstack/ai-openai'

const storage = createFileSkillStorage({ directory: './.skills' })

const { toolsRegistry, systemPrompt } = await codeModeWithSkills({
  config: {
    driver: createNodeIsolateDriver(),
    tools: [myTool1, myTool2],
    timeout: 60_000,
  },
  adapter: openaiText('gpt-5.4-mini'), // cheap model for skill selection
  skills: { storage, maxSkillsInContext: 5 },
  messages,
})

const stream = chat({
  adapter: openaiText('gpt-5.4'), // strong model for reasoning
  toolRegistry: toolsRegistry,
  messages,
  systemPrompts: ['You are a helpful assistant.', systemPrompt],
})
```

**Manual**: use `createCodeMode`, `skillsToTools`, and `createSkillManagementTools` individually when you want full control over which skills load and how they're assembled.

### Storage

Skills are just TypeScript text and metadata. Your application is free to store them wherever it wants. We ship file-based and in-memory storage implementations, but anything that satisfies the `SkillStorage` interface works: a database, S3, Redis, whatever fits your stack.

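As a sketch of how little that demands (the record shape and method names below are assumptions for illustration, not the published `SkillStorage` contract), an in-memory store can be this small:

```typescript
// Minimal in-memory skill store. Match the method names to the real
// SkillStorage interface in practice; these are illustrative.
interface SkillRecord {
  name: string
  description: string
  code: string
}

function createMemorySkillStorage() {
  const skills = new Map<string, SkillRecord>()
  return {
    async save(skill: SkillRecord) {
      skills.set(skill.name, skill)
    },
    async get(name: string) {
      return skills.get(name) ?? null
    },
    async list() {
      return [...skills.values()]
    },
  }
}
```

Backing the same few methods with a database table or an S3 bucket is then a drop-in change.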
### Trust

Skills start untrusted and earn trust through successful executions. Four built-in strategies control promotion thresholds:

| Strategy       | Provisional at         | Trusted at              |
| -------------- | ---------------------- | ----------------------- |
| Default        | 10+ runs, ≥90% success | 100+ runs, ≥95% success |
| Relaxed        | 3+ runs, ≥80% success  | 10+ runs, ≥90% success  |
| Always trusted | Immediately            | Immediately             |
| Custom         | You decide             | You decide              |

Trust is metadata today. It doesn't gate execution. But the infrastructure is there for when you want to build approval workflows or restrict untrusted skills to sandboxed execution only.

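The default row of that table is easy to express as a plain function. This sketch just encodes those thresholds; the actual strategy API may look different:

```typescript
type TrustLevel = 'untrusted' | 'provisional' | 'trusted'

// Default strategy from the table: provisional at 10+ runs with >=90%
// success, trusted at 100+ runs with >=95% success.
function defaultTrustLevel(runs: number, successes: number): TrustLevel {
  const rate = runs === 0 ? 0 : successes / runs
  if (runs >= 100 && rate >= 0.95) return 'trusted'
  if (runs >= 10 && rate >= 0.9) return 'provisional'
  return 'untrusted'
}

console.log(defaultTrustLevel(5, 5)) // untrusted (too few runs)
console.log(defaultTrustLevel(20, 19)) // provisional
console.log(defaultTrustLevel(120, 118)) // trusted
```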
## Showing It in the UI

Code Mode emits custom events through the AG-UI streaming protocol as code executes. Your client receives them via the `onCustomEvent` callback on `useChat`:

| Event                         | When                                 |
| ----------------------------- | ------------------------------------ |
| `code_mode:execution_started` | Sandbox begins executing             |
| `code_mode:console`           | Each `console.log/error/warn/info`   |
| `code_mode:external_call`     | Before an `external_*` function runs |
| `code_mode:external_result`   | After a successful `external_*` call |
| `code_mode:external_error`    | When an `external_*` call fails      |

Every event carries a `toolCallId` that ties it to the specific `execute_typescript` call, so you can render a live execution timeline alongside the right message: console output streaming in, external function calls with arguments and durations, errors as they happen.

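On the client, those events fold naturally into a per-call timeline keyed by `toolCallId`. The event shape below is trimmed to the fields mentioned here; the real payloads carry more:

```typescript
// Group Code Mode events by the execute_typescript call that produced them.
// `name` and `toolCallId` follow the table above; `data` is a placeholder
// for the rest of each event's payload.
interface CodeModeEvent {
  name: string
  toolCallId: string
  data?: unknown
}

function buildTimelines(events: CodeModeEvent[]) {
  const timelines = new Map<string, CodeModeEvent[]>()
  for (const event of events) {
    const entries = timelines.get(event.toolCallId) ?? []
    entries.push(event)
    timelines.set(event.toolCallId, entries)
  }
  return timelines
}
```

Feed it the events received in `onCustomEvent` and render each timeline next to its message.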
## Try It: The Code Mode Demo

It's hard to convey just how much of a game changer Code Mode really is; you have to try it for yourself. The TanStack AI monorepo includes a full working example at `examples/ts-code-mode-web`. It's a TanStack Start app with multiple demo scenarios:

- **NPM/GitHub Chat**: ask questions about packages, and the LLM writes code that calls the NPM and GitHub APIs in parallel
- **Database Demo**: natural language queries over an in-memory dataset, with skill registration
- **Structured Output**: Code Mode generating typed, validated output
- **Reporting**: an agent that builds live dashboards by writing code

To run it:

```bash
git clone https://github.com/TanStack/ai.git
cd ai
pnpm install
pnpm --filter ts-code-mode-web dev
```

Set your API keys in the example's `.env` file (OpenAI, Anthropic, or Gemini). The app lets you switch providers and isolate runtimes (Node vs. QuickJS) from the UI.

## What's Next

Code Mode is available now across three packages:

```bash
pnpm add @tanstack/ai-code-mode        # Core
pnpm add @tanstack/ai-code-mode-skills # Persistent skills
pnpm add @tanstack/ai-isolate-node     # or quickjs, or cloudflare
```

We're working on:

- **DevTools integration**: visual timeline of sandbox execution, skill creation, and trust progression
- **More isolate drivers**: Deno, Docker, and AWS Lambda sandboxes

The full documentation is in the [Code Mode Guide](https://tanstack.com/ai/latest/docs/guides/code-mode) and the [Skills Guide](https://tanstack.com/ai/latest/docs/guides/code-mode-with-skills).

---

_TanStack AI is open-source, provider-agnostic, and framework-agnostic. [Get started here.](https://tanstack.com/ai)_

src/blog/tanstack-ai-lazy-tool-discovery.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -117,7 +117,7 @@ const searchGuitars = toolDefinition({

 // Use in chat — lazy tools work automatically
 const stream = chat({
-  adapter: openaiText('gpt-4o'),
+  adapter: openaiText('gpt-5.4'),
   messages,
   tools: [
     getGuitars,
```

src/blog/tanstack-ai-middleware.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -43,7 +43,7 @@ import { chat } from '@tanstack/ai'
 import { openaiText } from '@tanstack/ai-openai'

 const stream = chat({
-  adapter: openaiText('gpt-4o'),
+  adapter: openaiText('gpt-5.4'),
   messages: [{ role: 'user', content: 'Explain middleware to me' }],
   middleware: [logger],
 })
@@ -377,7 +377,7 @@ import {
 } from '@tanstack/ai/middlewares'

 const stream = chat({
-  adapter: openaiText('gpt-4o'),
+  adapter: openaiText('gpt-5.4'),
   messages: req.body.messages,
   tools: [searchTool, weatherTool, calendarTool],
   context: { tenantId: req.auth.tenantId },
```

src/blog/tanstack-ai-realtime-voice-chat.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -35,7 +35,7 @@ import { openaiRealtimeToken } from '@tanstack/ai-openai'
 // In your API route / server function
 const token = await realtimeToken({
   adapter: openaiRealtimeToken({
-    model: 'gpt-4o-realtime-preview',
+    model: 'gpt-realtime-1.5',
   }),
 })

@@ -460,7 +460,7 @@ The server token endpoint picks the right adapter too:
 async function handleTokenRequest(provider: string) {
   if (provider === 'openai') {
     return realtimeToken({
-      adapter: openaiRealtimeToken({ model: 'gpt-4o-realtime-preview' }),
+      adapter: openaiRealtimeToken({ model: 'gpt-realtime-1.5' }),
     })
   }
   if (provider === 'elevenlabs') {
```
