
Commit d52581b

dprevoznik and claude authored
Simplify TS computer-use templates with @onkernel/cua-agent (#191)
## Summary

Replaces the hand-written per-provider sampling loops and action adapters in the **Anthropic**, **OpenAI**, and **Gemini** TypeScript computer-use templates with the `CuaAgent` class from [`@onkernel/cua-agent`](https://www.npmjs.com/package/@onkernel/cua-agent). Each template now provisions a Kernel browser, hands it to `CuaAgent`, and returns the final assistant text. `CuaAgent` owns the screenshot/tool loop and the provider-specific tool-call translation, so the bespoke `loop.ts` / `tools/**` / `lib/agent.ts` / `lib/kernel-computer.ts` code is deleted: about 3,500 lines of provider plumbing removed across the three templates. The Kernel app wrapper (`app.action("cua-task")`), payload/output shapes, custom system prompts, and replay recording are preserved, so the existing `kernel invoke` samples still work unchanged.

## What changed per template

| Template | Before | After |
|---|---|---|
| `anthropic-computer-use` | `index.ts` + `loop.ts` + `tools/**` + `utils/**` (~1,385 LOC TS) | `index.ts` + `session.ts` over `CuaAgent` (`anthropic:claude-sonnet-4-6`) |
| `openai-computer-use` | `index.ts` + `lib/agent.ts` + `lib/kernel-computer.ts` + `lib/toolset.ts` + event logging + `run_local.ts` (~1,934 LOC TS) | `index.ts` + `lib/replay.ts` over `CuaAgent` (`openai:gpt-5.5`, `computerUseExtra`) |
| `gemini-computer-use` | `index.ts` + `loop.ts` + `tools/**` (~983 LOC TS) | `index.ts` + `session.ts` over `CuaAgent` (`google:gemini-3-flash-preview`) |

`session.ts` is retained (Anthropic/Gemini) as a provider-neutral browser-lifecycle and replay helper; it gains a `browser` getter so the `BrowserCreateResponse` can be handed to `CuaAgent`.

## Notable behavior changes

- **Model updates**, required because `@onkernel/cua-ai` curates the supported computer-use models:
  - Gemini: `gemini-2.5-computer-use-preview-10-2025` → `gemini-3-flash-preview` (the old preview model is intentionally unsupported by cua-ai because it needs Google's native `tools.computer_use` wrapper).
  - OpenAI: `gpt-5.4` → `gpt-5.5`.
- **OpenAI navigation:** OpenAI's computer tool has no native URL navigation. The old template pre-navigated to DuckDuckGo; this PR instead enables `computerUseExtra`, so the model gets `goto`/`back`/`forward`/`url` helpers.
- **`@onkernel/sdk` pinned to `0.49.0`** in each template to match `@onkernel/cua-agent`'s dependency, so the `Kernel` client and browser types come from a single package instance (the SDK `Kernel` class is nominally typed).
- **Removed** the OpenAI bespoke JSONL event-logging system, `run_local.ts`, and `dotenv`; dropped the OpenAI `logs` and Gemini `error` output fields. Errors now surface by throwing (Anthropic/Gemini) or as `answer: null` (OpenAI), otherwise matching each template's prior contract.
- Lockfiles regenerated; Gemini gains a `pnpm-lock.yaml` (it previously had none).

## Scope

- TypeScript only, three templates. Yutori, Tzafon, and all Python templates are untouched.
- No Go changes: app names, action names, and payload field names are preserved, so the `kernel create` / `kernel deploy` / `kernel invoke` flows and samples in `pkg/create/templates.go` are unaffected.

## Test plan

- [x] `tsc --noEmit` passes for each migrated template against the published cua packages.
- [x] `make build` (Go `//go:embed` re-embeds the cleaned template tree; no `node_modules` embedded).
- [x] `make test` (`go vet ./...` + `go test ./...`) passes.
- [ ] Not yet deployed/invoked live against a Kernel browser; recommend a `kernel deploy` + `kernel invoke` smoke test per template before marking ready.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Large deletion of custom agent logic shifts runtime behavior to external packages and newer model IDs; invoke/deploy contracts are preserved, but live smoke tests are still recommended.
>
> **Overview**
> The **Anthropic**, **Gemini**, and **OpenAI** TypeScript computer-use templates stop using in-repo sampling loops and hand-rolled tool adapters (`loop.ts`, `tools/**`, OpenAI `lib/agent.ts` / `kernel-computer.ts`, etc.) and instead wire **`CuaAgent`** from `@onkernel/cua-agent` after provisioning a Kernel browser.
>
> Each `index.ts` now creates a session (or browser), runs `agent.prompt(...)`, and returns the last assistant text; replay and `cua-task` payload shapes stay the same. **`session.ts`** (Anthropic/Gemini) exposes the browser create response through a **`browser`** getter so it can be passed to `CuaAgent`. Dependencies shift to `@onkernel/cua-agent`, `@onkernel/cua-ai`, and **`@onkernel/sdk` pinned to `0.49.0`**; READMEs document the Playwright escape hatch.
>
> **Behavior deltas:** Gemini uses model **`google:gemini-3-flash-preview`** (replacing the old preview ID); OpenAI uses **`openai:gpt-5.5`** with **`computerUseExtra: true`** instead of pre-navigating to DuckDuckGo and custom batch/goto tooling; OpenAI drops local `run_local.ts`, dotenv, and JSONL event logging, and the optional **`logs`** (OpenAI) and **`error`** (Gemini) response fields are removed.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit fe1101e. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
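The behavior notes say failures now surface differently per template: by throwing in the Anthropic and Gemini templates, and as `answer: null` in the OpenAI template. A minimal sketch of the two contracts, assuming an async task that resolves to the final text; `RunTask`, `invokeOrThrow`, and `invokeOrNull` are hypothetical names, and the `result` / `answer` field names are taken from the Anthropic diff and the OpenAI note respectively, not from the OpenAI template's code (which this page does not show).

```typescript
// Hypothetical stand-in for whatever produces the agent's final text.
type RunTask = () => Promise<string>;

// Anthropic/Gemini style: a failure propagates to the caller as a rejected promise.
async function invokeOrThrow(runTask: RunTask): Promise<{ result: string }> {
  return { result: await runTask() };
}

// OpenAI style: a failure is logged and reported in-band as `answer: null`.
async function invokeOrNull(runTask: RunTask): Promise<{ answer: string | null }> {
  try {
    return { answer: await runTask() };
  } catch (error) {
    console.error('CUA task failed:', error);
    return { answer: null };
  }
}

// Usage: the same failing task rejects under one contract and resolves to null under the other.
const failing: RunTask = () => Promise.reject(new Error('boom'));
invokeOrNull(failing).then((out) => console.log('answer:', out.answer));
invokeOrThrow(failing).catch((err: Error) => console.log('thrown:', err.message));
```

In this sketch, a caller of the null-returning contract needs a null check, while a caller of the throwing contract needs a `catch`.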
1 parent c2d845a commit d52581b

34 files changed

Lines changed: 3837 additions & 3893 deletions
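The commit message identifies models with a `provider:model` string (`anthropic:claude-sonnet-4-6`, `openai:gpt-5.5`, `google:gemini-3-flash-preview`). How `@onkernel/cua-ai` actually resolves these IDs is not shown in this commit. The following is a hypothetical `parseModelId` that splits on the first colon and accepts only the three providers used here; treat the provider list as an assumption, not cua-ai's supported set.

```typescript
// Hypothetical: only the providers that appear in this commit's model IDs.
const PROVIDERS = ['anthropic', 'openai', 'google'] as const;
type Provider = (typeof PROVIDERS)[number];

function parseModelId(id: string): { provider: Provider; model: string } {
  // Split on the first colon so model names may themselves contain colons.
  const sep = id.indexOf(':');
  if (sep <= 0 || sep === id.length - 1) {
    throw new Error(`Expected "provider:model", got "${id}"`);
  }
  const provider = id.slice(0, sep);
  if (!(PROVIDERS as readonly string[]).includes(provider)) {
    throw new Error(`Unknown provider "${provider}"`);
  }
  return { provider: provider as Provider, model: id.slice(sep + 1) };
}
```

An unprefixed ID such as the old `gemini-2.5-computer-use-preview-10-2025` is rejected by this sketch because it lacks the `provider:` part.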

pkg/templates/typescript/anthropic-computer-use/README.md

Lines changed: 18 additions & 5 deletions
@@ -1,8 +1,8 @@
 # Kernel TypeScript Sample App - Anthropic Computer Use
 
-This is a Kernel application that implements a prompt loop using Anthropic Computer Use with Kernel's Computer Controls API.
+This is a Kernel application that runs Anthropic Computer Use against a Kernel cloud browser.
 
-It generally follows the [Anthropic Reference Implementation](https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo) but uses Kernel's Computer Controls API instead of `xdotool` and `gnome-screenshot`.
+It uses [`@onkernel/cua-agent`](https://www.npmjs.com/package/@onkernel/cua-agent) to run the computer-use loop: the `CuaAgent` class translates Claude's computer-use tool calls into Kernel browser controls and feeds a fresh screenshot back on every turn. The app entry point just provisions a browser, hands it to `CuaAgent`, and returns the final answer.
 
 ## Setup
 
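The README summarizes what `CuaAgent` does each turn: take a fresh screenshot, let the model choose tool calls, and translate those calls into browser controls. The following shows only that control flow, using hypothetical provider-neutral `Model` and `BrowserControls` interfaces and a fake model in the usage; it is not the `@onkernel/cua-agent` implementation, which works with provider-specific tool formats and Kernel's computer controls.

```typescript
// Hypothetical shapes for illustration only; these are not @onkernel/cua-agent types.
type ToolCall =
  | { kind: 'click'; x: number; y: number }
  | { kind: 'type'; text: string };

interface ModelTurn {
  toolCalls: ToolCall[];
  text?: string;
}

interface Model {
  // Receives the latest screenshot plus a log of actions taken so far.
  next(screenshot: string, history: readonly string[]): Promise<ModelTurn>;
}

interface BrowserControls {
  click(x: number, y: number): Promise<void>;
  type(text: string): Promise<void>;
  screenshot(): Promise<string>;
}

async function runComputerUseLoop(
  model: Model,
  browser: BrowserControls,
  maxTurns = 20,
): Promise<string> {
  const history: string[] = [];
  for (let turn = 0; turn < maxTurns; turn++) {
    // A fresh screenshot is taken at the start of every turn.
    const screenshot = await browser.screenshot();
    const reply = await model.next(screenshot, history);
    // No tool calls means the model is done; its text is the final answer.
    if (reply.toolCalls.length === 0) return reply.text ?? '';
    // Translate each tool call into the matching browser control.
    for (const call of reply.toolCalls) {
      switch (call.kind) {
        case 'click':
          await browser.click(call.x, call.y);
          history.push(`click ${call.x},${call.y}`);
          break;
        case 'type':
          await browser.type(call.text);
          history.push(`type ${call.text}`);
          break;
      }
    }
  }
  throw new Error(`No final answer after ${maxTurns} turns`);
}
```

The `maxTurns` cap keeps a confused model from looping forever in this sketch; whether `CuaAgent` exposes a similar limit is not covered by this commit.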
@@ -35,13 +35,26 @@ kernel invoke ts-anthropic-cua cua-task --payload '{"query": "Navigate to https:
 
 When enabled, the response will include a `replay_url` field with a link to view the recorded session.
 
-## Known Limitations
+## Playwright escape hatch
 
-### Cursor Position
+Some steps are awkward as raw clicks and keystrokes — precise DOM reads, form fills, data extraction, or waiting on a selector. Pass `playwright: true` when constructing the agent in `index.ts` to add a `playwright_execute` tool that runs Playwright/TypeScript directly against the live browser session:
 
-The `cursor_position` action is not supported with Kernel's Computer Controls API. If the model attempts to use this action, an error will be returned. This is a known limitation that does not significantly impact most computer use workflows, as the model typically tracks cursor position through screenshots.
+```ts
+const agent = new CuaAgent({
+  browser: session.browser,
+  client: kernel,
+  playwright: true,
+  initialState: {
+    model: 'anthropic:claude-sonnet-4-6',
+    systemPrompt: SYSTEM_PROMPT,
+  },
+});
+```
+
+Inside `playwright_execute`, `page`, `context`, and `browser` are in scope and the code may `return` a JSON-serializable value. Each call runs in a fresh context (locals don't persist across calls), and no screenshot is returned automatically — the model can request one on a follow-up turn. See [`@onkernel/cua-agent`](https://www.npmjs.com/package/@onkernel/cua-agent) for details and per-model support status.
 
 ## Resources
 
+- [@onkernel/cua-agent](https://www.npmjs.com/package/@onkernel/cua-agent)
 - [Anthropic Computer Use Documentation](https://docs.anthropic.com/en/docs/build-with-claude/computer-use)
 - [Kernel Documentation](https://www.kernel.sh/docs/quickstart)
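The README states three properties of `playwright_execute`: `page`, `context`, and `browser` are in scope, the code may `return` a JSON-serializable value, and locals do not persist between calls. The following reproduces just those properties in plain Node with stub objects. `executeSnippet` is hypothetical and says nothing about how `@onkernel/cua-agent` actually runs the code; a real call would receive Playwright objects bound to the live Kernel browser.

```typescript
// Hypothetical sketch; not the @onkernel/cua-agent implementation.
type Scope = { page: unknown; context: unknown; browser: unknown };

// The AsyncFunction constructor (assumes an ES2017+ compile target) compiles a string into an async function.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor as new (
  ...params: string[]
) => (...args: unknown[]) => Promise<unknown>;

async function executeSnippet(code: string, scope: Scope): Promise<string | undefined> {
  // A new function is compiled per call, so `const`/`let` locals from earlier calls are gone.
  const fn = new AsyncFunction('page', 'context', 'browser', code);
  const value = await fn(scope.page, scope.context, scope.browser);
  // JSON.stringify throws on BigInt and circular structures, surfacing non-serializable returns.
  return value === undefined ? undefined : JSON.stringify(value);
}
```

Compiling each snippet into its own function is the simplest way to get per-call scoping in this sketch; any state meant to survive between calls would have to live on the objects passed in.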

pkg/templates/typescript/anthropic-computer-use/index.ts

Lines changed: 52 additions & 32 deletions
@@ -1,5 +1,6 @@
 import { Kernel, type KernelContext } from '@onkernel/sdk';
-import { samplingLoop } from './loop';
+import { CuaAgent } from '@onkernel/cua-agent';
+import type { AssistantMessage } from '@onkernel/cua-ai';
 import { KernelBrowserSession } from './session';
 
 const kernel = new Kernel();
@@ -16,11 +17,40 @@ interface QueryOutput {
   replay_url?: string;
 }
 
-// LLM API Keys are set in the environment during `kernel deploy <filename> -e ANTHROPIC_API_KEY=XXX`
-// See https://www.kernel.sh/docs/launch/deploy#environment-variables
-const ANTHROPIC_API_KEY = process.env.ANTHROPIC_API_KEY;
+const CURRENT_DATE = new Intl.DateTimeFormat('en-US', {
+  weekday: 'long',
+  month: 'long',
+  day: 'numeric',
+  year: 'numeric',
+}).format(new Date());
+
+// System prompt optimized for the Kernel cloud browser environment.
+const SYSTEM_PROMPT = `<SYSTEM_CAPABILITY>
+* You are utilising an Ubuntu virtual machine using ${process.arch} architecture with internet access.
+* When you connect to the display, CHROMIUM IS ALREADY OPEN. The url bar is not visible but it is there.
+* If you need to navigate to a new page, use ctrl+l to focus the url bar and then enter the url.
+* You won't be able to see the url bar from the screenshot but ctrl-l still works.
+* As the initial step click on the search bar.
+* When viewing a page it can be helpful to zoom out so that you can see everything on the page.
+* Either that, or make sure you scroll down to see everything before deciding something isn't available.
+* Scroll action: scroll_amount and the tool result are in wheel units (not pixels).
+* When using your computer function calls, they take a while to run and send back to you.
+* Where possible/feasible, try to chain multiple of these calls all into one function calls request.
+* The current date is ${CURRENT_DATE}.
+* After each step, take a screenshot and carefully evaluate if you have achieved the right outcome.
+* Explicitly show your thinking: "I have evaluated step X..." If not correct, try again.
+* Only when you confirm a step was executed correctly should you move on to the next one.
+</SYSTEM_CAPABILITY>
+
+<IMPORTANT>
+* When using Chromium, if a startup wizard appears, IGNORE IT. Do not even click "skip this step".
+* Instead, click on the search bar on the center of the screen where it says "Search or enter address", and enter the appropriate search term or URL there.
+</IMPORTANT>`;
 
-if (!ANTHROPIC_API_KEY) {
+// LLM API keys are set in the environment during `kernel deploy <filename> -e ANTHROPIC_API_KEY=XXX`.
+// See https://www.kernel.sh/docs/launch/deploy#environment-variables
+// CuaAgent reads ANTHROPIC_API_KEY (or ANTHROPIC_OAUTH_TOKEN) from the environment by default.
+if (!process.env.ANTHROPIC_API_KEY) {
   throw new Error('ANTHROPIC_API_KEY is not set');
 }
 
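The template builds `CURRENT_DATE` once, when the module is first evaluated, and formats it in the host's local time zone. If the runtime keeps a process warm across invocations (this commit does not say whether Kernel does), the date in the prompt could lag behind the real date. A minimal sketch of computing it per call with the same `Intl.DateTimeFormat` options; `formatPromptDate` and `buildSystemPrompt` are hypothetical helpers, the prompt is abbreviated to the date line, and the optional `timeZone` parameter exists only so the output can be pinned in tests.

```typescript
// Same formatting options as the template's CURRENT_DATE.
function formatPromptDate(now: Date, timeZone?: string): string {
  return new Intl.DateTimeFormat('en-US', {
    weekday: 'long',
    month: 'long',
    day: 'numeric',
    year: 'numeric',
    timeZone, // undefined falls back to the host's local time zone
  }).format(now);
}

// Hypothetical: build the prompt inside the action handler so each run sees today's date.
function buildSystemPrompt(now: Date = new Date(), timeZone?: string): string {
  return [
    '<SYSTEM_CAPABILITY>',
    `* The current date is ${formatPromptDate(now, timeZone)}.`,
    '</SYSTEM_CAPABILITY>',
  ].join('\n');
}

// Usage: 2025-01-15 12:00 UTC formats as "Wednesday, January 15, 2025".
console.log(buildSystemPrompt(new Date(Date.UTC(2025, 0, 15, 12)), 'UTC'));
```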
@@ -42,44 +72,34 @@ app.action<QueryInput, QueryOutput>(
     console.log('Kernel browser live view url:', session.liveViewUrl);
 
     try {
-      // Run the sampling loop
-      const finalMessages = await samplingLoop({
-        model: 'claude-sonnet-4-6',
-        messages: [{
-          role: 'user',
-          content: payload.query,
-        }],
-        apiKey: ANTHROPIC_API_KEY,
-        thinkingBudget: 1024,
-        kernel,
-        sessionId: session.sessionId,
+      const agent = new CuaAgent({
+        browser: session.browser,
+        client: kernel,
+        // Set to true to expose a playwright_execute tool for DOM reads, form fills, and selector waits.
+        playwright: false,
+        initialState: {
+          model: 'anthropic:claude-sonnet-4-6',
+          systemPrompt: SYSTEM_PROMPT,
+        },
       });
 
-      // Extract the final result from the messages
-      if (finalMessages.length === 0) {
-        throw new Error('No messages were generated during the sampling loop');
-      }
-
-      const lastMessage = finalMessages[finalMessages.length - 1];
-      if (!lastMessage) {
-        throw new Error('Failed to get the last message from the sampling loop');
-      }
+      await agent.prompt(payload.query);
 
-      const result = typeof lastMessage.content === 'string'
-        ? lastMessage.content
-        : lastMessage.content.map(block =>
-          block.type === 'text' ? block.text : ''
-        ).join('');
+      const lastAssistant = [...agent.state.messages]
+        .reverse()
+        .find((message): message is AssistantMessage => message.role === 'assistant');
+      const result = lastAssistant?.content
+        .flatMap((block) => (block.type === 'text' ? [block.text] : []))
+        .join('') ?? '';
 
-      // Stop session and get replay URL if recording was enabled
       const sessionInfo = await session.stop();
 
       return {
         result,
         replay_url: sessionInfo.replayViewUrl,
       };
     } catch (error) {
-      console.error('Error in sampling loop:', error);
+      console.error('Error running CUA task:', error);
       await session.stop();
       throw error;
     }
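The new result extraction scans `agent.state.messages` from the end for the last assistant message and concatenates its text blocks. A self-contained sketch of the same logic follows; `TextBlock`, `ToolCallBlock`, `UserMessage`, `AssistantMessage`, and `Message` here are hypothetical stand-ins for the `@onkernel/cua-ai` types, which are not shown in this commit and likely carry more fields. One behavioral difference is visible in the diff: the deleted code threw when the loop produced no messages at all, while the new expression falls back to an empty string.

```typescript
// Hypothetical stand-ins for the message shapes used in the diff.
type TextBlock = { type: 'text'; text: string };
type ToolCallBlock = { type: 'toolCall'; name: string };
type UserMessage = { role: 'user'; content: string };
type AssistantMessage = { role: 'assistant'; content: Array<TextBlock | ToolCallBlock> };
type Message = UserMessage | AssistantMessage;

function lastAssistantText(messages: readonly Message[]): string {
  // Copy before reversing so the caller's array is not mutated in place.
  const last = [...messages]
    .reverse()
    .find((m): m is AssistantMessage => m.role === 'assistant');
  // Keep only text blocks; tool-call blocks contribute nothing to the answer.
  return last?.content.flatMap((b) => (b.type === 'text' ? [b.text] : [])).join('') ?? '';
}
```

On targets whose `lib` includes ES2023, `messages.findLast(...)` gives the same result without the copy-and-reverse.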

pkg/templates/typescript/anthropic-computer-use/loop.ts

Lines changed: 0 additions & 218 deletions
This file was deleted.

pkg/templates/typescript/anthropic-computer-use/package.json

Lines changed: 3 additions & 2 deletions
@@ -4,8 +4,9 @@
   "type": "module",
   "private": true,
   "dependencies": {
-    "@anthropic-ai/sdk": "^0.71.2",
-    "@onkernel/sdk": "^0.35.0"
+    "@onkernel/cua-agent": "^0.3.4",
+    "@onkernel/cua-ai": "^0.3.1",
+    "@onkernel/sdk": "0.49.0"
   },
   "devDependencies": {
     "@types/node": "^22.15.17",
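Two details of this `package.json` change follow from npm's caret rules. For a `0.x` version, `^` only admits the same minor version, so the old `"^0.35.0"` could never resolve to `0.49.0`; the new bare `"0.49.0"` is an exact pin that will not float to a later patch. The following is a hypothetical, deliberately simplified `satisfies` implementing just those two rules; it is not npm's resolver and ignores prerelease tags and compound ranges.

```typescript
// Simplified npm-style range check for plain MAJOR.MINOR.PATCH versions only:
// no prerelease tags, no `||`, `~`, `>=`, or x-ranges. Use the `semver` package for real work.
type Version = [number, number, number];

function parseVersion(v: string): Version {
  const parts = v.split('.').map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`Unsupported version: ${v}`);
  }
  return [parts[0], parts[1], parts[2]] as Version;
}

function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

function satisfies(version: string, range: string): boolean {
  const v = parseVersion(version);
  // A bare version such as "0.49.0" is an exact pin.
  if (!range.startsWith('^')) return compareVersions(v, parseVersion(range)) === 0;
  const base = parseVersion(range.slice(1));
  // Caret ranges allow changes that do not modify the left-most non-zero component.
  const upper: Version =
    base[0] > 0 ? [base[0] + 1, 0, 0] : base[1] > 0 ? [0, base[1] + 1, 0] : [0, 0, base[2] + 1];
  return compareVersions(v, base) >= 0 && compareVersions(v, upper) < 0;
}
```

By these rules `^0.35.0` means `>=0.35.0 <0.36.0` and `^0.3.4` means `>=0.3.4 <0.4.0`, which is why bumping the SDK to match `@onkernel/cua-agent` required editing the range rather than just reinstalling.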
