Commit 7719c2e

dprevoznik and claude committed
Simplify TS computer-use templates with @onkernel/cua-agent
Replace the per-provider sampling loops and hand-written action adapters in the Anthropic, OpenAI, and Gemini TypeScript templates with the CuaAgent class from @onkernel/cua-agent. Each template now provisions a Kernel browser, hands it to CuaAgent, and returns the final answer, removing ~3500 lines of provider-specific tool translation and screenshot-loop code.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent c2d845a commit 7719c2e

34 files changed

Lines changed: 3780 additions & 3900 deletions

pkg/templates/typescript/anthropic-computer-use/README.md

Lines changed: 3 additions & 8 deletions
@@ -1,8 +1,8 @@
 # Kernel TypeScript Sample App - Anthropic Computer Use
 
-This is a Kernel application that implements a prompt loop using Anthropic Computer Use with Kernel's Computer Controls API.
+This is a Kernel application that runs Anthropic Computer Use against a Kernel cloud browser.
 
-It generally follows the [Anthropic Reference Implementation](https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo) but uses Kernel's Computer Controls API instead of `xdotool` and `gnome-screenshot`.
+It uses [`@onkernel/cua-agent`](https://www.npmjs.com/package/@onkernel/cua-agent) to run the computer-use loop: the `CuaAgent` class translates Claude's computer-use tool calls into Kernel browser controls and feeds a fresh screenshot back on every turn. The app entry point just provisions a browser, hands it to `CuaAgent`, and returns the final answer.
 
 ## Setup
 
@@ -35,13 +35,8 @@ kernel invoke ts-anthropic-cua cua-task --payload '{"query": "Navigate to https:
 
 When enabled, the response will include a `replay_url` field with a link to view the recorded session.
 
-## Known Limitations
-
-### Cursor Position
-
-The `cursor_position` action is not supported with Kernel's Computer Controls API. If the model attempts to use this action, an error will be returned. This is a known limitation that does not significantly impact most computer use workflows, as the model typically tracks cursor position through screenshots.
-
 ## Resources
 
+- [@onkernel/cua-agent](https://www.npmjs.com/package/@onkernel/cua-agent)
 - [Anthropic Computer Use Documentation](https://docs.anthropic.com/en/docs/build-with-claude/computer-use)
 - [Kernel Documentation](https://www.kernel.sh/docs/quickstart)
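
Reviewer note: the loop that the new README text attributes to `CuaAgent` (translate the model's tool calls into browser controls, then return a fresh screenshot each turn) can be pictured roughly as below. This is a conceptual sketch only. `Action`, `ModelTurn`, `Model`, `Browser`, and `runLoop` are hypothetical names invented for illustration and are not the `@onkernel/cua-agent` API, whose internals this commit does not show.

```typescript
// Conceptual sketch: hypothetical types, NOT the @onkernel/cua-agent API.
type Action =
  | { kind: 'click'; x: number; y: number }
  | { kind: 'type'; text: string };

// A model turn either requests more actions or ends with a final answer.
type ModelTurn =
  | { done: false; actions: Action[] }
  | { done: true; answer: string };

interface Model {
  // Receives the latest screenshot and decides what to do next.
  next(screenshot: string): Promise<ModelTurn>;
}

interface Browser {
  execute(action: Action): Promise<void>;
  screenshot(): Promise<string>;
}

async function runLoop(model: Model, browser: Browser, maxTurns = 20): Promise<string> {
  let screenshot = await browser.screenshot();
  for (let turn = 0; turn < maxTurns; turn++) {
    const result = await model.next(screenshot);
    if (result.done) {
      return result.answer;
    }
    // Translate each requested action into a browser control call.
    for (const action of result.actions) {
      await browser.execute(action);
    }
    // Feed a fresh screenshot back on every turn.
    screenshot = await browser.screenshot();
  }
  throw new Error(`No final answer after ${maxTurns} turns`);
}
```

The `maxTurns` cap is an assumption added so the sketch always terminates; whether the real agent enforces a similar limit is not stated in this commit.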

pkg/templates/typescript/anthropic-computer-use/index.ts

Lines changed: 50 additions & 32 deletions
@@ -1,5 +1,6 @@
 import { Kernel, type KernelContext } from '@onkernel/sdk';
-import { samplingLoop } from './loop';
+import { CuaAgent } from '@onkernel/cua-agent';
+import type { AssistantMessage } from '@onkernel/cua-ai';
 import { KernelBrowserSession } from './session';
 
 const kernel = new Kernel();
@@ -16,11 +17,40 @@ interface QueryOutput {
   replay_url?: string;
 }
 
-// LLM API Keys are set in the environment during `kernel deploy <filename> -e ANTHROPIC_API_KEY=XXX`
-// See https://www.kernel.sh/docs/launch/deploy#environment-variables
-const ANTHROPIC_API_KEY = process.env.ANTHROPIC_API_KEY;
+const CURRENT_DATE = new Intl.DateTimeFormat('en-US', {
+  weekday: 'long',
+  month: 'long',
+  day: 'numeric',
+  year: 'numeric',
+}).format(new Date());
+
+// System prompt optimized for the Kernel cloud browser environment.
+const SYSTEM_PROMPT = `<SYSTEM_CAPABILITY>
+* You are utilising an Ubuntu virtual machine using ${process.arch} architecture with internet access.
+* When you connect to the display, CHROMIUM IS ALREADY OPEN. The url bar is not visible but it is there.
+* If you need to navigate to a new page, use ctrl+l to focus the url bar and then enter the url.
+* You won't be able to see the url bar from the screenshot but ctrl-l still works.
+* As the initial step click on the search bar.
+* When viewing a page it can be helpful to zoom out so that you can see everything on the page.
+* Either that, or make sure you scroll down to see everything before deciding something isn't available.
+* Scroll action: scroll_amount and the tool result are in wheel units (not pixels).
+* When using your computer function calls, they take a while to run and send back to you.
+* Where possible/feasible, try to chain multiple of these calls all into one function calls request.
+* The current date is ${CURRENT_DATE}.
+* After each step, take a screenshot and carefully evaluate if you have achieved the right outcome.
+* Explicitly show your thinking: "I have evaluated step X..." If not correct, try again.
+* Only when you confirm a step was executed correctly should you move on to the next one.
+</SYSTEM_CAPABILITY>
+
+<IMPORTANT>
+* When using Chromium, if a startup wizard appears, IGNORE IT. Do not even click "skip this step".
+* Instead, click on the search bar on the center of the screen where it says "Search or enter address", and enter the appropriate search term or URL there.
+</IMPORTANT>`;
 
-if (!ANTHROPIC_API_KEY) {
+// LLM API keys are set in the environment during `kernel deploy <filename> -e ANTHROPIC_API_KEY=XXX`.
+// See https://www.kernel.sh/docs/launch/deploy#environment-variables
+// CuaAgent reads ANTHROPIC_API_KEY (or ANTHROPIC_OAUTH_TOKEN) from the environment by default.
+if (!process.env.ANTHROPIC_API_KEY) {
   throw new Error('ANTHROPIC_API_KEY is not set');
 }
 
@@ -42,44 +72,32 @@ app.action<QueryInput, QueryOutput>(
     console.log('Kernel browser live view url:', session.liveViewUrl);
 
     try {
-      // Run the sampling loop
-      const finalMessages = await samplingLoop({
-        model: 'claude-sonnet-4-6',
-        messages: [{
-          role: 'user',
-          content: payload.query,
-        }],
-        apiKey: ANTHROPIC_API_KEY,
-        thinkingBudget: 1024,
-        kernel,
-        sessionId: session.sessionId,
+      const agent = new CuaAgent({
+        browser: session.browser,
+        client: kernel,
+        initialState: {
+          model: 'anthropic:claude-sonnet-4-6',
+          systemPrompt: SYSTEM_PROMPT,
+        },
       });
 
-      // Extract the final result from the messages
-      if (finalMessages.length === 0) {
-        throw new Error('No messages were generated during the sampling loop');
-      }
-
-      const lastMessage = finalMessages[finalMessages.length - 1];
-      if (!lastMessage) {
-        throw new Error('Failed to get the last message from the sampling loop');
-      }
+      await agent.prompt(payload.query);
 
-      const result = typeof lastMessage.content === 'string'
-        ? lastMessage.content
-        : lastMessage.content.map(block =>
-          block.type === 'text' ? block.text : ''
-        ).join('');
+      const lastAssistant = [...agent.state.messages]
+        .reverse()
+        .find((message): message is AssistantMessage => message.role === 'assistant');
+      const result = lastAssistant?.content
+        .flatMap((block) => (block.type === 'text' ? [block.text] : []))
+        .join('') ?? '';
 
-      // Stop session and get replay URL if recording was enabled
       const sessionInfo = await session.stop();
 
       return {
         result,
         replay_url: sessionInfo.replayViewUrl,
       };
     } catch (error) {
-      console.error('Error in sampling loop:', error);
+      console.error('Error running CUA task:', error);
       await session.stop();
       throw error;
     }
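
Reviewer note: the new result extraction in the last hunk (find the most recent assistant message, keep only its text blocks) can be exercised in isolation with a sketch like the one below. The message and block types are simplified, hypothetical stand-ins; the real `AssistantMessage` type comes from `@onkernel/cua-ai` and may carry more fields and block kinds. `finalAssistantText` is an illustrative helper name, not part of this commit.

```typescript
// Hypothetical, simplified message shapes for illustration only.
type TextBlock = { type: 'text'; text: string };
type ToolCallBlock = { type: 'tool_call'; name: string };
type AssistantMessage = { role: 'assistant'; content: Array<TextBlock | ToolCallBlock> };
type UserMessage = { role: 'user'; content: string };
type Message = AssistantMessage | UserMessage;

function finalAssistantText(messages: readonly Message[]): string {
  // Copy before reversing so the caller's array is not mutated.
  const lastAssistant = [...messages]
    .reverse()
    .find((message): message is AssistantMessage => message.role === 'assistant');

  // Keep only text blocks; fall back to '' when there is no assistant message.
  return (
    lastAssistant?.content
      .flatMap((block) => (block.type === 'text' ? [block.text] : []))
      .join('') ?? ''
  );
}
```

One behavioral difference from the removed code is visible here: the old path threw when no messages were produced, whereas this pattern returns an empty string when there is no assistant message or when the last assistant message contains no text blocks.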

pkg/templates/typescript/anthropic-computer-use/loop.ts

Lines changed: 0 additions & 218 deletions
This file was deleted.

pkg/templates/typescript/anthropic-computer-use/package.json

Lines changed: 3 additions & 2 deletions
@@ -4,8 +4,9 @@
   "type": "module",
   "private": true,
   "dependencies": {
-    "@anthropic-ai/sdk": "^0.71.2",
-    "@onkernel/sdk": "^0.35.0"
+    "@onkernel/cua-agent": "^0.3.4",
+    "@onkernel/cua-ai": "^0.3.1",
+    "@onkernel/sdk": "0.49.0"
   },
   "devDependencies": {
     "@types/node": "^22.15.17",