Simplify TS computer-use templates with @onkernel/cua-agent
Replace the per-provider sampling loops and hand-written action
adapters in the Anthropic, OpenAI, and Gemini TypeScript templates
with the CuaAgent class from @onkernel/cua-agent. Each template now
provisions a Kernel browser, hands it to CuaAgent, and returns the
final answer, removing ~3500 lines of provider-specific tool
translation and screenshot-loop code.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
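For readers who have not opened the templates, the flow described in the commit message (provision a browser, hand it to the agent, return the final answer) can be sketched roughly as follows. This is a minimal sketch, not the template's actual code: `BrowserSession`, `Agent`, the factory types, and `runTask` are hypothetical names invented for illustration, and the real `CuaAgent` constructor and methods in `@onkernel/cua-agent` may look different. Closing the browser in `finally` is also an assumption about sensible cleanup, not something the commit states.

```typescript
// Hypothetical shapes for illustration only; these are NOT the real
// Kernel SDK or @onkernel/cua-agent types.
interface BrowserSession {
  sessionId: string;
  close(): Promise<void>;
}

interface Agent {
  // Runs the computer-use loop and resolves with the final answer.
  run(task: string): Promise<string>;
}

type BrowserFactory = () => Promise<BrowserSession>;
type AgentFactory = (browser: BrowserSession) => Agent;

// Provision a browser, hand it to the agent, and return the final answer.
// The browser is released even if the agent throws (assumed cleanup policy).
async function runTask(
  task: string,
  createBrowser: BrowserFactory,
  createAgent: AgentFactory,
): Promise<{ result: string }> {
  const browser = await createBrowser();
  try {
    const agent = createAgent(browser);
    const result = await agent.run(task);
    return { result };
  } finally {
    await browser.close();
  }
}
```

Passing the factories in as parameters keeps the sketch independent of any provider, which mirrors the commit's goal of moving provider-specific code out of the template.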
pkg/templates/typescript/anthropic-computer-use/README.md
3 additions & 8 deletions
@@ -1,8 +1,8 @@
 # Kernel TypeScript Sample App - Anthropic Computer Use

-This is a Kernel application that implements a prompt loop using Anthropic Computer Use with Kernel's Computer Controls API.
+This is a Kernel application that runs Anthropic Computer Use against a Kernel cloud browser.

-It generally follows the [Anthropic Reference Implementation](https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo) but uses Kernel's Computer Controls API instead of `xdotool`and `gnome-screenshot`.
+It uses [`@onkernel/cua-agent`](https://www.npmjs.com/package/@onkernel/cua-agent) to run the computer-use loop: the `CuaAgent` class translates Claude's computer-use tool calls into Kernel browser controls and feeds a fresh screenshot back on every turn. The app entry point just provisions a browser, hands it to `CuaAgent`, and returns the final answer.
 When enabled, the response will include a `replay_url` field with a link to view the recorded session.

-## Known Limitations
-
-### Cursor Position
-
-The `cursor_position` action is not supported with Kernel's Computer Controls API. If the model attempts to use this action, an error will be returned. This is a known limitation that does not significantly impact most computer use workflows, as the model typically tracks cursor position through screenshots.
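As background for the README changes above, the kind of tool-call translation the text describes, including the removed section's behaviour of returning an error for `cursor_position`, might look like the sketch below. The `ToolAction` union is a simplified, hypothetical subset of the computer-use actions, and `translateAction` and `ToolResult` are invented names. It only produces log strings instead of calling any Kernel API, and it makes no claim about how `CuaAgent` handles these actions internally.

```typescript
// Simplified, hypothetical action shapes for illustration only.
type ToolAction =
  | { action: "left_click"; coordinate: [number, number] }
  | { action: "type"; text: string }
  | { action: "scroll"; coordinate: [number, number]; scroll_amount: number }
  | { action: "cursor_position" };

type ToolResult = { ok: true; log: string } | { ok: false; error: string };

// Map one model tool call to a browser control (represented here by a log
// string), or to an explicit error when the action is unsupported.
function translateAction(a: ToolAction): ToolResult {
  switch (a.action) {
    case "left_click":
      return { ok: true, log: `click at (${a.coordinate[0]}, ${a.coordinate[1]})` };
    case "type":
      return { ok: true, log: `type ${JSON.stringify(a.text)}` };
    case "scroll":
      // scroll_amount is treated as wheel units, not pixels.
      return { ok: true, log: `scroll ${a.scroll_amount} wheel units` };
    case "cursor_position":
      // Return an error to the model rather than throwing, so the loop can continue.
      return { ok: false, error: "cursor_position is not supported" };
  }
}
```

Returning the error as a tool result, rather than throwing, lets the model see the failure and fall back to reading the cursor position from the next screenshot.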
// System prompt optimized for the Kernel cloud browser environment.
+const SYSTEM_PROMPT = `<SYSTEM_CAPABILITY>
+* You are utilising an Ubuntu virtual machine using ${process.arch} architecture with internet access.
+* When you connect to the display, CHROMIUM IS ALREADY OPEN. The url bar is not visible but it is there.
+* If you need to navigate to a new page, use ctrl+l to focus the url bar and then enter the url.
+* You won't be able to see the url bar from the screenshot but ctrl-l still works.
+* As the initial step click on the search bar.
+* When viewing a page it can be helpful to zoom out so that you can see everything on the page.
+* Either that, or make sure you scroll down to see everything before deciding something isn't available.
+* Scroll action: scroll_amount and the tool result are in wheel units (not pixels).
+* When using your computer function calls, they take a while to run and send back to you.
+* Where possible/feasible, try to chain multiple of these calls all into one function calls request.
+* The current date is ${CURRENT_DATE}.
+* After each step, take a screenshot and carefully evaluate if you have achieved the right outcome.
+* Explicitly show your thinking: "I have evaluated step X..." If not correct, try again.
+* Only when you confirm a step was executed correctly should you move on to the next one.
+</SYSTEM_CAPABILITY>
+
+<IMPORTANT>
+* When using Chromium, if a startup wizard appears, IGNORE IT. Do not even click "skip this step".
+* Instead, click on the search bar on the center of the screen where it says "Search or enter address", and enter the appropriate search term or URL there.
+</IMPORTANT>`;

-if (!ANTHROPIC_API_KEY) {
+// LLM API keys are set in the environment during `kernel deploy <filename> -e ANTHROPIC_API_KEY=XXX`.
+// See https://www.kernel.sh/docs/launch/deploy#environment-variables
+// CuaAgent reads ANTHROPIC_API_KEY (or ANTHROPIC_OAUTH_TOKEN) from the environment by default.
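The new comments say the credential lookup now happens inside `CuaAgent` rather than in the template. A rough sketch of what such a lookup could look like is below. `resolveAnthropicCredential`, `Env`, and `Credential` are hypothetical names, and the precedence shown (API key first, then OAuth token) is an assumption; the diff does not say which variable wins when both are set.

```typescript
// Hypothetical helper; the real lookup lives inside CuaAgent and may differ.
type Env = Record<string, string | undefined>;

type Credential =
  | { kind: "api_key"; value: string }
  | { kind: "oauth_token"; value: string };

function resolveAnthropicCredential(env: Env): Credential {
  // Assumed precedence: prefer the API key, fall back to the OAuth token.
  const apiKey = env["ANTHROPIC_API_KEY"]?.trim();
  if (apiKey) return { kind: "api_key", value: apiKey };

  const token = env["ANTHROPIC_OAUTH_TOKEN"]?.trim();
  if (token) return { kind: "oauth_token", value: token };

  // Fail fast with an actionable message instead of failing on the first API call.
  throw new Error(
    "Set ANTHROPIC_API_KEY or ANTHROPIC_OAUTH_TOKEN, e.g. via `kernel deploy <filename> -e ANTHROPIC_API_KEY=XXX`.",
  );
}
```

In a deployed app the argument would be `process.env`; taking the environment as a parameter keeps the helper easy to test.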