
Commit 2fba168

docs: add Use cases section with Linux Desktop page (#91)
* docs: add Use cases section with Linux Desktop page
  - Add new "Use cases" navigation group between Getting Started and Code Interpreting
  - Add Linux Desktop page documenting AI-powered desktop control with E2B sandboxes
  - Links to E2B Surf project and live demo
* fix: correct broken link to template docs
* docs: improve Linux Desktop and Desktop Template documentation
  - Add explanatory prose before code blocks per documentation standards
  - Fix TypeScript code block labels to use "JavaScript & TypeScript"
  - Add external links for sharp, SSE, noVNC, xdotool, x11vnc, and Xvfb
  - Fix GitHub link to point to Surf repo instead of cookbook
  - Add introduction and section headers to desktop template page
* fix: remove non-existent SSEEventType import from Linux Desktop guide
  The page.tsx code example imported SSEEventType from @/types, but this type was never defined in the types/index.ts file and wasn't used in the component. This caused TypeScript compilation errors when following the guide.
* docs: add missing links and fix consistency in Linux Desktop docs
  - Link unexplained terms (XFCE, VNC, Next.js App Router, OpenAI Computer Use API, scrot)
  - Remove unnecessary CodeGroup wrapper from project structure
  - Add missing xorg package to Python template to match TypeScript version
* docs: rename Linux Desktop use case to Computer Use
  Reframe the use case around what the AI agent does (computer use) rather than the underlying infrastructure (Linux desktop).
* docs: rewrite Computer Use page as concise use-case overview
  Replace 1865-line step-by-step tutorial with a focused ~240-line page that shows core E2B Desktop SDK patterns (sandbox creation, screenshots, desktop actions, agent loop) and links to E2B Surf for the full project. Code examples adapted from the actual Surf implementation with both TypeScript and Python variants.
* docs: address PR review comments on Computer Use page
  - Reorder steps: sandbox creation before user command
  - Link E2B Desktop SDK name to GitHub repo instead of npm
* fix: correct step order and broken E2B Desktop repo link
1 parent 3d1cb8f commit 2fba168

3 files changed

Lines changed: 267 additions & 0 deletions

docs.json

Lines changed: 6 additions & 0 deletions
```diff
@@ -42,6 +42,12 @@
       "docs/billing"
     ]
   },
+  {
+    "group": "Use cases",
+    "pages": [
+      "docs/use-cases/computer-use"
+    ]
+  },
   {
     "group": "Code Interpreting",
     "pages": [
```

docs/template/examples/desktop.mdx

Lines changed: 18 additions & 0 deletions
````diff
@@ -3,6 +3,17 @@ title: "Desktop"
 description: "Sandbox with Ubuntu Desktop and VNC access"
 ---
 
+This template creates a sandbox with a full Ubuntu 22.04 desktop environment, including the XFCE desktop, common applications, and VNC streaming for remote access. It's ideal for building AI agents that need to interact with graphical user interfaces.
+
+The template includes:
+- **Ubuntu 22.04** with XFCE desktop environment
+- **VNC streaming** via [noVNC](https://novnc.com/) for browser-based access
+- **Pre-installed applications**: LibreOffice, text editors, file manager, and common utilities
+- **Automation tools**: [xdotool](https://github.com/jordansissel/xdotool) and [scrot](https://github.com/resurrecting-open-source-projects/scrot) for programmatic desktop control
+
+## Template Definition
+
+The template installs the desktop environment, sets up VNC streaming via [x11vnc](https://github.com/LibVNC/x11vnc) and noVNC, and configures a startup script.
 
 <CodeGroup>
 
@@ -79,6 +90,7 @@ template = (
     "apt-get update",
     "apt-get install -y \
         xserver-xorg \
+        xorg \
         x11-xserver-utils \
         xvfb \
         x11-utils \
@@ -131,6 +143,9 @@ template = (
 
 </CodeGroup>
 
+## Startup Script
+
+The startup script initializes the virtual display using [Xvfb](https://www.x.org/releases/X11R7.6/doc/man/man1/Xvfb.1.xhtml) (X Virtual Framebuffer), launches the XFCE desktop session, starts the VNC server, and exposes the desktop via noVNC on port 6080. This script runs automatically when the sandbox starts.
 
 ```bash start_command.sh
 #!/bin/bash
@@ -156,6 +171,9 @@ cd /opt/noVNC/utils && ./novnc_proxy --vnc localhost:5900 --listen 6080 --web /o
 sleep 2
 ```
 
+## Building the Template
+
+Build the template with increased CPU and memory allocation to handle the desktop environment installation. The build process may take several minutes due to the size of the packages being installed.
 
 <CodeGroup>
 
````
docs/use-cases/computer-use.mdx

Lines changed: 243 additions & 0 deletions

---
title: "Computer Use"
description: "Build AI agents that see, understand, and control virtual Linux desktops using E2B Desktop sandboxes."
icon: "desktop"
---

Computer use agents interact with graphical desktops the same way a human would — viewing the screen, clicking, typing, and scrolling. E2B provides the sandboxed desktop environment where these agents operate safely, with [VNC](https://en.wikipedia.org/wiki/Virtual_Network_Computing) streaming for real-time visual feedback.

For a complete working implementation, see [E2B Surf](https://github.com/e2b-dev/surf) — an open-source computer use agent you can try via the [live demo](https://surf.e2b.dev).

## How It Works

The computer use agent loop follows this pattern:

1. **User sends a command** — e.g., "Open Firefox and search for AI news"
2. **Agent creates a desktop sandbox** — an Ubuntu 22.04 environment with [XFCE](https://xfce.org/) desktop and pre-installed applications
3. **Agent takes a screenshot** — captures the current desktop state via the E2B Desktop SDK
4. **LLM analyzes the screenshot** — a vision model (e.g., [OpenAI Computer Use API](https://platform.openai.com/docs/guides/computer-use)) decides what action to take
5. **Action is executed** — click, type, scroll, or keypress via the E2B Desktop SDK
6. **Repeat** — a new screenshot is taken and sent back to the LLM until the task is complete
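
The steps above can be sketched as a stub-only loop. Everything here is a placeholder invented for illustration: `FakeSandbox` and `fake_llm` stand in for the E2B sandbox and the vision model, and are not E2B or provider APIs.

```python
# Schematic of the loop above with stubbed pieces.
class FakeSandbox:
    def screenshot(self) -> bytes:
        return b"png-bytes"  # a real sandbox returns PNG image bytes

    def left_click(self, x: int, y: int) -> None:
        print(f"click at ({x}, {y})")

def fake_llm(screenshot: bytes, step: int):
    # Pretend the model asks for one click, then reports the task done.
    return {"type": "click", "x": 100, "y": 200} if step == 0 else None

sandbox = FakeSandbox()
step = 0
while True:
    shot = sandbox.screenshot()    # capture the desktop state
    action = fake_llm(shot, step)  # model picks the next action
    if action is None:             # model signals the task is complete
        break
    if action["type"] == "click":  # execute the action
        sandbox.left_click(action["x"], action["y"])
    step += 1                      # repeat with a fresh screenshot
```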

## Install the E2B Desktop SDK

The [E2B Desktop](https://github.com/e2b-dev/desktop) SDK gives your agent a full Linux desktop with mouse, keyboard, and screen capture APIs.

<CodeGroup>
```bash JavaScript & TypeScript
npm i @e2b/desktop
```
```bash Python
pip install e2b-desktop
```
</CodeGroup>

## Core Implementation

The following snippets are adapted from [E2B Surf](https://github.com/e2b-dev/surf).

### Setting up the sandbox

Create a desktop sandbox and start VNC streaming so you can view the desktop in a browser.

<CodeGroup>
```typescript JavaScript & TypeScript
import { Sandbox } from '@e2b/desktop'

// Create a desktop sandbox with a 5-minute timeout
const sandbox = await Sandbox.create({
  resolution: [1024, 720],
  dpi: 96,
  timeoutMs: 300_000,
})

// Start VNC streaming for browser-based viewing
await sandbox.stream.start()
const streamUrl = sandbox.stream.getUrl()
console.log('View desktop at:', streamUrl)
```
```python Python
from e2b_desktop import Sandbox

# Create a desktop sandbox with a 5-minute timeout
sandbox = Sandbox.create(
    resolution=(1024, 720),
    dpi=96,
    timeout=300,
)

# Start VNC streaming for browser-based viewing
sandbox.stream.start()
stream_url = sandbox.stream.get_url()
print("View desktop at:", stream_url)
```
</CodeGroup>

### Executing desktop actions

The E2B Desktop SDK maps directly to mouse and keyboard actions. Here's how Surf translates LLM-returned actions into desktop interactions.

<CodeGroup>
```typescript JavaScript & TypeScript
import { Sandbox } from '@e2b/desktop'

const sandbox = await Sandbox.create({ timeoutMs: 300_000 })

// Mouse actions
await sandbox.leftClick(500, 300)
await sandbox.rightClick(500, 300)
await sandbox.doubleClick(500, 300)
await sandbox.middleClick(500, 300)
await sandbox.moveMouse(500, 300)
await sandbox.drag([100, 200], [400, 500])

// Keyboard actions
await sandbox.write('Hello, world!') // Type text
await sandbox.press('Enter') // Press a key

// Scrolling
await sandbox.scroll('down', 3) // Scroll down 3 ticks
await sandbox.scroll('up', 3) // Scroll up 3 ticks

// Screenshots
const screenshot = await sandbox.screenshot() // Returns Buffer

// Run terminal commands
await sandbox.commands.run('ls -la /home')
```
```python Python
from e2b_desktop import Sandbox

sandbox = Sandbox.create(timeout=300)

# Mouse actions
sandbox.left_click(500, 300)
sandbox.right_click(500, 300)
sandbox.double_click(500, 300)
sandbox.middle_click(500, 300)
sandbox.move_mouse(500, 300)
sandbox.drag([100, 200], [400, 500])

# Keyboard actions
sandbox.write("Hello, world!")  # Type text
sandbox.press("Enter")  # Press a key

# Scrolling
sandbox.scroll("down", 3)  # Scroll down 3 ticks
sandbox.scroll("up", 3)  # Scroll up 3 ticks

# Screenshots
screenshot = sandbox.screenshot()  # Returns bytes

# Run terminal commands
sandbox.commands.run("ls -la /home")
```
</CodeGroup>
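
Vision APIs generally expect images as base64 text rather than raw bytes, so the screenshot usually needs one encoding step before it reaches the model. A small helper along these lines can bridge the two; the `encode_screenshot` name and the data-URL wrapping are illustrative, not part of the E2B SDK.

```python
import base64

def encode_screenshot(png_bytes: bytes) -> str:
    """Base64-encode raw screenshot bytes for a vision-model payload."""
    return base64.b64encode(png_bytes).decode("ascii")

# In real code the bytes would come from sandbox.screenshot();
# here we use the PNG magic-number prefix as stand-in data.
data_url = "data:image/png;base64," + encode_screenshot(b"\x89PNG\r\n\x1a\n")
```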

### Agent loop

The core loop takes screenshots, sends them to an LLM, and executes the returned actions on the desktop. This is a simplified version of how [Surf](https://github.com/e2b-dev/surf) drives the computer use cycle.

<CodeGroup>
```typescript JavaScript & TypeScript
import { Sandbox } from '@e2b/desktop'

const sandbox = await Sandbox.create({
  resolution: [1024, 720],
  timeoutMs: 300_000,
})
await sandbox.stream.start()

while (true) {
  // 1. Capture the current desktop state
  const screenshot = await sandbox.screenshot()

  // 2. Send screenshot to your LLM and get the next action
  // (use OpenAI Computer Use, Anthropic Claude, etc.)
  const action = await getNextActionFromLLM(screenshot)

  if (!action) break // LLM signals task is complete

  // 3. Execute the action on the desktop
  switch (action.type) {
    case 'click':
      await sandbox.leftClick(action.x, action.y)
      break
    case 'type':
      await sandbox.write(action.text)
      break
    case 'keypress':
      await sandbox.press(action.keys)
      break
    case 'scroll':
      await sandbox.scroll(
        action.scrollY < 0 ? 'up' : 'down',
        Math.abs(action.scrollY)
      )
      break
    case 'drag':
      await sandbox.drag(
        [action.startX, action.startY],
        [action.endX, action.endY]
      )
      break
  }
}

await sandbox.kill()
```
```python Python
from e2b_desktop import Sandbox

sandbox = Sandbox.create(
    resolution=(1024, 720),
    timeout=300,
)
sandbox.stream.start()

while True:
    # 1. Capture the current desktop state
    screenshot = sandbox.screenshot()

    # 2. Send screenshot to your LLM and get the next action
    # (use OpenAI Computer Use, Anthropic Claude, etc.)
    action = get_next_action_from_llm(screenshot)

    if not action:
        break  # LLM signals task is complete

    # 3. Execute the action on the desktop
    if action.type == "click":
        sandbox.left_click(action.x, action.y)
    elif action.type == "type":
        sandbox.write(action.text)
    elif action.type == "keypress":
        sandbox.press(action.keys)
    elif action.type == "scroll":
        direction = "up" if action.scroll_y < 0 else "down"
        sandbox.scroll(direction, abs(action.scroll_y))
    elif action.type == "drag":
        sandbox.drag(
            [action.start_x, action.start_y],
            [action.end_x, action.end_y],
        )

sandbox.kill()
```
</CodeGroup>

The `getNextActionFromLLM` / `get_next_action_from_llm` function is where you integrate your chosen LLM. See [Connect LLMs to E2B](/docs/quickstart/connect-llms) for integration patterns with OpenAI, Anthropic, and other providers.
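
If your model returns actions as JSON tool calls, the bridge between the LLM and the loop can be a small parser. Below is a minimal sketch under an assumed schema: the `Action` class and its field names are our own illustration, not an E2B SDK or provider type.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Action:
    """Illustrative action shape; field names are assumptions, not an SDK type."""
    type: str
    x: int = 0
    y: int = 0
    text: str = ""
    keys: str = ""
    scroll_y: int = 0

def parse_action(raw: Optional[dict]) -> Optional[Action]:
    """Turn a model's JSON action into an Action, or None when the task is done."""
    if not raw or raw.get("type") == "done":
        return None  # signals the loop to stop
    fields = {k: v for k, v in raw.items() if k != "type"}
    return Action(type=raw["type"], **fields)

# e.g. a click returned by the model:
action = parse_action({"type": "click", "x": 500, "y": 300})
```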

## Related Guides

<CardGroup cols={3}>
  <Card title="Desktop Template" icon="desktop" href="/docs/template/examples/desktop">
    Build desktop sandboxes with Ubuntu, XFCE, and VNC streaming
  </Card>
  <Card title="Connect LLMs" icon="brain" href="/docs/quickstart/connect-llms">
    Integrate AI models with sandboxes using tool calling
  </Card>
  <Card title="Sandbox Lifecycle" icon="rotate" href="/docs/sandbox">
    Create, manage, and control sandbox lifecycle
  </Card>
</CardGroup>
