
Commit 2fba168

docs: add Use cases section with Linux Desktop page (#91)
* docs: add Use cases section with Linux Desktop page
  - Add new "Use cases" navigation group between Getting Started and Code Interpreting
  - Add Linux Desktop page documenting AI-powered desktop control with E2B sandboxes
  - Links to E2B Surf project and live demo
* fix: correct broken link to template docs
* docs: improve Linux Desktop and Desktop Template documentation
  - Add explanatory prose before code blocks per documentation standards
  - Fix TypeScript code block labels to use "JavaScript & TypeScript"
  - Add external links for sharp, SSE, noVNC, xdotool, x11vnc, and Xvfb
  - Fix GitHub link to point to Surf repo instead of cookbook
  - Add introduction and section headers to desktop template page
* fix: remove non-existent SSEEventType import from Linux Desktop guide
  The page.tsx code example imported SSEEventType from @/types, but this type was never defined in the types/index.ts file and wasn't used in the component. This caused TypeScript compilation errors when following the guide.
* docs: add missing links and fix consistency in Linux Desktop docs
  - Link unexplained terms (XFCE, VNC, Next.js App Router, OpenAI Computer Use API, scrot)
  - Remove unnecessary CodeGroup wrapper from project structure
  - Add missing xorg package to Python template to match TypeScript version
* docs: rename Linux Desktop use case to Computer Use
  Reframe the use case around what the AI agent does (computer use) rather than the underlying infrastructure (Linux desktop).
* docs: rewrite Computer Use page as concise use-case overview
  Replace 1865-line step-by-step tutorial with a focused ~240-line page that shows core E2B Desktop SDK patterns (sandbox creation, screenshots, desktop actions, agent loop) and links to E2B Surf for the full project. Code examples adapted from the actual Surf implementation with both TypeScript and Python variants.
* docs: address PR review comments on Computer Use page
  - Reorder steps: sandbox creation before user command
  - Link E2B Desktop SDK name to GitHub repo instead of npm
* fix: correct step order and broken E2B Desktop repo link
1 parent 3d1cb8f commit 2fba168

3 files changed

Lines changed: 267 additions & 0 deletions

docs.json

Lines changed: 6 additions & 0 deletions
```diff
@@ -42,6 +42,12 @@
       "docs/billing"
     ]
   },
+  {
+    "group": "Use cases",
+    "pages": [
+      "docs/use-cases/computer-use"
+    ]
+  },
   {
     "group": "Code Interpreting",
     "pages": [
```

docs/template/examples/desktop.mdx

Lines changed: 18 additions & 0 deletions
````diff
@@ -3,6 +3,17 @@ title: "Desktop"
 description: "Sandbox with Ubuntu Desktop and VNC access"
 ---
 
+This template creates a sandbox with a full Ubuntu 22.04 desktop environment, including the XFCE desktop, common applications, and VNC streaming for remote access. It's ideal for building AI agents that need to interact with graphical user interfaces.
+
+The template includes:
+- **Ubuntu 22.04** with XFCE desktop environment
+- **VNC streaming** via [noVNC](https://novnc.com/) for browser-based access
+- **Pre-installed applications**: LibreOffice, text editors, file manager, and common utilities
+- **Automation tools**: [xdotool](https://github.com/jordansissel/xdotool) and [scrot](https://github.com/resurrecting-open-source-projects/scrot) for programmatic desktop control
+
+## Template Definition
+
+The template installs the desktop environment, sets up VNC streaming via [x11vnc](https://github.com/LibVNC/x11vnc) and noVNC, and configures a startup script.
 
 <CodeGroup>
 
@@ -79,6 +90,7 @@ template = (
     "apt-get update",
     "apt-get install -y \
         xserver-xorg \
+        xorg \
         x11-xserver-utils \
         xvfb \
         x11-utils \
@@ -131,6 +143,9 @@ template = (
 
 </CodeGroup>
 
+## Startup Script
+
+The startup script initializes the virtual display using [Xvfb](https://www.x.org/releases/X11R7.6/doc/man/man1/Xvfb.1.xhtml) (X Virtual Framebuffer), launches the XFCE desktop session, starts the VNC server, and exposes the desktop via noVNC on port 6080. This script runs automatically when the sandbox starts.
 
 ```bash start_command.sh
 #!/bin/bash
@@ -156,6 +171,9 @@ cd /opt/noVNC/utils && ./novnc_proxy --vnc localhost:5900 --listen 6080 --web /o
 sleep 2
 ```
 
+## Building the Template
+
+Build the template with increased CPU and memory allocation to handle the desktop environment installation. The build process may take several minutes due to the size of the packages being installed.
 
 <CodeGroup>
 
````
docs/use-cases/computer-use.mdx

Lines changed: 243 additions & 0 deletions

---
title: "Computer Use"
description: "Build AI agents that see, understand, and control virtual Linux desktops using E2B Desktop sandboxes."
icon: "desktop"
---

Computer use agents interact with graphical desktops the same way a human would — viewing the screen, clicking, typing, and scrolling. E2B provides the sandboxed desktop environment where these agents operate safely, with [VNC](https://en.wikipedia.org/wiki/Virtual_Network_Computing) streaming for real-time visual feedback.

For a complete working implementation, see [E2B Surf](https://github.com/e2b-dev/surf) — an open-source computer use agent you can try via the [live demo](https://surf.e2b.dev).

## How It Works

The computer use agent loop follows this pattern:

1. **User sends a command** — e.g., "Open Firefox and search for AI news"
2. **Agent creates a desktop sandbox** — an Ubuntu 22.04 environment with [XFCE](https://xfce.org/) desktop and pre-installed applications
3. **Agent takes a screenshot** — captures the current desktop state via the E2B Desktop SDK
4. **LLM analyzes the screenshot** — a vision model (e.g., [OpenAI Computer Use API](https://platform.openai.com/docs/guides/computer-use)) decides what action to take
5. **Action is executed** — click, type, scroll, or keypress via the E2B Desktop SDK
6. **Repeat** — a new screenshot is taken and sent back to the LLM until the task is complete
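
The steps above can be sketched as a stub-only loop. Everything here is a placeholder invented for illustration: `FakeSandbox` and `fake_llm` stand in for the E2B sandbox and the vision model, and are not E2B or provider APIs.

```python
# Schematic of the loop above with stubbed pieces.
class FakeSandbox:
    def screenshot(self) -> bytes:
        return b"png-bytes"  # a real sandbox returns PNG image bytes

    def left_click(self, x: int, y: int) -> None:
        print(f"click at ({x}, {y})")

def fake_llm(screenshot: bytes, step: int):
    # Pretend the model asks for one click, then reports the task done.
    return {"type": "click", "x": 100, "y": 200} if step == 0 else None

sandbox = FakeSandbox()
step = 0
while True:
    shot = sandbox.screenshot()    # capture the desktop state
    action = fake_llm(shot, step)  # model picks the next action
    if action is None:             # model signals the task is complete
        break
    if action["type"] == "click":  # execute the action
        sandbox.left_click(action["x"], action["y"])
    step += 1                      # repeat with a fresh screenshot
```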

## Install the E2B Desktop SDK

The [E2B Desktop](https://github.com/e2b-dev/desktop) SDK gives your agent a full Linux desktop with mouse, keyboard, and screen capture APIs.

<CodeGroup>
```bash JavaScript & TypeScript
npm i @e2b/desktop
```
```bash Python
pip install e2b-desktop
```
</CodeGroup>

## Core Implementation

The following snippets are adapted from [E2B Surf](https://github.com/e2b-dev/surf).

### Setting up the sandbox

Create a desktop sandbox and start VNC streaming so you can view the desktop in a browser.

<CodeGroup>
```typescript JavaScript & TypeScript
import { Sandbox } from '@e2b/desktop'

// Create a desktop sandbox with a 5-minute timeout
const sandbox = await Sandbox.create({
  resolution: [1024, 720],
  dpi: 96,
  timeoutMs: 300_000,
})

// Start VNC streaming for browser-based viewing
await sandbox.stream.start()
const streamUrl = sandbox.stream.getUrl()
console.log('View desktop at:', streamUrl)
```
```python Python
from e2b_desktop import Sandbox

# Create a desktop sandbox with a 5-minute timeout
sandbox = Sandbox.create(
    resolution=(1024, 720),
    dpi=96,
    timeout=300,
)

# Start VNC streaming for browser-based viewing
sandbox.stream.start()
stream_url = sandbox.stream.get_url()
print("View desktop at:", stream_url)
```
</CodeGroup>

### Executing desktop actions

The E2B Desktop SDK maps directly to mouse and keyboard actions. Here's how Surf translates LLM-returned actions into desktop interactions.

<CodeGroup>
```typescript JavaScript & TypeScript
import { Sandbox } from '@e2b/desktop'

const sandbox = await Sandbox.create({ timeoutMs: 300_000 })

// Mouse actions
await sandbox.leftClick(500, 300)
await sandbox.rightClick(500, 300)
await sandbox.doubleClick(500, 300)
await sandbox.middleClick(500, 300)
await sandbox.moveMouse(500, 300)
await sandbox.drag([100, 200], [400, 500])

// Keyboard actions
await sandbox.write('Hello, world!') // Type text
await sandbox.press('Enter') // Press a key

// Scrolling
await sandbox.scroll('down', 3) // Scroll down 3 ticks
await sandbox.scroll('up', 3) // Scroll up 3 ticks

// Screenshots
const screenshot = await sandbox.screenshot() // Returns Buffer

// Run terminal commands
await sandbox.commands.run('ls -la /home')
```
```python Python
from e2b_desktop import Sandbox

sandbox = Sandbox.create(timeout=300)

# Mouse actions
sandbox.left_click(500, 300)
sandbox.right_click(500, 300)
sandbox.double_click(500, 300)
sandbox.middle_click(500, 300)
sandbox.move_mouse(500, 300)
sandbox.drag([100, 200], [400, 500])

# Keyboard actions
sandbox.write("Hello, world!")  # Type text
sandbox.press("Enter")  # Press a key

# Scrolling
sandbox.scroll("down", 3)  # Scroll down 3 ticks
sandbox.scroll("up", 3)  # Scroll up 3 ticks

# Screenshots
screenshot = sandbox.screenshot()  # Returns bytes

# Run terminal commands
sandbox.commands.run("ls -la /home")
```
</CodeGroup>
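
Vision APIs generally expect images as base64 text rather than raw bytes, so the screenshot usually needs one encoding step before it reaches the model. A small helper along these lines can bridge the two; the `encode_screenshot` name and the data-URL wrapping are illustrative, not part of the E2B SDK.

```python
import base64

def encode_screenshot(png_bytes: bytes) -> str:
    """Base64-encode raw screenshot bytes for a vision-model payload."""
    return base64.b64encode(png_bytes).decode("ascii")

# In real code the bytes would come from sandbox.screenshot();
# here we use the PNG magic-number prefix as stand-in data.
data_url = "data:image/png;base64," + encode_screenshot(b"\x89PNG\r\n\x1a\n")
```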

### Agent loop

The core loop takes screenshots, sends them to an LLM, and executes the returned actions on the desktop. This is a simplified version of how [Surf](https://github.com/e2b-dev/surf) drives the computer use cycle.

<CodeGroup>
```typescript JavaScript & TypeScript
import { Sandbox } from '@e2b/desktop'

const sandbox = await Sandbox.create({
  resolution: [1024, 720],
  timeoutMs: 300_000,
})
await sandbox.stream.start()

while (true) {
  // 1. Capture the current desktop state
  const screenshot = await sandbox.screenshot()

  // 2. Send screenshot to your LLM and get the next action
  // (use OpenAI Computer Use, Anthropic Claude, etc.)
  const action = await getNextActionFromLLM(screenshot)

  if (!action) break // LLM signals task is complete

  // 3. Execute the action on the desktop
  switch (action.type) {
    case 'click':
      await sandbox.leftClick(action.x, action.y)
      break
    case 'type':
      await sandbox.write(action.text)
      break
    case 'keypress':
      await sandbox.press(action.keys)
      break
    case 'scroll':
      await sandbox.scroll(
        action.scrollY < 0 ? 'up' : 'down',
        Math.abs(action.scrollY)
      )
      break
    case 'drag':
      await sandbox.drag(
        [action.startX, action.startY],
        [action.endX, action.endY]
      )
      break
  }
}

await sandbox.kill()
```
```python Python
from e2b_desktop import Sandbox

sandbox = Sandbox.create(
    resolution=(1024, 720),
    timeout=300,
)
sandbox.stream.start()

while True:
    # 1. Capture the current desktop state
    screenshot = sandbox.screenshot()

    # 2. Send screenshot to your LLM and get the next action
    # (use OpenAI Computer Use, Anthropic Claude, etc.)
    action = get_next_action_from_llm(screenshot)

    if not action:
        break  # LLM signals task is complete

    # 3. Execute the action on the desktop
    if action.type == "click":
        sandbox.left_click(action.x, action.y)
    elif action.type == "type":
        sandbox.write(action.text)
    elif action.type == "keypress":
        sandbox.press(action.keys)
    elif action.type == "scroll":
        direction = "up" if action.scroll_y < 0 else "down"
        sandbox.scroll(direction, abs(action.scroll_y))
    elif action.type == "drag":
        sandbox.drag(
            [action.start_x, action.start_y],
            [action.end_x, action.end_y],
        )

sandbox.kill()
```
</CodeGroup>

The `getNextActionFromLLM` / `get_next_action_from_llm` function is where you integrate your chosen LLM. See [Connect LLMs to E2B](/docs/quickstart/connect-llms) for integration patterns with OpenAI, Anthropic, and other providers.
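
If your model returns actions as JSON tool calls, the bridge between the LLM and the loop can be a small parser. Below is a minimal sketch under an assumed schema: the `Action` class and its field names are our own illustration, not an E2B SDK or provider type.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Action:
    """Illustrative action shape; field names are assumptions, not an SDK type."""
    type: str
    x: int = 0
    y: int = 0
    text: str = ""
    keys: str = ""
    scroll_y: int = 0

def parse_action(raw: Optional[dict]) -> Optional[Action]:
    """Turn a model's JSON action into an Action, or None when the task is done."""
    if not raw or raw.get("type") == "done":
        return None  # signals the loop to stop
    fields = {k: v for k, v in raw.items() if k != "type"}
    return Action(type=raw["type"], **fields)

# e.g. a click returned by the model:
action = parse_action({"type": "click", "x": 500, "y": 300})
```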

## Related Guides

<CardGroup cols={3}>
  <Card title="Desktop Template" icon="desktop" href="/docs/template/examples/desktop">
    Build desktop sandboxes with Ubuntu, XFCE, and VNC streaming
  </Card>
  <Card title="Connect LLMs" icon="brain" href="/docs/quickstart/connect-llms">
    Integrate AI models with sandboxes using tool calling
  </Card>
  <Card title="Sandbox Lifecycle" icon="rotate" href="/docs/sandbox">
    Create, manage, and control sandbox lifecycle
  </Card>
</CardGroup>
