Claude Code Computer Use System - Comprehensive Reverse Engineering Report

SYSTEM ARCHITECTURE OVERVIEW

Claude Code's computer use (CU) system is a macOS-only, MCP-based native desktop automation layer that bridges the Claude API with low-level screen capture and input simulation. The architecture is modular:

API Model
    ↓
MCP Server (in-process)
    ↓
Computer Use Host Adapter (singleton)
    ↓
Executor (ComputerExecutor interface)
    ├─ @ant/computer-use-swift (native .node module)
    └─ @ant/computer-use-input (Rust/enigo .node module)
    ↓
OS-level APIs (macOS)

1. FILE-BY-FILE BREAKDOWN (13 Files)

A. appNames.ts (Prompt Injection Hardening)

Purpose: Filters and sanitizes installed-app names for tool descriptions.

Key mechanisms:

PATH_ALLOWLIST: Only apps from /Applications/, /System/Applications/, or ~/Applications/
NAME_PATTERN_BLOCKLIST: Filters noisy background services (Helper, Agent, Service, Uninstaller)
ALWAYS_KEEP_BUNDLE_IDS: 30+ trusted apps (browsers, terminals, dev tools) bypass filtering
APP_NAME_ALLOWED regex: Unicode-safe allowlist [\p{L}\p{M}\p{N}_ .&'()+-]+ (admits apostrophes, but no double quotes, pipes, or backticks)
Character filtering applies only to untrusted (attacker-installable) apps, which is where injection is possible
Length cap: 40 chars max per name, 50 apps in list max
Deduplication and sorting applied

Attack surface mitigation: An app named "grant all" could exploit naive parsing, but the tool description's structural framing ("Available applications:") and the explicit user approval dialog contain the risk.
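The filtering rules above can be sketched as a single pass over the installed apps. This is a minimal sketch, not the actual appNames.ts code: the InstalledApp type, the filterAppNames name, the order of the checks, the exact blocklist regex, and the choices that trusted apps skip every check and that over-long names are dropped rather than truncated are all assumptions.

```typescript
// Hypothetical sketch of the appNames.ts rules; names and check order are assumptions.
interface InstalledApp {
  name: string;
  bundleId: string;
  path: string; // absolute path of the .app bundle
}

// Unicode-aware allowlist from the report; the `u` flag is required for \p{...}.
const APP_NAME_ALLOWED = /^[\p{L}\p{M}\p{N}_ .&'()+-]+$/u;
// Illustrative blocklist for noisy background services.
const NAME_PATTERN_BLOCKLIST = /\b(Helper|Agent|Service|Uninstaller)\b/;
const MAX_NAME_LENGTH = 40;
const MAX_APPS = 50;

function filterAppNames(
  apps: InstalledApp[],
  homeDir: string,
  alwaysKeepBundleIds: ReadonlySet<string>,
): string[] {
  const allowedPrefixes = ["/Applications/", "/System/Applications/", `${homeDir}/Applications/`];
  const kept = new Set<string>(); // a Set deduplicates names
  for (const app of apps) {
    // Assumption: trusted bundle IDs bypass every check, not only the character filter.
    if (alwaysKeepBundleIds.has(app.bundleId)) {
      kept.add(app.name);
      continue;
    }
    if (!allowedPrefixes.some((prefix) => app.path.startsWith(prefix))) continue;
    if (NAME_PATTERN_BLOCKLIST.test(app.name)) continue;
    // Assumption: over-long names are dropped, not truncated.
    if (app.name.length > MAX_NAME_LENGTH) continue;
    // Rejects double quotes, pipes, backticks, newlines, and other punctuation.
    if (!APP_NAME_ALLOWED.test(app.name)) continue;
    kept.add(app.name);
  }
  return [...kept].sort((a, b) => a.localeCompare(b)).slice(0, MAX_APPS);
}
```

Sorting before the 50-app cap makes the truncated list deterministic across runs.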

B. computerUseLock.ts (Session Serialization)

Purpose: File-based cross-process lock for exclusive computer control.

Lock mechanism:

File: ~/.claude/computer-use.lock
Format: JSON with {sessionId, pid, acquiredAt}
Atomic test-and-set: writeFile(..., {flag: 'wx'}) for O_EXCL
Stale recovery: Checks PID liveness via process.kill(pid, 0) signal probe

Key functions:

tryAcquireComputerUseLock(): O_EXCL create → reads → liveness check → race-safe stale recovery
releaseComputerUseLock(): Unlinks on drop (idempotent)
isLockHeldLocally(): Zero-syscall check (tracks via unregisterCleanup closure)
checkComputerUseLock(): Non-acquiring check for request_access tool

Race conditions handled:

Multiple sessions racing to recover a stale lock: only one session's create succeeds; the others read the winner's lock file
Small PID-reuse window (a recycled PID can make a stale lock look live): negligible in practice
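The acquire/release pattern can be sketched with Node's `fs` API, where the `'wx'` flag maps to `O_CREAT | O_EXCL` and `process.kill(pid, 0)` is a liveness probe that delivers no signal. This is a deliberately simplified sketch under stated assumptions: the function names, the three-attempt retry, and same-session re-entrancy are hypothetical, and the plain unlink-then-retry on a stale lock leaves a small window in which two recoverers could both remove and recreate the file, which the report says the real code handles race-safely.

```typescript
import { readFileSync, unlinkSync, writeFileSync } from "node:fs";

// Hypothetical sketch of the computerUseLock.ts pattern; names are assumptions.
interface LockInfo {
  sessionId: string;
  pid: number;
  acquiredAt: number;
}

function errorCode(err: unknown): string | undefined {
  return (err as { code?: string } | null)?.code;
}

// Signal 0 only checks existence/permissions; nothing is delivered.
function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return errorCode(err) === "EPERM"; // process exists but belongs to another user
  }
}

function readHolder(lockPath: string): LockInfo | null {
  try {
    return JSON.parse(readFileSync(lockPath, "utf8")) as LockInfo;
  } catch {
    return null; // missing or unparsable
  }
}

function tryAcquireLock(lockPath: string, sessionId: string): boolean {
  const mine: LockInfo = { sessionId, pid: process.pid, acquiredAt: Date.now() };
  for (let attempt = 0; attempt < 3; attempt++) {
    try {
      // 'wx' fails with EEXIST if the file already exists (O_EXCL).
      writeFileSync(lockPath, JSON.stringify(mine), { flag: "wx" });
      return true;
    } catch (err) {
      if (errorCode(err) !== "EEXIST") throw err;
    }
    const holder = readHolder(lockPath);
    if (holder === null) continue; // released between our create and read: retry
    if (holder.sessionId === sessionId) return true; // assumption: re-entrant
    if (isPidAlive(holder.pid)) return false; // a live holder keeps the lock
    // Stale holder. NOTE: simplified; this unlink has a TOCTOU window.
    try {
      unlinkSync(lockPath);
    } catch {
      // another recoverer already removed it
    }
  }
  return false;
}

function releaseLock(lockPath: string, sessionId: string): void {
  if (readHolder(lockPath)?.sessionId !== sessionId) return; // not ours, or already gone
  try {
    unlinkSync(lockPath);
  } catch {
    // idempotent: ignore ENOENT
  }
}
```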

C. cleanup.ts (Turn-End Cleanup)

Purpose: Unhides apps and releases lock at turn end.

Execution flow:

Checks appState.computerUseMcpState.hiddenDuringTurn (Set of bundleIds)
Calls unhideComputerUseApps([...hidden]) (fire-and-forget, 5s timeout)
Unregisters Escape hotkey
Releases lock
Sends OS notification

Key details:

Runs on all turn ends: natural, abort-streaming, abort-tools
Dynamic import gated on feature('CHICAGO_MCP')
Unhide timeout: 5s (generous; it exists only so that a hung unhide cannot block abort)
All cleanup is idempotent
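The turn-end flow can be sketched as below. This is a minimal sketch, assuming hypothetical hook names (CleanupHooks, unhideApps, notify) and a placeholder notification text; it shows the bounded, best-effort unhide with a timer that is cleared either way, followed by the hotkey, lock, and notification steps. Idempotency relies on clearing the hidden set and on the hooks themselves being safe to repeat.

```typescript
// Hypothetical sketch of turn-end cleanup; hook names and message text are assumptions.
interface CleanupHooks {
  unhideApps(bundleIds: string[]): Promise<void>;
  unregisterEscHotkey(): void; // assumed idempotent
  releaseLock(): void; // assumed idempotent
  notify(message: string): void;
}

const UNHIDE_TIMEOUT_MS = 5_000;

async function cleanupAfterTurn(hiddenDuringTurn: Set<string>, hooks: CleanupHooks): Promise<void> {
  if (hiddenDuringTurn.size > 0) {
    const hidden = [...hiddenDuringTurn];
    hiddenDuringTurn.clear(); // a repeated call finds nothing to unhide
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<void>((resolve) => {
      timer = setTimeout(resolve, UNHIDE_TIMEOUT_MS);
    });
    try {
      // Best effort: whichever settles first wins; unhide errors are not fatal.
      await Promise.race([hooks.unhideApps(hidden).catch(() => {}), timeout]);
    } finally {
      clearTimeout(timer); // cleared regardless of which side won
    }
  }
  hooks.unregisterEscHotkey();
  hooks.releaseLock();
  hooks.notify("Computer use session ended"); // placeholder text
}
```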

D. common.ts (Constants & Sentinel Values)

Purpose: Shared constants and terminal detection.

Key exports:

COMPUTER_USE_MCP_SERVER_NAME = 'computer-use'
CLI_HOST_BUNDLE_ID = 'com.anthropic.claude-code.cli-no-window'
- Sentinel for "no frontmost window" (the CLI itself owns no window)
- The package's frontmost gate never matches it (always false, by design)
CLI_CU_CAPABILITIES = {screenshotFiltering: 'native', platform: 'darwin'}

Terminal detection via getTerminalBundleId():

Reads process.env.__CFBundleIdentifier (set by LaunchServices for .app bundles)
Fallback table: iTerm2, Apple Terminal, ghostty, kitty, Warp, VS Code
Returns null if undetectable (ssh, tmux client, unknown terminal)

Why terminal detection matters:

Used as "surrogate host" in prepareDisplay (hide-exempt)
Stripped from screenshot allowlist so terminal never photobombs
Skipped in app activation z-order walk
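Terminal detection and the surrogate-host rule can be sketched as follows. This is a minimal sketch under assumptions: the report names the terminals in the fallback table but not how they are matched, so keying the table on `TERM_PROGRAM` (and on `KITTY_WINDOW_ID` for kitty), the bundle IDs listed, and the helper screenshotAllowlist are all hypothetical.

```typescript
// Hypothetical sketch of getTerminalBundleId() plus surrogate-host stripping.
const CLI_HOST_BUNDLE_ID = "com.anthropic.claude-code.cli-no-window";

// Assumed TERM_PROGRAM values mapped to assumed bundle IDs.
const TERM_PROGRAM_FALLBACK: Record<string, string> = {
  "iTerm.app": "com.googlecode.iterm2",
  Apple_Terminal: "com.apple.Terminal",
  ghostty: "com.mitchellh.ghostty",
  WarpTerminal: "dev.warp.Warp-Stable",
  vscode: "com.microsoft.VSCode",
};

function getTerminalBundleId(env: Record<string, string | undefined>): string | null {
  // Set by LaunchServices when the process tree starts from a .app bundle.
  const fromLaunchServices = env.__CFBundleIdentifier;
  if (fromLaunchServices) return fromLaunchServices;
  if (env.KITTY_WINDOW_ID) return "net.kovidgoyal.kitty";
  const program = env.TERM_PROGRAM;
  return (program && TERM_PROGRAM_FALLBACK[program]) || null; // null: ssh, tmux, unknown
}

// The surrogate host stays visible (hide-exempt) but is never captured.
function screenshotAllowlist(granted: string[], terminalBundleId: string | null): string[] {
  const surrogateHost = terminalBundleId ?? CLI_HOST_BUNDLE_ID;
  return granted.filter((id) => id !== surrogateHost);
}
```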

E. drainRunLoop.ts (CFRunLoop Pump)

Purpose: Drains macOS's main dispatch queue for async-to-sync bridging.

Problem:

Swift's @MainActor methods (screenshot, listInstalled, etc.) dispatch to DispatchQueue.main
enigo's key() also uses DispatchQueue.main
Under Node and Bun, whose event loops do not service the macOS main run loop, this queue never drains → promises hang indefinitely
Electron runs the CFRunLoop continuously, so Cowork (an Electron host) doesn't need this

Solution:

Refcounted setInterval(drainTick, 1ms) that calls _drainMainRunLoop()
retain(): increments pending, starts pump if needed
release(): decrements, stops pump when pending hits zero
Safe nesting via refcount

Timeout protection:

drainRunLoop(fn): 30s timeout with race detection
Orphaned promise handling: when the timeout wins, the abandoned promise's late rejection is swallowed with .catch(() => {})
retainPump/releasePump: long-lived registration (ESC hotkey), no timeout
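The refcounted pump and the timeout wrapper can be sketched together. This is a minimal sketch, assuming the native `_drainMainRunLoop()` binding is injected as a plain callback; the RunLoopPump class and its method names are hypothetical. It shows the three properties described above: nested retains share one interval, the interval stops only when the count reaches zero, and a timed-out call's late rejection is swallowed so it never becomes an unhandledRejection.

```typescript
// Hypothetical sketch of drainRunLoop.ts; the native drain call is injected.
class RunLoopPump {
  private pending = 0;
  private timer: ReturnType<typeof setInterval> | undefined;
  private readonly drainMainRunLoop: () => void;
  private readonly intervalMs: number;

  constructor(drainMainRunLoop: () => void, intervalMs = 1) {
    this.drainMainRunLoop = drainMainRunLoop;
    this.intervalMs = intervalMs;
  }

  retain(): void {
    this.pending += 1;
    if (this.timer === undefined) {
      this.timer = setInterval(() => this.drainMainRunLoop(), this.intervalMs);
    }
  }

  release(): void {
    if (this.pending === 0) return; // tolerate an unbalanced release
    this.pending -= 1;
    if (this.pending === 0 && this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
    }
  }

  get isRunning(): boolean {
    return this.timer !== undefined;
  }

  // Runs fn with the pump retained; rejects if fn has not settled within timeoutMs.
  async drainRunLoop<T>(fn: () => Promise<T>, timeoutMs = 30_000): Promise<T> {
    this.retain();
    let timeoutHandle: ReturnType<typeof setTimeout> | undefined;
    try {
      const work = fn();
      work.catch(() => {}); // orphan guard: swallow a rejection that arrives after the timeout
      const timeout = new Promise<never>((_, reject) => {
        timeoutHandle = setTimeout(() => reject(new Error(`drainRunLoop timed out after ${timeoutMs}ms`)), timeoutMs);
      });
      return await Promise.race([work, timeout]);
    } finally {
      clearTimeout(timeoutHandle);
      this.release();
    }
  }
}
```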

F. escHotkey.ts (System Escape Interception)

Purpose: Global Escape key abort, preventing prompt-injected actions from dismissing dialogs.

Mechanism:

registerEscHotkey(): Creates CGEventTap via Swift
Tap consumes Escape system-wide (CFRunLoopGetMain defaultMode)
Escape never reaches the target app
notifyExpectedEscape() must be called before each model-synthesized Escape (100ms decay); the executor does this inside key()

Lifecycle:

Register on fresh lock acquire (first CU tool call)
Unregister on lock release
Pump retain held for registration lifetime (refcounted with drainRunLoop)
Returns false if CGEvent.tapCreate fails (missing Accessibility permission)

Safety model:

Hole-punch for model's own Escape: notifyExpectedEscape() sets decay flag
CGEventTap checks event.flags.isEmpty so ctrl+escape etc. pass through
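The tap's decision logic can be modeled in plain TypeScript. This is a minimal model, not the Swift CGEventTap itself: the EscapeGate class, the injectable clock, and the rule that one notification covers exactly one Escape are assumptions. Modified Escapes pass through, an Escape announced within the 100ms window reaches the app, and any other bare Escape is consumed and treated as an abort.

```typescript
// Hypothetical model of the escape-tap decision; the real tap lives in Swift.
const EXPECTED_ESCAPE_WINDOW_MS = 100;

class EscapeGate {
  private expectedUntil = 0;
  private readonly onUserAbort: () => void;
  private readonly now: () => number;

  constructor(onUserAbort: () => void, now: () => number = Date.now) {
    this.onUserAbort = onUserAbort;
    this.now = now;
  }

  // Called by the executor right before it synthesizes a bare Escape.
  notifyExpectedEscape(): void {
    this.expectedUntil = this.now() + EXPECTED_ESCAPE_WINDOW_MS;
  }

  // Returns true when the tap should consume (swallow) the event.
  handleEscape(modifierFlagsEmpty: boolean): boolean {
    if (!modifierFlagsEmpty) return false; // ctrl+escape etc. pass through
    if (this.now() <= this.expectedUntil) {
      this.expectedUntil = 0; // assumption: one Escape per notification
      return false; // the announced Escape reaches the target app
    }
    this.onUserAbort(); // unannounced bare Escape: abort the computer-use turn
    return true;
  }
}
```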

G. executor.ts (Main ComputerExecutor Implementation)

Purpose: Bridge between MCP tool calls and native input/screenshot APIs.

Key architecture:

Factory function createCliExecutor(opts) builds ComputerExecutor singleton
Two native modules:
- @ant/computer-use-input: Rust/enigo for mouse, keyboard, FrontmostApp
- @ant/computer-use-swift: SCContentFilter screenshots, TCC, app management
Loading: Swift loads at factory time; Input loads lazily on the first mouse/keyboard call

Screenshot pipeline:

display.getSize(displayId)
  ↓ logical.px * scaleFactor → physical.px
  ↓ targetImageSize(physW, physH, API_RESIZE_PARAMS) → target dims
  ↓ cu.screenshot.captureExcluding(allowedBundleIds, quality=0.75, targetW, targetH, displayId)
  ↓ Result: {base64: string (JPEG), width, height}

Input sequence primitives:

Mouse:
- moveAndSettle(): Instant move + 50ms sleep for HID round-trip
- animatedMove(): Distance-proportional duration at 2000px/sec, max 0.5s, ease-out-cubic @ 60fps
- click(): Move → modifiers-bracketed button click
- drag(): Move → press → animated move → release (finally ensures release)
- scroll(): Move → vertical first → horizontal
Keyboard:
- key(): xdotool-style "ctrl+shift+a" split on '+' → enigo.keys()
- Bare Escape: notifyExpectedEscape() before key()
- holdKey(): Press all → sleep(durationMs) → release all (reverse order)
- type(): Per-grapheme via clipboard or typeText()
- Clipboard path: read → write → verify round-trip → paste → sleep(100ms) → restore
App management:
- prepareForAction(): Hide non-allowlisted apps + defocus → returns hidden set
- listInstalledApps(): Spotlight via Swift
- openApp(): NSWorkspace.openApplication
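The animatedMove() timing rule (2000 px/s, capped at 0.5 s, ease-out-cubic at 60 fps) can be turned into a list of frame positions. This is a minimal sketch: the animatedMovePath name, rounding the frame count, and emitting integer coordinates are assumptions; a caller would move the cursor to each point and sleep about 1000 / 60 ms between frames.

```typescript
// Hypothetical sketch of the animatedMove() path computation.
interface Point {
  x: number;
  y: number;
}

const PIXELS_PER_SECOND = 2000;
const MAX_DURATION_S = 0.5;
const FPS = 60;

// Ease-out cubic: fast start, slow finish.
const easeOutCubic = (t: number): number => 1 - Math.pow(1 - t, 3);

function animatedMovePath(from: Point, to: Point): Point[] {
  const distance = Math.hypot(to.x - from.x, to.y - from.y);
  const durationS = Math.min(distance / PIXELS_PER_SECOND, MAX_DURATION_S);
  const frames = Math.max(1, Math.round(durationS * FPS));
  const path: Point[] = [];
  for (let i = 1; i <= frames; i++) {
    const p = easeOutCubic(i / frames); // p reaches exactly 1 on the last frame
    path.push({
      x: Math.round(from.x + (to.x - from.x) * p),
      y: Math.round(from.y + (to.y - from.y) * p),
    });
  }
  return path;
}
```

A 1000 px move takes the full 0.5 s (30 frames), and anything longer is capped at the same 30 frames, so long drags get faster rather than slower.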

JPEG quality: 0.75 (75%)

Terminal as surrogate host:

const surrogateHost = terminalBundleId ?? CLI_HOST_BUNDLE_ID
// Passed to prepareDisplay (hide-exempt) and resolvePrepareCapture
// Stripped from screenshot allowlist so terminal never photobombs

H. gates.ts (Feature Gates & Configuration)

Purpose: GrowthBook feature flag integration for CU subgates.

Gates:

enabled: Master gate (default: false)
pixelValidation: Pixel-compare click validation (default: false)
clipboardPasteMultiline: Multiline paste via clipboard (default: true)
mouseAnimation: Animated drag movement (default: true)
hideBeforeAction: Hide non-allowlisted apps before each action (default: true)
autoTargetDisplay: Auto-detect display (default: true)
clipboardGuard: Clipboard safety checks (default: true)
coordinateMode: 'pixels' | 'normalized' (default: 'pixels', frozen at first read)

Subscription gating:

Max/Pro only for external rollout
USER_TYPE='ant' bypass (internal dogfooding)

Frozen coordinate mode: Read once at setup; stays constant even if the GrowthBook flag flips mid-session
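Default-merging and freezing can be sketched as below. This is a minimal sketch, assuming a hypothetical FlagReader callback in place of the real GrowthBook lookup and hypothetical flag keys ("subGates", "coordinateMode"); the defaults mirror the list above. Sub-gates are re-read on every call, while the coordinate mode is cached on first read so the model and executor can never disagree mid-session.

```typescript
// Hypothetical sketch of gates.ts; flag keys and the reader signature are assumptions.
type CoordinateMode = "pixels" | "normalized";

interface CuSubGates {
  pixelValidation: boolean;
  clipboardPasteMultiline: boolean;
  mouseAnimation: boolean;
  hideBeforeAction: boolean;
  autoTargetDisplay: boolean;
  clipboardGuard: boolean;
}

const DEFAULT_SUB_GATES: CuSubGates = {
  pixelValidation: false,
  clipboardPasteMultiline: true,
  mouseAnimation: true,
  hideBeforeAction: true,
  autoTargetDisplay: true,
  clipboardGuard: true,
};

type FlagReader = (name: string) => unknown; // stands in for a GrowthBook lookup

function getSubGates(readFlag: FlagReader): CuSubGates {
  const raw = readFlag("subGates");
  const overrides = typeof raw === "object" && raw !== null ? (raw as Partial<CuSubGates>) : {};
  return { ...DEFAULT_SUB_GATES, ...overrides };
}

let frozenCoordinateMode: CoordinateMode | undefined;

// First read wins for the rest of the process.
function getCoordinateMode(readFlag: FlagReader): CoordinateMode {
  if (frozenCoordinateMode === undefined) {
    frozenCoordinateMode = readFlag("coordinateMode") === "normalized" ? "normalized" : "pixels";
  }
  return frozenCoordinateMode;
}
```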

I. hostAdapter.ts (Dependency Injection Container)

Purpose: Singleton factory for ComputerUseHostAdapter passed to MCP package.

Adapter shape:

{
  serverName: 'computer-use',
  logger: DebugLogger,
  executor: ComputerExecutor,
  ensureOsPermissions: () => {accessibility, screenRecording},
  isDisabled: () => !getChicagoEnabled(),
  getSubGates: () => CuSubGates,
  getAutoUnhideEnabled: () => true,
  cropRawPatch: () => null,
}

Key details:

Process-lifetime singleton (cached)
Loaded on first CU tool call
Native modules load here, throw on failure (no degraded mode)
cropRawPatch returns null (Node has no synchronous image processor, so pixel validation cannot crop patches)

J. inputLoader.ts (Rust/enigo Wrapper)

Purpose: Lazy-load Rust/enigo native module.

Export path:

COMPUTER_USE_INPUT_NODE_PATH (baked by build-with-plugins.ts on darwin)
Falls through to node_modules prebuilds if unset

Dispatch model:

key() and keys() dispatch to DispatchQueue.main
The calling tokio worker blocks on a channel until the main-queue work completes
Requires drainRunLoop() under Node/Bun, which do not drain the main queue themselves

K. mcpServer.ts (MCP Server Lifecycle)

Purpose: In-process MCP server, plus a stdio entry point for subprocess use.

Startup flow:

Singleton init: getComputerUseHostAdapter()
Package factory: createComputerUseMcpServer(adapter, coordinateMode)
App enumeration: tryGetInstalledAppNames() (1s timeout, soft fail)
Tool building: buildComputerUseTools(capabilities, coordinateMode, installedAppNames)
ListTools override: Include installed app names in request_access description

Subprocess entry point:

--computer-use-mcp spawns runComputerUseMcpServer()
StdioServerTransport
Exit on stdin EOF
Flush analytics before exit

L. setup.ts (Dynamic MCP Config Builder)

Purpose: Builds MCP config + allowedTools for main client.

Key architecture:

mcp__computer-use__* tool names added to allowedTools
Bypass normal permission prompts (package's request_access handles approval)
API backend detects these names, emits CU availability hint in system prompt

Config:

{
  type: 'stdio',
  command: process.execPath,
  args: ['--computer-use-mcp'],  // bundled; otherwise ['.../cli.js', '--computer-use-mcp']
  scope: 'dynamic',
}

Why the stdio server is never spawned: client.ts intercepts the server by name and uses the in-process server instead.
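The config and allowedTools construction can be sketched as one builder. This is a minimal sketch under assumptions: the buildComputerUseSetup name, the mcpServers wrapper, and passing the tool list in (the report does not enumerate it) are hypothetical; the `mcp__<server>__<tool>` naming follows Claude Code's MCP tool-name convention.

```typescript
// Hypothetical sketch of setup.ts output; helper name and return shape are assumptions.
const COMPUTER_USE_MCP_SERVER_NAME = "computer-use";

interface StdioServerConfig {
  type: "stdio";
  command: string;
  args: string[];
  scope: "dynamic";
}

function buildComputerUseSetup(toolNames: string[], execPath: string, cliScriptPath: string | null) {
  const config: StdioServerConfig = {
    type: "stdio",
    command: execPath,
    // Bundled binaries take the flag directly; script installs pass the entry script first.
    args: cliScriptPath === null ? ["--computer-use-mcp"] : [cliScriptPath, "--computer-use-mcp"],
    scope: "dynamic",
  };
  // Pre-allowing these names skips the generic permission prompt;
  // request_access performs the real approval.
  const allowedTools = toolNames.map((tool) => `mcp__${COMPUTER_USE_MCP_SERVER_NAME}__${tool}`);
  return { mcpServers: { [COMPUTER_USE_MCP_SERVER_NAME]: config }, allowedTools };
}
```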

M. swiftLoader.ts (Swift Native Module Wrapper)

Purpose: Load and cache @ant/computer-use-swift.

Four @MainActor methods (all require drainRunLoop):

screenshot.captureExcluding(allowedBundleIds, quality, targetW, targetH, displayId)
screenshot.captureRegion(allowedBundleIds, x, y, w, h, outW, outH, quality, displayId)
apps.listInstalled()
resolvePrepareCapture(...)

Key details:

Load once at factory time
Throws on non-darwin
Cached, no runtime reloading

2. SCREENSHOT CAPTURE PIPELINE

Full flow:

Request arrives at screenshot tool with displayId and allowlist
Size computation:
- display.getSize(displayId) → {width, height, scaleFactor}
- logical dims * scaleFactor = physical dims
- targetImageSize(physW, physH, API_RESIZE_PARAMS) → target dims
Filtering:
- allowedBundleIds passed (user-granted apps)
- Terminal (if detected) stripped from list
- Despite its name, Swift's captureExcluding takes an ALLOW list
Capture via cu.screenshot.captureExcluding():
- SCContentFilter (ScreenCaptureKit) on macOS 13+
- Returns JPEG base64 + actual dims
API encoding: base64 JPEG → image content block in the tool result sent to the API

Quality: 75% JPEG compression (0.75 flag)

Exclusion: Terminal never captured (if detected), preventing photobomb.
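The size computation can be sketched as follows. This is a minimal sketch: the report does not give API_RESIZE_PARAMS or the body of targetImageSize(), so the ResizeParams shape (a long-edge cap plus a total-pixel budget), the uniform scaling that preserves aspect ratio, and the never-upscale rule are assumptions chosen for illustration.

```typescript
// Hypothetical stand-in for targetImageSize(); the parameter shape is assumed.
interface DisplaySize {
  width: number; // logical points
  height: number;
  scaleFactor: number; // 2 on Retina displays
}

interface ResizeParams {
  maxLongEdge: number;
  maxPixels: number;
}

function targetImageSize(physW: number, physH: number, params: ResizeParams): { width: number; height: number } {
  const edgeScale = params.maxLongEdge / Math.max(physW, physH);
  const areaScale = Math.sqrt(params.maxPixels / (physW * physH));
  const scale = Math.min(1, edgeScale, areaScale); // one factor keeps the aspect ratio; never upscale
  return { width: Math.floor(physW * scale), height: Math.floor(physH * scale) };
}

function screenshotDims(display: DisplaySize, params: ResizeParams) {
  // logical * scaleFactor = physical pixels
  const physW = Math.round(display.width * display.scaleFactor);
  const physH = Math.round(display.height * display.scaleFactor);
  return { physical: { width: physW, height: physH }, target: targetImageSize(physW, physH, params) };
}
```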

3. MOUSE & KEYBOARD EXECUTION MODEL

Dispatch chain:

moveAndSettle(x, y) → move + 50ms sleep (HID round-trip)
withModifiers() → bracket press/release, swallow errors, reverse order release
input.mouseButton() or input.keys() → native layer
drainRunLoop() wraps main-queue-dispatching calls

Sleep timings:

MOVE_SETTLE_MS = 50ms: After mouse move, before click
8ms between repeated key presses (125Hz USB polling)
Type via clipboard: 100ms final sleep (paste vs restore race)

Modifier bracketing:

Tracks pressed keys to release only what succeeded
Finally block releases in reverse order
Errors swallowed (best-effort)
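Modifier bracketing can be sketched as a try/finally wrapper. This is a minimal sketch, assuming a hypothetical KeyInput interface in place of the enigo-backed module: only modifiers whose press succeeded are recorded, they are released in reverse order, and release errors are swallowed so one failure cannot leave the others stuck down.

```typescript
// Hypothetical sketch of withModifiers(); KeyInput stands in for the native module.
interface KeyInput {
  key(name: string, action: "press" | "release"): Promise<void>;
}

async function withModifiers<T>(input: KeyInput, modifiers: string[], body: () => Promise<T>): Promise<T> {
  const pressed: string[] = [];
  try {
    for (const mod of modifiers) {
      await input.key(mod, "press");
      pressed.push(mod); // record only modifiers that actually went down
    }
    return await body();
  } finally {
    for (const mod of pressed.reverse()) {
      try {
        await input.key(mod, "release");
      } catch {
        // best effort: keep releasing the rest
      }
    }
  }
}
```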

Escape special case:

isBareEscape([part]): Single element "escape" or "esc" (case-insensitive)
Calls notifyExpectedEscape() before key() to punch hole in CGEventTap
ctrl+escape passes through (flags not empty)
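The parsing and the bare-Escape check can be sketched as below. This is a minimal sketch: parseKeyCombo, pressCombo, and the deps object are hypothetical, and trimming whitespace and dropping empty segments are assumptions. Only a single-element "escape"/"esc" combo triggers notifyExpectedEscape(), so ctrl+escape goes out without opening the hole.

```typescript
// Hypothetical sketch of xdotool-style parsing plus the bare-Escape hole-punch.
function parseKeyCombo(combo: string): string[] {
  return combo
    .split("+")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

function isBareEscape(parts: string[]): boolean {
  if (parts.length !== 1) return false;
  const key = parts[0].toLowerCase();
  return key === "escape" || key === "esc";
}

interface KeyDeps {
  notifyExpectedEscape(): void;
  keys(parts: string[]): Promise<void>;
}

async function pressCombo(combo: string, deps: KeyDeps): Promise<void> {
  const parts = parseKeyCombo(combo);
  if (isBareEscape(parts)) deps.notifyExpectedEscape(); // open the window before the key goes out
  await deps.keys(parts);
}
```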

Clipboard path (for type() with viaClipboard: true):

Read saved content via pbpaste
Write new text via pbcopy
READ-BACK VERIFY: pbpaste again, fail if mismatch
Cmd+V via input.keys(['command','v'])
sleep(100ms): Paste effect vs restore race
Restore saved in finally
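The clipboard path can be sketched with the clipboard and paste keystroke injected. This is a minimal sketch: the Clipboard interface (standing in for pbpaste/pbcopy), the paste callback (standing in for input.keys(['command','v'])), and the typeViaClipboard name are assumptions. The key property is that a failed read-back aborts before anything is pasted, while the saved contents are restored in every case.

```typescript
// Hypothetical sketch of the clipboard typing path; the I/O is injected.
interface Clipboard {
  read(): Promise<string>;
  write(text: string): Promise<void>;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function typeViaClipboard(
  text: string,
  clipboard: Clipboard,
  paste: () => Promise<void>,
  settleMs = 100,
): Promise<void> {
  const saved = await clipboard.read();
  try {
    await clipboard.write(text);
    // Read-back verify: never paste whatever happened to be on the clipboard.
    if ((await clipboard.read()) !== text) {
      throw new Error("clipboard write did not round-trip; refusing to paste");
    }
    await paste();
    await sleep(settleMs); // let the target app consume the paste before restoring
  } finally {
    try {
      await clipboard.write(saved);
    } catch {
      // restore is best effort
    }
  }
}
```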

Drag internals:

Optional from: move to start position
Press mouse
sleep(50ms): Let HID tap register pressedMouseButtons
animatedMove() with animation enabled/disabled
Finally: always release (even if throw)
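The drag sequence can be sketched as follows. This is a minimal sketch, assuming a hypothetical MouseInput interface whose method names stand in for the native module. The press sits outside the try block on the assumption that a failed press leaves nothing to release; once the button is down, the finally block releases it even if the move throws.

```typescript
// Hypothetical sketch of drag(); MouseInput and its methods are assumptions.
interface Point {
  x: number;
  y: number;
}

interface MouseInput {
  moveTo(p: Point): Promise<void>;
  animatedMoveTo(p: Point): Promise<void>;
  button(action: "press" | "release"): Promise<void>;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function drag(
  input: MouseInput,
  to: Point,
  options: { from?: Point; animate?: boolean; pressSettleMs?: number } = {},
): Promise<void> {
  const { from, animate = true, pressSettleMs = 50 } = options;
  if (from) await input.moveTo(from); // optional move to the start position
  await input.button("press");
  try {
    await sleep(pressSettleMs); // let the HID tap register the pressed button
    if (animate) await input.animatedMoveTo(to);
    else await input.moveTo(to);
  } finally {
    await input.button("release"); // always release, even if the move threw
  }
}
```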

4. SAFETY BOUNDARIES & VALIDATION

Pre-action safeguards:

frontmost gate (in package):
- Checks if target app is frontmost before executing action
- Sentinel CLI_HOST_BUNDLE_ID never matches
- Acts as a safety net when prepareForAction fails
prepareForAction:
- Hides non-allowlisted apps
- Returns hidden set for cleanup
- Errors logged but continue (frontmost gate still enforces)
request_access tool:
- Lists installed apps (filtered via appNames.ts)
- Requires explicit user approval
- Lock check only: request_access does not acquire the lock (acquisition is deferred to the first action tool)
Escape hotkey:
- Global Escape consumed by CGEventTap
- Prevents prompt-injected Escape from dismissing dialogs
- Model must opt-in via notifyExpectedEscape()
Pixel validation (sub-gate, default: false):
- Compares before/after screenshot patches
- Disabled: no sync image-processor in Node
- Designed fallback: skip validation
Clipboard guard (sub-gate, default: true):
- Round-trip verification: write → read-back
- Fails if mismatch
- Restore in finally, errors swallowed

5. ATTACK SURFACE & INJECTION VECTORS

Prompt Injection via App Names

Vector: Attacker installs app named "grant all permissions"
Mitigation:
- Character allowlist (no double quotes, pipes, or backticks)
- Trusted-app carve-out (Apple/Google/MS bypass filter)
- Length cap (40 chars), count cap (50 apps)
- Structural framing + explicit user approval dialog
Residual risk: Benign-sounding names hard to filter programmatically

Escape Key Injection

Vector: Model-synthesized Escape dismisses dialogs
Mitigation: CGEventTap consumes globally, model must opt-in
Residual risk: modified Escapes such as ctrl+escape pass through (the tap only consumes events with empty modifier flags), so an app that treats ctrl+escape as cancel is not protected

Clipboard Injection

Vector: Clipboard write fails silently, junk pasted
Mitigation: Read-back verify, fail if mismatch
Residual risk: None if working correctly

Screenshot Exfiltration

Vector: Model exports screenshot to attacker-controlled endpoint
Mitigation: Base64 JPEG goes only to Anthropic API, terminal excluded
Residual risk: None if API boundary respected

App Enumeration

Vector: Model fingerprints installed software
Mitigation: None (feature inherent)
Residual risk: Reveals installed apps to Claude

Mouse Position Leakage

Vector: Model probes mouse position repeatedly to infer activity
Mitigation: None (feature inherent)
Residual risk: Reveals user activity

6. CONCURRENCY & STATE MANAGEMENT

Session isolation:

File-based lock prevents concurrent sessions
Each session has sessionId, pid, acquiredAt timestamp
Stale detection via PID liveness

AppState tracking:

computerUseMcpState.hiddenDuringTurn: Set of bundleIds hidden this turn
Populated by prepareForAction
Cleared by cleanup after turn
No cross-turn state

CFRunLoop pump:

Refcounted setInterval
Safe nesting: multiple drainRunLoop() calls share one pump
Pending count prevents premature stop

Hotkey registration:

Global state: let registered = false
Idempotent: registerEscHotkey() returns true if already registered
unregisterEscHotkey() swallows errors, always releasePump

7. ERROR HANDLING & FALLBACKS

prepareForAction failure: Logged, action continues (frontmost gate enforces)

Escape hotkey registration failure: Logged, CU proceeds without the Escape abort

Clipboard read-back mismatch: Throws, never pastes, restore in finally

App enumeration timeout: Soft fail, tool description omits list

Unhide timeout (5s): Best-effort, timer cleared regardless

Modifier release in finally: Errors swallowed (best-effort)

Orphaned promise cleanup: Late rejection swallowed with .catch(() => {})

8. PERFORMANCE CHARACTERISTICS

Load time:

Swift: Once at executor factory (first CU tool call)
Input: Lazy on first mouse/keyboard
Cleanup: Zero cost for non-CU turns (dynamic import)

CPU overhead:

CFRunLoop pump: 1ms setInterval while pending
DrainRunLoop: 30s timeout ceiling

Latency:

Mouse move → 50ms settle = 50ms minimum per action
Drag animation: Distance-proportional, max 0.5s
Keyboard: 8ms between repeats
Type via clipboard: 100ms final sleep

Screenshot latency:

SCContentFilter capture: 100-500ms (platform-dependent)
JPEG encode: 75% quality

App enumeration latency:

Spotlight enumeration: 1s timeout

9. NOVEL ENGINEERING DECISIONS

Terminal as surrogate host: Clever solution to exclude terminal from screenshots while keeping it unhidden
Read-back clipboard verify: Robust defense against silent clipboard write failures
Animated drag with intermediate frames: Supports apps that only react to a sequence of .leftMouseDragged events
Orphaned promise swallowing: Prevents unhandledRejection from timeout race
Feature gate freezing: coordinateMode frozen at session start to prevent model/executor mismatch
Refcounted pump: Safe nesting of main-queue dispatch via retain/release pattern
Non-acquiring lock check: defers acquiring until first action tool
Stale lock recovery: O_EXCL atomic create handles race between recovery attempts

10. TECHNICAL DEBT & LIMITATIONS

cropRawPatch returns null: Pixel validation disabled due to lack of sync image processor
No Windows/Linux support: macOS-only (platform guard in executor.ts)
Hard-coded 30s timeout: drainRunLoop ceiling not configurable
Terminal detection fallback: May misidentify in ssh/tmux scenarios
App filtering count cap: Max 50 apps shown even if 200+ installed
Escape hotkey permission gating: No warning if CGEvent.tapCreate fails

Summary

The Claude Code computer use system is a well-engineered, production-quality macOS desktop automation layer. Key strengths are modular architecture, thoughtful safety layering, correct concurrency handling, and graceful error degradation. Residual risks are prompt injection via app names and keyboard/screen data exposure (inherent to feature), mitigated by user approval gates. The implementation demonstrates sophisticated macOS native API usage, particularly CFRunLoop pumping and CGEventTap interception.
