Examples

Example task files and Dockerfiles for desktest.

Task Files

gedit-save.json - GTK App (Folder Deploy)

A simple test that opens a text file in gedit, adds a line, and saves. Uses the folder app deploy type with a local application directory.

desktest run examples/gedit-save.json
desktest run examples/gedit-save.json --monitor   # Watch live at http://localhost:7860
desktest interactive examples/gedit-save.json

libreoffice-calc.json - Custom Docker Image

A spreadsheet test that enters values and a formula in LibreOffice Calc. Uses the docker_image app type with a pre-built custom image.

# Build the custom image first
docker build -t tent-libreoffice:latest -f examples/Dockerfile.libreoffice .

# Run the test
desktest run examples/libreoffice-calc.json

# Or interactively
desktest interactive examples/libreoffice-calc.json

electron-todo.json - Electron App (Folder Deploy)

A minimal Electron todo app that demonstrates testing Electron applications. Uses the folder app deploy type with electron: true for Node.js support.

# Build the electron Docker image first
docker build -t desktest-desktop:latest docker/
docker build -f docker/Dockerfile.electron -t desktest-desktop:electron docker/

# Run the test
desktest run examples/electron-todo.json

See ELECTRON_QUICKSTART.md for a complete guide to testing Electron apps.

multi-app-terminal-gedit.json - Multi-App Workflow (Hard)

A harder test that exercises multi-app coordination: curl a CSV from a local HTTP server in a terminal, open it in gedit, find-and-replace ERROR→FIXED, save, and verify with grep. Tests app switching, dialog navigation, terminal interaction, and multi-step evaluation (4 metrics).

desktest run examples/multi-app-terminal-gedit.json
desktest run examples/multi-app-terminal-gedit.json --monitor

macos-textedit.json - macOS TextEdit (Tart VM)

Tests basic text editing on macOS inside a Tart VM. Requires Apple Silicon, Tart, and a golden image prepared with desktest init-macos.

desktest run examples/macos-textedit.json --config config.json

macos-electron.json - macOS Electron App (Tart VM)

Deploys and tests an Electron todo app inside a Tart VM. Requires the desktest-macos-electron:latest golden image.

desktest run examples/macos-electron.json --config config.json

macos-native-textedit.json - macOS TextEdit (Native, No VM)

The same TextEdit test, but using macos_native mode, which runs directly on the host macOS desktop with no VM isolation. Useful for quick local iteration without setting up Tart. Requires a local desktop session (not SSH) with Accessibility, Automation, and Screen Recording permissions granted.

desktest run examples/macos-native-textedit.json --config config.json

See docs/macos-support.md for the full macOS testing guide.

windows-calculator.json - Windows Calculator (QEMU/KVM VM)

Tests basic Windows Calculator interaction inside a QEMU/KVM VM. Requires a Linux host with KVM, QEMU, and a golden image prepared with desktest init-windows.

desktest run examples/windows-calculator.json --config config.json

See dev-docs/windows-ci-guide.md for the full Windows testing guide.

Custom Docker Images

Dockerfile.libreoffice shows how to create a compatible custom image.

Required Dependencies

Custom images must include these packages for desktest to work:

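| Category      | Packages                                                                          |
|---------------|-----------------------------------------------------------------------------------|
| Display       | xvfb, x11vnc, xfce4, xfce4-terminal                                               |
| Tools         | scrot, xdotool, ffmpeg                                                            |
| Accessibility | at-spi2-core, libatspi2.0-0                                                       |
| Python        | python3, python3-pyautogui, python3-xlib, python3-pyatspi, python3-pyperclip      |
| Clipboard     | xclip                                                                             |
| D-Bus         | dbus, dbus-x11                                                                    |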

Custom images must also create ~/.Xauthority for the tester user. Without it, PyAutoGUI will crash with Xlib.error.XauthError. Add this after USER tester:

RUN touch /home/tester/.Xauthority

You must also copy the helper scripts from docker/:

  • docker/get-a11y-tree.py → /usr/local/bin/get-a11y-tree
  • docker/execute-action.py → /usr/local/bin/execute-action
  • docker/entrypoint.sh → /usr/local/bin/entrypoint.sh
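
If you want to check a custom image by hand before running a test, the requirements above can be sketched as a small preflight script run inside the container. This is not what desktest itself runs at startup; it is only an illustration. It assumes a Debian or Ubuntu base image where dpkg-query is available and where the names in the table are the package names. The function names missing_files and missing_packages are hypothetical.

```shell
#!/bin/sh
# Preflight sketch for a custom desktest image (run inside the container).
# Assumption: Debian/Ubuntu base, so dpkg-query can report installed packages.

# Package names copied from the Required Dependencies table.
REQUIRED_PACKAGES="xvfb x11vnc xfce4 xfce4-terminal scrot xdotool ffmpeg \
at-spi2-core libatspi2.0-0 python3 python3-pyautogui python3-xlib \
python3-pyatspi python3-pyperclip xclip dbus dbus-x11"

# The .Xauthority file and the helper script destinations listed above.
REQUIRED_FILES="/home/tester/.Xauthority /usr/local/bin/get-a11y-tree \
/usr/local/bin/execute-action /usr/local/bin/entrypoint.sh"

# Print every path among the arguments that does not exist.
missing_files() {
  for path in "$@"; do
    [ -e "$path" ] || echo "$path"
  done
}

# Print every package among the arguments that dpkg does not report as installed.
missing_packages() {
  for pkg in "$@"; do
    dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null \
      | grep -q "install ok installed" || echo "$pkg"
  done
}
```

Calling missing_packages $REQUIRED_PACKAGES and missing_files $REQUIRED_FILES (unquoted, so the lists split into separate arguments) prints nothing when everything is in place.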

Validation

desktest validates custom images at startup. If a required dependency is missing, it exits with code 2 and a clear error message.

# Validate a task file without running
desktest validate examples/libreoffice-calc.json
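
In a CI script it can help to tell the dependency failure apart from other failures. The sketch below maps an exit status to a short label. Only exit code 2 is documented in this section; treating 0 as success follows the usual shell convention and is an assumption here, and the function name classify_desktest_exit is hypothetical.

```shell
# Map a desktest exit status to a short label for CI logs.
# Only code 2 (missing required dependency) is documented in this README;
# 0 = success is assumed by convention, and every other code is lumped together.
classify_desktest_exit() {
  case "$1" in
    0) echo "success" ;;
    2) echo "missing-dependency" ;;
    *) echo "other-failure" ;;
  esac
}
```

One way to use it: run `status=0; desktest run examples/libreoffice-calc.json || status=$?` and then pass "$status" to classify_desktest_exit.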

QA Mode

Any example can be run with --qa to enable bug reporting. The agent will complete its task while also watching for application bugs:

desktest run examples/gedit-save.json --qa

Bug reports are written as markdown files in desktest_artifacts/bugs/. Each report includes a summary, reproduction steps, screenshot references, and diagnostic evidence gathered via bash commands.
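
After a --qa run you may want to see which reports were produced. A minimal sketch, assuming the markdown reports use the .md extension (the file naming is not specified here) and using a hypothetical helper name:

```shell
# List bug reports under the given directory, sorted by path.
# Defaults to the artifacts location named above.
# Assumption: report files end in .md.
list_bug_reports() {
  dir="${1:-desktest_artifacts/bugs}"
  # A run that found no bugs may not create the directory at all.
  [ -d "$dir" ] || return 0
  find "$dir" -type f -name '*.md' | sort
}
```

Running list_bug_reports with no argument from the directory where desktest was invoked prints one report path per line, or nothing if no reports exist.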

Live Monitoring

Any example can be run with the --monitor flag to open a real-time web dashboard:

# Single test with live dashboard
desktest run examples/gedit-save.json --monitor

# Suite with progress tracking
desktest suite examples/ --monitor

# Custom port
desktest run examples/gedit-save.json --monitor --monitor-port 8080

Open http://localhost:7860 in your browser (or the port you passed to --monitor-port) to watch the agent's screenshots, thoughts, and actions stream in as each step completes. The dashboard uses the same UI as desktest review.

Task JSON Schema

See src/task.rs for the full schema definition. Key fields:

{
  "schema_version": "1.0",
  "id": "unique-test-id",
  "instruction": "What the agent should do",
  "completion_condition": "Optional: when the agent should consider the task done",
  "app": { "type": "appimage|folder|docker_image|vnc_attach|macos_tart|macos_native|windows_vm|windows_native", "..." : "..." },
  "config": [ { "type": "execute|copy|open|sleep", "..." : "..." } ],
  "evaluator": {
    "mode": "llm|programmatic|hybrid",
    "metrics": [ { "type": "file_exists|command_output|...", "..." : "..." } ]
  },
  "timeout": 120
}