Two model backends are supported, each targeting a different hardware setup:
| Backend | Model | Browser mode | Hardware requirement |
|---|---|---|---|
| llama_cpp | OpAgent-32B-Q4 (INT4 quantized) | Headed (visible window) | Local machine with a single 24 GB GPU |
| vllm | codefuse-ai/OpAgent-32B (full precision) | Headless (screenshot-only) | GPU server, e.g. 1× A100 80 GB |
The browser mode is automatically chosen based on the backend:
- llama_cpp opens a real, visible browser window so you can watch the agent in real time. Useful for local development and debugging.
- vllm runs a headless browser on a server with no display. The agent drives the browser entirely through screenshots.
Both modes save a full trajectory (screenshots + annotated screenshots + JSON log) after every run.
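In code terms the selection is a one-liner. The sketch below is a hypothetical reconstruction of that rule (the function name is invented, not the repo's actual API):

```python
def resolve_headless(backend: str, force_headless: bool = False) -> bool:
    """Hypothetical reconstruction of the browser-mode rule described above."""
    if backend == "vllm":
        return True            # server has no display: always headless
    return force_headless      # llama_cpp: headed unless --headless is passed
```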
Repository layout:

```
opagent_single_model/
├── action_executor.py              # Browser action parser/executor (shared)
├── browser_runtime.py              # BrowserSession + TrajectoryRecorder (shared)
├── model_interface_llama_cpp.py    # INT4 backend — talks to llama-server HTTP API
├── model_interface_vllm.py         # Full-precision backend — vLLM in-process inference
├── main.py                         # CLI entry point, --backend flag switches backends
├── requirements.txt
└── static/
```
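action_executor.py parses the model's textual action into a structured command before the browser executes it. The actual action grammar is not documented here; as a purely hypothetical illustration of that parse step, assuming a `click(x, y)`-style syntax:

```python
import re
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    args: tuple[float, ...]

def parse_action(text: str) -> Action:
    # Hypothetical grammar, e.g. "click(412, 380)" or "scroll(0, -600)";
    # the real parser in action_executor.py may differ entirely.
    m = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", text)
    if not m:
        raise ValueError(f"unparseable action: {text!r}")
    name, raw = m.groups()
    args = tuple(float(a) for a in raw.split(",") if a.strip())
    return Action(name, args)

print(parse_action("click(412, 380)"))  # Action(name='click', args=(412.0, 380.0))
```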
Install dependencies and the Playwright browser:

```
cd opagent_single_model
pip install -r requirements.txt
playwright install --with-deps chromium
```

Download the INT4-quantized GGUF model:
- 🤖 ModelScope: codefuse-ai/OpAgent-32B-Q4
Build llama.cpp and start the server:
```
# Build (static, no .so dependency issues)
git clone --depth 1 https://github.com/ggml-org/llama.cpp.git
cd llama.cpp && mkdir build && cd build
cmake .. -DBUILD_SHARED_LIBS=OFF   # add -DGGML_CUDA=ON for GPU
cmake --build . --config Release --target llama-server -j$(nproc)

# Start server
./bin/llama-server \
  --model /path/to/OpAgent.Q4_K_M.gguf \
  --mmproj /path/to/mmproj-f16.gguf \
  --port 18080 \
  --ctx-size 4096 \
  --n-gpu-layers 99
```

Run the agent:

```
# Headed browser (default) — a visible Chrome window opens
python main.py --backend llama_cpp
# Override llama-server URL if not on default port
OPAGENT_LLAMA_SERVER_URL=http://127.0.0.1:18080 python main.py --backend llama_cpp
# Force headless even with llama_cpp
python main.py --backend llama_cpp --headless
# Non-interactive single task
python main.py --backend llama_cpp \
--url https://www.amazon.com \
--task "buy a physical PS5 copy of Elden Ring"| Variable | Default | Description |
|---|---|---|
| OPAGENT_LLAMA_SERVER_URL | http://127.0.0.1:18080 | llama-server base URL |
| OPAGENT_LLAMA_MODEL_PATH | ~/workspace/OpAgent/OpAgent-32B-Q4/OpAgent.Q4_K_M.gguf | Local GGUF path (used for health check) |
| OPAGENT_LLAMA_MAX_TOKENS | 512 | Max tokens per model call |
| OPAGENT_LLAMA_TEMPERATURE | 0 | Sampling temperature |
| OPAGENT_LLAMA_TIMEOUT | 600 | HTTP request timeout (seconds) |
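model_interface_llama_cpp.py talks to llama-server over HTTP. As a rough illustration only (not the repo's actual code), a single multimodal call might look like the sketch below, assuming llama-server's OpenAI-compatible /v1/chat/completions endpoint and a base64-encoded screenshot; the screenshot path and payload shape are assumptions:

```python
import base64
import os

import requests

# Configuration from the environment-variable table above.
url = os.environ.get("OPAGENT_LLAMA_SERVER_URL", "http://127.0.0.1:18080")

# Hypothetical screenshot path; encode it as a data URI for the server.
with open("screenshots/step_001.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    f"{url}/v1/chat/completions",
    json={
        "max_tokens": int(os.environ.get("OPAGENT_LLAMA_MAX_TOKENS", "512")),
        "temperature": float(os.environ.get("OPAGENT_LLAMA_TEMPERATURE", "0")),
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "buy a physical PS5 copy of Elden Ring"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    },
    timeout=float(os.environ.get("OPAGENT_LLAMA_TIMEOUT", "600")),
)
print(resp.json()["choices"][0]["message"]["content"])
```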
Download the full-precision model:

- 🤖 Hugging Face: codefuse-ai/OpAgent-32B
- 🤖 ModelScope: codefuse-ai/OpAgent-32B
The model is loaded in-process by vLLM. The browser is always headless (no display on server).
```
# Basic — single GPU
python main.py --backend vllm

# Multi-GPU tensor parallelism
python main.py --backend vllm --tensor-parallel-size 2

# Custom model path and memory settings
python main.py --backend vllm \
  --model-path /path/to/local/model \
  --max-model-len 16384 \
  --gpu-memory-utilization 0.95

# Non-interactive single task
python main.py --backend vllm \
  --url https://www.amazon.com \
  --task "buy a physical PS5 copy of Elden Ring"
```

vLLM-specific options:

| Option | Default | Description |
|---|---|---|
| --model-path | codefuse-ai/OpAgent | Local path or Hugging Face repo |
| --tensor-parallel-size, -tp | 1 | Number of GPUs |
| --max-model-len | 32768 | Max sequence length |
| --gpu-memory-utilization | 0.9 | GPU memory fraction |
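These flags map onto standard vLLM engine arguments. A minimal sketch of the in-process loading, assuming a recent vLLM with the stock `LLM` constructor and chat API (the repo's actual wiring may differ):

```python
from vllm import LLM, SamplingParams

# Illustrative in-process loading; keyword arguments are standard vLLM
# engine arguments matching the CLI options in the table above.
llm = LLM(
    model="codefuse-ai/OpAgent",   # --model-path
    tensor_parallel_size=1,        # --tensor-parallel-size
    max_model_len=32768,           # --max-model-len
    gpu_memory_utilization=0.9,    # --gpu-memory-utilization
)

outputs = llm.chat(
    [{"role": "user", "content": "Describe the next browser action."}],
    sampling_params=SamplingParams(max_tokens=512, temperature=0),
)
print(outputs[0].outputs[0].text)
```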
General CLI usage:

```
python main.py --backend <vllm|llama_cpp> [OPTIONS]

--backend, -b        vllm or llama_cpp (required)
--output, -o PATH    Output directory for trajectory files
--max-steps, -m INT  Maximum steps per task (default: 50)
--headless           Force headless browser (llama_cpp defaults to headed)
--url URL            Task start URL — enables non-interactive mode
--task TEXT          Task description — enables non-interactive mode
```

When --url / --task are not given, the agent enters an interactive loop:

```
🌐 Enter task URL (Enter = current page, 'quit' to exit):
🎯 Enter task description:
```
- Enter a URL (or press Enter to keep the current page)
- Describe the task
- The agent executes, saving a trajectory to the output directory
Type quit, exit, or q to end the session.
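The loop itself is straightforward. The following is a hypothetical reconstruction (prompt strings from above; run_task is an invented stand-in for the repo's actual agent call):

```python
def run_task(url: str | None, task: str) -> None:
    # Stand-in for the real agent: navigate (if a URL is given), execute
    # the task, and save a trajectory to the output directory.
    print(f"running {task!r} at {url or '<current page>'}")

while True:
    url = input("🌐 Enter task URL (Enter = current page, 'quit' to exit): ").strip()
    if url.lower() in {"quit", "exit", "q"}:
        break
    task = input("🎯 Enter task description: ").strip()
    run_task(url or None, task)   # empty input = stay on the current page
```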
Every run saves:
```
output/YYYYMMDD_HHMMSS/HHMMSS/
├── screenshots/       # Raw PNG screenshots per step
├── annotated/         # Screenshots with action overlay (crosshair, step label)
└── trajectory.json    # Full step-by-step log (URL, think, action, params, error)
```
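The exact trajectory.json schema is not documented here; assuming it is a list of step records with the fields named above (URL, think, action, params, error), a post-run inspection script might look like:

```python
import json
from pathlib import Path

# Pick the most recent run directory (output/YYYYMMDD_HHMMSS/HHMMSS/).
run_dir = sorted(Path("output").glob("*/*"))[-1]

# Assumed schema: a list of per-step dicts; adjust to the real file.
steps = json.loads((run_dir / "trajectory.json").read_text())
for i, step in enumerate(steps, 1):
    print(f"step {i}: {step.get('action')} on {step.get('url')}")
    if step.get("error"):
        print("  ⚠️", step["error"])
```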