Merged
Changes from 5 commits
Commits
58 commits
adbaf2d
boilerplate
ollmer Jun 24, 2025
f29f048
fix import
ollmer Jun 24, 2025
30742cf
default args in the dataclass
ollmer Jun 24, 2025
a8b29cc
script to test that osworld works in the ubuntu with docker
ollmer Jun 25, 2025
9e07d8f
update makefile to setup os world development.
amanjaiswal73892 Jun 25, 2025
1f5d1da
install osworld through make command instead of requirements as we tw…
ollmer Jun 26, 2025
41e5298
ignore osworld tmp folders
ollmer Jun 26, 2025
5448b48
osworld bench tasks loading
ollmer Jun 26, 2025
77c1d1c
fmt
ollmer Jun 26, 2025
04505ec
osworld eval entrypoint and fixes
ollmer Jun 26, 2025
1b4fccf
osworld action set boilerplate
ollmer Jun 26, 2025
20524da
boilerplate for obs conversion
ollmer Jun 26, 2025
3728a3b
add convert_obs to env reset function
amanjaiswal73892 Jun 27, 2025
475a3f3
hardcoded os-world env in tool use agent (Refactor later)
amanjaiswal73892 Jun 27, 2025
11edee5
add timing decorator to step function for action execution metrics
amanjaiswal73892 Jun 27, 2025
42b38ca
pre-alpha initial working agent on os-world with desktop_env action s…
amanjaiswal73892 Jun 27, 2025
928fbaf
Add TODO's
amanjaiswal73892 Jun 30, 2025
3bf6c0a
Update TODO's
amanjaiswal73892 Jun 30, 2025
725546c
claude and oai config for osworld agent
ollmer Jul 2, 2025
c1ec395
enforce format and stricter type checks
ollmer Jul 2, 2025
3608c43
pass action set through the agent config
ollmer Jul 2, 2025
585d9f7
Add set_benchmark for tool-use agent to use os_world obs preprocessor.
amanjaiswal73892 Jul 2, 2025
bdb7ab1
Merge remote branch 'main' into osworld
amanjaiswal73892 Jul 3, 2025
a40aa42
Update Claude agent config to include axtree and obs history
amanjaiswal73892 Jul 4, 2025
400b947
Add osworld axtree preprocessing
amanjaiswal73892 Jul 4, 2025
2d7d5a2
Add max_steps parameter to OsworldGym and OsworldEnvArgs for step lim…
amanjaiswal73892 Jul 4, 2025
0dbb9dd
Add env.evaluate for episode evaluation
amanjaiswal73892 Jul 4, 2025
7f6b6c9
Refactor observation conversion, add axtree and remove Todos.
amanjaiswal73892 Jul 4, 2025
ce0b2d0
Add computer_13 action space tools definitions in OsworldGym and remo…
amanjaiswal73892 Jul 4, 2025
024935e
update run_osworld to use small test set and one task
amanjaiswal73892 Jul 4, 2025
1a0c483
Fix: Update tool call identifier key in Xray [for debugging only]
amanjaiswal73892 Jul 4, 2025
912932d
more progress logging
ollmer Jul 7, 2025
8ce45b8
debug parallel task
ollmer Jul 7, 2025
2911da5
Add method to fix settings file path in task configuration
amanjaiswal73892 Jul 7, 2025
ddf1d00
7 simple osworld tasks for debug
ollmer Jul 8, 2025
49fac6c
use subset of simple tasks during debug run
ollmer Jul 8, 2025
2b79b50
Temp commit for xray [Update toolagent config to be primitive types]
amanjaiswal73892 Jul 8, 2025
815893c
record task video, wait 60 sec after reset just as osworld own agent
ollmer Jul 9, 2025
7449033
put video recording under flag, lint
ollmer Jul 10, 2025
d7401bf
lint
ollmer Jul 10, 2025
7387922
Merge branch 'main' into osworld
ollmer Jul 10, 2025
cf4b277
refactor: rename use_osworld_obs_preprocessor to skip_preprocessing f…
amanjaiswal73892 Jul 10, 2025
bb38053
Remove 'action_set' from index_black_list in load_result_df and make …
amanjaiswal73892 Jul 10, 2025
63d141b
fix: rename COMPUTER_13_ACTIONS_OAI_RESPONSE_TOOLS to COMPUTER_13_ACT…
amanjaiswal73892 Jul 10, 2025
d36709a
update run_osworld.py with study relaunch capability and setup readme
amanjaiswal73892 Jul 10, 2025
9748ec3
update TODO and black refactor
amanjaiswal73892 Jul 10, 2025
725854b
Rename tool conversion function
amanjaiswal73892 Jul 11, 2025
a22eaed
bug fix to_tool_desc and refactor
amanjaiswal73892 Jul 11, 2025
8c2d469
Add tests
amanjaiswal73892 Jul 11, 2025
f740812
Black and darglint
amanjaiswal73892 Jul 11, 2025
532047a
Merge remote-tracking branch 'origin/main' into osworld
amanjaiswal73892 Jul 11, 2025
d2d59bc
more black
amanjaiswal73892 Jul 11, 2025
4f14015
Update osworld to be skipped if desktop_env not available
amanjaiswal73892 Jul 11, 2025
896e89a
add selective import for osworld module and tests.
amanjaiswal73892 Jul 11, 2025
8bee45f
black formatting again
amanjaiswal73892 Jul 11, 2025
60d7ce2
Add OSWorld benchmark to README
amanjaiswal73892 Jul 11, 2025
2afb28b
Remove commented code.
amanjaiswal73892 Jul 15, 2025
b0d4a99
Merge remote-tracking branch 'origin' into osworld
amanjaiswal73892 Jul 15, 2025
22 changes: 21 additions & 1 deletion Makefile
@@ -1,4 +1,4 @@
-.PHONY: test setup miniwob lint stop-miniwob
+.PHONY: test setup miniwob lint stop-miniwob osworld

setup:
	@pip install -e .
@@ -30,3 +30,23 @@ test: setup miniwob check-miniwob run-tests stop-miniwob
lint: setup
	@black src/ --check --diff
	@darglint -v 2 -z short src/

osworld:
	@echo "Setting up OSWorld..."
	@git clone https://github.com/xlang-ai/OSWorld || true
	@echo "Modifying OSWorld requirements.txt to remove pinned versions..."
	@cd OSWorld && \
	  sed -i.bak 's/numpy~=.*/numpy/' requirements.txt && \

neat trick!

	  sed -i.bak 's/torch~=.*/torch/' requirements.txt && \
	  sed -i.bak 's/torch$$/torch/' requirements.txt && \
	  sed -i.bak 's/tqdm~=.*/tqdm/' requirements.txt && \
	  sed -i.bak 's/pandas~=.*/pandas/' requirements.txt
	@echo "Installing OSWorld requirements..."
	@cd OSWorld && pip install -r requirements.txt
	@echo "Installing OSWorld in development mode..."
	@cd OSWorld && pip install -e .
	@echo "OSWorld setup completed!"
	@echo "Next steps:"
	@echo "1. Configure your VM (VMware/VirtualBox) according to OSWorld documentation"
	@echo "2. Download or set up the Ubuntu VM image"
	@echo "3. Run AgentLab with OSWorld tasks"
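The `sed` substitutions in the `osworld` target strip `~=` version pins for four packages. A minimal Python sketch of the same transformation follows, for readers who want to see what the patterns do to individual lines. The helper name `unpin` and the sample version numbers are made up for illustration and are not part of this PR.

```python
import re

# Packages whose "~=" pins the Makefile's sed commands strip (taken from the recipe above).
UNPINNED = ("numpy", "torch", "tqdm", "pandas")


def unpin(line: str, packages: tuple[str, ...] = UNPINNED) -> str:
    """Drop a '~=<version>' pin for the listed packages; leave other lines untouched."""
    for name in packages:
        # Mirrors sed 's/<name>~=.*/<name>/': the pattern is unanchored, as in the recipe.
        line = re.sub(rf"{re.escape(name)}~=.*", name, line)
    return line


# Illustrative requirement lines (the version numbers are invented).
sample = ["numpy~=1.24.4", "torch~=2.0.1", "requests~=2.31.0", "pandas~=2.2.3"]
print([unpin(line) for line in sample])
```

Only the listed packages lose their pin; a line such as `requests~=2.31.0` passes through unchanged.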
45 changes: 45 additions & 0 deletions osworld_docker_test.py
@@ -0,0 +1,45 @@
import logging
from desktop_env.desktop_env import DesktopEnv

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    handlers=[logging.StreamHandler()],
)

example = {
    "id": "94d95f96-9699-4208-98ba-3c3119edf9c2",
    "instruction": "I want to install Spotify on my current system. Could you please help me?",
    "config": [
        {
            "type": "execute",
            "parameters": {
                "command": [
                    "python",
                    "-c",
                    "import pyautogui; import time; pyautogui.click(960, 540); time.sleep(0.5);"
                ]
            }
        }
    ],
    "evaluator": {
        "func": "check_include_exclude",
        "result": {
            "type": "vm_command_line",
            "command": "which spotify"
        },
        "expected": {
            "type": "rule",
            "rules": {
                "include": ["spotify"],
                "exclude": ["not found"]
            }
        }
    }
}

env = DesktopEnv(action_space="pyautogui", provider_name="docker", os_type="Ubuntu")

obs = env.reset(task_config=example)
obs, reward, done, info = env.step("pyautogui.rightClick()")
print(obs)
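The `evaluator` block in the task above pairs a command's output (`which spotify`) with `include` and `exclude` rules. As a rough sketch of how such a rule could be scored, the function below returns 1.0 when every `include` string appears in the output and no `exclude` string does. The name `score_include_exclude` is hypothetical, and OSWorld's own `check_include_exclude` may differ in detail.

```python
from __future__ import annotations


def score_include_exclude(output: str | None, rules: dict[str, list[str]]) -> float:
    """Return 1.0 if all 'include' strings occur in output and no 'exclude' string does."""
    if output is None:
        # Assumption: a missing command result counts as a failure.
        return 0.0
    include = rules.get("include", [])
    exclude = rules.get("exclude", [])
    passed = all(s in output for s in include) and not any(s in output for s in exclude)
    return 1.0 if passed else 0.0


# Rules copied from the example task; the command outputs are invented for illustration.
rules = {"include": ["spotify"], "exclude": ["not found"]}
print(score_include_exclude("/snap/bin/spotify", rules))
print(score_include_exclude("spotify not found", rules))
```

The second call shows why the `exclude` list matters: an error message can contain the package name and would otherwise pass the `include` check.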
3 changes: 2 additions & 1 deletion requirements.txt
@@ -26,4 +26,5 @@ matplotlib
ray[default]
python-slugify
pillow
-gymnasium>=0.27
+gymnasium>=0.27
+desktop-env~=0.1.22
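The `~=` operator in `desktop-env~=0.1.22` is a compatible-release clause: it accepts versions at or above 0.1.22 that still start with `0.1`. The sketch below illustrates that rule for plain dotted-integer versions only. It is a simplification rather than a full version parser, and the helper names are invented for this example.

```python
def parse(version: str) -> tuple[int, ...]:
    """Split a plain dotted-integer version such as '0.1.22' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def compatible(candidate: str, spec: str) -> bool:
    """Approximate '~=spec': candidate >= spec and equal on all but the last component."""
    base = parse(spec)
    cand = parse(candidate)
    prefix = base[:-1]  # for '0.1.22' this is (0, 1)
    return cand >= base and cand[: len(prefix)] == prefix


for version in ["0.1.21", "0.1.22", "0.1.30", "0.2.0"]:
    print(version, compatible(version, "0.1.22"))
```

Under this rule 0.1.30 would satisfy the requirement while 0.2.0 would not.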
110 changes: 110 additions & 0 deletions src/agentlab/benchmarks/osworld.py
@@ -0,0 +1,110 @@
import logging
from dataclasses import dataclass
from typing import Any

from desktop_env.desktop_env import DesktopEnv

from agentlab.benchmarks.abstract_env import AbstractBenchmark, AbstractEnv, AbstractEnvArgs

logger = logging.getLogger(__name__)


class OsworldGym(AbstractEnv):
    def __init__(
        self,
        task: dict,
        provider_name: str,
        region: str | None,
        path_to_vm: str | None,
        snapshot_name: str,
        action_space: str,
        cache_dir: str,
        screen_size: tuple[int, int],
        headless: bool,
        require_a11y_tree: bool,
        require_terminal: bool,
        os_type: str,
        enable_proxy: bool,
    ):
        self.task = task
        # Keep the construction parameters so reset() can return them as the info dict.
        self.env_info = {
            "provider_name": provider_name,
            "region": region,
            "path_to_vm": path_to_vm,
            "snapshot_name": snapshot_name,
            "action_space": action_space,
            "cache_dir": cache_dir,
            "screen_size": screen_size,
            "headless": headless,
            "require_a11y_tree": require_a11y_tree,
            "require_terminal": require_terminal,
            "os_type": os_type,
            "enable_proxy": enable_proxy,
        }
        # Note: enable_proxy is recorded in env_info but is not forwarded to DesktopEnv here.
        self.env = DesktopEnv(
            action_space=action_space,
            provider_name=provider_name,
            region=region,  # type: ignore
            path_to_vm=path_to_vm,  # type: ignore
            snapshot_name=snapshot_name,
            cache_dir=cache_dir,
            screen_size=screen_size,  # type: ignore
            headless=headless,
            require_a11y_tree=require_a11y_tree,
            require_terminal=require_terminal,
            os_type=os_type,
        )

    def reset(self, seed: int | None = None) -> tuple[dict[str, Any], dict[str, Any]]:
        obs = self.env.reset(task_config=self.task, seed=seed)
        return obs, self.env_info

    def step(self, action: str):
        # DesktopEnv.step returns a 4-tuple; add truncated=False to match the 5-tuple convention.
        obs, reward, done, info = self.env.step(action)
        truncated = False
        return obs, reward, done, truncated, info

    def close(self):
        return self.env.close()


@dataclass
class OsworldEnvArgs(AbstractEnvArgs):
    task: dict[str, Any]
    path_to_vm: str | None = None  # path to .vmx file
    provider_name: str = "vmware"
    region: str = "us-east-1"  # AWS specific, does not apply to all providers
    snapshot_name: str = "init_state"  # snapshot name to revert to
    action_space: str = "computer_13"  # "computer_13" | "pyautogui"
    cache_dir: str = "cache"
    screen_size: tuple[int, int] = (1920, 1080)
    headless: bool = False
    require_a11y_tree: bool = True
    require_terminal: bool = False
    os_type: str = "Ubuntu"
    enable_proxy: bool = False

    def make_env(self) -> OsworldGym:
        logger.info(f"Creating OSWorld Gym with task: {self.task}")
        gym = OsworldGym(
            task=self.task,
            provider_name=self.provider_name,
            region=self.region,
            path_to_vm=self.path_to_vm,
            snapshot_name=self.snapshot_name,
            action_space=self.action_space,
            cache_dir=self.cache_dir,
            screen_size=self.screen_size,
            headless=self.headless,
            require_a11y_tree=self.require_a11y_tree,
            require_terminal=self.require_terminal,
            os_type=self.os_type,
            enable_proxy=self.enable_proxy,
        )
        return gym


class OsworldBenchmark(AbstractBenchmark):
    name: str = "osworld"
    env_args_list: list[OsworldEnvArgs]
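`OsworldGym.step` adapts a backend that returns `(obs, reward, done, info)` to the five-value form `(obs, reward, done, truncated, info)`, with `truncated` fixed to `False`. The self-contained sketch below shows the same adapter pattern against a stub backend, so it runs without `desktop_env` or a VM. `FourTupleEnv`, `FiveTupleAdapter`, and the optional `max_steps` cutoff are hypothetical names introduced for illustration and are not code from this diff.

```python
from __future__ import annotations

from typing import Any


class FourTupleEnv:
    """Hypothetical stand-in for a backend whose step() returns (obs, reward, done, info)."""

    def __init__(self, horizon: int = 2):
        self.horizon = horizon
        self.t = 0

    def reset(self) -> dict[str, Any]:
        self.t = 0
        return {"t": self.t}

    def step(self, action: str) -> tuple[dict[str, Any], float, bool, dict[str, Any]]:
        self.t += 1
        done = self.t >= self.horizon
        return {"t": self.t}, float(done), done, {"action": action}


class FiveTupleAdapter:
    """Wrap a 4-tuple backend and expose (obs, reward, done, truncated, info)."""

    def __init__(self, env: FourTupleEnv, max_steps: int | None = None):
        self.env = env
        self.max_steps = max_steps  # assumption: optional step limit, None disables it
        self.steps = 0

    def reset(self) -> tuple[dict[str, Any], dict[str, Any]]:
        self.steps = 0
        return self.env.reset(), {}

    def step(self, action: str):
        obs, reward, done, info = self.env.step(action)
        self.steps += 1
        # Truncate only when the limit is reached and the episode has not already ended.
        truncated = self.max_steps is not None and self.steps >= self.max_steps and not done
        return obs, reward, done, truncated, info


env = FiveTupleAdapter(FourTupleEnv(horizon=5), max_steps=1)
obs, info = env.reset()
print(env.step("noop"))
```

With `max_steps=None` the adapter behaves like the `step` method in this diff and always reports `truncated=False`.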