36 changes: 36 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,36 @@
name: CI

on:
  push:
    branches: [main, master]
  pull_request:
    branches: [main, master]

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
Suggested change
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v6

      - uses: actions/setup-python@v4
        with:
          python-version: "3.11"
Comment on lines +15 to +17
⚠️ Potential issue | 🟠 Major

Update actions/setup-python to v5.

The workflow uses actions/setup-python@v4, which is flagged by actionlint as too old to run on current GitHub Actions runners. This may cause the workflow to fail.

🔎 Proposed fix
-      - uses: actions/setup-python@v4
+      - uses: actions/setup-python@v5
         with:
           python-version: "3.11"
🧰 Tools
🪛 actionlint (1.7.9)

15-15: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🤖 Prompt for AI Agents
In .github/workflows/ci.yml around lines 15 to 17 the workflow references
actions/setup-python@v4 which is outdated; update the action to
actions/setup-python@v5 and keep the existing python-version input (e.g.,
"3.11"). Edit the uses line to point to the v5 tag and run the workflow to
verify the runner accepts the updated action.
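
The check described in this comment can be approximated locally. The following is a minimal sketch, not actionlint's implementation: it scans workflow text for `uses:` pins and reports any that fall below a minimum major version. The `MIN_MAJOR` table and the function name `find_outdated_actions` are hypothetical, chosen only to mirror the suggestion above.

```python
import re

# Hypothetical minimum major versions, for illustration only; this mirrors
# the review's suggestion and is not an official or complete list.
MIN_MAJOR = {"actions/setup-python": 5}

# Matches lines such as "- uses: owner/repo@v4" and captures the repo and major.
USES_RE = re.compile(r"^\s*-?\s*uses:\s*([\w.-]+/[\w.-]+)@v(\d+)", re.MULTILINE)


def find_outdated_actions(workflow_text, min_major=MIN_MAJOR):
    """Return (action, pinned_major, required_major) for each outdated pin."""
    outdated = []
    for action, major in USES_RE.findall(workflow_text):
        required = min_major.get(action)
        if required is not None and int(major) < required:
            outdated.append((action, int(major), required))
    return outdated


workflow = """\
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-python@v4
    with:
      python-version: "3.11"
"""
print(find_outdated_actions(workflow))
```

Actions pinned by commit SHA or by a full tag such as `v4.1.0` are only partly handled here (the regex reads the major number and ignores the rest), so a real linter remains the better tool for CI.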


      - name: Install dev requirements
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements-dev.txt

      - name: Run linters
        env:
          PYTHONPATH: ${{ github.workspace }}
        run: |
          python -m black --check .
          python -m isort --check-only .
          python -m flake8 .

      - name: Run tests
        env:
          PYTHONPATH: ${{ github.workspace }}
        run: |
          python -m pytest -q
5 changes: 4 additions & 1 deletion .gitignore
@@ -161,4 +161,7 @@ cython_debug/
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
 #.idea/
 logs/
-.DS_Store
+.DS_Store
+
+# Local env file for secrets
+.env
Comment on lines +164 to +167

⚠️ Potential issue | 🟡 Minor

Remove duplicate .env entry; consolidate with existing Environments section.

The .env pattern already exists at line 125 within the "Environments" section. The duplicate entry at line 167 should be removed to keep the .gitignore DRY and easier to scan.

Consider either:

  1. Removing lines 166–167 entirely (redundant since .env is already covered), or
  2. Optionally adding an inline comment at line 125 if the intent is to emphasize secrets exclusion.
🔎 Proposed fix: remove duplicate entry
 logs/
 .DS_Store
-
-# Local env file for secrets
-.env
🤖 Prompt for AI Agents
In .gitignore around lines 164 to 167, there's a duplicate ".env" entry that is
already listed in the "Environments" section (~line 125); remove the redundant
".env" entry (delete lines 166–167) so the file remains DRY, or if emphasis is
desired add a short inline comment at the original entry instead of duplicating
it.
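
A duplicate like this is easy to detect mechanically. The sketch below is a simplified illustration: it treats each stripped, non-comment line as a pattern and ignores .gitignore subtleties such as escaped trailing spaces and negation. The sample text is made up for the example and is not the repository's actual file.

```python
def find_duplicate_patterns(gitignore_text):
    """Map each repeated pattern to the 1-based line numbers where it appears."""
    seen = {}
    for lineno, raw in enumerate(gitignore_text.splitlines(), start=1):
        pattern = raw.strip()
        # Skip blank lines and comments; they are not ignore patterns.
        if not pattern or pattern.startswith("#"):
            continue
        seen.setdefault(pattern, []).append(lineno)
    return {p: lines for p, lines in seen.items() if len(lines) > 1}


# Illustrative sample only, shaped like the situation the review describes.
sample = """\
# Environments
.env
.venv
logs/
.DS_Store

# Local env file for secrets
.env
"""
print(find_duplicate_patterns(sample))
```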

20 changes: 20 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,20 @@
repos:
  - repo: https://github.com/psf/black
    rev: 25.11.0
    hooks:
      - id: black
        language_version: python3.11
  - repo: https://github.com/PyCQA/isort
    rev: 5.12.0
    hooks:
      - id: isort
  - repo: https://github.com/pre-commit/mirrors-flake8
    rev: 7.1.0
    hooks:
      - id: flake8
        args: ["--max-line-length=88", "--extend-ignore=E203,W503"]
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: end-of-file-fixer
      - id: trailing-whitespace
Comment on lines +1 to +20
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

latest stable version Black Python formatter 2024 2025

💡 Result:

The latest stable Black release is 25.12.0 (released Dec 8, 2025). [1][2]

Sources:

  • PyPI: black 25.12.0 release info. [1]
  • Black changelog / docs (25.12.0). [2]

🌐 Web query:

latest stable version isort Python 2024 2025

💡 Result:

Latest stable isort: 7.0.0 — released Oct 11, 2025. [1][2]

Sources:
[1] PyPI isort project page.
[2] Safety DB package entry for isort.


🌐 Web query:

latest stable version Flake8 linter 2024 2025

💡 Result:

The latest stable Flake8 release is 7.3.0 (released June 20, 2025). [1][2]

Sources:
[1] PyPI — flake8 release page.
[2] Flake8 documentation (release/FAQ pages).


🌐 Web query:

latest stable version pre-commit-hooks 2024 2025

💡 Result:

  • Latest stable (as of Dec 24, 2025): pre-commit-hooks v6.0.0. [1][2]
  • Latest stable in 2024 (most-recent 2024 series): v4.6.0 (the 4.x line). [2]

Sources:
[1] PyPI — pre-commit-hooks (release files show v6.0.0 uploaded Aug 9, 2025).
[2] GitHub — pre-commit/pre-commit-hooks releases/tags (shows v6.0.0, v5.0.0, v4.6.0, etc.).


Update pre-commit hook versions to current stable releases.

The pre-commit configuration is well-structured and Flake8 arguments correctly accommodate Black's formatting (line length 88, ignoring E203 and W503).

However, all hook versions are outdated:

  • Black: update from 25.11.0 to 25.12.0
  • isort: update from 5.12.0 to 7.0.0 (major version upgrade)
  • Flake8: update from 7.1.0 to 7.3.0
  • pre-commit-hooks: update from v4.6.0 to v6.0.0 (major version upgrade)

Verify compatibility with your codebase when updating isort and pre-commit-hooks due to their major version changes.

🤖 Prompt for AI Agents
.pre-commit-config.yaml lines 1-20: update the hook revisions to the requested
newer stable releases; change black rev to 25.12.0, isort rev to 7.0.0,
mirrors-flake8 rev to 7.3.0, and pre-commit-hooks rev to v6.0.0, then run
pre-commit autoupdate or reinstall hooks and run the test suite/linting to
verify isort and pre-commit-hooks major-version compatibility with the codebase.
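
To make the "major version upgrade" distinction concrete, here is a small sketch that compares each pinned rev with the latest release quoted in the comment above. The helper names `parse_version` and `classify_update` are hypothetical, and the parsing assumes plain dotted numeric tags with an optional leading `v`. Black uses calendar-style versions, so a change in its first component does not carry the same meaning as a semantic-versioning major bump.

```python
def parse_version(rev):
    """Turn a tag such as 'v4.6.0' or '25.11.0' into a tuple of ints."""
    return tuple(int(part) for part in rev.lstrip("v").split("."))


def classify_update(pinned, latest):
    """Return 'current', 'major', or 'minor/patch' for a pinned revision."""
    old, new = parse_version(pinned), parse_version(latest)
    if old >= new:
        return "current"
    return "major" if new[0] > old[0] else "minor/patch"


# (pinned rev, latest rev) pairs exactly as quoted in the review comment.
hooks = {
    "black": ("25.11.0", "25.12.0"),
    "isort": ("5.12.0", "7.0.0"),
    "flake8": ("7.1.0", "7.3.0"),
    "pre-commit-hooks": ("v4.6.0", "v6.0.0"),
}
for name, (pinned, latest) in hooks.items():
    print(f"{name}: {pinned} -> {latest} ({classify_update(pinned, latest)})")
```

The two hooks classified as major here are the ones the review singles out for a compatibility check.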

6 changes: 6 additions & 0 deletions requirements-dev.txt
@@ -0,0 +1,6 @@
pytest
pillow
black
flake8
isort
pre-commit
36 changes: 36 additions & 0 deletions tests/test_code_agent.py
@@ -0,0 +1,36 @@
from gui_agents.s3.agents.code_agent import extract_code_block, execute_code


class DummyEnvController:
    def __init__(self):
        pass

    def run_python_script(self, code):
        # emulate running python code
        if "print(" in code:
            return {"status": "success", "output": "printed", "returncode": 0}
        return {"status": "success", "output": "ok", "returncode": 0}

    def run_bash_script(self, code, timeout=30):
        return {"status": "success", "output": code, "returncode": 0}


def test_extract_code_block():
    s = "Some text ```python\nprint(1)\n``` more"
    t, code = extract_code_block(s)
    assert t == "python"
    assert "print(1)" in code


def test_execute_code_python():
    controller = DummyEnvController()
    res = execute_code("python", "print(1)", controller)
    assert res["status"] == "success"
    assert "output" in res


def test_execute_code_bash():
    controller = DummyEnvController()
    res = execute_code("bash", "echo hi", controller)
    assert res["status"] == "success"
    assert res["output"] == "echo hi"
142 changes: 142 additions & 0 deletions tests/test_smoke.py
@@ -0,0 +1,142 @@
import importlib
import io
import sys

from PIL import Image


# ---- Insert lightweight dummy modules to avoid heavy external deps at import time ----
class DummyPytesseractModule:
    Output = type("Output", (), {})()

    @staticmethod
    def image_to_data(image, output_type=None):
        # Return minimal dict expected by grounding.get_ocr_elements
        return {
            "text": [],
            "left": [],
            "top": [],
            "width": [],
            "height": [],
            "block_num": [],
        }


sys.modules.setdefault("pytesseract", DummyPytesseractModule)


class DummyPyAutoGUI:
    def size(self):
        return (100, 100)

    def screenshot(self):
        return Image.new("RGB", (100, 100))

    def press(self, *args, **kwargs):
        pass

    def click(self, *args, **kwargs):
        pass

    def hotkey(self, *args, **kwargs):
        pass


sys.modules.setdefault("pyautogui", DummyPyAutoGUI())

# ---- Monkeypatch LMMAgent to avoid external LLM calls ----
import gui_agents.s3.core.mllm as mllm  # noqa: E402


class FakeLMMAgent:
    def __init__(self, engine_params=None, system_prompt=None, engine=None):
        self.messages = []
        self.system_prompt = system_prompt or "You are a helpful assistant."

    def reset(self):
        self.messages = [
            {
                "role": "system",
                "content": [{"type": "text", "text": self.system_prompt}],
            }
        ]

    def add_system_prompt(self, prompt):
        self.system_prompt = prompt

    def add_message(self, text_content=None, image_content=None, role=None, **kwargs):
        self.messages.append(
            {
                "role": role or "user",
                "content": [{"type": "text", "text": text_content}],
            }
        )

    def get_response(self, *args, **kwargs):
        # Return a response that contains a single valid action: agent.wait
        return "<thoughts>thinking</thoughts><answer>```python\nagent.wait(1.333)\n```</answer>"


mllm.LMMAgent = FakeLMMAgent
import gui_agents.s3.agents.code_agent as _code_agent
_code_agent.LMMAgent = FakeLMMAgent
import gui_agents.s3.agents.grounding as _grounding
_grounding.LMMAgent = FakeLMMAgent


def _create_screenshot_bytes():
    img = Image.new("RGB", (100, 100), color=(73, 109, 137))
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return buf.getvalue()


def test_agent_smoke_flow():
    from gui_agents.s3.agents.agent_s import AgentS3
    from gui_agents.s3.agents.grounding import OSWorldACI

    screenshot = _create_screenshot_bytes()

    grounding = OSWorldACI(
        env=None,
        platform="linux",
        engine_params_for_generation={"engine_type": "mock"},
        engine_params_for_grounding={
            "engine_type": "mock",
            "grounding_width": 100,
            "grounding_height": 100,
        },
        width=100,
        height=100,
    )

    agent = AgentS3(
        worker_engine_params={"engine_type": "mock", "model": "gpt-4o"},
        grounding_agent=grounding,
        platform="linux",
    )

    info, actions = agent.predict(
        instruction="Wait a bit", observation={"screenshot": screenshot}
    )

    assert isinstance(actions, list) and len(actions) > 0
    assert "time.sleep" in actions[0]


def test_cli_help_runs_ok():
    # ensure cli module can be imported with dummy pyautogui in sys.modules
    cli = importlib.import_module("gui_agents.s3.cli_app")

    # Running help should exit with code 0
    import sys as _sys

    prev_argv = _sys.argv.copy()
    try:
        _sys.argv = ["agent_s", "--help"]
        try:
            cli.main()
        except SystemExit as e:
            assert e.code == 0
    finally:
        _sys.argv = prev_argv
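
The `--help` check at the end of this file swaps `sys.argv`, calls the entry point, and inspects `SystemExit`. A minimal sketch of the same pattern as a reusable helper follows. It assumes an argparse-based CLI; the `main` function here is a hypothetical stand-in, not the project's `cli_app.main`, and `exit_code_for` is an illustrative name.

```python
import argparse
import sys


def main():
    # Hypothetical stand-in for a CLI entry point; not the project's cli_app.
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--steps", type=int, default=1)
    parser.parse_args()


def exit_code_for(argv, entry_point):
    """Run entry_point with sys.argv swapped in; return its exit code or None."""
    prev_argv = sys.argv.copy()
    sys.argv = list(argv)
    try:
        entry_point()
    except SystemExit as exc:
        return exc.code
    finally:
        # Always restore argv so later tests see the original command line.
        sys.argv = prev_argv
    return None  # entry_point returned without calling sys.exit


help_code = exit_code_for(["demo", "--help"], main)
normal_code = exit_code_for(["demo", "--steps", "3"], main)
```

Returning `None` when no `SystemExit` is raised lets a test tell "exited with 0" apart from "never exited". The try/except form in `test_cli_help_runs_ok` passes in both cases.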
15 changes: 15 additions & 0 deletions tests/test_utils_formatters.py
@@ -0,0 +1,15 @@
from gui_agents.s3.utils.common_utils import (extract_agent_functions,
                                              parse_code_from_string)


def test_parse_code_from_string_normal():
    s = "Intro ```python\nagent.wait(1)\n``` end"
    code = parse_code_from_string(s)
    assert "agent.wait" in code


def test_extract_agent_functions():
    code = "agent.wait(1); agent.click('ok')"
    funcs = extract_agent_functions(code)
    assert any("agent.wait" in f for f in funcs)
    assert any("agent.click" in f for f in funcs)
79 changes: 79 additions & 0 deletions tests/test_worker.py
@@ -0,0 +1,79 @@
import io

from PIL import Image

from gui_agents.s3.agents.agent_s import AgentS3
from gui_agents.s3.agents.grounding import OSWorldACI
from gui_agents.s3.core import mllm as mllm_mod


# Monkeypatch LMMAgent used in Worker via module replacement
class FakeLMMAgent:
    def __init__(self, engine_params=None, system_prompt=None, engine=None):
        self.messages = []
        self.system_prompt = system_prompt or "You are a helpful assistant."

    def reset(self):
        self.messages = [
            {
                "role": "system",
                "content": [{"type": "text", "text": self.system_prompt}],
            }
        ]

    def add_system_prompt(self, prompt):
        self.system_prompt = prompt

    def add_message(self, text_content=None, image_content=None, role=None, **kwargs):
        self.messages.append(
            {
                "role": role or "user",
                "content": [{"type": "text", "text": text_content}],
            }
        )

    def get_response(self, *args, **kwargs):
        return "<thoughts>thinking</thoughts><answer>```python\nagent.wait(0.5)\n```</answer>"


mllm_mod.LMMAgent = FakeLMMAgent
import gui_agents.s3.agents.code_agent as _code_agent
_code_agent.LMMAgent = FakeLMMAgent
import gui_agents.s3.agents.grounding as _grounding
_grounding.LMMAgent = FakeLMMAgent


def _create_screenshot():
    img = Image.new("RGB", (100, 100), color=(73, 109, 137))
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return buf.getvalue()


def test_worker_generate_next_action():
    screenshot = _create_screenshot()
    grounding = OSWorldACI(
        env=None,
        platform="linux",
        engine_params_for_generation={"engine_type": "mock"},
        engine_params_for_grounding={
            "engine_type": "mock",
            "grounding_width": 100,
            "grounding_height": 100,
        },
        width=100,
        height=100,
    )
    agent = AgentS3(
        worker_engine_params={"engine_type": "mock", "model": "gpt-4o"},
        grounding_agent=grounding,
        platform="linux",
    )

    info, actions = agent.predict(
        instruction="Wait small", observation={"screenshot": screenshot}
    )

    assert isinstance(actions, list)
    assert len(actions) == 1
    assert "time.sleep" in actions[0] or "wait" in actions[0]
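
Both test modules patch `LMMAgent` on three modules rather than only on `mllm`. A likely reason, assuming the agent modules bind the class with `from ... import LMMAgent` (not confirmed by this diff), is that such an import copies the reference into the importing module's namespace, so replacing the attribute on the defining module alone leaves the copy untouched. The sketch below demonstrates that behavior with two throwaway in-memory modules; `demo_core` and `demo_consumer` are hypothetical names.

```python
import sys
import types

# Build a throwaway "defining" module in memory and register it for import.
core = types.ModuleType("demo_core")
exec("class LMMAgent:\n    kind = 'real'\n", core.__dict__)
sys.modules["demo_core"] = core

# A second module that binds the class into its own namespace at import time.
consumer = types.ModuleType("demo_consumer")
exec(
    "from demo_core import LMMAgent\n"
    "def make():\n"
    "    return LMMAgent()\n",
    consumer.__dict__,
)


class FakeLMMAgent:
    kind = "fake"


# Patching only the defining module does not affect the consumer's binding.
core.LMMAgent = FakeLMMAgent
before = consumer.make().kind

# Patching the consumer's own name is what changes what make() constructs.
consumer.LMMAgent = FakeLMMAgent
after = consumer.make().kind

print(before, after)
```

In a pytest suite, `monkeypatch.setattr` on each consuming module would achieve the same effect and undo the patch after the test, whereas the module-level assignments in these files stay in place for the whole session.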