Commit 69fde20 (1 parent: b91da74)

feat: add AppWorld React agent training scripts and configuration

File tree: 5 files changed (+620, -10 lines)

ajet/copilot/write-swarm-client/SKILL.md

Lines changed: 10 additions & 10 deletions
@@ -9,15 +9,15 @@ license: Complete terms in LICENSE.txt
 
 Your task is to create a trainable Agent (or Agent Loop, multi-agent system, etc.) based on the requirements, and provide it to the user for reinforcement learning training. Under the AgentJet reinforcement learning framework, this is very simple.
 
-First, give the agent system a name based on the user's requirements, for example `user_math_agent`.
+First, give the agent system a name based on the user's requirements, and always place your code under `tutorial/opencode_build_*`, for example `opencode_build_math_agent`.
 
 Next, create the directory:
-`tutorial/user_math_agent`
+`tutorial/opencode_build_math_agent`
 
 Then, create the Agent source files:
-- `tutorial/user_math_agent/agent_roll.py` (Use `tutorial/example_academic_trans_swarm/trans_roll.py` as a template. There aren't many changes — the key is to ask the user for the necessary parameters.)
-- `tutorial/user_math_agent/agent_run.py` (Create the function or class to run the agent based on the user's requirements. Synchronous or asynchronous are both fine.)
-- `tutorial/user_math_agent/readme.md` (Agent description, along with training and debugging instructions.)
+- `tutorial/opencode_build_math_agent/agent_roll.py` (Use `tutorial/example_academic_trans_swarm/trans_roll.py` as a template. There aren't many changes — the key is to ask the user for the necessary parameters.)
+- `tutorial/opencode_build_math_agent/agent_run.py` (Create the function or class to run the agent based on the user's requirements. Synchronous or asynchronous are both fine.)
+- `tutorial/opencode_build_math_agent/readme.md` (Agent description, along with training and debugging instructions.)
 
 ## How to Write the Agent

@@ -54,15 +54,15 @@ Below are some reference materials.
 
 Your task is to create a trainable Agent (or Agent Loop, multi-agent system, etc.) based on the requirements, and provide it to the user for reinforcement learning training. Under the AgentJet reinforcement learning framework, this is very simple.
 
-First, give the agent system a name based on the user's requirements, for example `user_math_agent`.
+First, give the agent system a name based on the user's requirements, for example `opencode_build_math_agent`.
 
 Next, create the directory:
-`tutorial/user_math_agent`
+`tutorial/opencode_build_math_agent`
 
 Then, create the Agent source files:
-- `tutorial/user_math_agent/agent_roll.py` (Use `tutorial/example_academic_trans_swarm/trans_roll.py` as a template. There aren't many changes — the key is to ask the user for the necessary parameters.)
-- `tutorial/user_math_agent/agent_run.py` (Create the function or class to run the agent based on the user's requirements. Synchronous or asynchronous are both fine.)
-- `tutorial/user_math_agent/readme.md` (Agent description, along with training and debugging instructions.)
+- `tutorial/opencode_build_math_agent/agent_roll.py` (Use `tutorial/example_academic_trans_swarm/trans_roll.py` as a template. There aren't many changes — the key is to ask the user for the necessary parameters.)
+- `tutorial/opencode_build_math_agent/agent_run.py` (Create the function or class to run the agent based on the user's requirements. Synchronous or asynchronous are both fine.)
+- `tutorial/opencode_build_math_agent/readme.md` (Agent description, along with training and debugging instructions.)
 
 ## How to Write the Agent
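The directory layout the skill asks for can be scaffolded with a few lines of Python. `scaffold_agent` below is a hypothetical helper written for illustration, not part of AgentJet:

```python
from pathlib import Path


def scaffold_agent(name: str, root: str = "tutorial") -> Path:
    """Create the agent directory and the three starter files named in SKILL.md."""
    agent_dir = Path(root) / name
    agent_dir.mkdir(parents=True, exist_ok=True)
    # The skill expects exactly these three files inside the agent directory.
    for filename in ("agent_roll.py", "agent_run.py", "readme.md"):
        (agent_dir / filename).touch(exist_ok=True)
    return agent_dir


# Example: scaffold_agent("opencode_build_math_agent") creates
# tutorial/opencode_build_math_agent/{agent_roll.py, agent_run.py, readme.md}
```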

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
# Generate an Agent / Agent Loop with AgentJet Swarm and train it

Use the prompt below in opencode or claudecode to generate a one-key-to-tune agent (the result is in `...`, generated by `claude sonnet 4.5`).

=============================


=============================

Your task:
- Write a React agent that learns AppWorld
- I want to use the base model '/mnt/data_cpfs/model_cache/modelscope/hub/Qwen/Qwen/Qwen2.5-7B-Instruct'
- Train with 8 GPUs
- Use Batch Size 32, GRPO N=4

Your skill (first read this SKILL file to acquire the necessary knowledge):
ajet/copilot/write-swarm-client/SKILL.md

Hints about AppWorld:
To install and run AppWorld:
"wget https://dail-wlcb.oss-cn-wulanchabu.aliyuncs.com/astuner_archive/appworld_pack_v3.tar.gz",
"tar -xzf ./appworld_pack_v3.tar.gz -C /tmp",
You also need to set the environment variables:
os.environ["APPWORLD_PATH"] = "/tmp/pack_all_in_one"
os.environ["APPWORLD_SCRIPT"] = "bash EnvService/env_sandbox/appworld.sh"
In the swarm client, use ajet/utils/env_service_client/env_client_ng.py to interact with AppWorld.

Use the native OpenAI SDK.
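The React loop this prompt asks for can be sketched independently of AppWorld. In the sketch below, `call_model` and `env_step` are stand-in parameters (assumptions, not AgentJet APIs): in the real agent the former would wrap an OpenAI SDK chat completion against the swarm-issued base_url, and the latter the env_client_ng interaction with AppWorld:

```python
def react_loop(task: str, call_model, env_step, max_turns: int = 10) -> list[dict]:
    """Minimal React skeleton: alternate model actions with environment observations."""
    messages = [
        {"role": "system", "content": "Think step by step, then emit an action."},
        {"role": "user", "content": task},
    ]
    for _ in range(max_turns):
        action = call_model(messages)         # model proposes the next action
        messages.append({"role": "assistant", "content": action})
        observation, done = env_step(action)  # environment executes it
        messages.append({"role": "user", "content": observation})
        if done:
            break
    return messages


# Stubbed demo: the "model" always emits one action, the "env" finishes after one step.
demo = react_loop(
    "say hello",
    call_model=lambda msgs: "print('hello')",
    env_step=lambda act: ("hello", True),
)
```

The collected `messages` transcript is exactly what a swarm client would later report back for training.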
Lines changed: 265 additions & 0 deletions
@@ -0,0 +1,265 @@
"""
AppWorld React Agent Training Script

This script sets up the training loop for the AppWorld React agent using AgentJet Swarm.

Usage:
    python -m tutorial.opencode_build_appworld_react.agent_roll
"""

import os
import subprocess
from ajet.copilot.job import AgentJetJob
from ajet.tuner_lib.experimental.as_swarm_client import SwarmClient, run_episodes_until_all_complete
from ajet.utils.env_service_client.env_client_ng import EnvClient
from ajet.schema.task import Task
from tutorial.opencode_build_appworld_react.agent_run import run_agent_and_compute_reward


# ==================== Configuration ====================

# Local configurations (client-side)
LOCAL_GRPO_N = 4        # GRPO group size (number of rollouts per task)
LOCAL_NUM_EPOCH = 1000  # Number of training epochs
LOCAL_MAX_PARALLEL = 8  # Maximum parallel episodes

# Remote configurations (server-side)
REMOTE_SWARM_URL = "http://localhost:10086"  # Swarm server URL
REMOTE_BATCH_SIZE = 32                       # Batch size for training
REMOTE_ALLOCATE_GPU_PER_NODE = 8             # Number of GPUs to use
REMOTE_TRAIN_MODEL = '/mnt/data_cpfs/model_cache/modelscope/hub/Qwen/Qwen/Qwen2.5-7B-Instruct'

# Environment service configuration
ENV_SERVICE_URL = "http://localhost:8080"  # Environment service URL
ENV_TYPE = "appworld"                      # Environment type

# AppWorld setup paths
APPWORLD_PACK_URL = "https://dail-wlcb.oss-cn-wulanchabu.aliyuncs.com/astuner_archive/appworld_pack_v3.tar.gz"
APPWORLD_INSTALL_PATH = "/tmp/pack_all_in_one"

# ==================== Helper Functions ====================

def setup_appworld():
    """
    Download and setup AppWorld environment.
    This should be run before starting the training.
    """
    print("Setting up AppWorld environment...")

    # Set environment variables
    os.environ["APPWORLD_PATH"] = APPWORLD_INSTALL_PATH
    os.environ["APPWORLD_SCRIPT"] = "bash EnvService/env_sandbox/appworld.sh"

    # Check if already installed
    if os.path.exists(APPWORLD_INSTALL_PATH):
        print(f"AppWorld already installed at {APPWORLD_INSTALL_PATH}")
        return

    # Download and extract AppWorld
    print("Downloading AppWorld...")
    subprocess.run(
        ["wget", APPWORLD_PACK_URL, "-O", "/tmp/appworld_pack_v3.tar.gz"],
        check=True
    )

    print("Extracting AppWorld...")
    subprocess.run(
        ["tar", "-xzf", "/tmp/appworld_pack_v3.tar.gz", "-C", "/tmp"],
        check=True
    )

    print("AppWorld setup complete!")

def get_task_list_from_env(env_service_url: str, env_type: str, split: str = "train") -> list[str]:
    """
    Get list of available tasks from the environment service.

    Args:
        env_service_url: URL of the environment service
        env_type: Type of environment (e.g., "appworld")
        split: Dataset split ("train", "test", etc.)

    Returns:
        List of task IDs
    """
    env_client = EnvClient(base_url=env_service_url)
    task_ids = env_client.get_env_profile(env_type=env_type, split=split)
    return task_ids


def create_task_from_id(task_id: str, env_type: str) -> Task:
    """
    Create a Task object from a task ID.

    Args:
        task_id: The task identifier
        env_type: Type of environment

    Returns:
        Task object
    """
    return Task(
        task_id=task_id,
        env_type=env_type,
        main_query="",  # Will be set by environment
        init_messages=[],
        metadata={"source": "appworld"}
    )

# ==================== Main Training Function ====================

def main():
    """
    Main training loop for AppWorld React agent.
    """

    # Setup AppWorld environment
    print("=" * 60)
    print("AppWorld React Agent Training")
    print("=" * 60)

    try:
        setup_appworld()
    except Exception as e:
        print(f"Warning: AppWorld setup failed: {e}")
        print("Make sure AppWorld is properly installed before running training.")

    # Get task list from environment service
    print("\nFetching task list from environment service...")
    try:
        task_ids = get_task_list_from_env(ENV_SERVICE_URL, ENV_TYPE, split="train")
        print(f"Found {len(task_ids)} tasks")
    except Exception as e:
        print(f"Error: Failed to get task list: {e}")
        print(f"Make sure the environment service is running at {ENV_SERVICE_URL}")
        return

    if not task_ids:
        print("Error: No tasks found. Please check environment service.")
        return

    # Initialize swarm client
    print("\nConnecting to swarm server...")
    swarm_worker = SwarmClient(REMOTE_SWARM_URL)

    # Configure and start training engine
    print("Configuring training engine...")
    yaml_job = AgentJetJob(
        algorithm="grpo",
        project_name="appworld-react-agent",
        experiment_name="qwen2.5-7b-appworld",
        n_gpu=REMOTE_ALLOCATE_GPU_PER_NODE,
        model=REMOTE_TRAIN_MODEL,
        batch_size=REMOTE_BATCH_SIZE,
        num_repeat=LOCAL_GRPO_N,
    )

    swarm_worker.auto_sync_train_config_and_start_engine(yaml_job)
    print("Training engine started!")

    # Define rollout function
    def rollout(task: Task) -> float | None:
        """
        Execute a single episode rollout.

        Args:
            task: The task to execute

        Returns:
            Reward value or None if failed
        """
        try:
            # Begin episode
            episode_uuid, api_baseurl_key = swarm_worker.begin_episode()

            # Execute agent
            workflow_output = run_agent_and_compute_reward(
                task=task,
                base_url=api_baseurl_key.base_url,
                api_key=api_baseurl_key.api_key,
                env_service_url=ENV_SERVICE_URL
            )

            # Report output back to swarm server
            swarm_worker.end_episode(task, episode_uuid, workflow_output)

            # Print rollout statistics
            swarm_worker.print_rollout_stat()

            reward = workflow_output.reward
            if isinstance(reward, list):
                return reward[0] if reward else 0.0
            return reward if reward is not None else 0.0
        except Exception as e:
            print(f"Episode failed: {e}")
            return None

    # Training loop
    print("\nStarting training loop...")
    print("Configuration:")
    print(f" - GRPO N: {LOCAL_GRPO_N}")
    print(f" - Batch Size: {REMOTE_BATCH_SIZE}")
    print(f" - Max Epochs: {LOCAL_NUM_EPOCH}")
    print(f" - Model: {REMOTE_TRAIN_MODEL}")
    print("=" * 60)

    next_batch = []
    total_episodes = 0

    try:
        for epoch in range(LOCAL_NUM_EPOCH):
            print(f"\nEpoch {epoch + 1}/{LOCAL_NUM_EPOCH}")

            # Iterate through tasks
            for task_id in task_ids:
                # Create task object
                task = create_task_from_id(task_id, ENV_TYPE)

                # Rollout GRPO_N times for this task
                for _ in range(LOCAL_GRPO_N):
                    next_batch.append(task)

                # Execute batch when ready
                if len(next_batch) >= (REMOTE_BATCH_SIZE * LOCAL_GRPO_N):
                    print(f"\nExecuting batch of {len(next_batch)} episodes...")

                    episode_results = run_episodes_until_all_complete(
                        next_batch,
                        func=rollout,
                        auto_retry=True
                    )

                    total_episodes += len(next_batch)

                    # Print statistics
                    successful_episodes = sum(1 for r in episode_results if r is not None)
                    avg_reward = sum(r for r in episode_results if r is not None) / max(successful_episodes, 1)

                    print("Batch complete:")
                    print(f" - Total episodes: {total_episodes}")
                    print(f" - Successful: {successful_episodes}/{len(next_batch)}")
                    print(f" - Average reward: {avg_reward:.4f}")

                    next_batch.clear()

    except KeyboardInterrupt:
        print("\n\nTraining interrupted by user")
    except Exception as e:
        print(f"\n\nTraining failed with error: {e}")
        import traceback
        traceback.print_exc()
    finally:
        # Execute remaining episodes if any
        if next_batch:
            print("\nExecuting remaining episodes...")
            run_episodes_until_all_complete(next_batch, func=rollout, auto_retry=True)

        print("\nTraining complete!")
        print(f"Total episodes executed: {total_episodes}")


if __name__ == "__main__":
    main()
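The flush condition in the training loop above (queue each task GRPO_N times, dispatch once BATCH_SIZE * GRPO_N episodes are queued) can be checked in isolation with a small stub. `plan_batches` is illustrative only, not part of the script:

```python
def plan_batches(task_ids, grpo_n=4, batch_size=32):
    """Replicate each task grpo_n times; flush whenever batch_size * grpo_n episodes queue up."""
    batches, queue = [], []
    for task_id in task_ids:
        queue.extend([task_id] * grpo_n)       # GRPO needs N rollouts per task
        if len(queue) >= batch_size * grpo_n:  # same threshold as the training loop
            batches.append(queue[:])
            queue.clear()
    return batches, queue                      # leftover queue mirrors the `finally` flush


batches, leftover = plan_batches([f"task_{i}" for i in range(70)])
# 70 tasks * 4 rollouts = 280 episodes -> two full batches of 128 episodes, 24 left over
```

With the script's defaults (Batch Size 32, GRPO N=4), every dispatched batch therefore contains 128 episodes covering 32 distinct tasks, 4 rollouts each.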
