
Commit 76d0e3f

committed
implement skills and skillbench example
1 parent 39ab72e commit 76d0e3f

File tree

7 files changed: +343 −23 lines changed


ajet/context_tracker/multiagent_tracking.py

Lines changed: 3 additions & 1 deletion
```diff
@@ -334,7 +334,9 @@ def detect_tool_call_madness(self, llm_output):
         # llm_output["tool_calls"] is not None, and is not []
         tool_calls = llm_output["tool_calls"]
         if "wrong_toolcall" in self.config.ajet.rollout.compute_madness_checklist:
-            copy_tool_calls = copy.deepcopy(tool_calls)
+            # copy_tool_calls = copy.deepcopy(tool_calls)
+            # Shallow copy is sufficient - we're only reading the data
+            copy_tool_calls = tool_calls
             wrong_toolcall = False
             for i in range(len(copy_tool_calls)):
                 if ("function" in copy_tool_calls[i]) and (
```
Lines changed: 174 additions & 0 deletions

---
name: train-complex-blackbox
description: Create a trainable agent loop or agent workflow with AgentJet
license: Complete terms in LICENSE.txt
---

## 0. Ask user for API key + model (or API key + base url + model) for debugging

This is not 100% necessary, but it can help a lot with debugging in step 1.
If the user has not given an API key, ask them to provide one.

By default, the code you write should be located at `./tutorial/opencode_build_xxxxxx/*.py`.

## 1. Initial Programming

### Writing the dataset collector (`get_training_dataset_item_list.py`)

- `get_training_dataset_item_list.py`: Returns a list of training data items. Each item is either a string identifier of a training task, or a dict containing the necessary information for that task.
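To make the expected shape concrete, here is a minimal sketch of such a collector. The JSONL path and the `id`/`question` field names are placeholders I chose for illustration, not part of AgentJet:

```python
# get_training_dataset_item_list.py -- illustrative sketch; the JSONL path
# and the "id"/"question" field names are placeholders, not AgentJet API.
import json


def get_training_dataset_item_list(path="tasks.jsonl"):
    """Return one dict per training task."""
    items = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            record = json.loads(line)
            # Keep only what the episode runner needs.
            items.append({"task_id": record["id"], "question": record["question"]})
    return items
```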

### Episode Runner (`run_episode_once.py`)

- `run_episode_once.py`:
  - Argument parser: takes (training data item identifier + api-key + base-url) as input. A model name is not required; you can make one up because it is ignored.
  - Execute the agent: read the documentation of the agent the user asked you to train and figure out how to execute it. In most cases you can use subprocess to start a command-line process that runs the agent; your biggest issue is figuring out how to pass the training data item identifier, api-key, and base-url to that process. You can also execute the agent from Python code if you think that is more convenient.
  - Reward: extract / compute the reward/score for the agent's output. Some agents have a clear reward signal, but others don't.
    - Clear reward signal: take it as the reward; no extra reward engineering is needed.
    - No clear reward signal: you need to design a reward function to compute the reward/score for the agent's output. You can use another LLM to help you design the reward function, or design it yourself if you have domain knowledge.
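As a concrete, heavily simplified sketch of such a runner: argparse for the three inputs, a subprocess call for the agent, and a rule-based reward. The `my-agent-cli` command and the `FINAL ANSWER:` output convention are hypothetical placeholders for whatever agent you are actually training:

```python
# run_episode_once.py -- illustrative sketch only. "my-agent-cli" and the
# "FINAL ANSWER:" output convention are placeholders for the real agent.
import argparse
import re
import subprocess


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--task-id", required=True)
    parser.add_argument("--api-key", required=True)
    parser.add_argument("--base-url", required=True)
    # The model name is ignored downstream, so any value works.
    parser.add_argument("--model", default="any-model-name")
    return parser.parse_args(argv)


def compute_reward(agent_output: str, expected_answer: str) -> float:
    # Rule-based reward for an easy-to-verify task: match the final answer.
    match = re.search(r"FINAL ANSWER:\s*(\S+)", agent_output)
    return 1.0 if match and match.group(1) == expected_answer else 0.0


def run_episode(args):
    # Most command-line agents accept credentials via env vars or flags.
    result = subprocess.run(
        ["my-agent-cli", "--task", args.task_id],  # placeholder command
        env={"OPENAI_API_KEY": args.api_key, "OPENAI_BASE_URL": args.base_url},
        capture_output=True, text=True, timeout=600,
    )
    return result.stdout
```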

### Test

Remember to test these two parts before moving to step 2; make sure they work as expected.

## 2. Writing training code

This part is easy: simply follow this template and change the necessary parts such as the dataset path, model name, etc.

`agent_roll.py`

```python
# -*- coding: utf-8 -*-

import os
import re
import requests
from textwrap import dedent
from ajet.schema.task import Task, WorkflowOutput
from ajet.copilot.job import AgentJetJob
from ajet.task_reader import RouterTaskReader
from ajet.utils.thread_executors import PeriodicDrainThreadPoolExecutor
from ajet.tuner_lib.as_oai_baseurl_apikey import OpenaiBaseUrlAndApiKey
from ajet.default_config.ajet_default import AjetTaskReader, HuggingfaceDatRepo
from ajet.tuner_lib.experimental.as_swarm_client import SwarmClient

# python -m tutorial.example_math_swarm.math

GRPO_N = 4  # GRPO group size
NUM_EPOCH = 10000
AJET_SWARM_URL = os.getenv("AJET_SWARM_URL", "http://localhost:10086")
REMOTE_MODEL_PATH = os.getenv("REMOTE_MODEL_PATH", "/mnt/data_cpfs/model_cache/modelscope/hub/Qwen/Qwen/Qwen2.5-7B-Instruct")
REMOTE_BATCH_SIZE = 32
REMOTE_ALLOCATE_GPU_PER_NODE = 8


def main():
    # Handshake with swarm remote, then send training params to swarm remote
    # (such as the model to be trained, algorithm, etc.)
    dataset = RouterTaskReader(
        reader_type="huggingface_dat_repo",
        reader_config=AjetTaskReader(
            huggingface_dat_repo=HuggingfaceDatRepo(
                dataset_path='/mnt/data_cpfs/model_cache/modelscope/dataset/openai/gsm8k/main',
                # dataset_path="/root/agentjet/benchmark_datasets/dataset/gsm8k/socratic",
                # dataset_path="openai/gsm8k",
                # dataset_name="main",
            )
        )
    )
    # Load the CountDown dataset (alternative: local JSONL file)
    # print(f"Loading dataset from: {LOCAL_DATASET_PATH}")
    # dataset = RouterTaskReader(
    #     reader_type="jsonl_dataset_file",
    #     reader_config=AjetTaskReader(
    #         jsonl_dataset_file=JsonlDatasetFile(
    #             training=JsonlTrainingFp(file_path=LOCAL_DATASET_PATH)
    #         )
    #     ),
    # )

    # Handshake with remote swarm server
    swarm_worker = SwarmClient(AJET_SWARM_URL)
    ajet_job = AgentJetJob(
        experiment_name="math_gsm8k_grpo",
        algorithm="grpo",
        n_gpu=REMOTE_ALLOCATE_GPU_PER_NODE,
        model=REMOTE_MODEL_PATH,
        batch_size=REMOTE_BATCH_SIZE,
        num_repeat=GRPO_N,
    )
    print(ajet_job.config.to_dict())
    swarm_worker.auto_sync_train_config_and_start_engine(
        ajet_job,
        force_restart=True,
    )

    def rollout(task):
        # begin episode
        episode_uuid, api_baseurl_key = swarm_worker.begin_episode(discard_episode_timeout=60)
        # execute agent (base_url = api_baseurl_key.base_url, api_key = api_baseurl_key.api_key)
        workflow_output = execute_agent(task, api_baseurl_key)  # reward is in `workflow_output`
        # report output back to swarm remote
        swarm_worker.end_episode(task, episode_uuid, workflow_output)
        return

    executor = PeriodicDrainThreadPoolExecutor(workers=GRPO_N * REMOTE_BATCH_SIZE, auto_retry=True)
    for _ in range(NUM_EPOCH):
        for _, task in enumerate(dataset.generate_training_tasks()):
            for _ in range(GRPO_N):
                executor.submit_with_periodic_drain(fn=rollout, task=task)

    return None


def execute_agent(task: Task, api_baseurl_key: OpenaiBaseUrlAndApiKey):
    ...  # run the agent here (see run_episode_once.py)
    raw_reward: float = ...  # compute the reward for the agent's output
    return WorkflowOutput(reward=raw_reward, metadata={"important_metadata": important_metadata})


if __name__ == "__main__":
    main()
```

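The `num_repeat=GRPO_N` setting exists because GRPO scores each rollout against the other rollouts of the same task. A sketch of the standard group normalization follows; this is the conceptual algorithm, not AgentJet's internal implementation:

```python
def grpo_advantages(group_rewards):
    """Per-rollout advantage: (reward - group mean) / group std."""
    n = len(group_rewards)
    mean = sum(group_rewards) / n
    std = (sum((r - mean) ** 2 for r in group_rewards) / n) ** 0.5
    if std == 0.0:
        return [0.0] * n  # identical rewards carry no learning signal
    return [(r - mean) / std for r in group_rewards]
```

This is why a reward with some spread across the group matters: if all `GRPO_N` rollouts score the same, the group contributes no gradient signal.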
It is very clear now; your job in step 2 is to:

- use `get_training_dataset_item_list.py` to generate `List[Task]` (`from ajet.schema.task import Task`)
- use `run_episode_once.py` to execute a single episode, and place that logic in the `execute_agent` function

## 3. Simplify your code and fix bugs

Before moving to step 4, simplify your code and fix bugs to make sure it can run smoothly.

## 4. Training

Finally, you can start training.

Run `ajet-swarm start` to start the training server (if the user has already installed the agentjet swarm environment);
if the user has a docker environment, you can also refer to `docs/en/ajet-swarm-docker.md` to start an AgentSwarm docker container.

Create a duplicate of `agent_roll.py` named `agent_roll_one_episode_debug.py`, and modify it to run only one episode. This helps you debug whether the episode runner and reward function work as expected.

After the server side is ready, run

```bash
python /path/to/agent_roll_one_episode_debug.py
```

and watch the console log to see whether the episode executes successfully and the reward is computed correctly.

If anything goes wrong, keep the server running, rewrite and fix `agent_roll_one_episode_debug.py`, and run it again until it can run one episode successfully.

Next, patch `agent_roll.py` with any bugs discovered while debugging `agent_roll_one_episode_debug.py`, and then run

```bash
python /path/to/agent_roll.py
```

to start the training!
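The one-episode debug variant amounts to replacing the nested training loops with a single rollout. A generic helper illustrating the pattern (the names here are mine for illustration, not AgentJet's):

```python
def run_single_episode(tasks, rollout_fn):
    """Debug helper: execute exactly one rollout instead of the full loop."""
    first_task = next(iter(tasks))  # take the first generated task
    rollout_fn(first_task)          # one rollout, so failures surface quickly
    return first_task
```

In `agent_roll_one_episode_debug.py`, `tasks` would be `dataset.generate_training_tasks()` and `rollout_fn` the `rollout` closure from the template.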

ajet/copilot/write-swarm-client/SKILL.md

Lines changed: 65 additions & 22 deletions
````diff
@@ -4,25 +4,24 @@ description: Create a trainable agent loop or agent workflow with AgentJet
 license: Complete terms in LICENSE.txt
 ---
 
-## 简介:
 
-你的任务是根据要求,创建一个可训练 Agent (或者Agent Loop,多智能体系统等等),提供给用户做强化学习训练。
-在AgentJet强化学习框架下,这是非常简单的。
+## Introduction:
 
-首先,根据用户的要求,给智能体系统起一个名字,例如 user_math_agent
+Your task is to create a trainable Agent (or Agent Loop, multi-agent system, etc.) based on the requirements, and provide it to the user for reinforcement learning training. Under the AgentJet reinforcement learning framework, this is very simple.
 
-其次,创建文件:
-tutorial/user_math_agent
+First, give the agent system a name based on the user's requirements, for example `user_math_agent`.
 
-接下来,创建Agent源文件:
-tutorial/user_math_agent/agent_roll.py (以 tutorial/example_academic_trans_swarm/trans_roll.py 为模板,变化不大,关键是向用户索取必要的参数)
-tutorial/user_math_agent/agent_run.py (根据用户的要求,创建运行智能体的函数,或者类,都可以。同步异步都可以。)
-tutorial/user_math_agent/readme.md (Agent说明,以及训练、调试方法说明)
+Next, create the directory:
+`tutorial/user_math_agent`
 
+Then, create the Agent source files:
+- `tutorial/user_math_agent/agent_roll.py` (Use `tutorial/example_academic_trans_swarm/trans_roll.py` as a template. There aren't many changes — the key is to ask the user for the necessary parameters.)
+- `tutorial/user_math_agent/agent_run.py` (Create the function or class to run the agent based on the user's requirements. Synchronous or asynchronous are both fine.)
+- `tutorial/user_math_agent/readme.md` (Agent description, along with training and debugging instructions.)
 
-## 智能体编写方法
+## How to Write the Agent
 
-使用 OpenAI SDK 编写智能体,主要包含以下三个函数(以及必要的子函数和子模块):
+Write the agent using the OpenAI SDK. It mainly includes the following three functions (along with any necessary sub-functions and sub-modules):
 
 ```
 from ajet.schema.task import Task, WorkflowOutput
@@ -31,24 +30,68 @@ def _compute_reward(...)
 
 def _execute_agent(...)
 
-def run_agent_and_compute_reward(task: Task, base_url:string, api_key:string) -> WorkflowOutput:
+def run_agent_and_compute_reward(task: Task, base_url: string, api_key: string) -> WorkflowOutput:
 ```
 
-agent_roll 中,直接import run_agent_and_compute_reward即可。
+In `agent_roll`, simply import `run_agent_and_compute_reward`.
 
-- 智能体的编写要领:通过一个或几个Agent的协作,高效完成用户给定的任务。
-- 奖励编写的要领:容易验证的,使用规则直接计算。不容易验证的,模仿 `tutorial/example_academic_trans_swarm/train_multi_model/trans_reward.py` 中的方法,使用其他大型模型生成 LLM as Judge 程序。
+- **Key points for writing the agent:** Efficiently complete the user's given task through the collaboration of one or several Agents.
+- **Key points for writing the reward:** For things that are easy to verify, calculate directly using rules. For things that are hard to verify, follow the approach in `tutorial/example_academic_trans_swarm/train_multi_model/trans_reward.py` and use other large models to create an LLM-as-Judge program.
 
+## Training and Debugging Instructions
 
-## 训练、调试方法说明
+Overall, the user first runs `ajet-swarm start`, then runs `agent_roll.py`, and training begins. You do not need to and are not allowed to run these bash commands.
+- First, help the user write `agent_run.py` and `agent_roll.py`.
+- Then, write clear instructions to guide the user through training (`readme.md`).
 
-总体而言,就是用户先运行 `ajet-swarm start`, 然后再运行 `agent_roll.py` 训练就开始了。你不需要也不被允许运行这些bash命令。
-- 首先帮助用户写好 `agent_run.py` 和 `agent_roll.py`
-- 然后写清楚引导用户训练的说明(readme.md)
-你的任务就完成了。
+Your task is then complete.
 
-以下是一些参考资料。
+Below are some reference materials.
 ---
 
 # Using AgentJet Swarm to Train Your Agents
````
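The three-function layout described in this file can be sketched as follows. `WorkflowOutput` is stubbed with a dataclass here so the sketch is self-contained; in real code it comes from `ajet.schema.task`, and `_execute_agent` would call an OpenAI-compatible endpoint with the given `base_url` and `api_key`:

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowOutput:  # stand-in for ajet.schema.task.WorkflowOutput
    reward: float
    metadata: dict = field(default_factory=dict)


def _execute_agent(task: str, base_url: str, api_key: str) -> str:
    # Real code would drive the agent via the OpenAI SDK against base_url/api_key.
    return f"answer for {task}"


def _compute_reward(task: str, output: str) -> float:
    # Easy-to-verify case: a rule-based check; hard cases would use LLM-as-Judge.
    return 1.0 if output else 0.0


def run_agent_and_compute_reward(task: str, base_url: str, api_key: str) -> WorkflowOutput:
    output = _execute_agent(task, base_url, api_key)
    return WorkflowOutput(reward=_compute_reward(task, output), metadata={"output": output})
```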

ajet/schema/extended_msg.py

Lines changed: 2 additions & 0 deletions
```diff
@@ -244,9 +244,11 @@ def get_inc_simple(self, text_frag_from, text_frag_to, tokenizer):
         tokenizer_output = tokenizer(text_frag_from, return_tensors="pt", padding=False)
         tokenizer_input_ids = tokenizer_output["input_ids"][0].tolist()
         token_ids_acc = tokenizer_input_ids
+        del tokenizer_output  # Free memory immediately
 
         tokenizer_output = tokenizer(text_frag_to, return_tensors="pt", padding=False)
         input_ids = tokenizer_output["input_ids"][0].tolist()
+        del tokenizer_output  # Free memory immediately
         # get the new tokens added in this step
         input_id_increment = input_ids[len(token_ids_acc) :]
         FN_DEBUG = False
```
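The added `del` lines follow a general pattern: keep only the small derived list and drop the reference to the large tokenizer output so it can be reclaimed before the next tokenization. A generic sketch, with a toy tokenizer standing in for the real HuggingFace one:

```python
def derive_ids(tokenize, text):
    output = tokenize(text)           # potentially large temporary object
    ids = list(output["input_ids"])   # keep only the small derived list
    del output                        # drop the reference immediately
    return ids


# toy tokenizer standing in for a real one
ids = derive_ids(lambda t: {"input_ids": range(len(t))}, "abcd")
```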
