---
name: train-complex-blackbox
description: Create a trainable agent loop or agent workflow with AgentJet
license: Complete terms in LICENSE.txt
---

## 0. Ask the user for an API key + model (or API key + base URL + model) for debugging

This is not strictly necessary, but it helps a lot with debugging in step 1.
If the user has not provided an API key, ask them to provide one.
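
A minimal helper for collecting these values might look like the following sketch. The environment-variable names, the default base URL, and the fallback model name are all illustrative assumptions, not a fixed convention:

```python
def resolve_debug_credentials(env: dict) -> dict:
    # Hypothetical helper: gather api key / base url / model for step-1 debugging.
    # The key names below ("API_KEY", "BASE_URL", "MODEL") are assumptions.
    if not env.get("API_KEY"):
        raise ValueError("ask the user for an API key before continuing")
    return {
        "api_key": env["API_KEY"],
        "base_url": env.get("BASE_URL", "https://api.openai.com/v1"),
        "model": env.get("MODEL", "debug-model"),
    }
```

Raising early when the key is missing makes it obvious that the user must be asked before any debugging starts.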


By default, the code you write should be located at `./tutorial/opencode_build_xxxxxx/*.py`.

## 1. Initial Programming

### Writing the dataset collector (`get_training_dataset_item_list.py`)
- `get_training_dataset_item_list.py`: Returns a list of training data items. Each item may be a string identifier of a training task, or a dict containing the information needed to run that task.
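
As a sketch, assuming the training tasks live in a JSONL file with one record per task (adapt the loading logic to wherever the user's tasks actually come from; the `task_id` key is an illustrative choice):

```python
import json
from pathlib import Path

def get_training_dataset_item_list(dataset_path: str) -> list:
    # Return one dict per training task: the original JSONL record plus a
    # synthetic identifier that downstream scripts can use to reference it.
    items = []
    for idx, line in enumerate(Path(dataset_path).read_text(encoding="utf-8").splitlines()):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        items.append({"task_id": f"task-{idx}", **record})
    return items
```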

### Episode Runner (`run_episode_once.py`)
- `run_episode_once.py`:

  - Argument parser: takes (training data item identifier + api-key + base-url) as input. A model-name argument is not required; you can make up a model name because it is ignored.

  - Execute the agent: read the documentation of the agent the user asked you to train and figure out how to execute it. In most cases you can use subprocess to start a command-line process that runs the agent; the biggest issue is figuring out how to pass the training data item identifier, api-key, and base-url to that process. You can also execute the agent from Python code if that is more convenient.

  - Reward: extract or compute the reward/score for the agent's output. Some agents have a clear reward signal, but others do not.
    - Clear reward signal: record it as the reward; no extra reward engineering is needed.
    - No clear reward signal: design a reward function that scores the agent's output. You can use another LLM to help design it, or design it yourself if you have domain knowledge.
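
A skeleton for these three parts might look like the following. The agent command (`my-agent`), its flag names, and the `FINAL ANSWER:` reward convention are all placeholder assumptions; replace them with whatever the target agent actually uses:

```python
import argparse
import re
import subprocess

def build_parser() -> argparse.ArgumentParser:
    # Takes (task identifier + api-key + base-url); no --model-name flag is
    # needed because the model name is ignored anyway.
    parser = argparse.ArgumentParser(description="Run one training episode.")
    parser.add_argument("--task-id", required=True)
    parser.add_argument("--api-key", required=True)
    parser.add_argument("--base-url", required=True)
    return parser

def extract_reward(agent_output: str) -> float:
    # Placeholder reward: 1.0 if the agent printed "FINAL ANSWER: <x>", else 0.0.
    # If the agent exposes a clear reward signal, parse that instead.
    return 1.0 if re.search(r"FINAL ANSWER:\s*\S+", agent_output) else 0.0

def run_episode_once(task_id: str, api_key: str, base_url: str) -> float:
    # "my-agent" and its flags are hypothetical; check the agent's own docs
    # for how to pass the task, api-key, and base-url on its command line.
    proc = subprocess.run(
        ["my-agent", "--task", task_id, "--api-key", api_key, "--base-url", base_url],
        capture_output=True, text=True, timeout=600,
    )
    return extract_reward(proc.stdout)
```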


### Test

Test both scripts before moving to step 2 and make sure they work as expected.
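
A pair of tiny sanity checks like these (they check shapes only; the names are illustrative) is usually enough before moving on:

```python
def check_dataset_items(items) -> None:
    # The collector must return a non-empty list of task ids or task dicts.
    assert isinstance(items, list) and items, "expected a non-empty list"
    for item in items:
        assert isinstance(item, (str, dict)), "items must be strings or dicts"

def check_reward(reward) -> None:
    # The episode runner must produce a single numeric score.
    assert isinstance(reward, (int, float)), "reward must be numeric"
```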



## 2. Writing training code

This part is easy: simply follow this template and change the necessary parts, such as the dataset path and model name.

`agent_roll.py`

```python
# -*- coding: utf-8 -*-

import os
import re
import requests
from textwrap import dedent
from ajet.schema.task import Task, WorkflowOutput
from ajet.copilot.job import AgentJetJob
from ajet.task_reader import RouterTaskReader
from ajet.utils.thread_executors import PeriodicDrainThreadPoolExecutor
from ajet.tuner_lib.as_oai_baseurl_apikey import OpenaiBaseUrlAndApiKey
from ajet.default_config.ajet_default import AjetTaskReader, HuggingfaceDatRepo
from ajet.tuner_lib.experimental.as_swarm_client import SwarmClient

# python -m tutorial.example_math_swarm.math

GRPO_N = 4  # GRPO group size
NUM_EPOCH = 10000
AJET_SWARM_URL = os.getenv("AJET_SWARM_URL", "http://localhost:10086")
REMOTE_MODEL_PATH = os.getenv("REMOTE_MODEL_PATH", "/mnt/data_cpfs/model_cache/modelscope/hub/Qwen/Qwen/Qwen2.5-7B-Instruct")
REMOTE_BATCH_SIZE = 32
REMOTE_ALLOCATE_GPU_PER_NODE = 8


def main():
    # Handshake with the swarm remote, then send the training params
    # (model to be trained, algorithm, etc.)
    dataset = RouterTaskReader(
        reader_type="huggingface_dat_repo",
        reader_config=AjetTaskReader(
            huggingface_dat_repo=HuggingfaceDatRepo(
                dataset_path='/mnt/data_cpfs/model_cache/modelscope/dataset/openai/gsm8k/main',
                # dataset_path = "/root/agentjet/benchmark_datasets/dataset/gsm8k/socratic",
                # dataset_path = "openai/gsm8k",
                # dataset_name = "main",
            )
        )
    )
    # Alternatively, load tasks from a local JSONL file:
    # print(f"Loading dataset from: {LOCAL_DATASET_PATH}")
    # dataset = RouterTaskReader(
    #     reader_type="jsonl_dataset_file",
    #     reader_config=AjetTaskReader(
    #         jsonl_dataset_file=JsonlDatasetFile(
    #             training=JsonlTrainingFp(file_path=LOCAL_DATASET_PATH)
    #         )
    #     ),
    # )

    # Handshake with the remote swarm server
    swarm_worker = SwarmClient(AJET_SWARM_URL)
    ajet_job = AgentJetJob(
        experiment_name="math_gsm8k_grpo",
        algorithm="grpo",
        n_gpu=REMOTE_ALLOCATE_GPU_PER_NODE,
        model=REMOTE_MODEL_PATH,
        batch_size=REMOTE_BATCH_SIZE,
        num_repeat=GRPO_N,
    )
    print(ajet_job.config.to_dict())
    swarm_worker.auto_sync_train_config_and_start_engine(
        ajet_job,
        force_restart=True,
    )

    def rollout(task):
        # Begin the episode
        episode_uuid, api_baseurl_key = swarm_worker.begin_episode(discard_episode_timeout=60)
        # Execute the agent (base_url=api_baseurl_key.base_url, api_key=api_baseurl_key.api_key)
        workflow_output = execute_agent(task, api_baseurl_key)  # the reward is in `workflow_output`
        # Report the output back to the swarm remote
        swarm_worker.end_episode(task, episode_uuid, workflow_output)
        return

    executor = PeriodicDrainThreadPoolExecutor(workers=GRPO_N * REMOTE_BATCH_SIZE, auto_retry=True)
    for _ in range(NUM_EPOCH):
        for task in dataset.generate_training_tasks():
            for _ in range(GRPO_N):
                executor.submit_with_periodic_drain(fn=rollout, task=task)

    return None


def execute_agent(task: Task, api_baseurl_key: OpenaiBaseUrlAndApiKey):
    ...
    raw_reward: float = ...  # compute the reward for the agent's output
    return WorkflowOutput(reward=raw_reward, metadata={"important_metadata": important_metadata})


if __name__ == "__main__":
    main()
```


Your job in step 2 is therefore to:

- use `get_training_dataset_item_list.py` to generate a `List[Task]` (`from ajet.schema.task import Task`)
- use `run_episode_once.py` to execute a single episode, and place that logic in the `execute_agent` function
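
One way to keep `execute_agent` thin is to inject the two pieces you already tested in step 1. Here is a self-contained sketch; the dataclass is only a stand-in for `ajet.schema.task.WorkflowOutput` so the snippet runs on its own, and the extra `run_fn`/`score_fn` parameters are an illustrative refactor, not the template's required signature:

```python
from dataclasses import dataclass, field

@dataclass
class WorkflowOutput:
    # Stand-in: in agent_roll.py, import the real class from ajet instead.
    reward: float
    metadata: dict = field(default_factory=dict)

def execute_agent(task, api_baseurl_key, run_fn, score_fn):
    # run_fn: the episode launcher from run_episode_once.py
    # score_fn: the reward function from run_episode_once.py
    transcript = run_fn(task, api_baseurl_key)
    reward = score_fn(transcript)
    return WorkflowOutput(reward=reward, metadata={"transcript_len": len(transcript)})
```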


## 3. Simplify your code and fix bugs

Before moving to step 4, simplify your code and fix any bugs to make sure it runs smoothly.


## 4. Training

Finally, you can start training.

Run `ajet-swarm start` to start the training server (if the user has already installed the agentjet swarm environment).
If the user has a docker environment, you can also refer to `docs/en/ajet-swarm-docker.md` to start an AgentSwarm docker container.

Create a copy of `agent_roll.py` named `agent_roll_one_episode_debug.py` and modify it to run only one episode; this helps you verify that the episode runner and reward function work as expected.
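
The only structural change needed in the debug copy is replacing the nested training loops with a single rollout. As a generic sketch (a hypothetical helper, shown standalone so the pattern is clear):

```python
def run_one_episode(task_iterable, rollout_fn):
    # In agent_roll_one_episode_debug.py, use this in place of the
    # NUM_EPOCH x dataset x GRPO_N loops: pull one task, roll it out once.
    first_task = next(iter(task_iterable))
    return rollout_fn(first_task)
```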

After the server side is ready, run
```bash
python /path/to/agent_roll_one_episode_debug.py
```
and watch the console log to see whether the episode executes successfully and the reward is computed correctly.

If anything goes wrong, keep the server running, fix `agent_roll_one_episode_debug.py`, and run it again until it can complete one episode successfully.

Next, patch `agent_roll.py` with any fixes discovered while debugging `agent_roll_one_episode_debug.py`, and then run
```bash
python /path/to/agent_roll.py
```

to start the training!