## ajet/context_tracker/timeline_merging/README.md
In complex multi-agent LLM interactions, we define a Timeline as the token trajectory generated by repeatedly invoking an LLM during a task execution process.
A Timeline contains the following elements:
- Text message list
- Note: In most Qwen models, messages start with `<|im_start|>` and end with `<|im_end|>`, depending on the model's tokenizer and chat_template
- The last item $T_{i}^{[|T_{i}|]} = m_\text{i}$: always the output of this LLM request.
- The first $|T_{i}|-1$ items: always the input $M_\text{i}$ of this LLM request.
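The `<|im_start|>`/`<|im_end|>` wrapping mentioned above can be illustrated with a toy renderer. This is only a sketch: the real formatting is governed by the model's tokenizer and chat_template and may differ.

```python
# Toy rendering of the Qwen-style chat markup (illustrative only; the actual
# layout depends on the model's chat_template).
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi"},
]

def render(msgs):
    # Each message is wrapped in <|im_start|>{role}\n ... <|im_end|>\n
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in msgs
    )

text = render(messages)
assert text.startswith("<|im_start|>system")
assert text.count("<|im_end|>") == 2
```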
In practice, we found that when a token sequence is decoded into text and then re-encoded back into a token sequence by the tokenizer, it sometimes cannot be precisely converted back to the original token sequence.
Therefore, the following situation often occurs in reality:
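A toy illustration of this decode/re-encode mismatch, using a hypothetical three-entry vocabulary rather than a real tokenizer: two distinct token sequences can decode to the same text, so re-encoding cannot always recover the original sequence.

```python
# Hypothetical toy vocabulary: token 0 is the merged piece "ab",
# tokens 1 and 2 are the single characters "a" and "b".
vocab = {0: "ab", 1: "a", 2: "b"}
seq_merged = [0]       # one way to produce "ab"
seq_split = [1, 2]     # another way to produce "ab"

def decode(ids):
    return "".join(vocab[i] for i in ids)

def encode_greedy(text):
    # A greedy longest-match encoder always prefers the merged token "ab",
    # so the split sequence [1, 2] can never be recovered.
    pieces = sorted(vocab.items(), key=lambda kv: -len(kv[1]))
    ids, i = [], 0
    while i < len(text):
        for tid, piece in pieces:
            if text.startswith(piece, i):
                ids.append(tid)
                i += len(piece)
                break
    return ids

assert decode(seq_merged) == decode(seq_split) == "ab"
# Re-encoding collapses both sequences to the greedy segmentation:
assert encode_greedy(decode(seq_split)) == [0]  # not the original [1, 2]
```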
## docs/en/context_timeline.md
## docs/en/swarm.md
However, the AgentJet Swarm mode has pioneered a brand-new training approach: you can freely launch multiple "mother ships" (corresponding to multiple LLM models to be trained) on one or more servers. Then, from an "airport" (e.g., your workstation, server, or even your Mac), you can "take off" any number of "Jets" to act as "worker bees" running the Agent workflow awaiting training, forming a many-to-many training system:
- "Jets" are responsible for reading datasets, running the Agent workflow, and finally sending reward signals back to each "mother ship".
- "Mother ships" are responsible for providing vllm/sglang API interfaces (with AgentJet’s automatic context tracking & timeline merging capabilities that significantly accelerate training), coordinating and computing samples.
## (2/2) Launching Swarm Clients ("jets")
You can run any number of swarm clients:
- on any device (MacBook, workstation, even the same machine that runs the swarm server, **wherever you want**)
- at any time (before or in the middle of a training run, **whenever you want**)
The swarm server can be in the following states and transition between them as follows:
- **OFFLINE**: The swarm server is started but does not load any models or perform any training. It enters this state directly after startup. Additionally, it transitions to this state upon receiving a `stop_engine` command from (any) client while in any other state.
- **BOOTING**: The swarm server enters this state upon receiving a configuration followed by an explicit `begin_engine` command. In this state, it loads model parameters, initializes FSDP, and initializes vLLM.
- **ROLLING**: The swarm server enters this state automatically after completing **BOOTING** or after finishing the **WEIGHT_SYNCING** state. This represents the sampling phase.
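The state machine above can be sketched as a small transition table. Only the `begin_engine`/`stop_engine` commands are named in the text; the `boot_done` and `sync_done` event names here are illustrative assumptions.

```python
from enum import Enum, auto

class ServerState(Enum):
    OFFLINE = auto()
    BOOTING = auto()
    ROLLING = auto()
    WEIGHT_SYNCING = auto()

# Transitions described above. Event names other than begin_engine /
# stop_engine are hypothetical placeholders for internal completion signals.
TRANSITIONS = {
    (ServerState.OFFLINE, "begin_engine"): ServerState.BOOTING,
    (ServerState.BOOTING, "boot_done"): ServerState.ROLLING,
    (ServerState.WEIGHT_SYNCING, "sync_done"): ServerState.ROLLING,
}

def step(state, event):
    # stop_engine returns to OFFLINE from any state.
    if event == "stop_engine":
        return ServerState.OFFLINE
    return TRANSITIONS.get((state, event), state)

assert step(ServerState.OFFLINE, "begin_engine") is ServerState.BOOTING
assert step(ServerState.ROLLING, "stop_engine") is ServerState.OFFLINE
```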
## docs/en/swarm_best_practice.md
```python
swarm_worker.start_engine()
```
Hints:
- You can `yaml_job.dump_job_as_yaml('./config.yaml')` to take a look at the full configuration.
- You can `yaml_job.build_job_from_yaml('./config.yaml')` to load a YAML configuration as an override (some configurations can only be edited via YAML).
One important thing to note is that before each episode begins, you need to call `begin_episode` to obtain the `base_url` and `api_key`. At the same time, you will receive an episode identifier, `episode_uuid`. The `swarm_worker` is thread-safe and does not hold the state of the `episode`, so you can safely invoke multiple `begin_episode` calls concurrently. When your agent finishes running, remember to call `end_episode` to send the reward signal back to the swarm server (with the `episode_uuid` parameter). Additionally, if you wish to discard an episode for reasons such as:
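The episode lifecycle described above can be sketched with a minimal in-memory stand-in. The method names (`begin_episode`, `end_episode`, `abort_episode`) and the returned `base_url`/`api_key`/`episode_uuid` fields follow the text; the stand-in class itself and its internals are assumptions, not the real client.

```python
import threading
import uuid

class FakeSwarmWorker:
    """In-memory stand-in sketching the episode protocol described above."""

    def __init__(self):
        self._lock = threading.Lock()  # the real worker is thread-safe too
        self.rewards = {}

    def begin_episode(self):
        # The real client returns routing credentials for the LLM endpoint.
        eid = str(uuid.uuid4())
        return {"episode_uuid": eid, "base_url": "http://fake", "api_key": "fake"}

    def end_episode(self, episode_uuid, reward):
        # Send the reward signal back for this episode.
        with self._lock:
            self.rewards[episode_uuid] = reward

    def abort_episode(self, episode_uuid):
        # Discard the episode without contributing to training.
        with self._lock:
            self.rewards.pop(episode_uuid, None)

worker = FakeSwarmWorker()
ep = worker.begin_episode()
try:
    # ... run the agent against ep["base_url"] / ep["api_key"] here ...
    worker.end_episode(ep["episode_uuid"], reward=1.0)
except Exception:
    worker.abort_episode(ep["episode_uuid"])

assert worker.rewards[ep["episode_uuid"]] == 1.0
```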
## docs/en/swarm_deepdive.md
In the following section, we will take a deep dive into the AgentJet Swarm.
This GIF displays the life cycle of a Swarm Server.
The possible states and transitions of the swarm server are as follows:
- **OFFLINE**: The swarm server starts but has not loaded any models and is not running any training. The swarm server enters this state directly after startup. Additionally, it enters this state after receiving a `stop_engine` command from any client while in any other state.
- **BOOTING**: The swarm server enters this state after receiving configuration and then an explicit `begin_engine` command, performing model parameter loading, FSDP initialization, and vLLM initialization.
- **ROLLING**: The swarm server's sample-collection state. It automatically enters this state when **BOOTING** ends or when the **WEIGHT_SYNCING** state ends.
```python
swarm_client.start_engine()
```
Practical tips:
- **Treat YAML as the source of truth**: you can inspect it with `yaml_job.dump_job_as_yaml("./config.yaml")` and load overrides via `yaml_job.build_job_from_yaml("./config.yaml")`.
- **Idempotency**: `auto_sync_train_config_and_start_engine()` is designed to be safe if the engine is already **ROLLING** (it will do nothing) and will wait if the engine is **BOOTING / WEIGHT_SYNCING**.
- **Monitoring**: run `ajet-swarm overwatch --swarm-url=http://your-swarm-server:10086` (or `python -m ajet.launcher --swarm-overwatch=...`) to watch the server states and rollout pool.
- When the server is **ENGINE.ROLLING**, `abort_episode` typically **reverts** the episode back to the unclaimed pool, so other clients can pick it up.
- When the server is in **ENGINE.ROLLING_POST**, `abort_episode` will **delete** the episode record instead of re-queueing it, so weight syncing won’t be blocked by zombie episodes.
Timeouts you should understand:
- `discard_episode_timeout` (server-side): if an episode is **idle** (no LLM requests) for too long, the server can discard it.
- Client-side protection: the client records an internal max lifetime (currently `max_episode_time = 2 × discard_episode_timeout`). If you submit too late, `end_episode` will be converted into an `abort_episode` to avoid poisoning the pool.
4) Drive training by repeatedly running batches of episodes
The usual batching relationship is:
- remote `batch_size` is the number of tasks in one policy-gradient batch (server side)
205
210
- local `num_repeat` (a.k.a. rollout.n / GRPO N) is the number of rollouts per task
206
211
- so one “full” batch roughly needs `batch_size × num_repeat` completed episodes.
207
212
208
213
The helper `run_episodes_until_all_complete(tasks, func=rollout, auto_retry=True)` is just a convenience thread pool; you can implement your own scheduling.
Operational notes:
- Use `ajet-swarm overwatch --swarm-url=...` to watch **running episodes** and whether the pool is close to triggering **WEIGHT_SYNCING**.
- If you need to change training YAML, call `swarm_client.stop_engine()` first (server returns to **ENGINE.OFFLINE**), then sync again.
```python
def rollout(task: Task):
    ...
```
Key design constraint:
- A “logical” rollout is only valid if you **commit/abort all involved episodes together**.
If one model’s episode is ended but the other is aborted (or hangs), you create asynchronous noise across models.
Batching rule of thumb:
- Keep `num_repeat` aligned across servers.
- It’s simplest when both servers use the same `batch_size` and you drive the outer loop by one of them (as in the best-practice example).
The one rule for the debug client is: **do not contribute data to the training batch**.
The simplest discipline is:
- Debug client still calls `begin_episode()` to obtain valid routing credentials.
- Debug client runs the agent.
- Debug client always ends with `abort_episode(episode_uuid)` (never `end_episode`).
```python
def debug_once(task: Task):
    ...
```
Why this works:
- `abort_episode` returns the claimed episode to the pool (or deletes it in **ROLLING_POST**), so your debugging does not change the reward statistics used for the next weight update.
Practical cautions:
- Keep debug parallelism low. If the debug client claims too many episodes and holds them, training clients may temporarily see “No available episodes to claim”.
- Prefer short `discard_episode_timeout` for debugging so stuck runs get cleaned up fast.
- Keep `ajet-swarm overwatch` open to ensure debug episodes are quickly aborted and not piling up.
## docs/en/swarm_intro_blog_en.md
Previous Agentic RL training modes had some implicit assumptions:
- First, no matter how many agents are in the task to be trained, these agents can only share the same fine-tunable LLM model (shared "brain").
The reason for this phenomenon is that most training backends, represented by VERL and TRL, typically configure only one LLM model for fine-tuning.
- Second, in the reinforcement-learning sample-collection stage, all current training frameworks forcibly bind the agent rollout task to the training process.
AgentJet has invested heavily in engineering quality.
**Version-by-Version Performance Tracking**:
We maintain a public [Performance Tracking Dashboard](https://benchmark.agentjet.top/), continuously recording AgentJet's training curves and final performance on multiple standard tasks (mathematical reasoning, code generation, tool use, etc.), across major Git versions, and across different training backends (VERL, etc.). With every code update, the test bot executes benchmarks, and any performance regression is immediately detected. This means:
- When upgrading AgentJet versions, you can clearly know how the new version performs on the tasks you care about.
- If an update introduces a hidden bug causing a decline in training effectiveness, we will capture it immediately.
- Researchers can confidently cite AgentJet's experimental results because they are reproducible.
AgentJet is fully open-sourced on GitHub.
## docs/en/swarm_vibe_coding.md
Here is an example:
```txt
Your task:
- Write an intelligent agent that learns the CountDown task (You are an agent specialized in solving countdown number puzzles. Given a target number and a list of source numbers, find a way to reach the target number using basic arithmetic operations (+, -, *, /). Each source number can only be used once.)
- I hope to use the base model '/mnt/data_cpfs/model_cache/modelscope/hub/Qwen/Qwen/Qwen2.5-7B-Instruct'
```