Commit 525ee60

fix markdown format
1 parent 57b63ab commit 525ee60

14 files changed: 49 additions, 1 deletion

ajet/context_tracker/timeline_merging/README.md
Lines changed: 3 additions & 0 deletions

@@ -3,6 +3,7 @@
 In complex multi-agent LLM interactions, we define a Timeline as the token trajectory generated by repeatedly invoking an LLM during a task execution process.
 
 A Timeline contains the following elements:
+
 - Text message list
 - Note: In most Qwen models, messages start with `<|im_start|>` and end with `<|im_end|>`, depending on the model's tokenizer and chat_template
 - Token sequence message list

@@ -48,6 +49,7 @@ T_n\left(M_\text{n}, m_\text{n}, a_\text{n}\right)
 \rbrace$
 
 Where:
+
 - $T_i$ represents the $i$-th (unmerged) timeline. $T_i = [T_{i}^{[1]}, T_{i}^{[2]}, \dots, T_{i}^{[|T_{i}|]}]$.
 - The last item $T_{i}^{[|T_{i}|]} = m_\text{i}$: always the output of this LLM request.
 - The first $|T_{i}|-1$ items: always the input $M_\text{i}$ of this LLM request.

@@ -90,6 +92,7 @@ Note: Loss Mask is calculated in detail during post-processing based on the $\te
 In practice, we found that when a token sequence is decoded into text and then re-encoded back into a token sequence by the tokenizer, it sometimes cannot be precisely converted back to the original token sequence.
 
 Therefore, the following situation often occurs in reality:
+
 - $\text{Author}(T_{i}^{[k]}) = \text{llm}$
 - $\text{Author}(T_{j}^{[k]}) \neq \text{llm}$
 - $\text{Text}(T_{j}^{[k]}) = \text{Text}(T_{i}^{[k]})$
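The decode/re-encode mismatch described in this hunk can be illustrated with a toy tokenizer (the vocabulary below is purely hypothetical; real BPE tokenizers such as Qwen's exhibit the same effect through merge ambiguity): two different token sequences can decode to the same text, so re-encoding cannot always recover the original sequence.

```python
# Toy vocabulary: "ab" can be produced either by the single token 2
# or by the pair (0, 1). This mimics BPE merge ambiguity.
VOCAB = {0: "a", 1: "b", 2: "ab"}

def detokenize(tokens):
    """Decode a token sequence into text."""
    return "".join(VOCAB[t] for t in tokens)

def tokenize(text):
    """Greedy longest-match encoding, as BPE-style tokenizers do."""
    tokens, i = [], 0
    pieces = sorted(VOCAB.items(), key=lambda kv: -len(kv[1]))
    while i < len(text):
        for tok_id, piece in pieces:
            if text.startswith(piece, i):
                tokens.append(tok_id)
                i += len(piece)
                break
    return tokens

original = [0, 1]                      # decodes to "ab"
round_trip = tokenize(detokenize(original))
print(original, "->", round_trip)      # [0, 1] -> [2]: same text, different tokens
```

This is exactly the situation above: $\text{Text}$ matches across timelines while the token-level identity (and hence the $\text{Author}$ attribution) does not.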

ajet/default_config/trinity/README.md
Lines changed: 2 additions & 0 deletions

@@ -24,6 +24,7 @@ You can configure mappings via the `ajet/default_config/trinity/config_auto_conv
 ## Trinity Hyperparameter Quick Guide 📊
 
 Trinity adopts a typical producer (explorer)-consumer (trainer) architecture:
+
 - 🏭 **Producer**: Uses VLLM to generate samples
 - 🧠 **Consumer**: Consumes samples to update the model
 Both operate on different runtime schedules.

@@ -59,5 +60,6 @@ meanwhile
 ### Training Memory Control 💾
 
 Same as VERL, control training memory with the following parameters:
+
 - `trainer.max_token_len_per_gpu`
 - `ulysses_sequence_parallel_size`

docs/en/context_timeline.md
Lines changed: 3 additions & 0 deletions

@@ -3,6 +3,7 @@
 In complex multi-agent LLM interactions, we define a Timeline as the token trajectory generated by repeatedly invoking an LLM during a task execution process.
 
 A Timeline contains the following elements:
+
 - Text message list
 - Note: In most Qwen models, messages start with `<|im_start|>` and end with `<|im_end|>`, depending on the model's tokenizer and chat_template
 - Token sequence message list

@@ -48,6 +49,7 @@ T_n\left(M_\text{n}, m_\text{n}, a_\text{n}\right)
 \rbrace$
 
 Where:
+
 - $T_i$ represents the $i$-th (unmerged) timeline. $T_i = [T_{i}^{[1]}, T_{i}^{[2]}, \dots, T_{i}^{[|T_{i}|]}]$.
 - The last item $T_{i}^{[|T_{i}|]} = m_\text{i}$: always the output of this LLM request.
 - The first $|T_{i}|-1$ items: always the input $M_\text{i}$ of this LLM request.

@@ -90,6 +92,7 @@ Note: Loss Mask is calculated in detail during post-processing based on the $\te
 In practice, we found that when a token sequence is decoded into text and then re-encoded back into a token sequence by the tokenizer, it sometimes cannot be precisely converted back to the original token sequence.
 
 Therefore, the following situation often occurs in reality:
+
 - $\text{Author}(T_{i}^{[k]}) = \text{llm}$
 - $\text{Author}(T_{j}^{[k]}) \neq \text{llm}$
 - $\text{Text}(T_{j}^{[k]}) = \text{Text}(T_{i}^{[k]})$

docs/en/example_werewolves.md
Lines changed: 2 additions & 0 deletions

@@ -137,6 +137,7 @@ If you need a more fine-grained evaluation (e.g., giving partial credit for key
 > **Visualization:** Training curves are generated by SwanLab. See [Visualization Tools](./visualization.md) for setup and usage.
 
 As training progresses, win rate increases. This usually means the agent becomes more stable on **two things**:
+
 - **Role-playing consistency**: the agent learns to maintain its werewolf cover under pressure, avoiding self-exposure even when voted out.
 - **Social deception skills**: it develops strategies to mislead opponents, sow suspicion among villagers, and implicitly coordinate with teammates.

@@ -153,6 +154,7 @@ Significant role-playing improvement is observed during the experiment.
 > **Token-level Visualization:** These detailed logs are generated by Beast-Logger. See [Beast-Logger Usage](./beast_logger.md) for more details.
 
 2. The agent develops multiple strategies for winning. For example:
+
 - **Misleading opponents**: "Let's keep an eye on the seer and the witch. They could be werewolves trying to hide".
 - **Appealing to reason**: "We need to be wary of fake seers and watch for inconsistencies in stories, Player-Y as hunter should act carefully".

docs/en/swarm.md
Lines changed: 3 additions & 0 deletions

@@ -13,6 +13,7 @@ However, the AgentJet Swarm mode has pioneered a brand-new training approach. Co
 you can freely launch multiple "mother ships" (corresponding to multiple LLM models to be trained) on one or more servers.
 Then, from an "airport" (e.g., your workstation, server, or even your Mac), you can "take off" any number of "Jets" to act as "worker bees" running the Agent workflow awaiting training,
 forming a many-to-many training system:
+
 - "Jets" are responsible for reading datasets, running the Agent workflow, and finally sending reward signals back to each "mother ship".
 - "Mother ships" are responsible for providing vllm/sglang API interfaces (with AgentJet’s automatic context tracking & timeline merging capabilities that significantly accelerate training), coordinating and computing samples.

@@ -48,6 +49,7 @@ Notes:
 ## (2/2) Launching Swarm Clients ("jets")
 
 You can run any amount of swarm client:
+
 - on any devices (macbook, workstation, the same machine you run swarm-server, **wherever you want**).
 - at any time (before or in the middle of a training, **whenever you want**)

@@ -82,6 +84,7 @@ swarm_worker.auto_sync_train_config_and_start_engine(yaml_job)
 ```
 
 The swarm server can be in the following states and transition between them as follows:
+
 - **OFFLINE**: The swarm server is started but does not load any models or perform any training. It enters this state directly after startup. Additionally, it transitions to this state upon receiving a `stop_engine` command from (any) client while in any other state.
 - **BOOTING**: The swarm server enters this state upon receiving a configuration followed by an explicit `begin_engine` command. In this state, it loads model parameters, initializes FSDP, and initializes vLLM.
 - **ROLLING**: The swarm server enters this state automatically after completing **BOOTING** or after finishing the **WEIGHT_SYNCING** state. This represents the sampling phase.
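The state transitions documented in this hunk can be sketched as a minimal state machine. This is only an illustration of the documented transitions, not AgentJet's actual implementation; the event names (`begin_engine` aside) are invented for the sketch, and **WEIGHT_SYNCING** is taken from the surrounding docs.

```python
# Allowed transitions of the swarm server, per the documentation.
# Event names other than begin_engine/stop_engine are hypothetical.
TRANSITIONS = {
    ("OFFLINE", "begin_engine"): "BOOTING",
    ("BOOTING", "boot_done"): "ROLLING",
    ("ROLLING", "pool_full"): "WEIGHT_SYNCING",
    ("WEIGHT_SYNCING", "sync_done"): "ROLLING",
}

def step(state: str, event: str) -> str:
    if event == "stop_engine":  # stop_engine is valid from every state
        return "OFFLINE"
    if (state, event) not in TRANSITIONS:
        raise ValueError(f"illegal transition: {state} + {event}")
    return TRANSITIONS[(state, event)]

state = "OFFLINE"
for event in ["begin_engine", "boot_done", "pool_full", "sync_done"]:
    state = step(state, event)
print(state)  # ROLLING
```

The key property the sketch captures is that `stop_engine` is an escape hatch from any state, while the BOOTING → ROLLING ⇄ WEIGHT_SYNCING loop is the normal training cycle.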

docs/en/swarm_best_practice.md
Lines changed: 2 additions & 0 deletions

@@ -49,6 +49,7 @@ swarm_worker.start_engine()
 ```
 
 Hints:
+
 - You can `yaml_job.dump_job_as_yaml('./config.yaml')` to take a look at the full configuration.
 - You can `yaml_job.build_job_from_yaml('./config.yaml')` to load yaml configuration as override. (there are some configurations that must be edited from yaml).

@@ -93,6 +94,7 @@ def rollout(task) -> float | None:
 ```
 
 One important thing to note is that before each episode begins, you need to call `begin_episode` to obtain the `base_url` and `api_key`. At the same time, you will receive an episode identifier, `episode_uuid`. The `swarm_worker` is thread-safe and does not hold the state of the `episode`, so you can safely invoke multiple `begin_episode` calls concurrently. When your agent finishes running, remember to call `end_episode` to send the reward signal back to the swarm server (with the `episode_uuid` parameter). Additionally, if you wish to discard an episode for reasons such as:
+
 - **Reward miscalculation**
 - **External API out of credit**
 - **Debugging**
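The `begin_episode` / `end_episode` / `abort_episode` contract described in this hunk can be sketched as follows. The `StubSwarmWorker` below is a stand-in for the real `swarm_worker` (method names follow the docs, but signatures, return shapes, and the stub's internals are assumptions for illustration):

```python
import uuid

class StubSwarmWorker:
    """Stand-in for the real swarm worker: only tracks open episodes."""
    def __init__(self):
        self.open_episodes = {}

    def begin_episode(self):
        # The real worker returns routing credentials for the trained model.
        episode_uuid = str(uuid.uuid4())
        self.open_episodes[episode_uuid] = True
        return "http://stub-base-url", "stub-api-key", episode_uuid

    def end_episode(self, episode_uuid, reward):
        # Sends the reward signal back to the swarm server.
        del self.open_episodes[episode_uuid]
        return reward

    def abort_episode(self, episode_uuid):
        # Discards the episode (reward miscalculation, API out of credit, debugging).
        self.open_episodes.pop(episode_uuid, None)

worker = StubSwarmWorker()
base_url, api_key, ep = worker.begin_episode()
try:
    reward = 1.0                      # run your agent here and score it
    worker.end_episode(ep, reward)    # commit the episode with its reward
except Exception:
    worker.abort_episode(ep)          # discard instead of poisoning the batch
print(len(worker.open_episodes))      # 0
```

Because the real worker is thread-safe and holds no episode state, the same begin/try/end-or-abort pattern can run in many threads concurrently, each keyed by its own `episode_uuid`.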

docs/en/swarm_deepdive.md
Lines changed: 11 additions & 0 deletions

@@ -39,6 +39,7 @@ In the following section, we will deep dive into the AgentJet Swarm.
 
 This gif displays the life cycle of a Swarm Server.
 The possible states and transitions of the swarm server are as follows:
+
 - **OFFLINE**: The swarm server starts but has not loaded any models and is not running any training. The swarm server enters this state directly after startup. Additionally, it enters this state after receiving a `stop_engine` command from any client while in any other state.
 - **BOOTING**: The swarm server enters this state after receiving configuration and then an explicit `begin_engine` command, performing model parameter loading, FSDP initialization, and vLLM initialization.
 - **ROLLING**: The swarm server sample collection state. It automatically enters this state when **BOOTING** ends or when the **WEIGHT_SYNCING** state ends.

@@ -112,6 +113,7 @@ swarm_client.start_engine()
 ```
 
 Practical tips:
+
 - **Treat YAML as the source of truth**: you can inspect it with `yaml_job.dump_job_as_yaml("./config.yaml")` and load overrides via `yaml_job.build_job_from_yaml("./config.yaml")`.
 - **Idempotency**: `auto_sync_train_config_and_start_engine()` is designed to be safe if the engine is already **ROLLING** (it will do nothing) and will wait if the engine is **BOOTING / WEIGHT_SYNCING**.
 - **Monitoring**: run `ajet-swarm overwatch --swarm-url=http://your-swarm-server:10086` (or `python -m ajet.launcher --swarm-overwatch=...`) to watch the server states and rollout pool.

@@ -155,10 +157,12 @@ def rollout(task: Task) -> float | None:
 ```
 
 Abort semantics (why it is safe for debugging):
+
 - When the server is **ENGINE.ROLLING**, `abort_episode` typically **reverts** the episode back to the unclaimed pool, so other clients can pick it up.
 - When the server is in **ENGINE.ROLLING_POST**, `abort_episode` will **delete** the episode record instead of re-queueing it, so weight syncing won’t be blocked by zombie episodes.
 
 Timeouts you should understand:
+
 - `discard_episode_timeout` (server-side): if an episode is **idle** (no LLM requests) for too long, the server can discard it.
 - Client-side protection: the client records an internal max lifetime (currently `max_episode_time = 2 × discard_episode_timeout`). If you submit too late, `end_episode` will be converted into an `abort_episode` to avoid poisoning the pool.

@@ -201,13 +205,15 @@ swarm_client.auto_sync_train_config_and_start_engine(yaml_job)
 4) Drive training by repeatedly running batches of episodes
 
 The usual batching relationship is:
+
 - remote `batch_size` is the number of tasks in one policy-gradient batch (server side)
 - local `num_repeat` (a.k.a. rollout.n / GRPO N) is the number of rollouts per task
 - so one “full” batch roughly needs `batch_size × num_repeat` completed episodes.
 
 The helper `run_episodes_until_all_complete(tasks, func=rollout, auto_retry=True)` is just a convenience thread pool; you can implement your own scheduling.
 
 Operational notes:
+
 - Use `ajet-swarm overwatch --swarm-url=...` to watch **running episodes** and whether the pool is close to triggering **WEIGHT_SYNCING**.
 - If you need to change training YAML, call `swarm_client.stop_engine()` first (server returns to **ENGINE.OFFLINE**), then sync again.

@@ -258,10 +264,12 @@ def rollout(task: Task):
 ```
 
 Key design constraint:
+
 - A “logical” rollout is only valid if you **commit/abort all involved episodes together**.
   If one model’s episode is ended but the other is aborted (or hangs), you create asynchronous noise across models.
 
 Batching rule of thumb:
+
 - Keep `num_repeat` aligned across servers.
 - It’s simplest when both servers use the same `batch_size` and you drive the outer loop by one of them (as in the best-practice example).

@@ -276,6 +284,7 @@ The one rule for the debug client is exactly what you noted:
 **do not contribute data to the training batch**.
 
 The simplest discipline is:
+
 - Debug client still calls `begin_episode()` to obtain valid routing credentials.
 - Debug client runs the agent.
 - Debug client always ends with `abort_episode(episode_uuid)` (never `end_episode`).

@@ -293,9 +302,11 @@ def debug_once(task: Task):
 ```
 
 Why this works:
+
 - `abort_episode` returns the claimed episode to the pool (or deletes it in **ROLLING_POST**), so your debugging does not change the reward statistics used for the next weight update.
 
 Practical cautions:
+
 - Keep debug parallelism low. If the debug client claims too many episodes and holds them, training clients may temporarily see “No available episodes to claim”.
 - Prefer short `discard_episode_timeout` for debugging so stuck runs get cleaned up fast.
 - Keep `ajet-swarm overwatch` open to ensure debug episodes are quickly aborted and not piling up.
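The batching relationship documented in this file reduces to simple arithmetic (the numbers below are illustrative, not AgentJet defaults):

```python
def episodes_per_batch(batch_size: int, num_repeat: int) -> int:
    """One full policy-gradient batch needs batch_size tasks,
    each rolled out num_repeat times (rollout.n / GRPO N)."""
    return batch_size * num_repeat

# e.g. a remote batch_size of 32 with 8 rollouts per task:
print(episodes_per_batch(32, 8))  # 256 completed episodes per weight update
```

This is also the number to watch in `ajet-swarm overwatch`: once roughly this many episodes complete, the server can leave **ROLLING** for **WEIGHT_SYNCING**.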

docs/en/swarm_intro_blog_en.md
Lines changed: 4 additions & 0 deletions

@@ -56,6 +56,7 @@ on the other hand it supports any number of sampling nodes. -->
 
 
 Previous Agentic RL training modes had some implicit assumptions:
+
 - First, no matter how many agents are in the task to be trained, these agents can only share the same fine-tunable LLM model (shared "brain").
   The reason for this phenomenon is that most training backends represented by VERL and TRL typically configure only one LLM model for fine-tuning.
 - Second, in the reinforcement learning sample collection stage, all current training frameworks forcibly bind the agent Rollout task process.

@@ -218,6 +219,7 @@ AgentJet has invested heavily in engineering quality to ensure that every traini
 
 **Version-by-Version Performance Tracking**:
 We maintain a public [Performance Tracking Dashboard](https://benchmark.agentjet.top/), continuously recording AgentJet's training curves and final performance on multiple standard tasks (mathematical reasoning, code generation, tool use, etc.), across major Git versions, and across different training backends (VERL, etc.). With every code update, the test bot executes benchmarks, and any performance regression is immediately detected. This means:
+
 - When upgrading AgentJet versions, you can clearly know how the new version performs on the tasks you care about.
 - If an update introduces a hidden bug causing a decline in training effectiveness, we will capture it immediately.
 - Researchers can confidently cite AgentJet's experimental results because they are reproducible.

@@ -283,6 +285,7 @@ AgentJet is fully open-sourced on GitHub. Researchers and developers in the comm
 <!--
 
 All possible states of the swarm server, and the transitions between them, are as follows:
+
 - **OFFLINE**: The swarm server starts but has not loaded any models and does not run any training. The swarm server enters this state directly after startup. Additionally, it enters this state after receiving a `stop_engine` command from (any) client while in any other state.
 - **BOOTING**: The swarm server enters this state after receiving a configuration and then an explicit `begin_engine` command, performing model parameter loading, FSDP initialization, and vLLM initialization.
 - **ROLLING**: The swarm server's sample-collection state. It enters this state automatically after **BOOTING** ends or after the **WEIGHT_SYNCING** state ends.

@@ -292,6 +295,7 @@ All possible states of the swarm server, and the transitions between them, are as follows:
 
 
 Only one thing needs attention: before each episode begins, you need to call `begin_episode` to obtain the `base_url` and `api_key`, and at the same time receive an episode identifier, `episode_uuid`. The `swarm_worker` is thread-safe and does not hold `episode` state, so you can freely issue multiple concurrent `begin_episode` calls. When your agent finishes running, remember to call `end_episode` to send the reward signal back to the swarm server (with the `episode_uuid` parameter). Additionally, if for reasons such as:
+
 - **The reward was miscalculated**
 - **An external API ran out of credit**
 - **Debugging**

docs/en/swarm_vibe_coding.md
Lines changed: 1 addition & 0 deletions

@@ -6,6 +6,7 @@ Here is an example:
 
 ```txt
 Your task:
+
 - Write an intelligent agent that learns the CountDown task (You are an agent specialized in solving countdown number puzzles. Given a target number and a list of source numbers, find a way to reach the target number using basic arithmetic operations (+, -, *, /). Each source number can only be used once.)
 - I hope to use the base model '/mnt/data_cpfs/model_cache/modelscope/hub/Qwen/Qwen/Qwen2.5-7B-Instruct'
 - Train using 8 GPUs
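The CountDown puzzle quoted in this prompt can be checked with a short brute-force solver. This is a reference sketch for the puzzle itself, not part of AgentJet; it only tries left-to-right combinations, which covers small instances but not every parenthesization of larger ones.

```python
from itertools import permutations, product

def solve_countdown(sources, target, eps=1e-9):
    """Try every ordering of the source numbers and every operator
    sequence, combining left to right; each source is used exactly once."""
    ops = {"+": lambda a, b: a + b,
           "-": lambda a, b: a - b,
           "*": lambda a, b: a * b,
           "/": lambda a, b: a / b if abs(b) > eps else None}
    for nums in permutations(sources):
        for op_seq in product(ops, repeat=len(nums) - 1):
            acc, expr = nums[0], str(nums[0])
            for op, n in zip(op_seq, nums[1:]):
                acc = ops[op](acc, n)
                if acc is None:          # division by ~zero: dead end
                    break
                expr = f"({expr} {op} {n})"
            if acc is not None and abs(acc - target) < eps:
                return expr
    return None

print(solve_countdown([3, 5, 7], 26))  # prints a valid expression, e.g. ((3 * 7) + 5)
```

A solver like this is also handy as a reward function for the trained agent: score 1.0 if the agent's expression evaluates to the target using each source once, else 0.0.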

docs/en/tune_your_first_agent.md
Lines changed: 2 additions & 1 deletion

@@ -208,7 +208,7 @@ Now, we have obtained all materials required to train the agent.
 # ------------------ do not modify ------------------
 defaults:
-  - trinity_default
+  - verl_default
   - ajet_default
   - _self_

@@ -473,6 +473,7 @@ ajet-swarm overwatch --swarm-url=http://localhost:10086
 ```
 
 The Swarm Server will:
+
 - Load the model specified by the client
 - Provide vLLM API endpoints for inference
 - Compute gradients and update model parameters
