
Commit 6d4849a

committed
update docs
1 parent 8af0644 commit 6d4849a

4 files changed

Lines changed: 115 additions & 94 deletions


README.md

Lines changed: 25 additions & 38 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,9 @@
11
# AgentJet
22

33
[![Benchmarking](https://img.shields.io/badge/Benchmarking-0078D4?style=for-the-badge&logo=github)](https://benchmark.agent-matrix.com/)
4-
[![Docs](https://img.shields.io/badge/Docs-Read%20the%20Guide-0A7ECC?style=for-the-badge&logo=readthedocs&logoColor=white)](docs/en/installation.md)
4+
[![Docs](https://img.shields.io/badge/Docs-Read%20the%20Guide-0A7ECC?style=for-the-badge&logo=readthedocs&logoColor=white)](https://doc.agentjet.top/AgentJet)
55
[![License](https://img.shields.io/badge/License-Apache--2.0-4c1?style=for-the-badge)](LICENSE)
6-
[![Python](https://img.shields.io/badge/Python-3.10+-3776AB?style=for-the-badge&logo=python&logoColor=white)](docs/en/installation.md#requirements)
6+
[![Python](https://img.shields.io/badge/Python-3.10+-3776AB?style=for-the-badge&logo=python&logoColor=white)](https://doc.agentjet.top/AgentJet/en/installation#requirements)
77

88
<div align="center">
99
<img width="500" alt="AgentJet" src="docs/agentjet.jpg"/>
@@ -24,6 +24,9 @@ Let's begin with the simplest example: a math agent with a tool call.
2424
- Then, tune your first model using the minimum example.
2525
```python
2626
ajet --conf tutorial/example_math_agent/math_agent.yaml --backbone='verl'
27+
28+
# Use --backbone='trinity' to switch to the Trinity training engine,
29+
# or --backbone='debug' to debug with only vLLM.
2730
```
2831

2932

@@ -34,7 +37,7 @@ We aim to build a easy-to-learn Agent tuner that unlock more possibilities for a
3437
- **Easy and Friendly**. AgentJet helps you tune models behind your agent workflows easily, optimizing your agents for top performance with minimal effort.
3538
- **Rich Tutorial Library**. AgentJet provides a rich library of [examples](https://github.com/modelscope/AgentJet/tree/main/tutorial) as tutorials.
3639
- **Efficient and Scalable**. AgentJet uses [verl] as the default backbone (`--backbone=verl`). We also support [trinity](https://github.com/modelscope/Trinity-RFT/) as an alternative backbone, which accelerates tuning via fully asynchronous RFT.
37-
- **Flexible and Fast**. AgentJet supports [multi-agent workflows](docs/en/workflow.md) and adopts a context merging technique, accelerating training by 1.5x to 10x when the workflow involves multi-turn (or multi-agent) conversations.
40+
- **Flexible and Fast**. AgentJet supports [multi-agent workflows](https://doc.agentjet.top/AgentJet/en/workflow.md) and adopts a context merging technique, accelerating training by 1.5x to 10x when the workflow involves multi-turn (or multi-agent) conversations.
3841
- **Reliability and Reproducibility**. Our team keeps track of framework performance across multiple [tasks + major-git-version + training-backbones](https://benchmark.agent-matrix.com/) (under construction, still gathering data, coming soon).
3942

4043
For advanced researchers, AgentJet also provides high-resolution logging and debugging solutions:
@@ -49,42 +52,26 @@ For advanced researchers, AgentJet also provides high-resolution logging and deb
4952

5053
#### Installation
5154

52-
We recommend using `uv` for dependency management.
53-
54-
1. **Clone the Repository**:
55-
```bash
56-
git clone https://github.com/modelscope/AgentJet.git
57-
cd AgentJet
58-
```
59-
60-
61-
2. **Set up Environment**:
62-
```bash
63-
uv venv --python=3.10.16 && source .venv/bin/activate
64-
uv pip install -e .[trinity]
65-
# Note: flash-attn must be installed after other dependencies
66-
uv pip install flash_attn==2.8.1 --no-build-isolation --no-cache-dir
67-
```
68-
55+
- **Click here to read the** [**installation guide**](https://doc.agentjet.top/AgentJet/en/installation/).
6956

7057
#### Run Training
7158

72-
You can start training your first agent with a single command using a pre-configured YAML file. Take the [Math agent](docs/en/example_math_agent.md) as an example:
59+
- You can start training your first agent with a single command using a pre-configured YAML file. Take the [Math agent](https://doc.agentjet.top/AgentJet/en/example_math_agent/) as an example:
7360

74-
```bash
75-
ajet --conf tutorial/example_math_agent/math_agent.yaml
76-
```
61+
```bash
62+
ajet --conf tutorial/example_math_agent/math_agent.yaml
63+
```
7764

7865
#### Example Library
7966

8067
Explore our rich library of examples to kickstart your journey:
8168

82-
- 🔢 [**Training a math agent that can write python code**](docs/en/example_math_agent.md).
83-
- 📱 [**Creating an AppWorld agent using AgentScope and training it**](docs/en/example_app_world.md).
84-
- 🐺 [**Developing Werewolves RPG agents and training them**](docs/en/example_werewolves.md).
85-
- 👩🏻‍⚕️ [**Learning to ask questions like a doctor**](docs/en/example_learning_to_ask.md).
86-
- 🎴 [**Writing a countdown game using AgentScope and solving it**](docs/en/example_countdown.md).
87-
- 🚶 [**Solving a frozen lake walking puzzle using AgentJet**](docs/en/example_frozenlake.md).
69+
- 🔢 [**Training a math agent that can write python code**](https://doc.agentjet.top/AgentJet/en/example_math_agent).
70+
- 📱 [**Creating an AppWorld agent using AgentScope and training it**](https://doc.agentjet.top/AgentJet/en/example_app_world).
71+
- 🐺 [**Developing Werewolves RPG agents and training them**](https://doc.agentjet.top/AgentJet/en/example_werewolves).
72+
- 👩🏻‍⚕️ [**Learning to ask questions like a doctor**](https://doc.agentjet.top/AgentJet/en/example_learning_to_ask).
73+
- 🎴 [**Writing a countdown game using AgentScope and solving it**](https://doc.agentjet.top/AgentJet/en/example_countdown).
74+
- 🚶 [**Solving a frozen lake walking puzzle using AgentJet**](https://doc.agentjet.top/AgentJet/en/example_frozenlake).
8875

8976

9077
---
@@ -102,9 +89,9 @@ AgentJet makes agent fine-tuning straightforward by separating the developer int
10289

10390
To optimize an agent, you provide three core inputs:
10491

105-
* [**Trainable Workflow**](docs/en/workflow.md): Define your agent logic by inheriting the Workflow class, supporting both simple agent setups and advanced multi-agent collaborations.
106-
* [**Task Reader**](docs/en/data_pipeline.md): Load training tasks from JSONL files, HuggingFace datasets, interactive environments, or auto-generate them from documents.
107-
* [**Task Judger**](docs/en/task_judger.md): Evaluates agent outputs and assigns rewards to guide training.
92+
* [**Trainable Workflow**](https://doc.agentjet.top/AgentJet/en/workflow): Define your agent logic by inheriting the Workflow class, supporting both simple agent setups and advanced multi-agent collaborations.
93+
* [**Task Reader**](https://doc.agentjet.top/AgentJet/en/data_pipeline): Load training tasks from JSONL files, HuggingFace datasets, interactive environments, or auto-generate them from documents.
94+
* [**Task Judger**](https://doc.agentjet.top/AgentJet/en/task_judger): Evaluates agent outputs and assigns rewards to guide training.
10895

10996
#### 2. Internal System Architecture
11097

@@ -118,14 +105,14 @@ The internal system orchestrates several specialized modules to handle the compl
118105
* **Context Tracker**: Monitors LLM calls and automatically merges shared-history timelines to improve training efficiency by **1.5x to 10x**.
119106

120107

121-
---
108+
122109

123110
### 🚦 Navigation
124111

125-
* 📖 **Tutorials**: From [Installation](docs/en/installation.md) to [Tuning your first agent](docs/en/tutorial.md) — the essential path for beginners.
126-
* 🛠️ **Core Components**: Define your [Trainable Workflow](docs/en/workflow.md) and manage [Data](docs/en/data_pipeline.md) and [Reward](docs/en/tune_your_first_agent.md).
127-
* 💡 **Example**: Check the [Example Library](#example-library) above for real-world cases like [Math](docs/en/example_math_agent.md), [Werewolves game](docs/en/example_werewolves.md) and [Learning to ask task](docs/en/example_learning_to_ask.md).
128-
* ⚙️ **Deep Dive**: Master advanced [Configuration](docs/en/configuration.md).
112+
* 📖 **Tutorials**: From [Installation](https://doc.agentjet.top/AgentJet/en/installation) to [Tuning your first agent](https://doc.agentjet.top/AgentJet/en/tune_your_first_agent) — the essential path for beginners.
113+
* 🛠️ **Core Components**: Define your [Trainable Workflow](https://doc.agentjet.top/AgentJet/en/workflow) and manage [Data](https://doc.agentjet.top/AgentJet/en/data_pipeline) and [Reward](https://doc.agentjet.top/AgentJet/en/task_judger).
114+
* 💡 **Example**: Check the [Example Library](#example-library) above for real-world cases like [Math](https://doc.agentjet.top/AgentJet/en/example_math_agent), [Werewolves game](https://doc.agentjet.top/AgentJet/en/example_werewolves) and [Learning to ask task](https://doc.agentjet.top/AgentJet/en/example_learning_to_ask).
115+
* ⚙️ **Deep Dive**: Master advanced [Configuration](https://doc.agentjet.top/AgentJet/en/configuration).
129116

130117
## 🗺️ Roadmap
131118

docs/en/example_math_agent.md

Lines changed: 67 additions & 45 deletions
@@ -2,24 +2,28 @@
22

33
Train a **tool-using Math Agent** (ReAct + Python executor) to solve GSM8K-style math problems. Rewards come from a **judge** that checks final-answer correctness.
44

5-
---
65

76
## Overview
87

9-
<div class="callout-tip">
10-
<p>
118
In <strong>Math Agent</strong>, each training sample is a math word problem (e.g., GSM8K). The agent learns to reason step by step (ReAct-style), call a Python tool when computation is needed, and produce a final answer that matches the reference.
12-
</p>
13-
</div>
149

15-
This tutorial is organized in two steps:
1610

17-
1. **Run it**: Download the dataset and start training with the default YAML config
18-
2. **Understand & customize**: Read the workflow and the judge/reward logic
11+
This tutorial is organized into the following sections:
12+
13+
- [**Run this tutorial**: Download the dataset and start training with the default YAML config.](#quick-start)
14+
- [**Understand & customize**: Read the workflow and the judge/reward logic.](#explain)
15+
- [**Training Curve**: Compare the training curve.](#culve)
16+
17+
18+
19+
20+
21+
22+
1923

20-
---
2124

22-
## Quick Start
25+
26+
## Quick Start {#quick-start}
2327

2428
### Prepare Dataset
2529

@@ -71,11 +75,21 @@ ajet --conf tutorial/example_math_agent/math_agent.yaml --backbone='verl'
7175
}
7276
```
7377

74-
---
7578

76-
## Understanding the Training Pipeline
7779

78-
### What Happens Each Step
80+
81+
82+
83+
84+
85+
86+
87+
88+
89+
90+
## Understanding the Training Pipeline {#explain}
91+
92+
### Pipeline Abstraction
7993

8094
<div class="workflow-single">
8195
<div class="workflow-header">Training Step Flow</div>
@@ -85,44 +99,19 @@ ajet --conf tutorial/example_math_agent/math_agent.yaml --backbone='verl'
8599
<li><strong>Load one problem</strong>
86100

87101
Load a math problem from the dataset via `task_reader`.</li>
88-
<li><strong>Run the AgentScope workflow</strong>
102+
<li><strong>Run the Workflow</strong>
89103

90-
Build the prompt, let the ReAct agent call Python tools, and extract the final answer.</li>
91-
<li><strong>Register info for evaluation</strong>
104+
Build the prompt, let the ReActAgent call Python tools, and extract the final answer.</li>
105+
<li><strong>Return result as `WorkflowOutput`</strong>
92106

93-
Return `WorkflowOutput(reward=None, metadata={"final_answer": final_answer})`.</li>
107+
Return `WorkflowOutput(reward=None, metadata={"final_answer": final_answer})`. Here `reward=None` because the reward is computed outside the workflow, by the judge.</li>
94108
<li><strong>Run the judge</strong>
95109

96110
Compare `final_answer` with reference, compute `raw_reward` and `is_success`.</li>
97111
</ol>
98112
</div>
99113
</div>
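
The four steps above can be sketched in plain Python. This is a minimal illustration, not AgentJet's implementation: the `WorkflowOutput` dataclass is a hypothetical stand-in for the real class, and `run_workflow`, `judge`, `training_step`, and the `solve` callback are names invented here for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class WorkflowOutput:
    """Hypothetical stand-in for AgentJet's WorkflowOutput (illustration only)."""
    reward: Optional[float] = None
    metadata: dict = field(default_factory=dict)


def run_workflow(question: str, solve: Callable[[str], str]) -> WorkflowOutput:
    # Step 2: in the tutorial a ReAct agent with a Python tool runs here;
    # `solve` is a placeholder for that agent.
    final_answer = solve(question)
    # Step 3: leave the reward unset so it is computed outside the workflow.
    return WorkflowOutput(reward=None, metadata={"final_answer": final_answer})


def judge(output: WorkflowOutput, reference: str) -> tuple[float, bool]:
    # Step 4: compare the final answer with the reference answer.
    is_success = output.metadata["final_answer"].strip() == reference.strip()
    raw_reward = 1.0 if is_success else 0.0
    return raw_reward, is_success


def training_step(task: dict, solve: Callable[[str], str]) -> tuple[float, bool]:
    # Step 1: `task` plays the role of one problem loaded by the task reader.
    output = run_workflow(task["question"], solve)
    return judge(output, task["answer"])


task = {"question": "What is 6 * 7?", "answer": "42"}
print(training_step(task, solve=lambda q: "42"))
```

The exact-match comparison in `judge` is the simplest possible choice; the tutorial's own judge may normalize or parse answers differently.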
100114

101-
### YAML Configuration
102-
103-
Most wiring happens in `tutorial/example_math_agent/math_agent.yaml`:
104-
105-
```yaml title="math_agent.yaml"
106-
ajet:
107-
task_reader:
108-
type: huggingface_dat_repo # also supports: dataset_file / env_service
109-
110-
rollout:
111-
user_workflow: tutorial.example_math_agent.math_agent->ExampleMathLearn
112-
113-
task_judge:
114-
judge_protocol: tutorial.example_math_agent.math_answer_as_judge->MathAnswerAndLlmAsJudge
115-
116-
model:
117-
path: YOUR_MODEL_PATH
118-
```
119-
120-
| Field | Description |
121-
|-------|-------------|
122-
| `task_reader` | Where tasks come from |
123-
| `user_workflow` | Which workflow runs per sample |
124-
| `judge_protocol` | Which judge computes rewards |
125-
| `model.path` | Pretrained model to fine-tune |
126115

127116
### Code Walkthrough
128117

@@ -150,7 +139,11 @@ return WorkflowOutput(reward=None, metadata={"final_answer": final_answer})
150139
```
151140

152141
!!! warning "Important"
153-
Always provide the final answer via `WorkflowOutput.metadata` so the judge can score it.
142+
- Put every element needed for reward computation in `WorkflowOutput.metadata`,
143+
so the judge can use them.
144+
- In this specific case, `final_answer` is that key element.
145+
146+
154147

155148
### Reward Computation
156149

@@ -168,9 +161,38 @@ The judge receives:
168161
- Behavior penalty (tool called but no `print`)
169162
- Keep answer correctness as the primary signal
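
One way a behavior penalty could be combined with answer correctness is sketched here. The function name `shaped_reward`, the `penalty` value of 0.1, and the substring check for `print(` are all assumptions made for illustration; they are not taken from the tutorial's judge.

```python
def shaped_reward(is_correct: bool, tool_code_snippets: list[str], penalty: float = 0.1) -> float:
    """Correctness is the primary signal; a small penalty discourages tool calls without print."""
    base = 1.0 if is_correct else 0.0
    # Crude heuristic (assumption): a tool call "misbehaves" if its code never calls print().
    missing_print = any("print(" not in code for code in tool_code_snippets)
    return base - (penalty if missing_print else 0.0)


print(shaped_reward(True, ["print(6 * 7)"]))
print(shaped_reward(True, ["x = 6 * 7"]))
print(shaped_reward(False, []))
```

Keeping the penalty well below the correctness reward means a correct answer with a sloppy tool call still scores higher than any incorrect answer.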
170163

171-
---
172164

173-
## Results
165+
### YAML Configuration
166+
167+
Most wiring happens in `tutorial/example_math_agent/math_agent.yaml`:
168+
169+
```yaml title="math_agent.yaml"
170+
ajet:
171+
task_reader:
172+
type: huggingface_dat_repo # also supports: dataset_file / env_service
173+
174+
rollout:
175+
user_workflow: tutorial.example_math_agent.math_agent->ExampleMathLearn
176+
177+
task_judge:
178+
judge_protocol: tutorial.example_math_agent.math_answer_as_judge->MathAnswerAndLlmAsJudge
179+
180+
model:
181+
path: YOUR_MODEL_PATH
182+
```
183+
184+
| Field | Description |
185+
|-------|-------------|
186+
| `task_reader` | Where tasks come from |
187+
| `user_workflow` | Which workflow runs per sample |
188+
| `judge_protocol` | Which judge computes rewards |
189+
| `model.path` | Pretrained model to fine-tune |
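
The `user_workflow` and `judge_protocol` values use a `module.path->ClassName` notation. How AgentJet resolves these strings is not shown here; the sketch below is only an assumption about how such a spec could be turned into a Python object with the standard library, and `resolve_spec` is a hypothetical helper name.

```python
import importlib


def resolve_spec(spec: str):
    """Resolve a 'package.module->Name' string to the named attribute of that module."""
    module_path, sep, attr_name = spec.partition("->")
    if not sep or not module_path or not attr_name:
        raise ValueError(f"expected 'module.path->Name', got {spec!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


# Demonstrated with a standard-library target so the snippet runs anywhere.
dumps = resolve_spec("json->dumps")
print(dumps([1, 2]))
```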
190+
191+
192+
193+
194+
195+
## Results {#culve}
174196

175197
### Training Curve
176198

docs/en/task_judger.md

Lines changed: 12 additions & 11 deletions
@@ -1,11 +1,19 @@
11
# Task Judger
22

3-
!!! warning ""
4-
Task judger will be **disabled** automatically when the user-defined workflow returned an effective `WorkflowOutput.reward` and `WorkflowOutput.reward != None`
5-
6-
73
Task Judger evaluates agent outputs and assigns rewards during training. This page covers built-in judgers for common scenarios and how to create custom judgers for specific evaluation needs.
84

5+
!!! warning "When to use the task judger"
6+
- **Is task judger necessary for all tasks? No**:
7+
- There are two options to generate reward:
8+
- Compute reward **inside** the user-defined workflow (`WorkflowOutput.reward is not None`)
9+
- Compute reward **outside** the user-defined workflow (`WorkflowOutput.reward is None`)
10+
- **Task judger** is how AgentJet handles **out-of-workflow** reward computation.
11+
- The task judger is **disabled and ignored** when the user-defined workflow returns an effective reward (`WorkflowOutput.reward is not None`).
12+
- The task judger is **enabled** when the user-defined workflow returns `WorkflowOutput.reward = None`.
13+
- **When to use the task judger**:
14+
- When the user plans to **reuse** the reward function in other workflows in the future.
15+
- When the user wants to **decouple** rollout and reward computation logic.
16+
- When the user wants to use our [**OpenJudge**](https://github.com/modelscope/OpenJudge) integration to generate [Auto Rubrics reward](https://modelscope.github.io/OpenJudge/building_graders/generate_rubrics_as_graders/).
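
The enable/disable rule described in this note can be sketched as a small dispatch function. This is an illustration under assumptions, not AgentJet's internal code: `select_reward` and the `judge` callable signature are hypothetical.

```python
from typing import Callable, Optional


def select_reward(
    workflow_reward: Optional[float],
    metadata: dict,
    judge: Callable[[dict], float],
) -> float:
    """Use the in-workflow reward when present; otherwise fall back to the judge."""
    if workflow_reward is not None:
        # Reward computed inside the workflow: the judge is disabled and ignored.
        return workflow_reward
    # Reward computed outside the workflow: the judge scores the metadata.
    return judge(metadata)


def exact_match_judge(metadata: dict) -> float:
    # Hypothetical judge: 1.0 when the final answer equals the reference.
    return 1.0 if metadata["final_answer"] == metadata["reference"] else 0.0


meta = {"final_answer": "42", "reference": "42"}
print(select_reward(None, meta, exact_match_judge))
print(select_reward(0.3, meta, exact_match_judge))
```

Note that a workflow reward of `0.0` is still "not None", so it bypasses the judge; only an explicit `None` hands control to the judge.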
917

1018
## Overview
1119

@@ -117,7 +125,6 @@ Delegates evaluation to an external environment service, useful for complex inte
117125

118126
!!! tip "When to use"
119127
- Tasks with external simulators (e.g., AppWorld)
120-
- Complex state-based evaluation
121128
- Interactive environments with built-in evaluators
122129

123130
```yaml title="config.yaml"
@@ -127,12 +134,6 @@ ajet:
127134
judge_protocol: ajet.task_judge.env_service_as_judge->EnvServiceJudge
128135
```
129136
130-
!!! note "How it works"
131-
1. Calls `workflow_task.gym_env.evaluate()` to get a score from the environment
132-
2. Converts the score to a normalized reward:
133-
- Success (score ≥ 1): `1.0 + score * 0.5`
134-
- Failure (score < 1): `0.0 + score * 0.5`
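
The score mapping in this note (which this commit removes from the page) can be written out as a short function. The name `normalize_env_score` is hypothetical; only the two formulas come from the note.

```python
def normalize_env_score(score: float) -> float:
    """Map an environment score to a reward: success adds a 1.0 base, failure adds none."""
    base = 1.0 if score >= 1 else 0.0
    return base + score * 0.5


print(normalize_env_score(1.0))
print(normalize_env_score(0.4))
```

Under this mapping every successful episode scores at least 1.5, while a failed episode stays below 0.5, so the two cases never overlap.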
135-
136137
137138
## Creating Custom Task Judgers
138139

tutorial/README.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
1+
#### Example Library
2+
3+
Explore our rich library of examples to kickstart your journey.
4+
5+
- Example Documentation:
6+
7+
https://doc.agentjet.top/AgentJet/#example-library
8+
9+
- Example Benchmark Tracking System:
10+
11+
https://benchmark.agent-matrix.com/examples
