# change to --backbone='trinity' if you want to switch to trinity training engine;
+# or --backbone='debug' if you want to debug with only vLLM
```
@@ -34,7 +37,7 @@ We aim to build a easy-to-learn Agent tuner that unlock more possibilities for a
- **Easy and Friendly**. AgentJet helps you tune models behind your agent workflows easily, optimizing your agents for top performance with minimal effort.
- **Rich Tutorial Library**. AgentJet provides a rich library of [examples](https://github.com/modelscope/AgentJet/tree/main/tutorial) as tutorials.
- **Efficient and Scalable**. AgentJet uses [verl] as the default backbone (`--backbone=verl`). However, we also support [trinity](https://github.com/modelscope/Trinity-RFT/) as an alternative backbone, accelerating your tuning process via fully asynchronous RFT.
-- **Flexible and Fast**. AgentJet supports [multi-agent workflows](docs/en/workflow.md) and adopts a context merging technique, accelerating training by 1.5x to 10x when the workflow involves multi-turn (or multi-agent) conversations.
+- **Flexible and Fast**. AgentJet supports [multi-agent workflows](https://doc.agentjet.top/AgentJet/en/workflow.md) and adopts a context merging technique, accelerating training by 1.5x to 10x when the workflow involves multi-turn (or multi-agent) conversations.
- **Reliability and Reproducibility**. Our team keeps track of framework performance across multiple [tasks + major-git-version + training-backbones](https://benchmark.agent-matrix.com/) (under construction, still gathering data, coming soon).

For advanced researchers, AgentJet also provides high-resolution logging and debugging solutions:
@@ -49,42 +52,26 @@ For advanced researchers, AgentJet also provides high-resolution logging and deb

#### Installation

-We recommend using `uv` for dependency management.
- **Click here to read the** [**installation guide**](https://doc.agentjet.top/AgentJet/en/installation/).

#### Run Training

-You can start training your first agent with a single command using a pre-configured YAML file. Take the [Math agent](docs/en/example_math_agent.md) as an example:
+- You can start training your first agent with a single command using a pre-configured YAML file. Take the [Math agent](https://doc.agentjet.top/AgentJet/en/example_math_agent/) as an example:
Explore our rich library of examples to kickstart your journey:

-- 🔢 [**Training a math agent that can write python code**](docs/en/example_math_agent.md).
-- 📱 [**Creating an AppWorld agent using AgentScope and training it**](docs/en/example_app_world.md).
-- 🐺 [**Developing Werewolves RPG agents and training them**](docs/en/example_werewolves.md).
-- 👩🏻‍⚕️ [**Learning to ask questions like a doctor**](docs/en/example_learning_to_ask.md).
-- 🎴 [**Writing a countdown game using AgentScope and solving it**](docs/en/example_countdown.md).
-- 🚶 [**Solving a frozen lake walking puzzle using AgentJet**](docs/en/example_frozenlake.md).
+- 🔢 [**Training a math agent that can write python code**](https://doc.agentjet.top/AgentJet/en/example_math_agent).
+- 📱 [**Creating an AppWorld agent using AgentScope and training it**](https://doc.agentjet.top/AgentJet/en/example_app_world).
+- 🐺 [**Developing Werewolves RPG agents and training them**](https://doc.agentjet.top/AgentJet/en/example_werewolves).
+- 👩🏻‍⚕️ [**Learning to ask questions like a doctor**](https://doc.agentjet.top/AgentJet/en/example_learning_to_ask).
+- 🎴 [**Writing a countdown game using AgentScope and solving it**](https://doc.agentjet.top/AgentJet/en/example_countdown).
+- 🚶 [**Solving a frozen lake walking puzzle using AgentJet**](https://doc.agentjet.top/AgentJet/en/example_frozenlake).

---
@@ -102,9 +89,9 @@ AgentJet makes agent fine-tuning straightforward by separating the developer int
To optimize an agent, you provide three core inputs:

-* [**Trainable Workflow**](docs/en/workflow.md): Define your agent logic by inheriting the Workflow class, supporting both simple agent setups and advanced multi-agent collaborations.
-* [**Task Reader**](docs/en/data_pipeline.md): Load training tasks from JSONL files, HuggingFace datasets, interactive environments, or auto-generate them from documents.
-* [**Task Judger**](docs/en/task_judger.md): Evaluates agent outputs and assigns rewards to guide training.
+* [**Trainable Workflow**](https://doc.agentjet.top/AgentJet/en/workflow): Define your agent logic by inheriting the Workflow class, supporting both simple agent setups and advanced multi-agent collaborations.
+* [**Task Reader**](https://doc.agentjet.top/AgentJet/en/data_pipeline): Load training tasks from JSONL files, HuggingFace datasets, interactive environments, or auto-generate them from documents.
+* [**Task Judger**](https://doc.agentjet.top/AgentJet/en/task_judger): Evaluates agent outputs and assigns rewards to guide training.
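
As a rough illustration of the Task Reader input, the sketch below loads tasks from JSONL using only the Python standard library. The `Task` dataclass, the `read_tasks` helper, and the `question`/`answer` field names are assumptions made for this example; they are not AgentJet's actual reader API.

```python
import io
import json
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass
class Task:
    """Hypothetical task record, not AgentJet's real task type."""
    query: str
    reference: str


def read_tasks(stream: TextIO) -> Iterator[Task]:
    """Yield one Task per non-empty JSONL line."""
    for line in stream:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        row = json.loads(line)
        # Field names are assumed for illustration only.
        yield Task(query=row["question"], reference=row["answer"])


# In-memory stand-in for a JSONL file on disk.
sample = io.StringIO(
    '{"question": "2 + 3 = ?", "answer": "5"}\n'
    "\n"
    '{"question": "4 * 6 = ?", "answer": "24"}\n'
)
tasks = list(read_tasks(sample))
```

Reading lazily with a generator keeps memory use flat for large task files; the real reader types are described in the Task Reader documentation linked above.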
#### 2. Internal System Architecture
@@ -118,14 +105,14 @@ The internal system orchestrates several specialized modules to handle the compl
* **Context Tracker**: Monitors LLM calls and automatically merges shared-history timelines to improve training efficiency by **1.5x to 10x**.
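
The idea of merging shared-history timelines can be sketched as follows. This is a simplified illustration under stated assumptions, not AgentJet's actual Context Tracker: it treats each LLM call as a list of `(role, content)` messages and drops any timeline that is a prefix of a longer one, so a multi-turn conversation is kept once instead of once per turn. The name `merge_timelines` is hypothetical.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content)
Timeline = List[Message]


def merge_timelines(timelines: List[Timeline]) -> List[Timeline]:
    """Keep only timelines that are not a prefix of a longer kept timeline."""
    merged: List[Timeline] = []
    # Visit longest timelines first so shorter prefixes are seen afterwards.
    for timeline in sorted(timelines, key=len, reverse=True):
        is_prefix = any(kept[: len(timeline)] == timeline for kept in merged)
        if not is_prefix:
            merged.append(timeline)
    return merged


u1, a1 = ("user", "Plan the trip"), ("assistant", "Step 1")
u2, a2 = ("user", "Continue"), ("assistant", "Step 2")
b1 = ("assistant", "Different reply")

turn_1 = [u1, a1]          # context of the first LLM call
turn_2 = [u1, a1, u2, a2]  # second call extends the same history
branch = [u1, b1]          # diverges after the first user message

merged = merge_timelines([turn_1, turn_2, branch])
```

Here `turn_1` is absorbed into `turn_2`, while `branch` survives because it diverges from the shared history.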

----
+

### 🚦 Navigation

-* 📖 **Tutorials**: From [Installation](docs/en/installation.md) to [Tuning your first agent](docs/en/tutorial.md), the essential path for beginners.
-* 🛠️ **Core Components**: Define your [Trainable Workflow](docs/en/workflow.md) and manage [Data](docs/en/data_pipeline.md) and [Reward](docs/en/tune_your_first_agent.md).
-* 💡 **Example**: Check the [Example Library](#example-library) above for real-world cases like [Math](docs/en/example_math_agent.md), [Werewolves game](docs/en/example_werewolves.md) and [Learning to ask task](docs/en/example_learning_to_ask.md).
+* 📖 **Tutorials**: From [Installation](https://doc.agentjet.top/AgentJet/en/installation) to [Tuning your first agent](https://doc.agentjet.top/AgentJet/en/tune_your_first_agent), the essential path for beginners.
+* 🛠️ **Core Components**: Define your [Trainable Workflow](https://doc.agentjet.top/AgentJet/en/workflow) and manage [Data](https://doc.agentjet.top/AgentJet/en/data_pipeline) and [Reward](https://doc.agentjet.top/AgentJet/en/task_judger).
+* 💡 **Example**: Check the [Example Library](#example-library) above for real-world cases like [Math](https://doc.agentjet.top/AgentJet/en/example_math_agent), [Werewolves game](https://doc.agentjet.top/AgentJet/en/example_werewolves) and [Learning to ask task](https://doc.agentjet.top/AgentJet/en/example_learning_to_ask).
docs/en/example_math_agent.md (67 additions, 45 deletions)
@@ -2,24 +2,28 @@
Train a **tool-using Math Agent** (ReAct + Python executor) to solve GSM8K-style math problems. Rewards come from a **judge** that checks final-answer correctness.

----

## Overview

-<div class="callout-tip">
-<p>
In <strong>Math Agent</strong>, each training sample is a math word problem (e.g., GSM8K). The agent learns to reason step by step (ReAct-style), call a Python tool when computation is needed, and produce a final answer that matches the reference.
-</p>
-</div>

-This tutorial is organized in two steps:

-1. **Run it**: Download the dataset and start training with the default YAML config
-2. **Understand & customize**: Read the workflow and the judge/reward logic
+This tutorial is organized into the following sections:
+
+- [**Run this tutorial**: Download the dataset and start training with the default YAML config.](#quick-start)
+- [**Understand & customize**: Read the workflow and the judge/reward logic.](#explain)
+- [**Training Curve**: Compare the training curve.](#culve)
Return `WorkflowOutput(reward=None, metadata={"final_answer": final_answer})`. (`reward=None` because the reward is computed outside the workflow.)</li>
<li><strong>Run the judge</strong>

Compare `final_answer` with the reference, then compute `raw_reward` and `is_success`.</li>
</ol>
</div>
</div>
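
The last two steps can be sketched in a few lines. This is a minimal illustration, not the tutorial's actual code: the `WorkflowOutput` dataclass below is a stand-in that mirrors only the `reward` and `metadata` fields named above, and `run_workflow` and `judge` are hypothetical helpers that assume an exact string match counts as success.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class WorkflowOutput:
    """Stand-in with only the two fields mentioned in the tutorial."""
    reward: Optional[float] = None
    metadata: dict = field(default_factory=dict)


def run_workflow(final_answer: str) -> WorkflowOutput:
    # reward stays None so that the reward is computed outside the workflow.
    return WorkflowOutput(reward=None, metadata={"final_answer": final_answer})


def judge(output: WorkflowOutput, reference: str) -> Tuple[float, bool]:
    """Compare the final answer with the reference (assumed exact match)."""
    final_answer = str(output.metadata.get("final_answer", "")).strip()
    is_success = final_answer == reference.strip()
    raw_reward = 1.0 if is_success else 0.0
    return raw_reward, is_success


correct = judge(run_workflow("18"), reference="18")
wrong = judge(run_workflow("17"), reference="18")
```

A real judge would likely normalize numeric formats (for example `18.0` versus `18`) before comparing; that step is omitted here for brevity.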

-### YAML Configuration
-
-Most wiring happens in `tutorial/example_math_agent/math_agent.yaml`:
-
-```yaml title="math_agent.yaml"
-ajet:
-  task_reader:
-    type: huggingface_dat_repo # also supports: dataset_file / env_service
docs/en/task_judger.md (12 additions, 11 deletions)
@@ -1,11 +1,19 @@
# Task Judger

-!!! warning ""
-    Task judger will be **disabled** automatically when the user-defined workflow returned an effective `WorkflowOutput.reward` and `WorkflowOutput.reward != None`
-
-
Task Judger evaluates agent outputs and assigns rewards during training. This page covers built-in judgers for common scenarios and how to create custom judgers for specific evaluation needs.

+!!! warning "When to use the task judger"
+    - **Is the task judger necessary for all tasks? No**:
+        - There are two ways to generate a reward:
+            - Compute the reward **inside** the user-defined workflow (`WorkflowOutput.reward is not None`)
+            - Compute the reward **outside** the user-defined workflow (`WorkflowOutput.reward is None`)
+        - The **task judger** is how AgentJet handles **out-of-workflow** reward computation.
+        - The task judger is **disabled and ignored** when the user-defined workflow returns an effective `WorkflowOutput.reward` (`WorkflowOutput.reward != None`).
+        - The task judger is **enabled** when the user-defined workflow returns `WorkflowOutput.reward = None`.
+    - **When to use the task judger**:
+        - When the user plans to **reuse** the reward function in other workflows.
+        - When the user wants to **decouple** rollout and reward computation logic.
+        - When the user wants to use our [**OpenJudge**](https://github.com/modelscope/OpenJudge) integration to generate [Auto Rubrics reward](https://modelscope.github.io/OpenJudge/building_graders/generate_rubrics_as_graders/).
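
The enable/disable rule described in this warning can be sketched as a small dispatch function. This is an illustrative assumption about the control flow, not AgentJet's internal code; `resolve_reward` and the `judger` callable are hypothetical names.

```python
from typing import Callable, Optional


def resolve_reward(workflow_reward: Optional[float],
                   judger: Callable[[], float]) -> float:
    """Prefer the in-workflow reward; fall back to the judger only on None."""
    if workflow_reward is not None:
        # In-workflow reward wins, so the judger is never called.
        return workflow_reward
    # Out-of-workflow reward: delegate to the task judger.
    return judger()


judger_calls = []


def example_judger() -> float:
    judger_calls.append("called")
    return 1.0


inside = resolve_reward(0.0, example_judger)    # judger ignored
outside = resolve_reward(None, example_judger)  # judger used
```

Note the explicit `is not None` check: a legitimate in-workflow reward of `0.0` must not be mistaken for a missing reward, which a plain truthiness test would do.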
## Overview
@@ -117,7 +125,6 @@ Delegates evaluation to an external environment service, useful for complex inte
!!! tip "When to use"
    - Tasks with external simulators (e.g., AppWorld)
-    - Complex state-based evaluation
    - Interactive environments with built-in evaluators