README.md (1 addition, 1 deletion)
@@ -38,7 +38,7 @@ We aim to build an easy-to-learn Agent tuner that unlocks more possibilities for a
- **Rich Tutorial Library**. AgentJet provides a rich library of [examples](https://github.com/modelscope/AgentJet/tree/main/tutorial) as tutorials.
- **Efficient and Scalable**. AgentJet uses [verl] as the default backbone (`--backbone=verl`). However, we also support [trinity](https://github.com/modelscope/Trinity-RFT/) as an alternative backbone, accelerating your tuning process via fully asynchronous RFT.
- **Flexible and Fast**. AgentJet supports [multi-agent workflows](https://doc.agentjet.top/AgentJet/en/workflow.md) and adopts a context-merging technique, accelerating training by 1.5x to 10x when the workflow involves multi-turn (or multi-agent) conversations.
-- **Reliability and Reproducibility**. Our team keeps track of framework performance across multiple [tasks + major-git-version + training-backbones](https://benchmark.agent-matrix.com/) (under construction, still gathering data, comming soon).
+- **Reliability and Reproducibility**. Our team keeps track of framework performance across multiple [tasks + major-git-version + training-backbones](https://benchmark.agent-matrix.com/) (under construction, still gathering data, coming soon).
For advanced researchers, AgentJet also provides high-resolution logging and debugging solutions:
<!-- For advanced researchers, AgentJet provides high-resolution logging and debugging solutions that are, to our knowledge, unprecedented in other prior projects. -->
docs/en/example_learning_to_ask.md (2 additions, 4 deletions)
@@ -9,10 +9,8 @@ Train an agent to **ask the next best question** (instead of answering directly)
In **Learning to Ask**, each training sample is a short **doctor–patient chat history**. The agent outputs the **one next question** the doctor should ask (optionally with multiple-choice answers), rather than giving a diagnosis or treatment.
Figure: Why "Learning to Ask" matters. Left: LLM gives a diagnosis with too little information. Right: LLM asks clear follow-up questions before concluding, which feels more reassuring.
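For illustration, a training sample of the shape described above (chat history in, one next question out) might look like the dictionary below. Every field name here is a hypothetical choice for this sketch, not AgentJet's actual data schema:

```python
# Hypothetical sample layout for "Learning to Ask".
# Field names are illustrative only, not AgentJet's real schema.
sample = {
    "chat_history": [
        {"role": "patient", "content": "I've had a headache for three days."},
        {"role": "doctor", "content": "Is the pain on one side or both?"},
        {"role": "patient", "content": "Mostly on the left side."},
    ],
    # Target output: ONE next question, optionally with choices --
    # never a diagnosis or a treatment plan.
    "next_question": "Do you also feel nausea or sensitivity to light?",
    "choices": ["Yes, both", "Only nausea", "Only light sensitivity", "Neither"],
}
```

Note that the supervision target is the question itself; nothing in the sample asks the model to conclude anything about the patient's condition.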
docs/en/example_werewolves.md (5 additions, 0 deletions)
@@ -7,6 +7,7 @@ This tutorial demonstrates how to train **multiple agents** to play the Werewolv
The Werewolves role-playing game is a typical POMDP (Partially Observable Markov Decision Process) problem. We can train agents in this cooperative multi-agent problem using shared-parameter methods.
Terms explained:
+
- **Partially Observable**: Agents can only receive **local information**; one agent cannot obtain another's perception, even if they are teammates.
- **Markov Decision Process**: Decisions are made based on the current state alone.
- **Shared-parameter**: Using one model as the policy for multiple agents. Note that agents **share** the policy (model parameters) but **do not share** perception (model input).
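A minimal sketch of the shared-parameter idea, assuming a hypothetical `SharedPolicy` class (not AgentJet's actual API): one set of parameters, per-agent inputs.

```python
# Illustrative sketch of shared-parameter multi-agent control.
# `SharedPolicy` and its methods are hypothetical stand-ins.

class SharedPolicy:
    """One set of model parameters serving every agent."""

    def __init__(self, model_name: str):
        self.model_name = model_name  # single shared model

    def act(self, observation: str) -> str:
        # Every agent calls the SAME parameters, but each passes
        # only its own local observation (partial observability).
        return f"<{self.model_name} reply to: {observation}>"

policy = SharedPolicy("shared-llm")

# Two werewolves see different chat histories: the inputs differ,
# the parameters do not.
obs_wolf_1 = "night 1: you and player 3 are werewolves"
obs_wolf_2 = "night 1: you and player 1 are werewolves"
actions = [policy.act(obs_wolf_1), policy.act(obs_wolf_2)]
```

The design point is that gradient updates from all agents flow into one model, while each agent's prompt stays private to that agent.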
@@ -18,6 +19,7 @@ Terms explained:
This page shows how to use the Werewolves social-deduction game as a multi-agent environment: preparing the data and environment, writing an AgentScope workflow, configuring the reward module (Judge), and completing the full process from local debugging to formal training.
Scenario Overview
+
- Scenario: Classic Werewolves game, including roles such as werewolf, villager, seer, witch, and hunter.
- Goal: Train a specific role (in this example, the `werewolf`) to achieve a higher win rate in games.
@@ -70,6 +72,7 @@ When `--backbone=debug`, Ray is disabled. You can use a VSCode `.vscode/launch.j
### 3.1 Core Process
At a high level, each training iteration follows this flow:
+
- The task reader generates a new game setup (players, role assignments, initial state).
- The rollout runs the AgentScope workflow to simulate a full game.
- Agents in `trainable_targets` act by using the trainable model (via `tuner.as_agentscope_model(...)`), while opponents use the fixed model.
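The flow above can be sketched as one training iteration. `read_task` and `rollout` are hypothetical stand-ins for AgentJet's real task reader and rollout; only the idea of routing `trainable_targets` to the trainable model (and everyone else to the fixed model) comes from the text:

```python
import random

# Hypothetical stand-ins for AgentJet's task reader / rollout.

def read_task(seed: int) -> dict:
    """Task reader: generate a fresh game setup (players, roles)."""
    rng = random.Random(seed)
    roles = ["werewolf", "villager", "seer", "witch", "hunter"]
    rng.shuffle(roles)
    return {"players": [f"p{i}" for i in range(5)], "roles": roles}

def rollout(task: dict, trainable_targets: set) -> list:
    """Rollout: each agent acts with the trainable or the fixed model."""
    trajectory = []
    for player, role in zip(task["players"], task["roles"]):
        model = "trainable" if role in trainable_targets else "fixed"
        trajectory.append((player, role, model))
    return trajectory

task = read_task(seed=0)
traj = rollout(task, trainable_targets={"werewolf"})
# With one werewolf in the setup, exactly one step uses the
# trainable model; the opponents use the fixed model.
```

In the real workflow, the "trainable" branch would hand the agent a model built via `tuner.as_agentscope_model(...)`, as described above.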
@@ -112,11 +115,13 @@ When `judge_protocol: null`, training relies on the reward (or win/loss outcome)
In `ExampleWerewolves.execute()`, the workflow first runs a full game by calling `werewolves_game(players, roles)`, and obtains `good_guy_win` (whether the good-guy side wins).
Then it uses a **turn-level sparse win/loss reward**:
+
- If `good_guy_win == True` and the training target is not `werewolf` (i.e., you are training a good-guy role), then `raw_reward = 1` and `is_success = True`.
- If `good_guy_win == False` and the training target is `werewolf` (i.e., you are training a werewolf-side role), then `raw_reward = 1` and `is_success = True`.
- Otherwise, the training side did not win: `raw_reward = 0` and `is_success = False`.
Exception / invalid-behavior penalty:
+
- If an exception is thrown during the game (e.g., the game cannot proceed), all trainable targets are penalized uniformly: `raw_reward = -0.1` and `is_success = False`.
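Taken together, the win/loss rule and the exception penalty amount to a small reward function. A minimal sketch, where the function name and the `None`-for-exception convention are inventions of this sketch while the constants and conditions come from the text:

```python
def turn_level_reward(good_guy_win, training_target):
    """Turn-level sparse win/loss reward, as described above.

    good_guy_win: True / False, or None if the game raised an
    exception and could not finish (hypothetical convention).
    Returns (raw_reward, is_success).
    """
    if good_guy_win is None:
        return -0.1, False          # game crashed: uniform penalty
    training_werewolf = (training_target == "werewolf")
    if good_guy_win and not training_werewolf:
        return 1, True              # good-guy side trained, and it won
    if not good_guy_win and training_werewolf:
        return 1, True              # werewolf side trained, and it won
    return 0, False                 # the training side did not win
```

For example, `turn_level_reward(False, "werewolf")` yields `(1, True)`, while `turn_level_reward(True, "werewolf")` yields `(0, False)`.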
If you need a more fine-grained evaluation (e.g., giving partial credit for key intermediate decisions instead of only win/loss), implement a custom Judge and enable it via `ajet.task_judge.judge_protocol`.
docs/en/workflow.md (4 additions, 4 deletions)
@@ -80,10 +80,10 @@ Next, with the `tuner` argument, call its `tuner.as_agentscope_model()` method:
### 3. Code Example
<div class="card-grid">
-<a href="en/example_math_agent/" class="feature-card"><div class="card-header"><img src="https://api.iconify.design/mdi:calculator-variant.svg" class="card-icon card-icon-math" alt=""><h3>Math Agent</h3></div><p class="card-desc">Training a math agent that can write Python code to solve mathematical problems.</p></a>
-<a href="en/example_learning_to_ask/" class="feature-card"><div class="card-header"><img src="https://api.iconify.design/mdi:comment-question.svg" class="card-icon card-icon-general" alt=""><h3>Learning to Ask</h3></div><p class="card-desc">Learning to ask questions like a doctor for medical consultation scenarios.</p></a>
-<a href="en/example_countdown/" class="feature-card"><div class="card-header"><img src="https://api.iconify.design/mdi:timer-sand.svg" class="card-icon card-icon-tool" alt=""><h3>Countdown Game</h3></div><p class="card-desc">Writing a countdown game using AgentScope and solving it with RL.</p></a>
-<a href="en/example_frozenlake/" class="feature-card"><div class="card-header"><img src="https://api.iconify.design/mdi:snowflake.svg" class="card-icon card-icon-data" alt=""><h3>Frozen Lake</h3></div><p class="card-desc">Solving a frozen lake walking puzzle using AgentJet's reinforcement learning.</p></a>
+<a href="../example_math_agent/" class="feature-card"><div class="card-header"><img src="https://api.iconify.design/mdi:calculator-variant.svg" class="card-icon card-icon-math" alt=""><h3>Math Agent</h3></div><p class="card-desc">Training a math agent that can write Python code to solve mathematical problems.</p></a>
+<a href="../example_learning_to_ask/" class="feature-card"><div class="card-header"><img src="https://api.iconify.design/mdi:comment-question.svg" class="card-icon card-icon-general" alt=""><h3>Learning to Ask</h3></div><p class="card-desc">Learning to ask questions like a doctor for medical consultation scenarios.</p></a>
+<a href="../example_countdown/" class="feature-card"><div class="card-header"><img src="https://api.iconify.design/mdi:timer-sand.svg" class="card-icon card-icon-tool" alt=""><h3>Countdown Game</h3></div><p class="card-desc">Writing a countdown game using AgentScope and solving it with RL.</p></a>
+<a href="../example_frozenlake/" class="feature-card"><div class="card-header"><img src="https://api.iconify.design/mdi:snowflake.svg" class="card-icon card-icon-data" alt=""><h3>Frozen Lake</h3></div><p class="card-desc">Solving a frozen lake walking puzzle using AgentJet's reinforcement learning.</p></a>