update docs

binary-husky · binary-husky · commit 6d4849a40ef0 · 2026-01-12T13:41:42.000+08:00
diff --git a/README.md b/README.md
@@ -1,9 +1,9 @@
 # AgentJet
 
 [![Benchmarking](https://img.shields.io/badge/Benchmarking-0078D4?style=for-the-badge&logo=github)](https://benchmark.agent-matrix.com/)
-[![Docs](https://img.shields.io/badge/Docs-Read%20the%20Guide-0A7ECC?style=for-the-badge&logo=readthedocs&logoColor=white)](docs/en/installation.md)
+[![Docs](https://img.shields.io/badge/Docs-Read%20the%20Guide-0A7ECC?style=for-the-badge&logo=readthedocs&logoColor=white)](https://doc.agentjet.top/AgentJet)
 [![License](https://img.shields.io/badge/License-Apache--2.0-4c1?style=for-the-badge)](LICENSE)
-[![Python](https://img.shields.io/badge/Python-3.10+-3776AB?style=for-the-badge&logo=python&logoColor=white)](docs/en/installation.md#requirements)
+[![Python](https://img.shields.io/badge/Python-3.10+-3776AB?style=for-the-badge&logo=python&logoColor=white)](https://doc.agentjet.top/AgentJet/en/installation#requirements)
 
 <div align="center">
 <img width="500" alt="AgentJet" src="docs/agentjet.jpg"/>
@@ -24,6 +24,9 @@ Let's begin with the simplest example: a math agent with a tool call.
 - Then, tune your first model using the minimum example.
   ```bash
   ajet --conf tutorial/example_math_agent/math_agent.yaml --backbone='verl'
+
+  # Use --backbone='trinity' to switch to the Trinity training engine,
+  # or --backbone='debug' to debug with vLLM only.
   ```
 
 
@@ -34,7 +37,7 @@ We aim to build a easy-to-learn Agent tuner that unlock more possibilities for a
 - **Easy and Friendly**. AgentJet helps you tune models behind your agent workflows easily, optimizing your agents for top performance with minimal effort.
 - **Rich Tutorial Library**. AgentJet provides a rich library of [examples](https://github.com/modelscope/AgentJet/tree/main/tutorial) as tutorials.
 - **Efficient and Scalable**. AgentJet uses [verl] as the default backbone (`--backbone=verl`). We also support [trinity](https://github.com/modelscope/Trinity-RFT/) as an alternative backbone, accelerating your tuning process via fully asynchronous RFT.
-- **Flexible and Fast**. AgentJet supports [multi-agent workflows](docs/en/workflow.md) and adopts a context merging technique, accelerating training by 1.5x to 10x when the workflow involves multi-turn (or multi-agent) conversations.
+- **Flexible and Fast**. AgentJet supports [multi-agent workflows](https://doc.agentjet.top/AgentJet/en/workflow) and adopts a context merging technique, accelerating training by 1.5x to 10x when the workflow involves multi-turn (or multi-agent) conversations.
 - **Reliability and Reproducibility**. Our team keeps track of framework performance across multiple [tasks + major-git-version + training-backbones](https://benchmark.agent-matrix.com/) (under construction, still gathering data, coming soon).
 
 For advanced researchers, AgentJet also provides high-resolution logging and debugging solutions:
@@ -49,42 +52,26 @@ For advanced researchers, AgentJet also provides high-resolution logging and deb
 
 #### Installation
 
-We recommend using `uv` for dependency management.
-
-1. **Clone the Repository**:
-```bash
-git clone https://github.com/modelscope/AgentJet.git
-cd AgentJet
-```
-
-
-2. **Set up Environment**:
-```bash
-uv venv --python=3.10.16 && source .venv/bin/activate
-uv pip install -e .[trinity]
-# Note: flash-attn must be installed after other dependencies
-uv pip install flash_attn==2.8.1 --no-build-isolation --no-cache-dir
-```
-
+- Read the [**installation guide**](https://doc.agentjet.top/AgentJet/en/installation/).
 
 #### Run Training
 
-You can start training your first agent with a single command using a pre-configured YAML file. Take the [Math agent](docs/en/example_math_agent.md) as an example:
+- You can start training your first agent with a single command using a pre-configured YAML file. Take the [Math agent](https://doc.agentjet.top/AgentJet/en/example_math_agent/) as an example:
 
-```bash
-ajet --conf tutorial/example_math_agent/math_agent.yaml
-```
+  ```bash
+  ajet --conf tutorial/example_math_agent/math_agent.yaml
+  ```
 
 #### Example Library
 
 Explore our rich library of examples to kickstart your journey:
 
-- 🔢 [**Training a math agent that can write python code**](docs/en/example_math_agent.md).
-- 📱 [**Creating an AppWorld agent using AgentScope and training it**](docs/en/example_app_world.md).
-- 🐺 [**Developing Werewolves RPG agents and training them**](docs/en/example_werewolves.md).
-- 👩🏻‍⚕️ [**Learning to ask questions like a doctor**](docs/en/example_learning_to_ask.md).
-- 🎴 [**Writing a countdown game using AgentScope and solving it**](docs/en/example_countdown.md).
-- 🚶 [**Solving a frozen lake walking puzzle using AgentJet**](docs/en/example_frozenlake.md).
+- 🔢 [**Training a math agent that can write python code**](https://doc.agentjet.top/AgentJet/en/example_math_agent).
+- 📱 [**Creating an AppWorld agent using AgentScope and training it**](https://doc.agentjet.top/AgentJet/en/example_app_world).
+- 🐺 [**Developing Werewolves RPG agents and training them**](https://doc.agentjet.top/AgentJet/en/example_werewolves).
+- 👩🏻‍⚕️ [**Learning to ask questions like a doctor**](https://doc.agentjet.top/AgentJet/en/example_learning_to_ask).
+- 🎴 [**Writing a countdown game using AgentScope and solving it**](https://doc.agentjet.top/AgentJet/en/example_countdown).
+- 🚶 [**Solving a frozen lake walking puzzle using AgentJet**](https://doc.agentjet.top/AgentJet/en/example_frozenlake).
 
 
 ---
@@ -102,9 +89,9 @@ AgentJet makes agent fine-tuning straightforward by separating the developer int
 
 To optimize an agent, you provide three core inputs:
 
-* [**Trainable Workflow**](docs/en/workflow.md): Define your agent logic by inheriting the Workflow class, supporting both simple agent setups and advanced multi-agent collaborations.
-* [**Task Reader**](docs/en/data_pipeline.md): Load training tasks from JSONL files, HuggingFace datasets, interactive environments, or auto-generate them from documents.
-* [**Task Judger**](docs/en/task_judger.md): Evaluates agent outputs and assigns rewards to guide training.
+* [**Trainable Workflow**](https://doc.agentjet.top/AgentJet/en/workflow): Define your agent logic by inheriting the Workflow class, supporting both simple agent setups and advanced multi-agent collaborations.
+* [**Task Reader**](https://doc.agentjet.top/AgentJet/en/data_pipeline): Load training tasks from JSONL files, HuggingFace datasets, interactive environments, or auto-generate them from documents.
+* [**Task Judger**](https://doc.agentjet.top/AgentJet/en/task_judger): Evaluates agent outputs and assigns rewards to guide training.
 
 #### 2. Internal System Architecture
 
@@ -118,14 +105,14 @@ The internal system orchestrates several specialized modules to handle the compl
 * **Context Tracker**: Monitors LLM calls and automatically merges shared-history timelines to improve training efficiency by **1.5x to 10x**.
 
 
----
+
 
 ### 🚦 Navigation
 
-* 📖 **Tutorials**: From [Installation](docs/en/installation.md) to [Tuning your first agent](docs/en/tutorial.md) — the essential path for beginners.
-* 🛠️ **Core Components**: Define your [Trainable Workflow](docs/en/workflow.md) and manage [Data](docs/en/data_pipeline.md) and [Reward](docs/en/tune_your_first_agent.md).
-* 💡 **Example**: Check the [Example Library](#example-library) above for real-world cases like [Math](docs/en/example_math_agent.md), [Werewolves game](docs/en/example_werewolves.md) and  [Learning to ask task](docs/en/example_learning_to_ask.md).
-* ⚙️ **Deep Dive**: Master advanced [Configuration](docs/en/configuration.md).
+* 📖 **Tutorials**: From [Installation](https://doc.agentjet.top/AgentJet/en/installation) to [Tuning your first agent](https://doc.agentjet.top/AgentJet/en/tune_your_first_agent) — the essential path for beginners.
+* 🛠️ **Core Components**: Define your [Trainable Workflow](https://doc.agentjet.top/AgentJet/en/workflow) and manage [Data](https://doc.agentjet.top/AgentJet/en/data_pipeline) and [Reward](https://doc.agentjet.top/AgentJet/en/task_judger).
+* 💡 **Example**: Check the [Example Library](#example-library) above for real-world cases like [Math](https://doc.agentjet.top/AgentJet/en/example_math_agent), [Werewolves game](https://doc.agentjet.top/AgentJet/en/example_werewolves) and  [Learning to ask task](https://doc.agentjet.top/AgentJet/en/example_learning_to_ask).
+* ⚙️ **Deep Dive**: Master advanced [Configuration](https://doc.agentjet.top/AgentJet/en/configuration).
 
 ## 🗺️ Roadmap
 
diff --git a/docs/en/example_math_agent.md b/docs/en/example_math_agent.md
@@ -2,24 +2,28 @@
 
 Train a **tool-using Math Agent** (ReAct + Python executor) to solve GSM8K-style math problems. Rewards come from a **judge** that checks final-answer correctness.
 
----
 
 ## Overview
 
-<div class="callout-tip">
-<p>
 In <strong>Math Agent</strong>, each training sample is a math word problem (e.g., GSM8K). The agent learns to reason step by step (ReAct-style), call a Python tool when computation is needed, and produce a final answer that matches the reference.
-</p>
-</div>
 
-This tutorial is organized in two steps:
 
-1. **Run it**: Download the dataset and start training with the default YAML config
-2. **Understand & customize**: Read the workflow and the judge/reward logic
+This tutorial is organized into the following sections:
+
+- [**Run this tutorial**: Download the dataset and start training with the default YAML config.](#quick-start)
+- [**Understand & customize**: Read the workflow and the judge/reward logic.](#explain)
+- [**Training Curve**: Compare the training curve.](#culve)
+
+
+
+
+
+
+
 
----
 
-## Quick Start
+
+## Quick Start {#quick-start}
 
 ### Prepare Dataset
 
@@ -71,11 +75,21 @@ ajet --conf tutorial/example_math_agent/math_agent.yaml --backbone='verl'
     }
     ```
 
----
 
-## Understanding the Training Pipeline
 
-### What Happens Each Step
+
+
+
+
+
+
+
+
+
+
+## Understanding the Training Pipeline {#explain}
+
+### Pipeline Abstraction
 
 <div class="workflow-single">
 <div class="workflow-header">Training Step Flow</div>
@@ -85,44 +99,19 @@ ajet --conf tutorial/example_math_agent/math_agent.yaml --backbone='verl'
 <li><strong>Load one problem</strong>
 
 Load a math problem from the dataset via `task_reader`.</li>
-<li><strong>Run the AgentScope workflow</strong>
+<li><strong>Run the Workflow</strong>
 
-Build the prompt, let the ReAct agent call Python tools, and extract the final answer.</li>
-<li><strong>Register info for evaluation</strong>
+Build the prompt, let the ReActAgent call Python tools, and extract the final answer.</li>
+<li><strong>Return result as `WorkflowOutput`</strong>
 
-Return `WorkflowOutput(reward=None, metadata={"final_answer": final_answer})`.</li>
+Return `WorkflowOutput(reward=None, metadata={"final_answer": final_answer})`. Here `reward=None` because the reward is computed outside the workflow, by the judge.</li>
 <li><strong>Run the judge</strong>
 
 Compare `final_answer` with reference, compute `raw_reward` and `is_success`.</li>
 </ol>
 </div>
 </div>
 
-### YAML Configuration
-
-Most wiring happens in `tutorial/example_math_agent/math_agent.yaml`:
-
-```yaml title="math_agent.yaml"
-ajet:
-  task_reader:
-    type: huggingface_dat_repo   # also supports: dataset_file / env_service
-
-  rollout:
-    user_workflow: tutorial.example_math_agent.math_agent->ExampleMathLearn
-
-  task_judge:
-    judge_protocol: tutorial.example_math_agent.math_answer_as_judge->MathAnswerAndLlmAsJudge
-
-  model:
-    path: YOUR_MODEL_PATH
-```
-
-| Field | Description |
-|-------|-------------|
-| `task_reader` | Where tasks come from |
-| `user_workflow` | Which workflow runs per sample |
-| `judge_protocol` | Which judge computes rewards |
-| `model.path` | Pretrained model to fine-tune |
 
 ### Code Walkthrough
 
@@ -150,7 +139,11 @@ return WorkflowOutput(reward=None, metadata={"final_answer": final_answer})
 ```
 
 !!! warning "Important"
-    Always provide the final answer via `WorkflowOutput.metadata` so the judge can score it.
+    - Put everything needed for reward computation in `WorkflowOutput.metadata`,
+    so the judge can use it.
+    - In this example, `final_answer` is that key element.
+
+
 
 ### Reward Computation
 
@@ -168,9 +161,38 @@ The judge receives:
     - Behavior penalty (tool called but no `print`)
     - Keep answer correctness as the primary signal
 
----
 
-## Results
+### YAML Configuration
+
+Most wiring happens in `tutorial/example_math_agent/math_agent.yaml`:
+
+```yaml title="math_agent.yaml"
+ajet:
+  task_reader:
+    type: huggingface_dat_repo   # also supports: dataset_file / env_service
+
+  rollout:
+    user_workflow: tutorial.example_math_agent.math_agent->ExampleMathLearn
+
+  task_judge:
+    judge_protocol: tutorial.example_math_agent.math_answer_as_judge->MathAnswerAndLlmAsJudge
+
+  model:
+    path: YOUR_MODEL_PATH
+```
+
+| Field | Description |
+|-------|-------------|
+| `task_reader` | Where tasks come from |
+| `user_workflow` | Which workflow runs per sample |
+| `judge_protocol` | Which judge computes rewards |
+| `model.path` | Pretrained model to fine-tune |
+
+
+
+
+
+## Results {#culve}
 
 ### Training Curve
 
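Note on `docs/en/example_math_agent.md`: the four-step training flow described in this file can be sketched in plain Python. This is a minimal sketch under stated assumptions, not AgentJet code. `WorkflowOutput` below is a stand-in dataclass that mirrors only the `reward` and `metadata` fields shown in the doc; `run_workflow`, `judge`, `training_step`, and the sample task are hypothetical names for illustration. The real workflow drives a ReAct agent with a Python tool, and the real judge is `MathAnswerAndLlmAsJudge`, which may score differently from the exact match used here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class WorkflowOutput:
    """Stand-in for AgentJet's WorkflowOutput (assumption: only these two fields)."""
    reward: Optional[float] = None
    metadata: dict[str, Any] = field(default_factory=dict)


def run_workflow(task: dict[str, str], agent: Callable[[str], str]) -> WorkflowOutput:
    """Step 2 and 3: run the agent, then return the answer without a reward."""
    final_answer = agent(task["question"]).strip()
    # reward=None leaves reward computation to the judge.
    return WorkflowOutput(reward=None, metadata={"final_answer": final_answer})


def judge(task: dict[str, str], output: WorkflowOutput) -> tuple[float, bool]:
    """Step 4: compare final_answer with the reference (hypothetical exact match)."""
    is_success = output.metadata["final_answer"] == task["reference"]
    raw_reward = 1.0 if is_success else 0.0
    return raw_reward, is_success


def training_step(task: dict[str, str], agent: Callable[[str], str]) -> tuple[float, bool]:
    """Steps 1 to 4 for one already-loaded problem."""
    output = run_workflow(task, agent)
    return judge(task, output)


if __name__ == "__main__":
    # Step 1 stand-in: one hand-written problem instead of a task_reader.
    task = {"question": "What is 12 * 7?", "reference": "84"}
    print(training_step(task, agent=lambda question: " 84\n"))
```

The point of the split is that `run_workflow` never sees the reference answer: everything the judge needs travels through `metadata`.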
diff --git a/docs/en/task_judger.md b/docs/en/task_judger.md
@@ -1,11 +1,19 @@
 # Task Judger
 
-!!! warning ""
-    Task judger will be **disabled** automatically when the user-defined workflow returned an effective `WorkflowOutput.reward` and `WorkflowOutput.reward != None`
-
-
 Task Judger evaluates agent outputs and assigns rewards during training. This page covers built-in judgers for common scenarios and how to create custom judgers for specific evaluation needs.
 
+!!! warning "When to use the task judger"
+    - **Is the task judger necessary for all tasks? No.**
+        - There are two ways to generate a reward:
+            - Compute the reward **inside** the user-defined workflow (`WorkflowOutput.reward is not None`).
+            - Compute the reward **outside** the user-defined workflow (`WorkflowOutput.reward is None`).
+        - The **task judger** is how AgentJet handles **out-of-workflow** reward computation.
+        - The task judger is **disabled and ignored** when the user-defined workflow returns an effective reward, i.e. `WorkflowOutput.reward != None`.
+        - The task judger is **enabled** when the user-defined workflow returns `WorkflowOutput.reward = None`.
+    - **When to use the task judger**:
+        - When you plan to **reuse** the reward function in other workflows.
+        - When you want to **decouple** rollout and reward computation logic.
+        - When you want to use our [**OpenJudge**](https://github.com/modelscope/OpenJudge) integration to generate [Auto Rubrics reward](https://modelscope.github.io/OpenJudge/building_graders/generate_rubrics_as_graders/).
 
 ## Overview
 
@@ -117,7 +125,6 @@ Delegates evaluation to an external environment service, useful for complex inte
 
 !!! tip "When to use"
     - Tasks with external simulators (e.g., AppWorld)
-    - Complex state-based evaluation
     - Interactive environments with built-in evaluators
 
 ```yaml title="config.yaml"
@@ -127,12 +134,6 @@ ajet:
     judge_protocol: ajet.task_judge.env_service_as_judge->EnvServiceJudge
 ```
 
-!!! note "How it works"
-    1. Calls `workflow_task.gym_env.evaluate()` to get a score from the environment
-    2. Converts the score to a normalized reward:
-        - Success (score ≥ 1): `1.0 + score * 0.5`
-        - Failure (score < 1): `0.0 + score * 0.5`
-
 
 ## Creating Custom Task Judgers
 
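Note on `docs/en/task_judger.md`: the enable/disable rule in the new warning box comes down to a single `None` check. The sketch below illustrates that rule only. `WorkflowOutput` is a stand-in dataclass, and `resolve_reward` and the judge callable are hypothetical names, not AgentJet APIs; raising an error when neither reward source exists is also an assumption, and the framework's own dispatch may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class WorkflowOutput:
    """Stand-in for AgentJet's WorkflowOutput (assumption: only these two fields)."""
    reward: Optional[float] = None
    metadata: dict[str, Any] = field(default_factory=dict)


# Hypothetical judge signature: takes the workflow output, returns a reward.
Judge = Callable[[WorkflowOutput], float]


def resolve_reward(output: WorkflowOutput, judge: Optional[Judge]) -> float:
    """Pick the reward source the way the warning box describes."""
    if output.reward is not None:
        # In-workflow reward: the judge is disabled and ignored.
        # Note the explicit None check, so a reward of 0.0 still counts.
        return output.reward
    if judge is None:
        # Assumption: with no reward and no judge there is nothing to train on.
        raise ValueError("WorkflowOutput.reward is None and no task judger is configured")
    # Out-of-workflow reward: the judge is enabled.
    return judge(output)


def exact_match_judge(output: WorkflowOutput) -> float:
    """Toy judge for the demo: full reward only for the answer '84'."""
    return 1.0 if output.metadata.get("final_answer") == "84" else 0.0


if __name__ == "__main__":
    in_workflow = WorkflowOutput(reward=0.5)
    out_of_workflow = WorkflowOutput(metadata={"final_answer": "84"})
    print(resolve_reward(in_workflow, exact_match_judge))
    print(resolve_reward(out_of_workflow, exact_match_judge))
```

Checking `is not None` rather than truthiness matters here: a workflow that returns `reward=0.0` has still computed its own reward, so the judge stays disabled.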
diff --git a/tutorial/README.md b/tutorial/README.md
@@ -0,0 +1,11 @@
+#### Example Library
+
+Explore our rich library of examples to kickstart your journey.
+
+- Example Documentation:
+
+    https://doc.agentjet.top/AgentJet/#example-library
+
+- Example Benchmark Tracking System:
+
+    https://benchmark.agent-matrix.com/examples