Commit 8af0644

add logo

1 parent 7c70243 commit 8af0644

7 files changed: 54 additions & 143 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ Let's begin with the simplest example: a math agent with a tool call.
 - First, please check out the [installation guide](https://doc.agentjet.top/AgentJet/en/installation/) to set up the training environment.
 - Then, tune your first model using the minimum example.
 ```python
-ajet --conf tutorial/example_math_agent/math_agent.yaml --backbone='verl' --with-ray
+ajet --conf tutorial/example_math_agent/math_agent.yaml --backbone='verl'
 ```

docs/en/configuration.md

Lines changed: 2 additions & 107 deletions
@@ -2,7 +2,7 @@

 This page provides a detailed description of the configuration files for AgentJet.

----
+

 ## Overview


@@ -31,7 +31,7 @@ At a high level, a typical config contains a single root section `ajet`, which i

 </div>

----
+

 ## Model Configuration


@@ -58,7 +58,6 @@ export DASHSCOPE_API_KEY='sk-xxxxxx|sk-yyyyyy'
 export DASHSCOPE_API_KEY_BACKUP='sk-zzzzzz'
 ```

----

 ## Data Configuration

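The hunk above shows the documented convention of packing several DashScope keys into one variable separated by `|`, with a second variable as a backup. As a hedged illustration only (this parsing sketch and the name `load_key_pool` are mine, not AgentJet's actual implementation), a round-robin key pool with fallback could look like:

```python
import itertools
import os

def load_key_pool(primary_env="DASHSCOPE_API_KEY", backup_env="DASHSCOPE_API_KEY_BACKUP"):
    """Split a pipe-separated key list; fall back to the backup variable if the primary is unset."""
    raw = os.environ.get(primary_env) or os.environ.get(backup_env) or ""
    keys = [k.strip() for k in raw.split("|") if k.strip()]
    if not keys:
        raise RuntimeError(f"no API keys found in {primary_env} or {backup_env}")
    return itertools.cycle(keys)  # round-robin over the available keys

# demo values taken from the docs above
os.environ["DASHSCOPE_API_KEY"] = "sk-xxxxxx|sk-yyyyyy"
pool = load_key_pool()
print(next(pool), next(pool), next(pool))  # sk-xxxxxx sk-yyyyyy sk-xxxxxx
```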
@@ -123,7 +122,6 @@ ajet:
 | `customized_protocol` | Use a custom Python class for scoring |
 | `rubrics_auto_grader` | Use LLM-based automatic grading |

----

 ## Training Configuration

@@ -213,7 +211,6 @@ ajet:
 | `use_kl_loss` | Include KL divergence in loss |
 | `kl_loss_coef` | KL loss coefficient |

----

 ## Debug Mode

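The table rows above cover `use_kl_loss` and `kl_loss_coef`, and the full configuration example elsewhere in this page pairs them with `kl_loss_type: low_var_kl`. The sketch below shows the k3 estimator commonly used for low-variance KL penalties; whether this matches the training backbone's exact `low_var_kl` implementation is an assumption:

```python
import math

def k3_kl(logp, ref_logp):
    """Low-variance KL estimate (Schulman's k3): r - 1 - log r, with r = p_ref / p.
    Always non-negative; an assumed stand-in for the backbone's low_var_kl."""
    log_ratio = ref_logp - logp
    return math.exp(log_ratio) - log_ratio - 1.0

# identical log-probs give a zero KL penalty
print(k3_kl(-1.0, -1.0))  # 0.0

# the penalty grows as the policy drifts from the reference;
# scale by kl_loss_coef (0.002 in the example config)
penalty = 0.002 * k3_kl(-1.0, -1.5)
```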
@@ -234,7 +231,6 @@ ajet:
 - **Fixing randomness**: `debug_vllm_seed` helps reproduce issues
 - **Reduced parallelism**: Easier to debug with smaller concurrency

----

 ## Logging & Monitoring

@@ -262,108 +258,7 @@ All experiment outputs are saved in `./launcher_record/{experiment_name}`:
 | **Metrics** | Training metrics (depends on logger) |
 | **Checkpoint** | Model checkpoints |

----
-
-## Full Configuration Example
-
-??? example "Complete Configuration Template"
-    ```yaml title="config.yaml"
-    ajet:
-      project_name: "ajet_default_project"
-      experiment_name: "read_yaml_name"
-      experiment_dir: "auto"
-      backbone: debug
-
-      model:
-        path: /path/to/model/Qwen2.5-14B-Instruct
-
-      data:
-        max_prompt_length: 3000
-        max_response_length: 15000
-        train_batch_size: 32
-
-      rollout:
-        user_workflow: tutorial.example_appworld.appworld->ExampleAgentScopeWorkflow
-        force_disable_toolcalls: False
-        max_env_worker: 128
-        gamma: 1.0
-        compute_madness_checklist:
-          - "nonsense"
-        agent_madness_termination: True
-        agent_madness_reward: -1.0
-        max_response_length_in_one_turn: 4096
-        max_model_len: 18000
-        multi_turn:
-          max_sample_per_task: 30
-          max_steps: 30
-          expected_steps: 1
-        tensor_model_parallel_size: 1
-        n_vllm_engine: 2
-        max_num_seqs: 10
-        name: vllm
-        num_repeat: 4
-        temperature: 0.9
-        top_p: 1.0
-        val_kwargs:
-          temperature: 0.0
-          top_k: -1
-          top_p: 1.0
-          do_sample: False
-          num_repeat: 1
-
-      task_reader:
-        type: env_service
-        env_service:
-          env_type: "appworld"
-          env_url: "http://127.0.0.1:8080"
-          env_action_preference: code
-        training_split: train
-        validation_split: dev
-
-      task_judge:
-        judge_type: customized_protocol
-        judge_protocol: ajet.task_judge.env_service_as_judge->EnvServiceJudge
-        alien_llm_model: qwen3-235b-a22b-instruct-2507
-        alien_llm_response_length: 512
-
-      debug:
-        debug_max_parallel: 16
-        debug_first_n_tasks: 2
-        debug_vllm_port: 18000
-        debug_vllm_seed: 12345
-        debug_tensor_parallel_size: 4
-
-      trainer_common:
-        val_before_train: False
-        val_pass_n: 4
-        save_freq: 20
-        test_freq: 20
-        total_epochs: 50
-        nnodes: 1
-        n_gpus_per_node: 8
-        logger: swanlab
-        algorithm:
-          adv_estimator: grpo
-          use_kl_in_reward: False
-          mini_batch_num: 1
-        fsdp_config:
-          param_offload: True
-          optimizer_offload: True
-        optim:
-          lr: 1e-6
-        use_kl_loss: True
-        kl_loss_coef: 0.002
-        kl_loss_type: low_var_kl
-        ulysses_sequence_parallel_size: 1
-        checkpoint_base_dir: ./saved_checkpoints
-
-      context_tracker:
-        context_tracker_type: "linear"
-        alien_llm_model: qwen3-235b-a22b-instruct-2507
-        alien_llm_response_length: 512
-    ```

----

 ## Next Steps

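The Full Configuration Example removed in this commit sets `max_prompt_length: 3000`, `max_response_length: 15000`, and `max_model_len: 18000`, which sum up exactly. A quick sanity check one might run over such a config; the constraint `prompt + response <= max_model_len` is my assumption, not a documented AgentJet rule:

```python
def check_length_budget(cfg):
    """Sanity-check token-length settings (assumed convention, not an AgentJet API)."""
    prompt = cfg["max_prompt_length"]
    response = cfg["max_response_length"]
    budget = cfg["max_model_len"]
    return prompt + response <= budget, prompt + response

# values from the removed Full Configuration Example
ok, total = check_length_budget(
    {"max_prompt_length": 3000, "max_response_length": 15000, "max_model_len": 18000}
)
print(ok, total)  # True 18000
```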
docs/en/installation.md

Lines changed: 3 additions & 2 deletions
@@ -9,9 +9,10 @@ This document provides a step-by-step guide to installing AgentJet.

 ## Prerequisites

-| Requirement | Version |
+| Requirement | Detail |
 |-------------|---------|
-| **Python** | 3.10 |
+| **Python** | 3.10 |
+| Package Management | `uv` or `conda` |


 ## Install from Source

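The Prerequisites table above pins Python at 3.10. A tiny guard a setup script could use; treating 3.10 as a minimum rather than an exact requirement is my assumption:

```python
import sys

MIN_VERSION = (3, 10)  # from the Prerequisites table; assumed to be a minimum

def version_ok(version=None):
    """Return True when the (major, minor) version meets the assumed minimum."""
    version = version or sys.version_info[:2]
    return tuple(version) >= MIN_VERSION

print(version_ok((3, 10)), version_ok((3, 9)))  # True False
```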
docs/en/intro.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ AgentJet aims to build a state-of-the-art agent tuning platform for both develop

 - **Easy and Friendly**. AgentJet helps you tune models behind your agent workflows easily, optimizing your agents for top performance with minimal effort.
 - **Rich Tutorial Library**. AgentJet provides a rich library of [examples](https://github.com/modelscope/AgentJet/tree/main/tutorial) as tutorials.
 - **Efficient and Scalable**. AgentJet uses [verl](https://github.com/volcengine/verl) as the default backbone (`--backbone=verl`). However, we also support [trinity](https://github.com/modelscope/Trinity-RFT/) as an alternative backbone, accelerating your tuning process via fully asynchronous RFT.
-- **Flexible and Fast**. AgentJet supports [multi-agent workflows](docs/en/workflow.md) and adopts a context merging technique, accelerating training by 1.5x to 10x when the workflow involves multi-turn (or multi-agent) conversations.
+- **Flexible and Fast**. AgentJet supports [multi-agent workflows](workflow.md) and adopts a context merging technique, accelerating training by 1.5x to 10x when the workflow involves multi-turn (or multi-agent) conversations.
 - **Reliability and Reproducibility**. Our team keeps track of framework performance across multiple [tasks + major-git-version + training-backbones](https://benchmark.agent-matrix.com/) (under construction, still gathering data, coming soon).

 For advanced researchers, AgentJet also provides high-resolution logging and debugging solutions:

docs/en/quickstart.md

Lines changed: 18 additions & 29 deletions
@@ -1,39 +1,23 @@
 # Quick Start

-AgentJet provides a complete feature set for tuning agents. You can try starting training an agent right away:
+
+## 1. Testing a Pre-defined Demo
+
+AgentJet provides a complete feature set for tuning agents. You can start training an agent right away by running a demo:

 ```bash
 ajet --conf tutorial/example_math_agent/math_agent.yaml
 ```

----

-## Minimum Example
+## 2. Minimum Example

 Let's begin with the simplest example: a math agent with a tool call.

-<div class="workflow-single">
-<div class="workflow-header">Getting Started Flow</div>
-
-<div class="workflow">
-<ol class="workflow-steps">
-<li><strong>Set up Environment</strong>
-
-Check out the [installation guide](./installation.md) to set up the training environment.</li>
-<li><strong>Define Your Workflow</strong>
-
-Write an Agent class (e.g., `MathToolWorkflow`) that inherits from the base Workflow class.</li>
-<li><strong>Configure and Run</strong>
-
-Use the `AgentJetJob` API to configure and start training.</li>
-</ol>
-</div>
-</div>
-
-### Code Example
-
 ```python title="train_math_agent.py"
 from ajet import AgentJetJob
+
+# refer to `https://doc.agentjet.top/AgentJet/en/tune_your_first_agent/` on how to write your own workflow
 from tutorial.example_math_agent.math_agent_simplify import MathToolWorkflow

 model_path = "YOUR_MODEL_PATH"
@@ -57,9 +41,17 @@ tuned_model = job.tune()
 ajet --conf ./saved_experiments/math.yaml
 ```

----

-## Explore Examples
+## 3. Compare with Community Training Curves
+
+<div class="card-grid">
+<a href="https://benchmark.agent-matrix.com/examples" class="feature-card"><div class="card-header"><img src="https://api.iconify.design/mdi:calculator-variant.svg" class="card-icon card-icon-math" alt=""><h3>AgentJet Benchmark Tracking System (Development in Progress)</h3></div><p class="card-desc">Compare training curves with the community. Investigate the influence of versions, backbones, hyper-parameters, etc.</p></a>
+</div>
+
+## 4. Explore Example Gallery

 Explore our rich library of examples to kickstart your journey:
@@ -72,11 +64,8 @@ Explore our rich library of examples to kickstart your journey:
 <a href="./example_frozenlake/" class="feature-card"><div class="card-header"><img src="https://api.iconify.design/mdi:snowflake.svg" class="card-icon card-icon-data" alt=""><h3>Frozen Lake</h3></div><p class="card-desc">Solving a frozen lake walking puzzle.</p></a>
 </div>

----
-
----

-## Next Steps
+## 5. Next Steps

 <div class="card-grid">
 <a href="../tune_your_first_agent/" class="feature-card"><div class="card-header"><img src="https://api.iconify.design/mdi:rocket-launch.svg" class="card-icon card-icon-agent" alt=""><h3>Tune Your First Agent</h3></div><p class="card-desc">Complete step-by-step guide to building your own agent from scratch.</p></a>

docs/en/tune_your_first_agent.md

Lines changed: 28 additions & 2 deletions
@@ -51,7 +51,7 @@ tutorial/example_math_agent
 └── math_agent.yaml
 ```

-Next, define your workflow (or convert an existing workflow). Here we use AgentScope to implement this agent. You can toggle between the two code versions, before and after conversion, to see the difference. If you prefer LangChain or the OpenAI SDK, [please refer to this article](../agent_framework_support).
+Next, define your workflow (or convert an existing workflow). Here we use AgentScope to implement this agent. You can toggle between the two code versions, before and after conversion, to see the difference. If you prefer LangChain or the OpenAI SDK, [please refer to this article](agent_framework_support.md).

 === "`math_agent.py` - AgentJet Workflow (After Conversion)"

@@ -199,6 +199,32 @@ Now, we have obtained all materials required to train the agent.

 ```

+### Configuration Parameters
+
+| Category | Parameter | Description | Example Value |
+|----------|-----------|-------------|---------------|
+| **Project** | `project_name` | Name of the training project | `example_math_agent` |
+| **Task Reader** | `type` | Type of data source to read tasks from | `huggingface_dat_repo` (options: `env_service`, `dataset_file`, `huggingface_dat_repo`) |
+| | `dataset_path` | Path or identifier of the dataset | `openai/gsm8k` |
+| | `training_split` | Dataset split used for training | `train` |
+| | `validation_split` | Dataset split used for validation/testing | `test` |
+| **Model** | `path` | Path or identifier of the model to be trained | `Qwen/Qwen2.5-7B` |
+| **Rollout** | `user_workflow` | Python module path to the workflow class | `tutorial.example_math_agent.math_agent->ExampleMathLearn` |
+| | `num_repeat` | Number of rollout repeats per task (GRPO `n` parameter) | `6` |
+| | `tensor_model_parallel_size` | vLLM tensor parallelism size | `1` |
+| | `max_response_length_in_one_turn` | Maximum token length for a single agent response | `1024` |
+| | `max_model_len` | Maximum total context length for the model | `10000` |
+| **Data** | `train_batch_size` | Number of tasks per training batch | `100` |
+| | `max_prompt_length` | Maximum token length for input prompts | `3000` |
+| | `max_response_length` | Maximum token length for model responses | `7000` |
+| **Debug** | `debug_max_parallel` | Maximum parallel workers in debug mode | `1` |
+| | `debug_first_n_tasks` | Number of tasks to process in debug mode | `1` |
+| **Trainer** | `save_freq` | Frequency (in steps) to save model checkpoints | `100` |
+| | `test_freq` | Frequency (in steps) to run validation | `100` |
+| | `total_epochs` | Total number of training epochs | `100` |
+| | `logger` | Logging backend for experiment tracking | `swanlab` |
+| **Task Judge** | `judge_protocol` | Protocol for judging task completion | `null` (reward is computed in workflow) |
+

 ## Step 3: ✨Debug (Optional)

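The parameter table added above notes that `num_repeat` is the GRPO `n` parameter: each task is rolled out `n` times and rewards are normalized within that group. The sketch below shows the textbook group-relative advantage computation; the training backbone's internals may differ in details such as the normalization epsilon:

```python
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages for one task's rollouts: (r - mean) / (std + eps).
    Standard GRPO formulation; not AgentJet's exact implementation."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]

# num_repeat = 6 rollouts of one math task: correct answers score 1, wrong score 0
advs = grpo_advantages([1, 0, 0, 1, 1, 0])
print([round(a, 3) for a in advs])  # correct rollouts get ~+1, wrong get ~-1
```

Because advantages are computed within each group, a task where every rollout succeeds (or every rollout fails) contributes no gradient signal, which is why `num_repeat` > 1 is required.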
@@ -231,7 +257,7 @@ We choose VSCode to debug because it is open-source and fast.

 After `.vscode/launch.json` is created, press `F5` to start debugging. (Do not forget to configure the Python venv path in VSCode.)

-For more debugging techniques, please refer to [debugging guidelines](../debugging_guide).
+For more debugging techniques, please refer to [debugging guidelines](debugging_guide.md).


 ## Step 4: ✨Start Training
## Step 4: ✨Start Training

docs/index.md

Lines changed: 1 addition & 1 deletion
@@ -82,7 +82,7 @@
 </div>


-We recommend using `uv` for dependency management. [Click here](./installation/) for details and other training backbone options (e.g. Trinity-RFT).
+We recommend using `uv` for dependency management. [Click here](en/installation.md) for details and other training backbone options (e.g. Trinity-RFT).

 - Clone the Repository:
 ```bash
