
Commit b548324

Merge branch 'main' into dev/shuchang

2 parents d930ffb + 9389d31

File tree: 90 files changed (+4747 −5414 lines)


.gitignore

Lines changed: 2 additions & 0 deletions

@@ -149,3 +149,5 @@ appworld_pack_v2.tar*
 saved_checkpoints
 data
 datasets
+tutorial2
+site

README.md

Lines changed: 55 additions & 60 deletions

@@ -1,36 +1,46 @@
-# AgentJet
+# AgentJet (Beta)

 [![Benchmarking](https://img.shields.io/badge/Benchmarking-0078D4?style=for-the-badge&logo=github)](https://benchmark.agent-matrix.com/)
-[![Docs](https://img.shields.io/badge/Docs-Read%20the%20Guide-0A7ECC?style=for-the-badge&logo=readthedocs&logoColor=white)](docs/en/installation.md)
+[![Docs](https://img.shields.io/badge/Docs-Read%20the%20Documents-0A7ECC?style=for-the-badge&logo=readthedocs&logoColor=white)](https://doc.agentjet.top/AgentJet)
 [![License](https://img.shields.io/badge/License-Apache--2.0-4c1?style=for-the-badge)](LICENSE)
-[![Python](https://img.shields.io/badge/Python-3.10+-3776AB?style=for-the-badge&logo=python&logoColor=white)](docs/en/installation.md#requirements)
+[![Python](https://img.shields.io/badge/Python-3.10+-3776AB?style=for-the-badge&logo=python&logoColor=white)](https://doc.agentjet.top/AgentJet/en/installation#requirements)
+
+<div align="center">
+  <a href="https://doc.agentjet.top/AgentJet" target="_blank">
+    <img width="500" alt="AgentJet" src="docs/agentjet.jpg"/>
+  </a>
+</div>

-**AgentJet (AJet)** is a cutting-edge, user-friendly training framework designed to optimize agents and workflows (built with OpenAI SDK, AgentScope, and even vllm http requests), fine-tuning language model weights behind the scenes.

-Simply provide your Agent workflow, training data, and reward function, and we will be ready to enhance your agents to their optimal performance!
+**AgentJet (AJet)** is a cutting-edge, user-friendly training framework designed to optimize agents and workflows (built with OpenAI SDK, AgentScope, Langchain, or just HTTP requests), fine-tuning language model weights behind the scenes.

+Simply provide your agent **workflow**, training **dataset**, and **reward** function, and **AgentJet** will be ready to enhance your agents to their optimal performance!

-## 💡 Minimum Example
+## 🛩️ Minimum Example

 Let's begin with the simplest example: a math agent with a tool call.

-- First, please check out the [installation guide](docs/en/installation.md) to set up the training environment.
+- First, please check out the [installation guide](https://doc.agentjet.top/AgentJet/en/installation/) to set up the training environment.
 - Then, tune your first model using the minimum example.
 ```bash
-ajet --conf tutorial/example_math_agent/math_agent.yaml --backbone='verl' --with-ray
+ajet --conf tutorial/example_math_agent/math_agent.yaml --backbone='verl'
+
+# change to --backbone='trinity' if you want to switch to the trinity training engine,
+# or --backbone='debug' if you want to debug with only vLLM
 ```

-## Features
+## 🛩️ Features

 We aim to build an easy-to-learn agent tuner that unlocks more possibilities for agent developers:

 - **Easy and Friendly**. AgentJet helps you tune models behind your agent workflows easily, optimizing your agents for top performance with minimal effort.
 - **Rich Tutorial Library**. AgentJet provides a rich library of [examples](https://github.com/modelscope/AgentJet/tree/main/tutorial) as tutorials.
 - **Efficient and Scalable**. AgentJet uses [verl] as the default backbone (`--backbone=verl`). However, we also support [trinity](https://github.com/modelscope/Trinity-RFT/) as an alternative backbone, accelerating your tuning process via fully asynchronous RFT.
-- **Flexible and Fast**. AgentJet supports [multi-agent workflows](docs/en/workflow.md) and adopts a context merging technique, accelerating training by 1.5x to 20x when the workflow involves multi-turn (or multi-agent) conversations.
-- **Reliability and Reproducibility**. Our team keeps track of framework performance across multiple [tasks + major-git-version + training-backbones](https://benchmark.agent-matrix.com/) (under construction, still gathering data, comming soon).
+- **Flexible and Fast**. AgentJet supports [multi-agent workflows](https://doc.agentjet.top/AgentJet/en/workflow.md) and adopts a context merging technique, accelerating training by 1.5x to 10x when the workflow involves multi-turn (or multi-agent) conversations.
+- **Reliability and Reproducibility**. Our team keeps track of framework performance across multiple [tasks + major-git-version + training-backbones](https://benchmark.agent-matrix.com/) (under construction, still gathering data, coming soon).

 For advanced researchers, AgentJet also provides high-resolution logging and debugging solutions:
 <!-- For advanced researchers, AgentJet provides high-resolution logging and debugging solutions that are, to our knowledge, unprecedented in other prior projects. -->

@@ -40,51 +50,35 @@ For advanced researchers, AgentJet also provides high-resolution logging and deb

 ---

-### 🚀 Quick Start
+### 🛩️ Quick Start

 #### Installation

-We recommend using `uv` for dependency management.
-
-1. **Clone the Repository**:
-   ```bash
-   git clone https://github.com/modelscope/AgentJet.git
-   cd AgentJet
-   ```
-
-2. **Set up Environment**:
-   ```bash
-   uv venv --python=3.10.16 && source .venv/bin/activate
-   uv pip install -e .[trinity]
-   # Note: flash-attn must be installed after other dependencies
-   uv pip install flash_attn==2.8.1 --no-build-isolation --no-cache-dir
-   ```
-
+- **Click here to read the** [**installation guide**](https://doc.agentjet.top/AgentJet/en/installation/).

 #### Run Training

-You can start training your first agent with a single command using a pre-configured YAML file. Take the [Math agent](docs/en/example_math_agent.md) as an example:
+- You can start training your first agent with a single command using a pre-configured YAML file. Take the [Math agent](https://doc.agentjet.top/AgentJet/en/example_math_agent/) as an example:

-```bash
-ajet --conf tutorial/example_math_agent/math_agent.yaml --backbone='trinity' --with-ray
-```
+  ```bash
+  ajet --conf tutorial/example_math_agent/math_agent.yaml
+  ```

 #### Example Library

 Explore our rich library of examples to kickstart your journey:

-- 🔢 [**Training a math agent that can write python code**](docs/en/example_math_agent.md).
-- 📱 [**Creating an AppWorld agent using AgentScope and training it**](docs/en/example_app_world.md).
-- 🐺 [**Developing Werewolves RPG agents and training them**](docs/en/example_werewolves.md).
-- 👩🏻‍⚕️ [**Learning to ask questions like a doctor**](docs/en/example_learning_to_ask.md).
-- 🎴 [**Writing a countdown game using AgentScope and solving it**](docs/en/example_countdown.md).
-- 🚶 [**Solving a frozen lake walking puzzle using AgentJet**](docs/en/example_frozenlake.md).
+- 🔢 [**Training a math agent that can write python code**](https://doc.agentjet.top/AgentJet/en/example_math_agent).
+- 📱 [**Creating an AppWorld agent using AgentScope and training it**](https://doc.agentjet.top/AgentJet/en/example_app_world).
+- 🐺 [**Developing Werewolves RPG agents and training them**](https://doc.agentjet.top/AgentJet/en/example_werewolves).
+- 👩🏻‍⚕️ [**Learning to ask questions like a doctor**](https://doc.agentjet.top/AgentJet/en/example_learning_to_ask).
+- 🎴 [**Writing a countdown game using AgentScope and solving it**](https://doc.agentjet.top/AgentJet/en/example_countdown).
+- 🚶 [**Solving a frozen lake walking puzzle using AgentJet**](https://doc.agentjet.top/AgentJet/en/example_frozenlake).

 ---

-### 🧩 Core Concepts
+### 🛩️ Core Concepts

 AgentJet makes agent fine-tuning straightforward by separating the developer interface from the internal execution logic.

@@ -97,9 +91,9 @@ AgentJet makes agent fine-tuning straightforward by separating the developer int

 To optimize an agent, you provide three core inputs:

-* [**Trainable Workflow**](docs/en/workflow.md): Define your agent logic by inheriting the Workflow class, supporting both simple agent setups and advanced multi-agent collaborations.
-* [**Task Reader**](docs/en/data_pipeline.md): Load training tasks from JSONL files, HuggingFace datasets, interactive environments, or auto-generate them from documents.
-* [**Task Judger**](docs/en/task_judger.md): Evaluates agent outputs and assigns rewards to guide training.
+* [**Trainable Workflow**](https://doc.agentjet.top/AgentJet/en/workflow): Define your agent logic by inheriting the Workflow class, supporting both simple agent setups and advanced multi-agent collaborations.
+* [**Task Reader**](https://doc.agentjet.top/AgentJet/en/data_pipeline): Load training tasks from JSONL files, HuggingFace datasets, interactive environments, or auto-generate them from documents.
+* [**Task Judger**](https://doc.agentjet.top/AgentJet/en/task_judger): Evaluates agent outputs and assigns rewards to guide training.

 #### 2. Internal System Architecture

@@ -110,28 +104,29 @@ The internal system orchestrates several specialized modules to handle the compl
 * **Task Rollout**: Bridges LLM engines and manages the Gym environment lifecycle.
 * **Task Runner**: Executes the Agent workflow and calculates rewards.
 * **Model Tuner**: Forwards inference requests from the workflow to the LLM engine.
-* **Context Tracker**: Monitors LLM calls and automatically merges shared-history timelines to improve training efficiency by **3x to 10x**.
+* **Context Tracker**: Monitors LLM calls and automatically merges shared-history timelines to improve training efficiency by **1.5x to 10x**.
+

----

-### 🚦 Navigation
+### 🛩️ Navigation

-* 📖 **Tutorials**: From [Installation](docs/en/installation.md) to [Tuning your first agent](docs/en/tutorial.md) — the essential path for beginners.
-* 🛠️ **Core Components**: Define your [Trainable Workflow](docs/en/workflow.md) and manage [Data](docs/en/data_pipeline.md) and [Reward](docs/en/tune_your_first_agent.md).
-* 💡 **Example**: Check the [Example Library](#example-library) above for real-world cases like [Math](docs/en/example_math_agent.md), [Werewolves game](docs/en/example_werewolves.md) and [Learning to ask task](docs/en/example_learning_to_ask.md).
-* ⚙️ **Deep Dive**: Master advanced [Configuration](docs/en/configuration.md).
+* **Tutorials**: From [Installation](https://doc.agentjet.top/AgentJet/en/installation) to [Tuning your first agent](https://doc.agentjet.top/AgentJet/en/tune_your_first_agent) — the essential path for beginners.
+* **Core Components**: Define your [Trainable Workflow](https://doc.agentjet.top/AgentJet/en/workflow) and manage [Data](https://doc.agentjet.top/AgentJet/en/data_pipeline) and [Reward](https://doc.agentjet.top/AgentJet/en/task_judger).
+* **Example**: Check the [Example Library](#example-library) above for real-world cases like [Math](https://doc.agentjet.top/AgentJet/en/example_math_agent), [Werewolves game](https://doc.agentjet.top/AgentJet/en/example_werewolves) and [Learning to ask task](https://doc.agentjet.top/AgentJet/en/example_learning_to_ask).
+* **Deep Dive**: Master advanced [Configuration](https://doc.agentjet.top/AgentJet/en/configuration).

-## 🗺️ Roadmap
+## 🛩️ Roadmap

 AgentJet is a constantly evolving project. We are planning to add the following features in the near future.

-- [ ] Advanced LLM-based multi-agent reinforcement learning.
-- [ ] Training dataset generation from few-shot samples.
-- [ ] Prompt tuning.
-- [ ] Multi-modal training support.
-- [ ] Cross-process Tuner wrapper to pass though process forking.
-- [ ] Providing training → user feedback → data augmentation → retraining data flywheel example.
-- [ ] Optimize configurations for long-context adaptation on smaller GPUs.
-- [ ] Add LoRA training examples.
-- [ ] Covering LangGraph and AutoGen frameworks.
+| Category | Feature | Status |
+| :--- | :--- | :--- |
+| **Examples** | Covering LangGraph and AutoGen frameworks | Done & Verifying |
+| **Examples** | Add LoRA training examples | Todo |
+| **Infra** | Cross-process Tuner wrapper to pass through process forking | Done & Verifying |
+| **Infra** | Optimize configurations for long-context adaptation on smaller GPUs | In Progress |
+| **Capability** | Prompt tuning | In Progress |
+| **Capability** | Multi-modal training support | Todo |
+| **Capability** | MARL Credit assignment | Todo |
+| **Capability** | Training dataset generation from few-shot samples | Done & Verifying |
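The README above names three core inputs: a trainable workflow, a task reader, and a task judger. A minimal sketch of how the three pieces fit together, using hypothetical stand-in classes — the names and method signatures here are illustrative only, not AgentJet's actual API:

```python
# Hypothetical stand-ins for AgentJet's three core inputs. A real workflow
# would call the model being tuned; here we "solve" tasks with eval().
from dataclasses import dataclass


@dataclass
class Task:
    question: str
    answer: str


class TaskReader:
    """Stand-in reader: a real one would load JSONL/HF datasets."""

    def get_training_tasks(self):
        return [Task("1 + 1", "2"), Task("2 * 3", "6")]


class Workflow:
    """Stand-in trainable workflow: runs the agent logic for one task."""

    def run(self, task: Task) -> str:
        return str(eval(task.question))  # placeholder for an LLM call


class TaskJudger:
    """Stand-in judger: scores an output against the reference answer."""

    def reward(self, task: Task, output: str) -> float:
        return 1.0 if output == task.answer else 0.0


reader, workflow, judger = TaskReader(), Workflow(), TaskJudger()
rewards = [judger.reward(t, workflow.run(t)) for t in reader.get_training_tasks()]
print(rewards)
```

The trainer's job, conceptually, is to turn those per-task rewards into gradient updates for the model behind `Workflow.run`.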

ajet/__init__.py

Lines changed: 1 addition & 1 deletion

@@ -10,7 +10,7 @@
     "WorkflowOutput",
     "AjetTuner",
     "AgentJetJob",
-    "bp",
+    "bp"
 ]

 __version__ = "0.1.0"
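For context, `__all__` (the list edited above — the change only drops a trailing comma, which is cosmetic) controls what `from ajet import *` re-exports. A stdlib-only sketch using a stand-in module name, not the real `ajet` package:

```python
# Demonstrate __all__ semantics with a throwaway in-memory module.
import sys
import types

mod = types.ModuleType("ajet_stub")
exec(
    '__all__ = ["AjetTuner"]\n'
    'AjetTuner = "tuner"\n'
    '_internal = "hidden"',
    mod.__dict__,
)
sys.modules["ajet_stub"] = mod

ns = {}
exec("from ajet_stub import *", ns)  # star-import honors __all__
print(sorted(k for k in ns if not k.startswith("__")))
```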

ajet/backbone/main_verl.py

Lines changed: 2 additions & 3 deletions

@@ -22,6 +22,7 @@
 import hydra
 import ray
 from beast_logger import print_dict
+from loguru import logger
 from omegaconf import OmegaConf
 from verl.trainer.ppo.reward import load_reward_manager
 from verl.utils.device import is_cuda_available

@@ -112,7 +113,7 @@ def run(self, config):
         from omegaconf import OmegaConf
         from verl.utils.fs import copy_to_local

-        print(f"TaskRunner hostname: {socket.gethostname()}, PID: {os.getpid()}")
+        logger.info(f"TaskRunner hostname: {socket.gethostname()}, PID: {os.getpid()}")
         pprint(OmegaConf.to_container(config, resolve=True))
         OmegaConf.resolve(config)

@@ -148,8 +149,6 @@ def run(self, config):
             from verl.workers.fsdp_workers import CriticWorker
         elif use_legacy_worker_impl == "disable":
             from verl.workers.roles import CriticWorker
-
-            print("Using new worker implementation")
         else:
             raise ValueError(f"Invalid use_legacy_worker_impl: {use_legacy_worker_impl}")
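The recurring pattern in this commit is replacing bare `print()` calls with leveled logger calls. The stdlib `logging` module shows the same idea (loguru, used in the commit, offers the same `info`/`warning` interface plus extras such as `logger.success` and colorized sinks); the host name and PID below are illustrative:

```python
# Leveled logging instead of print(): messages carry a severity and a
# configurable format, and can be routed to any sink.
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))

log = logging.getLogger("ajet.demo")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False  # keep the demo output out of the root logger

log.info("TaskRunner hostname: example-host, PID: 1234")
log.warning("tensor_parallel_size 4 is greater than available GPUs 2")
print(stream.getvalue(), end="")
```

Unlike `print()`, the warning above can later be filtered, redirected, or silenced by level without touching the call site.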

ajet/backbone/main_vllm.py

Lines changed: 4 additions & 5 deletions

@@ -10,6 +10,7 @@
 from ajet.utils.launch_utils import set_loguru_default_color
 from ajet.schema.logprob import TokenAndProb
 from ajet.utils.core_env_vars import get_runtime_env
+from loguru import logger

 set_loguru_default_color()

@@ -116,12 +117,11 @@ def run(config):
         config.ajet.task_reader,
     )
     tasks = task_reader.get_validation_tasks()
-    print(tasks[:2])
+    logger.info(tasks[:n_task])
     ctx_tracker = parallel_env.rollout(
         tasks=tasks[:n_task], mode="sample", epoch="1"
     )  # "sample" or "validate"
     _ = parallel_env.to_dataproto(ctx_tracker)
-    print("Generated batch output")

@@ -133,7 +133,6 @@ def main(config):
     from omegaconf import OmegaConf

     OmegaConf.resolve(config)
-    print("*" * 20)

     runtime_env = get_runtime_env()
     os.environ.update(runtime_env["env_vars"])

@@ -147,12 +146,12 @@ def companion_launch():
     from ajet.utils.smart_daemon import LaunchCommandWhenAbsent

-    print("Launching companion process for async LLM server...")
+    logger.info("Launching companion process for async LLM server...")
     model_path = config.ajet.model.path
     tensor_parallel_size = config.ajet.debug.debug_tensor_parallel_size
     n_avail_gpus = torch.cuda.device_count()
     if tensor_parallel_size > n_avail_gpus:
-        print(
+        logger.info(
             f"Warning: tensor_parallel_size {tensor_parallel_size} is greater than available GPUs {n_avail_gpus}. Setting tensor_parallel_size to {n_avail_gpus}."
         )
         tensor_parallel_size = n_avail_gpus
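The last hunk above clamps the requested tensor parallel size to the number of visible GPUs. That logic can be isolated as a small pure function — a hypothetical helper, not part of AgentJet's API:

```python
# Cap the tensor parallel degree at the number of available GPUs,
# mirroring the if-branch in companion_launch above.
def clamp_tensor_parallel(tensor_parallel_size: int, n_avail_gpus: int) -> int:
    if tensor_parallel_size > n_avail_gpus:
        # the real code logs a warning here before clamping
        return n_avail_gpus
    return tensor_parallel_size


print(clamp_tensor_parallel(8, 2))  # → 2
print(clamp_tensor_parallel(2, 8))  # → 2
```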

ajet/backbone/trainer_verl.py

Lines changed: 4 additions & 4 deletions

@@ -302,15 +302,15 @@ def check_mutually_exclusive(mbs, mbs_per_gpu, name: str):
         )

         if self.config.algorithm.use_kl_in_reward and config.actor_rollout_ref.actor.use_kl_loss:
-            print("NOTICE: You have both enabled in-reward kl and kl loss.")
+            logger.warning("NOTICE: You have both enabled in-reward kl and kl loss.")

         # critic
         if self.use_critic:
             critic_config = omega_conf_to_dataclass(config.critic)
             critic_config.validate(n_gpus, config.ajet.data.train_batch_size)

         if config.data.get("val_batch_size", None) is not None:
-            print(
+            logger.warning(
                 "WARNING: val_batch_size is deprecated."
                 + " Validation datasets are sent to inference engines as a whole batch,"
                 + " which will schedule the memory themselves."

@@ -322,7 +322,7 @@ def check_mutually_exclusive(mbs, mbs_per_gpu, name: str):
             config.ajet.rollout.temperature > 0
         ), "validation gen temperature should be greater than 0 when enabling do_sample"

-        print("[validate_config] All configuration checks passed successfully!")
+        logger.success("[validate_config] All configuration checks passed successfully!")

     def init_workers(self):
         """Initialize distributed training workers using Ray backend.

@@ -807,7 +807,7 @@ def fit(self):  # noqa: C901
                     or esi_close_to_expiration
                 ):
                     if esi_close_to_expiration:
-                        print("Force saving checkpoint: ESI instance expiration approaching.")
+                        logger.info("Force saving checkpoint: ESI instance expiration approaching.")
                     with marked_timer("save_checkpoint", timing_raw, color="green"):
                         self._save_checkpoint()
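The hunk headers above sit inside `check_mutually_exclusive(mbs, mbs_per_gpu, name)`, a validation helper for micro-batch settings. A plausible sketch of such a check — verl's actual implementation may differ — is:

```python
# Reject configs that set both micro_batch_size and micro_batch_size_per_gpu:
# the two express the same budget at different granularities, so at most one
# may be provided.
def check_mutually_exclusive(mbs, mbs_per_gpu, name: str) -> None:
    if mbs is not None and mbs_per_gpu is not None:
        raise ValueError(
            f"[{name}] micro_batch_size and micro_batch_size_per_gpu "
            "are mutually exclusive; set only one"
        )


check_mutually_exclusive(None, 4, "actor")  # fine: only one is set
```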

ajet/context_tracker/basic_tracker.py

Lines changed: 3 additions & 3 deletions

@@ -1,8 +1,8 @@
+import torch
 import copy
 from collections import defaultdict
 from typing import List, Tuple
-
-import torch
+from loguru import logger


 from ajet.context_tracker.base_tracker import (
     BaseTracker,

@@ -233,7 +233,7 @@ def group_tokenize_multi_group(self):
             sample_arr += [sample]

         if len(sample_arr) > max_num_group:
-            print(f"Warning: allow {max_num_group} groups, but got {len(sample_arr)} groups")
+            logger.warning(f"Warning: allow {max_num_group} groups, but got {len(sample_arr)} groups")
             import random

             sample_arr = random.sample(sample_arr, max_num_group)  # preserve max_num_group groups
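The second hunk above caps the number of sample groups: when more groups accumulate than `max_num_group` allows, a uniform random subset is kept via `random.sample`. A self-contained sketch of that behavior (the seed and group names are illustrative):

```python
# random.sample draws max_num_group distinct groups without replacement,
# discarding the rest — a simple downsampling strategy.
import random

random.seed(0)  # deterministic only for this illustration
all_groups = [f"group_{i}" for i in range(8)]
max_num_group = 4

sample_arr = list(all_groups)
if len(sample_arr) > max_num_group:
    sample_arr = random.sample(sample_arr, max_num_group)  # preserve max_num_group groups
print(sample_arr)
```

Note that `random.sample` does not preserve the original order of the kept groups; if ordering mattered, a sorted index sample would be needed instead.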

ajet/default_config/ajet_default.yaml

Lines changed: 1 addition & 1 deletion

@@ -7,7 +7,7 @@ ajet:


   # the experimental reverse proxy feature that allows `tuner.as_oai_baseurl_apikey` feature
-  enable_experimental_reverse_proxy: True
+  enable_experimental_reverse_proxy: False

   model:
     # which model should be trained
