Commit 224560f

DeepFinance Enhancements (#6)
* feat(finworld): Add the AgentScope learning protocol and OpenJudge evaluation to the FinWorld task
  - Added the ExampleAgentScopeLearnProtocol class to implement the AgentScope execution flow for multi-turn interactions.
  - Integrated semaphore control to manage the parallelism of environment calls, improving environment-stepping performance.
  - Implemented detection of context overflow with fast termination during environment interactions to prevent blocking.
  - Added a finworld.yaml configuration file defining project training and rollout parameters.
  - Added the FinWorldJudgeByOpenJudge class, integrating multiple evaluators including RM Gallery and OpenJudge (@Haoran).
  - Implemented task-output conversion, asynchronous calls, and retries to keep evaluation stable.
  - Normalized evaluator weights to manage each evaluator's contribution, merging them into the final reward and success determination.

* Precommit fix (#4)
  - Fixed end-of-file newlines.
  - Fixed imports with autoflake.
  - Added a mypy check.
  - Fixed the test bench import.

* refactor(finworld): Replace the agent protocol and unify configuration updates
  - Renamed ExampleAgentScopeLearnProtocol to ExampleDeepResearchProtocol and modified the execute method signature.
  - Unified the model tuner parameter name to `tuner`, along with its related attribute references.
  - Optimized the multi-turn interaction step configuration, changing it to use `tuner.config.ajet.rollout.multi_turn.max_steps`.
  - Modified the context-overflow judgment logic to prevent tool-call blocking.
  - Updated the finworld.yaml configuration, replacing astune with ajet-related configurations, and adjusted the workflow protocol and environment parameters.
  - Modified the default environment-variable values and log save paths in finworld_judge.py.
  - Added and improved multi-machine and single-machine startup scripts, supporting dynamic generation of MCP configuration and environment-variable loading.
  - Added the finworld_single.yaml template to adapt to single-machine training configurations.
  - Adjusted the key reference for the multi-turn step configuration in ma_deepresearch.py to use the ajet configuration path.

* feat(finworld): Add FinWorld training-environment configuration scripts and templates
  - Added bash startup scripts for multi-machine, multi-GPU training, supporting dynamic configuration generation and environment-variable import.
  - Implemented training configuration file templates, supporting automatic injection of weight parameters and model paths.
  - Raised the default EnvClient request timeout from 30 seconds to 300 seconds to accommodate long training requests.
  - Added a new finworld example directory and related documentation, improving the example project structure.

* refactor(utils): Remove the unused extraction/compute function `extract_tool_stats_from_cmts`

* refactor(finworld): Replace the old model with OpenJudge; update evaluation configuration and scripts
  - Replaced model initialization in FinWorldJudgeByOpenJudge with the `_init_openjudge_model` method.
  - Read Judge model parameters from the configuration file first, falling back to environment variables.
  - Optimized RM Gallery initialization with configuration-first logic and improved exception stack-trace printing.
  - Removed the old `_init_model` singleton method and related code.
  - Updated the example startup script `ajet_finworld.sh`, adding OPENJUDGE_LLM and RM_LLM configurations.
  - Modified YAML templates and configuration files to unify the structure and field naming of Judge configuration items.
  - Deleted the outdated `cc_rm4_res2cit2fai2_30b.sh` script.
  - Adjusted the `env_service` startup path to improve environment-activation compatibility.
  - Adjusted script log output format and content to make configuration-parameter printing clearer.

* feat(task_reader): Support reading data of type jsonl_with_env_service
  - Added the jsonl_with_env_service type, which loads data from jsonl files while calling tools via env_service.
  - Extended ResourceKeeper to handle creation and release of environment instances for jsonl_with_env_service.
  - Kept the env_service type logic, calling create_instance to register instances and initializing them from init_messages in the jsonl file.
  - Added an example protocol, ExampleDeepResearchProtocol, implementing multi-turn interaction and environment-call coordination.
  - Provided training scripts and YAML configuration templates for finworld, supporting training in jsonl_with_env_service mode.
  - Optimized scripts to support multi-node, multi-GPU training, including environment variables and Ray cluster configuration.

* feat(core): add finworld task reader support to framework
* feat(finworld): implement specialized data reader and openjudge-based grading logic
* refactor(finworld): optimize configuration templates and prompt engineering
* chore(finworld): update launch scripts and add variant experiment scripts
* feat(finworld): Add multi-machine, multi-GPU training scripts and configuration templates
* chore(git): ignore finworld/yaml/*

* fix(metrics): Fix and enhance the compatibility and debug output of the metrics-update logic
  - Modified `update_metrics`, adding a `prefix` parameter to distinguish training from validation metrics.
  - Changed the data source for `reward_stats` and `tool_stats` extraction from `workflow_metadata` to `log_metrics`.
  - Added debug printing of `log_metrics` content and metric key names at key steps for easier troubleshooting.
  - Used the appropriate prefix when calling `update_metrics` in `trainer_verl.py`, and added multiple debug prints.
  - Modified `WorkflowOutput` to place `tool_stats` and `reward_stats` in the `log_metrics` field.
  - Removed redundant and deprecated `reward_stats` extraction code and calculation functions.
  - Added debug output to the `finworld` and `finworld_judge` modules to track log metrics and scoring data.

* fix(metrics): Remove debug prints and synchronize reward statistics
  - Removed debug prints around the `update_metrics` call in `trainer_verl.py`.
  - Removed debug prints for `log_metrics` keys in `finworld.py`.
  - Removed debug prints before updating `metadata_stats` in `finworld_judge.py`.
  - Added logic in `general_runner.py` to synchronize `reward_stats` from `metadata` to `log_metrics` after the judge calculation.
  - Cleaned up debug prints inside `update_metrics` in `metric_helper`, improving readability.

* chore: Stop tracking existing yaml files in the tutorial directory
* fix(task_runner): Synchronize reward_stats to log_metrics
* feat(tutorial): Add a FinWorld multi-machine, multi-GPU training startup script
* refactor(script): Refactor the finworld training script, integrating configuration and startup

* refactor(deep_finance): Replace and remove finworld-related implementations
  - Switched the example directory from example_finworld to example_deep_finance.
  - Modified startup parameters and logic to support deep_finance, replacing the finworld option.
  - Replaced finworld_reader with deep_finance_reader in the task reader.
  - Adjusted environment-client configuration in resource management to use deep_finance instead of finworld-related checks.
  - Updated the reward-metric tool documentation to support deep_finance.
  - Deleted finworld-related configuration files, scripts, code, and evaluation modules, cleaning up leftover files.
  - Replaced the keyword "finworld" with "deep_finance" in comments and logs.

* refactor(deepfinance): Rename and unify DeepFinance module and config references
  - Replaced all "finworld" and "deep_finance" names with the unified "deepfinance" format.
  - Changed the command-line argument to `--with-deepfinance` for consistency.
  - Renamed the class in `task_reader` from `deep_financeReader` to `DeepFinanceReader`.
  - Updated the documentation and file name of the `metric_helper` module to DeepFinance.
  - Modified environment variables and configuration paths in the example script `deep_finance.sh` to use the `DEEPFINANCE` prefix.
  - Updated `judge_protocol` to `DeepFinanceJudgeByOpenJudge` in the `deep_finance.yaml` configuration.
  - Renamed the `FinWorldJudgeByOpenJudge` class in `deep_finance_judge.py` to `DeepFinanceJudgeByOpenJudge`.
  - Renamed the `FinworldReader` class in `deep_finance_reader.py` to `DeepFinanceReader`.
  - Changed the debug-log identifier and corresponding environment variable to `DEEPFINANCE_DEBUG`.
  - Updated the evaluation protocol in the `deep_finance_template.yaml` template to `DeepFinanceJudgeByOpenJudge`.
  - Ensured internal references and comments in all modules use the DeepFinance/deepfinance names.

* refactor(tutorial): Optimize dynamic generation of configuration file paths
* fix(deep_finance): argparse: with-deepfinance
* fix(tutorial): Fix multi-machine training environment-variable settings

* fix(env): Correct the assignment of reward and info when returning environment state
  - Corrected the `env_output` return-value structure in `BaseGymEnv` to ensure the `reward` and `info` fields are assigned correctly.
  - Removed `RefJudge` and `StructureJudge` metric calculations and statistics from `reward_metric_helper`.
  - Cleaned up redundant code in `reward_metric_helper`, removing invalid comments and statistics items.
  - Modified `save_trajectory_as_json` to always print a trajectory-save confirmation.
  - Corrected log comments in `example_deep_finance` to avoid meaningless log output.
  - Added the `save_trajectory_as_json_file` configuration item to `deep_finance_template.yaml` to support trajectory saving.

* chore(config): Update the example_deep_finance configuration and clean up files
  - Added an ignore rule for config file paths in .gitignore.
  - Deleted the auto-generated mcp_finance_tool_generated.json file in example_deep_finance.
  - Refactored the deep_finance.yaml configuration file, adjusting project and experiment names.
  - Reorganized the Judge configuration, clarifying the openjudge_llm and rm_llm models.
  - Optimized model paths and training parameters, adding parallel and batch-processing settings.
  - Adjusted data-reading methods and training/validation set path placeholders.
  - Reduced the rollout GPU-memory usage ratio to 0.8.
  - Updated the trainer's default save directory path to a placeholder variable.
  - Cleaned up unused and commented-out code for a more concise configuration file.

* refactor(metric): Optimize tool-metric calculation and data-saving logic
  - Corrected the data-source field for timeline data used during trajectory saving.
  - Removed redundant fields from tool execution time, cache hit rate, and error rate statistics.
  - Updated .gitignore with ignore rules for the example script directory.
  - Removed unnecessary debugging information from logs to reduce log noise.
  - Simplified log printing in the multi-turn interaction loop.
  - Streamlined logging for environment observation and termination checks, improving readability.

* fix(metric_helper): fix tool cache metric
* fix a small bug
* fix(utils): Suppress httpx AsyncClient.aclose() exception warnings
* chore: translate comments to English

* feat: Support a service-name prefix
  - Added a --prefix argument to the launcher.
  - Implemented the prefix logic in the pty_launch function.
  - Updated the deep_finance.sh script to use the prefix feature.
  - Allows multiple service instances to run in the same environment.

* fix: Improve MultiAgent message-content parsing
  - Support tool_result-format message content blocks.
  - Improve handling of non-text content: continue processing remaining items instead of skipping the whole message.
  - Handle the tool_use type (skipped, since it is already handled via the tool_calls field).
  - Optimize code structure and comments for readability.

* fix: Optimize DeepFinance judging logic and configuration
  - Fixed tool_stats extraction to correctly read data from log_metrics.
  - Added debug output for the penalty term.
  - Enabled tool calls (force_disable_toolcalls: False).
  - Ensured reward-calculation accuracy.

* chore(deps): bump agentscope from 1.0.7 to 1.0.8

* fix(metric_helper): correct trajectory save path and add tool-call metric
  - Changed the trajectory save directory from "ctx_trackers" to "trajectory" for better file organization.
  - Recorded tool-call counts alongside error rates in tool metrics.
  - Updated the experiment suffix in the deep-finance example script for a clearer naming convention.

* revise message parsing

---------
Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>
Co-authored-by: Qingxu Fu <qingxu.fu@outlook.com>
Co-authored-by: qingxu.fu <fuqingxu.fqx@alibaba-inc.com>
1 parent 20e4296 commit 224560f

File tree

9 files changed

+31
-7
lines changed


ajet/context_tracker/multiagent_tracking.py

Lines changed: 15 additions & 0 deletions

@@ -82,13 +82,28 @@ def extract_text_content_from_content_dict(self, msg):
         # },
         # ],
         # }
+        # or tool_result format?? not observed yet:
+        # msg = {
+        #     "role": "tool",
+        #     "content": [
+        #         {
+        #             "type": "tool_result",
+        #             "id": "call_xxx",
+        #             "output": "tool output content",
+        #             "name": "tool_name"
+        #         },
+        #     ],
+        # }

         str_content = ""
         for item in msg["content"]:
             # item = {
             #     "type": "text",
             #     "text": "some text"
             # },
+            item_type = item.get("type", "")
+            assert not item_type == "tool_use", "never observed such protocol yet"
+            assert not item_type == "tool_result", "never observed such protocol yet"

             assert isinstance(item, dict), f"Unsupported non-dict item in message content: {item}. Full message: {msg}"
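The revised parsing behavior that the commit message later describes ("revise message parsing": handle `tool_result`, skip `tool_use`, keep processing other items) can be sketched as below. The helper name `extract_text_content` and the exact field handling are assumptions, not the repository's code:

```python
def extract_text_content(msg: dict) -> str:
    """Collect text from a message whose content is a list of typed blocks.

    Sketch of the behavior described in the commit message: read "text"
    and "tool_result" blocks, skip "tool_use" (already covered by the
    tool_calls field), and continue past unknown or malformed items
    instead of rejecting the whole message.
    """
    parts = []
    for item in msg.get("content", []):
        if not isinstance(item, dict):
            continue  # tolerate malformed items rather than failing the message
        item_type = item.get("type", "")
        if item_type == "text":
            parts.append(item.get("text", ""))
        elif item_type == "tool_result":
            parts.append(str(item.get("output", "")))
        elif item_type == "tool_use":
            continue  # handled elsewhere via the tool_calls field
    return "".join(parts)
```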

ajet/launcher.py

Lines changed: 2 additions & 1 deletion

@@ -99,6 +99,7 @@ def parse_args():
         default=False,
         help="Kill system processes (ray + vllm + python) that may block the current experiment",
     )
+    parser.add_argument("--prefix", type=str, default="", required=False, help="Prefix for deepfinance service names")
     return parser.parse_args()


@@ -304,7 +305,7 @@ def main():
         pty_launch("appworld")

     if args.with_deepfinance:
-        pty_launch("deepfinance")
+        pty_launch("deepfinance", prefix=args.prefix)

     if args.with_crafters:
         pty_launch("crafters")

ajet/utils/metric_helper/save_trajectory_as_json.py

Lines changed: 1 addition & 1 deletion

@@ -40,7 +40,7 @@ def save_trajectory_as_json(ctx_trackers, global_steps, prefix="train"):
     # Define save directory and file path
     traj_save_dir = os.path.join(
         os.environ.get("BEST_LOGGER_PATH", "launcher_record"),
-        "ctx_trackers",
+        "trajectory",
         prefix,
         f"step_{global_steps}"
     )
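After this change the save directory is laid out as `<BEST_LOGGER_PATH or launcher_record>/trajectory/<prefix>/step_<N>`. A minimal reconstruction of just the path construction (the helper name `trajectory_dir` is ours; only the directory layout is taken from the diff):

```python
import os

def trajectory_dir(global_steps: int, prefix: str = "train") -> str:
    # Mirrors the save-directory construction shown in the diff:
    # <BEST_LOGGER_PATH or launcher_record>/trajectory/<prefix>/step_<N>
    return os.path.join(
        os.environ.get("BEST_LOGGER_PATH", "launcher_record"),
        "trajectory",
        prefix,
        f"step_{global_steps}",
    )
```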

ajet/utils/metric_helper/tool_metric_helper.py

Lines changed: 1 addition & 0 deletions

@@ -125,6 +125,7 @@ def compute_tool_metrics(tool_stats_list: List[Dict[str, Any]], prefix: str = ""
         if calls > 0:
             error_rate = errors / calls * 100
             metrics[f"{prefix}tool_error/{tool_name}/error_rate"] = round(error_rate, 2)
+            metrics[f"{prefix}tool_error/{tool_name}/calls"] = calls


     return metrics
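The metric keys this hunk produces can be illustrated with a simplified reconstruction. The key format and the new `calls` entry follow the diff; the per-tool aggregation over `tool_stats_list` and the input schema (`calls`/`errors` counters per tool) are assumptions:

```python
from typing import Any, Dict, List

def compute_tool_metrics(tool_stats_list: List[Dict[str, Any]], prefix: str = "") -> Dict[str, Any]:
    # Simplified reconstruction: aggregate calls/errors per tool, then emit
    # an error_rate and (new in this commit) a raw calls count per tool.
    totals: Dict[str, Dict[str, int]] = {}
    for stats in tool_stats_list:
        for tool_name, s in stats.items():
            agg = totals.setdefault(tool_name, {"calls": 0, "errors": 0})
            agg["calls"] += s.get("calls", 0)
            agg["errors"] += s.get("errors", 0)

    metrics: Dict[str, Any] = {}
    for tool_name, agg in totals.items():
        calls, errors = agg["calls"], agg["errors"]
        if calls > 0:
            error_rate = errors / calls * 100
            metrics[f"{prefix}tool_error/{tool_name}/error_rate"] = round(error_rate, 2)
            metrics[f"{prefix}tool_error/{tool_name}/calls"] = calls
    return metrics
```

Recording the raw call count alongside the rate lets a dashboard distinguish a 50% error rate over 2 calls from one over 2,000.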

ajet/utils/pty.py

Lines changed: 3 additions & 1 deletion

@@ -96,13 +96,15 @@ def pty_wrapper_final(human_cmd, dir, env_dict):
     pty_wrapper(["/bin/bash", "-c", human_cmd], dir, env_dict)


-def pty_launch(service_name: str, success_std_string="Starting server on"):
+def pty_launch(service_name: str, success_std_string="Starting server on", prefix: str = ""):
     from ajet.utils.smart_daemon import LaunchCommandWhenAbsent

     service_path = os.environ.get(f"{service_name.upper()}_PATH")
     service_script = os.environ.get(f"{service_name.upper()}_SCRIPT")
     if service_path is None or service_script is None:
         raise ValueError(f"Environment variables for {service_name} not properly set.")
+    if prefix != "":
+        service_name = prefix + "_" + service_name
     companion = LaunchCommandWhenAbsent(
         full_argument_list=[service_script],
         dir=service_path,
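The prefix behavior added here amounts to a simple renaming rule, shown below as a standalone sketch (the helper name `prefixed_service_name` is ours):

```python
def prefixed_service_name(service_name: str, prefix: str = "") -> str:
    # Matches the diff: only a non-empty prefix changes the name,
    # joined with an underscore.
    if prefix != "":
        return prefix + "_" + service_name
    return service_name
```

Note the rename happens after the `{SERVICE}_PATH`/`{SERVICE}_SCRIPT` environment-variable lookup, so prefixed instances share the same path variables while registering under distinct service names, which is what lets multiple instances coexist in one environment.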

pyproject.toml

Lines changed: 1 addition & 1 deletion

@@ -17,7 +17,7 @@ classifiers = [
 ]
 requires-python = ">=3.10,<3.13"
 dependencies = [
-    "agentscope==1.0.7",
+    "agentscope==1.0.8",
     "chromadb",
     "httpx",
     "tenacity",

tutorial/example_deep_finance/deep_finance.sh

Lines changed: 2 additions & 1 deletion

@@ -3,7 +3,7 @@ set -e
 #===============================================================================
 # 1. Configuration area - the only section users need to modify
 #===============================================================================
-SUFFIX="ajet_deep_finance"  # Experiment suffix; affects all log and experiment names
+SUFFIX="deep_finance"       # Experiment suffix; affects all log and experiment names
 PREFIX="open"               # Experiment prefix; affects the folder containing logs and experiments

 # OpenJudge model configuration
@@ -208,6 +208,7 @@ if [[ $HOSTNAME == *"-master-"* ]]; then
         --with-deepfinance \
         --conf ${CONFIG_FILE} \
         --backbone="verl" \
+        --prefix=${SUFFIX} \
         2>&1 | tee ${TRAIN_LOG}


tutorial/example_deep_finance/deep_finance_judge.py

Lines changed: 5 additions & 1 deletion

@@ -373,8 -373,12 @@ def compute_reward(self, workflow_task: WorkflowTask, workflow_output: WorkflowO
         fused_reward, contributions = self._fuse_grader_scores(grader_scores, rm_raw)

         # 6. Compute the penalty term (keep the original tool_calls penalty logic)
-        tool_calls = metadata.get("tool_stats", {}).get("total_calls", 0)
+        # Extract tool_stats from log_metrics (deep_finance.py stores it in log_metrics, not metadata)
+        tool_stats = workflow_output.log_metrics.get("tool_stats", {})
+        tool_calls = tool_stats.get("total_calls", 0)
         penalty = self._compute_penalty(tool_calls)
+        if penalty < 0:
+            print(f"⚠️ Penalty applied: penalty={penalty}, tool_calls={tool_stats}")

         # 7. Aggregate
         final_reward = fused_reward + step_reward + penalty
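The reward aggregation in this hunk can be summarized as a standalone sketch. Reading `tool_stats` from `log_metrics` and the `fused_reward + step_reward + penalty` sum come from the diff; `_compute_penalty`'s actual form is not shown, so the thresholded call-budget penalty and the `max_tool_calls` parameter below are assumptions:

```python
def compute_final_reward(fused_reward: float, step_reward: float,
                         log_metrics: dict, max_tool_calls: int = 20) -> float:
    # Read tool_stats from log_metrics, as the fixed code does.
    tool_calls = log_metrics.get("tool_stats", {}).get("total_calls", 0)
    # Hypothetical penalty: negative only when the call budget is exceeded.
    penalty = -0.1 * max(0, tool_calls - max_tool_calls)
    if penalty < 0:
        print(f"Penalty applied: penalty={penalty}, tool_calls={tool_calls}")
    return fused_reward + step_reward + penalty
```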

tutorial/example_deep_finance/yaml_template/deep_finance_template.yaml

Lines changed: 1 addition & 1 deletion

@@ -32,7 +32,7 @@ ajet:
   rollout:
     # ✨✨✨✨ Write and select the Agent
     user_workflow: tutorial.example_deep_finance.deep_finance->ExampleDeepResearchProtocol
-    force_disable_toolcalls: True
+    force_disable_toolcalls: False
     enable_oversample: False
     tensor_model_parallel_size: 8
     num_repeat: {{NUM_REPEAT}}
