Commit 224560f

DeepFinance Enhancements (#6)
* feat(finworld): Add the AgentScope learning protocol and OpenJudge evaluation to the FinWorld task
  - Added the ExampleAgentScopeLearnProtocol class to implement the AgentScope execution flow for multi-turn interactions.
  - Integrated semaphore control to manage the parallelism of environment calls, improving environment-stepping performance.
  - Implemented detection of context overflow with fast termination during environment interactions to prevent blocking.
  - Added a finworld.yaml configuration file defining project training and rollout parameters.
  - Added the FinWorldJudgeByOpenJudge class, integrating multiple evaluators including RM Gallery and OpenJudge (@Haoran).
  - Implemented task-output conversion, asynchronous calls, and retries to keep evaluation stable.
  - Normalized evaluator weights to manage each evaluator's contribution, merging them into the final reward and success determination.

* Precommit fix (#4)
  - Fixed end-of-file newlines.
  - Fixed imports with autoflake.
  - Added a mypy check.
  - Fixed the test bench import.

* refactor(finworld): Replace the agent protocol and unify configuration updates
  - Renamed ExampleAgentScopeLearnProtocol to ExampleDeepResearchProtocol and modified the execute method signature.
  - Unified the model tuner parameter name to `tuner`, along with its related attribute references.
  - Optimized the multi-turn interaction step configuration, changing it to use `tuner.config.ajet.rollout.multi_turn.max_steps`.
  - Modified the context-overflow judgment logic to prevent tool-call blocking.
  - Updated the finworld.yaml configuration, replacing astune with ajet-related configurations, and adjusted the workflow protocol and environment parameters.
  - Modified the default environment-variable values and log save paths in finworld_judge.py.
  - Added and improved multi-machine and single-machine startup scripts, supporting dynamic generation of MCP configuration and environment-variable loading.
  - Added the finworld_single.yaml template to adapt to single-machine training configurations.
  - Adjusted the key reference for the multi-turn step configuration in ma_deepresearch.py to use the ajet configuration path.

* feat(finworld): Add FinWorld training-environment configuration scripts and templates
  - Added bash startup scripts for multi-machine, multi-GPU training, supporting dynamic configuration generation and environment-variable import.
  - Implemented training configuration file templates, supporting automatic injection of weight parameters and model paths.
  - Raised the default EnvClient request timeout from 30 seconds to 300 seconds to accommodate long training requests.
  - Added a new finworld example directory and related documentation, improving the example project structure.

* refactor(utils): Remove the unused extraction/compute function `extract_tool_stats_from_cmts`

* refactor(finworld): Replace the old model with OpenJudge; update evaluation configuration and scripts
  - Replaced model initialization in FinWorldJudgeByOpenJudge with the `_init_openjudge_model` method.
  - Read Judge model parameters from the configuration file first, falling back to environment variables.
  - Optimized RM Gallery initialization with configuration-first logic and improved exception stack-trace printing.
  - Removed the old `_init_model` singleton method and related code.
  - Updated the example startup script `ajet_finworld.sh`, adding OPENJUDGE_LLM and RM_LLM configurations.
  - Modified YAML templates and configuration files to unify the structure and field naming of Judge configuration items.
  - Deleted the outdated `cc_rm4_res2cit2fai2_30b.sh` script.
  - Adjusted the `env_service` startup path to improve environment-activation compatibility.
  - Adjusted script log output format and content to make configuration-parameter printing clearer.

* feat(task_reader): Support reading data of type jsonl_with_env_service
  - Added the jsonl_with_env_service type, which loads data from jsonl files while calling tools via env_service.
  - Extended ResourceKeeper to handle creation and release of environment instances for jsonl_with_env_service.
  - Kept the env_service type logic, calling create_instance to register instances and initializing them from init_messages in the jsonl file.
  - Added an example protocol, ExampleDeepResearchProtocol, implementing multi-turn interaction and environment-call coordination.
  - Provided training scripts and YAML configuration templates for finworld, supporting training in jsonl_with_env_service mode.
  - Optimized scripts to support multi-node, multi-GPU training, including environment variables and Ray cluster configuration.

* feat(core): add finworld task reader support to framework
* feat(finworld): implement specialized data reader and openjudge-based grading logic
* refactor(finworld): optimize configuration templates and prompt engineering
* chore(finworld): update launch scripts and add variant experiment scripts
* feat(finworld): Add multi-machine, multi-GPU training scripts and configuration templates
* chore(git): ignore finworld/yaml/*

* fix(metrics): Fix and enhance the compatibility and debug output of the metrics-update logic
  - Modified `update_metrics`, adding a `prefix` parameter to distinguish training from validation metrics.
  - Changed the data source for `reward_stats` and `tool_stats` extraction from `workflow_metadata` to `log_metrics`.
  - Added debug printing of `log_metrics` content and metric key names at key steps for easier troubleshooting.
  - Used the appropriate prefix when calling `update_metrics` in `trainer_verl.py`, and added multiple debug prints.
  - Modified `WorkflowOutput` to place `tool_stats` and `reward_stats` in the `log_metrics` field.
  - Removed redundant and deprecated `reward_stats` extraction code and calculation functions.
  - Added debug output to the `finworld` and `finworld_judge` modules to track log metrics and scoring data.

* fix(metrics): Remove debug prints and synchronize reward statistics
  - Removed debug prints around the `update_metrics` call in `trainer_verl.py`.
  - Removed debug prints for `log_metrics` keys in `finworld.py`.
  - Removed debug prints before updating `metadata_stats` in `finworld_judge.py`.
  - Added logic in `general_runner.py` to synchronize `reward_stats` from `metadata` to `log_metrics` after the judge calculation.
  - Cleaned up debug prints inside `update_metrics` in `metric_helper`, improving readability.

* chore: Stop tracking existing yaml files in the tutorial directory
* fix(task_runner): Synchronize reward_stats to log_metrics
* feat(tutorial): Add a FinWorld multi-machine, multi-GPU training startup script
* refactor(script): Refactor the finworld training script, integrating configuration and startup

* refactor(deep_finance): Replace and remove finworld-related implementations
  - Switched the example directory from example_finworld to example_deep_finance.
  - Modified startup parameters and logic to support deep_finance, replacing the finworld option.
  - Replaced finworld_reader with deep_finance_reader in the task reader.
  - Adjusted environment-client configuration in resource management to use deep_finance instead of finworld-related checks.
  - Updated the reward-metric tool documentation to support deep_finance.
  - Deleted finworld-related configuration files, scripts, code, and evaluation modules, cleaning up leftover files.
  - Replaced the keyword "finworld" with "deep_finance" in comments and logs.

* refactor(deepfinance): Rename and unify DeepFinance module and config references
  - Replaced all "finworld" and "deep_finance" names with the unified "deepfinance" format.
  - Changed the command-line argument to `--with-deepfinance` for consistency.
  - Renamed the class in `task_reader` from `deep_financeReader` to `DeepFinanceReader`.
  - Updated the documentation and file name of the `metric_helper` module to DeepFinance.
  - Modified environment variables and configuration paths in the example script `deep_finance.sh` to use the `DEEPFINANCE` prefix.
  - Updated `judge_protocol` to `DeepFinanceJudgeByOpenJudge` in the `deep_finance.yaml` configuration.
  - Renamed the `FinWorldJudgeByOpenJudge` class in `deep_finance_judge.py` to `DeepFinanceJudgeByOpenJudge`.
  - Renamed the `FinworldReader` class in `deep_finance_reader.py` to `DeepFinanceReader`.
  - Changed the debug-log identifier and corresponding environment variable to `DEEPFINANCE_DEBUG`.
  - Updated the evaluation protocol in the `deep_finance_template.yaml` template to `DeepFinanceJudgeByOpenJudge`.
  - Ensured internal references and comments in all modules use the DeepFinance/deepfinance names.

* refactor(tutorial): Optimize dynamic generation of configuration file paths
* fix(deep_finance): argparse: with-deepfinance
* fix(tutorial): Fix multi-machine training environment-variable settings

* fix(env): Correct the assignment of reward and info when returning environment state
  - Corrected the `env_output` return-value structure in `BaseGymEnv` to ensure the `reward` and `info` fields are assigned correctly.
  - Removed `RefJudge` and `StructureJudge` metric calculations and statistics from `reward_metric_helper`.
  - Cleaned up redundant code in `reward_metric_helper`, removing invalid comments and statistics items.
  - Modified `save_trajectory_as_json` to always print a trajectory-save confirmation.
  - Corrected log comments in `example_deep_finance` to avoid meaningless log output.
  - Added the `save_trajectory_as_json_file` configuration item to `deep_finance_template.yaml` to support trajectory saving.

* chore(config): Update the example_deep_finance configuration and clean up files
  - Added an ignore rule for config file paths in .gitignore.
  - Deleted the auto-generated mcp_finance_tool_generated.json file in example_deep_finance.
  - Refactored the deep_finance.yaml configuration file, adjusting project and experiment names.
  - Reorganized the Judge configuration, clarifying the openjudge_llm and rm_llm models.
  - Optimized model paths and training parameters, adding parallel and batch-processing settings.
  - Adjusted data-reading methods and training/validation set path placeholders.
  - Reduced the rollout GPU-memory usage ratio to 0.8.
  - Updated the trainer's default save directory path to a placeholder variable.
  - Cleaned up unused and commented-out code for a more concise configuration file.

* refactor(metric): Optimize tool-metric calculation and data-saving logic
  - Corrected the data-source field for timeline data used during trajectory saving.
  - Removed redundant fields from tool execution time, cache hit rate, and error rate statistics.
  - Updated .gitignore with ignore rules for the example script directory.
  - Removed unnecessary debugging information from logs to reduce log noise.
  - Simplified log printing in the multi-turn interaction loop.
  - Streamlined logging for environment observation and termination checks, improving readability.

* fix(metric_helper): fix tool cache metric
* fix a small bug
* fix(utils): Suppress httpx AsyncClient.aclose() exception warnings
* chore: translate comments to English

* feat: Support a service-name prefix
  - Added a --prefix argument to the launcher.
  - Implemented the prefix logic in the pty_launch function.
  - Updated the deep_finance.sh script to use the prefix feature.
  - Allows multiple service instances to run in the same environment.

* fix: Improve MultiAgent message-content parsing
  - Support tool_result-format message content blocks.
  - Improve handling of non-text content: continue processing remaining items instead of skipping the whole message.
  - Handle the tool_use type (skipped, since it is already handled via the tool_calls field).
  - Optimize code structure and comments for readability.

* fix: Optimize DeepFinance judging logic and configuration
  - Fixed tool_stats extraction to correctly read data from log_metrics.
  - Added debug output for the penalty term.
  - Enabled tool calls (force_disable_toolcalls: False).
  - Ensured reward-calculation accuracy.

* chore(deps): bump agentscope from 1.0.7 to 1.0.8

* fix(metric_helper): correct trajectory save path and add tool-call metric
  - Changed the trajectory save directory from "ctx_trackers" to "trajectory" for better file organization.
  - Recorded tool-call counts alongside error rates in tool metrics.
  - Updated the experiment suffix in the deep-finance example script for a clearer naming convention.

* revise message parsing

---------
Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>
Co-authored-by: Qingxu Fu <qingxu.fu@outlook.com>
Co-authored-by: qingxu.fu <fuqingxu.fqx@alibaba-inc.com>
1 parent 20e4296 commit 224560f

File tree

9 files changed

+31
-7
lines changed


ajet/context_tracker/multiagent_tracking.py

Lines changed: 15 additions & 0 deletions

@@ -82,13 +82,28 @@ def extract_text_content_from_content_dict(self, msg):
         # },
         # ],
         # }
+        # or tool_result format?? not observed yet:
+        # msg = {
+        #     "role": "tool",
+        #     "content": [
+        #         {
+        #             "type": "tool_result",
+        #             "id": "call_xxx",
+        #             "output": "tool output content",
+        #             "name": "tool_name"
+        #         },
+        #     ],
+        # }

         str_content = ""
         for item in msg["content"]:
             # item = {
             #     "type": "text",
             #     "text": "some text"
             # },
+            item_type = item.get("type", "")
+            assert not item_type == "tool_use", "never observed such protocol yet"
+            assert not item_type == "tool_result", "never observed such protocol yet"

             assert isinstance(item, dict), f"Unsupported non-dict item in message content: {item}. Full message: {msg}"
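The revised parsing behavior that the commit message later describes ("revise message parsing": handle `tool_result`, skip `tool_use`, keep processing other items) can be sketched as below. The helper name `extract_text_content` and the exact field handling are assumptions, not the repository's code:

```python
def extract_text_content(msg: dict) -> str:
    """Collect text from a message whose content is a list of typed blocks.

    Sketch of the behavior described in the commit message: read "text"
    and "tool_result" blocks, skip "tool_use" (already covered by the
    tool_calls field), and continue past unknown or malformed items
    instead of rejecting the whole message.
    """
    parts = []
    for item in msg.get("content", []):
        if not isinstance(item, dict):
            continue  # tolerate malformed items rather than failing the message
        item_type = item.get("type", "")
        if item_type == "text":
            parts.append(item.get("text", ""))
        elif item_type == "tool_result":
            parts.append(str(item.get("output", "")))
        elif item_type == "tool_use":
            continue  # handled elsewhere via the tool_calls field
    return "".join(parts)
```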

ajet/launcher.py

Lines changed: 2 additions & 1 deletion

@@ -99,6 +99,7 @@ def parse_args():
         default=False,
         help="Kill system processes (ray + vllm + python) that may block the current experiment",
     )
+    parser.add_argument("--prefix", type=str, default="", required=False, help="Prefix for deepfinance service names")
     return parser.parse_args()


@@ -304,7 +305,7 @@ def main():
         pty_launch("appworld")

     if args.with_deepfinance:
-        pty_launch("deepfinance")
+        pty_launch("deepfinance", prefix=args.prefix)

     if args.with_crafters:
         pty_launch("crafters")

ajet/utils/metric_helper/save_trajectory_as_json.py

Lines changed: 1 addition & 1 deletion

@@ -40,7 +40,7 @@ def save_trajectory_as_json(ctx_trackers, global_steps, prefix="train"):
     # Define save directory and file path
     traj_save_dir = os.path.join(
         os.environ.get("BEST_LOGGER_PATH", "launcher_record"),
-        "ctx_trackers",
+        "trajectory",
         prefix,
         f"step_{global_steps}"
     )
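After this change the save directory is laid out as `<BEST_LOGGER_PATH or launcher_record>/trajectory/<prefix>/step_<N>`. A minimal reconstruction of just the path construction (the helper name `trajectory_dir` is ours; only the directory layout is taken from the diff):

```python
import os

def trajectory_dir(global_steps: int, prefix: str = "train") -> str:
    # Mirrors the save-directory construction shown in the diff:
    # <BEST_LOGGER_PATH or launcher_record>/trajectory/<prefix>/step_<N>
    return os.path.join(
        os.environ.get("BEST_LOGGER_PATH", "launcher_record"),
        "trajectory",
        prefix,
        f"step_{global_steps}",
    )
```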

ajet/utils/metric_helper/tool_metric_helper.py

Lines changed: 1 addition & 0 deletions

@@ -125,6 +125,7 @@ def compute_tool_metrics(tool_stats_list: List[Dict[str, Any]], prefix: str = ""
         if calls > 0:
             error_rate = errors / calls * 100
             metrics[f"{prefix}tool_error/{tool_name}/error_rate"] = round(error_rate, 2)
+            metrics[f"{prefix}tool_error/{tool_name}/calls"] = calls


     return metrics
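The metric keys this hunk produces can be illustrated with a simplified reconstruction. The key format and the new `calls` entry follow the diff; the per-tool aggregation over `tool_stats_list` and the input schema (`calls`/`errors` counters per tool) are assumptions:

```python
from typing import Any, Dict, List

def compute_tool_metrics(tool_stats_list: List[Dict[str, Any]], prefix: str = "") -> Dict[str, Any]:
    # Simplified reconstruction: aggregate calls/errors per tool, then emit
    # an error_rate and (new in this commit) a raw calls count per tool.
    totals: Dict[str, Dict[str, int]] = {}
    for stats in tool_stats_list:
        for tool_name, s in stats.items():
            agg = totals.setdefault(tool_name, {"calls": 0, "errors": 0})
            agg["calls"] += s.get("calls", 0)
            agg["errors"] += s.get("errors", 0)

    metrics: Dict[str, Any] = {}
    for tool_name, agg in totals.items():
        calls, errors = agg["calls"], agg["errors"]
        if calls > 0:
            error_rate = errors / calls * 100
            metrics[f"{prefix}tool_error/{tool_name}/error_rate"] = round(error_rate, 2)
            metrics[f"{prefix}tool_error/{tool_name}/calls"] = calls
    return metrics
```

Recording the raw call count alongside the rate lets a dashboard distinguish a 50% error rate over 2 calls from one over 2,000.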

ajet/utils/pty.py

Lines changed: 3 additions & 1 deletion

@@ -96,13 +96,15 @@ def pty_wrapper_final(human_cmd, dir, env_dict):
     pty_wrapper(["/bin/bash", "-c", human_cmd], dir, env_dict)


-def pty_launch(service_name: str, success_std_string="Starting server on"):
+def pty_launch(service_name: str, success_std_string="Starting server on", prefix: str = ""):
     from ajet.utils.smart_daemon import LaunchCommandWhenAbsent

     service_path = os.environ.get(f"{service_name.upper()}_PATH")
     service_script = os.environ.get(f"{service_name.upper()}_SCRIPT")
     if service_path is None or service_script is None:
         raise ValueError(f"Environment variables for {service_name} not properly set.")
+    if prefix != "":
+        service_name = prefix + "_" + service_name
     companion = LaunchCommandWhenAbsent(
         full_argument_list=[service_script],
         dir=service_path,
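The prefix behavior added here amounts to a simple renaming rule, shown below as a standalone sketch (the helper name `prefixed_service_name` is ours):

```python
def prefixed_service_name(service_name: str, prefix: str = "") -> str:
    # Matches the diff: only a non-empty prefix changes the name,
    # joined with an underscore.
    if prefix != "":
        return prefix + "_" + service_name
    return service_name
```

Note the rename happens after the `{SERVICE}_PATH`/`{SERVICE}_SCRIPT` environment-variable lookup, so prefixed instances share the same path variables while registering under distinct service names, which is what lets multiple instances coexist in one environment.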

pyproject.toml

Lines changed: 1 addition & 1 deletion

@@ -17,7 +17,7 @@ classifiers = [
 ]
 requires-python = ">=3.10,<3.13"
 dependencies = [
-    "agentscope==1.0.7",
+    "agentscope==1.0.8",
     "chromadb",
     "httpx",
     "tenacity",

tutorial/example_deep_finance/deep_finance.sh

Lines changed: 2 additions & 1 deletion

@@ -3,7 +3,7 @@ set -e
 #===============================================================================
 # 1. Configuration area - the only section users need to modify
 #===============================================================================
-SUFFIX="ajet_deep_finance"  # Experiment suffix; affects all log and experiment names
+SUFFIX="deep_finance"       # Experiment suffix; affects all log and experiment names
 PREFIX="open"               # Experiment prefix; affects the folder containing logs and experiments

 # OpenJudge model configuration
@@ -208,6 +208,7 @@ if [[ $HOSTNAME == *"-master-"* ]]; then
         --with-deepfinance \
         --conf ${CONFIG_FILE} \
         --backbone="verl" \
+        --prefix=${SUFFIX} \
         2>&1 | tee ${TRAIN_LOG}


tutorial/example_deep_finance/deep_finance_judge.py

Lines changed: 5 additions & 1 deletion

@@ -373,8 -373,12 @@ def compute_reward(self, workflow_task: WorkflowTask, workflow_output: WorkflowO
         fused_reward, contributions = self._fuse_grader_scores(grader_scores, rm_raw)

         # 6. Compute the penalty term (keep the original tool_calls penalty logic)
-        tool_calls = metadata.get("tool_stats", {}).get("total_calls", 0)
+        # Extract tool_stats from log_metrics (deep_finance.py stores it in log_metrics, not metadata)
+        tool_stats = workflow_output.log_metrics.get("tool_stats", {})
+        tool_calls = tool_stats.get("total_calls", 0)
         penalty = self._compute_penalty(tool_calls)
+        if penalty < 0:
+            print(f"⚠️ Penalty applied: penalty={penalty}, tool_calls={tool_stats}")

         # 7. Aggregate
         final_reward = fused_reward + step_reward + penalty
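The reward aggregation in this hunk can be summarized as a standalone sketch. Reading `tool_stats` from `log_metrics` and the `fused_reward + step_reward + penalty` sum come from the diff; `_compute_penalty`'s actual form is not shown, so the thresholded call-budget penalty and the `max_tool_calls` parameter below are assumptions:

```python
def compute_final_reward(fused_reward: float, step_reward: float,
                         log_metrics: dict, max_tool_calls: int = 20) -> float:
    # Read tool_stats from log_metrics, as the fixed code does.
    tool_calls = log_metrics.get("tool_stats", {}).get("total_calls", 0)
    # Hypothetical penalty: negative only when the call budget is exceeded.
    penalty = -0.1 * max(0, tool_calls - max_tool_calls)
    if penalty < 0:
        print(f"Penalty applied: penalty={penalty}, tool_calls={tool_calls}")
    return fused_reward + step_reward + penalty
```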

tutorial/example_deep_finance/yaml_template/deep_finance_template.yaml

Lines changed: 1 addition & 1 deletion

@@ -32,7 +32,7 @@ ajet:
   rollout:
     # ✨✨✨✨ Write and select the Agent
     user_workflow: tutorial.example_deep_finance.deep_finance->ExampleDeepResearchProtocol
-    force_disable_toolcalls: True
+    force_disable_toolcalls: False
     enable_oversample: False
     tensor_model_parallel_size: 8
     num_repeat: {{NUM_REPEAT}}
