feat: integrate FinWorldJudge with OpenJudge support & add project blogs by TaoShuchang · Pull Request #21 · modelscope/AgentJet

TaoShuchang · 2026-04-06T15:38:47Z

Description

This Pull Request introduces the FinWorldJudgeByOpenJudge protocol to improve AgentJet's automated evaluation in financial scenarios. It also updates the documentation, adding bilingual blogs and a revised README to guide users and contributors.

Key Changes

1. Core Logic & Evaluation

New Judge Protocol: Implemented FinWorldJudgeByOpenJudge, leveraging the openjudge framework to provide more nuanced and reliable scoring for financial agent tasks.
Environment Integration: Seamlessly integrated the new judge into the evaluation pipeline, ensuring compatibility with existing financial benchmarks (FinWorld).
Dependency Update: Added openjudge to the project requirements to support the new evaluation backend.

2. Documentation & Community

Bilingual Blogs: Added both English and Chinese technical blogs detailing the design philosophy behind AgentJet and the implementation of the new judging mechanism.
README Enhancements:
- Updated the main README.md with clearer setup instructions.
- Added a dedicated section for the new FinWorldJudge protocol.
- Improved the overall project structure description for better developer onboarding.

3. Git Maintenance

Resolved merge conflicts between dev/shuchang_newjudge and the main branch to ensure a clean merge.

Type of Change

New Feature: Integration of FinWorldJudgeByOpenJudge.
Documentation: Added CN/EN blogs and updated README.
Refactoring: Conflict resolution and dependency management.

…on OpenJudge

- Refactored reward_metric_helper, optimizing the data structures and statistics logic for OpenJudge and Finance Evaluator.
- Added the DeepFinanceJudgeByOpenJudge class for unified calls and weighted fusion across multiple Graders.
- Supports both RM Gallery and Finance Evaluator as evaluation sources, broadening the evaluation dimensions.
- Calls the OpenJudge Runner asynchronously, with retry and error handling.
- Caches loaded reference answers to speed up RM Gallery evaluation.
- Added a tool-call penalty, fused with step_reward and the scores from each Grader.
- Automatically saves debug information when the OpenJudge scores for each Grader are zero.
- Logging and timing statistics cover the entire evaluation process, which helps with performance monitoring and troubleshooting.

…dependent Model Configuration

- Added a new OpenJudge-based `FinanceCompositionEvaluator` to replace the legacy implementation.
- Implemented domain-based routing to direct requests to the appropriate set of graders, supporting multiple fields such as stock analysis and industry research.
- Implemented an asynchronous pairwise evaluation interface that returns scores within the 0–1 range.
- Enabled independent configuration for `finance_llm`; if not explicitly configured, the general `openjudge_llm` model is reused.
- Cleaned up redundant imports and deprecated code within `DeepFinanceJudgeByOpenJudge`.
- Updated `deep_finance_openjudge_template.yaml` to include documentation for the `finance_llm` option.
- Refined the description of "evidence traceability" in `deep_finance.md`, renaming it to "Reference Logic Audit" and enhancing the details regarding the workflow and judgment criteria.
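
The `finance_llm` fallback described in this commit message can be illustrated at the shell level. This is only a sketch: `resolve_finance_llm` is a hypothetical helper, and it assumes the launch script exposes the two models as `FINANCE_LLM` and `OPENJUDGE_LLM` (names borrowed from the template placeholders). Where the PR actually implements the fallback is not shown on this page.

```bash
# Hypothetical helper: pick the finance model, falling back to the general judge model.
resolve_finance_llm() {
  finance_llm=$1
  openjudge_llm=$2
  # ${var:-default} expands to the default when var is unset or empty.
  printf '%s\n' "${finance_llm:-$openjudge_llm}"
}

# Possible use in a launch script (commented out so the sketch has no side effects):
# FINANCE_LLM=$(resolve_finance_llm "${FINANCE_LLM:-}" "${OPENJUDGE_LLM}")
```

With this shape, leaving `FINANCE_LLM` empty in `.env` reuses the general model, while any non-empty value takes precedence.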

gemini-code-assist

Code Review

This pull request refactors the reward calculation and evaluation framework for the Finance Deep Research Agent, transitioning from the RM Gallery implementation to a more flexible OpenJudge-based FinanceCompositionEvaluator. It also updates the training infrastructure, configuration templates, and documentation to support this new evaluation approach. My feedback focuses on improving the robustness of the training scripts by removing hardcoded paths in favor of environment variables, fixing documentation errors, and cleaning up unused configuration templates.

gemini-code-assist · 2026-04-06T15:43:44Z

tutorial/example_deep_finance/deep_finance.sh

+export TRAIN_DATA_PATH="/mnt/data_cpfs/taoshuchang.tsc/deepresearch/AgentJet_new/tutorial/example_deep_finance/data/train_merged_all.json"
+export TRAIN_REF_ANS_PATH="/mnt/data_cpfs/taoshuchang.tsc/deepresearch/AgentJet_new/tutorial/example_deep_finance/data/Reference_merged_all.json"


The script hardcodes user-specific paths for TRAIN_DATA_PATH and TRAIN_REF_ANS_PATH. These paths will not work on other machines. The script already sources the .env file, which is the correct place for these configurations. Please remove these export lines to allow the values from the .env file to be used.
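
Once the hardcoded exports are removed, the script could fail early when `.env` does not supply the values. The following is a minimal sketch, not code from this PR: `require_env` is a hypothetical helper, written in POSIX sh so it does not depend on bash-only features.

```bash
# Hypothetical helper: return non-zero if any named variable is unset or empty.
require_env() {
  for name in "$@"; do
    # Indirect lookup in POSIX sh: copy the value of the variable named by $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: ${name} (set it in .env)" >&2
      return 1
    fi
  done
  return 0
}

# Possible use right after the script sources .env:
# require_env TRAIN_DATA_PATH TRAIN_REF_ANS_PATH || exit 1
```

A check like this turns a silent misconfiguration into an immediate, named error instead of a failure deep inside training.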

gemini-code-assist · 2026-04-06T15:43:44Z

tutorial/example_deep_finance/deep_finance.sh

 export RAY_CLUSTER_MODE="multi_node"
-export DEEPFINANCE_PATH="${ENV_SERVICE_ROOT}" # AgentJet may use this path internally
+export DEEPFINANCE_PATH="${ENV_SERVICE_ROOT}"
 export DEEPFINANCE_SCRIPT="source /mnt/data/taoshuchang.tsc/anaconda3/etc/profile.d/conda.sh && conda activate finworld_1209  && cd ${ENV_SERVICE_ROOT} && DEEPFINANCE_TOOL_RESULT_MAX_CHARS=${DEEPFINANCE_TOOL_RESULT_MAX_CHARS} DEEPFINANCE_MCP_CONFIG=${DEEPFINANCE_MCP_CONFIG} CACHE_TYPE=${CACHE_TYPE} MONGO_URI=${MONGO_URI} MONGO_DB_NAME=${MONGO_DB_NAME} MONGO_COLLECTION_NAME=${MONGO_COLLECTION_NAME} python -m env_service.env_service --env finworld --portal 0.0.0.0 --port 8080"


The DEEPFINANCE_SCRIPT variable contains a hardcoded, user-specific path to conda.sh (/mnt/data/taoshuchang.tsc/anaconda3/etc/profile.d/conda.sh). This will fail on any other developer's machine. The .env_sample file already defines a CONDA_PATH variable for this purpose. Please use that variable here.

Suggested change

export DEEPFINANCE_SCRIPT="source /mnt/data/taoshuchang.tsc/anaconda3/etc/profile.d/conda.sh && conda activate finworld_1209 && cd ${ENV_SERVICE_ROOT} && DEEPFINANCE_TOOL_RESULT_MAX_CHARS=${DEEPFINANCE_TOOL_RESULT_MAX_CHARS} DEEPFINANCE_MCP_CONFIG=${DEEPFINANCE_MCP_CONFIG} CACHE_TYPE=${CACHE_TYPE} MONGO_URI=${MONGO_URI} MONGO_DB_NAME=${MONGO_DB_NAME} MONGO_COLLECTION_NAME=${MONGO_COLLECTION_NAME} python -m env_service.env_service --env finworld --portal 0.0.0.0 --port 8080"

export DEEPFINANCE_SCRIPT="source ${CONDA_PATH} && conda activate finworld_1209 && cd ${ENV_SERVICE_ROOT} && DEEPFINANCE_TOOL_RESULT_MAX_CHARS=${DEEPFINANCE_TOOL_RESULT_MAX_CHARS} DEEPFINANCE_MCP_CONFIG=${DEEPFINANCE_MCP_CONFIG} CACHE_TYPE=${CACHE_TYPE} MONGO_URI=${MONGO_URI} MONGO_DB_NAME=${MONGO_DB_NAME} MONGO_COLLECTION_NAME=${MONGO_COLLECTION_NAME} python -m env_service.env_service --env finworld --portal 0.0.0.0 --port 8080"
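
To go one step further than the suggestion, the script could verify that `CONDA_PATH` points at a readable file before composing the command. A sketch under stated assumptions: `build_env_prefix` is a hypothetical helper, and the paths are assumed to contain no spaces, since the result is later embedded in a single command string.

```bash
# Hypothetical helper: emit the "activate conda, then cd" prefix of the command,
# or fail if the conda.sh path is not readable.
build_env_prefix() {
  conda_sh=$1
  env_name=$2
  service_root=$3
  if [ ! -r "$conda_sh" ]; then
    echo "CONDA_PATH is not readable: ${conda_sh}" >&2
    return 1
  fi
  printf 'source %s && conda activate %s && cd %s' "$conda_sh" "$env_name" "$service_root"
}

# Possible use (commented out; CONDA_PATH and ENV_SERVICE_ROOT would come from .env):
# prefix=$(build_env_prefix "${CONDA_PATH}" finworld_1209 "${ENV_SERVICE_ROOT}") || exit 1
```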

gemini-code-assist · 2026-04-06T15:43:44Z

tutorial/example_deep_finance/deep_finance_single.sh

+export TRAIN_DATA_PATH="/mnt/data_cpfs/taoshuchang.tsc/deepresearch/AgentJet_new/tutorial/example_deep_finance/data/train_merged_all.json"
+export TRAIN_REF_ANS_PATH="/mnt/data_cpfs/taoshuchang.tsc/deepresearch/AgentJet_new/tutorial/example_deep_finance/data/Reference_merged_all.json"


The script hardcodes user-specific paths for TRAIN_DATA_PATH and TRAIN_REF_ANS_PATH. These paths will not work on other machines. The script already sources the .env file, which is the correct place for these configurations. Please remove these export lines to allow the values from the .env file to be used.

gemini-code-assist · 2026-04-06T15:43:45Z

tutorial/example_deep_finance/deep_finance_single.sh

-
+export PYTHONPATH="${AJET_ROOT}:${OPENJUDGE_ROOT}:${PYTHONPATH}"
+export DEEPFINANCE_PATH="${ENV_SERVICE_ROOT}"
+export DEEPFINANCE_SCRIPT="source /mnt/data/taoshuchang.tsc/anaconda3/etc/profile.d/conda.sh && conda activate finworld_1209  && cd ${ENV_SERVICE_ROOT} && DEEPFINANCE_TOOL_RESULT_MAX_CHARS=${DEEPFINANCE_TOOL_RESULT_MAX_CHARS} DEEPFINANCE_MCP_CONFIG=${DEEPFINANCE_MCP_CONFIG} CACHE_TYPE=${CACHE_TYPE} MONGO_URI=${MONGO_URI} MONGO_DB_NAME=${MONGO_DB_NAME} MONGO_COLLECTION_NAME=${MONGO_COLLECTION_NAME} python -m env_service.env_service --env finworld --portal 0.0.0.0 --port 8080"


The DEEPFINANCE_SCRIPT variable contains a hardcoded, user-specific path to conda.sh (/mnt/data/taoshuchang.tsc/anaconda3/etc/profile.d/conda.sh). This will fail on any other developer's machine. The .env_sample file already defines a CONDA_PATH variable for this purpose. Please use that variable here.

Suggested change

export DEEPFINANCE_SCRIPT="source /mnt/data/taoshuchang.tsc/anaconda3/etc/profile.d/conda.sh && conda activate finworld_1209 && cd ${ENV_SERVICE_ROOT} && DEEPFINANCE_TOOL_RESULT_MAX_CHARS=${DEEPFINANCE_TOOL_RESULT_MAX_CHARS} DEEPFINANCE_MCP_CONFIG=${DEEPFINANCE_MCP_CONFIG} CACHE_TYPE=${CACHE_TYPE} MONGO_URI=${MONGO_URI} MONGO_DB_NAME=${MONGO_DB_NAME} MONGO_COLLECTION_NAME=${MONGO_COLLECTION_NAME} python -m env_service.env_service --env finworld --portal 0.0.0.0 --port 8080"

export DEEPFINANCE_SCRIPT="source ${CONDA_PATH} && conda activate finworld_1209 && cd ${ENV_SERVICE_ROOT} && DEEPFINANCE_TOOL_RESULT_MAX_CHARS=${DEEPFINANCE_TOOL_RESULT_MAX_CHARS} DEEPFINANCE_MCP_CONFIG=${DEEPFINANCE_MCP_CONFIG} CACHE_TYPE=${CACHE_TYPE} MONGO_URI=${MONGO_URI} MONGO_DB_NAME=${MONGO_DB_NAME} MONGO_COLLECTION_NAME=${MONGO_COLLECTION_NAME} python -m env_service.env_service --env finworld --portal 0.0.0.0 --port 8080"

gemini-code-assist · 2026-04-06T15:43:45Z

tutorial/example_deep_finance/blog_cn.md

+| **model**                       | **finance**       | **others** | **overall**           |             |                   |                   |         |                       |             |                   |                   |         |                       |             |                   |
+| ------------------------------- | ----------------- | ---------- | --------------------- | ----------- | ----------------- | ----------------- | ------- | --------------------- | ----------- | ----------------- | ----------------- | ------- | --------------------- | ----------- | ----------------- |
+|                                 | comprehensiveness | insight    | instruction_following | readability | **overall_score** | comprehensiveness | insight | instruction_following | readability | **overall_score** | comprehensiveness | insight | instruction_following | readability | **overall_score** |
+| **Qwen3-30B-A3B-Instruct-2507** | 0.181             | 0.169      | 0.191                 | 0.211       | 0.184             | 0.112             | 0.111   | 0.117                 | 0.137       | 0.118             | 0.122             | 0.119   | 0.128                 | 0.148       | 0.127             |
+| **Tongyi DeepResearch**         | 0.291             | 0.282      | 0.316                 | 0.313       | 0.296             | 0.270             | 0.260   | 0.289                 | 0.290       | 0.274             | 0.273             | 0.263   | 0.293                 | 0.293       | 0.277             |
+| **Claude 3.7**                  | 0.404             | 0.398      | 0.465                 | 0.416       | 0.417             | 0.412             | 0.406   | 0.462                 | 0.417       | 0.423             | 0.411             | 0.405   | 0.462                 | 0.417       | 0.422             |
+| **Ours**                        | 0.476             | 0.472      | 0.488                 | 0.487       | 0.479             | 0.470             | 0.470   | 0.485                 | 0.484       | 0.475             | 0.471             | 0.471   | 0.485                 | 0.484       | **0.476**         |


The current markdown table is very wide and difficult to read due to the attempt to simulate colspan for headers. This is not standard in markdown and may render poorly in some viewers. For better readability and correctness, I suggest restructuring the table into a 'long' format.

Here is an example of a more conventional and readable structure:

| Model | Category | Comprehensiveness | Insight | Instruction Following | Readability | Overall Score |
| ------------------------------- | -------- | ----------------- | ------- | --------------------- | ----------- | ------------- |
| **Qwen3-30B-A3B-Instruct-2507** | finance | 0.181 | 0.169 | 0.191 | 0.211 | 0.184 |
| | others | 0.112 | 0.111 | 0.117 | 0.137 | 0.118 |
| | overall | 0.122 | 0.119 | 0.128 | 0.148 | 0.127 |
| **Tongyi DeepResearch** | finance | 0.291 | 0.282 | 0.316 | 0.313 | 0.296 |
| | others | 0.270 | 0.260 | 0.289 | 0.290 | 0.274 |
| | overall | 0.273 | 0.263 | 0.293 | 0.293 | 0.277 |
| **Claude 3.7** | finance | 0.404 | 0.398 | 0.465 | 0.416 | 0.417 |
| | others | 0.412 | 0.406 | 0.462 | 0.417 | 0.423 |
| | overall | 0.411 | 0.405 | 0.462 | 0.417 | 0.422 |
| **Ours** | finance | 0.476 | 0.472 | 0.488 | 0.487 | 0.479 |
| | others | 0.470 | 0.470 | 0.485 | 0.484 | 0.475 |
| | overall | 0.471 | 0.471 | 0.485 | 0.484 | **0.476** |

gemini-code-assist · 2026-04-06T15:43:45Z

tutorial/example_deep_finance/blog_en.md

+
+1. Xie, Q., et al. (2024). *FinBen: A Holistic Financial Benchmark for Large Language Models*. arXiv:2402.12659.  
+2. Du, M., et al. (2025). *DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents*. arXiv:2506.11763.
+3. FInance Tool API：[https://basic.10jqka.com.cn/](https://basic.10jqka.com.cn/601899/equity.html#stockpage)


There is a typo in "FInance". It should be "Finance".

Suggested change

3. FInance Tool API：[https://basic.10jqka.com.cn/](https://basic.10jqka.com.cn/601899/equity.html#stockpage)

3. Finance Tool API：[https://basic.10jqka.com.cn/](https://basic.10jqka.com.cn/601899/equity.html#stockpage)

gemini-code-assist · 2026-04-06T15:43:45Z

tutorial/example_deep_finance/README.md

-cd /path/to/AgentJet
-bash install.sh # TODO: reduce this part to a single install: https://yuque.alibaba-inc.com/bayotg/wxz7sb/qdesuu33621x2yhi
+# Please use uv to install ajet
+git clone -b dev/shuchang_newjudge https://github.com/modelscope/AgentJet.git


The tutorial instructs users to clone a specific development branch (dev/shuchang_newjudge). This is not ideal for documentation, as development branches can be temporary, rebased, or deleted, which would break the instructions for future users. It's better to point to the main branch or a stable release tag.

Suggested change

git clone -b dev/shuchang_newjudge https://github.com/modelscope/AgentJet.git

git clone https://github.com/modelscope/AgentJet.git

gemini-code-assist · 2026-04-06T15:43:45Z

tutorial/example_deep_finance/README.md

-| `EBTU_WEIGHT`                 | 0.0    | Evidence traceability weight (optional)   |
-| `AUDIT_WEIGHT`                | 0.0    | Reference logic audit weight (optional)   |
+```bash
+bash AgentJet/tutorial/example_deep_finance/deep_finance.sh


The path in this command is incorrect. The preceding instructions have the user cd into the AgentJet directory. Therefore, the AgentJet/ prefix in the path is redundant and will cause the command to fail.

Suggested change

bash AgentJet/tutorial/example_deep_finance/deep_finance.sh

bash tutorial/example_deep_finance/deep_finance.sh

gemini-code-assist · 2026-04-06T15:43:45Z

tutorial/example_deep_finance/yaml_template/deep_finance_openjudge_template.yaml

+# ------------------ OpenJudge Finance configuration ------------------
+# Note: Finance evaluation now uses the OpenJudge FinanceCompositionEvaluator
+# finance_llm can be configured separately; if unset, openjudge_llm is reused
+ajet:
+  project_name: "{{PREFIX}}"
+  experiment_name: "{{SUFFIX}}"
+  # Judge configuration (nested structure, maps to self.config.ajet.judge.*)
+  judge:
+    openjudge_llm: {{OPENJUDGE_LLM}}     # OpenJudge model (for general evaluation)
+    finance_llm: {{FINANCE_LLM}}         # Dedicated model for Finance evaluation (optional; leave empty to reuse openjudge_llm)
+    concurrency: {{JUDGE_CONCURRENCY}}   # Judge concurrency
+    train_ref_ans_path: {{TRAIN_REF_ANS_PATH}}   # Path to the training-set reference answers
+    val_ref_ans_path: {{VAL_REF_ANS_PATH}}       # Path to the validation-set reference answers
+  # Weight configuration
+  # rm_weight: Finance evaluation weight (uses FinanceCompositionEvaluator; supports stock_analysis/industry/macro/event/search)
+  rm_weight: {{RM_WEIGHT}}
+  presentation_quality_weight: {{PRESENTATION_QUALITY_WEIGHT}}   # Report presentation quality evaluation
+  grounding_weight: {{GROUNDING_WEIGHT}}                         # Citation format compliance evaluation
+  cgcv_weight: {{CGCV_WEIGHT}}                                   # Citation-Grounded Claim Verification
+  audit_weight: {{AUDIT_WEIGHT}}                                 # Reference logic audit
+  traceability_weight: {{TRACEABILITY_WEIGHT}}                   # Traceability/verifiability audit (TVR)
+  ebtu_weight: {{EBTU_WEIGHT}}                                   # EBTU evidence-first traceability audit


This YAML template file appears to be unused. The training scripts (deep_finance.sh and deep_finance_single.sh) use deepfinance_template.yaml instead. Furthermore, this file contains placeholders like {{CGCV_WEIGHT}}, {{TRACEABILITY_WEIGHT}}, and {{EBTU_WEIGHT}} which are no longer defined or substituted in the shell scripts. If this file were to be used, it would cause a configuration parsing error. To avoid confusion and prevent future errors, it's best to remove this file from the repository.
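
If the template is kept instead, a guard in the training scripts could catch leftover placeholders before the config is parsed. This is a sketch, not code from the PR: `find_unsubstituted` and `check_rendered` are hypothetical helpers, and the `{{UPPER_CASE}}` pattern is assumed from the placeholders visible in the template above.

```bash
# Hypothetical helper: list distinct {{NAME}} placeholders still present in a file.
find_unsubstituted() {
  # grep -o prints only the matched text; grep exits 1 when nothing matches,
  # which is not an error here, hence the trailing "|| true".
  grep -o '{{[A-Z_]*}}' "$1" | sort -u || true
}

# Hypothetical helper: fail if the rendered config still contains placeholders.
check_rendered() {
  leftovers=$(find_unsubstituted "$1")
  if [ -n "$leftovers" ]; then
    echo "unsubstituted placeholders in $1:" >&2
    echo "$leftovers" >&2
    return 1
  fi
  return 0
}

# Possible use after the script renders the YAML (path is illustrative):
# check_rendered "${RENDERED_CONFIG}" || exit 1
```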

- Revise AgentJet installation steps with detailed commands and environment setup
- Add installation and startup guide for Finance MCP service with API key notes
- Organize README sections: dependencies, service startup, environment variables, training
- Add and standardize MAX_RESPONSE_LENGTH variable in deep_finance.sh and deep_finance_single.sh
- Improve script root directory detection and default variable settings
- Modify YAML template to use dynamic max_response_length configuration value

TaoShuchang added 5 commits February 28, 2026 15:26

refactor(deep_finance): remove DeepFinanceJudgeByOpenJudge implementation

ffb1f80

feat(deepfinance): add blog and readme

87d2184

Merge branch 'main' into dev/shuchang_newjudge

4ccc006

gemini-code-assist bot reviewed Apr 6, 2026

TaoShuchang added 6 commits April 6, 2026 23:44

chore: fix bug

355afc1

chore: add a little readme

b469576

chore(deepfinance): change name to README

64ffc4e

chore(deepfinance): change a little readme

7be7d6d

docs(tutorial): fix env file name in example_deep_finance README

88abfc1

TaoShuchang wants to merge 11 commits into main from dev/shuchang_newjudge
