Commit ef2e6d8

Merge remote-tracking branch 'origin/main' into kajalj/transformers-5x
2 parents eea73d0 + a645786 commit ef2e6d8

26 files changed

Lines changed: 3186 additions & 468 deletions
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
# Finance SEC Search

50-question financial information retrieval benchmark from the
[Vals AI finance-agent](https://github.com/vals-ai/finance-agent) public
dataset. Questions cover SEC EDGAR filings, financial metrics, and
company analysis.

## Verification

Answers are graded by an LLM judge against a financial grading rubric (0/1/2 scale).
Only fully correct answers (verdict `[[2]]`) receive a reward of 1.0. The judge
prompt and rubric are defined in the `finance_sec_search` resource
server's `/prompt_templates`.
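The verdict-to-reward rule above can be illustrated with a minimal sketch. This is not the resource server's actual parser: the function name, the regular expression, and the handling of missing or repeated verdicts are assumptions made for illustration. Only the `[[2]]` to 1.0 mapping comes from the text.

```python
import re

# Matches a bracketed judge verdict on the 0/1/2 scale, e.g. "[[2]]".
VERDICT_PATTERN = re.compile(r"\[\[([012])\]\]")


def reward_from_judge_output(judge_text: str) -> float:
    """Return 1.0 only when the judge's verdict is [[2]], otherwise 0.0."""
    verdicts = VERDICT_PATTERN.findall(judge_text)
    if not verdicts:
        # Assumption: a response with no parseable verdict earns no reward.
        return 0.0
    # Assumption: if several verdicts appear, the last one is treated as final.
    return 1.0 if verdicts[-1] == "2" else 0.0
```

Under these assumptions a partially correct answer (`[[1]]`) is scored the same as an incorrect one, which matches the strict "fully correct only" rule described above.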

## Tools

| Tool | Description |
|------|-------------|
| `sec_filing_search` | Search SEC EDGAR for filing metadata by stock ticker symbol |
| `parse_html_page` | Fetch and parse any HTML page (SEC URLs use disk cache), store under a key |
| `retrieve_information` | Query stored documents via LLM prompt with `{{key}}` placeholders |
| `submit_final_result` | Submit the final answer (required to receive a reward) |
| `web_search` | Internet search via Tavily API (optional; requires `tavily_api_key` in `env.yaml`) |
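To make the `{{key}}` placeholder idea concrete, here is a rough sketch of how a prompt could be expanded with documents previously stored by `parse_html_page`. The function name, the in-memory `store` dictionary, and the error raised for unknown keys are hypothetical. The real tool's storage and substitution logic may differ.

```python
import re

# Matches placeholders of the form {{key}} where key is letters, digits, or underscores.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_placeholders(prompt: str, store: dict[str, str]) -> str:
    """Replace each {{key}} in the prompt with the document stored under that key."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in store:
            # Assumption: referencing an unknown key is an error, not a silent no-op.
            raise KeyError(f"no stored document under key {key!r}")
        return store[key]

    return PLACEHOLDER.sub(substitute, prompt)
```

For example, with a document stored under the hypothetical key `aapl_10k`, the prompt `Summarize {{aapl_10k}}` would be sent to the LLM with the stored text in place of the placeholder.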

## Data preparation

Without web search:

```bash
ng_prepare_benchmark '+config_paths=[benchmarks/finance_sec_search/config_no_web_search.yaml]'
```

With web search (requires `tavily_api_key` in `env.yaml`):

```bash
ng_prepare_benchmark '+config_paths=[benchmarks/finance_sec_search/config_web_search.yaml]'
```

Either command downloads `public.csv` from the Vals AI GitHub repo and writes the
benchmark JSONL to `benchmarks/finance_sec_search/data/`.

| Config | Output file (relative to `benchmarks/finance_sec_search/`) |
|--------|-------------|
| `config_no_web_search.yaml` | `data/finance_sec_search_benchmark.jsonl` |
| `config_web_search.yaml` | `data/finance_sec_search_benchmark_web_search.jsonl` |
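The CSV-to-JSONL step can be pictured with the sketch below. It is not the repository's `prepare.py`: the function name is hypothetical, and it simply emits one JSON object per CSV row with the columns unchanged, whereas the real script presumably reshapes each row into the request format the agent expects.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text: str) -> str:
    """Convert CSV text with a header row into JSONL, one object per data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each row becomes a dict keyed by the CSV header; all values stay strings.
    lines = [json.dumps(row, ensure_ascii=False) for row in reader]
    # End with a newline only when at least one row was written.
    return "\n".join(lines) + ("\n" if lines else "")
```

Using `csv.DictReader` rather than splitting on commas keeps quoted fields intact, which matters for questions that contain commas.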

## Running servers

```bash
config_paths="responses_api_models/vllm_model/configs/vllm_model.yaml,\
benchmarks/finance_sec_search/config_no_web_search.yaml"
ng_run "+config_paths=[$config_paths]"
```

## Collecting rollouts

```bash
ng_collect_rollouts \
    +agent_name=finance_sec_search_benchmark_agent \
    +input_jsonl_fpath=benchmarks/finance_sec_search/data/finance_sec_search_benchmark.jsonl \
    +output_jsonl_fpath=results/finance_sec_search_rollouts.jsonl
```
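Once rollouts are collected, a benchmark score can be computed from the output file. The sketch below assumes each JSONL record carries a top-level numeric `reward` field. That field name and the function name are assumptions, so check a record from your own output before relying on them.

```python
import json
from typing import Iterable


def mean_reward(jsonl_lines: Iterable[str]) -> float:
    """Average the `reward` field over all non-empty JSONL lines."""
    rewards = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # Skip blank lines, e.g. a trailing newline at end of file.
        record = json.loads(line)
        # Assumption: each rollout record has a top-level numeric "reward".
        rewards.append(float(record["reward"]))
    if not rewards:
        raise ValueError("no rollouts found")
    return sum(rewards) / len(rewards)
```

Because file objects iterate line by line, an open handle on `results/finance_sec_search_rollouts.jsonl` can be passed directly. With rewards restricted to 0.0 and 1.0, the mean is the fraction of questions answered fully correctly.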
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
# Chain to existing finance_sec_search resource server + agent config
config_paths:
  - resources_servers/finance_sec_search/configs/finance_sec_search.yaml

# Defaults for judge model interpolation variables from finance_sec_search.yaml.
# Override via env.yaml or CLI when running actual evaluations.
search_judge_model_base_url: https://api.openai.com/v1
search_judge_model_api_key: ""
search_judge_model_name: gpt-4o

# Isolated copy of the resource server for this benchmark
finance_sec_search_benchmark_resources_server:
  _inherit_from: finance_sec_search_resources_server

# Benchmark agent: inherits from finance_agent, overrides datasets
finance_sec_search_benchmark_agent:
  _inherit_from: finance_agent
  responses_api_agents:
    finance_agent:
      resources_server:
        name: finance_sec_search_benchmark_resources_server
      datasets:
        - name: finance_sec_search
          type: benchmark
          jsonl_fpath: benchmarks/finance_sec_search/data/finance_sec_search_benchmark.jsonl
          prompt_config: null
          prepare_script: benchmarks/finance_sec_search/prepare.py
          num_repeats: 1
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
# Chain to existing finance_sec_search resource server + agent config
config_paths:
  - resources_servers/finance_sec_search/configs/finance_sec_search.yaml

# Defaults for judge model interpolation variables from finance_sec_search.yaml.
# Override via env.yaml or CLI when running actual evaluations.
search_judge_model_base_url: https://api.openai.com/v1
search_judge_model_api_key: ""
search_judge_model_name: gpt-4o

# Isolated copy of the resource server for this benchmark
finance_sec_search_web_search_benchmark_resources_server:
  _inherit_from: finance_sec_search_resources_server

# Benchmark agent: inherits from finance_agent, overrides datasets
# Uses web_search variant (requires tavily_api_key in env.yaml)
finance_sec_search_web_search_benchmark_agent:
  _inherit_from: finance_agent
  responses_api_agents:
    finance_agent:
      resources_server:
        name: finance_sec_search_web_search_benchmark_resources_server
      datasets:
        - name: finance_sec_search
          type: benchmark
          jsonl_fpath: benchmarks/finance_sec_search/data/finance_sec_search_benchmark_web_search.jsonl
          prompt_config: null
          prepare_script: benchmarks/finance_sec_search/prepare_web_search.py
          num_repeats: 1
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
*.jsonl
