feat: PoC for Multi Turn agent loop by bxyu-nvidia · Pull Request #1023 · NVIDIA-NeMo/Gym

bxyu-nvidia · 2026-04-07T20:27:36Z

No description provided.

Provides information and background on how llm-as-a-judge works, when to use it, and a brief walkthrough.
---------
Signed-off-by: Frankie Siino <fsiino@nvidia.com>

Adds a new resources server for RDKit-based chemistry verification tasks. Includes a sandbox launcher for sandboxed code execution, a YAML config, example data, and tests. Made with [Cursor](https://cursor.com)
---------
Signed-off-by: Dane Corneil <dcorneil@nvidia.com>
Co-authored-by: Christian Munley <cmunley@nvidia.com>

mmlu-pro: https://wandb.ai/nvidia/fsiino-gym-dev/runs/mi6p08ns 83.90957446808511
mmlu-prox: https://wandb.ai/nvidia/fsiino-gym-dev/runs/fxhaochj 70.33903109674858
---------
Signed-off-by: Frankie Siino <fsiino@nvidia.com>

1. Multi-process benchmark preparation
2. Various benchmark data preparation refactors
3. Refactor Nemotron 3 Ultra to be easier to use
4. Delete key directive
5. Improve dummy model config handling
6. Print config yaml when erroring on almost servers
7. Try fix progress bar print with tqdm (not successful)
8. Improve broken pipe print and handling behavior
9. Adopt Cascade eval numpy.isclose for float comparison
10. LocalVLLMModel accepts py_executable
11. LocalVLLMModelProxy
---------
Signed-off-by: Brian Yu <bxyu@nvidia.com>
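For reference, item 9's tolerance-based float comparison can be sketched roughly as below. This is only an illustration of how `numpy.isclose` behaves, not the code in this change: the helper name `floats_match` and the tolerance values (NumPy's documented defaults) are assumptions, and the fallback branch just restates NumPy's documented formula so the snippet still runs where NumPy is not installed.

```python
try:
    import numpy as np
except ImportError:  # keep the sketch runnable without NumPy
    np = None


def floats_match(predicted: float, expected: float,
                 rtol: float = 1e-5, atol: float = 1e-8) -> bool:
    """Hypothetical helper: tolerance-based comparison of two finite floats."""
    if np is not None:
        # numpy.isclose tests |predicted - expected| <= atol + rtol * |expected|
        return bool(np.isclose(predicted, expected, rtol=rtol, atol=atol))
    # Same documented formula, written out by hand for finite scalars.
    return abs(predicted - expected) <= atol + rtol * abs(expected)


# Exact equality fails on rounding noise; the tolerance check does not.
print(0.1 + 0.2 == 0.3)
print(floats_match(0.1 + 0.2, 0.3))
```

Note that the check is asymmetric: the relative tolerance scales with the second argument, so the reference answer should be passed as `expected`.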

@esarafian

… modes (#1003)
- Added a new NVARC resource server that supports two agent modes: transductive (outputs grid directly) and inductive (outputs Python code).
- Implemented necessary configurations and request/response models for both modes.
- Included a subprocess sandbox for executing Python code safely.
- Added example datasets and a .gitignore for data files.
- Comprehensive unit tests for grid parsing and code execution.
Signed-off-by: Elad Sarafian <esarafian@nvidia.com>
dataset: https://gitlab-master.nvidia.com/fsoares/post-training-data-processing/-/issues/103
Reopening #989 for @esarafian due to branch renaming.
---------
Signed-off-by: Elad Sarafian <esarafian@nvidia.com>
Co-authored-by: Elad Sarafian <esarafian@nvidia.com>
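As a rough illustration of the subprocess-sandbox idea mentioned above for the inductive mode, a minimal sketch might look like the following. It is not the NVARC server's implementation: `run_candidate`, the file name `candidate.py`, and the timeout value are all hypothetical, and a child process with a timeout only bounds runtime and keeps the parent alive; it is not real isolation.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_candidate(code: str, timeout_s: float = 5.0) -> tuple[bool, str]:
    """Hypothetical helper: run model-written Python in a child process.

    Returns (succeeded, output), where output is stdout on success,
    stderr on a non-zero exit, and "timeout" if the time limit was hit.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "candidate.py"
        script.write_text(code)
        try:
            proc = subprocess.run(
                [sys.executable, "-I", str(script)],  # -I: isolated mode
                cwd=workdir,            # scratch directory, removed afterwards
                capture_output=True,
                text=True,
                timeout=timeout_s,      # child is killed when this expires
            )
        except subprocess.TimeoutExpired:
            return False, "timeout"
    if proc.returncode != 0:
        return False, proc.stderr
    return True, proc.stdout


ok, output = run_candidate("print([[1, 0], [0, 1]])")
print(ok, output.strip())
```

A production sandbox would typically add resource limits and filesystem or network restrictions on top of this.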

Signed-off-by: Brian Yu <bxyu@nvidia.com>

copy-pr-bot · 2026-04-07T20:27:40Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Signed-off-by: Brian Yu <bxyu@nvidia.com>

fsiino-nvidia and others added 9 commits April 3, 2026 12:45

docs: llm-as-a-judge (#926)

9ca3900


feat: mmlu_pro and mmlu_prox benchmarks (#988)

b7b3398


use 1 repeat

ab46827

Signed-off-by: Brian Yu <bxyu@nvidia.com>

3 repeats

6dee9bd

Signed-off-by: Brian Yu <bxyu@nvidia.com>

commit

b556084

Signed-off-by: Brian Yu <bxyu@nvidia.com>

try template

81eb031

Signed-off-by: Brian Yu <bxyu@nvidia.com>

bxyu-nvidia added 3 commits April 7, 2026 13:37

add something about getting state

fe61847

Signed-off-by: Brian Yu <bxyu@nvidia.com>

maybe union for user

cee0a75

Signed-off-by: Brian Yu <bxyu@nvidia.com>

start add config

7ccc2b0

Signed-off-by: Brian Yu <bxyu@nvidia.com>

feat: PoC for Multi Turn agent loop #1023
bxyu-nvidia wants to merge 12 commits into cwing/multi-turn-agent from bxyu/eval-dev
