
feat: PoC for Multi Turn agent loop #1023

Draft
bxyu-nvidia wants to merge 12 commits into cwing/multi-turn-agent from bxyu/eval-dev

@bxyu-nvidia
Contributor

No description provided.

fsiino-nvidia and others added 9 commits April 3, 2026 12:45
Provides information and background on how llm-as-a-judge works, when to
use it, and a brief walkthrough.

---------

Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Adds a new resources server for RDKit-based chemistry verification
tasks. Includes a sandbox launcher for sandboxed code execution, a YAML
config, example data, and tests.

Made with [Cursor](https://cursor.com)

---------

Signed-off-by: Dane Corneil <dcorneil@nvidia.com>
Co-authored-by: Christian Munley <cmunley@nvidia.com>
mmlu-pro: https://wandb.ai/nvidia/fsiino-gym-dev/runs/mi6p08ns
83.90957446808511

mmlu-prox: https://wandb.ai/nvidia/fsiino-gym-dev/runs/fxhaochj
70.33903109674858

---------

Signed-off-by: Frankie Siino <fsiino@nvidia.com>
1. Multi-process benchmark preparation
2. Various benchmark data preparation refactors
3. Refactor Nemotron 3 Ultra to be easier to use
4. Delete key directive
5. Improve dummy model config handling
6. Print config yaml when erroring on almost servers
7. Try to fix progress bar printing with tqdm (not successful)
8. Improve broken pipe print and handling behavior
9. Adopt Cascade eval numpy.isclose for float comparison
10. LocalVLLMModel accepts py_executable
11. LocalVLLMModelProxy
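
For item 9, a minimal sketch of tolerance-based float comparison with `numpy.isclose`. The helper name `answers_match`, the string fallback, and the tolerance defaults (NumPy's own) are assumptions for illustration, not the code in this PR.

```python
import numpy as np


def answers_match(predicted: str, expected: str,
                  rtol: float = 1e-5, atol: float = 1e-8) -> bool:
    """Hypothetical helper: compare two answers with a float tolerance.

    If both strings parse as floats, compare them with numpy.isclose;
    otherwise fall back to an exact match on the stripped strings.
    """
    try:
        p, e = float(predicted), float(expected)
    except ValueError:
        return predicted.strip() == expected.strip()
    # isclose checks |p - e| <= atol + rtol * |e|
    return bool(np.isclose(p, e, rtol=rtol, atol=atol))


# Exact equality fails here because of binary rounding; isclose accepts it.
print(answers_match(str(0.1 + 0.2), "0.3"))
```

Tolerance-based comparison avoids marking an answer wrong only because of floating-point rounding, at the cost of having to choose `rtol` and `atol` per benchmark.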

---------

Signed-off-by: Brian Yu <bxyu@nvidia.com>
… modes (#1003)

- Added a new NVARC resource server that supports two agent modes:
transductive (outputs grid directly) and inductive (outputs Python
code).
- Implemented necessary configurations and request/response models for
both modes.
- Included a subprocess sandbox for executing Python code safely.
- Added example datasets and a .gitignore for data files.
- Comprehensive unit tests for grid parsing and code execution.
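
A minimal sketch of what running model-written Python in a subprocess can look like for the inductive mode. The function name `run_python_in_subprocess`, the timeout value, and the return shape are assumptions for illustration and do not reflect the actual NVARC server code. A child process with a timeout only contains crashes and hangs; it is not a security boundary by itself.

```python
import subprocess
import sys


def run_python_in_subprocess(code: str, timeout_s: float = 5.0) -> tuple[bool, str]:
    """Hypothetical helper: run `code` in a separate Python process.

    Returns (ok, text), where text is stdout on success, stderr on a
    non-zero exit, or the string "timeout" if the time limit was hit.
    """
    try:
        result = subprocess.run(
            # -I runs the child in isolated mode (ignores user site and env vars).
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, "timeout"
    if result.returncode != 0:
        return False, result.stderr.strip()
    return True, result.stdout.strip()


# The generated program prints its predicted grid; the caller parses stdout.
ok, out = run_python_in_subprocess("print([[1, 0], [0, 1]])")
print(ok, out)
```

Keeping execution in a child process means an exception or infinite loop in generated code cannot take down the server process itself.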

Signed-off-by: Elad Sarafian <esarafian@nvidia.com>

dataset:

https://gitlab-master.nvidia.com/fsoares/post-training-data-processing/-/issues/103

Reopening #989 for @esarafian due to branch renaming.

---------

Signed-off-by: Elad Sarafian <esarafian@nvidia.com>
Co-authored-by: Elad Sarafian <esarafian@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented Apr 7, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
4 participants