Add GRL Sokoban Resource Server#564

Open
yixinhuang48 wants to merge 5 commits into NVIDIA-NeMo:main from yixinhuang48:feature/grl-sokoban-final

@yixinhuang48
Collaborator

@yixinhuang48 yixinhuang48 commented Jan 9, 2026

GymnasiumServer Refactor Notes (PR #1072 alignment)

  • Refactored resources_servers/grl_sokoban to follow the resources_servers/base_gymnasium.GymnasiumServer lifecycle (/reset, /step), with session-aware env metadata handling.
  • Updated resources_servers/grl_sokoban/configs/grl_sokoban.yaml to use responses_api_agents/gymnasium_agent.
  • Updated the example prompt and generator to use the <action>...</action> format, and generated example rollouts at resources_servers/grl_sokoban/data/example_rollouts.jsonl (5 rows).
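
For readers unfamiliar with the <action>...</action> format, here is a minimal sketch of how an agent could pull moves out of a model reply. It is an illustration only, not the code in this PR: the helper name `extract_actions`, the set of move names (only "Right" appears in this thread), and the choice to read the last tag in the reply are all assumptions.

```python
import re

# Hypothetical helper, not taken from this PR. The PR only states that
# prompts use the <action>...</action> format.
ACTION_TAG = re.compile(r"<action>(.*?)</action>", re.IGNORECASE | re.DOTALL)

# Assumed move names; only "Right" is shown in this conversation.
VALID_ACTIONS = {"Up", "Down", "Left", "Right"}


def extract_actions(model_text: str) -> list[str]:
    """Return the recognized moves inside the last <action> tag, in order."""
    matches = ACTION_TAG.findall(model_text)
    if not matches:
        return []  # no tag found: the caller decides how to handle this
    raw = matches[-1]  # assumption: the final tag holds the model's decision
    # Accept comma- or whitespace-separated moves inside one tag.
    candidates = [part for part in re.split(r"[,\s]+", raw) if part]
    # Unrecognized tokens are dropped rather than raising an error.
    return [move for move in candidates if move in VALID_ACTIONS]
```

Reading the last tag is a design choice that tolerates replies where the model reasons about earlier candidate moves before committing to one.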

@copy-pr-bot

copy-pr-bot Bot commented Jan 9, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yixinhuang48
Collaborator Author

yixinhuang48 commented Jan 9, 2026

@cmunley1 @bxyu-nvidia this is the updated PR for the one that I closed (#261).

@cmunley1
Contributor

cmunley1 commented Jan 9, 2026

Can you reupload the dataset generation script, or point to a Hugging Face dataset in the README? I removed it by accident on my branch.

otherwise this looks pretty good to me

@yixinhuang48
Collaborator Author

yixinhuang48 commented Jan 9, 2026

Just uploaded the dataset generation script. It includes several configurations of room size and box count for reward profiling, but for training validation I only used the 6x6 room configuration with 1 box.

@cmunley1
Contributor

cmunley1 commented Jan 9, 2026

Can you regenerate example.jsonl so it includes this instruction from your latest dataset generation script? "IMPORTANT: First call the step tool with an empty array [] to see the initial puzzle state. Example: step({"actions": []})"

I run the existing example and get a response like
"text": "I cannot solve the Sokoban puzzle without specific tool observations or the initial state of the puzzle. Please provide the details of the puzzle (e.g., layout, box positions, player position) so I can assist further.",

Also, with the updated example, I can run rollouts that look okay, but I do get lots of these messages in the ng_run logs. Do you see this too, and is it expected?


INFO:     127.0.0.1:56362 - "POST /step HTTP/1.1" 422 Unprocessable Entity
Hit validation exception! Errors: [
    {
        "type": "model_attributes_type",
        "loc": [
            "body"
        ],
        "msg": "Input should be a valid dictionary or object to extract fields from",
        "input": [
            "Right"
        ]
    }
]
Full body: [
    "Right"
]

@cmunley1 cmunley1 requested a review from bxyu-nvidia January 9, 2026 22:34
@cmunley1
Contributor

cmunley1 commented Jan 9, 2026

@bxyu-nvidia can you please look at the simple_agent changes? I tested offline rollouts with other resources servers and they seem unaffected, but I want to make sure with you.

@yixinhuang48 yixinhuang48 force-pushed the feature/grl-sokoban-final branch 2 times, most recently from f1b3306 to a19c376 January 10, 2026 00:07
@yixinhuang48
Collaborator Author

@cmunley1 I've updated the example.jsonl and example_rollouts.jsonl files, and fixed the rollout message issue we encountered.

Problem

The /step endpoint was returning 422 Unprocessable Entity errors when the model sent actions as a bare array, ["Right"], instead of the expected {"actions": ["Right"]} object. Those requests failed validation, so the corresponding rollout steps failed.

Solution

Updated the /step endpoint in app.py to handle both formats:

  • {"actions": ["Right"]} (expected format)
  • ["Right"] (array-only format that was causing errors)

The endpoint now detects which format it received and normalizes it to the expected structure, so it no longer rejects requests where the model sends just the array. This should eliminate the 422 errors you were seeing in the ng_run logs.
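
As a rough illustration of the normalization described above, here is a minimal standalone sketch. It is not the code from app.py: the function name `normalize_step_body` and the error handling are assumptions, and the real endpoint may validate differently.

```python
from typing import Any


def normalize_step_body(body: Any) -> dict[str, list[str]]:
    """Coerce a /step request body into the {"actions": [...]} shape.

    Hypothetical helper for illustration; the name and the error
    behavior are assumptions, not taken from the PR.
    """
    # Array-only format, e.g. ["Right"]: wrap it in the expected object.
    if isinstance(body, list):
        body = {"actions": body}

    if not isinstance(body, dict) or "actions" not in body:
        raise ValueError('expected {"actions": [...]} or a bare list of actions')

    actions = body["actions"]
    if not isinstance(actions, list) or not all(isinstance(a, str) for a in actions):
        raise ValueError('"actions" must be a list of strings')

    # Return a fresh object so the caller's payload is not mutated.
    return {"actions": list(actions)}
```

With this shape, both normalize_step_body(["Right"]) and normalize_step_body({"actions": ["Right"]}) yield the same object, and an empty array stays valid, which matters for the initial step({"actions": []}) call mentioned earlier in the thread.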

cmunley1
cmunley1 previously approved these changes Jan 10, 2026
Contributor

@cmunley1 cmunley1 left a comment

Looks alright to me. I tested offline rollouts and they seem OK.

Let's at least wait for @bxyu-nvidia to review the simple_agent changes.

@cmunley1 cmunley1 added the resources-server Resources servers (math, code, etc.) label Jan 10, 2026
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>
@cmunley1
Contributor

I'm considering refactoring this PR to fit into GymnasiumServer (#1072). Let me know if you are interested in helping; otherwise I may open a PR based on this one.

@yixinhuang48
Collaborator Author

Yeah I can help out

Signed-off-by: Yixin Huang <yixinhuang@Yixins-MacBook-Pro-2.local>
@yixinhuang48 yixinhuang48 force-pushed the feature/grl-sokoban-final branch from d2796f4 to b8f2a16 April 15, 2026 15:25
@yixinhuang48
Collaborator Author

@cmunley1 just refactored this PR to fit into the GymnasiumServer

@yixinhuang48 yixinhuang48 requested a review from cmunley1 April 15, 2026 15:33
Labels

resources-server Resources servers (math, code, etc.)

2 participants