Add GRL Sokoban Resource Server #564
297a7f9 to
5491f8d
|
@cmunley1 @bxyu-nvidia this is the updated PR for the one that I closed (#261). |
|
can you reupload the dataset generation script or point to a huggingface dataset in the readme? I removed this by accident on my branch. otherwise this looks pretty good to me |
|
Just uploaded the dataset generation script. It includes several configurations of room size and box count for reward profiling, but for the training validation I only used a 6x6 room with 1 box. |
|
can you regenerate example.jsonl so it includes this instruction in your latest dataset generation script? "IMPORTANT: First call the [...]
I run the existing example and get a response like [...]
Also, with the updated example, I can run rollouts that look okay, but I do get lots of these messages in the ng_run logs [...]. Do you see this too? Is it expected? |
|
@bxyu-nvidia can you please look at the simple_agent changes? I tested offline rollouts with other resource servers and they seem unaffected, but I want to make sure with you. |
f1b3306 to
a19c376
|
@cmunley1 I've updated the [...]
Problem
The [...]
Solution
Updated the [...]
The endpoint now automatically detects the format and normalizes it to the expected structure, so it also handles cases where the model sends just the array directly. This should eliminate the 422 errors you were seeing in the |
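A minimal sketch of the kind of normalization described above, not the code in this PR. It assumes the expected structure is an object wrapping a list under a single key; the key name `actions` and the function name `normalize_actions_payload` are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict


def normalize_actions_payload(body: Any, key: str = "actions") -> Dict[str, Any]:
    """Return the wrapped form {key: [...]} for either accepted input shape.

    Hypothetical helper: `key` stands in for whatever field the real
    endpoint expects; it is not taken from the PR.
    """
    # Case 1: the model sent the bare array, so wrap it.
    if isinstance(body, list):
        return {key: body}
    # Case 2: the payload already has the expected structure.
    if isinstance(body, dict) and isinstance(body.get(key), list):
        return body
    # Anything else is still invalid and should be rejected by the caller.
    raise ValueError(f"expected a list or an object with a list under {key!r}")
```

Normalizing before validation means both shapes reach the same validation path, so a bare array would no longer be rejected as an unprocessable request.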
cmunley1
left a comment
looks alright to me, tested offline rollouts, seems ok.
let's wait for @bxyu-nvidia to review the simple_agent changes, at least
|
I'm considering refactoring this PR to fit into GymnasiumServer (#1072). Let me know if you're interested in helping; otherwise I may open a PR based on this one. |
|
Yeah I can help out |
055318d to
d2796f4
Signed-off-by: Yixin Huang <yixinhuang@Yixins-MacBook-Pro-2.local>
d2796f4 to
b8f2a16
|
@cmunley1 just refactored this PR to fit into the GymnasiumServer |
GymnasiumServer Refactor Notes (PR #1072 alignment)
- resources_servers/grl_sokoban to resources_servers/base_gymnasium.GymnasiumServer lifecycle (/reset, /step) and session-aware env metadata handling.
- resources_servers/grl_sokoban/configs/grl_sokoban.yaml to use responses_api_agents/gymnasium_agent.
- <action>...</action> format and generated example rollouts at resources_servers/grl_sokoban/data/example_rollouts.jsonl (5 rows).
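For readers unfamiliar with the `<action>...</action>` convention mentioned in the notes, here is a rough sketch of how an action could be pulled out of a model response. It is an illustration only: the function name `extract_action` is hypothetical, and taking the last tag when several appear is an assumption, not necessarily what the server in this PR does.

```python
import re
from typing import Optional

# Non-greedy match so that several tags in one response are captured separately.
_ACTION_RE = re.compile(r"<action>(.*?)</action>", re.DOTALL | re.IGNORECASE)


def extract_action(text: str) -> Optional[str]:
    """Return the content of the last <action>...</action> tag, or None.

    Hypothetical helper: choosing the last tag is an assumption made for
    this sketch (the model may reason first and commit to a move at the end).
    """
    matches = _ACTION_RE.findall(text)
    if not matches:
        return None
    # Treat an empty tag the same as a missing one.
    return matches[-1].strip() or None
```

A response with no tag returns `None`, which lets the caller decide whether to treat the step as invalid rather than guessing a move.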