feat: example multi turn env by cmunley1 · Pull Request #1332 · NVIDIA-NeMo/Gym

cmunley1 · 2026-05-15T05:10:53Z

Example multi-turn environment with scripted follow-ups, where each question depends on the previous turn. Uses the Gymnasium interface (reset/step).

example input

{
  "responses_create_params": {
    "input": [
      {
        "role": "user",
        "content": "What programming language was created by Guido van Rossum?"
      }
    ]
  },
  "follow_ups": [
    "In what year was it first released?"
  ],
  "expected_answer": "1991",
  "agent_ref": {
    "type": "responses_api_agents",
    "name": "example_multi_turn_gymnasium_agent"
  }
}
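A row in this shape could be assembled and sanity-checked in Python before it is written to a dataset file. This is only a sketch: the helper name make_row and the one-JSON-object-per-line layout are assumptions for illustration, not something this PR defines. The empty-answer guard reflects the reward check in the design, which yields 0.0 whenever expected_answer is empty.

```python
import json


def make_row(question: str, follow_ups: list[str], expected_answer: str) -> dict:
    """Build one example row in the shape shown above (hypothetical helper)."""
    if not expected_answer:
        # With an empty expected answer the reward check can never succeed.
        raise ValueError("expected_answer must be non-empty")
    return {
        "responses_create_params": {
            "input": [{"role": "user", "content": question}],
        },
        # Scripted questions, sent one per turn after the initial question.
        "follow_ups": list(follow_ups),
        "expected_answer": expected_answer,
        "agent_ref": {
            "type": "responses_api_agents",
            "name": "example_multi_turn_gymnasium_agent",
        },
    }


row = make_row(
    "What programming language was created by Guido van Rossum?",
    ["In what year was it first released?"],
    "1991",
)
line = json.dumps(row)  # one JSON object per line (assumed file layout)
print(line)
```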

design

class ExampleMultiTurnEnv(GymnasiumServer):
    # Number of follow-ups already sent, keyed by session.
    session_turns: Dict[str, int] = Field(default_factory=dict)

    async def reset(self, metadata: dict, session_id: Optional[str] = None) -> tuple[Optional[str], dict]:
        """Returns (observation, info)."""
        self.session_turns[session_id] = 0
        return None, {}

    async def step(
        self, action: NeMoGymResponse, metadata: dict, session_id: Optional[str] = None
    ) -> tuple[Optional[str], float, bool, bool, dict]:
        """Returns (observation, reward, terminated, truncated, info)."""
        follow_ups = metadata.get("follow_ups", [])
        turn = self.session_turns.get(session_id, 0)

        # Follow-ups remain: send the next one as the observation, with no reward yet.
        if turn < len(follow_ups):
            self.session_turns[session_id] = turn + 1
            return follow_ups[turn], 0.0, False, False, {}

        # All follow-ups sent: score only this final response and end the episode.
        expected = metadata.get("expected_answer", "")
        text = extract_text(action)
        # Case-insensitive substring match; an empty expected answer always scores 0.0.
        reward = 1.0 if expected and expected.lower() in text.lower() else 0.0
        return None, reward, True, False, {}
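To make the turn logic concrete, here is a minimal, self-contained simulation of one episode. It is a sketch under stated assumptions: ToyMultiTurnEnv is a stand-in that mirrors the logic above but takes plain strings as actions instead of NeMoGymResponse, and rollout with its canned answers is a hypothetical driver, not the agent used by this PR.

```python
import asyncio
from typing import Optional


class ToyMultiTurnEnv:
    """Stand-in for the env above; actions are plain strings (assumption)."""

    def __init__(self) -> None:
        self.session_turns: dict[str, int] = {}

    async def reset(self, metadata: dict, session_id: str) -> tuple[Optional[str], dict]:
        self.session_turns[session_id] = 0
        return None, {}

    async def step(self, action: str, metadata: dict, session_id: str):
        follow_ups = metadata.get("follow_ups", [])
        turn = self.session_turns.get(session_id, 0)
        if turn < len(follow_ups):
            # More scripted questions remain: send the next one, no reward yet.
            self.session_turns[session_id] = turn + 1
            return follow_ups[turn], 0.0, False, False, {}
        # Final turn: case-insensitive substring match on the expected answer.
        expected = metadata.get("expected_answer", "")
        reward = 1.0 if expected and expected.lower() in action.lower() else 0.0
        return None, reward, True, False, {}


async def rollout(env, metadata: dict, answers: list[str], session_id: str = "s1"):
    """Feed canned answers (a hypothetical policy) until the episode ends."""
    await env.reset(metadata, session_id)
    trace = []
    for answer in answers:
        obs, reward, terminated, truncated, _ = await env.step(answer, metadata, session_id)
        trace.append((obs, reward, terminated))
        if terminated or truncated:
            break
    return trace


metadata = {
    "follow_ups": ["In what year was it first released?"],
    "expected_answer": "1991",
}
answers = ["Python", "It was first released in 1991."]
trace = asyncio.run(rollout(ToyMultiTurnEnv(), metadata, answers))
print(trace)
```

With one follow-up, the first step returns the follow-up question with reward 0.0, and the second step terminates with the scored reward. Under this logic only the final answer is scored; earlier answers just advance the turn counter.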

Signed-off-by: cmunley1 <cmunley@nvidia.com>

Signed-off-by: Christian Munley <cmunley@nvidia.com>

copy-pr-bot · 2026-05-15T05:10:56Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


cmunley1 added 2 commits May 14, 2026 22:05

init

2c7a0e7

Signed-off-by: cmunley1 <cmunley@nvidia.com>

rename

9a0f29a

Signed-off-by: Christian Munley <cmunley@nvidia.com>

cmunley1 added 5 commits May 14, 2026 22:12

desc+val

260403a

Signed-off-by: Christian Munley <cmunley@nvidia.com>

update rollouts

a59ecc3

Signed-off-by: cmunley1 <cmunley@nvidia.com>

rename and docs

1b59383

Signed-off-by: cmunley1 <cmunley@nvidia.com>

readme

8892527

Signed-off-by: cmunley1 <cmunley@nvidia.com>

readme

05f06ee

Signed-off-by: cmunley1 <cmunley@nvidia.com>

cwing-nvidia added the "usability" label (improvements to user experience) May 18, 2026

cmunley1 requested review from adil-a and ananthsub May 20, 2026 05:01

Merge branch 'main' into cmunley1/example_multi_turn_gymnasium

bfe8f6f


cmunley1 wants to merge 8 commits into main from cmunley1/example_multi_turn_gymnasium


2 participants
