
feat: example multi turn env #1332

Open

cmunley1 wants to merge 8 commits into main from cmunley1/example_multi_turn_gymnasium

cmunley1 (Contributor) commented May 15, 2026

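Example multi-turn environment with scripted follow-ups, where each follow-up question builds on the previous turn (in the example below, "it" refers to the language named in the first answer). Uses the Gymnasium interface (`reset` / `step`).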

Example input

```json
{
  "responses_create_params": {
    "input": [
      {
        "role": "user",
        "content": "What programming language was created by Guido van Rossum?"
      }
    ]
  },
  "follow_ups": [
    "In what year was it first released?"
  ],
  "expected_answer": "1991",
  "agent_ref": {
    "type": "responses_api_agents",
    "name": "example_multi_turn_gymnasium_agent"
  }
}
```
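The record above implies a fixed turn schedule. The opening question comes from `responses_create_params.input`, each entry in `follow_ups` adds one more user turn, and only the reply to the last turn is graded against `expected_answer`. Here is a minimal sketch of that schedule, assuming the record is plain JSON shaped like the example (`agent_ref` omitted). The helper name `user_turns` is hypothetical and not part of this PR.

```python
import json

# Example record from this PR (agent_ref omitted for brevity).
RECORD = json.loads("""
{
  "responses_create_params": {
    "input": [
      {"role": "user", "content": "What programming language was created by Guido van Rossum?"}
    ]
  },
  "follow_ups": ["In what year was it first released?"],
  "expected_answer": "1991"
}
""")


def user_turns(record: dict) -> list[str]:
    """Hypothetical helper: the user messages an agent sees, in order."""
    # Opening turn(s): user messages from the initial Responses API input.
    opening = [
        msg["content"]
        for msg in record["responses_create_params"]["input"]
        if msg.get("role") == "user"
    ]
    # Each scripted follow-up is sent after the agent's previous reply.
    return opening + list(record.get("follow_ups", []))


turns = user_turns(RECORD)
print(len(turns))  # prints 2: one agent reply per user turn, only the last is graded
```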

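Design

```python
from typing import Dict, Optional

from pydantic import Field

# GymnasiumServer, NeMoGymResponse and extract_text are NeMo Gym names; their
# import paths are not shown in this PR.


class ExampleMultiTurnEnv(GymnasiumServer):
    # Number of follow-ups already sent, keyed by session ID.
    session_turns: Dict[str, int] = Field(default_factory=dict)

    async def reset(self, metadata: dict, session_id: Optional[str] = None) -> tuple[Optional[str], dict]:
        """Returns (observation, info)."""
        # No extra observation: the opening question is in responses_create_params.input.
        self.session_turns[session_id] = 0
        return None, {}

    async def step(
        self, action: NeMoGymResponse, metadata: dict, session_id: Optional[str] = None
    ) -> tuple[Optional[str], float, bool, bool, dict]:
        """Returns (observation, reward, terminated, truncated, info)."""
        follow_ups = metadata.get("follow_ups", [])
        turn = self.session_turns.get(session_id, 0)

        if turn < len(follow_ups):
            # Not the final turn: send the next scripted follow-up, no reward yet.
            self.session_turns[session_id] = turn + 1
            return follow_ups[turn], 0.0, False, False, {}

        # Final turn: free this session's counter so finished episodes don't accumulate.
        self.session_turns.pop(session_id, None)

        # Grade by case-insensitive substring match against expected_answer.
        expected = metadata.get("expected_answer", "")
        text = extract_text(action)
        reward = 1.0 if expected and expected.lower() in text.lower() else 0.0
        return None, reward, True, False, {}
```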
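To check the turn logic without a NeMo Gym server, the sketch below re-implements the same `reset` / `step` rules in plain Python and drives one episode with scripted replies. It makes several assumptions. Actions are plain strings rather than `NeMoGymResponse`, so there is no `extract_text`. `MockMultiTurnEnv` and `run_episode` are hypothetical names. Per-session state is dropped when an episode terminates. The sketch only mirrors the control flow of the design above and is not the PR's implementation.

```python
import asyncio
from typing import Optional


class MockMultiTurnEnv:
    """Hypothetical, dependency-free stand-in for ExampleMultiTurnEnv.

    Actions are plain strings here instead of NeMoGymResponse objects.
    """

    def __init__(self) -> None:
        # Follow-ups already sent, keyed by session ID.
        self.session_turns: dict[Optional[str], int] = {}

    async def reset(self, metadata: dict, session_id: Optional[str] = None):
        self.session_turns[session_id] = 0
        return None, {}

    async def step(self, action: str, metadata: dict, session_id: Optional[str] = None):
        follow_ups = metadata.get("follow_ups", [])
        turn = self.session_turns.get(session_id, 0)
        if turn < len(follow_ups):
            # Not the last turn yet: send the next scripted question, no reward.
            self.session_turns[session_id] = turn + 1
            return follow_ups[turn], 0.0, False, False, {}
        # Last turn: drop session state, then grade with a case-insensitive substring match.
        self.session_turns.pop(session_id, None)
        expected = metadata.get("expected_answer", "")
        reward = 1.0 if expected and expected.lower() in action.lower() else 0.0
        return None, reward, True, False, {}


async def run_episode(env, metadata: dict, replies: list[str], session_id: str):
    """Feed scripted agent replies until the env terminates; return (reward, prompts seen)."""
    await env.reset(metadata, session_id=session_id)
    prompts = []
    reward = 0.0
    for reply in replies:
        obs, reward, terminated, truncated, _ = await env.step(reply, metadata, session_id=session_id)
        if terminated or truncated:
            break
        prompts.append(obs)
    return reward, prompts


metadata = {"follow_ups": ["In what year was it first released?"], "expected_answer": "1991"}
env = MockMultiTurnEnv()
print(asyncio.run(run_episode(env, metadata, ["Python", "It came out in 1991."], "a")))
# (1.0, ['In what year was it first released?'])
```

Because the turn counter is keyed by `session_id`, interleaved episodes with different IDs do not interfere. Callers that omit `session_id` would all share the `None` key.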

cmunley1 added 2 commits May 14, 2026 22:05
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: Christian Munley <cmunley@nvidia.com>
copy-pr-bot (bot) commented May 15, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

cmunley1 added 5 commits May 14, 2026 22:12
Signed-off-by: Christian Munley <cmunley@nvidia.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>
cwing-nvidia added the usability (improvements to user experience) label May 18, 2026
cmunley1 requested review from adil-a and ananthsub May 20, 2026 05:01
