diff --git a/resources_servers/grl_tetris/README.md b/resources_servers/grl_tetris/README.md
Lines changed: 70 additions & 0 deletions
diff --git a/resources_servers/grl_tetris/app.py b/resources_servers/grl_tetris/app.py
Lines changed: 226 additions & 0 deletions
diff --git a/resources_servers/grl_tetris/configs/grl_tetris.yaml b/resources_servers/grl_tetris/configs/grl_tetris.yaml
Lines changed: 27 additions & 0 deletions
diff --git a/resources_servers/grl_tetris/data/example.jsonl b/resources_servers/grl_tetris/data/example.jsonl
Lines changed: 5 additions & 0 deletions
diff --git a/resources_servers/grl_tetris/data/example_metrics.json b/resources_servers/grl_tetris/data/example_metrics.json
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,70 @@
+# GRL Tetris Resource Server
+
+FastAPI-based resource server that exposes the GRL Tetris environment through NeMo Gym conventions. The environment logic lives under `resources_servers/grl_tetris/tetris_env` and is a standalone adaptation of the upstream GRL implementation.
+
+## Why it exists
+- **Domain**: Classic falling-block Tetris on a configurable grid.
+- **Evaluation**: Agents must clear at least one line; `/verify` returns the cumulative episode reward together with a `success` flag taken from the environment's last step info.
+- **Independence**: No runtime dependency on the GRL repository—the environment is vendored and self-contained.
+
+## Setup
+
+Follow the setup instructions in the NeMo Gym tutorial: https://github.com/NVIDIA-NeMo/Gym/blob/main/docs/tutorials/02-setup.md#step-1-clone-and-install
+
+## Running
+Spin up the server alongside a compatible agent:
+```bash
+config_paths="responses_api_models/openai_model/configs/openai_model.yaml,\
+resources_servers/grl_tetris/configs/grl_tetris.yaml"
+ng_run "+config_paths=[$config_paths]"
+```
+
+Collect trajectories:
+```bash
+ng_collect_rollouts +agent_name=grl_tetris_simple_agent \
+    +input_jsonl_fpath=resources_servers/grl_tetris/data/example.jsonl \
+    +output_jsonl_fpath=resources_servers/grl_tetris/data/example_rollouts.jsonl \
+    +limit=5
+```
+
+Launch the rollout viewer:
+```bash
+ng_viewer +jsonl_fpath=resources_servers/grl_tetris/data/example_rollouts.jsonl
+```
+
+## Tests
+```bash
+pytest resources_servers/grl_tetris/tests
+```
+
+## Licensing
+- Code: Apache 2.0
+- Data: Apache 2.0
+
+---
+
+## Reward Profiling Results
+
+### Qwen3-4B
+
+**Dataset**: 3,200 rollouts (200 prompts × 16 repeats)
+
+**Performance Metrics**:
+- **Success Rate**: 5.09% (163/3,200 rollouts)
+- **Mean Reward**: -0.29 (range: -2.00 to 19.20)
+- **Median Reward**: -0.80
+
+**Key Findings**:
+- The most common reward was -0.90, received by 21% of rollouts (piece dropped without clearing lines)
+- Successful line clears achieved rewards of ~9.0-9.2
+- Average 7.48 tool calls per rollout
+- Negligible negative correlation between tool-call count and reward (-0.06)
+
+**Top Reward Distribution**:
+- `-0.9`: 672 rollouts (21.0%) - piece dropped, no line clear
+- `-0.8`: 603 rollouts (18.8%)
+- `-0.7`: 495 rollouts (15.5%)
+- `9.1`: 29 rollouts (0.9%) - successful line clear
+- `8.9`: 26 rollouts (0.8%)
+
+The low success rate (5.09%) suggests that clearing a line is challenging for the model, since it requires precise spatial reasoning and action sequencing. Most rollouts end with pieces dropping without a line clear, leaving a negative reward accumulated from the -0.1 penalty per action step.
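The percentages in the README's profiling section can be re-derived from the raw counts it lists. The sketch below does that, and also estimates how many penalized steps each negative reward would imply. That second part rests on an assumption not stated in the source: that a failed episode accrues nothing but the -0.1 per-step penalty. The variable and function names are illustrative only.

```python
# Counts copied from the "Top Reward Distribution" list in the README above.
TOTAL_ROLLOUTS = 200 * 16  # 200 prompts x 16 repeats

reward_counts = {-0.9: 672, -0.8: 603, -0.7: 495, 9.1: 29, 8.9: 26}
successes = 163


def share(count: int, total: int = TOTAL_ROLLOUTS) -> float:
    """Return count/total as a percentage."""
    return 100.0 * count / total


success_rate = share(successes)
shares = {reward: round(share(n), 1) for reward, n in reward_counts.items()}

# Assumption: a failed episode accrues only the -0.1 per-step penalty,
# so a reward of -0.9 would correspond to nine penalized steps.
STEP_PENALTY = -0.1
implied_steps = {r: round(r / STEP_PENALTY) for r in reward_counts if r < 0}

print(f"success rate: {success_rate:.2f}%")
print(shares)
print(implied_steps)
```

Under that assumption the three most common rewards correspond to episodes of seven to nine penalized steps, which is within the agent's `max_steps: 10` budget only if each tool call carries roughly one action.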
@@ -0,0 +1,226 @@
+# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from typing import Any, Dict, List, Optional, Union
+
+import numpy as np
+from fastapi import FastAPI, HTTPException, Request
+from pydantic import BaseModel, Field
+
+from nemo_gym.base_resources_server import (
+    BaseResourcesServerConfig,
+    BaseSeedSessionRequest,
+    BaseSeedSessionResponse,
+    BaseVerifyRequest,
+    BaseVerifyResponse,
+    SimpleResourcesServer,
+)
+from nemo_gym.server_utils import SESSION_ID_KEY, ServerClient
+from resources_servers.grl_tetris.tetris_env import TetrisEnv
+
+
+DEFAULT_GRID_LOOKUP = {0: "_", 1: "#", 2: "X"}
+DEFAULT_ACTION_LOOKUP = {0: "Left", 1: "Right", 2: "Down"}
+
+
+class GrlTetrisResourcesServerConfig(BaseResourcesServerConfig):
+    env_config: Dict[str, Any] = Field(
+        default_factory=lambda: {
+            "grid_lookup": DEFAULT_GRID_LOOKUP,
+            "action_lookup": DEFAULT_ACTION_LOOKUP,
+            "render_mode": "text",
+            "dim_x": 4,
+            "dim_y": 4,
+            "box_type": 3,
+        }
+    )
+
+
+class GrlTetrisSeedSessionRequest(BaseSeedSessionRequest):
+    seed: Optional[int] = None
+
+
+class GrlTetrisSeedSessionResponse(BaseSeedSessionResponse):
+    observation: str
+
+
+class GrlTetrisStepRequest(BaseModel):
+    actions: List[Union[str, int]] = Field(default_factory=list)
+
+
+class GrlTetrisStepTrace(BaseModel):
+    action_id: int
+    action_label: str
+    reward: float
+    done: bool
+    info: Dict[str, Any]
+
+
+class GrlTetrisStepResponse(BaseModel):
+    observation: str
+    reward: float
+    total_reward: float
+    done: bool
+    steps: List[GrlTetrisStepTrace]
+    history: List[GrlTetrisStepTrace] = Field(default_factory=list)
+
+
+class GrlTetrisVerifyResponse(BaseVerifyResponse):
+    success: bool
+
+
+@dataclass
+class TetrisSessionState:
+    env: Any
+    observation: str
+    total_reward: float = 0.0
+    done: bool = False
+    last_info: Dict[str, Any] = field(default_factory=dict)
+    history: List[GrlTetrisStepTrace] = field(default_factory=list)
+
+
+class GrlTetrisResourcesServer(SimpleResourcesServer):
+    config: GrlTetrisResourcesServerConfig
+    server_client: ServerClient
+    session_id_to_state: Dict[str, TetrisSessionState] = Field(default_factory=dict)
+
+    def setup_webserver(self) -> FastAPI:
+        app = super().setup_webserver()
+        app.post("/step")(self.step)
+        return app
+
+    def _create_env(self) -> TetrisEnv:
+        return TetrisEnv(self.config.env_config)
+
+    async def seed_session(self, request: Request, body: GrlTetrisSeedSessionRequest) -> GrlTetrisSeedSessionResponse:
+        session_id = request.session[SESSION_ID_KEY]
+        env = self._create_env()
+        observation = env.reset(seed=body.seed)
+
+        self.session_id_to_state[session_id] = TetrisSessionState(
+            env=env,
+            observation=observation,
+        )
+        return GrlTetrisSeedSessionResponse(observation=observation)
+
+    async def step(self, request: Request, body: GrlTetrisStepRequest) -> GrlTetrisStepResponse:
+        session_id = request.session.get(SESSION_ID_KEY)
+        if session_id is None or session_id not in self.session_id_to_state:
+            raise HTTPException(status_code=400, detail="Session not initialized. Call /seed_session first.")
+
+        session_state = self.session_id_to_state[session_id]
+        env = session_state.env
+
+        reverse_lookup = {label.lower(): idx for idx, label in env.ACTION_LOOKUP.items()}
+        total_step_reward = 0.0
+        steps: List[GrlTetrisStepTrace] = []
+
+        if session_state.done:
+            return GrlTetrisStepResponse(
+                observation=session_state.observation,
+                reward=0.0,
+                total_reward=session_state.total_reward,
+                done=True,
+                steps=[],
+                history=list(session_state.history),
+            )
+
+        for action in body.actions:
+            action_id = self._parse_action(action, reverse_lookup)
+            if action_id not in env.ACTION_LOOKUP:
+                raise HTTPException(status_code=400, detail=f"Invalid action identifier: {action}")
+
+            next_obs, reward, done, info = env.step(action_id)
+            info = self._to_python_types(info)
+            total_step_reward += reward
+            session_state.total_reward += reward
+            session_state.observation = next_obs
+            session_state.last_info = info
+            session_state.done = bool(done)
+
+            step = GrlTetrisStepTrace(
+                action_id=action_id,
+                action_label=env.ACTION_LOOKUP[action_id],
+                reward=reward,
+                done=session_state.done,
+                info=info,
+            )
+            session_state.history.append(step)
+            steps.append(step)
+
+            if session_state.done:
+                break
+
+        return GrlTetrisStepResponse(
+            observation=session_state.observation,
+            reward=total_step_reward,
+            total_reward=session_state.total_reward,
+            done=session_state.done,
+            steps=steps,
+            history=list(session_state.history),
+        )
+
+    async def verify(self, request: Request, body: BaseVerifyRequest) -> GrlTetrisVerifyResponse:
+        session_id = request.session.get(SESSION_ID_KEY)
+        session_state = self.session_id_to_state.get(session_id)
+
+        success = False
+        reward = 0.0
+        if session_state is not None:
+            success = bool(session_state.last_info.get("success"))
+            reward = session_state.total_reward
+
+        if session_id in self.session_id_to_state:
+            try:
+                session_state.env.close()  # type: ignore[union-attr]
+            except Exception:  # pragma: no cover - defensive cleanup
+                pass
+            del self.session_id_to_state[session_id]
+
+        return GrlTetrisVerifyResponse(
+            **body.model_dump(),
+            reward=reward,
+            success=success,
+        )
+
+    @staticmethod
+    def _parse_action(action: Union[str, int], reverse_lookup: Dict[str, int]) -> int:
+        if isinstance(action, int):
+            return action
+
+        candidate = action.strip()
+        lower_candidate = candidate.lower()
+        if lower_candidate in reverse_lookup:
+            return reverse_lookup[lower_candidate]
+
+        try:
+            return int(candidate)
+        except ValueError as exc:
+            raise HTTPException(status_code=400, detail=f"Unable to parse action: {action}") from exc
+
+    @staticmethod
+    def _to_python_types(obj: Any) -> Any:
+        if isinstance(obj, dict):
+            return {k: GrlTetrisResourcesServer._to_python_types(v) for k, v in obj.items()}
+        if isinstance(obj, list):
+            return [GrlTetrisResourcesServer._to_python_types(v) for v in obj]
+        if isinstance(obj, np.generic):
+            return obj.item()
+        return obj
+
+
+if __name__ == "__main__":
+    GrlTetrisResourcesServer.run_webserver()
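The `/step` handler in `app.py` accepts actions as labels or numeric ids and validates them against the environment's action lookup. The sketch below reproduces that resolution logic as a standalone function so it can be tried without a running server. `parse_action` is a hypothetical helper, not part of the PR, and it raises `ValueError` where the server would raise `HTTPException(400)`.

```python
from typing import Dict, Union

# Same default mapping as DEFAULT_ACTION_LOOKUP in app.py.
ACTION_LOOKUP: Dict[int, str] = {0: "Left", 1: "Right", 2: "Down"}


def parse_action(action: Union[str, int], lookup: Dict[int, str] = ACTION_LOOKUP) -> int:
    """Resolve a label ("left", " Down ") or numeric id (1, "2") to an action id.

    Hypothetical standalone helper: raises ValueError where the server
    would respond with HTTP 400.
    """
    reverse = {label.lower(): idx for idx, label in lookup.items()}
    if isinstance(action, int):
        action_id = action
    else:
        candidate = action.strip()
        action_id = reverse.get(candidate.lower())
        if action_id is None:
            action_id = int(candidate)  # ValueError if not numeric
    if action_id not in lookup:
        raise ValueError(f"Invalid action identifier: {action!r}")
    return action_id


print([parse_action(a) for a in ["Left", " down ", 1, "2"]])
```

Labels are matched case-insensitively after stripping whitespace, and numeric strings fall through to `int(...)`, so a model that emits `"2"` instead of `"Down"` is still accepted.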
@@ -0,0 +1,27 @@
+grl_tetris_resources_server:
+  resources_servers:
+    grl_tetris:
+      entrypoint: app.py
+      domain: games
+      verified: false
+grl_tetris_simple_agent:
+  responses_api_agents:
+    simple_agent:
+      entrypoint: app.py
+      max_steps: 10
+      resources_server:
+        type: resources_servers
+        name: grl_tetris_resources_server
+      model_server:
+        type: responses_api_models
+        name: policy_model
+      datasets:
+      - name: example
+        type: example
+        jsonl_fpath: resources_servers/grl_tetris/data/example.jsonl
+        num_repeats: 1
+        gitlab_identifier:
+          dataset_name: grl_tetris
+          version: 0.0.1
+          artifact_fpath: example.jsonl
+        license: Apache 2.0
@@ -0,0 +1,5 @@
+{"game_id": 1, "seed": 93810, "dim_board": [5, 5], "box_type": 0, "responses_create_params": {"input": [{"role": "developer", "content": "You are a Tetris-playing assistant. IMPORTANT: First call the `step` tool with an empty array [] to see the initial board state and active piece. Example: step({\"actions\": []}). The tool will return an ASCII board using '_' for empty cells, '#' for locked blocks, and 'X' for the active piece. Then continue calling `step` with valid actions (Left, Right, Down) until you clear a line or the board locks out. At the end, respond with <answer>Action1 || Action2 || ...</answer> summarizing all moves you made."}, {"role": "user", "content": "Call the step tool to see the board, then play Tetris to clear at least one line if possible."}], "tools": [{"name": "step", "type": "function", "description": "Execute Tetris moves sequentially. Call with empty array [] to see current board state without moving.", "strict": true, "parameters": {"type": "object", "properties": {"actions": {"type": "array", "items": {"type": "string"}, "description": "Sequence of actions, e.g. ['Left', 'Down']. Use empty array [] to view current state."}}, "required": ["actions"], "additionalProperties": false}}]}}
+{"game_id": 2, "seed": 46185, "dim_board": [4, 6], "box_type": 1, "responses_create_params": {"input": [{"role": "developer", "content": "You are a Tetris-playing assistant. IMPORTANT: First call the `step` tool with an empty array [] to see the initial board state and active piece. Example: step({\"actions\": []}). The tool will return an ASCII board using '_' for empty cells, '#' for locked blocks, and 'X' for the active piece. Then continue calling `step` with valid actions (Left, Right, Down) until you clear a line or the board locks out. At the end, respond with <answer>Action1 || Action2 || ...</answer> summarizing all moves you made."}, {"role": "user", "content": "Call the step tool to see the board, then play Tetris to clear at least one line if possible."}], "tools": [{"name": "step", "type": "function", "description": "Execute Tetris moves sequentially. Call with empty array [] to see current board state without moving.", "strict": true, "parameters": {"type": "object", "properties": {"actions": {"type": "array", "items": {"type": "string"}, "description": "Sequence of actions, e.g. ['Left', 'Down']. Use empty array [] to view current state."}}, "required": ["actions"], "additionalProperties": false}}]}}
+{"game_id": 3, "seed": 28563, "dim_board": [5, 5], "box_type": 0, "responses_create_params": {"input": [{"role": "developer", "content": "You are a Tetris-playing assistant. IMPORTANT: First call the `step` tool with an empty array [] to see the initial board state and active piece. Example: step({\"actions\": []}). The tool will return an ASCII board using '_' for empty cells, '#' for locked blocks, and 'X' for the active piece. Then continue calling `step` with valid actions (Left, Right, Down) until you clear a line or the board locks out. At the end, respond with <answer>Action1 || Action2 || ...</answer> summarizing all moves you made."}, {"role": "user", "content": "Call the step tool to see the board, then play Tetris to clear at least one line if possible."}], "tools": [{"name": "step", "type": "function", "description": "Execute Tetris moves sequentially. Call with empty array [] to see current board state without moving.", "strict": true, "parameters": {"type": "object", "properties": {"actions": {"type": "array", "items": {"type": "string"}, "description": "Sequence of actions, e.g. ['Left', 'Down']. Use empty array [] to view current state."}}, "required": ["actions"], "additionalProperties": false}}]}}
+{"game_id": 4, "seed": 87808, "dim_board": [6, 5], "box_type": 0, "responses_create_params": {"input": [{"role": "developer", "content": "You are a Tetris-playing assistant. IMPORTANT: First call the `step` tool with an empty array [] to see the initial board state and active piece. Example: step({\"actions\": []}). The tool will return an ASCII board using '_' for empty cells, '#' for locked blocks, and 'X' for the active piece. Then continue calling `step` with valid actions (Left, Right, Down) until you clear a line or the board locks out. At the end, respond with <answer>Action1 || Action2 || ...</answer> summarizing all moves you made."}, {"role": "user", "content": "Call the step tool to see the board, then play Tetris to clear at least one line if possible."}], "tools": [{"name": "step", "type": "function", "description": "Execute Tetris moves sequentially. Call with empty array [] to see current board state without moving.", "strict": true, "parameters": {"type": "object", "properties": {"actions": {"type": "array", "items": {"type": "string"}, "description": "Sequence of actions, e.g. ['Left', 'Down']. Use empty array [] to view current state."}}, "required": ["actions"], "additionalProperties": false}}]}}
+{"game_id": 5, "seed": 14453, "dim_board": [5, 5], "box_type": 1, "responses_create_params": {"input": [{"role": "developer", "content": "You are a Tetris-playing assistant. IMPORTANT: First call the `step` tool with an empty array [] to see the initial board state and active piece. Example: step({\"actions\": []}). The tool will return an ASCII board using '_' for empty cells, '#' for locked blocks, and 'X' for the active piece. Then continue calling `step` with valid actions (Left, Right, Down) until you clear a line or the board locks out. At the end, respond with <answer>Action1 || Action2 || ...</answer> summarizing all moves you made."}, {"role": "user", "content": "Call the step tool to see the board, then play Tetris to clear at least one line if possible."}], "tools": [{"name": "step", "type": "function", "description": "Execute Tetris moves sequentially. Call with empty array [] to see current board state without moving.", "strict": true, "parameters": {"type": "object", "properties": {"actions": {"type": "array", "items": {"type": "string"}, "description": "Sequence of actions, e.g. ['Left', 'Down']. Use empty array [] to view current state."}}, "required": ["actions"], "additionalProperties": false}}]}}
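Each row of `example.jsonl` carries the same top-level fields plus a single `step` tool definition. A minimal sketch for sanity-checking a row before collecting rollouts is shown below. It uses only the standard library; `check_row` is a hypothetical helper, and the embedded row is a trimmed stand-in with the prompt text shortened, not a verbatim copy of the data.

```python
import json

# Trimmed stand-in for one row of example.jsonl (prompt text shortened).
row_text = json.dumps({
    "game_id": 1,
    "seed": 93810,
    "dim_board": [5, 5],
    "box_type": 0,
    "responses_create_params": {
        "input": [{"role": "user", "content": "Call the step tool to see the board."}],
        "tools": [{
            "name": "step",
            "type": "function",
            "strict": True,
            "parameters": {
                "type": "object",
                "properties": {"actions": {"type": "array", "items": {"type": "string"}}},
                "required": ["actions"],
                "additionalProperties": False,
            },
        }],
    },
})


def check_row(line: str) -> dict:
    """Parse one JSONL line and check the fields the rows above share."""
    row = json.loads(line)
    for key in ("game_id", "seed", "dim_board", "box_type", "responses_create_params"):
        if key not in row:
            raise KeyError(f"missing field: {key}")
    tools = row["responses_create_params"]["tools"]
    step_tool = next(t for t in tools if t["name"] == "step")
    if step_tool["parameters"]["required"] != ["actions"]:
        raise ValueError("step tool must require 'actions'")
    return row


row = check_row(row_text)
print(row["seed"], row["dim_board"])
```

To check the real file, the same function could be applied to each line read from `resources_servers/grl_tetris/data/example.jsonl`.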
@@ -0,0 +1,8 @@
+{
+  "name": "example",
+  "type": "example",
+  "jsonl_fpath": "resources_servers/grl_tetris/data/example.jsonl",
+  "gitlab_identifier": null,
+  "license": "Apache 2.0",
+  "Number of examples": 5
+}