Commit 5cae458

Add GRL Tetris resource server
- resources_servers/grl_tetris: environment, config, tests, data
- Tetris game environment with step/verify endpoints
- Example data and test examples generator

Verified DCO and cryptographic signing.

Signed-off-by: yixin <yixinhuang48@gmail.com>
1 parent 7a1374f commit 5cae458

10 files changed

Lines changed: 1038 additions & 0 deletions

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
# GRL Tetris Resource Server

FastAPI-based resource server that exposes the GRL Tetris environment through NeMo Gym conventions. The environment logic lives under `resources_servers/grl_tetris/tetris_env` and is a standalone adaptation of the upstream GRL implementation.

## Why it exists
- **Domain**: Classic falling-block Tetris on a configurable grid.
- **Evaluation**: Agents must clear at least one line; `/verify` returns the cumulative episode reward together with a `success` flag taken from the environment's final `info`.
- **Independence**: No runtime dependency on the GRL repository; the environment is vendored and self-contained.

## Setup

Follow the setup instructions in https://github.com/NVIDIA-NeMo/Gym/blob/main/docs/tutorials/02-setup.md#step-1-clone-and-install.

## Running
Spin up the server alongside a compatible agent:
```bash
config_paths="responses_api_models/openai_model/configs/openai_model.yaml,\
resources_servers/grl_tetris/configs/grl_tetris.yaml"
ng_run "+config_paths=[$config_paths]"
```

Collect trajectories:
```bash
ng_collect_rollouts +agent_name=grl_tetris_simple_agent \
    +input_jsonl_fpath=resources_servers/grl_tetris/data/example.jsonl \
    +output_jsonl_fpath=resources_servers/grl_tetris/data/example_rollouts.jsonl \
    +limit=5
```

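Once rollouts have been collected, a quick summary of the rewards can be useful before opening the viewer. The sketch below is a minimal example, assuming each line of the output file is a JSON object with a numeric top-level `reward` key; that layout is an assumption about the rollout file, and `summarize_rewards` is a hypothetical helper, not part of NeMo Gym.

```python
import json
import statistics
from pathlib import Path


def summarize_rewards(jsonl_path):
    """Return (count, mean, median) of the `reward` field across rollouts.

    Assumption: each non-empty line is a JSON object with a numeric
    top-level `reward` key. Lines without that key are skipped.
    """
    rewards = []
    with Path(jsonl_path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            if "reward" in record:
                rewards.append(float(record["reward"]))
    if not rewards:
        return 0, None, None
    return len(rewards), statistics.mean(rewards), statistics.median(rewards)


if __name__ == "__main__":
    # Path taken from the ng_collect_rollouts command above.
    path = "resources_servers/grl_tetris/data/example_rollouts.jsonl"
    if Path(path).exists():
        print(summarize_rewards(path))
```

If the rollout records nest the reward under another key, only the lookup inside the loop needs to change.
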
Launch the rollout viewer:
```bash
ng_viewer +jsonl_fpath=resources_servers/grl_tetris/data/example_rollouts.jsonl
```

## Tests
```bash
pytest resources_servers/grl_tetris/tests
```

## Licensing
- Code: Apache 2.0
- Data: Apache 2.0

---

## Reward Profiling Results

### Qwen3-4B

**Dataset**: 3,200 rollouts (200 prompts × 16 repeats)

**Performance Metrics**:
- **Success Rate**: 5.09% (163/3,200 rollouts)
- **Mean Reward**: -0.29 (range: -2.00 to 19.20)
- **Median Reward**: -0.80

**Key Findings**:
- The most common reward, -0.90, accounts for 21% of rollouts (piece dropped without clearing lines)
- Successful line clears achieved rewards of ~9.0-9.2
- Average 7.48 tool calls per rollout
- Weak negative correlation between tool calls and reward (-0.06)

**Top Reward Distribution**:
- `-0.9`: 672 rollouts (21.0%) - piece dropped, no line clear
- `-0.8`: 603 rollouts (18.8%)
- `-0.7`: 495 rollouts (15.5%)
- `9.1`: 29 rollouts (0.9%) - successful line clear
- `8.9`: 26 rollouts (0.8%)

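The percentages above follow directly from the reported counts. As a small sanity check, the sketch below recomputes each share from the 3,200-rollout total; the `share` helper is illustrative only.

```python
TOTAL_ROLLOUTS = 200 * 16  # 200 prompts x 16 repeats

# Counts copied from the metrics above.
counts = {
    "success": 163,
    "-0.9": 672,
    "-0.8": 603,
    "-0.7": 495,
    "9.1": 29,
    "8.9": 26,
}


def share(count, total=TOTAL_ROLLOUTS):
    """Percentage of rollouts, rounded to two decimals."""
    return round(100.0 * count / total, 2)


for label, count in counts.items():
    print(f"{label}: {count}/{TOTAL_ROLLOUTS} = {share(count)}%")
```
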
The low success rate (5.09%) suggests that Tetris line-clearing is challenging for the model, requiring precise spatial reasoning and action sequencing. Most rollouts end with pieces dropping without clearing a line, so their rewards are negative: the -0.1 penalty per action step accumulates with nothing to offset it.
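To make the reward values easier to read, here is a rough sketch of how they could arise. It assumes a flat -0.1 penalty per action step (as stated above) and a hypothetical +10 bonus per cleared line; the bonus value is not stated in the source and is chosen only because it is consistent with the reported -0.9 and 9.1 rewards. The actual environment may compute rewards differently.

```python
STEP_PENALTY = -0.1      # from the README: "-0.1 per action step"
LINE_CLEAR_BONUS = 10.0  # hypothetical value, not stated in the source


def episode_reward(num_steps, lines_cleared=0):
    """Cumulative reward under the two assumptions above."""
    return round(num_steps * STEP_PENALTY + lines_cleared * LINE_CLEAR_BONUS, 2)


print(episode_reward(9))     # nine steps, no line clear
print(episode_reward(9, 1))  # nine steps, one line clear
```

Under these assumptions, nine steps without a clear give -0.9, and the same nine steps ending in a clear give 9.1.
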
Lines changed: 226 additions & 0 deletions
@@ -0,0 +1,226 @@
# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union

import numpy as np
from fastapi import FastAPI, HTTPException, Request
from pydantic import BaseModel, Field

from nemo_gym.base_resources_server import (
    BaseResourcesServerConfig,
    BaseSeedSessionRequest,
    BaseSeedSessionResponse,
    BaseVerifyRequest,
    BaseVerifyResponse,
    SimpleResourcesServer,
)
from nemo_gym.server_utils import SESSION_ID_KEY, ServerClient
from resources_servers.grl_tetris.tetris_env import TetrisEnv


DEFAULT_GRID_LOOKUP = {0: "_", 1: "#", 2: "X"}
DEFAULT_ACTION_LOOKUP = {0: "Left", 1: "Right", 2: "Down"}


class GrlTetrisResourcesServerConfig(BaseResourcesServerConfig):
    env_config: Dict[str, Any] = Field(
        default_factory=lambda: {
            "grid_lookup": DEFAULT_GRID_LOOKUP,
            "action_lookup": DEFAULT_ACTION_LOOKUP,
            "render_mode": "text",
            "dim_x": 4,
            "dim_y": 4,
            "box_type": 3,
        }
    )


class GrlTetrisSeedSessionRequest(BaseSeedSessionRequest):
    seed: Optional[int] = None


class GrlTetrisSeedSessionResponse(BaseSeedSessionResponse):
    observation: str


class GrlTetrisStepRequest(BaseModel):
    actions: List[Union[str, int]] = Field(default_factory=list)


class GrlTetrisStepTrace(BaseModel):
    action_id: int
    action_label: str
    reward: float
    done: bool
    info: Dict[str, Any]


class GrlTetrisStepResponse(BaseModel):
    observation: str
    reward: float
    total_reward: float
    done: bool
    steps: List[GrlTetrisStepTrace]
    history: List[GrlTetrisStepTrace] = Field(default_factory=list)


class GrlTetrisVerifyResponse(BaseVerifyResponse):
    success: bool


@dataclass
class TetrisSessionState:
    env: Any
    observation: str
    total_reward: float = 0.0
    done: bool = False
    last_info: Dict[str, Any] = field(default_factory=dict)
    history: List[GrlTetrisStepTrace] = field(default_factory=list)


class GrlTetrisResourcesServer(SimpleResourcesServer):
    config: GrlTetrisResourcesServerConfig
    server_client: ServerClient
    session_id_to_state: Dict[str, TetrisSessionState] = Field(default_factory=dict)

    def setup_webserver(self) -> FastAPI:
        app = super().setup_webserver()
        app.post("/step")(self.step)
        return app

    def _create_env(self) -> TetrisEnv:
        return TetrisEnv(self.config.env_config)

    async def seed_session(self, request: Request, body: GrlTetrisSeedSessionRequest) -> GrlTetrisSeedSessionResponse:
        session_id = request.session[SESSION_ID_KEY]
        env = self._create_env()
        observation = env.reset(seed=body.seed)

        self.session_id_to_state[session_id] = TetrisSessionState(
            env=env,
            observation=observation,
        )
        return GrlTetrisSeedSessionResponse(observation=observation)

    async def step(self, request: Request, body: GrlTetrisStepRequest) -> GrlTetrisStepResponse:
        session_id = request.session.get(SESSION_ID_KEY)
        if session_id is None or session_id not in self.session_id_to_state:
            raise HTTPException(status_code=400, detail="Session not initialized. Call /seed_session first.")

        session_state = self.session_id_to_state[session_id]
        env = session_state.env

        reverse_lookup = {label.lower(): idx for idx, label in env.ACTION_LOOKUP.items()}
        total_step_reward = 0.0
        steps: List[GrlTetrisStepTrace] = []

        if session_state.done:
            return GrlTetrisStepResponse(
                observation=session_state.observation,
                reward=0.0,
                total_reward=session_state.total_reward,
                done=True,
                steps=[],
                history=list(session_state.history),
            )

        for action in body.actions:
            action_id = self._parse_action(action, reverse_lookup)
            if action_id not in env.ACTION_LOOKUP:
                raise HTTPException(status_code=400, detail=f"Invalid action identifier: {action}")

            next_obs, reward, done, info = env.step(action_id)
            info = self._to_python_types(info)
            total_step_reward += reward
            session_state.total_reward += reward
            session_state.observation = next_obs
            session_state.last_info = info
            session_state.done = bool(done)

            step = GrlTetrisStepTrace(
                action_id=action_id,
                action_label=env.ACTION_LOOKUP[action_id],
                reward=reward,
                done=session_state.done,
                info=info,
            )
            session_state.history.append(step)
            steps.append(step)

            if session_state.done:
                break

        return GrlTetrisStepResponse(
            observation=session_state.observation,
            reward=total_step_reward,
            total_reward=session_state.total_reward,
            done=session_state.done,
            steps=steps,
            history=list(session_state.history),
        )

    async def verify(self, request: Request, body: BaseVerifyRequest) -> GrlTetrisVerifyResponse:
        session_id = request.session.get(SESSION_ID_KEY)
        session_state = self.session_id_to_state.get(session_id)

        success = False
        reward = 0.0
        if session_state is not None:
            success = bool(session_state.last_info.get("success"))
            reward = session_state.total_reward

        if session_id in self.session_id_to_state:
            try:
                session_state.env.close()  # type: ignore[union-attr]
            except Exception:  # pragma: no cover - defensive cleanup
                pass
            del self.session_id_to_state[session_id]

        return GrlTetrisVerifyResponse(
            **body.model_dump(),
            reward=reward,
            success=success,
        )

    @staticmethod
    def _parse_action(action: Union[str, int], reverse_lookup: Dict[str, int]) -> int:
        if isinstance(action, int):
            return action

        candidate = action.strip()
        lower_candidate = candidate.lower()
        if lower_candidate in reverse_lookup:
            return reverse_lookup[lower_candidate]

        try:
            return int(candidate)
        except ValueError as exc:
            raise HTTPException(status_code=400, detail=f"Unable to parse action: {action}") from exc

    @staticmethod
    def _to_python_types(obj: Any) -> Any:
        if isinstance(obj, dict):
            return {k: GrlTetrisResourcesServer._to_python_types(v) for k, v in obj.items()}
        if isinstance(obj, list):
            return [GrlTetrisResourcesServer._to_python_types(v) for v in obj]
        if isinstance(obj, np.generic):
            return obj.item()
        return obj


if __name__ == "__main__":
    GrlTetrisResourcesServer.run_webserver()
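The action handling in the `/step` endpoint can be tried without FastAPI or NeMo Gym. The sketch below is a simplified, standalone approximation rather than the server code itself: `parse_action` and `run_actions` are hypothetical free functions, `ToyEnv` is a made-up stand-in for `TetrisEnv`, and plain `ValueError` replaces `HTTPException`. It shows that labels are matched case-insensitively, that numeric strings are accepted, and that an empty action list returns the current observation unchanged.

```python
from typing import Dict, List, Tuple, Union

ACTION_LOOKUP = {0: "Left", 1: "Right", 2: "Down"}  # mirrors DEFAULT_ACTION_LOOKUP


def parse_action(action: Union[str, int], lookup: Dict[int, str] = ACTION_LOOKUP) -> int:
    """Resolve a label ("down"), a numeric string ("2"), or an int to an action id."""
    if isinstance(action, int):
        action_id = action
    else:
        candidate = action.strip()
        reverse = {label.lower(): idx for idx, label in lookup.items()}
        if candidate.lower() in reverse:
            action_id = reverse[candidate.lower()]
        else:
            action_id = int(candidate)  # raises ValueError on junk such as "jump"
    if action_id not in lookup:
        raise ValueError(f"Invalid action identifier: {action}")
    return action_id


class ToyEnv:
    """Hypothetical stand-in for TetrisEnv: the episode ends after three Down moves."""

    def __init__(self) -> None:
        self.downs = 0

    def step(self, action_id: int) -> Tuple[str, float, bool, dict]:
        if action_id == 2:
            self.downs += 1
        done = self.downs >= 3
        return f"downs={self.downs}", -0.1, done, {"success": False}


def run_actions(env: ToyEnv, observation: str, actions: List[Union[str, int]]):
    """Apply actions in order, stopping early once the episode is done."""
    batch_reward, applied, done = 0.0, 0, False
    for action in actions:
        observation, reward, done, _info = env.step(parse_action(action))
        batch_reward += reward
        applied += 1
        if done:
            break
    return observation, batch_reward, applied, done


if __name__ == "__main__":
    env = ToyEnv()
    print(run_actions(env, "downs=0", []))  # empty list: nothing is applied
    print(run_actions(env, "downs=0", ["Left", " DOWN ", "2", 2, "Right"]))
```

In the second call the trailing `"Right"` is never applied, because the toy episode finishes on the third Down move and the loop breaks, which matches the early exit in the `step` method above.
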
Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,27 @@
grl_tetris_resources_server:
  resources_servers:
    grl_tetris:
      entrypoint: app.py
      domain: games
      verified: false
grl_tetris_simple_agent:
  responses_api_agents:
    simple_agent:
      entrypoint: app.py
      max_steps: 10
      resources_server:
        type: resources_servers
        name: grl_tetris_resources_server
      model_server:
        type: responses_api_models
        name: policy_model
      datasets:
      - name: example
        type: example
        jsonl_fpath: resources_servers/grl_tetris/data/example.jsonl
        num_repeats: 1
        gitlab_identifier:
          dataset_name: grl_tetris
          version: 0.0.1
          artifact_fpath: example.jsonl
        license: Apache 2.0
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
{"game_id": 1, "seed": 93810, "dim_board": [5, 5], "box_type": 0, "responses_create_params": {"input": [{"role": "developer", "content": "You are a Tetris-playing assistant. IMPORTANT: First call the `step` tool with an empty array [] to see the initial board state and active piece. Example: step({\"actions\": []}). The tool will return an ASCII board using '_' for empty cells, '#' for locked blocks, and 'X' for the active piece. Then continue calling `step` with valid actions (Left, Right, Down) until you clear a line or the board locks out. At the end, respond with <answer>Action1 || Action2 || ...</answer> summarizing all moves you made."}, {"role": "user", "content": "Call the step tool to see the board, then play Tetris to clear at least one line if possible."}], "tools": [{"name": "step", "type": "function", "description": "Execute Tetris moves sequentially. Call with empty array [] to see current board state without moving.", "strict": true, "parameters": {"type": "object", "properties": {"actions": {"type": "array", "items": {"type": "string"}, "description": "Sequence of actions, e.g. ['Left', 'Down']. Use empty array [] to view current state."}}, "required": ["actions"], "additionalProperties": false}}]}}
{"game_id": 2, "seed": 46185, "dim_board": [4, 6], "box_type": 1, "responses_create_params": {"input": [{"role": "developer", "content": "You are a Tetris-playing assistant. IMPORTANT: First call the `step` tool with an empty array [] to see the initial board state and active piece. Example: step({\"actions\": []}). The tool will return an ASCII board using '_' for empty cells, '#' for locked blocks, and 'X' for the active piece. Then continue calling `step` with valid actions (Left, Right, Down) until you clear a line or the board locks out. At the end, respond with <answer>Action1 || Action2 || ...</answer> summarizing all moves you made."}, {"role": "user", "content": "Call the step tool to see the board, then play Tetris to clear at least one line if possible."}], "tools": [{"name": "step", "type": "function", "description": "Execute Tetris moves sequentially. Call with empty array [] to see current board state without moving.", "strict": true, "parameters": {"type": "object", "properties": {"actions": {"type": "array", "items": {"type": "string"}, "description": "Sequence of actions, e.g. ['Left', 'Down']. Use empty array [] to view current state."}}, "required": ["actions"], "additionalProperties": false}}]}}
{"game_id": 3, "seed": 28563, "dim_board": [5, 5], "box_type": 0, "responses_create_params": {"input": [{"role": "developer", "content": "You are a Tetris-playing assistant. IMPORTANT: First call the `step` tool with an empty array [] to see the initial board state and active piece. Example: step({\"actions\": []}). The tool will return an ASCII board using '_' for empty cells, '#' for locked blocks, and 'X' for the active piece. Then continue calling `step` with valid actions (Left, Right, Down) until you clear a line or the board locks out. At the end, respond with <answer>Action1 || Action2 || ...</answer> summarizing all moves you made."}, {"role": "user", "content": "Call the step tool to see the board, then play Tetris to clear at least one line if possible."}], "tools": [{"name": "step", "type": "function", "description": "Execute Tetris moves sequentially. Call with empty array [] to see current board state without moving.", "strict": true, "parameters": {"type": "object", "properties": {"actions": {"type": "array", "items": {"type": "string"}, "description": "Sequence of actions, e.g. ['Left', 'Down']. Use empty array [] to view current state."}}, "required": ["actions"], "additionalProperties": false}}]}}
{"game_id": 4, "seed": 87808, "dim_board": [6, 5], "box_type": 0, "responses_create_params": {"input": [{"role": "developer", "content": "You are a Tetris-playing assistant. IMPORTANT: First call the `step` tool with an empty array [] to see the initial board state and active piece. Example: step({\"actions\": []}). The tool will return an ASCII board using '_' for empty cells, '#' for locked blocks, and 'X' for the active piece. Then continue calling `step` with valid actions (Left, Right, Down) until you clear a line or the board locks out. At the end, respond with <answer>Action1 || Action2 || ...</answer> summarizing all moves you made."}, {"role": "user", "content": "Call the step tool to see the board, then play Tetris to clear at least one line if possible."}], "tools": [{"name": "step", "type": "function", "description": "Execute Tetris moves sequentially. Call with empty array [] to see current board state without moving.", "strict": true, "parameters": {"type": "object", "properties": {"actions": {"type": "array", "items": {"type": "string"}, "description": "Sequence of actions, e.g. ['Left', 'Down']. Use empty array [] to view current state."}}, "required": ["actions"], "additionalProperties": false}}]}}
{"game_id": 5, "seed": 14453, "dim_board": [5, 5], "box_type": 1, "responses_create_params": {"input": [{"role": "developer", "content": "You are a Tetris-playing assistant. IMPORTANT: First call the `step` tool with an empty array [] to see the initial board state and active piece. Example: step({\"actions\": []}). The tool will return an ASCII board using '_' for empty cells, '#' for locked blocks, and 'X' for the active piece. Then continue calling `step` with valid actions (Left, Right, Down) until you clear a line or the board locks out. At the end, respond with <answer>Action1 || Action2 || ...</answer> summarizing all moves you made."}, {"role": "user", "content": "Call the step tool to see the board, then play Tetris to clear at least one line if possible."}], "tools": [{"name": "step", "type": "function", "description": "Execute Tetris moves sequentially. Call with empty array [] to see current board state without moving.", "strict": true, "parameters": {"type": "object", "properties": {"actions": {"type": "array", "items": {"type": "string"}, "description": "Sequence of actions, e.g. ['Left', 'Down']. Use empty array [] to view current state."}}, "required": ["actions"], "additionalProperties": false}}]}}
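Each row above is a single JSON object that pairs game parameters (`game_id`, `seed`, `dim_board`, `box_type`) with the `responses_create_params` sent to the model. The sketch below parses a trimmed, illustrative row with the same top-level keys and pulls out the fields an agent loop is likely to need; `describe_row` is a hypothetical helper, and the prompt text is shortened for readability.

```python
import json

# Trimmed, illustrative row: same top-level keys as the rows above,
# with the long prompt text shortened.
row_text = json.dumps({
    "game_id": 1,
    "seed": 93810,
    "dim_board": [5, 5],
    "box_type": 0,
    "responses_create_params": {
        "input": [{"role": "developer", "content": "You are a Tetris-playing assistant. ..."}],
        "tools": [{
            "name": "step",
            "type": "function",
            "strict": True,
            "parameters": {
                "type": "object",
                "properties": {"actions": {"type": "array", "items": {"type": "string"}}},
                "required": ["actions"],
                "additionalProperties": False,
            },
        }],
    },
})


def describe_row(line):
    """Extract the game parameters and tool names from one JSONL row."""
    row = json.loads(line)
    tools = row["responses_create_params"]["tools"]
    return {
        "game_id": row["game_id"],
        "seed": row["seed"],
        "board": tuple(row["dim_board"]),
        "tool_names": [tool["name"] for tool in tools],
        "required_args": {tool["name"]: tool["parameters"]["required"] for tool in tools},
    }


print(describe_row(row_text))
```
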
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
{
    "name": "example",
    "type": "example",
    "jsonl_fpath": "resources_servers/grl_tetris/data/example.jsonl",
    "gitlab_identifier": null,
    "license": "Apache 2.0",
    "Number of examples": 5
}
