Tstuyck/feature/dreamzero remote policy#843
Port the DreamZero VLA policy client into IsaacLab-Arena as a new first-party extension package.
- Implements DreamZeroRemotePolicy (WebSocket + MessagePack, Pi0-style per-env action-chunk cache, UUID session management)
- Adds DreamZeroRemotePolicyConfig dataclass with CLI arg support
- Provides letterbox image resize utility (resize_with_pad)
- Covers 29 unit tests; no Isaac Sim dependency required to run them
Signed-off-by: tstuyck <tstuyck@nvidia.com>
Three bugs found running against a live server forwarded to port 5000:
1. Server greeting not consumed — the DreamZero server sends a metadata
message immediately after the WebSocket handshake. _connect() now reads
and discards it so the first inference recv() gets an action response,
not the greeting.
2. Wrong msgpack-numpy encoding — the server expects the RoboLab wire
format ({b"__ndarray__": True, b"data": ..., b"dtype": ..., b"shape": ...})
not the msgpack_numpy library format. Replaced msgpack_numpy with a
matching custom _msgpack_encode/_msgpack_decode pair.
3. Stale reset ack polluting next inference recv() — after multi-episode
runs the server's "reset successful" text frame sometimes arrives after
_send_reset's recv timeout (now raised from 5s to 60s). _call_server_with_retry
now drains up to 3 stale string frames before treating a string response
as a server error.
Tests updated to match the new encoding and _connect() greeting behaviour.
Signed-off-by: tstuyck <tstuyck@nvidia.com>
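As a rough illustration of fix 3 in the commit message above, the stale-ack drain could look something like the sketch below. `FakeWs`, `recv_binary_response`, and `MAX_STALE_FRAMES` are hypothetical names invented for this example; the actual `_call_server_with_retry` in the PR may be structured differently, and the only detail taken from the commit message is the budget of 3 stale string frames.

```python
from collections import deque

# Assumption: mirrors the "up to 3 stale string frames" from the commit message.
MAX_STALE_FRAMES = 3


class FakeWs:
    """Hypothetical stand-in for a sync WebSocket: recv() pops queued frames."""

    def __init__(self, frames):
        self._frames = deque(frames)

    def recv(self):
        return self._frames.popleft()


def recv_binary_response(ws, max_stale=MAX_STALE_FRAMES):
    """Return the first bytes frame, skipping up to `max_stale` text frames.

    Text frames (str) are assumed to be late reset acks. Once the budget is
    used up, a further text frame is treated as a server error.
    """
    for _ in range(max_stale):
        frame = ws.recv()
        if isinstance(frame, bytes):
            return frame
        # A str frame here is assumed to be a stale "reset successful" ack.
    frame = ws.recv()
    if isinstance(frame, bytes):
        return frame
    raise RuntimeError(f"server returned text instead of an action chunk: {frame!r}")
```

With this shape, two late acks followed by a binary frame are tolerated, while a fourth consecutive text frame surfaces as an error instead of being silently dropped.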
Switch cam_exterior_left (over_shoulder_left_camera → external_camera_rgb), cam_exterior_right (over_shoulder_right_camera → external_camera_2_rgb), and cam_wrist (wrist_cam → wrist_camera_rgb) to match the observation keys used by Arena environments, so no --dreamzero_cam_* flags are needed for typical runs.
Signed-off-by: tstuyck <tstuyck@nvidia.com>
- docker/Dockerfile: builds dreamzero_inference_server from pytorch:25.04-py3
- docker/push_to_ngc.sh: builds and pushes the image to nvcr.io/nvidian
- docker/dreamzero_inference_server.yaml: OSMO workflow to download the DreamZero-DROID checkpoint and serve it on 2×H100s
- README.md: add "Running the inference server on OSMO" section
Signed-off-by: tstuyck <tstuyck@nvidia.com>
🤖 Isaac Lab-Arena Review Bot
Summary
Adds a new first-party isaaclab_arena_dreamzero extension that wraps the DreamZero VLA server as a remote PolicyBase. The package is well-scoped: vendor-specific code stays in its own extension, the policy logic is sim-agnostic so the 29 unit tests run without Isaac Sim, num_envs is read via env.unwrapped, and it's the first remote policy to actually self-register via @register_policy. The main thing worth discussing is duplication of the remote chunk-cache/reconnect machinery.
Design, Boundaries & Scope
The per-env chunk-replay + reconnect/flush logic (get_action loop, _maybe_init_per_env_state, reset, _call_server_with_retry) is nearly identical to isaaclab_arena_openpi/policy/pi0_remote_policy.py, which already carries a standing TODO(cvolk, 2026-05-18): add a RemotePolicy base class. This PR makes it a third copy of that machinery. Would it be worth extracting the shared replay/retry into a RemotePolicy base now, so what remains here is only the genuinely DreamZero-specific parts (per-env session UUIDs, the msgpack __ndarray__ envelope, greeting/stale-drain handling, 7→8-dim padding)? Fully reasonable to defer to a follow-up if you'd rather land DreamZero first — but each added copy raises the cost of that eventual refactor, so flagging it now.
Same theme, lower stakes: resize_with_pad is a third letterbox implementation in the repo (openpi calls upstream openpi_client.image_tools.resize_with_pad; gr00t has resize_frames_with_padding). There's no shared image util in core today, so there's nothing to reuse yet — but if/when a RemotePolicy base lands, an isaaclab_arena/utils letterbox helper would let all three converge.
Findings
See inline comments.
Test Coverage
Strong. The 29 tests cover config validation, wire-format assembly, all four cam2_source modes, session lifecycle, chunk replay/refetch-on-exhaustion, reconnect/retry + give-up, and idempotent close — all against a fake WebSocket with no Isaac Sim dependency, so they correctly land in Phase 1 (no markers needed) and don't need the inner/outer sim pattern. No gaps worth blocking on.
Verdict
Minor fixes needed
    actions = []
    for env_id in range(num_envs):
        chunk_exhausted = (
🟡 Warning — This per-env chunk-cache + refetch loop (together with _maybe_init_per_env_state, reset, and _call_server_with_retry below) is a near-verbatim copy of Pi0RemotePolicy in isaaclab_arena_openpi, which already has # TODO(cvolk, 2026-05-18): add a RemotePolicy base class. This is now the third copy of that machinery. Could the shared replay/retry move into a RemotePolicy base (or reuse isaaclab_arena.policy.action_scheduling) so only the DreamZero-specific bits — session UUIDs, the msgpack __ndarray__ envelope, greeting/drain, 7→8-dim padding — live here? See the Design section; fine to defer if you'd prefer a follow-up.
+1 on @cvolk's leftover TODO for a RemotePolicy base class. Could be done in a follow-up MR with an inline TODO.
Agreed — fine to defer here. Third copy is the tipping point though, so worth a tracked follow-up rather than another loose TODO. Whoever picks up the RemotePolicy base class can fold this in alongside the openpi one.
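To make the proposed refactor concrete, here is a minimal sketch of what a shared chunk-replay base might look like. Everything in it is hypothetical: `ChunkReplayPolicy`, `_fetch_chunk`, and the toy `CountingPolicy` are invented for illustration, plain Python lists stand in for tensors, and the real `PolicyBase` interface in Arena is not modelled.

```python
from abc import ABC, abstractmethod


class ChunkReplayPolicy(ABC):
    """Hypothetical base: per-env action-chunk cache with refetch on exhaustion."""

    def __init__(self, num_envs: int, open_loop_horizon: int):
        self.open_loop_horizon = open_loop_horizon
        # One cached chunk and one replay cursor per environment.
        self._chunks = [None] * num_envs
        self._steps = [0] * num_envs

    @abstractmethod
    def _fetch_chunk(self, env_id: int, observation) -> list:
        """Vendor-specific hook: query the remote server, return a list of actions."""

    def get_action(self, observations: list) -> list:
        actions = []
        for env_id, obs in enumerate(observations):
            chunk = self._chunks[env_id]
            chunk_exhausted = chunk is None or self._steps[env_id] >= min(
                len(chunk), self.open_loop_horizon
            )
            if chunk_exhausted:
                chunk = self._chunks[env_id] = self._fetch_chunk(env_id, obs)
                self._steps[env_id] = 0
            actions.append(chunk[self._steps[env_id]])
            self._steps[env_id] += 1
        return actions

    def reset(self, env_ids=None) -> None:
        """Drop cached chunks so the next get_action() refetches for those envs."""
        ids = range(len(self._chunks)) if env_ids is None else env_ids
        for env_id in ids:
            self._chunks[env_id] = None
            self._steps[env_id] = 0


class CountingPolicy(ChunkReplayPolicy):
    """Toy subclass used only to exercise the base class."""

    def __init__(self, num_envs: int, open_loop_horizon: int):
        super().__init__(num_envs, open_loop_horizon)
        self.calls = 0

    def _fetch_chunk(self, env_id, observation):
        self.calls += 1
        return [(env_id, self.calls, t) for t in range(self.open_loop_horizon)]
```

Under this split, a vendor policy would only implement the fetch hook (wire format, sessions, retries), while replay and reset bookkeeping live in one place.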
Greptile Summary
This PR adds a new
Confidence Score: 4/5
The new module is self-contained and adds no changes to existing code; the main risk area is the WebSocket retry and drain path in the inference hot-loop.
The overall structure is clean and the test suite is thorough. The inference hot-loop does have some edge cases in how stale reset acknowledgements are drained and how reconnect exceptions are suppressed; these can surface as unexpected errors during multi-episode runs, particularly when consecutive episode resets coincide with slow server acks.
isaaclab_arena_dreamzero/policy/dreamzero_remote_policy.py, specifically the _call_server_with_retry drain loop and the contextlib.suppress scope on reconnect.
Important Files Changed
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant PR as policy_runner (local)
participant DZP as DreamZeroRemotePolicy
participant WS as WebSocket (ws_sync)
participant SRV as DreamZero server (OSMO / port-forward)
PR->>DZP: __init__(config)
DZP->>WS: connect(uri)
WS->>SRV: WebSocket handshake
SRV-->>WS: greeting (msgpack)
WS-->>DZP: greeting consumed in _connect()
DZP-->>PR: ready
loop Each episode step
PR->>DZP: get_action(env, observation)
alt chunk exhausted
DZP->>DZP: _build_request(obs, env_id)
DZP->>WS: send(msgpack payload)
WS->>SRV: inference request
SRV-->>WS: action chunk (msgpack bytes)
WS-->>DZP: raw bytes
DZP->>DZP: drain stale string acks if any
DZP->>DZP: _unpack → _parse_action_chunk
DZP->>DZP: cache chunk, reset step counter
else chunk available
DZP->>DZP: replay next row from cache
end
DZP-->>PR: torch.Tensor (num_envs, 8)
end
PR->>DZP: reset(env_ids)
DZP->>WS: send reset request
WS->>SRV: "{endpoint: reset, session_ids: [...]}"
SRV-->>WS: ack string
WS-->>DZP: ack consumed
PR->>DZP: close()
DZP->>WS: send reset (best-effort)
DZP->>WS: close()
Reviews (2): Last reviewed commit: "Update isaaclab_arena_dreamzero/policy/d..."
    rm -rf /var/lib/apt/lists/*

    # Clone dreamzero repo
    RUN git clone https://github.com/dreamzero0/dreamzero.git /workspace/dreamzero
Unpinned git clone makes the image non-reproducible and mutable
The DreamZero repo is cloned from the default branch at build time with no commit SHA or tag. Any push to that branch after the initial build would produce a different image from the same Dockerfile, making it impossible to reproduce a known-good image and creating a silent supply chain risk. Pinning to a specific commit (git clone ... && git checkout <sha>) or a tagged release ensures the image content is stable across rebuilds.
Co-authored-by: arena-review-bot[bot] <290456231+arena-review-bot[bot]@users.noreply.github.com>
🤖 Isaac Lab-Arena Review Bot
Summary
Adds a new first-party isaaclab_arena_dreamzero extension implementing DreamZeroRemotePolicy, a WebSocket/MessagePack client that streams observations to a remote DreamZero VLA server and replays per-env action chunks. Vendor- and server-specific code is correctly kept in its own extension package, the core stays untouched, and the policy comes with a solid 29-test suite that runs without Isaac Sim. The implementation is clean; my main feedback is a structural one about duplication with the existing remote policies.
Design, Boundaries & Scope
The per-env scaffolding here — _maybe_init_per_env_state, the get_action chunk-cache/replay loop, reset, close, and the _call_server_with_retry skeleton — is near-identical to isaaclab_arena_openpi/policy/pi0_remote_policy.py (and overlaps isaaclab_arena_gr00t's closed-loop policy). Pi0 already carries an explicit TODO(cvolk, 2026-05-18): add a RemotePolicy base class to unify this and other remote policies. — this PR makes that the third copy of the same replay machinery. Could the shared chunk-cache/replay + reconnect logic move into a RemotePolicy base in core policy/, leaving DreamZero to supply only its wire format (_build_request/_parse_action_chunk/_pack/_unpack)? That would realize the existing TODO rather than grow the duplication. Not a blocker, but worth deciding now while there are exactly three to fold together.
Findings
🔵 Improvement — Six of the newly-added files carry copyright 2025-2026, while the (also new) config and docker files in this same PR correctly use 2026. A brand-new file can't span an earlier year; please make all of them 2026. Affected: __init__.py, policy/__init__.py, policy/dreamzero_remote_policy.py, policy/image_utils.py, tests/__init__.py, tests/test_dreamzero_remote_policy.py. (Inline on the policy file.)
🔵 Improvement — The server Dockerfile clones dreamzero at HEAD (and installs websockets unpinned while everything else is version-locked), so the published inference image isn't reproducible — a silent upstream change could shift eval behavior. Worth pinning to a commit/tag. (Inline.)
Test Coverage
Coverage is strong for a non-sim client: config validation, wire-format/key shaping, image resize, cam2 modes, session lifecycle, partial/full reset, chunk replay/exhaustion/padding/truncation, reconnect-and-refresh, and idempotent close are all exercised with a _FakeWs stand-in. No sim is required so the inner/outer run_simulation_app_function pattern correctly doesn't apply, and these land in Phase 1 (not with_cameras and not with_subprocess) as expected. Nothing missing.
Verdict
Ship it (after the copyright-year fix; the duplication question is worth a reply but isn't a blocker).
    @@ -0,0 +1,458 @@
    # Copyright (c) 2025-2026, The Isaac Lab Arena Project Developers (https://github.com/isaac-sim/IsaacLab-Arena/blob/main/CONTRIBUTORS.md).
🔵 Improvement — This file is newly added, so the copyright header should be a single year (2026), not a range — a new file can't span 2025. The same applies to policy/__init__.py, image_utils.py, __init__.py, tests/__init__.py, and tests/test_dreamzero_remote_policy.py; the new config/docker files in this PR already use 2026.
    - # Copyright (c) 2025-2026, The Isaac Lab Arena Project Developers (https://github.com/isaac-sim/IsaacLab-Arena/blob/main/CONTRIBUTORS.md).
    + # Copyright (c) 2026, The Isaac Lab Arena Project Developers (https://github.com/isaac-sim/IsaacLab-Arena/blob/main/CONTRIBUTORS.md).
    rm -rf /var/lib/apt/lists/*

    # Clone dreamzero repo
    RUN git clone https://github.com/dreamzero0/dreamzero.git /workspace/dreamzero
🔵 Improvement — Cloning at HEAD (plus the unpinned websockets install below, while all other deps are version-locked) makes the published inference image non-reproducible: an upstream change to dreamzero could silently alter eval behavior between two builds of the "same" image. Could we pin to a specific commit or tag?
    TARGET_W: int = 320


    def resize_with_pad(img: np.ndarray, height: int = TARGET_H, width: int = TARGET_W) -> np.ndarray:
Can we unify with GR00T's image resize padding func?
https://github.com/isaac-sim/IsaacLab-Arena/blob/main/isaaclab_arena_gr00t/utils/image_conversion.py#L14
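For anyone comparing the letterbox implementations, the part they would need to agree on is the resize-and-center geometry. The sketch below shows that arithmetic only; `letterbox_geometry` is a hypothetical helper, not code from this PR or from GR00T, and it makes no claim about the interpolation or padding colour either implementation uses.

```python
def letterbox_geometry(src_h: int, src_w: int, dst_h: int, dst_w: int) -> tuple[int, int, int, int]:
    """Return (new_h, new_w, pad_top, pad_left) for an aspect-preserving resize.

    The image is scaled by the largest factor that still fits inside the
    target, then centered; whatever border remains is padding.
    """
    if min(src_h, src_w, dst_h, dst_w) <= 0:
        raise ValueError("all dimensions must be positive")
    scale = min(dst_h / src_h, dst_w / src_w)
    # Clamp to guard against rounding pushing a side past the target or to zero.
    new_h = max(1, min(dst_h, round(src_h * scale)))
    new_w = max(1, min(dst_w, round(src_w * scale)))
    pad_top = (dst_h - new_h) // 2
    pad_left = (dst_w - new_w) // 2
    return new_h, new_w, pad_top, pad_left
```

As a usage example, a 480×640 frame mapped to a 180×320 target (the 320 width appears in the diff above; the 180 height is an assumed value) is scaled to 180×240 and padded by 40 pixels on the left.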
    SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

    echo "Building image: ${IMAGE_NAME}:${TAG_NAME}"
    docker build \
Can we unify with docker/push_to_ngc.sh?
https://github.com/isaac-sim/IsaacLab-Arena/blob/main/docker/push_to_ngc.sh
    cd /workspace/dreamzero
    hf download GEAR-Dreams/DreamZero-DROID \
      --repo-type model \
      --local-dir ./checkpoints/DreamZero-DROID

    workflow:
      name: dreamzero-inference-server
      pool: isaac-dev-h100-01
Does it only fit on 2×H100? Can it fit on a few L40S GPUs?
Can you put an inline note/link explaining the GPU requirements?
Asking because the L40S pool is sometimes less crowded than the H100 pool.
    cam_exterior_left: str = "external_camera_rgb"
    """Arena camera key that maps to observation/exterior_image_0_left."""

    cam2_source: str = "black"
Nit.
How about Literal["black", "right", "head", "duplicate"] = "black"?
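A minimal sketch of that suggestion follows. `Cam2Source` and `Cam2Config` are hypothetical names used here in place of the real `DreamZeroRemotePolicyConfig`, and the four values are the ones listed in the comment above. Since `Literal` is only enforced by static type checkers, the sketch also validates at runtime.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# Values taken from the review suggestion; the alias name is an assumption.
Cam2Source = Literal["black", "right", "head", "duplicate"]


@dataclass
class Cam2Config:
    """Hypothetical, trimmed-down stand-in for DreamZeroRemotePolicyConfig."""

    cam2_source: Cam2Source = "black"

    def __post_init__(self) -> None:
        # Literal is not checked at runtime, so reject unknown values explicitly.
        valid = get_args(Cam2Source)
        if self.cam2_source not in valid:
            raise ValueError(f"cam2_source must be one of {valid}, got {self.cam2_source!r}")
```

Deriving the valid set from the `Literal` via `get_args` keeps the type hint and the runtime check from drifting apart.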
    returns action chunks that are replayed step-by-step, querying for new chunks
    only when the current one is exhausted.

    The server is stateful: it maintains a temporal observation history per session

        )
        return resize_with_pad(tensor.numpy(), TARGET_H, TARGET_W)

    def _resolve_cam2(self, cam_obs: dict[str, Any], env_id: int, img0: np.ndarray) -> np.ndarray:
Shall we add a few lines in the file containing VALID_CAM2_SOURCES explaining what they are? And how to set them?
It's not clear to me until this func.
    @@ -0,0 +1,152 @@
    # isaaclab_arena_dreamzero
I know this is the 1st part of your train of MRs, at least I hope 🤩 . Can we park those instructions into doc/...rst? So we have a centralized place to go thru instructions.
How about here?
    open_loop_horizon: int = 24
    """Number of action steps to execute per server inference call."""

    num_arm_joints: int = 7
What if the joint order does not match between our sim and their model outputs? For DROID, I assume it's less likely and they basically hardcoded that ordering into the model(?). How about other embodiments? For G1/GR1, GN1.6 has to do some joint reordering here.
If this is only for DROID, plz add an assertion somewhere.
If this can support other embodiments, plz explain and add missing joint conversion placeholders with TODO.
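One possible shape for such a joint-conversion placeholder is a name-based reorder that fails loudly on any mismatch. This is a sketch only: `reorder_joints` and the joint names in the usage note are hypothetical, and nothing here reflects the actual joint ordering of DROID or any other embodiment.

```python
def reorder_joints(values, source_order, target_order):
    """Reorder `values` (given in `source_order`) into `target_order` by joint name.

    Raises ValueError instead of guessing when the two orderings do not name
    exactly the same joints.
    """
    if len(values) != len(source_order):
        raise ValueError("values and source_order must have the same length")
    if sorted(source_order) != sorted(target_order):
        raise ValueError("source and target must name the same joints")
    index = {name: i for i, name in enumerate(source_order)}
    return [values[index[name]] for name in target_order]
```

For example, `reorder_joints([0.1, 0.2, 0.3], ["j1", "j2", "j3"], ["j3", "j1", "j2"])` returns the values in the order the simulator expects, while an unknown joint name raises instead of silently misrouting an action.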
    rm -rf /var/lib/apt/lists/*

    # Clone dreamzero repo
    RUN git clone https://github.com/dreamzero0/dreamzero.git /workspace/dreamzero
Can we lock a commit HASH?
Congrats on your 1st MR to Arena! Thanks for doing that!
Summary
Add support for DreamZero. The DreamZero server runs on OSMO while policy_runner runs locally and reaches the server through port forwarding.
Detailed description
osmo workflow submit isaaclab_arena_dreamzero/docker/dreamzero_inference_server.yaml \
  --pool isaac-dev-h100-01 \
  --set hf_token=<YOUR_HF_TOKEN> \
  --set port=5000
Watch the logs until you see the server is listening:
osmo workflow logs <workflow_id> serve
Look for a line like Starting WebsocketPolicyServer on 0.0.0.0:5000.
osmo workflow port-forward <workflow_id> serve --port 5000
This tunnels localhost:5000 to the server inside the OSMO task. Leave this terminal open while running the policy.
/isaac-sim/python.sh isaaclab_arena/evaluation/policy_runner.py --policy_type isaaclab_arena_dreamzero.policy.dreamzero_remote_policy.DreamZeroRemotePolicy --enable_cameras --num_episodes 5 --viz kit --dreamzero_host localhost --dreamzero_port 5000 --dreamzero_num_arm_joints 7 --dreamzero_open_loop_horizon 24 --dreamzero_cam_exterior_left external_camera_rgb --dreamzero_cam_wrist wrist_camera_rgb --language_instruction 'Pick up the cube and place it in the bowl' pick_and_place_maple_table --embodiment droid_abs_joint_pos
rl-video-step-0.mp4