
Tstuyck/feature/dreamzero remote policy #843

Open
TuurStuyck wants to merge 6 commits into main from tstuyck/feature/dreamzero-remote-policy


@TuurStuyck TuurStuyck commented Jun 30, 2026

Collaborator

Summary

Add support for DreamZero. DreamZero runs on OSMO, while policy_runner runs locally and reaches DreamZero through port forwarding.

Detailed description

  1. Submit the OSMO job

osmo workflow submit isaaclab_arena_dreamzero/docker/dreamzero_inference_server.yaml \
  --pool isaac-dev-h100-01 \
  --set hf_token=<YOUR_HF_TOKEN> \
  --set port=5000

  2. Wait for the server to be ready

Watch the logs until the server reports that it is listening:
osmo workflow logs <workflow_id> serve
Look for a line such as Starting WebsocketPolicyServer on 0.0.0.0:5000.

  3. Port-forward to your local machine

osmo workflow port-forward <workflow_id> serve --port 5000

This tunnels localhost:5000 to the server inside the OSMO task. Leave this terminal open while running the policy.

  4. Run the policy_runner

/isaac-sim/python.sh isaaclab_arena/evaluation/policy_runner.py \
  --policy_type isaaclab_arena_dreamzero.policy.dreamzero_remote_policy.DreamZeroRemotePolicy \
  --enable_cameras \
  --num_episodes 5 \
  --viz kit \
  --dreamzero_host localhost \
  --dreamzero_port 5000 \
  --dreamzero_num_arm_joints 7 \
  --dreamzero_open_loop_horizon 24 \
  --dreamzero_cam_exterior_left external_camera_rgb \
  --dreamzero_cam_wrist wrist_camera_rgb \
  --language_instruction 'Pick up the cube and place it in the bowl' \
  pick_and_place_maple_table \
  --embodiment droid_abs_joint_pos
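Before launching policy_runner, it can help to confirm that the port-forward is accepting connections. The following is a minimal standard-library sketch; `wait_for_port` is a hypothetical helper, not part of isaaclab_arena or the OSMO CLI. A successful TCP connect only shows that the local end of the tunnel is listening, not that the DreamZero server has finished loading, so the log line from the server is still the real readiness signal.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 5000,
                  timeout_s: float = 60.0, poll_s: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # Connecting proves only that something is listening on the local port
            # (the tunnel); the WebSocket server behind it may still be starting.
            with socket.create_connection((host, port), timeout=poll_s):
                return True
        except OSError:
            # Refused or timed out: wait and retry until the deadline.
            time.sleep(poll_s)
    return False
```

For example, `wait_for_port("localhost", 5000, timeout_s=30.0)` returns False if nothing is listening on the forwarded port within 30 seconds.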

[Video: rl-video-step-0.mp4]

Port the DreamZero VLA policy client into IsaacLab-Arena as a new
first-party extension package.

- Implements DreamZeroRemotePolicy (WebSocket + MessagePack, Pi0-style
  per-env action-chunk cache, UUID session management)
- Adds DreamZeroRemotePolicyConfig dataclass with CLI arg support
- Provides letterbox image resize utility (resize_with_pad)
- Covers 29 unit tests; no Isaac Sim dependency required to run them
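Per-env session management can be pictured as one server-side session UUID per parallel environment. This is a sketch under assumptions: `SessionRegistry` is a hypothetical name, and rotating an env's UUID on reset (so the stateful server starts that env with an empty observation history) is an assumed design, not a description of DreamZeroRemotePolicy. The reset payload shape follows the `{endpoint: reset, session_ids: [...]}` message shown in the PR's sequence diagram.

```python
from __future__ import annotations

import uuid


class SessionRegistry:
    """Hypothetical helper: one server-side session UUID per parallel environment."""

    def __init__(self, num_envs: int) -> None:
        self.session_ids = [str(uuid.uuid4()) for _ in range(num_envs)]

    def reset(self, env_ids: list[int] | None = None) -> dict:
        """Give the selected envs fresh UUIDs and build a reset request for the old ones."""
        if env_ids is None:
            env_ids = list(range(len(self.session_ids)))
        old_ids = [self.session_ids[i] for i in env_ids]
        for i in env_ids:
            # Assumption: a new UUID means the server keeps a fresh history for this env.
            self.session_ids[i] = str(uuid.uuid4())
        # Payload shape mirrors the reset message in the PR's sequence diagram.
        return {"endpoint": "reset", "session_ids": old_ids}
```

A partial reset such as `registry.reset([1])` leaves the other envs' sessions, and therefore their server-side histories, untouched.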

Signed-off-by: tstuyck <tstuyck@nvidia.com>

Signed-off-by: tstuyck <tstuyck@nvidia.com>

Three bugs found while running against a live server forwarded to port 5000:

1. Server greeting not consumed — the DreamZero server sends a metadata
   message immediately after the WebSocket handshake. _connect() now reads
   and discards it so the first inference recv() gets an action response,
   not the greeting.

2. Wrong msgpack-numpy encoding — the server expects the RoboLab wire
   format ({b"__ndarray__": True, b"data": ..., b"dtype": ..., b"shape": ...})
   not the msgpack_numpy library format. Replaced msgpack_numpy with a
   matching custom _msgpack_encode/_msgpack_decode pair.

3. Stale reset ack polluting next inference recv() — after multi-episode
   runs the server's "reset successful" text frame sometimes arrives after
   _send_reset's recv timeout (now raised from 5s to 60s). _call_server_with_retry
   now drains up to 3 stale string frames before treating a string response
   as a server error.
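Fixes 2 and 3 can be illustrated together. This is a minimal sketch assuming numpy. `encode_ndarray` and `decode_ndarray` build and unpack the `{b"__ndarray__", b"data", b"dtype", b"shape"}` envelope described above; in a real client they would be passed as `default=` to `msgpack.packb(..., use_bin_type=True)` and as `object_hook=` to `msgpack.unpackb(..., raw=False)`. Sending the dtype as a NumPy dtype string such as "<f4" and the shape as a list are assumptions about the RoboLab format. `recv_action_frame` is a hypothetical helper that mirrors the drain rule from fix 3.

```python
from __future__ import annotations

from typing import Any, Callable

import numpy as np


def encode_ndarray(obj: Any) -> Any:
    """msgpack `default=` hook: wrap a numpy array in the __ndarray__ envelope."""
    if isinstance(obj, np.ndarray):
        return {
            b"__ndarray__": True,
            b"data": obj.tobytes(),
            b"dtype": obj.dtype.str,  # assumption: dtype sent as a string like "<f4"
            b"shape": list(obj.shape),
        }
    raise TypeError(f"cannot serialize object of type {type(obj).__name__}")


def decode_ndarray(obj: dict) -> Any:
    """msgpack `object_hook=`: rebuild arrays from the envelope, pass other maps through."""
    if obj.get(b"__ndarray__"):
        # frombuffer gives a read-only view over the received bytes.
        return np.frombuffer(obj[b"data"], dtype=np.dtype(obj[b"dtype"])).reshape(obj[b"shape"])
    return obj


def recv_action_frame(recv: Callable[[], bytes | str], max_stale: int = 3) -> bytes:
    """Return the next binary frame, skipping up to `max_stale` late text frames."""
    frame = recv()
    for _ in range(max_stale):
        if not isinstance(frame, str):
            break
        frame = recv()  # drop a stale text frame such as a late "reset successful" ack
    if isinstance(frame, str):
        # Still text after draining: treat it as an error message from the server.
        raise RuntimeError(f"server returned a text frame instead of actions: {frame!r}")
    return frame
```

Capping the drain keeps a misbehaving server from stalling the loop indefinitely, while still tolerating a few late reset acks between episodes.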

Tests updated to match the new encoding and _connect() greeting behaviour.

Signed-off-by: tstuyck <tstuyck@nvidia.com>

Switch cam_exterior_left (over_shoulder_left_camera → external_camera_rgb),
cam_exterior_right (over_shoulder_right_camera → external_camera_2_rgb), and
cam_wrist (wrist_cam → wrist_camera_rgb) to match the observation keys used
by Arena environments, so no --dreamzero_cam_* flags are needed for typical runs.

Signed-off-by: tstuyck <tstuyck@nvidia.com>

- docker/Dockerfile: builds dreamzero_inference_server from pytorch:25.04-py3
- docker/push_to_ngc.sh: builds and pushes the image to nvcr.io/nvidian
- docker/dreamzero_inference_server.yaml: OSMO workflow to download the
  DreamZero-DROID checkpoint and serve it on 2×H100s
- README.md: add "Running the inference server on OSMO" section

Signed-off-by: tstuyck <tstuyck@nvidia.com>

@arena-review-bot arena-review-bot Bot left a comment

Contributor

🤖 Isaac Lab-Arena Review Bot

Summary

Adds a new first-party isaaclab_arena_dreamzero extension that wraps the DreamZero VLA server as a remote PolicyBase. The package is well-scoped: vendor-specific code stays in its own extension, the policy logic is sim-agnostic so the 29 unit tests run without Isaac Sim, num_envs is read via env.unwrapped, and it's the first remote policy to actually self-register via @register_policy. The main thing worth discussing is duplication of the remote chunk-cache/reconnect machinery.

Design, Boundaries & Scope

The per-env chunk-replay + reconnect/flush logic (get_action loop, _maybe_init_per_env_state, reset, _call_server_with_retry) is nearly identical to isaaclab_arena_openpi/policy/pi0_remote_policy.py, which already carries a standing TODO(cvolk, 2026-05-18): add a RemotePolicy base class. This PR makes it a third copy of that machinery. Would it be worth extracting the shared replay/retry into a RemotePolicy base now, so what remains here is only the genuinely DreamZero-specific parts (per-env session UUIDs, the msgpack __ndarray__ envelope, greeting/stale-drain handling, 7→8-dim padding)? Fully reasonable to defer to a follow-up if you'd rather land DreamZero first — but each added copy raises the cost of that eventual refactor, so flagging it now.
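To make the suggestion concrete, here is a rough sketch of what a shared base could own. Everything here is hypothetical: `RemotePolicyBase` and `_query_server` are illustrative names, numpy arrays stand in for the torch tensors the real policies return, and in DreamZero `_query_server` would wrap the existing _build_request/_pack/send/recv/_unpack/_parse_action_chunk path.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any

import numpy as np


class RemotePolicyBase(ABC):
    """Hypothetical shared base: per-env action-chunk cache and open-loop replay."""

    def __init__(self, num_envs: int, open_loop_horizon: int) -> None:
        self.open_loop_horizon = open_loop_horizon
        self._chunks: list[np.ndarray | None] = [None] * num_envs
        self._steps = [0] * num_envs

    @abstractmethod
    def _query_server(self, observation: Any, env_id: int) -> np.ndarray:
        """Subclass hook: one server round trip returning a (T, action_dim) chunk."""

    def get_action(self, observation: Any) -> np.ndarray:
        actions = []
        for env_id, chunk in enumerate(self._chunks):
            usable = 0 if chunk is None else min(len(chunk), self.open_loop_horizon)
            if self._steps[env_id] >= usable:
                # Chunk missing or exhausted: fetch a new one and restart replay.
                chunk = self._query_server(observation, env_id)
                self._chunks[env_id] = chunk
                self._steps[env_id] = 0
            actions.append(chunk[self._steps[env_id]])
            self._steps[env_id] += 1
        return np.stack(actions)  # (num_envs, action_dim)

    def reset(self, env_ids: list[int] | None = None) -> None:
        ids = range(len(self._chunks)) if env_ids is None else env_ids
        for env_id in ids:
            # Dropping the cached chunk forces a fresh query on the next step.
            self._chunks[env_id] = None
            self._steps[env_id] = 0
```

With this split, each vendor package would only implement `_query_server` (plus its own session and reconnect details), while replay and partial-reset semantics would live in one place.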

Same theme, lower stakes: resize_with_pad is a third letterbox implementation in the repo (openpi calls upstream openpi_client.image_tools.resize_with_pad; gr00t has resize_frames_with_padding). There's no shared image util in core today, so there's nothing to reuse yet — but if/when a RemotePolicy base lands, an isaaclab_arena/utils letterbox helper would let all three converge.
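For reference, this is the letterbox operation all three implementations share: scale the image to fit the target while preserving its aspect ratio, then pad the remainder. This numpy-only sketch swaps in nearest-neighbour sampling for the interpolating resize used by the OpenCV-based helper, and it centers the image on a zero (black) canvas. `letterbox` is a hypothetical name, and the centering is an assumption about resize_with_pad's behaviour.

```python
import numpy as np


def letterbox(img: np.ndarray, height: int, width: int) -> np.ndarray:
    """Fit an (H, W, ...) image inside (height, width) keeping its aspect ratio; pad with zeros."""
    in_h, in_w = img.shape[:2]
    scale = min(height / in_h, width / in_w)
    new_h = max(1, round(in_h * scale))
    new_w = max(1, round(in_w * scale))
    # Nearest-neighbour source indices; an OpenCV-based helper would interpolate instead.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), in_h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), in_w - 1)
    resized = img[rows[:, None], cols[None, :]]
    # Black canvas with the resized image centered on it.
    out = np.zeros((height, width) + img.shape[2:], dtype=img.dtype)
    top = (height - new_h) // 2
    left = (width - new_w) // 2
    out[top:top + new_h, left:left + new_w] = resized
    return out
```

For example, a 100×200 frame letterboxed to 180×320 is scaled by 1.6 to 160×320 and gets 10-pixel black bars at the top and bottom.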

Findings

See inline comments.

Test Coverage

Strong. The 29 tests cover config validation, wire-format assembly, all four cam2_source modes, session lifecycle, chunk replay/refetch-on-exhaustion, reconnect/retry + give-up, and idempotent close — all against a fake WebSocket with no Isaac Sim dependency, so they correctly land in Phase 1 (no markers needed) and don't need the inner/outer sim pattern. No gaps worth blocking on.

Verdict

Minor fixes needed


actions = []
for env_id in range(num_envs):
    chunk_exhausted = (

Contributor

🟡 Warning — This per-env chunk-cache + refetch loop (together with _maybe_init_per_env_state, reset, and _call_server_with_retry below) is a near-verbatim copy of Pi0RemotePolicy in isaaclab_arena_openpi, which already has # TODO(cvolk, 2026-05-18): add a RemotePolicy base class. This is now the third copy of that machinery. Could the shared replay/retry move into a RemotePolicy base (or reuse isaaclab_arena.policy.action_scheduling) so only the DreamZero-specific bits — session UUIDs, the msgpack __ndarray__ envelope, greeting/drain, 7→8-dim padding — live here? See the Design section; fine to defer if you'd prefer a follow-up.

@xyao-nv xyao-nv Jul 1, 2026

Collaborator

+1 on @cvolk's leftover TODO for a RemotePolicy base class. It could be done in a follow-up MR, with an inline TODO here.

Contributor

Agreed — fine to defer here. Third copy is the tipping point though, so worth a tracked follow-up rather than another loose TODO. Whoever picks up the RemotePolicy base class can fold this in alongside the openpi one.

Comment thread isaaclab_arena_dreamzero/policy/dreamzero_remote_config.py Outdated

greptile-apps Bot commented Jun 30, 2026

Contributor

Greptile Summary

This PR adds a new isaaclab_arena_dreamzero module that integrates the DreamZero remote policy with Isaac Lab-Arena. The policy communicates with a DreamZero inference server over a synchronous WebSocket connection using MessagePack serialization, running the server on OSMO (cloud compute) while the policy runner executes locally with port-forwarding.

  • Policy client (dreamzero_remote_policy.py): manages per-environment session UUIDs, caches action chunks up to open_loop_horizon, handles reconnect logic on connection failures, and formats observations into DreamZero's flat wire format.
  • Docker / OSMO infra (docker/): Dockerfile, OSMO workflow YAML, and NGC push script for running the DreamZero inference server on H100 GPUs.
  • Tests (tests/test_dreamzero_remote_policy.py): ~500 lines of unit tests covering config validation, image resizing, wire-format correctness, session management, chunk caching, and reconnect behaviour.

Confidence Score: 4/5

The new module is self-contained and adds no changes to existing code; the main risk area is the WebSocket retry and drain path in the inference hot-loop.

The overall structure is clean and the test suite is thorough. The inference hot-loop does have some edge cases in how stale reset acknowledgements are drained and how reconnect exceptions are suppressed; these can surface as unexpected errors during multi-episode runs, particularly when consecutive episode resets coincide with slow server acks.

isaaclab_arena_dreamzero/policy/dreamzero_remote_policy.py — specifically the _call_server_with_retry drain loop and the contextlib.suppress scope on reconnect.

Important Files Changed

| Filename | Overview |
|---|---|
| isaaclab_arena_dreamzero/policy/dreamzero_remote_policy.py | Core policy implementation: WebSocket lifecycle, action chunk caching, per-env session management, and retry logic. Well-structured overall; some edge-case reliability concerns in the drain/reconnect path. |
| isaaclab_arena_dreamzero/policy/dreamzero_remote_config.py | Configuration dataclass with CLI parsing. Clean, validated defaults; cam2_source and horizon/joint constraints are enforced in __post_init__. |
| isaaclab_arena_dreamzero/policy/image_utils.py | Letterbox-pad resize using OpenCV. Correct aspect-ratio scaling and canvas placement logic. |
| isaaclab_arena_dreamzero/tests/test_dreamzero_remote_policy.py | Comprehensive test suite with monkeypatched WebSocket. One test accepts AttributeError despite the docstring stating the goal is to prevent it, weakening the regression guard. |
| isaaclab_arena_dreamzero/docker/Dockerfile | Builds the inference server image from the PyTorch NGC base. Installs a broad set of pinned Python dependencies; git clone is unversioned (already flagged in a prior review round). |
| isaaclab_arena_dreamzero/docker/dreamzero_inference_server.yaml | OSMO workflow definition for the DreamZero H100 server job. HF token and port are properly templated; default hf_token is empty string (requires --set at submit time). |
| isaaclab_arena_dreamzero/docker/push_to_ngc.sh | Build-and-push shell script for the NGC registry. Correct use of set -euo pipefail; argument parsing is solid. |

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant PR as policy_runner (local)
    participant DZP as DreamZeroRemotePolicy
    participant WS as WebSocket (ws_sync)
    participant SRV as DreamZero server (OSMO / port-forward)

    PR->>DZP: __init__(config)
    DZP->>WS: connect(uri)
    WS->>SRV: WebSocket handshake
    SRV-->>WS: greeting (msgpack)
    WS-->>DZP: greeting consumed in _connect()
    DZP-->>PR: ready

    loop Each episode step
        PR->>DZP: get_action(env, observation)
        alt chunk exhausted
            DZP->>DZP: _build_request(obs, env_id)
            DZP->>WS: send(msgpack payload)
            WS->>SRV: inference request
            SRV-->>WS: action chunk (msgpack bytes)
            WS-->>DZP: raw bytes
            DZP->>DZP: drain stale string acks if any
            DZP->>DZP: _unpack → _parse_action_chunk
            DZP->>DZP: cache chunk, reset step counter
        else chunk available
            DZP->>DZP: replay next row from cache
        end
        DZP-->>PR: torch.Tensor (num_envs, 8)
    end

    PR->>DZP: reset(env_ids)
    DZP->>WS: send reset request
    WS->>SRV: "{endpoint: reset, session_ids: [...]}"
    SRV-->>WS: ack string
    WS-->>DZP: ack consumed

    PR->>DZP: close()
    DZP->>WS: send reset (best-effort)
    DZP->>WS: close()

Reviews (2): Last reviewed commit: "Update isaaclab_arena_dreamzero/policy/d..."

rm -rf /var/lib/apt/lists/*

# Clone dreamzero repo
RUN git clone https://github.com/dreamzero0/dreamzero.git /workspace/dreamzero

Contributor

P2 Unpinned git clone makes the image non-reproducible and mutable

The DreamZero repo is cloned from the default branch at build time with no commit SHA or tag. Any push to that branch after the initial build would produce a different image from the same Dockerfile, making it impossible to reproduce a known-good image and creating a silent supply chain risk. Pinning to a specific commit (git clone ... && git checkout <sha>) or a tagged release ensures the image content is stable across rebuilds.

Co-authored-by: arena-review-bot[bot] <290456231+arena-review-bot[bot]@users.noreply.github.com>

@arena-review-bot arena-review-bot Bot left a comment

Contributor

🤖 Isaac Lab-Arena Review Bot

Summary

Adds a new first-party isaaclab_arena_dreamzero extension implementing DreamZeroRemotePolicy, a WebSocket/MessagePack client that streams observations to a remote DreamZero VLA server and replays per-env action chunks. Vendor- and server-specific code is correctly kept in its own extension package, the core stays untouched, and the policy comes with a solid 29-test suite that runs without Isaac Sim. The implementation is clean; my main feedback is a structural one about duplication with the existing remote policies.

Design, Boundaries & Scope

The per-env scaffolding here — _maybe_init_per_env_state, the get_action chunk-cache/replay loop, reset, close, and the _call_server_with_retry skeleton — is near-identical to isaaclab_arena_openpi/policy/pi0_remote_policy.py (and overlaps isaaclab_arena_gr00t's closed-loop policy). Pi0 already carries an explicit TODO(cvolk, 2026-05-18): add a RemotePolicy base class to unify this and other remote policies. — this PR makes that the third copy of the same replay machinery. Could the shared chunk-cache/replay + reconnect logic move into a RemotePolicy base in core policy/, leaving DreamZero to supply only its wire format (_build_request/_parse_action_chunk/_pack/_unpack)? That would realize the existing TODO rather than grow the duplication. Not a blocker, but worth deciding now while there are exactly three to fold together.

Findings

🔵 Improvement — Six of the newly-added files carry copyright 2025-2026, while the (also new) config and docker files in this same PR correctly use 2026. A brand-new file can't span an earlier year; please make all of them 2026. Affected: __init__.py, policy/__init__.py, policy/dreamzero_remote_policy.py, policy/image_utils.py, tests/__init__.py, tests/test_dreamzero_remote_policy.py. (Inline on the policy file.)

🔵 Improvement — The server Dockerfile clones dreamzero at HEAD (and installs websockets unpinned while everything else is version-locked), so the published inference image isn't reproducible — a silent upstream change could shift eval behavior. Worth pinning to a commit/tag. (Inline.)

Test Coverage

Coverage is strong for a non-sim client: config validation, wire-format/key shaping, image resize, cam2 modes, session lifecycle, partial/full reset, chunk replay/exhaustion/padding/truncation, reconnect-and-refresh, and idempotent close are all exercised with a _FakeWs stand-in. No sim is required so the inner/outer run_simulation_app_function pattern correctly doesn't apply, and these land in Phase 1 (not with_cameras and not with_subprocess) as expected. Nothing missing.
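A fake WebSocket of the kind these tests rely on can be very small. The sketch below is hypothetical (`FakeWs` and `connect_and_consume_greeting` are illustrative, not the PR's `_FakeWs` or `_connect()`). It scripts the server's frames, so a test can check that the metadata greeting the server sends right after the handshake is consumed at connect time, and that the first inference recv() sees the action chunk.

```python
from __future__ import annotations

from collections import deque
from typing import Any, Iterable, Optional


class FakeWs:
    """Test double for a synchronous WebSocket: scripted incoming frames, recorded sends."""

    def __init__(self, incoming: Iterable[Any]) -> None:
        self.incoming = deque(incoming)
        self.sent: list[Any] = []
        self.close_calls = 0

    def send(self, message: Any) -> None:
        self.sent.append(message)

    def recv(self, timeout: Optional[float] = None) -> Any:
        if not self.incoming:
            # Mimic a real client timing out when the server sends nothing.
            raise TimeoutError(f"no scripted frame left (timeout={timeout})")
        return self.incoming.popleft()

    def close(self) -> None:
        # Counting calls lets a test assert how often the policy closed the socket.
        self.close_calls += 1


def connect_and_consume_greeting(ws: FakeWs) -> Any:
    """Hypothetical connect step: read the greeting so later recv() calls see replies."""
    return ws.recv(timeout=10.0)
```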

Verdict

Ship it (after the copyright-year fix; the duplication question is worth a reply but isn't a blocker).

@@ -0,0 +1,458 @@
# Copyright (c) 2025-2026, The Isaac Lab Arena Project Developers (https://github.com/isaac-sim/IsaacLab-Arena/blob/main/CONTRIBUTORS.md).

Contributor

🔵 Improvement — This file is newly added, so the copyright header should be a single year (2026), not a range — a new file can't span 2025. The same applies to policy/__init__.py, image_utils.py, __init__.py, tests/__init__.py, and tests/test_dreamzero_remote_policy.py; the new config/docker files in this PR already use 2026.

Suggested change
- # Copyright (c) 2025-2026, The Isaac Lab Arena Project Developers (https://github.com/isaac-sim/IsaacLab-Arena/blob/main/CONTRIBUTORS.md).
+ # Copyright (c) 2026, The Isaac Lab Arena Project Developers (https://github.com/isaac-sim/IsaacLab-Arena/blob/main/CONTRIBUTORS.md).

rm -rf /var/lib/apt/lists/*

# Clone dreamzero repo
RUN git clone https://github.com/dreamzero0/dreamzero.git /workspace/dreamzero

Contributor

🔵 Improvement — Cloning at HEAD (plus the unpinned websockets install below, while all other deps are version-locked) makes the published inference image non-reproducible: an upstream change to dreamzero could silently alter eval behavior between two builds of the "same" image. Could we pin to a specific commit or tag?

TARGET_W: int = 320


def resize_with_pad(img: np.ndarray, height: int = TARGET_H, width: int = TARGET_W) -> np.ndarray:

Collaborator

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

echo "Building image: ${IMAGE_NAME}:${TAG_NAME}"
docker build \

Collaborator

cd /workspace/dreamzero
hf download GEAR-Dreams/DreamZero-DROID \
--repo-type model \
--local-dir ./checkpoints/DreamZero-DROID

Collaborator

Nit: should the checkpoint be baked in somewhere static, in case the version changes, so that every eval job doesn't depend on the HF API?
For example, the server Docker image itself, or a swiftstack/S3 bucket (how-to) plus a link in the inputs field (how)?


workflow:
  name: dreamzero-inference-server
  pool: isaac-dev-h100-01

Collaborator

Does it only fit on 2× H100? Could it fit on a few L40s?
Can you add an inline note/link explaining the GPU requirements?
Asking because the L40s pool is sometimes less crowded than the H100 pool.

cam_exterior_left: str = "external_camera_rgb"
"""Arena camera key that maps to observation/exterior_image_0_left."""

cam2_source: str = "black"

Collaborator

Nit: how about Literal["black", "right", "head", "duplicate"] = "black"?
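A sketch of the Literal suggestion, assuming the field names and defaults shown in the config diff and commit messages. `DreamZeroConfigSketch` is a hypothetical name, and the `>= 1` bounds are assumed rather than copied from the PR. Literal only helps static checkers, so values arriving from the CLI still need a runtime check; deriving VALID_CAM2_SOURCES from the Literal with typing.get_args keeps the two in sync.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal, get_args

Cam2Source = Literal["black", "right", "head", "duplicate"]
# Single source of truth: the runtime tuple is derived from the type hint.
VALID_CAM2_SOURCES: tuple[str, ...] = get_args(Cam2Source)


@dataclass
class DreamZeroConfigSketch:
    cam_exterior_left: str = "external_camera_rgb"
    cam_exterior_right: str = "external_camera_2_rgb"
    cam_wrist: str = "wrist_camera_rgb"
    cam2_source: Cam2Source = "black"
    open_loop_horizon: int = 24
    num_arm_joints: int = 7

    def __post_init__(self) -> None:
        # Literal is only a static hint; CLI strings still need a runtime check.
        if self.cam2_source not in VALID_CAM2_SOURCES:
            raise ValueError(
                f"cam2_source must be one of {VALID_CAM2_SOURCES}, got {self.cam2_source!r}"
            )
        if self.open_loop_horizon < 1:
            raise ValueError("open_loop_horizon must be >= 1")
        if self.num_arm_joints < 1:
            raise ValueError("num_arm_joints must be >= 1")
```

The same tuple could also be passed as argparse `choices=`, so invalid CLI values are rejected before the dataclass is built.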

returns action chunks that are replayed step-by-step, querying for new chunks
only when the current one is exhausted.

The server is stateful: it maintains a temporal observation history per session

Collaborator

Nice!

)
return resize_with_pad(tensor.numpy(), TARGET_H, TARGET_W)

def _resolve_cam2(self, cam_obs: dict[str, Any], env_id: int, img0: np.ndarray) -> np.ndarray:

Collaborator

Could we add a few lines to the file that contains VALID_CAM2_SOURCES explaining what each source is and how to set it?
It wasn't clear to me until I reached this function.
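One way to document the four modes is to spell out what each one returns. The semantics below are inferred from the mode names and are assumptions, not the PR's _resolve_cam2: `black` sends an all-zero frame, `duplicate` reuses the left exterior image, `right` sends the second exterior camera (defaulting to the external_camera_2_rgb key from the commit message), and `head` sends a head camera whose key is left as a parameter because no such key appears in this PR. `resolve_cam2` is a hypothetical name.

```python
from __future__ import annotations

import numpy as np


def resolve_cam2(
    cam2_source: str,
    img0: np.ndarray,
    images: dict[str, np.ndarray],
    right_key: str = "external_camera_2_rgb",
    head_key: str | None = None,
) -> np.ndarray:
    """Pick the image for the second exterior-camera slot (assumed semantics)."""
    if cam2_source == "black":
        return np.zeros_like(img0)  # no second view: send a blank frame
    if cam2_source == "duplicate":
        return img0.copy()  # reuse the left exterior view
    if cam2_source == "right":
        return images[right_key]  # a real second exterior camera
    if cam2_source == "head":
        if head_key is None:
            raise ValueError("cam2_source='head' requires a head camera key")
        return images[head_key]
    raise ValueError(f"unknown cam2_source {cam2_source!r}")
```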

@@ -0,0 +1,152 @@
# isaaclab_arena_dreamzero

@xyao-nv xyao-nv Jul 1, 2026

Collaborator

I know this is the first part of your train of MRs (at least I hope 🤩). Can we move these instructions into doc/...rst so we have a centralized place to go through instructions?
How about here?

open_loop_horizon: int = 24
"""Number of action steps to execute per server inference call."""

num_arm_joints: int = 7

Collaborator

What if the joint order does not match between our sim and their model outputs? For DROID I assume it's less likely, since they basically hardcoded that ordering into the model(?). What about other embodiments? For G1/GR1, GN1.6 has to do some joint reordering here.

If this is only for DROID, please add an assertion somewhere.
If this can support other embodiments, please explain and add placeholders with TODOs for the missing joint conversions.
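Here is a minimal sketch of the explicit reordering the comment asks for, assuming joint names are available on both the sim side and the model side. `reorder_joints` is hypothetical, and the names in the usage below are placeholders, since neither the DROID nor the Arena joint-name lists appear in this PR. For DROID the permutation would likely be the identity. The check raises ValueError rather than using `assert`, so it still fires under `python -O`.

```python
from __future__ import annotations

import numpy as np


def reorder_joints(values: np.ndarray, src_names: list[str], dst_names: list[str]) -> np.ndarray:
    """Permute the last axis of `values` from the src_names order into the dst_names order."""
    missing = [name for name in dst_names if name not in src_names]
    if missing:
        # Fail loudly rather than silently driving the wrong actuators.
        raise ValueError(f"joints missing from source order: {missing}")
    index = [src_names.index(name) for name in dst_names]
    return values[..., index]
```

For example, `reorder_joints(actions, model_joint_names, sim_joint_names)` would map a (num_envs, num_joints) model output into sim order before the gripper column is appended.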

rm -rf /var/lib/apt/lists/*

# Clone dreamzero repo
RUN git clone https://github.com/dreamzero0/dreamzero.git /workspace/dreamzero

Collaborator

Can we pin a commit hash?


xyao-nv commented Jul 1, 2026

Collaborator

Congrats on your first MR to Arena, and thanks for doing it!
For running DreamZero as a sidecar with Arena on OSMO, this GR00T example MR might be helpful.
