1 change: 1 addition & 0 deletions README.md
@@ -152,6 +152,7 @@ IsaacLab-Arena/
├── isaaclab_arena_environments/ # Concrete environment definitions
├── isaaclab_arena_examples/ # Policy and relation examples
├── isaaclab_arena_g1/ # Unitree G1 humanoid embodiment + examples
├── isaaclab_arena_dreamzero/ # DreamZero policy integration
├── isaaclab_arena_gr00t/ # GR00T policy integration
├── isaaclab_arena_openpi/ # OpenPi (pi0 / pi05) policy integration
├── docker/ # Docker configurations and launch scripts
152 changes: 152 additions & 0 deletions isaaclab_arena_dreamzero/README.md
@@ -0,0 +1,152 @@
# isaaclab_arena_dreamzero

@xyao-nv (Collaborator), Jul 1, 2026

I know this is the 1st part of your train of MRs, at least I hope 🤩 . Can we park those instructions into doc/...rst? So we have a centralized place to go thru instructions.
How about here?


DreamZero remote policy integration for Isaac Lab-Arena.

`DreamZeroRemotePolicy` connects to a running DreamZero inference server over WebSocket + MessagePack, sends observations in DreamZero's flat wire format, and replays the returned action chunks step-by-step.
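
The request/replay cycle can be pictured with the minimal sketch below. It is not the code of `DreamZeroRemotePolicy`: `ChunkReplayer` and `fake_infer` are hypothetical names, and the WebSocket + MessagePack round trip is replaced by a plain callable that returns a `(chunk_len, action_dim)` array. The server is queried only when the buffered chunk is exhausted, and at most `open_loop_horizon` actions from each chunk are replayed.

```python
from collections import deque
from typing import Callable, Deque

import numpy as np


class ChunkReplayer:
    """Hypothetical helper: replays server action chunks one step at a time."""

    def __init__(self, infer_fn: Callable[[dict], np.ndarray], open_loop_horizon: int = 24):
        # infer_fn stands in for the WebSocket + MessagePack round trip and
        # must return an array of shape (chunk_len, action_dim).
        self._infer_fn = infer_fn
        self._horizon = open_loop_horizon
        self._queue: Deque[np.ndarray] = deque()

    def reset(self) -> None:
        # Discard leftover actions, e.g. when a new episode starts.
        self._queue.clear()

    def get_action(self, observation: dict) -> np.ndarray:
        if not self._queue:
            chunk = self._infer_fn(observation)
            # Buffer at most open_loop_horizon rows of the returned chunk.
            self._queue.extend(chunk[: self._horizon])
        return self._queue.popleft()


# Usage with a stand-in server that always returns a 30-step chunk of 8-D actions.
server_calls = []


def fake_infer(observation: dict) -> np.ndarray:
    server_calls.append(observation)
    return np.arange(30 * 8, dtype=np.float32).reshape(30, 8)


replayer = ChunkReplayer(fake_infer, open_loop_horizon=24)
actions = [replayer.get_action({"step": t}) for t in range(48)]
print(len(server_calls))  # 2: the server is queried at steps 0 and 24
```

With a horizon of 24, the remaining 6 actions of each 30-step chunk are dropped rather than replayed; this trade-off between reactivity and server load is what `--dreamzero_open_loop_horizon` controls.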

## Prerequisites

The DreamZero inference server must be running and reachable before launching the policy runner. The client connects eagerly at construction time and will raise a `ConnectionRefusedError` if the server is not up.
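
Because the client connects eagerly, a launcher script can wait until the server port accepts TCP connections before starting the runner. This is useful when the server is still downloading its checkpoint. A minimal sketch using only the Python standard library; `wait_for_server` is a hypothetical helper, not part of this package, and a successful TCP connect only shows that something is listening on the port, not that the model has finished loading.

```python
import socket
import time


def wait_for_server(host: str, port: int, timeout_s: float = 60.0, poll_s: float = 1.0) -> bool:
    """Return True once (host, port) accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # Plain TCP connect; no WebSocket handshake is attempted.
            with socket.create_connection((host, port), timeout=poll_s):
                return True
        except OSError:
            # ConnectionRefusedError and socket timeouts are both OSError subclasses.
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_s)
```

For example, a launcher could call `wait_for_server("localhost", 5000, timeout_s=600)` and abort if it returns `False`.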

## Running

All global and policy-specific flags must appear **before** the environment name (subcommand). Flags like `--embodiment` that are specific to the environment go after it.

```bash
/isaac-sim/python.sh isaaclab_arena/evaluation/policy_runner.py \
--policy_type isaaclab_arena_dreamzero.policy.dreamzero_remote_policy.DreamZeroRemotePolicy \
--enable_cameras \
--num_episodes 5 \
--headless \
--language_instruction "Pick up the cube and place it in the bowl." \
pick_and_place_maple_table \
--embodiment droid_abs_joint_pos
```

To open the Kit viewport instead, omit `--headless` and add `--viz kit`:

```bash
/isaac-sim/python.sh isaaclab_arena/evaluation/policy_runner.py \
--policy_type isaaclab_arena_dreamzero.policy.dreamzero_remote_policy.DreamZeroRemotePolicy \
--enable_cameras \
--num_episodes 5 \
--viz kit \
--language_instruction "Pick up the cube and place it in the bowl." \
pick_and_place_maple_table \
--embodiment droid_abs_joint_pos
```

To run either command inside the Arena container:

```bash
docker exec "$ARENA_CONTAINER" su $(id -un) -c \
"cd /workspaces/isaaclab_arena && <command above>"
```
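
The flag-ordering rule at the top of this section is what you would expect if the runner registers environments as argparse subcommands, as the word "subcommand" suggests (an assumption; the runner's parser is not shown here). In that setup, every argument after the subcommand name is handed to the environment's subparser, which does not know the global flags. A toy reproduction with hypothetical parser definitions:

```python
import argparse

# Hypothetical stand-in for the runner's CLI: one global flag on the main
# parser and one environment subcommand with its own flag.
parser = argparse.ArgumentParser(prog="policy_runner")
parser.add_argument("--num_episodes", type=int, default=1)
subparsers = parser.add_subparsers(dest="environment", required=True)
env_parser = subparsers.add_parser("pick_and_place_maple_table")
env_parser.add_argument("--embodiment", default="droid_abs_joint_pos")


def try_parse(argv: list[str]):
    """Return the parsed namespace, or None if argparse rejects argv."""
    try:
        return parser.parse_args(argv)
    except SystemExit:  # argparse prints the error and exits on failure
        return None


# Global flag before the subcommand: accepted.
good = try_parse(["--num_episodes", "5", "pick_and_place_maple_table", "--embodiment", "droid_abs_joint_pos"])
# Global flag after the subcommand: the subparser rejects it as unrecognized.
bad = try_parse(["pick_and_place_maple_table", "--num_episodes", "5"])
print(good.num_episodes, bad)  # 5 None
```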

## Configuration

All options have defaults matching the DreamZero wire protocol; override only the ones that differ in your setup.

| Flag | Default | Description |
|------|---------|-------------|
| `--dreamzero_host` | `localhost` | Hostname of the DreamZero inference server |
| `--dreamzero_port` | `5000` | Port the server listens on |
| `--dreamzero_open_loop_horizon` | `24` | Action steps replayed per server inference call |
| `--dreamzero_num_arm_joints` | `7` | Arm DOF count; remainder of `robot_joint_pos` is treated as gripper |
| `--dreamzero_cam_exterior_left` | `external_camera_rgb` | Arena camera key → `observation/exterior_image_0_left` |
| `--dreamzero_cam2_source` | `black` | Source for `observation/exterior_image_1_left`: `black`, `duplicate`, `right`, or `head` |
| `--dreamzero_cam_exterior_right` | `external_camera_2_rgb` | Camera used when `cam2_source=right` |
| `--dreamzero_cam_head` | `head_camera` | Camera used when `cam2_source=head` |
| `--dreamzero_cam_wrist` | `wrist_camera_rgb` | Arena camera key → `observation/wrist_image_left` |
| `--policy_device` | `cuda` | Torch device for the returned action tensor |
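
Two of these options shape the outgoing message: the arm/gripper split of `robot_joint_pos` and the choice of second exterior image. A rough sketch of how they could be interpreted, assuming NumPy arrays shaped `(num_envs, ...)`. `split_joint_pos` and `select_cam2` are hypothetical helpers, and the exact `black`/`duplicate` behaviour is inferred from the flag descriptions above rather than taken from `DreamZeroRemotePolicy` itself.

```python
import numpy as np


def split_joint_pos(robot_joint_pos: np.ndarray, num_arm_joints: int = 7):
    """Split (num_envs, D) joint positions into arm joints and the gripper remainder."""
    arm = robot_joint_pos[:, :num_arm_joints]
    gripper = robot_joint_pos[:, num_arm_joints:]
    return arm, gripper


def select_cam2(
    camera_obs: dict,
    source: str = "black",
    cam_exterior_left: str = "external_camera_rgb",
    cam_exterior_right: str = "external_camera_2_rgb",
    cam_head: str = "head_camera",
) -> np.ndarray:
    """Pick the image sent as observation/exterior_image_1_left."""
    left = camera_obs[cam_exterior_left]
    if source == "black":
        return np.zeros_like(left)  # all-zero image with the left camera's shape
    if source == "duplicate":
        return left.copy()
    if source == "right":
        return camera_obs[cam_exterior_right]
    if source == "head":
        return camera_obs[cam_head]
    raise ValueError(f"unknown cam2 source: {source!r}")
```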

## Batch evaluation (eval_runner)

Use the dotted import path as `policy_type` and pass config fields directly in `policy_config_dict`:

```json
{
  "name": "dreamzero_pick_and_place",
  "arena_env_args": {
    "enable_cameras": true,
    "environment": "pick_and_place_maple_table",
    "embodiment": "droid_abs_joint_pos"
  },
  "num_episodes": 5,
  "language_instruction": "Pick up the cube and place it in the bowl.",
  "policy_type": "isaaclab_arena_dreamzero.policy.dreamzero_remote_policy.DreamZeroRemotePolicy",
  "policy_config_dict": {
    "remote_host": "localhost",
    "remote_port": 5000
  }
}
```

Save this config as a JSON file and pass it to `eval_runner.py`:

```bash
/isaac-sim/python.sh isaaclab_arena/evaluation/eval_runner.py \
--eval_jobs_config <path/to/config.json>
```
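
When you need many job configs (different hosts, ports, or episode counts), generating them from Python avoids hand-editing JSON. A minimal sketch with the standard `json` module that reproduces the example config; `make_dreamzero_job` and the output filename are hypothetical, and the field names are copied from the example, so check them against what `eval_runner.py` accepts.

```python
import json
from pathlib import Path


def make_dreamzero_job(host: str = "localhost", port: int = 5000, num_episodes: int = 5) -> dict:
    """Build one eval job dict mirroring the JSON example in this README."""
    return {
        "name": "dreamzero_pick_and_place",
        "arena_env_args": {
            "enable_cameras": True,
            "environment": "pick_and_place_maple_table",
            "embodiment": "droid_abs_joint_pos",
        },
        "num_episodes": num_episodes,
        "language_instruction": "Pick up the cube and place it in the bowl.",
        "policy_type": "isaaclab_arena_dreamzero.policy.dreamzero_remote_policy.DreamZeroRemotePolicy",
        "policy_config_dict": {"remote_host": host, "remote_port": port},
    }


config_path = Path("dreamzero_eval.json")  # hypothetical output path
config_path.write_text(json.dumps(make_dreamzero_job(), indent=2))
```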

## Running the inference server on OSMO

The `isaaclab_arena_dreamzero/docker/` directory contains everything needed to build the DreamZero inference server image and launch it as an OSMO job.

### 1. Build and push the Docker image

Log in to the NGC registry once:

```bash
docker login nvcr.io -u '$oauthtoken' -p <YOUR_NGC_API_KEY>
```

Then build and push:

```bash
./isaaclab_arena_dreamzero/docker/push_to_ngc.sh
# Optional overrides:
# -t <tag> Image tag (default: latest)
# -n <image-name> Override image name (default: dreamzero_inference_server)
```

This produces `nvcr.io/nvidian/dreamzero_inference_server:<tag>`.

### 2. Submit the OSMO job

```bash
osmo workflow submit isaaclab_arena_dreamzero/docker/dreamzero_inference_server.yaml \
--set hf_token=<YOUR_HF_TOKEN> \
--set port=5000
```

The job downloads the `GEAR-Dreams/DreamZero-DROID` checkpoint from Hugging Face and starts the WebSocket inference server on the requested port using 2 H100 GPUs.

> **No pre-built image?** If you haven't pushed an image yet, use `dreamzero/dreamzero_inference_server.yaml` from the upstream dreamzero repo instead. It installs all dependencies at runtime on top of the base `nvcr.io/nvidia/pytorch:25.04-py3` image, so startup is slower (~5–10 min).

### 3. Connect the Arena policy

Once the server is running, find its IP address in the OSMO job logs and pass it via `--dreamzero_host`:

```bash
/isaac-sim/python.sh isaaclab_arena/evaluation/policy_runner.py \
--policy_type isaaclab_arena_dreamzero.policy.dreamzero_remote_policy.DreamZeroRemotePolicy \
--dreamzero_host <OSMO_JOB_IP> \
--dreamzero_port 5000 \
--enable_cameras \
--num_episodes 5 \
--headless \
--language_instruction "Pick up the cube and place it in the bowl." \
pick_and_place_maple_table \
--embodiment droid_abs_joint_pos
```

## Observation requirements

The environment must expose these keys in its observation dict:

- `observation["camera_obs"][cam_exterior_left]` — uint8 RGB tensor `(num_envs, H, W, 3)`
- `observation["camera_obs"][cam_wrist]` — uint8 RGB tensor `(num_envs, H, W, 3)`
- `observation["policy"]["robot_joint_pos"]` — float tensor `(num_envs, num_arm_joints + 1)`

Images are resized to `180 × 320` with letterbox padding before being sent to the server.
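
Letterbox resizing preserves the aspect ratio: the image is scaled by the largest factor that fits inside the target, then centred on a padded canvas. A NumPy-only sketch with nearest-neighbour sampling, assuming `180 × 320` means height × width and zero (black) padding; the actual policy may use a different interpolation method or padding placement. `letterbox` is a hypothetical helper name.

```python
import numpy as np


def letterbox(images: np.ndarray, out_h: int = 180, out_w: int = 320) -> np.ndarray:
    """Resize uint8 images (N, H, W, 3) to (N, out_h, out_w, 3), preserving aspect ratio."""
    n, h, w, c = images.shape
    scale = min(out_h / h, out_w / w)
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))
    # Nearest-neighbour sampling: map each output row/column back to a source index.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = images[:, rows][:, :, cols]
    # Centre the resized image on a black canvas.
    canvas = np.zeros((n, out_h, out_w, c), dtype=images.dtype)
    top = (out_h - new_h) // 2
    left = (out_w - new_w) // 2
    canvas[:, top:top + new_h, left:left + new_w] = resized
    return canvas
```

Camera observations arrive as torch tensors, so a NumPy helper like this one would be called on `tensor.cpu().numpy()`. A 4:3 frame of `480 × 640`, for instance, becomes `180 × 240` and gets 40 black columns on each side.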
4 changes: 4 additions & 0 deletions isaaclab_arena_dreamzero/__init__.py
@@ -0,0 +1,4 @@
# Copyright (c) 2025-2026, The Isaac Lab Arena Project Developers (https://github.com/isaac-sim/IsaacLab-Arena/blob/main/CONTRIBUTORS.md).
# All rights reserved.
#
# SPDX-License-Identifier: Apache-2.0
50 changes: 50 additions & 0 deletions isaaclab_arena_dreamzero/docker/Dockerfile
@@ -0,0 +1,50 @@
# Copyright (c) 2026, The Isaac Lab Arena Project Developers (https://github.com/isaac-sim/IsaacLab-Arena/blob/main/CONTRIBUTORS.md).
# All rights reserved.
#
# SPDX-License-Identifier: Apache-2.0

FROM nvcr.io/nvidia/pytorch:25.04-py3

# Install system deps needed by av (PyAV) and other packages
RUN apt-get update && apt-get install -y \
ffmpeg libavcodec-dev libavformat-dev libavutil-dev \
libswscale-dev libswresample-dev \
git curl && \
rm -rf /var/lib/apt/lists/*

# Clone dreamzero repo
RUN git clone https://github.com/dreamzero0/dreamzero.git /workspace/dreamzero

Contributor

P2 Unpinned git clone makes the image non-reproducible and mutable

The DreamZero repo is cloned from the default branch at build time with no commit SHA or tag. Any push to that branch after the initial build would produce a different image from the same Dockerfile, making it impossible to reproduce a known-good image and creating a silent supply chain risk. Pinning to a specific commit (git clone ... && git checkout <sha>) or a tagged release ensures the image content is stable across rebuilds.

Contributor

🔵 Improvement — Cloning at HEAD (plus the unpinned websockets install below, while all other deps are version-locked) makes the published inference image non-reproducible: an upstream change to dreamzero could silently alter eval behavior between two builds of the "same" image. Could we pin to a specific commit or tag?

Collaborator

Can we lock a commit HASH?


WORKDIR /workspace/dreamzero

# Install dreamzero package without re-resolving deps (container already has torch etc.)
RUN pip install -e . --no-deps

# Install all runtime deps
RUN pip install -U huggingface_hub && \
pip install \
tyro \
"tianshou==0.5.1" \
"transformers==4.51.3" \
"python-socketio>=5.13.0" \
websockets \
"einops==0.8.1" \
hydra-core \
"openpi-client==0.1.1" \
"imageio==2.34.2" \
imageio-ffmpeg \
"diffusers==0.30.2" \
"peft==0.5.0" \
sentencepiece \
"albumentations==1.4.18" \
ftfy \
tiktoken \
wandb \
loguru \
msgpack \
msgpack-numpy \
lark \
termcolor \
av

ENV NO_ALBUMENTATIONS_UPDATE=1
49 changes: 49 additions & 0 deletions isaaclab_arena_dreamzero/docker/dreamzero_inference_server.yaml
@@ -0,0 +1,49 @@
# Copyright (c) 2026, The Isaac Lab Arena Project Developers (https://github.com/isaac-sim/IsaacLab-Arena/blob/main/CONTRIBUTORS.md).
# All rights reserved.
#
# SPDX-License-Identifier: Apache-2.0

workflow:
  name: dreamzero-inference-server
  pool: isaac-dev-h100-01

Collaborator

Does it only fit 2 x h100? Can it fit a few l40s?
Can you put an inline note/link explaining gpu reqs?
Asking because sometimes l40s pool is less crowded than h100 pool.


  resources:
    default:
      cpu: 16
      gpu: 2
      memory: 256Gi
      storage: 250Gi

  tasks:
  - name: serve
    image: nvcr.io/nvidian/dreamzero_inference_server:latest
    credentials:
      nvcr-nvidian: {}
    command: ["bash", "-c"]
    args:
    - |
      set -e
      echo "=== Download checkpoint ==="
      cd /workspace/dreamzero
      hf download GEAR-Dreams/DreamZero-DROID \
        --repo-type model \
        --local-dir ./checkpoints/DreamZero-DROID

Collaborator

Nit, shall it be baked into somewhere static in case there is version changes? And we are not dependent on HF API every eval job.
Like the server docker image itself, or swiftstack/S3 bucket (how-to) + link to the inputs field (how)?

      echo "=== Launch inference server ==="
      CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run \
        --standalone \
        --nproc_per_node=2 \
        socket_test_optimized_AR.py \
        --port {{ port }} \
        --enable-dit-cache \
        --model-path ./checkpoints/DreamZero-DROID
    environment:
      HF_TOKEN: "{{ hf_token }}"
      NCCL_DEBUG: INFO
      NO_ALBUMENTATIONS_UPDATE: "1"

default-values:
  port: "5000"
  hf_token: ""
64 changes: 64 additions & 0 deletions isaaclab_arena_dreamzero/docker/push_to_ngc.sh
@@ -0,0 +1,64 @@
#!/usr/bin/env bash
# Copyright (c) 2026, The Isaac Lab Arena Project Developers (https://github.com/isaac-sim/IsaacLab-Arena/blob/main/CONTRIBUTORS.md).
# All rights reserved.
#
# SPDX-License-Identifier: Apache-2.0
set -euo pipefail

IMAGE_NAME=dreamzero_inference_server
TAG_NAME=latest
NGC_ORG=nvidian

usage() {
cat <<EOF
Usage: $0 [OPTIONS]
Builds the DreamZero inference server image and pushes it to NGC.
Requires docker login to nvcr.io first:
docker login nvcr.io -u '\$oauthtoken' -p <YOUR_NGC_API_KEY>
Options:
-t, --tag TAG Image tag. Default: latest.
-n, --image-name NAME Override image name. Default: dreamzero_inference_server.
-h, --help Show this help message.
EOF
}

while [ "$#" -gt 0 ]; do
case "$1" in
-t|--tag)
TAG_NAME="${2:?Missing value for $1}"
shift 2
;;
-n|--image-name)
IMAGE_NAME="${2:?Missing value for $1}"
shift 2
;;
-h|--help)
usage
exit 0
;;
*)
echo "Unexpected argument: $1" >&2
usage >&2
exit 1
;;
esac
done

NGC_PATH=nvcr.io/${NGC_ORG}/${IMAGE_NAME}:${TAG_NAME}
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

echo "Building image: ${IMAGE_NAME}:${TAG_NAME}"
docker build \
-t "${IMAGE_NAME}:${TAG_NAME}" \
-f "${SCRIPT_DIR}/Dockerfile" \
"${SCRIPT_DIR}"

echo "Tagging as ${NGC_PATH}"
docker tag "${IMAGE_NAME}:${TAG_NAME}" "${NGC_PATH}"

echo "Pushing to ${NGC_PATH}"
docker push "${NGC_PATH}"

echo "Done. Update dreamzero_inference_server.yaml to use image: ${NGC_PATH}"
4 changes: 4 additions & 0 deletions isaaclab_arena_dreamzero/policy/__init__.py
@@ -0,0 +1,4 @@
# Copyright (c) 2025-2026, The Isaac Lab Arena Project Developers (https://github.com/isaac-sim/IsaacLab-Arena/blob/main/CONTRIBUTORS.md).
# All rights reserved.
#
# SPDX-License-Identifier: Apache-2.0