
Commit 3da5b1c

Add VLN benchmark: H1 navigation in Matterport 3D with NaVILA VLM
Two-level hierarchical policy for Vision-Language Navigation:
- High-level: NaVILA VLM generates velocity commands from RGB images
- Low-level: RSL-RL locomotion policy converts to joint actions

Code organization follows Arena patterns:
- isaaclab_arena/embodiments/h1/     Standard H1 + VLN extension
- isaaclab_arena/tasks/              VlnR2rMatterportTask
- isaaclab_arena/metrics/            SPL, Success, PathLength, DTG (XY)
- isaaclab_arena/assets/             MatterportBackground with lighting
- isaaclab_arena/policy/vln/         VlnVlmLocomotionPolicy (client)
- isaaclab_arena_navila/             NaVilaServerPolicy (server)
- isaaclab_arena_environments/       h1_vln_matterport environment

Key features:
- Auto scene-episode matching via scene_filter
- Full image history + uniform sampling for VLM stop detection
- Configurable head + follow cameras
- Docker VLM server (docker/Dockerfile.vln_server)
- VLN-CE R2R dataset support (11 Matterport scenes, 1077 episodes)

Verified: success=1.0, SPL=0.77 on zsNo4HB9uLZ scene
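The metrics listed above (Success, PathLength, DTG in XY, SPL) follow the usual VLN definitions. The sketch below assumes the common VLN-CE conventions: an episode succeeds if it ends within 3 m of the goal, and SPL weights success by l / max(p, l), where l is the reference shortest-path length from the dataset and p is the executed path length. `EpisodeTrace` and the 3 m radius are assumptions for illustration, not the Arena metric classes; VLN-CE additionally requires that the agent issued STOP, which this sketch folds into "the episode ended here".

```python
from __future__ import annotations

import math
from dataclasses import dataclass


# Hypothetical per-episode record; the Arena metrics may store this differently.
@dataclass
class EpisodeTrace:
    path_xy: list[tuple[float, float]]  # robot base XY position at each step (at least one)
    goal_xy: tuple[float, float]        # goal XY position
    shortest_path_length: float         # reference path length from the dataset, in meters


def path_length(trace: EpisodeTrace) -> float:
    """Sum of XY distances between consecutive recorded positions."""
    pts = trace.path_xy
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))


def distance_to_goal_xy(trace: EpisodeTrace) -> float:
    """Euclidean distance from the final position to the goal, ignoring height."""
    return math.dist(trace.path_xy[-1], trace.goal_xy)


def success(trace: EpisodeTrace, radius: float = 3.0) -> float:
    """1.0 if the episode ended within `radius` meters of the goal (3 m assumed, as in VLN-CE)."""
    return 1.0 if distance_to_goal_xy(trace) <= radius else 0.0


def spl(trace: EpisodeTrace, radius: float = 3.0) -> float:
    """Success weighted by path length: S * l / max(p, l)."""
    s = success(trace, radius)
    executed = path_length(trace)
    shortest = trace.shortest_path_length
    denom = max(executed, shortest)
    return s if denom == 0.0 else s * shortest / denom


if __name__ == "__main__":
    trace = EpisodeTrace(
        path_xy=[(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)],
        goal_xy=(3.0, 4.5),
        shortest_path_length=4.0,
    )
    print(path_length(trace), distance_to_goal_xy(trace), success(trace), spl(trace))
```

Averaging `spl` over all episodes gives the benchmark-level SPL: a successful episode that follows the reference path exactly scores 1.0, while a successful detour twice as long as the reference scores 0.5.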
1 parent 1f62bd6 commit 3da5b1c

27 files changed

Lines changed: 4623 additions & 3 deletions

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -39,6 +39,9 @@
 *.*~
 TAGS
 
+# Vim swap files
+*.sw*
+
 # Build files
 *build
 *install
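The new `*.sw*` rule catches Vim swap files such as `.main.py.swp` and `.main.py.swo`, but it is broad: any name containing `.sw` matches, including Swift sources like `Main.swift`. The sketch below uses Python's `fnmatch.fnmatchcase` as an approximation of gitignore matching for slash-free patterns (git has its own matcher, so treat this as illustrative) and compares the rule with the narrower `[._]*.sw[a-p]` style used in GitHub's Vim `.gitignore` template.

```python
from fnmatch import fnmatchcase

BROAD = "*.sw*"            # rule added in this commit
NARROW = "[._]*.sw[a-p]"   # narrower alternative in the style of GitHub's Vim template


def ignored(basename: str, pattern: str) -> bool:
    """Approximate gitignore matching of a slash-free pattern against a file's basename."""
    return fnmatchcase(basename, pattern)


if __name__ == "__main__":
    for name in [".main.py.swp", ".main.py.swo", "Main.swift", "main.py"]:
        print(f"{name:14} broad={ignored(name, BROAD)} narrow={ignored(name, NARROW)}")
```

For an authoritative answer on a real checkout, `git check-ignore -v Main.swift` reports which rule, if any, ignores the path.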

.vscode/settings.json

Lines changed: 12 additions & 1 deletion
@@ -2,17 +2,28 @@
   "python.analysis.extraPaths": [
     "${workspaceFolder}",
     "${workspaceFolder}/submodules/IsaacLab/source/isaaclab",
+    "${workspaceFolder}/submodules/IsaacLab/source/isaaclab_tasks",
     "${workspaceFolder}/submodules/IsaacLab/source/isaaclab_assets",
     "${workspaceFolder}/submodules/IsaacLab/source/isaaclab_rl",
     "${workspaceFolder}/submodules/IsaacLab/source/isaaclab_mimic",
+    "${workspaceFolder}/submodules/Isaac-GR00T"
+  ],
+  "python.autoComplete.extraPaths": [
+    "${workspaceFolder}",
+    "${workspaceFolder}/submodules/IsaacLab/source/isaaclab",
     "${workspaceFolder}/submodules/IsaacLab/source/isaaclab_tasks",
+    "${workspaceFolder}/submodules/IsaacLab/source/isaaclab_assets",
+    "${workspaceFolder}/submodules/IsaacLab/source/isaaclab_rl",
+    "${workspaceFolder}/submodules/IsaacLab/source/isaaclab_mimic",
+    "${workspaceFolder}/submodules/Isaac-GR00T"
   ],
   "cursorpyright.analysis.extraPaths": [
     "${workspaceFolder}",
     "${workspaceFolder}/submodules/IsaacLab/source/isaaclab",
+    "${workspaceFolder}/submodules/IsaacLab/source/isaaclab_tasks",
     "${workspaceFolder}/submodules/IsaacLab/source/isaaclab_assets",
     "${workspaceFolder}/submodules/IsaacLab/source/isaaclab_rl",
     "${workspaceFolder}/submodules/IsaacLab/source/isaaclab_mimic",
-    "${workspaceFolder}/submodules/IsaacLab/source/isaaclab_tasks",
+    "${workspaceFolder}/submodules/Isaac-GR00T"
   ]
 }
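After this change the same path list is repeated in three settings (`python.analysis.extraPaths`, `python.autoComplete.extraPaths`, `cursorpyright.analysis.extraPaths`), so the copies can drift apart. A small check like the sketch below can flag lists whose contents differ (order-insensitive) or entries that do not resolve to a directory. It is a hypothetical helper, not part of the repo: it expands only `${workspaceFolder}`, and it parses the file as strict JSON, so it will reject a settings file with comments (which VS Code itself accepts).

```python
from __future__ import annotations

import json
from pathlib import Path

# The three settings this commit keeps in sync.
KEYS = (
    "python.analysis.extraPaths",
    "python.autoComplete.extraPaths",
    "cursorpyright.analysis.extraPaths",
)


def load_settings(path: Path) -> dict:
    """Parse settings.json as strict JSON (raises on JSONC comments)."""
    return json.loads(path.read_text(encoding="utf-8"))


def check_extra_paths(settings: dict, workspace: Path) -> list[str]:
    """Return problems: lists that differ from the first one, or entries that are not directories."""
    problems: list[str] = []
    reference = set(settings.get(KEYS[0], []))
    for key in KEYS:
        paths = settings.get(key, [])
        if set(paths) != reference:
            problems.append(f"{key} differs from {KEYS[0]}")
        for entry in paths:
            resolved = Path(entry.replace("${workspaceFolder}", str(workspace)))
            if not resolved.is_dir():
                problems.append(f"{key}: missing directory {resolved}")
    return problems


if __name__ == "__main__":
    root = Path.cwd()
    settings_file = root / ".vscode" / "settings.json"
    if settings_file.exists():
        for problem in check_extra_paths(load_settings(settings_file), root):
            print(problem)
```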

docker/Dockerfile.vln_server

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+FROM nvcr.io/nvidia/pytorch:24.02-py3
+
+WORKDIR /workspace
+
+RUN apt-get update && apt-get install -y git && rm -rf /var/lib/apt/lists/*
+
+RUN pip install --no-cache-dir pyzmq msgpack Pillow
+
+# Install NaVILA / LLaVA per upstream instructions:
+# https://github.com/yang-zj1026/NaVILA-Bench#vla-evaluation
+RUN git clone https://github.com/AnjieCheng/NaVILA.git /workspace/NaVILA && \
+    cd /workspace/NaVILA && \
+    pip install --no-cache-dir -e . && \
+    pip install --no-cache-dir -e ".[train]" && \
+    pip install --no-cache-dir -e ".[eval]" && \
+    pip install --no-cache-dir git+https://github.com/huggingface/transformers@v4.37.2 && \
+    SITE_PKG=$(python -c 'import site; print(site.getsitepackages()[0])') && \
+    cp -rv ./llava/train/transformers_replace/* "$SITE_PKG/transformers/" && \
+    cp -rv ./llava/train/deepspeed_replace/* "$SITE_PKG/deepspeed/"
+
+# NaVILA pins torch==2.3.0 (PyPI release), which replaces the pre-release
+# torch 2.3.0a0 from the base image and breaks native extensions compiled
+# against it. Remove them; flash-attn is rebuilt in the next step.
+RUN pip uninstall -y transformer-engine transformer-engine-extensions 2>/dev/null || true
+
+# flash-attn must be installed AFTER NaVILA, without build isolation, so it
+# compiles against the installed PyTorch (2.3.0 release, not 2.3.0a0).
+RUN pip install --no-cache-dir --no-build-isolation flash-attn==2.5.8 || \
+    echo "WARNING: flash-attn build failed, VLM will run without it"
+
+# Arena code last so that code-only changes rebuild in seconds
+COPY isaaclab_arena/remote_policy /workspace/isaaclab_arena/remote_policy
+COPY isaaclab_arena/__init__.py /workspace/isaaclab_arena/__init__.py
+COPY isaaclab_arena_navila /workspace/isaaclab_arena_navila
+
+ENV PYTHONPATH=/workspace
+
+ENTRYPOINT ["python", "-u", "-m", "isaaclab_arena.remote_policy.remote_policy_server_runner"]
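The comments above hinge on which torch build ends up in the image: the base image ships a pre-release (2.3.0a0) and NaVILA's pin replaces it with the 2.3.0 release, after which flash-attn must be built against the new torch. A quick sanity check one might run with `python` inside the built container is sketched below; it reads installed versions through `importlib.metadata` and flags PEP 440 pre-release or dev versions. The package list and the output format are assumptions.

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, version

# PEP 440 pre-release (aN, bN, rcN) or dev-release suffix at the end of the public version.
_PRE_OR_DEV = re.compile(r"(a|b|rc|\.dev)\d*$")


def is_prerelease(ver: str) -> bool:
    """True for e.g. '2.3.0a0+abc123' or '2.4.0.dev20240101'; False for '2.3.0+cu121'."""
    public = ver.split("+", 1)[0]  # drop the local version label, e.g. '+cu121'
    return _PRE_OR_DEV.search(public) is not None


def report(packages: tuple[str, ...] = ("torch", "flash_attn", "transformers")) -> None:
    """Print the installed version of each package and mark pre-release builds."""
    for name in packages:
        try:
            ver = version(name)
        except PackageNotFoundError:
            print(f"{name}: not installed")
            continue
        note = "  <-- pre-release build" if is_prerelease(ver) else ""
        print(f"{name}: {ver}{note}")


if __name__ == "__main__":
    report()
```

If `torch` still shows a pre-release version, or `flash_attn` is missing, the install order above did not take effect and the flash-attn fallback warning was probably printed during the build.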

docker/run_vln_server.sh

Lines changed: 200 additions & 0 deletions
@@ -0,0 +1,200 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+# -------------------------
+# User-configurable defaults
+# -------------------------
+
+# Default mount directories on the host machine
+DATASETS_DIR="${DATASETS_DIR:-$HOME/datasets}"
+MODELS_DIR="${MODELS_DIR:-$HOME/models}"
+EVAL_DIR="${EVAL_DIR:-$HOME/eval}"
+
+# Docker image name and tag for the VLN policy server
+DOCKER_IMAGE_NAME="${DOCKER_IMAGE_NAME:-vln_policy_server}"
+DOCKER_VERSION_TAG="${DOCKER_VERSION_TAG:-latest}"
+
+# Rebuild controls
+FORCE_REBUILD="${FORCE_REBUILD:-false}"
+NO_CACHE=""
+
+# Server parameters (can also be overridden via environment variables)
+HOST="${HOST:-0.0.0.0}"
+PORT="${PORT:-5555}"
+API_TOKEN="${API_TOKEN:-}"
+TIMEOUT_MS="${TIMEOUT_MS:-15000}"
+POLICY_TYPE="${POLICY_TYPE:-isaaclab_arena_navila.navila_server_policy.NaVilaServerPolicy}"
+
+# GPU selection for docker --gpus (can also be overridden via environment variables)
+# Examples:
+#   all              -> use all GPUs
+#   1                -> use 1 GPU (count)
+#   "device=0"       -> use GPU 0
+#   '"device=0,1"'   -> use GPUs 0 and 1 (Docker needs the inner quotes for a list)
+GPUS="${GPUS:-all}"
+
+# -------------------------
+# Help message
+# -------------------------
+usage() {
+  script_name=$(basename "$0")
+  cat <<EOF
+Helper script to build and run the VLN policy server Docker environment (run it from the repository root).
+
+Usage:
+  $script_name [options] [server-options]
+
+Options (Docker / paths; command-line flags override env vars with the same name):
+  -v                        Verbose output (set -x).
+  -d <datasets directory>   Path to datasets on the host. Default: "$DATASETS_DIR".
+  -m <models directory>     Path to models on the host. Default: "$MODELS_DIR".
+  -e <eval directory>       Path to evaluation data on the host. Default: "$EVAL_DIR".
+  -n <docker name>          Docker image name. Default: "$DOCKER_IMAGE_NAME".
+  -g <gpus>                 GPU selection for docker --gpus. Default: "$GPUS".
+                            Examples: "all", "1", "device=0", '"device=0,1"'.
+  -r                        Force rebuilding of the Docker image.
+  -R                        Force rebuilding of the Docker image, without cache.
+
+Server-specific options (passed through to the policy server entrypoint):
+  --host HOST
+  --port PORT
+  --api_token TOKEN
+  --timeout_ms MS
+  --policy_type TYPE
+  --model_path PATH
+  --num_video_frames N
+  --conv_mode MODE
+
+Examples:
+  # Minimal: use all defaults (model at default path, all GPUs, port 5555)
+  bash $script_name
+
+  # Custom model path and single GPU
+  bash $script_name -m /path/to/navila-checkpoint -g "device=0"
+
+  # Custom port
+  bash $script_name --port 6000
+EOF
+}
+
+# -------------------------
+# Parse all options
+# -------------------------
+# Server parameters that can be overridden individually via --flags.
+# Unset values will be filled with defaults after parsing.
+MODEL_PATH=""
+
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    -v) set -x; shift 1 ;;
+    -d) DATASETS_DIR="$2"; shift 2 ;;
+    -m) MODELS_DIR="$2"; shift 2 ;;
+    -e) EVAL_DIR="$2"; shift 2 ;;
+    -n) DOCKER_IMAGE_NAME="$2"; shift 2 ;;
+    -g) GPUS="$2"; shift 2 ;;
+    -r) FORCE_REBUILD="true"; shift 1 ;;
+    -R) FORCE_REBUILD="true"; NO_CACHE="--no-cache"; shift 1 ;;
+    -h|--help) usage; exit 0 ;;
+    --host) HOST="$2"; shift 2 ;;
+    --port) PORT="$2"; shift 2 ;;
+    --api_token) API_TOKEN="$2"; shift 2 ;;
+    --timeout_ms) TIMEOUT_MS="$2"; shift 2 ;;
+    --policy_type) POLICY_TYPE="$2"; shift 2 ;;
+    --model_path) MODEL_PATH="$2"; shift 2 ;;
+    --num_video_frames) NUM_VIDEO_FRAMES="$2"; shift 2 ;;
+    --conv_mode) CONV_MODE="$2"; shift 2 ;;
+    --policy_device) POLICY_DEVICE="$2"; shift 2 ;;
+    *)
+      echo "Unknown option: $1" >&2
+      usage >&2
+      exit 1
+      ;;
+  esac
+done
+
+# Build server args: core parameters are always passed (with defaults);
+# optional ones only when the user set them.
+SERVER_ARGS=(
+  --host "${HOST}"
+  --port "${PORT}"
+  --timeout_ms "${TIMEOUT_MS}"
+  --policy_type "${POLICY_TYPE}"
+  --model_path "${MODEL_PATH:-/models}"
+)
+[ -n "${API_TOKEN}" ] && SERVER_ARGS+=(--api_token "${API_TOKEN}")
+[ -n "${NUM_VIDEO_FRAMES+x}" ] && SERVER_ARGS+=(--num_video_frames "${NUM_VIDEO_FRAMES}")
+[ -n "${CONV_MODE+x}" ] && SERVER_ARGS+=(--conv_mode "${CONV_MODE}")
+[ -n "${POLICY_DEVICE+x}" ] && SERVER_ARGS+=(--policy_device "${POLICY_DEVICE}")
+
+echo "Host paths:"
+echo "  DATASETS_DIR = ${DATASETS_DIR}"
+echo "  MODELS_DIR   = ${MODELS_DIR}"
+echo "  EVAL_DIR     = ${EVAL_DIR}"
+echo "Docker image:"
+echo "  ${DOCKER_IMAGE_NAME}:${DOCKER_VERSION_TAG}"
+echo "GPU:"
+echo "  --gpus ${GPUS}"
+echo "Rebuild:"
+echo "  FORCE_REBUILD = ${FORCE_REBUILD}, NO_CACHE = '${NO_CACHE}'"
+echo "Server args:"
+printf '  %q ' "${SERVER_ARGS[@]}"; echo
+
+# -------------------------
+# 1) Build the Docker image
+# -------------------------
+
+IMAGE_TAG_FULL="${DOCKER_IMAGE_NAME}:${DOCKER_VERSION_TAG}"
+
+SHOULD_BUILD=false
+
+if [ "${FORCE_REBUILD}" = "true" ]; then
+  SHOULD_BUILD=true
+else
+  if [ -z "$(docker images -q "${IMAGE_TAG_FULL}")" ]; then
+    SHOULD_BUILD=true
+  fi
+fi
+
+if [ "${SHOULD_BUILD}" = "true" ]; then
+  echo "Building Docker image ${IMAGE_TAG_FULL}..."
+  # Use existing image layers as cache source (BuildKit may GC intermediate
+  # layer cache, but the final image layers can still be reused).
+  CACHE_FROM_ARGS=""
+  if [ -n "$(docker images -q "${IMAGE_TAG_FULL}" 2>/dev/null)" ]; then
+    CACHE_FROM_ARGS="--cache-from ${IMAGE_TAG_FULL}"
+  fi
+  docker build \
+    ${NO_CACHE} \
+    ${CACHE_FROM_ARGS} \
+    --network host \
+    -f docker/Dockerfile.vln_server \
+    -t "${IMAGE_TAG_FULL}" \
+    .
+else
+  echo "Docker image ${IMAGE_TAG_FULL} already exists. Skipping rebuild."
+  echo "Use -r or -R to force rebuilding the image."
+fi
+
+# -------------------------
+# 2) Run the container
+# -------------------------
+
+DOCKER_RUN_ARGS=(
+  --rm
+  --gpus "${GPUS}"
+  --net host
+  --name vln_policy_server_container
+  -v "${MODELS_DIR}":/models
+)
+
+if [ -d "${DATASETS_DIR}" ]; then
+  DOCKER_RUN_ARGS+=(-v "${DATASETS_DIR}":/datasets)
+fi
+
+if [ -d "${EVAL_DIR}" ]; then
+  DOCKER_RUN_ARGS+=(-v "${EVAL_DIR}":/eval)
+fi
+
+docker run "${DOCKER_RUN_ARGS[@]}" \
+  "${IMAGE_TAG_FULL}" \
+  "${SERVER_ARGS[@]}"
Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
+# Copyright (c) 2025-2026, The Isaac Lab Arena Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: Apache-2.0
+
+"""Matterport 3D scene background for VLN tasks.
+
+Uses standard ``spawn_from_usd`` for visual rendering of the Matterport
+scene, plus an invisible ground plane at z=0 for collision. This provides
+both correct rendering and physics without relying on mesh collision
+(which doesn't work with the current IsaacLab GPU physics).
+"""
+
+from __future__ import annotations
+
+import isaaclab.sim as sim_utils
+from isaaclab.sim.utils import clone
+
+from isaaclab_arena.assets.background_library import LibraryBackground
+from isaaclab_arena.assets.register import register_asset
+from isaaclab_arena.utils.pose import Pose
+
+
+@clone
+def _spawn_matterport_with_ground(prim_path, cfg, *args, **kwargs):
+    """Spawn Matterport USD + ground plane + indoor lighting.
+
+    Matterport 3D scans contain geometry and textures but no light
+    sources. This function adds everything needed to render and
+    simulate inside a Matterport scene:
+    - The USD scene itself (visual mesh with textures).
+    - An invisible ground plane at z=0 for physics collision.
+    - Indoor lighting matching NaVILA-Bench:
+        DomeLight (500): ambient fill under ceilings
+        DistantLight (1000): directional light
+        DiskLight x2 (10000): area lights at ceiling height
+    """
+    from isaaclab.sim.spawners.from_files.from_files import _spawn_from_usd_file
+
+    prim = _spawn_from_usd_file(prim_path, cfg.usd_path, cfg, *args, **kwargs)
+
+    # Invisible ground plane at z=0 for physics collision
+    ground_cfg = sim_utils.GroundPlaneCfg(
+        physics_material=sim_utils.RigidBodyMaterialCfg(
+            static_friction=1.0,
+            dynamic_friction=1.0,
+            friction_combine_mode="multiply",
+            restitution_combine_mode="multiply",
+        ),
+        visible=False,
+    )
+    ground_cfg.func("/World/GroundPlane", ground_cfg)
+
+    # Indoor lighting: Matterport scenes have no built-in lights.
+    # Matches NaVILA-Bench h1_matterport_base_cfg.py lighting setup.
+    dome_cfg = sim_utils.DomeLightCfg(intensity=500.0, color=(1.0, 1.0, 1.0))
+    dome_cfg.func("/World/MatterportDomeLight", dome_cfg)
+    # Directional light
+    distant_cfg = sim_utils.DistantLightCfg(intensity=1000.0, color=(1.0, 1.0, 1.0))
+    distant_cfg.func("/World/MatterportDistantLight", distant_cfg)
+    # Two ceiling area lights sharing one config
+    disk_cfg = sim_utils.DiskLightCfg(intensity=10000.0, color=(1.0, 1.0, 1.0), radius=50.0)
+    disk_cfg.func("/World/MatterportDisk1", disk_cfg)
+    disk_cfg.func("/World/MatterportDisk2", disk_cfg)
+    # Position the disk lights at ceiling height
+    from pxr import Gf, UsdGeom
+    stage = prim.GetStage()
+    for path, pos in [("/World/MatterportDisk1", (0.0, 0.0, 2.6)), ("/World/MatterportDisk2", (-1.0, 0.0, 2.6))]:
+        disk_prim = stage.GetPrimAtPath(path)
+        if disk_prim.IsValid():
+            xformable = UsdGeom.Xformable(disk_prim)
+            ops = xformable.GetOrderedXformOps()
+            for op in ops:
+                if op.GetOpName() == "xformOp:translate":
+                    op.Set(Gf.Vec3d(*pos))
+                    break
+            else:  # no existing translate op: add one
+                xformable.AddTranslateOp().Set(Gf.Vec3d(*pos))
+
+    print(f"[MatterportBackground] Scene at {prim_path} + ground plane + lighting")
+    return prim
+
+
+@register_asset
+class MatterportBackground(LibraryBackground):
+    """Matterport 3D scene with invisible ground plane for collision."""
+
+    name = "matterport"
+    tags = ["background"]
+    usd_path = None
+    initial_pose = Pose.identity()
+    object_min_z = -0.5
+
+    def __init__(self, usd_path: str):
+        self.usd_path = usd_path
+        self.spawn_cfg_addon = {"func": _spawn_matterport_with_ground}
+        super().__init__()

isaaclab_arena/embodiments/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -9,3 +9,4 @@
 from .g1.g1 import *
 from .galbot.galbot import *
 from .gr1t2.gr1t2 import *
+from .h1.h1 import *
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+# Copyright (c) 2025-2026, The Isaac Lab Arena Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: Apache-2.0
