
feat(examples): MuJoCo + 3DGS hybrid-render example (SO-101 + agentic GR00T-on-LIBERO)#58

Open
yinsong1986 wants to merge 5 commits into strands-labs:main from yinsong1986:feat/mujoco-gs-example


@yinsong1986
Contributor

What

Adds examples/mujoco_gs/: a Python take on MuJoCo-GS-Web (MuJoCo physics
composited against a photoreal panorama / 3DGS background) on top of the
strands-robots Simulation AgentTool, plus a real, agentic GR00T-on-LIBERO demo.

SO-101 hybrid-render demo — app.py

  • Strands Agent (natural language → Simulation actions) + scripted motion.
  • Depth-aware GS/panorama compositing (HybridCompositor).
  • Near-real-time MJPEG live view (/live route, proxy-friendly) + MP4 clips.
  • Optional real 3DGS background via gsplat (.ply), behind an extra.
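
For readers new to the idea, depth-aware compositing can be sketched as a per-pixel depth test between the simulated render and the background. This is a minimal illustration, not the HybridCompositor code: the function name `composite_by_depth`, the array layout, and the use of `inf` for "no simulated geometry" are all assumptions made for the example.

```python
import numpy as np

def composite_by_depth(sim_rgb, sim_depth, bg_rgb, bg_depth):
    """Per pixel, keep whichever layer is closer to the camera (hypothetical helper)."""
    # True where the simulated geometry is in front of the background.
    sim_in_front = sim_depth < bg_depth
    # Broadcast the (H, W) mask over the colour channels.
    return np.where(sim_in_front[..., None], sim_rgb, bg_rgb)

# Tiny 1x2 image: the robot covers the left pixel only.
sim_rgb = np.array([[[255, 0, 0], [255, 0, 0]]], dtype=np.uint8)
bg_rgb = np.array([[[0, 0, 255], [0, 0, 255]]], dtype=np.uint8)
sim_depth = np.array([[1.0, np.inf]])  # inf: assumed marker for "nothing rendered here"
bg_depth = np.array([[5.0, 5.0]])

out = composite_by_depth(sim_rgb, sim_depth, bg_rgb, bg_depth)
```

A panorama has no real depth, so in this sketch it would be treated as infinitely far away (every simulated pixel wins); a 3DGS background with a rendered depth channel can occlude the robot where it is closer.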

Agentic GR00T + LIBERO demo — app_groot_libero.py / groot_libero.py / libero_groot.py

  • A Strands Agent picks evaluate_benchmark off the Simulation tool surface
    and a real NVIDIA GR00T N1.7 policy drives a Franka Panda through a
    LIBERO task; live view + clip + success_rate.
  • Verified success_rate = 1.00 on
    libero-10-LIVING_ROOM_SCENE5_put_the_white_mug_… against
    nvidia/GR00T-N1.7-LIBERO (libero_10).

Success recipe (documented in the README)

Match the task suite to the served checkpoint; leave max_steps at the adapter
default (LIBERO-Long needs ~500 steps); pre-warm the scene
(generate BDDL → load_scene → prewarm); auto-pick the robot; default
action_horizon. The compositor applies the LIBERO viz_option so the arm
renders clean (no collision-geom/site debug patches).

Notes

  • GR00T is embodiment-locked (Panda/LIBERO); ZMQ-only client; libero +
    robosuite required for that demo.
  • The two demos are independent (separate apps / ports); the SO-101 hybrid GS
    look shines on open scenes, less so behind enclosed LIBERO scenes.
  • Rendering runs on a single thread (EGL-safe across the Gradio worker / agent
    threads).

Closes #57.

… GR00T-on-LIBERO)

Adds examples/mujoco_gs/: a Python take on MuJoCo-GS-Web (MuJoCo physics
composited against a photoreal panorama / 3DGS background) on top of the
strands-robots Simulation AgentTool, plus a real, agentic GR00T-on-LIBERO demo.

- SO-101 hybrid-render app (app.py): Strands agent + scripted motion, depth-
  aware GS/panorama compositing, near-real-time MJPEG live view, MP4 clips,
  optional gsplat .ply background.
- Agentic GR00T + LIBERO (app_groot_libero.py / groot_libero.py / libero_groot.py):
  an Agent picks evaluate_benchmark off the Simulation tool surface and a real
  GR00T N1.7 policy drives a Franka Panda; live view + clip + success_rate.
  Verified success_rate=1.00 on libero-10 LIVING_ROOM_SCENE5 (white mug).
- HybridCompositor renders on a single thread (EGL-safe) and applies the LIBERO
  viz_option so the arm renders without collision-geom/site debug patches.

Success recipe documented in README: match task suite to the served checkpoint,
leave max_steps at the adapter default, pre-warm the scene, auto-pick the robot,
default action_horizon.

yinsong1986 requested a review from cagataycali on May 31, 2026, 01:51

The agentic GrootLiberoRunner streamed the live MJPEG via a concurrent
render thread, which contends with the eval and stalls the MJPEG-serving
thread — the live view froze the instant a run started. Capture frames
synchronously inside evaluate_benchmark's on_frame (wrapped in a one-shot
run_libero_eval @tool the agent invokes) instead; the /live stream now
updates continuously (verified: frames grow ~15fps with visible motion).
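
The synchronous capture described in this commit can be sketched roughly as below. It is an illustration under stated assumptions, not the example's code: `LatestFrame`, `mjpeg_chunk`, and `fake_evaluate` are hypothetical names, the real `on_frame` signature may differ, and the bytes used here are placeholders rather than real JPEG data.

```python
import threading

class LatestFrame:
    """Holds only the newest encoded frame (hypothetical stand-in for latest_jpeg)."""
    def __init__(self):
        self._lock = threading.Lock()
        self._jpeg = b""
        self.count = 0

    def put(self, jpeg):
        # Called from the eval loop itself, so no separate render thread is needed.
        with self._lock:
            self._jpeg = jpeg
            self.count += 1

    def get(self):
        with self._lock:
            return self._jpeg

def mjpeg_chunk(jpeg, boundary=b"frame"):
    """Wrap one JPEG as a part of a multipart/x-mixed-replace stream."""
    return b"".join([
        b"--", boundary, b"\r\n",
        b"Content-Type: image/jpeg\r\n",
        b"Content-Length: ", str(len(jpeg)).encode(), b"\r\n\r\n",
        jpeg, b"\r\n",
    ])

def fake_evaluate(steps, on_frame):
    """Stand-in for an eval loop that invokes on_frame once per step."""
    for step in range(steps):
        on_frame(b"\xff\xd8" + bytes([step]) + b"\xff\xd9")  # placeholder bytes

buffer = LatestFrame()
fake_evaluate(3, buffer.put)
chunk = mjpeg_chunk(buffer.get())
```

Because frames are produced inside the eval loop, the serving thread only ever copies the latest buffer and never competes with the evaluation for the renderer.
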
…API only

Drop the custom `animate` and `hybrid_render` agent tools. The agent now
gets only the real strands-robots `Simulation` AgentTool, so 'have the arm
wave' goes through the genuine policy engine — `run_policy(robot_name='arm',
policy_provider='mock', duration=4.0, control_frequency=20.0)` — not a
scripted helper. The 3DGS/panorama compositing stays as the example's display
layer (live MJPEG view + still preview via HybridCompositor), not an agent
tool. Removes the scripted-trajectory helpers and the 'Record motion clip'
button/video panel; motion is shown in the live view. README updated.

Remove leftover state from the old concurrent live_loop (replaced by the
synchronous on_frame): self._lock, self.latest_rgb, and self.running were
set but never read. Drop the now-unused threading import. latest_jpeg (the
live MJPEG buffer) is kept. No behavior change.
…esets

Fix GsplatBackground so it actually renders a real .ply (verified on the
bonsai Mip-NeRF360 scene): correct the rasterization output unpacking
(RGB+D returns (H,W,4) in render_colors) and add the MuJoCo/OpenGL ->
gsplat/OpenCV camera-convention flip (was rendering all-black).
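
As a rough illustration of the camera-convention flip mentioned above: an OpenGL-style camera looks down -Z with +Y up, while an OpenCV-style camera looks down +Z with +Y down, so converting a world-to-camera matrix amounts to negating the camera's Y and Z axes. The helper name `gl_to_cv_viewmat` is hypothetical, and the sketch assumes the rasterizer expects OpenCV-style world-to-camera matrices; it is not the GsplatBackground implementation.

```python
import numpy as np

# Negate the camera-frame Y and Z axes (OpenGL -> OpenCV).
GL_TO_CV = np.diag([1.0, -1.0, -1.0, 1.0])

def gl_to_cv_viewmat(world_to_cam_gl):
    """Convert a 4x4 world-to-camera matrix from OpenGL to OpenCV axes."""
    return GL_TO_CV @ world_to_cam_gl

# Camera at the origin with identity pose; a point 2 units in front of an
# OpenGL camera sits at negative Z.
point_world = np.array([0.0, 1.0, -2.0, 1.0])
viewmat_cv = gl_to_cv_viewmat(np.eye(4))
point_cam_cv = viewmat_cv @ point_world
```

After the flip the same point has positive depth in the camera frame. Skipping the flip would put the whole scene behind an OpenCV-style camera, which would be consistent with the all-black output described in this commit.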

Add downloadable scene presets (GSPLAT_SCENES: bonsai/bicycle/stump from
HuggingFace), download_gsplat_scene() with on-demand caching, and an
optional auto 'backdrop' transform (PCA upright + scale + centre). NOTE:
auto-alignment is approximate — a captured scene's frame doesn't match the
SO-101 camera viewpoints, so a clean backdrop still needs per-scene tuning.
Member

@cagataycali cagataycali left a comment

Reviewed end-to-end — README, app.py, agent.py, compositor.py, backgrounds.py, groot_libero.py, pyproject.toml. Genuinely impressive scope: real 3DGS rasterization with CV/GL convention flip + auto-backdrop PCA fit, single-thread EGL render executor, MJPEG live route to bypass Gradio's SSE buffering, agentic LIBERO via a one-shot wrapping tool, scene preset HF downloads. The single-render-thread design with cached (W,H) renderers + cached CameraParams to keep mj_forward off the per-frame path is the right call — very clean.

Three concrete findings worth a follow-up. None block merge as an example, but #1 and #2 are real bugs.

1. _extract_agent_text ternary precedence drops the attribute-message path (agent.py:261)

content = getattr(msg, "content", None) or msg.get("content") if isinstance(msg, dict) else None

Python parses this as (getattr(...) or msg.get(...)) if isinstance(msg, dict) else None, so when msg is a Strands AgentResult.message attribute-style object (the common case, not a dict), isinstance(msg, dict) is False and the whole expression returns None — the getattr branch the comment two lines above advertises is never reached. Verified locally:

class M:
    content = [{"text": "hi"}]
msg = M()
print((getattr(msg, "content", None) or msg.get("content") if isinstance(msg, dict) else None))
# -> None  (expected: [{"text": "hi"}])

User-visible effect: the chat panel sometimes falls through to str(result) and shows a repr instead of clean text. Note that groot_libero.py:_extract_text (line ~250) handles this correctly with an explicit if msg is not None: ... if isinstance(msg, dict): content = msg.get("content"), so it is worth porting that shape back here. Minimal fix:

if isinstance(msg, dict):
    content = msg.get("content")
else:
    content = getattr(msg, "content", None)
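
For completeness, here is a self-contained sketch that puts the buggy expression and the branch-based fix side by side. The helper name `extract_content` and the `AttrMessage` class are hypothetical and exist only for this illustration; they are not taken from agent.py.

```python
def extract_content(msg):
    """Return the content of a dict-style or attribute-style message."""
    if isinstance(msg, dict):
        return msg.get("content")
    return getattr(msg, "content", None)

class AttrMessage:
    """Hypothetical attribute-style message, mirroring the repro above."""
    content = [{"text": "hi"}]

msg = AttrMessage()

# Original expression: the conditional binds more loosely than `or`, so the
# whole `getattr(...) or msg.get(...)` part is skipped for non-dict messages.
buggy = getattr(msg, "content", None) or msg.get("content") if isinstance(msg, dict) else None

fixed = extract_content(msg)
from_dict = extract_content({"content": [{"text": "hello"}]})
```

Explicit branches also avoid calling `.get` on an object that may not define it, which the parenthesised one-liner `getattr(...) or (msg.get(...) if ... else None)` would still have to guard against.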

2. GrootLiberoRunner.run leaks the Simulation on the exception path (groot_libero.py:~210)

The except Exception branch returns early without calling compositor.close() / sim.destroy(); cleanup is only on the success path. If evaluate_benchmark raises (GR00T server unreachable, BDDL gen fails, EGL context error — all observed in similar examples), the MuJoCo model + the compositor's render-executor thread + cached renderers leak across runs of the Gradio app. After a few failures the process holds N stale EGL contexts and MJ models.

Standard try/finally pattern:

sim = Simulation(tool_name="libero_sim", mesh=False)
compositor = HybridCompositor(sim, background=PanoramaBackground())
try:
    # ...build, eval, encode...
    return {"task": ..., "success_rate": ..., ...}
except Exception as e:
    logger.exception("Agentic GR00T run failed.")
    return {"error": f"{type(e).__name__}: {e}", "task": task, "instruction": instruction}
finally:
    try:
        compositor.close()
    except Exception:
        pass
    try:
        sim.destroy()
    except Exception:
        pass

Matches MujocoGsAgent.close() discipline at agent.py:~150. Same shape would benefit app.py's top-level holder lifecycle if the Gradio app ever crashes while a clip is mid-render.
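
An alternative shape, offered only as a sketch, is `contextlib.ExitStack`: each resource registers its cleanup right after it is created, so cleanup also runs if a later constructor raises. `FakeResource`, `safe_close`, and `run_once` are hypothetical stand-ins; the real Simulation and HybridCompositor teardown calls are the ones named above.

```python
from contextlib import ExitStack, suppress

class FakeResource:
    """Hypothetical stand-in for a Simulation or a compositor."""
    def __init__(self, name, log):
        self.name, self.log = name, log

    def close(self):
        self.log.append(f"closed {self.name}")

def safe_close(resource):
    # Swallow cleanup errors so one failing close() cannot skip the next one.
    with suppress(Exception):
        resource.close()

def run_once(log, fail):
    with ExitStack() as stack:
        sim = FakeResource("sim", log)
        stack.callback(safe_close, sim)
        compositor = FakeResource("compositor", log)
        stack.callback(safe_close, compositor)
        try:
            if fail:
                raise ConnectionError("policy server unreachable")
            return {"success_rate": 1.0}
        except Exception as e:
            return {"error": f"{type(e).__name__}: {e}"}

log = []
ok = run_once(log, fail=False)
failed = run_once(log, fail=True)
```

Callbacks run in reverse registration order, so the compositor is closed before the simulation it depends on, on both the success and the error path.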

3. GsplatBackground(device="cuda") hard-defaults with no graceful CPU/availability check (backgrounds.py:~265)

The README [gsplat] extra installs torch>=2.1 without a CUDA constraint, so a user on a CPU-only machine (or a CUDA box where torch was installed for the wrong CUDA version) will pip-install successfully, then hit a confusing RuntimeError: CUDA error: no CUDA-capable device is detected deep inside gsplat.rasterization on the first frame after they upload a .ply.

A cheap pre-check in _load():

if self._device.startswith("cuda"):
    try:
        import torch
        if not torch.cuda.is_available():
            raise RuntimeError(
                "GsplatBackground(device='cuda') was requested but torch.cuda.is_available() "
                "is False. Install a CUDA-matched torch build or pass device='cpu' "
                "(slow but functional). See README → Limitations."
            )
    except ImportError:
        pass  # the import-checked one below will give the canonical hint

Docstring already says "CUDA-only in practice"; this just makes the failure mode loud at the right layer instead of generic-PyTorch-error two stack frames deep. Same posture as your _load() already takes for the gsplat/torch ImportError.
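
A variant of the same pre-check that is easy to unit-test without a GPU is sketched below. `resolve_device` and `torch_cuda_available` are hypothetical names, and unlike the snippet above this version treats a missing torch install as "CUDA unavailable" rather than deferring to the later ImportError hint.

```python
def torch_cuda_available():
    """Probe CUDA through torch; report False when torch is not installed."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()

def resolve_device(requested, cuda_available=torch_cuda_available):
    """Fail early with an actionable message instead of deep in the rasterizer."""
    if requested.startswith("cuda") and not cuda_available():
        raise RuntimeError(
            f"device={requested!r} was requested but CUDA is not available; "
            "install a CUDA-matched torch build or pass device='cpu'."
        )
    return requested

# The probe is injectable, so both outcomes can be exercised on any machine.
cpu_device = resolve_device("cpu", cuda_available=lambda: False)
gpu_device = resolve_device("cuda:0", cuda_available=lambda: True)
```

Passing the probe as a parameter keeps the check out of the per-frame path and lets a test simulate a CPU-only machine.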


Smaller observations (informational, no action requested)

  • app.py:~365 mounts /live after demo.launch(prevent_thread_lock=True). Works because Gradio's uvicorn instance picks up new routes before the first inbound request lands, but it's order-fragile: a user who refreshes the page faster than demo.launch() returns gets a 404 on /live once. Mounting before launch (Gradio exposes demo.app after Blocks(...) is finalized) would be slightly more robust.

  • The SCENE_DESCRIPTION import + make_scene_description rebuild in agent.py:build() means the system prompt has the actual loaded robot config baked in — nice touch, fixes the "agent thinks it has SO-101 when it really got Panda fallback" failure mode.

  • README pip install '.[gsplat]' and requirements.txt install paths are both documented — worth adding a one-liner that the .[gsplat] path also installs gradio etc. via the strands-robots dependency chain, so users don't have to do requirements.txt plus [gsplat].

LGTM as an example contribution — #1 and #2 are worth a follow-up commit, #3 is hardening polish.


Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

Add MuJoCo + 3DGS hybrid-render example (SO-101 + agentic GR00T-on-LIBERO)

2 participants