feat(examples): MuJoCo + 3DGS hybrid-render example (SO-101 + agentic GR00T-on-LIBERO)#58
yinsong1986 wants to merge 5 commits into
… GR00T-on-LIBERO) Adds examples/mujoco_gs/: a Python take on MuJoCo-GS-Web (MuJoCo physics composited against a photoreal panorama / 3DGS background) on top of the strands-robots Simulation AgentTool, plus a real, agentic GR00T-on-LIBERO demo. - SO-101 hybrid-render app (app.py): Strands agent + scripted motion, depth- aware GS/panorama compositing, near-real-time MJPEG live view, MP4 clips, optional gsplat .ply background. - Agentic GR00T + LIBERO (app_groot_libero.py / groot_libero.py / libero_groot.py): an Agent picks evaluate_benchmark off the Simulation tool surface and a real GR00T N1.7 policy drives a Franka Panda; live view + clip + success_rate. Verified success_rate=1.00 on libero-10 LIVING_ROOM_SCENE5 (white mug). - HybridCompositor renders on a single thread (EGL-safe) and applies the LIBERO viz_option so the arm renders without collision-geom/site debug patches. Success recipe documented in README: match task suite to the served checkpoint, leave max_steps at the adapter default, pre-warm the scene, auto-pick the robot, default action_horizon.
The agentic GrootLiberoRunner streamed the live MJPEG via a concurrent render thread, which contends with the eval and stalls the MJPEG-serving thread — the live view froze the instant a run started. Capture frames synchronously inside evaluate_benchmark's on_frame (wrapped in a one-shot run_libero_eval @tool the agent invokes) instead; the /live stream now updates continuously (verified: frames grow ~15fps with visible motion).
…API only Drop the custom `animate` and `hybrid_render` agent tools. The agent now gets only the real strands-robots `Simulation` AgentTool, so 'have the arm wave' goes through the genuine policy engine — `run_policy(robot_name='arm', policy_provider='mock', duration=4.0, control_frequency=20.0)` — not a scripted helper. The 3DGS/panorama compositing stays as the example's display layer (live MJPEG view + still preview via HybridCompositor), not an agent tool. Removes the scripted-trajectory helpers and the 'Record motion clip' button/video panel; motion is shown in the live view. README updated.
Remove leftover state from the old concurrent live_loop (replaced by the synchronous on_frame): self._lock, self.latest_rgb, and self.running were set but never read. Drop the now-unused threading import. latest_jpeg (the live MJPEG buffer) is kept. No behavior change.
…esets Fix GsplatBackground so it actually renders a real .ply (verified on the bonsai Mip-NeRF360 scene): correct the rasterization output unpacking (RGB+D returns (H,W,4) in render_colors) and add the MuJoCo/OpenGL -> gsplat/OpenCV camera-convention flip (was rendering all-black). Add downloadable scene presets (GSPLAT_SCENES: bonsai/bicycle/stump from HuggingFace), download_gsplat_scene() with on-demand caching, and an optional auto 'backdrop' transform (PCA upright + scale + centre). NOTE: auto-alignment is approximate — a captured scene's frame doesn't match the SO-101 camera viewpoints, so a clean backdrop still needs per-scene tuning.
cagataycali left a comment
Reviewed end-to-end — README, app.py, agent.py, compositor.py, backgrounds.py, groot_libero.py, pyproject.toml. Genuinely impressive scope: real 3DGS rasterization with CV/GL convention flip + auto-backdrop PCA fit, single-thread EGL render executor, MJPEG live route to bypass Gradio's SSE buffering, agentic LIBERO via a one-shot wrapping tool, scene preset HF downloads. The single-render-thread design with cached (W,H) renderers + cached CameraParams to keep mj_forward off the per-frame path is the right call — very clean.
Three concrete findings worth a follow-up. None block merge as an example, but #1 and #2 are real bugs.
1. _extract_agent_text ternary precedence drops the attribute-message path (agent.py:261)
content = getattr(msg, "content", None) or msg.get("content") if isinstance(msg, dict) else None
Python parses this as (getattr(...) or msg.get(...)) if isinstance(msg, dict) else None, so when msg is a Strands AgentResult.message attribute-style object (the common case, not a dict), isinstance(msg, dict) is False and the whole expression returns None: the getattr branch the comment two lines above advertises is never reached. Verified locally:
class M:
    content = [{"text": "hi"}]
msg = M()
print((getattr(msg, "content", None) or msg.get("content") if isinstance(msg, dict) else None))
# -> None (expected: [{"text": "hi"}])
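The grouping can also be checked without constructing a message object, by asking Python's own parser. A minimal sketch using the standard-library ast module; top_level_shape is a hypothetical helper name introduced here for illustration, not something in the PR:

```python
import ast

# The expression from agent.py, as quoted in the finding above.
EXPR = 'getattr(msg, "content", None) or msg.get("content") if isinstance(msg, dict) else None'


def top_level_shape(source: str) -> tuple[str, str]:
    """Return the node type at the root of the expression and of its 'true' branch."""
    root = ast.parse(source, mode="eval").body
    return type(root).__name__, type(root.body).__name__


# The root is the conditional, and the whole `or` chain sits inside its true branch.
print(top_level_shape(EXPR))  # -> ('IfExp', 'BoolOp')
```

The conditional expression binds more loosely than `or`, so the `if ... else None` wraps both operands rather than only the msg.get(...) call.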
User-visible effect: chat panel sometimes falls through to str(result) and shows a repr instead of clean text. Note groot_libero.py:_extract_text (line ~250) does this correctly with an explicit if msg is not None: ... if isinstance(msg, dict): content = msg.get("content"); worth porting that shape back here. Minimal fix:
if isinstance(msg, dict):
    content = msg.get("content")
else:
    content = getattr(msg, "content", None)
2. GrootLiberoRunner.run leaks the Simulation on the exception path (groot_libero.py:~210)
The except Exception branch returns early without calling compositor.close() / sim.destroy(); cleanup is only on the success path. If evaluate_benchmark raises (GR00T server unreachable, BDDL gen fails, EGL context error — all observed in similar examples), the MuJoCo model + the compositor's render-executor thread + cached renderers leak across runs of the Gradio app. After a few failures the process holds N stale EGL contexts and MJ models.
Standard try/finally pattern:
sim = Simulation(tool_name="libero_sim", mesh=False)
compositor = HybridCompositor(sim, background=PanoramaBackground())
try:
    # ...build, eval, encode...
    return {"task": ..., "success_rate": ..., ...}
except Exception as e:
    logger.exception("Agentic GR00T run failed.")
    return {"error": f"{type(e).__name__}: {e}", "task": task, "instruction": instruction}
finally:
    try:
        compositor.close()
    except Exception:
        pass
    try:
        sim.destroy()
    except Exception:
        pass
Matches MujocoGsAgent.close() discipline at agent.py:~150. Same shape would benefit app.py's top-level holder lifecycle if the Gradio app ever crashes while a clip is mid-render.
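The same cleanup guarantee can also be written with contextlib.ExitStack, which avoids the nested try/except blocks. This is only a sketch under stated assumptions: FakeSim and FakeCompositor are hypothetical stand-ins that mimic just the destroy() / close() methods named above, and run() is an illustrative function, not the PR's GrootLiberoRunner.run.

```python
import contextlib
import logging

logger = logging.getLogger(__name__)


class FakeSim:
    """Hypothetical stand-in for Simulation; only records that destroy() ran."""

    def __init__(self):
        self.destroyed = False

    def destroy(self):
        self.destroyed = True


class FakeCompositor:
    """Hypothetical stand-in for HybridCompositor; only records that close() ran."""

    def __init__(self, sim):
        self.sim = sim
        self.closed = False

    def close(self):
        self.closed = True


def _quietly(fn):
    """Wrap a cleanup call so a failure inside it cannot mask the original error."""

    def wrapper():
        with contextlib.suppress(Exception):
            fn()

    return wrapper


def run(evaluate, sim, compositor):
    with contextlib.ExitStack() as stack:
        # Callbacks run in LIFO order on exit: compositor.close(), then sim.destroy().
        stack.callback(_quietly(sim.destroy))
        stack.callback(_quietly(compositor.close))
        try:
            return {"success_rate": evaluate()}
        except Exception as e:
            logger.exception("run failed")
            return {"error": f"{type(e).__name__}: {e}"}


def failing_eval():
    raise ConnectionError("policy server unreachable")


sim = FakeSim()
compositor = FakeCompositor(sim)
result = run(failing_eval, sim, compositor)
```

Here result carries the error string and both stand-ins report that their cleanup ran, even though the evaluation raised. Whether ExitStack or the explicit try/finally reads better is a style choice; the behaviour on the exception path is the point.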
3. GsplatBackground(device="cuda") hard-defaults with no graceful CPU/availability check (backgrounds.py:~265)
The README [gsplat] extra installs torch>=2.1 without a CUDA constraint, so a user on a CPU-only machine (or a CUDA box where torch was installed for the wrong CUDA version) will pip-install successfully, then hit a confusing RuntimeError: CUDA error: no CUDA-capable device is detected deep inside gsplat.rasterization on the first frame after they upload a .ply.
A cheap pre-check in _load():
if self._device.startswith("cuda"):
    try:
        import torch
        if not torch.cuda.is_available():
            raise RuntimeError(
                "GsplatBackground(device='cuda') was requested but torch.cuda.is_available() "
                "is False. Install a CUDA-matched torch build or pass device='cpu' "
                "(slow but functional). See README → Limitations."
            )
    except ImportError:
        pass  # the import-checked one below will give the canonical hint
Docstring already says "CUDA-only in practice"; this just makes the failure mode loud at the right layer instead of generic-PyTorch-error two stack frames deep. Same posture as your _load() already takes for the gsplat/torch ImportError.
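If it helps with unit-testing on machines without a GPU (or without torch installed), the decision itself can be split from the torch call. A minimal sketch: check_device_available is a hypothetical helper name, and the caller is assumed to pass in the result of torch.cuda.is_available().

```python
def check_device_available(device: str, cuda_available: bool) -> None:
    """Raise early if a CUDA device is requested but none is usable.

    `cuda_available` is supplied by the caller (for example the result of
    torch.cuda.is_available()), so this check needs no torch import.
    """
    if device.startswith("cuda") and not cuda_available:
        raise RuntimeError(
            f"device={device!r} was requested but CUDA is not available; "
            "install a CUDA-matched torch build or choose another device."
        )


# A non-CUDA device passes regardless of CUDA availability.
check_device_available("cpu", cuda_available=False)

# A CUDA device without CUDA fails loudly, before any rasterization call.
try:
    check_device_available("cuda:0", cuda_available=False)
except RuntimeError as exc:
    message = str(exc)
```

The "cuda:0" case is covered because the check uses startswith, the same test as in the snippet above.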
Smaller observations (informational, no action requested)
- app.py:~365 mounts /live after demo.launch(prevent_thread_lock=True). Works because Gradio's uvicorn instance picks up new routes before the first inbound request lands, but it's order-fragile: a user who refreshes the page faster than demo.launch() returns gets a 404 on /live once. Mounting before launch (Gradio exposes demo.app after Blocks(...) is finalized) would be slightly more robust.
- The SCENE_DESCRIPTION import + make_scene_description rebuild in agent.py:build() means the system prompt has the actual loaded robot config baked in. Nice touch; it fixes the "agent thinks it has SO-101 when it really got Panda fallback" failure mode.
- README pip install '.[gsplat]' and requirements.txt install paths are both documented. Worth adding a one-liner that the .[gsplat] path also installs gradio etc. via the strands-robots dependency chain, so users don't have to do requirements.txt plus [gsplat].
LGTM as an example contribution — #1 and #2 are worth a follow-up commit, #3 is hardening polish.
What
Adds examples/mujoco_gs/, a Python take on MuJoCo-GS-Web (MuJoCo physics composited against a photoreal panorama / 3DGS background) on top of the strands-robots Simulation AgentTool, plus a real, agentic GR00T-on-LIBERO demo.
SO-101 hybrid-render demo (app.py)
- Strands Agent (natural language → Simulation actions) + scripted motion.
- Depth-aware GS/panorama compositing (HybridCompositor).
- Near-real-time MJPEG live view (/live route, proxy-friendly) + MP4 clips.
- Optional background rendered with gsplat (.ply), behind an extra.
Agentic GR00T + LIBERO demo (app_groot_libero.py / groot_libero.py / libero_groot.py)
- An Agent picks evaluate_benchmark off the Simulation tool surface and a real NVIDIA GR00T N1.7 policy drives a Franka Panda through a LIBERO task; live view + clip + success_rate.
- Verified success_rate = 1.00 on libero-10-LIVING_ROOM_SCENE5_put_the_white_mug_… against nvidia/GR00T-N1.7-LIBERO (libero_10).
Success recipe (documented in the README)
Match the task suite to the served checkpoint; leave max_steps at the adapter default (LIBERO-Long needs ~500 steps); pre-warm the scene (generate BDDL → load_scene → prewarm); auto-pick the robot; default action_horizon. The compositor applies the LIBERO viz_option so the arm renders clean (no collision-geom/site debug patches).
Notes
- … libero + robosuite required for that demo.
- … look shines on open scenes, less so behind enclosed LIBERO scenes.
- … threads).
Closes #57.