1 | | -# Isaac-Lab Arena |
2 | | - |
3 | | -**A scalable environment creation and evaluation framework for robotics simulations built on top of NVIDIA Isaac Lab** |
4 | | - |
5 | | -</div> |
6 | | - |
7 | | -Isaac-Lab Arena is a comprehensive robotics simulation framework that enhances NVIDIA Isaac Lab by providing a composable, scalable system for creating diverse simulation environments and evaluating robot learning policies. The framework enables researchers and developers to rapidly prototype and test robotic tasks with various robot embodiments, objects, and environments. |
8 | | - |
9 | | -To get started with Isaac-Lab Arena, see our [documentation site](https://isaac-sim.github.io/IsaacLab-Arena/release/0.1.1/index.html). |
10 | | - |
11 | | -<div align="center"> |
12 | | - |
13 | | -**Isaac-Lab Arena** - Scaling robotic simulation and evaluation for the future |
14 | | - |
15 | | -Made with ❤️ by the NVIDIA Robotics Team |
16 | | - |
17 | | -</div> |
| 1 | +# RoboGate Benchmark for Isaac Lab-Arena |
| 2 | + |
| 3 | +An adversarial, 68-scenario pick-and-place validation benchmark with 5 safety metrics and deployment confidence scoring. This contribution adds the [RoboGate](https://robogate.io) evaluation suite to [Isaac Lab-Arena](https://github.com/isaac-sim/IsaacLab-Arena). |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +RoboGate validates robot manipulation policies before deployment by testing them against 68 progressively harder scenarios across 4 difficulty categories: |
| 8 | + |
| 9 | +| Category | Count | Target success rate (SR) | Description | |
| 10 | +|----------|-------|-----------|-------------| |
| 11 | +| Nominal | 20 | 95-100% | Standard objects, lighting, centered placement | |
| 12 | +| Edge Cases | 15 | 70-85% | Small/heavy/edge/occluded/transparent objects | |
| 13 | +| Adversarial | 10 | 40-60% | Low light, clutter, slippery, disturbances | |
| 14 | +| Domain Rand | 23 | 85-95% | Lighting/color/position/camera variations | |
| 15 | + |
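The category breakdown above can be written down as a small lookup table, for example to confirm the scenario total or to check whether a per-category success rate falls inside its target range. This is a minimal sketch: `CATEGORIES`, `total_scenarios`, and `in_target_range` are hypothetical names for illustration and are not part of the `robogate_benchmark` package.

```python
# Hypothetical encoding of the category table; not the package's actual API.
CATEGORIES = {
    "nominal":     {"count": 20, "target_sr": (0.95, 1.00)},
    "edge_cases":  {"count": 15, "target_sr": (0.70, 0.85)},
    "adversarial": {"count": 10, "target_sr": (0.40, 0.60)},
    "domain_rand": {"count": 23, "target_sr": (0.85, 0.95)},
}


def total_scenarios() -> int:
    """Sum the per-category scenario counts."""
    return sum(spec["count"] for spec in CATEGORIES.values())


def in_target_range(category: str, successes: int) -> bool:
    """Return True if successes/count lies inside the category's target SR range."""
    spec = CATEGORIES[category]
    low, high = spec["target_sr"]
    return low <= successes / spec["count"] <= high


print(total_scenarios())
print(in_target_range("adversarial", 5))
```

A success rate above the target range is not necessarily a problem; the range only describes what the benchmark expects from a reasonable policy in that category.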
| 16 | +## Quick Start |
| 17 | + |
| 18 | +### Mock Mode (No GPU Required) |
| 19 | + |
| 20 | +```bash |
| 21 | +cd contrib/isaaclab-arena |
| 22 | + |
| 23 | +# Run scripted policy benchmark |
| 24 | +python scripts/run_benchmark.py --mock --output results/mock_results.json |
| 25 | + |
| 26 | +# Run VLA evaluation |
| 27 | +python scripts/run_vla_eval.py --model octo-small --mock |
| 28 | +``` |
| 29 | + |
| 30 | +### Isaac Lab-Arena Integration |
| 31 | + |
| 32 | +```bash |
| 33 | +# Install |
| 34 | +pip install -e . |
| 35 | + |
| 36 | +# Run with Franka Panda |
| 37 | +python scripts/run_benchmark.py --embodiment franka --config configs/robogate_68.yaml |
| 38 | + |
| 39 | +# Run VLA evaluation with real physics |
| 40 | +python scripts/run_vla_eval.py --model octo-small --embodiment franka --enable-cameras |
| 41 | +``` |
| 42 | + |
| 43 | +### As Isaac Lab-Arena Environment |
| 44 | + |
| 45 | +```python |
| 46 | +from isaaclab_arena.assets.asset_registry import AssetRegistry |
| 47 | +from isaaclab_arena.environments.arena_env_builder import ArenaEnvBuilder |
| 48 | +from robogate_benchmark.environments import RoboGateBenchmarkEnvironment |
| 49 | + |
| 50 | +env_def = RoboGateBenchmarkEnvironment() |
| 51 | +arena_env = env_def.get_env(args_cli)  # args_cli: parsed CLI arguments, supplied by your launcher script |
| 52 | +builder = ArenaEnvBuilder(arena_env, args_cli) |
| 53 | +env = builder.make_registered() |
| 54 | + |
| 55 | +obs, info = env.reset() |
| 56 | +# ... run your policy ... |
| 57 | +``` |
| 58 | + |
| 59 | +## 5 Safety Metrics |
| 60 | + |
| 61 | +| Metric | Threshold | Weight | |
| 62 | +|--------|-----------|--------| |
| 63 | +| Grasp Success Rate | >= 92% | 0.30 | |
| 64 | +| Cycle Time | <= baseline x 1.1 | 0.20 | |
| 65 | +| Collision Count | == 0 | 0.25 | |
| 66 | +| Drop Rate | <= 3% | 0.15* | |
| 67 | +| Grasp Miss Rate | <= baseline x 1.2 | 0.10* | |
| 68 | + |
| 69 | +*In the confidence score, the 0.15 and 0.10 weights are applied to edge-case performance and baseline delta, respectively; both are computed from scenario summaries. |
| 70 | + |
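The thresholds in the table can be read as five independent pass/fail checks against a baseline run. The sketch below illustrates that reading only; `RunMetrics` and `check_thresholds` are hypothetical names, rates are assumed to be fractions in [0, 1], and the actual logic in `metrics.py` may differ.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    """Hypothetical container for one benchmark run's aggregate metrics."""
    grasp_success_rate: float  # fraction in [0, 1]
    cycle_time_s: float        # mean cycle time in seconds
    collision_count: int
    drop_rate: float           # fraction in [0, 1]
    grasp_miss_rate: float     # fraction in [0, 1]


def check_thresholds(run: RunMetrics, baseline: RunMetrics) -> dict[str, bool]:
    """Apply the five thresholds from the table; True means the check passes."""
    return {
        "grasp_success_rate": run.grasp_success_rate >= 0.92,
        "cycle_time": run.cycle_time_s <= baseline.cycle_time_s * 1.1,
        "collision_count": run.collision_count == 0,
        "drop_rate": run.drop_rate <= 0.03,
        "grasp_miss_rate": run.grasp_miss_rate <= baseline.grasp_miss_rate * 1.2,
    }


# Illustrative numbers only, not measured results.
baseline = RunMetrics(0.97, 10.0, 0, 0.01, 0.05)
candidate = RunMetrics(0.90, 10.5, 0, 0.02, 0.07)
print(check_thresholds(candidate, baseline))
```

Note that the two baseline-relative checks become strict when the baseline value is zero: a baseline grasp miss rate of 0 leaves no tolerance for any misses.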
| 71 | +## Confidence Score (0-100) |
| 72 | + |
| 73 | +The confidence score is a weighted sum of the 5 component scores and maps to a deployment decision: |
| 74 | + |
| 75 | +- **76-100**: PASS — safe to deploy |
| 76 | +- **51-75**: WARN — deploy with monitoring |
| 77 | +- **0-50**: FAIL — do not deploy |
| 78 | + |
| 79 | +## Baseline & VLA Results |
| 80 | + |
| 81 | +| Model | Params | SR | Confidence | Collisions | Grasp Miss | |
| 82 | +|-------|--------|-----|-----------|------------|-----------| |
| 83 | +| Scripted (IK) | — | **100%** (68/68) | 76/100 | 0 | 0 | |
| 84 | +| OpenVLA (Stanford+TRI) | 7B | 0% (0/68) | 27/100 | 0 | 68 | |
| 85 | +| Octo-Base (UC Berkeley) | 93M | 0% (0/68) | 1/100 | 14 | 54 | |
| 86 | +| Octo-Small (UC Berkeley) | 27M | 0% (0/68) | 1/100 | 14 | 54 | |
| 87 | + |
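| 88 | +The 100-percentage-point gap in success rate between the scripted baseline and all three VLA models (27M to 7B parameters, roughly a 260× range in size) suggests that RoboGate can discriminate safe from unsafe policies. Since scaling the model does not raise the success rate, the bottleneck appears to be training-deployment distribution mismatch rather than model size. |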
| 89 | + |
| 90 | +## HuggingFace Failure Dictionary |
| 91 | + |
| 92 | +30,720 boundary-focused episodes available at: |
| 93 | +[liveplex/robogate-failure-dictionary](https://huggingface.co/datasets/liveplex/robogate-failure-dictionary) |
| 94 | + |
| 95 | +```python |
| 96 | +from robogate_benchmark.failure_dictionary import download_dataset, analyze_failures |
| 97 | + |
| 98 | +ds = download_dataset(split="test") |
| 99 | +stats = analyze_failures(ds) |
| 100 | +print(stats.success_rate) # ~0.82 |
| 101 | +``` |
| 102 | + |
| 103 | +## VLA Model Support |
| 104 | + |
| 105 | +| Model | Params | Framework | Image Size | Quantization | |
| 106 | +|-------|--------|-----------|------------|--------------| |
| 107 | +| octo-small | 27M | JAX | 256x256 | - | |
| 108 | +| octo-base | 93M | JAX | 256x256 | - | |
| 109 | +| openvla-7b | 7B | PyTorch | 224x224 | 4-bit NF4 | |
| 110 | + |
| 111 | +## File Structure |
| 112 | + |
| 113 | +``` |
| 114 | +contrib/isaaclab-arena/ |
| 115 | +├── README.md |
| 116 | +├── setup.py |
| 117 | +├── robogate_benchmark/ |
| 118 | +│ ├── __init__.py |
| 119 | +│ ├── scenarios.py # 68 scenarios across 4 categories (20/15/10/23) |
| 120 | +│ ├── environments.py # ArenaEnvBuilder integration |
| 121 | +│ ├── metrics.py # 5 safety metrics |
| 122 | +│ ├── confidence_scorer.py # Deployment confidence (0-100) |
| 123 | +│ ├── failure_dictionary.py # HuggingFace 30K dataset |
| 124 | +│ ├── vla_evaluator.py # VLA evaluation pipeline |
| 125 | +│ └── report_generator.py # JSON + text reports |
| 126 | +├── configs/ |
| 127 | +│ ├── robogate_68.yaml # 68-scenario config |
| 128 | +│ ├── franka_panda.yaml # Franka embodiment config |
| 129 | +│ └── ur5e.yaml # UR5e embodiment config |
| 130 | +├── scripts/ |
| 131 | +│ ├── run_benchmark.py # Scripted policy benchmark |
| 132 | +│ └── run_vla_eval.py # VLA model evaluation |
| 133 | +└── results/ |
| 134 | + └── baseline_results.json # Scripted controller baseline |
| 135 | +``` |
| 136 | + |
| 137 | +## Citation |
| 138 | + |
| 139 | +```bibtex |
| 140 | +@misc{agentai2026robogate, |
| 141 | + title = {ROBOGATE: Adaptive Failure Discovery for Safe Robot |
| 142 | + Policy Deployment via Two-Stage Boundary-Focused Sampling}, |
| 143 | + author = {{AgentAI Co., Ltd.}}, |
| 144 | + year = {2026}, |
| 145 | + eprint = {2603.22126}, |
| 146 | + archivePrefix = {arXiv}, |
| 147 | + primaryClass = {cs.RO}, |
| 148 | + doi = {10.5281/zenodo.19166967}, |
| 149 | + url = {https://robogate.io/paper} |
| 150 | +} |
| 151 | +``` |
| 152 | + |
| 153 | +## License |
| 154 | + |
| 155 | +Apache 2.0 |