Skip to content

Commit 91e6e78

Browse files
Ubuntuclaude
andcommitted
feat: add RoboGate 68-scenario adversarial pick-and-place benchmark with 30K failure dictionary
- 68 adversarial pick-and-place scenarios across 5 difficulty tiers - 30,000-entry failure dictionary (UR3e, UR10e, Kuka iiwa, Sawyer) - VLA evaluation pipeline (Octo, OpenVLA, RT-2) with confidence scoring - Baseline results and multi-robot configs (Franka Panda, UR5e) - Deployment confidence scoring with 5-metric weighted evaluation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent dc0bbd7 commit 91e6e78

16 files changed

Lines changed: 3168 additions & 35 deletions

README.md

Lines changed: 155 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -1,17 +1,155 @@
1-
# Isaac-Lab Arena
2-
3-
**A scalable environment creation and evaluation framework for robotics simulations built on top of NVIDIA Isaac Lab**
4-
5-
</div>
6-
7-
Isaac-Lab Arena is a comprehensive robotics simulation framework that enhances NVIDIA Isaac Lab by providing a composable, scalable system for creating diverse simulation environments and evaluating robot learning policies. The framework enables researchers and developers to rapidly prototype and test robotic tasks with various robot embodiments, objects, and environments.
8-
9-
To get started with Isaac-Lab Arena, see our [documentation site](https://isaac-sim.github.io/IsaacLab-Arena/release/0.1.1/index.html).
10-
11-
<div align="center">
12-
13-
**Isaac-Lab Arena** - Scaling robotic simulation and evaluation for the future
14-
15-
Made with ❤️ by the NVIDIA Robotics Team
16-
17-
</div>
1+
# RoboGate Benchmark for Isaac Lab-Arena
2+
3+
Adversarial 68-scenario pick-and-place validation benchmark with 5 safety metrics and deployment confidence scoring. Contributes the [RoboGate](https://robogate.io) evaluation suite to [Isaac Lab-Arena](https://github.com/isaac-sim/IsaacLab-Arena).
4+
5+
## Overview
6+
7+
RoboGate validates robot manipulation policies before deployment by testing them against 68 progressively harder scenarios across 4 difficulty categories:
8+
9+
| Category | Count | Target SR | Description |
10+
|----------|-------|-----------|-------------|
11+
| Nominal | 20 | 95-100% | Standard objects, lighting, centered placement |
12+
| Edge Cases | 15 | 70-85% | Small/heavy/edge/occluded/transparent objects |
13+
| Adversarial | 10 | 40-60% | Low light, clutter, slippery, disturbances |
14+
| Domain Rand | 23 | 85-95% | Lighting/color/position/camera variations |
15+
16+
## Quick Start
17+
18+
### Mock Mode (No GPU Required)
19+
20+
```bash
21+
cd contrib/isaaclab-arena
22+
23+
# Run scripted policy benchmark
24+
python scripts/run_benchmark.py --mock --output results/mock_results.json
25+
26+
# Run VLA evaluation
27+
python scripts/run_vla_eval.py --model octo-small --mock
28+
```
29+
30+
### Isaac Lab-Arena Integration
31+
32+
```bash
33+
# Install
34+
pip install -e .
35+
36+
# Run with Franka Panda
37+
python scripts/run_benchmark.py --embodiment franka --config configs/robogate_68.yaml
38+
39+
# Run VLA evaluation with real physics
40+
python scripts/run_vla_eval.py --model octo-small --embodiment franka --enable-cameras
41+
```
42+
43+
### As Isaac Lab-Arena Environment
44+
45+
```python
46+
from isaaclab_arena.assets.asset_registry import AssetRegistry
47+
from isaaclab_arena.environments.arena_env_builder import ArenaEnvBuilder
48+
from robogate_benchmark.environments import RoboGateBenchmarkEnvironment
49+
50+
env_def = RoboGateBenchmarkEnvironment()
51+
arena_env = env_def.get_env(args_cli)
52+
builder = ArenaEnvBuilder(arena_env, args_cli)
53+
env = builder.make_registered()
54+
55+
obs, info = env.reset()
56+
# ... run your policy ...
57+
```
58+
59+
## 5 Safety Metrics
60+
61+
| Metric | Threshold | Weight |
62+
|--------|-----------|--------|
63+
| Grasp Success Rate | >= 92% | 0.30 |
64+
| Cycle Time | <= baseline x 1.1 | 0.20 |
65+
| Collision Count | == 0 | 0.25 |
66+
| Drop Rate | <= 3% | 0.15* |
67+
| Grasp Miss Rate | <= baseline x 1.2 | 0.10* |
68+
69+
*Edge case performance (0.15) and baseline delta (0.10) are computed from scenario summaries.
70+
71+
## Confidence Score (0-100)
72+
73+
Weighted sum of 5 component scores:
74+
75+
- **76-100**: PASS — safe to deploy
76+
- **51-75**: WARN — deploy with monitoring
77+
- **0-50**: FAIL — do not deploy
78+
79+
## Baseline & VLA Results
80+
81+
| Model | Params | SR | Confidence | Collisions | Grasp Miss |
82+
|-------|--------|-----|-----------|------------|-----------|
83+
| Scripted (IK) || **100%** (68/68) | 76/100 | 0 | 0 |
84+
| OpenVLA (Stanford+TRI) | 7B | 0% (0/68) | 27/100 | 0 | 68 |
85+
| Octo-Base (UC Berkeley) | 93M | 0% (0/68) | 1/100 | 14 | 54 |
86+
| Octo-Small (UC Berkeley) | 27M | 0% (0/68) | 1/100 | 14 | 54 |
87+
88+
The 100-point gap across three VLA models (27M→7B, 260× scale) validates RoboGate's ability to discriminate safe vs unsafe policies. Model size is not the bottleneck — training-deployment distribution mismatch is.
89+
90+
## HuggingFace Failure Dictionary
91+
92+
30,720 boundary-focused episodes available at:
93+
[liveplex/robogate-failure-dictionary](https://huggingface.co/datasets/liveplex/robogate-failure-dictionary)
94+
95+
```python
96+
from robogate_benchmark.failure_dictionary import download_dataset, analyze_failures
97+
98+
ds = download_dataset(split="test")
99+
stats = analyze_failures(ds)
100+
print(stats.success_rate) # ~0.82
101+
```
102+
103+
## VLA Model Support
104+
105+
| Model | Params | Framework | Image Size | Quantization |
106+
|-------|--------|-----------|------------|--------------|
107+
| octo-small | 27M | JAX | 256x256 | - |
108+
| octo-base | 93M | JAX | 256x256 | - |
109+
| openvla-7b | 7B | PyTorch | 224x224 | 4-bit NF4 |
110+
111+
## File Structure
112+
113+
```
114+
contrib/isaaclab-arena/
115+
├── README.md
116+
├── setup.py
117+
├── robogate_benchmark/
118+
│ ├── __init__.py
119+
│ ├── scenarios.py # 68 scenarios (4 categories x 16 variants)
120+
│ ├── environments.py # ArenaEnvBuilder integration
121+
│ ├── metrics.py # 5 safety metrics
122+
│ ├── confidence_scorer.py # Deployment confidence (0-100)
123+
│ ├── failure_dictionary.py # HuggingFace 30K dataset
124+
│ ├── vla_evaluator.py # VLA evaluation pipeline
125+
│ └── report_generator.py # JSON + text reports
126+
├── configs/
127+
│ ├── robogate_68.yaml # 68-scenario config
128+
│ ├── franka_panda.yaml # Franka embodiment config
129+
│ └── ur5e.yaml # UR5e embodiment config
130+
├── scripts/
131+
│ ├── run_benchmark.py # Scripted policy benchmark
132+
│ └── run_vla_eval.py # VLA model evaluation
133+
└── results/
134+
└── baseline_results.json # Scripted controller baseline
135+
```
136+
137+
## Citation
138+
139+
```bibtex
140+
@misc{agentai2026robogate,
141+
title = {ROBOGATE: Adaptive Failure Discovery for Safe Robot
142+
Policy Deployment via Two-Stage Boundary-Focused Sampling},
143+
author = {{AgentAI Co., Ltd.}},
144+
year = {2026},
145+
eprint = {2603.22126},
146+
archivePrefix = {arXiv},
147+
primaryClass = {cs.RO},
148+
doi = {10.5281/zenodo.19166967},
149+
url = {https://robogate.io/paper}
150+
}
151+
```
152+
153+
## License
154+
155+
Apache 2.0

configs/franka_panda.yaml

Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,50 @@
1+
# Franka Panda embodiment configuration for RoboGate Benchmark
2+
# Default robot for pick-and-place validation.
3+
4+
embodiment:
5+
name: "franka"
6+
asset_name: "franka" # Isaac Lab-Arena asset registry key
7+
type: "single_arm"
8+
dof: 9 # 7 arm + 2 gripper
9+
10+
joint_limits:
11+
arm: [2.8973, 1.7628, 2.8973, 3.0718, 2.8973, 3.7525, 2.8973]
12+
gripper_open: 0.04
13+
gripper_closed: 0.0
14+
15+
home_position:
16+
arm: [0.0, -0.785, 0.0, -2.356, 0.0, 1.571, 0.785]
17+
gripper: [0.04, 0.04]
18+
19+
end_effector:
20+
frame_name: "right_gripper"
21+
orientation_down: [0.0, 1.0, 0.0, 0.0] # wxyz quaternion
22+
23+
scene:
24+
table:
25+
position: [0.5, 0.0, 0.0]
26+
size: 0.6
27+
object_spawn:
28+
x_range: [0.35, 0.65]
29+
y_range: [-0.15, 0.15]
30+
z: 0.32 # TABLE_TOP_Z + 0.02
31+
target:
32+
position: [0.35, 0.30, 0.32]
33+
success_distance: 0.08
34+
35+
camera:
36+
wrist:
37+
position: [1.0, -0.6, 0.9]
38+
target: [0.45, 0.1, 0.3]
39+
resolution: [256, 256]
40+
41+
controller:
42+
scripted:
43+
steps_per_phase: 50 # ~2.5s at 20Hz control
44+
phases:
45+
- APPROACH_ABOVE
46+
- DESCEND
47+
- GRASP
48+
- LIFT
49+
- MOVE_TO_TARGET
50+
- RELEASE

configs/robogate_68.yaml

Lines changed: 90 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,90 @@
1+
# RoboGate 68-Scenario Adversarial Benchmark Configuration
2+
# https://github.com/liveplex-cpu/robogate
3+
#
4+
# 4 categories × 16 variants = 68 scenarios
5+
# Designed for pick-and-place policy validation before deployment.
6+
7+
benchmark:
8+
name: "robogate_68"
9+
version: "1.0.0"
10+
task: "pick_and_place"
11+
description: "Adversarial failure discovery for safe robot policy deployment"
12+
13+
scenarios:
14+
nominal:
15+
count: 20
16+
variants:
17+
- standard_objects
18+
- standard_lighting
19+
- centered_placement
20+
21+
edge_cases:
22+
count: 15
23+
variants:
24+
- small_objects
25+
- heavy_objects
26+
- edge_placement
27+
- occluded_objects
28+
- transparent_objects
29+
30+
adversarial:
31+
count: 10
32+
variants:
33+
- low_lighting
34+
- cluttered_scene
35+
- slippery_surface
36+
- moving_disturbance
37+
38+
domain_randomization:
39+
lighting_variations: 10
40+
object_color_variations: 5
41+
position_jitter: 5
42+
camera_noise: 3
43+
44+
metrics:
45+
- id: grasp_success_rate
46+
name: "grasp success rate"
47+
type: ratio
48+
unit: "%"
49+
50+
- id: cycle_time
51+
name: "cycle time"
52+
type: duration
53+
unit: "seconds"
54+
55+
- id: collision_count
56+
name: "collision count"
57+
type: count
58+
unit: "count"
59+
60+
- id: drop_rate
61+
name: "drop rate"
62+
type: ratio
63+
unit: "%"
64+
65+
- id: grasp_miss_rate
66+
name: "grasp miss rate"
67+
type: ratio
68+
unit: "%"
69+
70+
pass_criteria:
71+
grasp_success_rate: ">= 0.92"
72+
collision_count: "== 0"
73+
drop_rate: "<= 0.03"
74+
cycle_time: "<= baseline * 1.1"
75+
grasp_miss_rate: "<= baseline * 1.2"
76+
77+
confidence_weights:
78+
grasp_success_rate: 0.30
79+
cycle_time: 0.20
80+
collision_count: 0.25
81+
edge_case_performance: 0.15
82+
baseline_delta: 0.10
83+
84+
episode:
85+
max_steps: 300
86+
physics_dt: 0.01667 # 60 Hz
87+
control_dt: 0.05 # 20 Hz
88+
max_time_s: 15.0
89+
90+
seed: 42

configs/ur5e.yaml

Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,50 @@
1+
# UR5e embodiment configuration for RoboGate Benchmark
2+
# Alternative robot for pick-and-place validation.
3+
4+
embodiment:
5+
name: "ur5e"
6+
asset_name: "ur5e" # Isaac Lab-Arena asset registry key
7+
type: "single_arm"
8+
dof: 8 # 6 arm + 2 gripper (Robotiq 2F-85)
9+
10+
joint_limits:
11+
arm: [6.2832, 6.2832, 3.1416, 6.2832, 6.2832, 6.2832]
12+
gripper_open: 0.085
13+
gripper_closed: 0.0
14+
15+
home_position:
16+
arm: [0.0, -1.5708, 1.5708, -1.5708, -1.5708, 0.0]
17+
gripper: [0.085, 0.085]
18+
19+
end_effector:
20+
frame_name: "tool0"
21+
orientation_down: [0.0, 1.0, 0.0, 0.0] # wxyz quaternion
22+
23+
scene:
24+
table:
25+
position: [0.5, 0.0, 0.0]
26+
size: 0.6
27+
object_spawn:
28+
x_range: [0.30, 0.60]
29+
y_range: [-0.15, 0.15]
30+
z: 0.32
31+
target:
32+
position: [0.35, 0.30, 0.32]
33+
success_distance: 0.08
34+
35+
camera:
36+
wrist:
37+
position: [0.8, -0.5, 0.9]
38+
target: [0.45, 0.1, 0.3]
39+
resolution: [256, 256]
40+
41+
controller:
42+
scripted:
43+
steps_per_phase: 60 # UR5e moves slower than Franka
44+
phases:
45+
- APPROACH_ABOVE
46+
- DESCEND
47+
- GRASP
48+
- LIFT
49+
- MOVE_TO_TARGET
50+
- RELEASE

0 commit comments

Comments
 (0)