---
name: HomeSafe-Bench
description: VLM indoor safety hazard detection benchmark inspired by HomeSafeBench (arXiv 2509.23690)
version: 1.0.0
category: analysis
runtime: node
entry: scripts/run-benchmark.cjs
install: npm

requirements:
  node: ">=18"
  npm_install: true
  platforms: ["linux", "macos", "windows"]
---

# HomeSafe-Bench — Indoor Safety Hazard Detection

A VLM benchmark that evaluates indoor home safety hazard detection on static camera frames. It is inspired by [HomeSafeBench (arXiv 2509.23690)](https://arxiv.org/abs/2509.23690) and adapted for **fixed indoor security cameras**, using AI-generated fixture images.

**40 tests** across **5 hazard categories**:

| Category | Tests | Example Hazards |
|----------|-------|-----------------|
| 🔥 Fire / Smoke | 8 | Stove smoke, candle near curtain, iron left face-down |
| ⚡ Electrical | 8 | Overloaded power strip, exposed wire, frayed cord |
| 🚶 Trip / Fall | 8 | Toys on staircase, wet floor, loose rug edge |
| 👶 Child Safety | 8 | Open cabinet with chemicals, uncovered outlet, blind cord |
| 📦 Falling Objects | 8 | Heavy items on high shelf, leaning bookshelf, loose ceiling fan |

## Setup

**Requires `npm install`.** This skill has a `package.json` with dependencies (e.g. the `openai` SDK). The deployment agent should check for `package.json` and run `npm install` in the skill directory before launching.

Entry script: `scripts/run-benchmark.cjs`

### Verification

```bash
node scripts/run-benchmark.cjs --help
```

## Quick Start

### As an Aegis Skill (automatic)

When spawned by Aegis, the benchmark receives all of its configuration through environment variables, including the VLM server URL (`AEGIS_VLM_URL`). It then runs the suite, generates an HTML report, and opens it when complete.

### Standalone

```bash
# Run all 40 tests
node scripts/run-benchmark.cjs --vlm http://localhost:5405

# Quick mode (2 tests per category = 10 total)
node scripts/run-benchmark.cjs --vlm http://localhost:5405 --mode quick

# Skip report auto-open
node scripts/run-benchmark.cjs --vlm http://localhost:5405 --no-open
```

## Configuration

### Environment Variables (set by Aegis)

| Variable | Default | Description |
|----------|---------|-------------|
| `AEGIS_VLM_URL` | *(required)* | VLM server base URL |
| `AEGIS_VLM_MODEL` | — | Loaded VLM model ID |
| `AEGIS_SKILL_ID` | — | Skill identifier (enables skill mode) |
| `AEGIS_SKILL_PARAMS` | `{}` | JSON params from skill config |

> **Note**: URLs should be base URLs (e.g. `http://localhost:5405`). The benchmark appends `/v1/chat/completions` automatically.

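
A minimal sketch of what the note implies, assuming a plain OpenAI-compatible chat-completions request. It joins the base URL with `/v1/chat/completions` and sends one frame as a base64 data URL. The helper names, prompt text, and `max_tokens` value are hypothetical. It also calls Node 18's global `fetch` directly instead of going through the `openai` SDK that the skill actually depends on.

```javascript
// Hypothetical sketch, not the actual run-benchmark.cjs implementation.
const fs = require('node:fs');
const path = require('node:path');

// Join a base URL such as http://localhost:5405 with the chat-completions path,
// tolerating trailing slashes on the base.
function chatCompletionsUrl(baseUrl) {
  return baseUrl.replace(/\/+$/, '') + '/v1/chat/completions';
}

// Build an OpenAI-style request body with one text part and one image part.
// The prompt is an illustrative placeholder, not the benchmark's real prompt.
function buildHazardRequest(imageBuffer, mimeType, model) {
  const dataUrl = `data:${mimeType};base64,${imageBuffer.toString('base64')}`;
  const body = {
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: 'List any safety hazards visible in this indoor camera frame.' },
          { type: 'image_url', image_url: { url: dataUrl } },
        ],
      },
    ],
    max_tokens: 256,
  };
  if (model) body.model = model; // e.g. the value of AEGIS_VLM_MODEL, if set
  return body;
}

// POST one frame and return the model's text answer (needs a running server).
// Assumes PNG or JPEG fixture images.
async function askVlm(baseUrl, imagePath, model) {
  const mimeType = path.extname(imagePath).toLowerCase() === '.png' ? 'image/png' : 'image/jpeg';
  const body = buildHazardRequest(fs.readFileSync(imagePath), mimeType, model);
  const res = await fetch(chatCompletionsUrl(baseUrl), {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`VLM request failed: HTTP ${res.status}`);
  const json = await res.json();
  return json.choices[0].message.content;
}

module.exports = { chatCompletionsUrl, buildHazardRequest, askVlm };
```

For example, `chatCompletionsUrl('http://localhost:5405/')` yields `http://localhost:5405/v1/chat/completions`, so a trailing slash in the configured base URL does not produce a double slash.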
### User Configuration (config.yaml)

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `mode` | select | `full` | Test mode: `full` (40 tests) or `quick` (10 tests, 2 per category) |
| `noOpen` | boolean | `false` | Skip auto-opening the HTML report in the browser |

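
The table says quick mode runs 2 tests per category but does not say which two. The sketch below assumes the first two of each category in definition order. `selectTests` and the `{ category }` test shape are hypothetical.

```javascript
// Hypothetical sketch: the real selection rule for quick mode is not documented;
// this version keeps the first `perCategory` tests of each category.
function selectTests(tests, mode, perCategory = 2) {
  if (mode === 'full') return tests.slice();
  if (mode !== 'quick') throw new Error(`Unknown mode: ${mode}`);
  const taken = new Map(); // category -> number of tests selected so far
  return tests.filter((t) => {
    const n = taken.get(t.category) || 0;
    if (n >= perCategory) return false;
    taken.set(t.category, n + 1);
    return true;
  });
}

module.exports = { selectTests };
```

With 5 categories of 8 tests each, `selectTests(tests, 'quick')` returns 10 tests and `selectTests(tests, 'full')` returns all 40.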
### CLI Arguments (standalone fallback)

| Argument | Default | Description |
|----------|---------|-------------|
| `--vlm URL` | *(required)* | VLM server base URL |
| `--mode MODE` | `full` | Test mode: `full` or `quick` |
| `--out DIR` | `~/.aegis-ai/homesafe-benchmarks` | Results directory |
| `--no-open` | — | Don't auto-open report in browser |

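
One plausible way to combine the three configuration sources is sketched below. It assumes Aegis-provided values win and that CLI flags only fill in what the environment leaves unset; the docs call the CLI a "standalone fallback" but do not spell out the precedence. It also assumes `AEGIS_SKILL_PARAMS` carries the `config.yaml` values (`mode`, `noOpen`) as JSON. `parseCli` and `resolveConfig` are hypothetical names. The flag parser is hand-rolled because `util.parseArgs` is missing from the earliest Node 18 releases.

```javascript
// Hypothetical sketch of config resolution; precedence is an assumption.
const os = require('node:os');
const path = require('node:path');

// Minimal parser for the four documented flags.
function parseCli(argv) {
  const out = {};
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === '--no-open') out.noOpen = true;
    else if (arg === '--vlm' || arg === '--mode' || arg === '--out') {
      const value = argv[++i];
      if (value === undefined) throw new Error(`${arg} requires a value`);
      out[arg.slice(2)] = value; // stored as vlm / mode / out
    }
  }
  return out;
}

function resolveConfig(env = process.env, argv = process.argv.slice(2)) {
  const cli = parseCli(argv);
  // Assumed: AEGIS_SKILL_PARAMS holds the config.yaml values as JSON (default "{}").
  const params = JSON.parse(env.AEGIS_SKILL_PARAMS || '{}');
  const config = {
    skillMode: Boolean(env.AEGIS_SKILL_ID), // AEGIS_SKILL_ID enables skill mode
    vlmUrl: env.AEGIS_VLM_URL || cli.vlm,
    model: env.AEGIS_VLM_MODEL, // optional
    mode: params.mode || cli.mode || 'full',
    noOpen: Boolean(params.noOpen ?? cli.noOpen ?? false),
    outDir: cli.out || path.join(os.homedir(), '.aegis-ai', 'homesafe-benchmarks'),
  };
  if (!config.vlmUrl) throw new Error('Set AEGIS_VLM_URL or pass --vlm URL');
  if (!['full', 'quick'].includes(config.mode)) throw new Error(`Unknown mode: ${config.mode}`);
  return config;
}

module.exports = { parseCli, resolveConfig };
```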
## Protocol

### Aegis → Skill (env vars)

```
AEGIS_VLM_URL=http://localhost:5405
AEGIS_SKILL_ID=homesafe-bench
AEGIS_SKILL_PARAMS={}
```

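
The harness below is a hypothetical host-side launcher, not Aegis's actual one. It passes these variables to the entry script, lets the human-readable stderr log pass through, and reads the JSON-line events that the skill writes to stdout. `buildSkillEnv`, `parseEventLine`, and `runSkill` are illustrative names, and the sketch assumes `npm install` has already run in `skillDir`.

```javascript
// Hypothetical host-side harness; Aegis's real launcher is not documented here.
const { spawn } = require('node:child_process');
const readline = require('node:readline');

// Inherit the host environment and add the AEGIS_* variables.
function buildSkillEnv(base, { vlmUrl, model, skillId = 'homesafe-bench', params = {} }) {
  const env = {
    ...base,
    AEGIS_VLM_URL: vlmUrl,
    AEGIS_SKILL_ID: skillId,
    AEGIS_SKILL_PARAMS: JSON.stringify(params),
  };
  if (model) env.AEGIS_VLM_MODEL = model;
  return env;
}

// Parse one stdout line; anything that is not a JSON object with an `event` is ignored.
function parseEventLine(line) {
  try {
    const msg = JSON.parse(line);
    return msg && typeof msg.event === 'string' ? msg : null;
  } catch {
    return null;
  }
}

// Spawn the skill with the current Node binary; resolves with the exit code.
function runSkill(skillDir, options, onEvent) {
  const child = spawn(process.execPath, ['scripts/run-benchmark.cjs'], {
    cwd: skillDir,
    env: buildSkillEnv(process.env, options),
    stdio: ['ignore', 'pipe', 'inherit'], // stderr goes straight to the host console
  });
  readline.createInterface({ input: child.stdout }).on('line', (line) => {
    const evt = parseEventLine(line);
    if (evt) onEvent(evt);
  });
  return new Promise((resolve) => child.on('close', resolve));
}

module.exports = { buildSkillEnv, parseEventLine, runSkill };
```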
### Skill → Aegis (stdout, JSON lines)

```jsonl
{"event": "ready", "vlm": "SmolVLM-500M", "system": "Apple M3"}
{"event": "suite_start", "suite": "🔥 Fire / Smoke"}
{"event": "test_result", "suite": "...", "test": "...", "status": "pass", "timeMs": 4500}
{"event": "suite_end", "suite": "...", "passed": 7, "failed": 1}
{"event": "complete", "passed": 36, "total": 40, "timeMs": 180000, "reportPath": "/path/to/report.html"}
```

Human-readable output goes to **stderr** (visible in the Aegis console tab).

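
On the skill side, the stdout/stderr split might look like the sketch below. `makeReporter` and `runSuite` are hypothetical helpers. `runTest` stands in for whatever actually queries the VLM and grades the answer. The `"fail"` status value is an assumption, since the event example only shows `"pass"`.

```javascript
// Hypothetical sketch; the real script's helpers may differ.
// One JSON object per line on stdout for Aegis; human-readable text on stderr.
function makeReporter(out = process.stdout, err = process.stderr) {
  const emit = (event, fields = {}) => out.write(JSON.stringify({ event, ...fields }) + '\n');
  const log = (msg) => err.write(msg + '\n');
  return { emit, log };
}

// Run one suite, emitting suite_start, one test_result per test, then suite_end.
// `runTest` is a caller-supplied async function that resolves to true on pass;
// a thrown error is logged to stderr and counted as a failure.
async function runSuite(reporter, suite, tests, runTest) {
  reporter.emit('suite_start', { suite });
  let passed = 0;
  let failed = 0;
  for (const test of tests) {
    const start = Date.now();
    let ok = false;
    try {
      ok = await runTest(test);
    } catch (e) {
      reporter.log(`  error in ${test.name}: ${e.message}`);
    }
    if (ok) passed++;
    else failed++;
    reporter.emit('test_result', {
      suite,
      test: test.name,
      status: ok ? 'pass' : 'fail',
      timeMs: Date.now() - start,
    });
    reporter.log(`  ${ok ? 'PASS' : 'FAIL'} ${test.name}`);
  }
  reporter.emit('suite_end', { suite, passed, failed });
  return { passed, failed };
}

module.exports = { makeReporter, runSuite };
```

Keeping stdout strictly one JSON object per line means any stray `console.log` in the skill would corrupt the event stream, which is why the human-readable progress goes through `log` to stderr instead.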
## Citation

This benchmark is inspired by:

> **HomeSafeBench: Towards Measuring the Proficiency of Home Safety for Embodied AI Agents**
> arXiv:2509.23690
>
> Unlike the academic benchmark (an embodied agent navigating simulated 3D environments), our version uses **static indoor camera frames**, matching real-world deployments of fixed wall- or ceiling-mounted indoor security cameras. All fixture images are **AI-generated**, consistent with DeepCamera's privacy-first approach.

## Requirements

- Node.js ≥ 18
- `npm install` (for the `openai` SDK dependency)
- A running VLM server (`llama-server` with a vision model, or any OpenAI-compatible VLM endpoint)