Commit 7cb3660

committed
feat: add HomeSafe-Bench skill + disk space checks
- New HomeSafe-Bench skill: 40 indoor safety VLM tests across 5 categories (fire/smoke, electrical, trip/fall, child safety, falling objects)
- 26/40 AI-generated fixture frames (remaining pending image-generation quota)
- Runtime disk space pre-check in SmartHome-Bench (15 GB full / 2 GB subset)
- Register homesafe-bench in skills.json
- All datasets download at runtime, not during deployment
1 parent 9b81316 commit 7cb3660

34 files changed: +1509 −0 lines changed

skills.json

Lines changed: 32 additions & 0 deletions
```diff
@@ -133,6 +133,38 @@
       "ui_unlocks": [
         "benchmark_report"
       ]
+    },
+    {
+      "id": "homesafe-bench",
+      "name": "HomeSafe Indoor Safety Benchmark",
+      "description": "VLM evaluation suite for indoor home safety hazard detection — 40 tests across 5 categories: fire/smoke, electrical, trip/fall, child safety, falling objects.",
+      "version": "1.0.0",
+      "category": "analysis",
+      "path": "skills/analysis/homesafe-bench",
+      "tags": [
+        "benchmark",
+        "vlm",
+        "safety",
+        "hazard",
+        "indoor"
+      ],
+      "platforms": [
+        "linux-x64",
+        "linux-arm64",
+        "darwin-arm64",
+        "darwin-x64",
+        "win-x64"
+      ],
+      "requirements": {
+        "node": ">=18",
+        "ram_gb": 2
+      },
+      "capabilities": [
+        "benchmark"
+      ],
+      "ui_unlocks": [
+        "benchmark_report"
+      ]
     }
   ]
 }
```
Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
---
name: HomeSafe-Bench
description: VLM indoor safety hazard detection benchmark inspired by HomeSafeBench (arXiv 2509.23690)
version: 1.0.0
category: analysis
runtime: node
entry: scripts/run-benchmark.cjs
install: npm

requirements:
  node: ">=18"
  npm_install: true
platforms: ["linux", "macos", "windows"]
---
# HomeSafe-Bench — Indoor Safety Hazard Detection

VLM benchmark evaluating indoor home safety hazard detection from static camera frames. Inspired by [HomeSafeBench (arXiv 2509.23690)](https://arxiv.org/abs/2509.23690), adapted for **fixed indoor security cameras** with AI-generated fixture images.

**40 tests** across **5 hazard categories**:

| Category | Tests | Example Hazards |
|----------|-------|-----------------|
| 🔥 Fire / Smoke | 8 | Stove smoke, candle near curtain, iron left face-down |
| ⚡ Electrical | 8 | Overloaded power strip, exposed wire, frayed cord |
| 🚶 Trip / Fall | 8 | Toys on staircase, wet floor, loose rug edge |
| 👶 Child Safety | 8 | Open cabinet with chemicals, uncovered outlet, blind cord |
| 📦 Falling Objects | 8 | Heavy items on high shelf, leaning bookshelf, loose ceiling fan |
## Setup

**Requires `npm install`.** This skill has a `package.json` with dependencies (e.g. the `openai` SDK). The deployment agent should check for `package.json` and run `npm install` in the skill directory before launching.

Entry script: `scripts/run-benchmark.cjs`

### Verification

```bash
node scripts/run-benchmark.cjs --help
```
42+
## Quick Start
43+
44+
### As an Aegis Skill (automatic)
45+
46+
When spawned by Aegis, all configuration is injected via environment variables. The benchmark discovers your VLM server automatically, generates an HTML report, and opens it when complete.
47+
48+
### Standalone
49+
50+
```bash
51+
# Run all 40 tests
52+
node scripts/run-benchmark.cjs --vlm http://localhost:5405
53+
54+
# Quick mode (2 tests per category = 10 total)
55+
node scripts/run-benchmark.cjs --vlm http://localhost:5405 --mode quick
56+
57+
# Skip report auto-open
58+
node scripts/run-benchmark.cjs --vlm http://localhost:5405 --no-open
59+
```
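Quick mode's 10-test subset could be derived with a selection like the following sketch; the "first two per category" rule is an assumption for illustration, as the actual sampling logic isn't shown in this diff:

```javascript
// Pick tests per the mode: all of them, or the first 2 per category (assumed rule).
function selectTests(categories, mode) {
  const perCategory = mode === "quick" ? 2 : Infinity;
  return categories.flatMap((c) => c.tests.slice(0, perCategory));
}
```

With 5 categories of 8 tests each, `"full"` yields 40 tests and `"quick"` yields 10, matching the modes documented below.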
## Configuration

### Environment Variables (set by Aegis)

| Variable | Default | Description |
|----------|---------|-------------|
| `AEGIS_VLM_URL` | *(required)* | VLM server base URL |
| `AEGIS_VLM_MODEL` | — | Loaded VLM model ID |
| `AEGIS_SKILL_ID` | — | Skill identifier (enables skill mode) |
| `AEGIS_SKILL_PARAMS` | `{}` | JSON params from skill config |

> **Note**: URLs should be base URLs (e.g. `http://localhost:5405`). The benchmark appends `/v1/chat/completions` automatically.
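That URL handling can be sketched in one helper (the function name is illustrative, not the benchmark's actual API):

```javascript
// Normalize a base URL and append the chat-completions path the note describes.
function completionsUrl(base) {
  return base.replace(/\/+$/, "") + "/v1/chat/completions";
}
```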
### User Configuration (config.yaml)

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `mode` | select | `full` | Test mode: `full` (40 tests) or `quick` (10 tests — 2 per category) |
| `noOpen` | boolean | `false` | Skip auto-opening the HTML report in the browser |

### CLI Arguments (standalone fallback)

| Argument | Default | Description |
|----------|---------|-------------|
| `--vlm URL` | *(required)* | VLM server base URL |
| `--mode MODE` | `full` | Test mode: `full` or `quick` |
| `--out DIR` | `~/.aegis-ai/homesafe-benchmarks` | Results directory |
| `--no-open` | — | Don't auto-open the report in the browser |
## Protocol

### Aegis → Skill (env vars)

```
AEGIS_VLM_URL=http://localhost:5405
AEGIS_SKILL_ID=homesafe-bench
AEGIS_SKILL_PARAMS={}
```

### Skill → Aegis (stdout, JSON lines)

```jsonl
{"event": "ready", "vlm": "SmolVLM-500M", "system": "Apple M3"}
{"event": "suite_start", "suite": "🔥 Fire / Smoke"}
{"event": "test_result", "suite": "...", "test": "...", "status": "pass", "timeMs": 4500}
{"event": "suite_end", "suite": "...", "passed": 7, "failed": 1}
{"event": "complete", "passed": 36, "total": 40, "timeMs": 180000, "reportPath": "/path/to/report.html"}
```
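On the consuming side, these stdout lines can be folded into a pass/fail tally. A sketch under the protocol above (illustrative only, not the actual Aegis implementation):

```javascript
// Parse one stdout line; returns the event object, or null for non-protocol lines.
function parseEvent(line) {
  try {
    const evt = JSON.parse(line);
    return evt && typeof evt.event === "string" ? evt : null;
  } catch {
    return null; // stray output is skipped; human-readable text goes to stderr anyway
  }
}

// Tally test_result events into pass/fail counts.
function summarize(lines) {
  const tally = { passed: 0, failed: 0 };
  for (const line of lines) {
    const evt = parseEvent(line);
    if (evt && evt.event === "test_result") {
      evt.status === "pass" ? tally.passed++ : tally.failed++;
    }
  }
  return tally;
}
```

Tolerating non-JSON lines keeps the consumer robust if any human-readable output leaks onto stdout.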
Human-readable output goes to **stderr** (visible in the Aegis console tab).

## Citation

This benchmark is inspired by:

> **HomeSafeBench: Towards Measuring the Proficiency of Home Safety for Embodied AI Agents**
> arXiv:2509.23690
>
> Unlike the academic benchmark (embodied agent + navigation in simulated 3D environments), this version uses **static indoor camera frames**, matching real-world indoor security-camera deployment (fixed wall/ceiling mount). All fixture images are **AI-generated**, consistent with DeepCamera's privacy-first approach.

## Requirements

- Node.js ≥ 18
- `npm install` (for the `openai` SDK dependency)
- A running VLM server (llama-server with a vision model, or any OpenAI-compatible VLM endpoint)
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@

```yaml
params:
  - key: mode
    label: Test Mode
    type: select
    options: [full, quick]
    default: full
    description: "Which test mode: full (40 tests) or quick (10 tests — 2 per category)"

  - key: noOpen
    label: Don't auto-open report
    type: boolean
    default: false
    description: Skip opening the HTML report in browser after completion
```
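In skill mode, the JSON in `AEGIS_SKILL_PARAMS` would be merged over these defaults. A hedged sketch of that resolution (defaults mirror the config above; the function name is assumed):

```javascript
// Defaults matching config.yaml; AEGIS_SKILL_PARAMS overrides them.
const DEFAULTS = { mode: "full", noOpen: false };

function resolveParams(env = process.env) {
  let overrides = {};
  try {
    overrides = JSON.parse(env.AEGIS_SKILL_PARAMS || "{}");
  } catch {
    // Malformed JSON: fall back to defaults rather than crash.
  }
  return { ...DEFAULTS, ...overrides };
}
```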
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@

```bash
#!/bin/bash
# HomeSafe-Bench deployment script
# Runs npm install to fetch the openai SDK dependency

set -e
cd "$(dirname "$0")"
npm install
echo "✅ HomeSafe-Bench dependencies installed"
```
*(6 binary fixture images added: 710 KB, 642 KB, 880 KB, 605 KB, 568 KB, 801 KB — previews not rendered)*