---
name: SmartHome Video Anomaly Benchmark
description: VLM evaluation suite for video anomaly detection in smart home camera footage
version: 1.0.0
category: analysis
runtime: node
entry: scripts/run-benchmark.cjs
install: npm

requirements:
  node: ">=18"
  npm_install: true
  system_deps: [yt-dlp, ffmpeg]
  platforms: ["linux", "macos", "windows"]
---

# SmartHome Video Anomaly Benchmark

Benchmark suite that evaluates vision-language models (VLMs) on **video anomaly detection** across **7 smart home categories**. It is based on the [SmartHome-Bench](https://github.com/Xinyi-0724/SmartHome-Bench-LLM) dataset (1,203 YouTube smart home camera clips).

> **Key differentiator from HomeSec-Bench**: This benchmark requires **video understanding** (multi-frame sampling), not single-frame analysis.

## Setup

**Requires `npm install` plus system dependencies.** The deployment agent should:
1. Check that `yt-dlp` and `ffmpeg` are available on `PATH`
2. Run `npm install` in the skill directory
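
A minimal sketch of the first step, assuming the agent runs Node.js (the skill's own runtime). `hasCommand` and `missingSystemDeps` are hypothetical helper names, not part of `run-benchmark.cjs`; a dependency counts as present if its version command (`yt-dlp --version`, `ffmpeg -version`) exits with status 0.

```javascript
// Hypothetical helper (not part of run-benchmark.cjs): checks that the
// system dependencies from the frontmatter can be executed from PATH.
const { spawnSync } = require('node:child_process');

// Each tool is probed with its own version flag (ffmpeg uses a single dash).
const SYSTEM_DEPS = [
  { name: 'yt-dlp', versionArg: '--version' },
  { name: 'ffmpeg', versionArg: '-version' },
];

// Returns true if `cmd` spawns successfully and exits with status 0.
function hasCommand(cmd, versionArg = '--version') {
  const result = spawnSync(cmd, [versionArg], { stdio: 'ignore' });
  // result.error is set (for example ENOENT) when the binary is not found.
  return !result.error && result.status === 0;
}

// Returns the names of dependencies that could not be run.
function missingSystemDeps(deps = SYSTEM_DEPS) {
  return deps
    .filter((dep) => !hasCommand(dep.name, dep.versionArg))
    .map((dep) => dep.name);
}

module.exports = { hasCommand, missingSystemDeps };
```

An empty array from `missingSystemDeps()` means both tools are reachable; otherwise the returned names can be reported before `npm install` runs.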

Entry script: `scripts/run-benchmark.cjs`

### Verification

```bash
node scripts/run-benchmark.cjs --help
```

## Quick Start

### As an Aegis Skill (automatic)

When spawned by Aegis, the skill receives its configuration through environment variables. The benchmark downloads the video clips, samples frames from each clip, evaluates them with the VLM, and generates an HTML report.
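
The frame-sampling step can be sketched as follows. This is an illustration under stated assumptions, not the script's actual logic: the number of frames per clip, the segment-center spacing, and the ffmpeg flags are all assumptions, and `sampleTimestamps` / `ffmpegFrameArgs` are hypothetical names.

```javascript
// Hypothetical sketch of multi-frame sampling. The frame count, the
// segment-center spacing, and the ffmpeg flags are assumptions.
const path = require('node:path');

// Splits the clip into `count` equal segments and returns the center of each,
// so no sample lands exactly on the first or last instant of the clip.
function sampleTimestamps(durationSec, count) {
  if (!(durationSec > 0) || !Number.isInteger(count) || count < 1) {
    throw new RangeError('durationSec must be > 0 and count a positive integer');
  }
  const step = durationSec / count;
  return Array.from({ length: count }, (_, i) => Number(((i + 0.5) * step).toFixed(3)));
}

// One ffmpeg argument list per timestamp: -ss before -i seeks the input,
// -frames:v 1 writes a single frame, -q:v 2 requests high JPEG quality.
function ffmpegFrameArgs(videoPath, timestamps, outDir) {
  return timestamps.map((t, i) => [
    '-y', // overwrite frames left over from a previous run
    '-ss', String(t),
    '-i', videoPath,
    '-frames:v', '1',
    '-q:v', '2',
    path.join(outDir, `frame_${String(i).padStart(3, '0')}.jpg`),
  ]);
}

module.exports = { sampleTimestamps, ffmpegFrameArgs };
```

Each argument list could be passed to `child_process.spawnSync('ffmpeg', args)`. The clip duration itself could come from `ffprobe -v error -show_entries format=duration -of csv=p=0 clip.mp4`.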

### Standalone

```bash
# Run with local VLM (subset mode, 50 videos)
node scripts/run-benchmark.cjs --vlm http://localhost:5405

# Quick test with 10 videos
node scripts/run-benchmark.cjs --vlm http://localhost:5405 --max-videos 10

# Full benchmark (all curated clips)
node scripts/run-benchmark.cjs --vlm http://localhost:5405 --mode full

# Filter by category
node scripts/run-benchmark.cjs --vlm http://localhost:5405 --categories "Wildlife,Security"

# Skip download (re-evaluate cached videos)
node scripts/run-benchmark.cjs --vlm http://localhost:5405 --skip-download

# Skip report auto-open
node scripts/run-benchmark.cjs --vlm http://localhost:5405 --no-open
```

## Configuration

### Environment Variables (set by Aegis)

| Variable | Default | Description |
|----------|---------|-------------|
| `AEGIS_VLM_URL` | *(required)* | VLM server base URL |
| `AEGIS_VLM_MODEL` | — | Loaded VLM model ID |
| `AEGIS_SKILL_ID` | — | Skill identifier (enables skill mode) |
| `AEGIS_SKILL_PARAMS` | `{}` | JSON params from skill config |

> **Note**: This is a VLM-only benchmark. An LLM gateway is not required.
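
A minimal sketch of how a skill could read these variables, assuming Node.js and treating a set `AEGIS_SKILL_ID` as the skill-mode switch described in the table. `loadAegisEnv` is a hypothetical name, and throwing on a missing URL or malformed params is an assumption, not documented behavior.

```javascript
// Hypothetical sketch: reads the Aegis environment variables listed above.
// Throwing on a missing URL or malformed params is an assumption.
function loadAegisEnv(env = process.env) {
  const skillId = env.AEGIS_SKILL_ID || null;
  const vlmUrl = env.AEGIS_VLM_URL || null;

  // AEGIS_SKILL_PARAMS defaults to "{}" and must decode to a plain object.
  const rawParams = (env.AEGIS_SKILL_PARAMS || '{}').trim() || '{}';
  let params;
  try {
    params = JSON.parse(rawParams);
  } catch (err) {
    throw new Error(`AEGIS_SKILL_PARAMS is not valid JSON: ${err.message}`);
  }
  if (params === null || typeof params !== 'object' || Array.isArray(params)) {
    throw new Error('AEGIS_SKILL_PARAMS must be a JSON object');
  }

  // AEGIS_VLM_URL is marked as required, so fail early in skill mode.
  if (skillId !== null && vlmUrl === null) {
    throw new Error('AEGIS_VLM_URL is required when AEGIS_SKILL_ID is set');
  }

  return {
    skillMode: skillId !== null, // AEGIS_SKILL_ID enables skill mode
    skillId,
    vlmUrl,
    vlmModel: env.AEGIS_VLM_MODEL || null, // optional loaded model ID
    params,
  };
}

module.exports = { loadAegisEnv };
```

Passing the environment as an argument (defaulting to `process.env`) keeps the function easy to exercise without touching the real process environment.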

### User Configuration (config.yaml)

This skill includes a [`config.yaml`](config.yaml) that defines user-configurable parameters. Aegis parses this file at install time and renders a config panel in the UI. Values are delivered via `AEGIS_SKILL_PARAMS`.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `mode` | select | `subset` | Which clips to evaluate: `subset` (~50 clips) or `full` (all ~105 curated clips) |
| `maxVideos` | number | `50` | Maximum number of videos to evaluate |
| `categories` | text | `all` | Comma-separated category filter (e.g. `Wildlife,Security`) |
| `noOpen` | boolean | `false` | Skip auto-opening the HTML report in the browser |
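
The parameter defaults above could be applied as shown below. This sketch assumes the UI may deliver numbers and booleans as strings and that `all` disables category filtering; `normalizeSkillParams` is a hypothetical helper, not the skill's API.

```javascript
// Defaults taken from the config.yaml parameter table above.
const PARAM_DEFAULTS = { mode: 'subset', maxVideos: 50, categories: 'all', noOpen: false };

// Hypothetical helper: applies defaults and validates AEGIS_SKILL_PARAMS values.
// Assumes values may arrive as strings (e.g. "10", "true") from the UI.
function normalizeSkillParams(params = {}) {
  const merged = { ...PARAM_DEFAULTS };
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined && value !== null) merged[key] = value;
  }

  if (merged.mode !== 'subset' && merged.mode !== 'full') {
    throw new Error(`mode must be "subset" or "full", got ${JSON.stringify(merged.mode)}`);
  }

  const maxVideos = Number(merged.maxVideos);
  if (!Number.isInteger(maxVideos) || maxVideos < 1) {
    throw new Error(`maxVideos must be a positive integer, got ${JSON.stringify(merged.maxVideos)}`);
  }

  // "all" (or an empty string) means no filter; null signals "every category".
  const rawCategories = String(merged.categories).trim();
  const categories =
    rawCategories === '' || rawCategories.toLowerCase() === 'all'
      ? null
      : rawCategories.split(',').map((c) => c.trim()).filter((c) => c.length > 0);

  return {
    mode: merged.mode,
    maxVideos,
    categories,
    noOpen: merged.noOpen === true || merged.noOpen === 'true',
  };
}

module.exports = { normalizeSkillParams, PARAM_DEFAULTS };
```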

### CLI Arguments (standalone fallback)

| Argument | Default | Description |
|----------|---------|-------------|
| `--vlm URL` | *(required)* | VLM server base URL |
| `--out DIR` | `~/.aegis-ai/smarthome-bench` | Results directory |
| `--max-videos N` | `50` | Max videos to evaluate |
| `--mode MODE` | `subset` | `subset` or `full` |
| `--categories LIST` | `all` | Comma-separated category filter |
| `--skip-download` | — | Skip video download and use cached clips |
| `--no-open` | — | Don't auto-open the report in the browser |
| `--report` | *(auto in skill mode)* | Force report generation |
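
A hedged sketch of parsing these flags with Node's built-in `util.parseArgs`, which is available from Node 18.3 onward. The real script may use a different parser; `parseCli` is a hypothetical name, and skill-mode behavior such as automatic report generation is left to the caller.

```javascript
// Hypothetical sketch of the standalone CLI using Node's built-in
// util.parseArgs (Node 18.3+); the real script's parser may differ.
const { parseArgs } = require('node:util');
const os = require('node:os');
const path = require('node:path');

function parseCli(argv = process.argv.slice(2)) {
  const { values } = parseArgs({
    args: argv,
    strict: true, // unknown flags throw instead of being silently ignored
    options: {
      vlm: { type: 'string' },
      out: { type: 'string' },
      'max-videos': { type: 'string' },
      mode: { type: 'string' },
      categories: { type: 'string' },
      'skip-download': { type: 'boolean' },
      'no-open': { type: 'boolean' },
      report: { type: 'boolean' },
      help: { type: 'boolean' },
    },
  });

  const maxVideos = Number(values['max-videos'] ?? 50);
  if (!Number.isInteger(maxVideos) || maxVideos < 1) {
    throw new Error('--max-videos must be a positive integer');
  }

  return {
    vlmUrl: values.vlm ?? null, // required for an actual run
    // "~" is expanded by shells, not by Node, so build the default explicitly.
    outDir: values.out ?? path.join(os.homedir(), '.aegis-ai', 'smarthome-bench'),
    maxVideos,
    mode: values.mode ?? 'subset',
    categories: values.categories ?? 'all',
    skipDownload: values['skip-download'] === true,
    noOpen: values['no-open'] === true,
    report: values.report === true,
    help: values.help === true,
  };
}

module.exports = { parseCli };
```

Declaring `no-open` as its own boolean option means `--no-open` is matched literally rather than treated as the negation of an `open` flag.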

## Protocol

### Aegis → Skill (env vars)
```
AEGIS_VLM_URL=http://localhost:5405
AEGIS_SKILL_ID=smarthome-bench
AEGIS_SKILL_PARAMS={}
```

### Skill → Aegis (stdout, JSON lines)
```jsonl
{"event": "ready", "model": "SmolVLM2-2.2B", "system": "Apple M3"}
{"event": "suite_start", "suite": "Wildlife"}
{"event": "test_result", "suite": "Wildlife", "test": "smartbench_0003", "status": "pass", "timeMs": 4500}
{"event": "suite_end", "suite": "Wildlife", "passed": 12, "failed": 3}
{"event": "complete", "passed": 78, "total": 105, "timeMs": 480000, "reportPath": "/path/to/report.html"}
```

Human-readable output goes to **stderr** (visible in the Aegis console tab).
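
The stdout/stderr split above can be sketched as a small reporter. It assumes one JSON object per line, and it assumes a clip passes when the predicted label equals the expert annotation, which is an inference from the `pass`/`failed` fields rather than documented behavior (the `fail` status string is likewise assumed). `createReporter`, `testResultEvent`, and `parseEventLines` are hypothetical names.

```javascript
// Hypothetical sketch of the stdout/stderr protocol above. Field names mirror
// the example events; the real script may emit additional fields.
function createReporter(stdout = process.stdout, stderr = process.stderr) {
  return {
    // Machine-readable: exactly one JSON object per line on stdout.
    emit(event, fields = {}) {
      stdout.write(`${JSON.stringify({ event, ...fields })}\n`);
    },
    // Human-readable progress text goes to stderr (the Aegis console tab).
    log(message) {
      stderr.write(`${message}\n`);
    },
  };
}

// Assumption: a clip "passes" when the predicted label matches the annotation.
function testResultEvent(suite, clipId, predicted, expected, timeMs) {
  return { suite, test: clipId, status: predicted === expected ? 'pass' : 'fail', timeMs };
}

// Consumer side: parses captured stdout back into event objects, skipping blank lines.
function parseEventLines(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

module.exports = { createReporter, testResultEvent, parseEventLines };
```

Keeping progress text off stdout is what lets the consumer parse every stdout line as JSON without filtering.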

## Test Suites (7 Categories)

| Suite | Description | Anomaly Examples |
|-------|-------------|------------------|
| 🦊 Wildlife | Wild animals near home cameras | Bear on porch, deer in garden, coyote at night |
| 👴 Senior Care | Elderly activity monitoring | Falls, wandering, unusual inactivity |
| 👶 Baby Monitoring | Infant/child safety | Stroller rolling, child climbing, unsupervised children |
| 🐾 Pet Monitoring | Pet behavior detection | Pet illness, escaped pets, unusual behavior |
| 🔒 Home Security | Intrusion & suspicious activity | Break-ins, trespassing, porch pirates |
| 📦 Package Delivery | Package arrival & theft | Stolen packages, misdelivered packages, weather damage |
| 🏠 General Activity | General smart home events | Activity at unusual hours, appliance issues |

Each clip is evaluated for **binary anomaly detection**: the VLM predicts normal (0) or abnormal (1), and the prediction is compared against the expert annotation.
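
The multi-frame evaluation could be assembled as an OpenAI-style chat request whose single user message carries several `image_url` parts, which is why the VLM server must accept multi-image input. The prompt text, JPEG frames, `temperature: 0`, `max_tokens`, and the regex-based answer parsing are assumptions; `buildAnomalyRequest` and `parseBinaryLabel` are hypothetical helpers.

```javascript
// Hypothetical sketch: builds an OpenAI-style chat completion body that sends
// several sampled frames in one user message, then parses a 0/1 answer.
// The prompt, JPEG encoding, and generation settings are assumptions.
const DEFAULT_PROMPT =
  'These frames are sampled in order from one smart home camera clip. ' +
  'Answer with a single digit: 0 if the activity is normal, 1 if it is abnormal.';

function buildAnomalyRequest(model, framesBase64, prompt = DEFAULT_PROMPT) {
  if (framesBase64.length === 0) throw new Error('at least one frame is required');
  // Each frame becomes an image_url part carrying a base64 data URL.
  const imageParts = framesBase64.map((b64) => ({
    type: 'image_url',
    image_url: { url: `data:image/jpeg;base64,${b64}` },
  }));
  return {
    model,
    temperature: 0, // deterministic output for repeatable scoring
    max_tokens: 8, // only a short label is expected
    messages: [{ role: 'user', content: [{ type: 'text', text: prompt }, ...imageParts] }],
  };
}

// Maps a free-text reply to 0 (normal) or 1 (abnormal); returns null if no
// standalone 0 or 1 appears in the reply.
function parseBinaryLabel(reply) {
  const match = /\b([01])\b/.exec(String(reply ?? ''));
  return match ? Number(match[1]) : null;
}

module.exports = { buildAnomalyRequest, parseBinaryLabel, DEFAULT_PROMPT };
```

With the `openai` SDK, such a body could be passed to `client.chat.completions.create()`, assuming the VLM server exposes an OpenAI-compatible endpoint, and `parseBinaryLabel` applied to the returned message content.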

## Metrics

Computed per category and overall:
- **Accuracy** — correct predictions / total
- **Precision** — true positives / predicted positives
- **Recall** — true positives / actual positives
- **F1-Score** — harmonic mean of precision & recall
- **Confusion Matrix** — TP, FP, TN, FN breakdown
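
The metrics above all follow from the confusion-matrix counts, treating abnormal (1) as the positive class. This sketch assumes predictions are already mapped to 0 or 1 and reports a ratio as 0 when its denominator is 0; `computeMetrics` and `metricsByCategory` are hypothetical names.

```javascript
// Sketch of the listed metrics for binary labels, treating abnormal (1) as
// the positive class. Assumes predictions are already mapped to 0 or 1.
function computeMetrics(records) {
  const confusion = { tp: 0, fp: 0, tn: 0, fn: 0 };
  for (const { predicted, expected } of records) {
    if (![0, 1].includes(predicted) || ![0, 1].includes(expected)) {
      throw new RangeError(`labels must be 0 or 1, got ${predicted}/${expected}`);
    }
    if (predicted === 1) confusion[expected === 1 ? 'tp' : 'fp']++;
    else confusion[expected === 0 ? 'tn' : 'fn']++;
  }
  const { tp, fp, tn, fn } = confusion;
  // Assumption: a ratio with a zero denominator is reported as 0.
  const ratio = (num, den) => (den === 0 ? 0 : num / den);
  const precision = ratio(tp, tp + fp); // TP / predicted positives
  const recall = ratio(tp, tp + fn); // TP / actual positives
  return {
    total: records.length,
    accuracy: ratio(tp + tn, records.length), // correct / total
    precision,
    recall,
    f1: ratio(2 * precision * recall, precision + recall), // harmonic mean
    confusion,
  };
}

// Groups records by their `category` field; returns overall and per-category metrics.
function metricsByCategory(records) {
  const groups = new Map();
  for (const record of records) {
    if (!groups.has(record.category)) groups.set(record.category, []);
    groups.get(record.category).push(record);
  }
  const byCategory = {};
  for (const [category, group] of groups) byCategory[category] = computeMetrics(group);
  return { overall: computeMetrics(records), byCategory };
}

module.exports = { computeMetrics, metricsByCategory };
```

For example, with TP=2, FP=1, TN=1, FN=1, accuracy is 3/5 = 0.6, and precision, recall, and F1 are all 2/3.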

## Results

Results are saved as JSON to `~/.aegis-ai/smarthome-bench/` by default (override with `--out`). An HTML report with a per-category breakdown, confusion matrix, and model comparison is generated automatically.

## Requirements

- Node.js ≥ 18
- `npm install` (for the `openai` SDK dependency)
- `yt-dlp` (video download from YouTube)
- `ffmpeg` (frame extraction from video clips)
- A running VLM server that supports multi-image input

## Citation

Based on [SmartHome-Bench: A Comprehensive Benchmark for Video Anomaly Detection in Smart Homes Using Multi-Modal Foundation Models](https://arxiv.org/abs/2506.12992).