feat(benchmark): add Indoor Safety Hazards VLM suite (12 tests)
Expand HomeSec-Bench VLM Scene Analysis from 35 to 47 tests with a new
Indoor Safety Hazards category covering fire, electrical, trip/fall,
child safety, and blocked exit scenarios using AI-generated indoor
security camera frames.
New test scenarios:
- Stove smoke, candle near curtain, space heater near drapes, iron left on
- Overloaded power strip, frayed electrical cord
- Toys on stairs, wet floor
- Person fallen, items on high shelf
- Open cabinet with chemicals
- Cluttered/blocked exit
Total benchmark: 131 → 143 tests
VLM suite: 35 → 47 tests
Version: 2.0.0 → 2.1.0
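As a hypothetical sketch of what one of the 12 new Indoor Safety Hazards test cases might look like: the field names (`id`, `image`, `prompt`, `expect`) and the grading helper below are illustrative assumptions, not the benchmark's actual schema — only the runtime (Node, per `scripts/run-benchmark.cjs`) and the scenario list come from the commit.

```javascript
// Illustrative test-case shape for the new VLM suite (schema is assumed).
const indoorSafetyHazards = [
  {
    id: "vlm-hazard-stove-smoke",
    image: "frames/indoor/stove_smoke.jpg", // AI-generated indoor camera frame
    prompt: "Describe any safety hazards visible in this kitchen frame.",
    expect: { hazard: "fire", keywords: ["smoke", "stove"] },
  },
  // ...11 more scenarios: candle near curtain, frayed cord, toys on stairs, etc.
];

// A simple grader might check that the model's answer mentions the expected keywords.
function gradeAnswer(testCase, answer) {
  const text = answer.toLowerCase();
  return testCase.expect.keywords.every((k) => text.includes(k));
}
```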
```diff
@@ -140,7 +140,7 @@ Camera → Frame Governor → detect.py (JSONL) → Aegis IPC → Live Overlay
 
 ## 📊 HomeSec-Bench — How Secure Is Your Local AI?
 
-**HomeSec-Bench** is a 131-test security benchmark that measures how well your local AI performs as a security guard. It tests what matters: Can it detect a person in fog? Classify a break-in vs. a delivery? Resist prompt injection? Route alerts correctly at 3 AM?
+**HomeSec-Bench** is a 143-test security benchmark that measures how well your local AI performs as a security guard. It tests what matters: Can it detect a person in fog? Classify a break-in vs. a delivery? Resist prompt injection? Route alerts correctly at 3 AM?
 
 Run it on your own hardware to know exactly where your setup stands.
```
`skills/analysis/home-security-benchmark/SKILL.md` — 6 additions & 6 deletions
```diff
@@ -1,7 +1,7 @@
 ---
 name: Home Security AI Benchmark
 description: LLM & VLM evaluation suite for home security AI applications
-version: 2.0.0
+version: 2.1.0
 category: analysis
 runtime: node
 entry: scripts/run-benchmark.cjs
```
```diff
@@ -15,7 +15,7 @@ requirements:
 
 # Home Security AI Benchmark
 
-Comprehensive benchmark suite evaluating LLM and VLM models on **131 tests** across **16 suites** — context preprocessing, tool use, security classification, prompt injection resistance, alert routing, knowledge injection, VLM-to-alert triage, and scene analysis.
+Comprehensive benchmark suite evaluating LLM and VLM models on **143 tests** across **16 suites** — context preprocessing, tool use, security classification, prompt injection resistance, alert routing, knowledge injection, VLM-to-alert triage, and scene analysis.
 
 ## Setup
 
```
```diff
@@ -76,7 +76,7 @@ This skill includes a [`config.yaml`](config.yaml) that defines user-configurabl
 
 | Parameter | Type | Default | Description |
 |-----------|------|---------|-------------|
-| `mode` | select | `llm` | Which suites to run: `llm` (96 tests), `vlm` (35 tests), or `full` (131 tests) |
+| `mode` | select | `llm` | Which suites to run: `llm` (96 tests), `vlm` (47 tests), or `full` (143 tests) |
 | `noOpen` | boolean | `false` | Skip auto-opening the HTML report in browser |
 
 Platform parameters like `AEGIS_GATEWAY_URL` and `AEGIS_VLM_URL` are auto-injected by Aegis — they are **not** in `config.yaml`. See [Aegis Skill Platform Parameters](../../../docs/skill-params.md) for the full platform contract.
```
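The `mode` parameter partitions the suites, and the three documented counts are consistent (96 + 47 = 143). A minimal sketch of the selection logic under that assumption — the `SUITES` grouping and function name are illustrative, not the runner's real internals:

```javascript
// Test counts per mode, taken from the config table (grouping is assumed).
const SUITES = {
  llm: 96, // context preprocessing, tool use, classification, injection, routing, ...
  vlm: 47, // scene analysis, incl. the 12 new Indoor Safety Hazards tests
};

function testCount(mode) {
  if (mode === "full") return SUITES.llm + SUITES.vlm; // run everything
  if (mode in SUITES) return SUITES[mode];
  throw new Error(`unknown mode: ${mode}`);
}
```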
```diff
@@ -112,7 +112,7 @@ AEGIS_SKILL_PARAMS={}
 
 Human-readable output goes to **stderr** (visible in Aegis console tab).
 
-## Test Suites (131 Tests)
+## Test Suites (143 Tests)
 
 | Suite | Tests | Domain |
 |-------|-------|--------|
@@ -131,7 +131,7 @@ Human-readable output goes to **stderr** (visible in Aegis console tab).
```
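The output convention above — human-readable text on **stderr**, machine-readable results on **stdout** — could be followed like this in a Node skill script. `formatResult` and its fields are hypothetical helpers for illustration, not part of the benchmark:

```javascript
// Hypothetical helper: serialize one suite result as a JSON line for stdout.
function formatResult(suite, passed, total) {
  return JSON.stringify({ suite, passed, total });
}

// Progress for humans goes to stderr (shown in the Aegis console tab)...
process.stderr.write("Running VLM Scene Analysis (47 tests)...\n");
// ...while structured results go to stdout for the report pipeline.
process.stdout.write(formatResult("vlm-scene-analysis", 45, 47) + "\n");
```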