Commit 535ba50
feat(benchmark): expand HomeSec-Bench from 78 to 131 tests across 16 suites
Phase 2 (78→116):
- VLM Scene Analysis: 7→35 tests with 28 AI-generated fixture images
- New Error Recovery & Edge Cases suite (4 tests)
- New Privacy & Compliance suite (3 tests)
- Knowledge Distillation +2, Narrative Synthesis +1
Phase 3 (116→131):
- New Alert Routing & Subscription suite (5 tests): channel targeting,
  quiet hours, subscription modification, schedule cancellation, broadcast
- New Knowledge Injection to Dialog suite (5 tests): KI-personalized
  responses, schedule-aware narration, relevance filtering, conflict handling
- New VLM-to-Alert Triage suite (5 tests): end-to-end VLM description
  to urgency classification with alert message generation
- Added schedule_task and knowledge_read tools to AEGIS_TOOLS
- Expanded event_subscribe with channel and targetType params
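The tool changes above could look roughly like the following sketch. It assumes AEGIS_TOOLS is a list of function-calling style tool definitions with JSON Schema parameters, which this commit message does not confirm. The descriptions, the string types, and the helper names `tool_names` and `unknown_args` are hypothetical; only the tool names and the `channel` and `targetType` parameter names come from the commit message.

```python
# Hypothetical sketch: the real layout of AEGIS_TOOLS is not shown in this commit.
# Only the tool names and the two parameter names are taken from the commit message.
AEGIS_TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "event_subscribe",
            "description": "Subscribe to events (illustrative description).",
            "parameters": {
                "type": "object",
                "properties": {
                    # Parameters named in the commit; their types are assumed.
                    "channel": {"type": "string"},
                    "targetType": {"type": "string"},
                },
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "schedule_task",
            "description": "Illustrative placeholder.",
            "parameters": {"type": "object", "properties": {}},
        },
    },
    {
        "type": "function",
        "function": {
            "name": "knowledge_read",
            "description": "Illustrative placeholder.",
            "parameters": {"type": "object", "properties": {}},
        },
    },
]


def tool_names(tools):
    """Return the declared tool names in order."""
    return [t["function"]["name"] for t in tools]


def unknown_args(tools, name, args):
    """Return the argument names that the named tool's schema does not declare."""
    for t in tools:
        if t["function"]["name"] == name:
            declared = t["function"]["parameters"]["properties"]
            return sorted(set(args) - set(declared))
    raise KeyError(name)
```

A benchmark test might use a helper like `unknown_args` to flag a model's tool call that passes parameters outside the schema, for example `unknown_args(AEGIS_TOOLS, "event_subscribe", {"channel": "example", "zone": "x"})`, where the argument values are placeholders.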
Paper: Updated LaTeX to reflect 131 tests, 16 suites, 96 LLM, 35 VLM.
Compiled PDF included.
1 parent f3bfa5b commit 535ba50
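As a quick consistency check, the per-suite deltas listed in the commit message can be tallied against the stated totals. This is only a sketch: the variable names are illustrative, and the LLM count is derived here as the total minus the 35 VLM tests, which is an assumption about how the paper splits the two.

```python
# Test-count deltas copied from the commit message; names are illustrative.
BASELINE = 78

phase2 = {
    "VLM Scene Analysis": 35 - 7,  # 7 -> 35 tests
    "Error Recovery & Edge Cases": 4,
    "Privacy & Compliance": 3,
    "Knowledge Distillation": 2,
    "Narrative Synthesis": 1,
}
phase3 = {
    "Alert Routing & Subscription": 5,
    "Knowledge Injection to Dialog": 5,
    "VLM-to-Alert Triage": 5,
}

after_phase2 = BASELINE + sum(phase2.values())
after_phase3 = after_phase2 + sum(phase3.values())

# Assumption: LLM tests are everything that is not one of the 35 VLM tests.
VLM_TESTS = 35
llm_tests = after_phase3 - VLM_TESTS

print(after_phase2, after_phase3, llm_tests)  # prints: 116 131 96
```

The tally matches the figures in the message: 116 tests after Phase 2, 131 after Phase 3, and 96 LLM plus 35 VLM tests.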
File tree
33 files changed, +2179 -81 lines changed
- docs/paper
- skills/analysis/home-security-benchmark
- fixtures
- frames
- scripts