Commit 535ba50

feat(benchmark): expand HomeSec-Bench from 78 to 131 tests across 16 suites

Phase 2 (78→116):
- VLM Scene Analysis: 7→35 tests with 28 AI-generated fixture images
- New Error Recovery & Edge Cases suite (4 tests)
- New Privacy & Compliance suite (3 tests)
- Knowledge Distillation +2, Narrative Synthesis +1

Phase 3 (116→131):
- New Alert Routing & Subscription suite (5 tests): channel targeting, quiet hours, subscription modification, schedule cancellation, broadcast
- New Knowledge Injection to Dialog suite (5 tests): KI-personalized responses, schedule-aware narration, relevance filtering, conflict handling
- New VLM-to-Alert Triage suite (5 tests): end-to-end VLM description to urgency classification with alert message generation
- Added schedule_task and knowledge_read tools to AEGIS_TOOLS
- Expanded event_subscribe with channel and targetType params

Paper: Updated LaTeX to reflect 131 tests, 16 suites, 96 LLM, 35 VLM. Compiled PDF included.
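
The per-suite deltas in the message above can be cross-checked against the stated totals. The following is a minimal sketch, not part of the commit: the variable names are hypothetical, and the numbers are copied from the commit message only.

```python
# Cross-check of the test counts stated in the commit message.
# All names here are illustrative; none come from the repository.
BASELINE = 78

# Tests added in Phase 2, per suite (VLM Scene Analysis grew from 7 to 35).
phase2_added = {
    "VLM Scene Analysis": 35 - 7,
    "Error Recovery & Edge Cases": 4,
    "Privacy & Compliance": 3,
    "Knowledge Distillation": 2,
    "Narrative Synthesis": 1,
}

# Tests added in Phase 3: three new suites of five tests each.
phase3_added = {
    "Alert Routing & Subscription": 5,
    "Knowledge Injection to Dialog": 5,
    "VLM-to-Alert Triage": 5,
}

after_phase2 = BASELINE + sum(phase2_added.values())
after_phase3 = after_phase2 + sum(phase3_added.values())

# LLM/VLM split reported for the paper.
llm_tests, vlm_tests = 96, 35

print(after_phase2, after_phase3, llm_tests + vlm_tests)
```

Under these figures, both phase totals and the LLM/VLM split agree with the 131 tests named in the commit title.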
1 parent f3bfa5b commit 535ba50

33 files changed: +2179 −81 lines changed
88 KB
Binary file not shown.

docs/paper/home-security-benchmark.tex

Lines changed: 1105 additions & 0 deletions
Large diffs are not rendered by default.
8 further binary files (previews not rendered): 833 KB, 706 KB, 438 KB, 854 KB, 748 KB, 462 KB, 998 KB, 853 KB
