feat(benchmark): auto-generate and open HTML report, update SKILL.md to v2.0.0
- Report is now always generated after benchmark completion
- Auto-opens in browser via 'open' (macOS) / 'xdg-open' (Linux)
- Use --no-open to suppress browser launch
- Removed --report flag (report always generated)
- Updated SKILL.md: 131 tests, 16 suites, env var documentation,
configuration table with defaults and descriptions
 description: LLM & VLM evaluation suite for home security AI applications
-version: 1.0.0
+version: 2.0.0
 category: analysis
 ---
 # Home Security AI Benchmark

-Comprehensive benchmark suite that evaluates LLM and VLM models on tasks specific to **home security AI assistants** — deduplication, event classification, knowledge extraction, tool use, and scene analysis.
+Comprehensive benchmark suite evaluating LLM and VLM models on **131 tests** across **16 suites** — context preprocessing, tool use, security classification, prompt injection resistance, alert routing, knowledge injection, VLM-to-alert triage, and scene analysis.
 ## Quick Start

+### As an Aegis Skill (automatic)
+
+When spawned by Aegis, all configuration is injected via environment variables. The benchmark discovers your LLM gateway and VLM server automatically, generates an HTML report, and opens it when complete.
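The environment-variable injection above might be consumed like this. Only `AEGIS_SKILL_PARAMS` and its `{}` default come from the docs; the function name and the fallback-on-malformed-JSON behaviour are assumptions for illustration:

```javascript
// Hypothetical sketch of reading skill params injected via env vars.
function readSkillParams(env = process.env) {
  const raw = env.AEGIS_SKILL_PARAMS ?? "{}"; // documented default: {}
  try {
    return JSON.parse(raw);
  } catch {
    return {}; // assumed: malformed JSON falls back to the empty default
  }
}
```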
 | `AEGIS_SKILL_PARAMS` | `{}` | JSON params from skill config |
+
+> **Note**: URLs should be base URLs (e.g. `http://localhost:5405`). The benchmark appends `/v1/chat/completions` automatically. Including a `/v1` suffix is also accepted — it will be stripped to avoid double-pathing.
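The URL normalization described in that note can be sketched as follows; `chatCompletionsUrl` is an illustrative name, not the benchmark's actual API:

```javascript
// Sketch: strip any trailing slash and a trailing /v1 suffix,
// then append the /v1/chat/completions endpoint path.
function chatCompletionsUrl(baseUrl) {
  const trimmed = baseUrl.replace(/\/+$/, "").replace(/\/v1$/, "");
  return `${trimmed}/v1/chat/completions`;
}
```

Both `http://localhost:5405` and `http://localhost:5405/v1` therefore resolve to the same endpoint, which is the double-pathing the note warns about.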
-Results are saved to `~/.aegis-ai/benchmarks/` as JSON. The HTML report generator reads all historical results for cross-model comparison.
+Results are saved to `~/.aegis-ai/benchmarks/` as JSON. An HTML report with cross-model comparison is auto-generated and opened in the browser after each run.
 ## Requirements

 - Node.js ≥ 18
 - Running LLM server (llama-cpp, vLLM, or any OpenAI-compatible API)
-- Optional: Running VLM server for scene analysis tests
+- Optional: Running VLM server for scene analysis tests (35 tests)