Commit 8d67e55

Merge pull request #135 from SharpAI/develop

Develop

2 parents 75beb9a + 965e935

File tree: 17 files changed, +824 −151 lines

README.md

Lines changed: 17 additions & 14 deletions

@@ -28,20 +28,23 @@

Each skill is a self-contained module with its own model, parameters, and [communication protocol](docs/skill-development.md). See the [Skill Development Guide](docs/skill-development.md) and [Platform Parameters](docs/skill-params.md) to build your own.

The skill table is replaced: it gains a **Status** column and a new `home-security-benchmark` row; the remaining rows are otherwise unchanged.

| Category | Skill | What It Does | Status |
|----------|-------|--------------|:------:|
| **Detection** | [`yolo-detection-2026`](skills/detection/yolo-detection-2026/) | Real-time 80+ class object detection | 🧪 |
| | [`dinov3-grounding`](skills/detection/dinov3-grounding/) | Open-vocabulary detection — describe what to find | 📐 |
| | [`person-recognition`](skills/detection/person-recognition/) | Re-identify individuals across cameras | 📐 |
| **Analysis** | [`home-security-benchmark`](skills/analysis/home-security-benchmark/) | [131-test evaluation suite](#-homesec-bench--how-secure-is-your-local-ai) for LLM & VLM security performance | 🧪 |
| | [`vlm-scene-analysis`](skills/analysis/vlm-scene-analysis/) | Describe what happened in recorded clips | 📐 |
| | [`sam2-segmentation`](skills/analysis/sam2-segmentation/) | Click-to-segment with pixel-perfect masks | 📐 |
| **Transformation** | [`depth-estimation`](skills/transformation/depth-estimation/) | Monocular depth maps with Depth Anything v2 | 📐 |
| **Annotation** | [`dataset-annotation`](skills/annotation/dataset-annotation/) | AI-assisted labeling → COCO export | 📐 |
| **Camera Providers** | [`eufy`](skills/camera-providers/eufy/) · [`reolink`](skills/camera-providers/reolink/) · [`tapo`](skills/camera-providers/tapo/) | Direct camera integrations via RTSP | 📐 |
| **Streaming** | [`go2rtc-cameras`](skills/streaming/go2rtc-cameras/) | RTSP → WebRTC live view | 📐 |
| **Channels** | [`matrix`](skills/channels/matrix/) · [`line`](skills/channels/line/) · [`signal`](skills/channels/signal/) | Messaging channels for Clawdbot agent | 📐 |
| **Automation** | [`mqtt`](skills/automation/mqtt/) · [`webhook`](skills/automation/webhook/) · [`ha-trigger`](skills/automation/ha-trigger/) | Event-driven automation triggers | 📐 |
| **Integrations** | [`homeassistant-bridge`](skills/integrations/homeassistant-bridge/) | HA cameras in ↔ detection results out | 📐 |

> ✅ Ready · 🧪 Testing · 📐 Planned

> **Registry:** All skills are indexed in [`skills.json`](skills.json) for programmatic discovery.

docs/detection-protocol.md

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@ (new file)

# Detection Skill Protocol

Communication protocol for DeepCamera detection skills integrated with SharpAI Aegis.

## Transport

- **stdin** (Aegis → Skill): frame events and commands
- **stdout** (Skill → Aegis): detection results, ready/error events
- **stderr**: logging only — ignored by the Aegis data parser

Format: **JSON Lines** (one JSON object per line, newline-delimited).

## Events

### Ready (Skill → Aegis)

Emitted after the model loads successfully. `fps` reflects the skill's configured processing rate. `available_sizes` lists the model variants the skill supports.

```jsonl
{"event": "ready", "model": "yolo2026n", "device": "mps", "classes": 80, "fps": 5, "available_sizes": ["nano", "small", "medium", "large"]}
```

### Frame (Aegis → Skill)

Instruction to analyze a specific frame. `frame_id` is an incrementing integer used to correlate request and response.

```jsonl
{"event": "frame", "frame_id": 42, "camera_id": "front_door", "timestamp": "2026-03-01T14:30:00Z", "frame_path": "/tmp/aegis_detection/frame_front_door.jpg", "width": 1920, "height": 1080}
```

### Detections (Skill → Aegis)

Results of frame analysis. Must echo the same `frame_id` received in the frame event.

```jsonl
{"event": "detections", "frame_id": 42, "camera_id": "front_door", "timestamp": "2026-03-01T14:30:00Z", "objects": [
  {"class": "person", "confidence": 0.92, "bbox": [100, 50, 300, 400]},
  {"class": "car", "confidence": 0.87, "bbox": [500, 200, 900, 500]}
]}
```

### Error (Skill → Aegis)

Indicates a processing error. `retriable: true` means Aegis can send the next frame.

```jsonl
{"event": "error", "frame_id": 42, "message": "Inference error: ...", "retriable": true}
```

### Stop (Aegis → Skill)

Graceful shutdown command.

```jsonl
{"command": "stop"}
```

## Data Formats

### Bounding Boxes

**Format**: `[x_min, y_min, x_max, y_max]` — pixel coordinates (xyxy).

| Field | Type | Description |
|-------|------|-------------|
| `x_min` | int | Left edge (pixels) |
| `y_min` | int | Top edge (pixels) |
| `x_max` | int | Right edge (pixels) |
| `y_max` | int | Bottom edge (pixels) |

Coordinates are in the original image space (not normalized).

### Timestamps

ISO 8601 format: `2026-03-01T14:30:00Z`

### Frame Transfer

Frames are written to `/tmp/aegis_detection/frame_{camera_id}.jpg` as JPEG files with recycled per-camera filenames (overwritten each cycle). The `frame_path` in the frame event is the absolute path to the JPEG file.

## FPS Presets

| Preset | FPS | Use Case |
|--------|-----|----------|
| Ultra Low | 0.2 | Battery saver |
| Low | 0.5 | Passive surveillance |
| Normal | 1 | Standard monitoring |
| Active | 3 | Active area monitoring |
| High | 5 | Security-critical zones |
| Real-time | 15 | Live tracking |

## Backpressure

The protocol is **request-response**: Aegis sends one frame, waits for the detection result, then sends the next. This provides natural backpressure — if the skill is slow, Aegis automatically drops frames and always uses the latest available frame.
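The request-response loop above can be sketched from the skill side. This is a minimal illustration, not the reference implementation: `detect` is a hypothetical stand-in for real model inference, and the `ready` fields are filled with the example values from the Ready event.

```python
import json
import sys


def handle_event(event, detect):
    """Map one incoming JSON Lines event to the reply to emit (None = shut down)."""
    if event.get("command") == "stop":
        return None  # graceful shutdown: caller exits the loop
    if event.get("event") == "frame":
        # Run inference on the JPEG at frame_path; echo frame_id/camera_id
        # so Aegis can correlate this response with its request.
        try:
            objects = detect(event["frame_path"])
        except Exception as exc:
            return {"event": "error", "frame_id": event["frame_id"],
                    "message": f"Inference error: {exc}", "retriable": True}
        return {"event": "detections", "frame_id": event["frame_id"],
                "camera_id": event["camera_id"],
                "timestamp": event.get("timestamp"), "objects": objects}
    return {"event": "error", "message": "unknown event", "retriable": True}


def run(detect):
    # Announce readiness once the model is loaded (Ready event fields as above).
    print(json.dumps({"event": "ready", "model": "yolo2026n", "device": "cpu",
                      "classes": 80, "fps": 5,
                      "available_sizes": ["nano", "small", "medium", "large"]}),
          flush=True)
    # One frame in, one result out: this ordering is the backpressure mechanism.
    for line in sys.stdin:
        if not line.strip():
            continue
        reply = handle_event(json.loads(line), detect)
        if reply is None:
            break
        print(json.dumps(reply), flush=True)
```

Note the `flush=True` on every write: without it, buffered stdout would stall the Aegis parser between frames.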

docs/legacy-applications.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@

## Application 1: Self-supervised Person Recognition (REID) for Intruder Detection

-SharpAI yolov7_reid is an open source python application that leverages AI technologies to detect intruders with traditional surveillance cameras. [Source code](https://github.com/SharpAI/DeepCamera/blob/master/src/yolov7_reid/src/detector_cpu.py)
+SharpAI yolov7_reid is an open source python application that leverages AI technologies to detect intruders with traditional surveillance cameras. [Source code](https://github.com/SharpAI/DeepCamera/blob/master/src/yolov7_reid/src/detector.py)

It leverages Yolov7 as the person detector, FastReID for person feature extraction, Milvus (a local vector database) for self-supervised learning to identify unseen persons, and Labelstudio to host images locally for further usage such as labeling data and training your own classifier. It also integrates with Home-Assistant to empower the smart home with AI technology.
docs/skill-development.md

Lines changed: 101 additions & 1 deletion
@@ -11,7 +11,13 @@ A skill is a self-contained folder that provides an AI capability to [SharpAI Aegis

  skills/<category>/<skill-name>/
  ├── SKILL.md                 # Manifest + setup instructions
- ├── requirements.txt         # Python dependencies
+ ├── config.yaml              # Configuration schema for Aegis UI
+ ├── deploy.sh                # Zero-assumption installer
+ ├── requirements.txt         # Default Python dependencies
+ ├── requirements_cuda.txt    # NVIDIA GPU dependencies
+ ├── requirements_rocm.txt    # AMD GPU dependencies
+ ├── requirements_mps.txt     # Apple Silicon dependencies
+ ├── requirements_cpu.txt     # CPU-only dependencies
  ├── scripts/
  │   └── main.py              # Entry point
  ├── assets/
@@ -68,6 +74,70 @@ LLM agent can read and execute.

| `url` | URL input with validation | Server address |
| `camera_select` | Camera picker | Target cameras |

## config.yaml — Configuration Schema

Defines user-configurable options shown in the Aegis Skills UI. Parsed by `parseConfigYaml()`.

```yaml
params:
  - key: auto_start
    label: Auto Start
    type: boolean
    default: false
    description: "Start automatically on Aegis launch"

  - key: model_size
    label: Model Size
    type: select
    default: nano
    description: "Choose model variant"
    options:
      - { value: nano, label: "Nano (fastest)" }
      - { value: small, label: "Small (balanced)" }

  - key: confidence
    label: Confidence
    type: number
    default: 0.5
    description: "Min confidence (0.1–1.0)"
```
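A skill can merge the declared defaults with the user's saved values at startup. This is a sketch under assumptions: the schema parsing itself happens in Aegis (`parseConfigYaml()`), the `params` list here is the YAML above already parsed into dicts, and `resolve_config` is a hypothetical helper name.

```python
def resolve_config(params, user_values):
    """Merge user-supplied values over schema defaults, with light validation."""
    resolved = {}
    for p in params:
        value = user_values.get(p["key"], p.get("default"))
        # For select params, fall back to the default if the value
        # is not one of the declared options.
        if p.get("type") == "select":
            allowed = {o["value"] for o in p.get("options", [])}
            if value not in allowed:
                value = p.get("default")
        resolved[p["key"]] = value
    return resolved
```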
### Reserved Keys

| Key | Type | Behavior |
|-----|------|----------|
| `auto_start` | boolean | Aegis auto-starts the skill on boot when `true` |

## deploy.sh — Zero-Assumption Installer

Bootstraps the environment from scratch. Must handle:

1. **Find Python** — check system → conda → pyenv
2. **Create venv** — isolated `.venv/` inside skill directory
3. **Detect GPU** — CUDA → ROCm → MPS → CPU fallback
4. **Install deps** — from matching `requirements_<backend>.txt`
5. **Verify** — import test

Emit JSONL progress for Aegis UI:

```bash
echo '{"event": "progress", "stage": "gpu", "backend": "mps"}'
echo '{"event": "complete", "backend": "mps", "message": "Installed!"}'
```
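Steps 3 and 4 of the installer can be sketched in Python, even though a real `deploy.sh` does this in shell. The `has_cuda`/`has_rocm` flags are hypothetical placeholders for whatever probe the installer uses (e.g. checking for `nvidia-smi` or `rocminfo`); only the documented fallback order CUDA → ROCm → MPS → CPU is taken from the text.

```python
import platform


def pick_backend(has_cuda=False, has_rocm=False):
    """Detect the best backend in the documented order: CUDA -> ROCm -> MPS -> CPU."""
    if has_cuda:
        return "cuda"
    if has_rocm:
        return "rocm"
    # Apple Silicon exposes Metal Performance Shaders (MPS).
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mps"
    return "cpu"


def requirements_file(backend):
    # Each backend maps to its matching requirements_<backend>.txt file.
    return f"requirements_{backend}.txt"
```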
## Environment Variables

Aegis injects these into every skill process:

| Variable | Description |
|----------|-------------|
| `AEGIS_SKILL_ID` | Skill identifier |
| `AEGIS_SKILL_PARAMS` | JSON string of user config values |
| `AEGIS_GATEWAY_URL` | LLM gateway URL |
| `AEGIS_VLM_URL` | VLM server URL |
| `AEGIS_LLM_MODEL` | Active LLM model name |
| `AEGIS_VLM_MODEL` | Active VLM model name |
| `PYTHONUNBUFFERED` | Set to `1` for real-time output |
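On the skill side, these variables can be collected once at startup. A minimal sketch using the variable names from the table above (the `load_skill_context` helper is illustrative, not part of Aegis):

```python
import json
import os


def load_skill_context():
    """Collect the Aegis-injected environment into one dict."""
    return {
        "skill_id": os.environ.get("AEGIS_SKILL_ID"),
        # AEGIS_SKILL_PARAMS is a JSON string of user config values.
        "params": json.loads(os.environ.get("AEGIS_SKILL_PARAMS", "{}")),
        "gateway_url": os.environ.get("AEGIS_GATEWAY_URL"),
        "vlm_url": os.environ.get("AEGIS_VLM_URL"),
        "llm_model": os.environ.get("AEGIS_LLM_MODEL"),
        "vlm_model": os.environ.get("AEGIS_VLM_MODEL"),
    }
```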
## JSON Lines Protocol

Scripts communicate with Aegis via stdin/stdout. Each line is a JSON object.

@@ -108,6 +178,36 @@

```bash
echo '{"event": "frame", "camera_id": "test", "frame_path": "/tmp/test.jpg"}' | python scripts/main.py
```
## skills.json — Catalog Registration

Register skills in the repo root `skills.json`:

```json
{
  "skills": [
    {
      "id": "my-skill",
      "name": "My Skill",
      "description": "What it does",
      "category": "detection",
      "tags": ["tag1"],
      "path": "skills/detection/my-skill",
      "status": "testing",
      "platforms": ["darwin-arm64", "linux-x64"]
    }
  ]
}
```

### Status Values

| Status | Emoji | Meaning |
|--------|-------|---------|
| `ready` | ✅ | Production-quality, tested |
| `testing` | 🧪 | Functional, needs validation |
| `experimental` | ⚗️ | Proof of concept |
| `planned` | 📐 | Not yet implemented |
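A small pre-commit check of a catalog entry can be sketched against the status values above. The required-field list is an assumption inferred from the example entry; the real Aegis loader may accept more or fewer fields.

```python
STATUS_EMOJI = {"ready": "✅", "testing": "🧪", "experimental": "⚗️", "planned": "📐"}


def check_entry(entry):
    """Return a list of problems with a skills.json entry (empty list = OK)."""
    problems = []
    # Assumed required fields, based on the example registration above.
    for field in ("id", "name", "description", "category", "path", "status", "platforms"):
        if field not in entry:
            problems.append(f"missing field: {field}")
    if entry.get("status") not in STATUS_EMOJI:
        problems.append(f"unknown status: {entry.get('status')!r}")
    return problems
```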
## Reference

See [`skills/detection/yolo-detection-2026/`](../skills/detection/yolo-detection-2026/) for a complete working example.

skills.json

Lines changed: 48 additions & 0 deletions
@@ -48,6 +48,54 @@

```json
      "ui_unlocks": [
        "benchmark_report"
      ]
    },
    {
      "id": "yolo-detection-2026",
      "name": "YOLO 2026 Object Detection",
      "description": "State-of-the-art real-time object detection — 80+ COCO classes, bounding box overlays, multi-size model selection.",
      "version": "1.0.0",
      "category": "detection",
      "path": "skills/detection/yolo-detection-2026",
      "tags": ["detection", "yolo", "object-detection", "real-time", "coco"],
      "platforms": ["linux-x64", "linux-arm64", "darwin-arm64", "darwin-x64", "win-x64"],
      "requirements": {
        "python": ">=3.9",
        "ram_gb": 2
      },
      "capabilities": ["live_detection", "bbox_overlay"],
      "ui_unlocks": ["detection_overlay", "detection_results"],
      "fps_presets": [0.2, 0.5, 1, 3, 5, 15],
      "model_sizes": ["nano", "small", "medium", "large"]
    }
  ]
}
```

skills/analysis/home-security-benchmark/SKILL.md

Lines changed: 10 additions & 3 deletions
@@ -5,7 +5,7 @@ version: 2.0.0

category: analysis
runtime: node
entry: scripts/run-benchmark.cjs
-install: none
+install: npm
---

# Home Security AI Benchmark

@@ -14,7 +14,7 @@ Comprehensive benchmark suite evaluating LLM and VLM models on **131 tests** acr

## Setup

-**No installation required.** This skill has zero external dependencies — it uses only Node.js built-in modules. No `npm install` needed.
+**Requires `npm install`.** This skill has a `package.json` with dependencies (e.g. `openai` SDK). The deployment agent should check for `package.json` and run `npm install` in the skill directory before launching.

Entry script: `scripts/run-benchmark.cjs`

@@ -53,7 +53,13 @@ node scripts/run-benchmark.cjs --no-open

| Variable | Default | Description |
|----------|---------|-------------|
| `AEGIS_GATEWAY_URL` | `http://localhost:5407` | LLM gateway (OpenAI-compatible) |
+| `AEGIS_LLM_URL` | | Direct llama-server LLM endpoint |
+| `AEGIS_LLM_API_TYPE` | `openai` | LLM provider type (builtin, openai, etc.) |
+| `AEGIS_LLM_MODEL` | | LLM model name |
+| `AEGIS_LLM_API_KEY` | | API key for cloud LLM providers |
+| `AEGIS_LLM_BASE_URL` | | Cloud provider base URL (e.g. `https://api.openai.com/v1`) |
| `AEGIS_VLM_URL` | *(disabled)* | VLM server base URL |
+| `AEGIS_VLM_MODEL` | | Loaded VLM model ID |
| `AEGIS_SKILL_ID` | | Skill identifier (enables skill mode) |
| `AEGIS_SKILL_PARAMS` | `{}` | JSON params from skill config |

@@ -129,5 +135,6 @@ Results are saved to `~/.aegis-ai/benchmarks/` as JSON. An HTML report with cros

## Requirements

- Node.js ≥ 18
-- Running LLM server (llama-cpp, vLLM, or any OpenAI-compatible API)
+- `npm install` (for `openai` SDK dependency)
+- Running LLM server (llama-server, OpenAI API, or any OpenAI-compatible endpoint)
- Optional: Running VLM server for scene analysis tests (35 tests)

skills/analysis/home-security-benchmark/package-lock.json

Lines changed: 37 additions & 0 deletions
*(Generated file; diff not rendered.)*
