Commit 8d67e55

Merge pull request #135 from SharpAI/develop

Develop

2 parents 75beb9a + 965e935

File tree: 17 files changed, +824 −151 lines

README.md

Lines changed: 17 additions & 14 deletions

@@ -28,20 +28,23 @@

Each skill is a self-contained module with its own model, parameters, and [communication protocol](docs/skill-development.md). See the [Skill Development Guide](docs/skill-development.md) and [Platform Parameters](docs/skill-params.md) to build your own.

The skill table is replaced: it gains a **Status** column and a new `home-security-benchmark` row; the remaining rows are otherwise unchanged.

| Category | Skill | What It Does | Status |
|----------|-------|--------------|:------:|
| **Detection** | [`yolo-detection-2026`](skills/detection/yolo-detection-2026/) | Real-time 80+ class object detection | 🧪 |
| | [`dinov3-grounding`](skills/detection/dinov3-grounding/) | Open-vocabulary detection — describe what to find | 📐 |
| | [`person-recognition`](skills/detection/person-recognition/) | Re-identify individuals across cameras | 📐 |
| **Analysis** | [`home-security-benchmark`](skills/analysis/home-security-benchmark/) | [131-test evaluation suite](#-homesec-bench--how-secure-is-your-local-ai) for LLM & VLM security performance | 🧪 |
| | [`vlm-scene-analysis`](skills/analysis/vlm-scene-analysis/) | Describe what happened in recorded clips | 📐 |
| | [`sam2-segmentation`](skills/analysis/sam2-segmentation/) | Click-to-segment with pixel-perfect masks | 📐 |
| **Transformation** | [`depth-estimation`](skills/transformation/depth-estimation/) | Monocular depth maps with Depth Anything v2 | 📐 |
| **Annotation** | [`dataset-annotation`](skills/annotation/dataset-annotation/) | AI-assisted labeling → COCO export | 📐 |
| **Camera Providers** | [`eufy`](skills/camera-providers/eufy/) · [`reolink`](skills/camera-providers/reolink/) · [`tapo`](skills/camera-providers/tapo/) | Direct camera integrations via RTSP | 📐 |
| **Streaming** | [`go2rtc-cameras`](skills/streaming/go2rtc-cameras/) | RTSP → WebRTC live view | 📐 |
| **Channels** | [`matrix`](skills/channels/matrix/) · [`line`](skills/channels/line/) · [`signal`](skills/channels/signal/) | Messaging channels for Clawdbot agent | 📐 |
| **Automation** | [`mqtt`](skills/automation/mqtt/) · [`webhook`](skills/automation/webhook/) · [`ha-trigger`](skills/automation/ha-trigger/) | Event-driven automation triggers | 📐 |
| **Integrations** | [`homeassistant-bridge`](skills/integrations/homeassistant-bridge/) | HA cameras in ↔ detection results out | 📐 |

> ✅ Ready · 🧪 Testing · 📐 Planned

> **Registry:** All skills are indexed in [`skills.json`](skills.json) for programmatic discovery.

docs/detection-protocol.md

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@ (new file)

# Detection Skill Protocol

Communication protocol for DeepCamera detection skills integrated with SharpAI Aegis.

## Transport

- **stdin** (Aegis → Skill): frame events and commands
- **stdout** (Skill → Aegis): detection results, ready/error events
- **stderr**: logging only — ignored by the Aegis data parser

Format: **JSON Lines** (one JSON object per line, newline-delimited).

## Events

### Ready (Skill → Aegis)

Emitted after the model loads successfully. `fps` reflects the skill's configured processing rate. `available_sizes` lists the model variants the skill supports.

```jsonl
{"event": "ready", "model": "yolo2026n", "device": "mps", "classes": 80, "fps": 5, "available_sizes": ["nano", "small", "medium", "large"]}
```

### Frame (Aegis → Skill)

Instruction to analyze a specific frame. `frame_id` is an incrementing integer used to correlate request and response.

```jsonl
{"event": "frame", "frame_id": 42, "camera_id": "front_door", "timestamp": "2026-03-01T14:30:00Z", "frame_path": "/tmp/aegis_detection/frame_front_door.jpg", "width": 1920, "height": 1080}
```

### Detections (Skill → Aegis)

Results of frame analysis. Must echo the same `frame_id` received in the frame event.

```jsonl
{"event": "detections", "frame_id": 42, "camera_id": "front_door", "timestamp": "2026-03-01T14:30:00Z", "objects": [
  {"class": "person", "confidence": 0.92, "bbox": [100, 50, 300, 400]},
  {"class": "car", "confidence": 0.87, "bbox": [500, 200, 900, 500]}
]}
```

### Error (Skill → Aegis)

Indicates a processing error. `retriable: true` means Aegis can send the next frame.

```jsonl
{"event": "error", "frame_id": 42, "message": "Inference error: ...", "retriable": true}
```

### Stop (Aegis → Skill)

Graceful shutdown command.

```jsonl
{"command": "stop"}
```

## Data Formats

### Bounding Boxes

**Format**: `[x_min, y_min, x_max, y_max]` — pixel coordinates (xyxy).

| Field | Type | Description |
|-------|------|-------------|
| `x_min` | int | Left edge (pixels) |
| `y_min` | int | Top edge (pixels) |
| `x_max` | int | Right edge (pixels) |
| `y_max` | int | Bottom edge (pixels) |

Coordinates are in the original image space (not normalized).

### Timestamps

ISO 8601 format: `2026-03-01T14:30:00Z`

### Frame Transfer

Frames are written to `/tmp/aegis_detection/frame_{camera_id}.jpg` as JPEG files with recycled per-camera filenames (overwritten each cycle). The `frame_path` in the frame event is the absolute path to the JPEG file.

## FPS Presets

| Preset | FPS | Use Case |
|--------|-----|----------|
| Ultra Low | 0.2 | Battery saver |
| Low | 0.5 | Passive surveillance |
| Normal | 1 | Standard monitoring |
| Active | 3 | Active area monitoring |
| High | 5 | Security-critical zones |
| Real-time | 15 | Live tracking |

## Backpressure

The protocol is **request-response**: Aegis sends one frame, waits for the detection result, then sends the next. This provides natural backpressure — if the skill is slow, Aegis automatically drops frames and always uses the latest available frame.
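The request-response loop above can be sketched from the skill side. This is a minimal illustration, not the reference implementation: `detect` is a hypothetical stand-in for real model inference, and the `ready` fields are filled with the example values from the Ready event.

```python
import json
import sys


def handle_event(event, detect):
    """Map one incoming JSON Lines event to the reply to emit (None = shut down)."""
    if event.get("command") == "stop":
        return None  # graceful shutdown: caller exits the loop
    if event.get("event") == "frame":
        # Run inference on the JPEG at frame_path; echo frame_id/camera_id
        # so Aegis can correlate this response with its request.
        try:
            objects = detect(event["frame_path"])
        except Exception as exc:
            return {"event": "error", "frame_id": event["frame_id"],
                    "message": f"Inference error: {exc}", "retriable": True}
        return {"event": "detections", "frame_id": event["frame_id"],
                "camera_id": event["camera_id"],
                "timestamp": event.get("timestamp"), "objects": objects}
    return {"event": "error", "message": "unknown event", "retriable": True}


def run(detect):
    # Announce readiness once the model is loaded (Ready event fields as above).
    print(json.dumps({"event": "ready", "model": "yolo2026n", "device": "cpu",
                      "classes": 80, "fps": 5,
                      "available_sizes": ["nano", "small", "medium", "large"]}),
          flush=True)
    # One frame in, one result out: this ordering is the backpressure mechanism.
    for line in sys.stdin:
        if not line.strip():
            continue
        reply = handle_event(json.loads(line), detect)
        if reply is None:
            break
        print(json.dumps(reply), flush=True)
```

Note the `flush=True` on every write: without it, buffered stdout would stall the Aegis parser between frames.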

docs/legacy-applications.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@

## Application 1: Self-supervised Person Recognition (REID) for Intruder Detection

-SharpAI yolov7_reid is an open source python application that leverages AI technologies to detect intruders with traditional surveillance cameras. [Source code](https://github.com/SharpAI/DeepCamera/blob/master/src/yolov7_reid/src/detector_cpu.py)
+SharpAI yolov7_reid is an open source python application that leverages AI technologies to detect intruders with traditional surveillance cameras. [Source code](https://github.com/SharpAI/DeepCamera/blob/master/src/yolov7_reid/src/detector.py)

It leverages Yolov7 as the person detector, FastReID for person feature extraction, Milvus (a local vector database) for self-supervised learning to identify unseen persons, and Labelstudio to host images locally for further usage such as labeling data and training your own classifier. It also integrates with Home-Assistant to empower the smart home with AI technology.
docs/skill-development.md

Lines changed: 101 additions & 1 deletion
@@ -11,7 +11,13 @@ A skill is a self-contained folder that provides an AI capability to [SharpAI Aegis

  skills/<category>/<skill-name>/
  ├── SKILL.md                 # Manifest + setup instructions
- ├── requirements.txt         # Python dependencies
+ ├── config.yaml              # Configuration schema for Aegis UI
+ ├── deploy.sh                # Zero-assumption installer
+ ├── requirements.txt         # Default Python dependencies
+ ├── requirements_cuda.txt    # NVIDIA GPU dependencies
+ ├── requirements_rocm.txt    # AMD GPU dependencies
+ ├── requirements_mps.txt     # Apple Silicon dependencies
+ ├── requirements_cpu.txt     # CPU-only dependencies
  ├── scripts/
  │   └── main.py              # Entry point
  ├── assets/
@@ -68,6 +74,70 @@ LLM agent can read and execute.

| `url` | URL input with validation | Server address |
| `camera_select` | Camera picker | Target cameras |

## config.yaml — Configuration Schema

Defines user-configurable options shown in the Aegis Skills UI. Parsed by `parseConfigYaml()`.

```yaml
params:
  - key: auto_start
    label: Auto Start
    type: boolean
    default: false
    description: "Start automatically on Aegis launch"

  - key: model_size
    label: Model Size
    type: select
    default: nano
    description: "Choose model variant"
    options:
      - { value: nano, label: "Nano (fastest)" }
      - { value: small, label: "Small (balanced)" }

  - key: confidence
    label: Confidence
    type: number
    default: 0.5
    description: "Min confidence (0.1–1.0)"
```
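A skill can merge the declared defaults with the user's saved values at startup. This is a sketch under assumptions: the schema parsing itself happens in Aegis (`parseConfigYaml()`), the `params` list here is the YAML above already parsed into dicts, and `resolve_config` is a hypothetical helper name.

```python
def resolve_config(params, user_values):
    """Merge user-supplied values over schema defaults, with light validation."""
    resolved = {}
    for p in params:
        value = user_values.get(p["key"], p.get("default"))
        # For select params, fall back to the default if the value
        # is not one of the declared options.
        if p.get("type") == "select":
            allowed = {o["value"] for o in p.get("options", [])}
            if value not in allowed:
                value = p.get("default")
        resolved[p["key"]] = value
    return resolved
```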
### Reserved Keys

| Key | Type | Behavior |
|-----|------|----------|
| `auto_start` | boolean | Aegis auto-starts the skill on boot when `true` |

## deploy.sh — Zero-Assumption Installer

Bootstraps the environment from scratch. Must handle:

1. **Find Python** — check system → conda → pyenv
2. **Create venv** — isolated `.venv/` inside skill directory
3. **Detect GPU** — CUDA → ROCm → MPS → CPU fallback
4. **Install deps** — from matching `requirements_<backend>.txt`
5. **Verify** — import test

Emit JSONL progress for Aegis UI:

```bash
echo '{"event": "progress", "stage": "gpu", "backend": "mps"}'
echo '{"event": "complete", "backend": "mps", "message": "Installed!"}'
```
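Steps 3 and 4 of the installer can be sketched in Python, even though a real `deploy.sh` does this in shell. The `has_cuda`/`has_rocm` flags are hypothetical placeholders for whatever probe the installer uses (e.g. checking for `nvidia-smi` or `rocminfo`); only the documented fallback order CUDA → ROCm → MPS → CPU is taken from the text.

```python
import platform


def pick_backend(has_cuda=False, has_rocm=False):
    """Detect the best backend in the documented order: CUDA -> ROCm -> MPS -> CPU."""
    if has_cuda:
        return "cuda"
    if has_rocm:
        return "rocm"
    # Apple Silicon exposes Metal Performance Shaders (MPS).
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mps"
    return "cpu"


def requirements_file(backend):
    # Each backend maps to its matching requirements_<backend>.txt file.
    return f"requirements_{backend}.txt"
```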
## Environment Variables

Aegis injects these into every skill process:

| Variable | Description |
|----------|-------------|
| `AEGIS_SKILL_ID` | Skill identifier |
| `AEGIS_SKILL_PARAMS` | JSON string of user config values |
| `AEGIS_GATEWAY_URL` | LLM gateway URL |
| `AEGIS_VLM_URL` | VLM server URL |
| `AEGIS_LLM_MODEL` | Active LLM model name |
| `AEGIS_VLM_MODEL` | Active VLM model name |
| `PYTHONUNBUFFERED` | Set to `1` for real-time output |
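On the skill side, these variables can be collected once at startup. A minimal sketch using the variable names from the table above (the `load_skill_context` helper is illustrative, not part of Aegis):

```python
import json
import os


def load_skill_context():
    """Collect the Aegis-injected environment into one dict."""
    return {
        "skill_id": os.environ.get("AEGIS_SKILL_ID"),
        # AEGIS_SKILL_PARAMS is a JSON string of user config values.
        "params": json.loads(os.environ.get("AEGIS_SKILL_PARAMS", "{}")),
        "gateway_url": os.environ.get("AEGIS_GATEWAY_URL"),
        "vlm_url": os.environ.get("AEGIS_VLM_URL"),
        "llm_model": os.environ.get("AEGIS_LLM_MODEL"),
        "vlm_model": os.environ.get("AEGIS_VLM_MODEL"),
    }
```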
## JSON Lines Protocol

Scripts communicate with Aegis via stdin/stdout. Each line is a JSON object.

@@ -108,6 +178,36 @@

```bash
echo '{"event": "frame", "camera_id": "test", "frame_path": "/tmp/test.jpg"}' | python scripts/main.py
```
## skills.json — Catalog Registration

Register skills in the repo root `skills.json`:

```json
{
  "skills": [
    {
      "id": "my-skill",
      "name": "My Skill",
      "description": "What it does",
      "category": "detection",
      "tags": ["tag1"],
      "path": "skills/detection/my-skill",
      "status": "testing",
      "platforms": ["darwin-arm64", "linux-x64"]
    }
  ]
}
```

### Status Values

| Status | Emoji | Meaning |
|--------|-------|---------|
| `ready` | ✅ | Production-quality, tested |
| `testing` | 🧪 | Functional, needs validation |
| `experimental` | ⚗️ | Proof of concept |
| `planned` | 📐 | Not yet implemented |
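A small pre-commit check of a catalog entry can be sketched against the status values above. The required-field list is an assumption inferred from the example entry; the real Aegis loader may accept more or fewer fields.

```python
STATUS_EMOJI = {"ready": "✅", "testing": "🧪", "experimental": "⚗️", "planned": "📐"}


def check_entry(entry):
    """Return a list of problems with a skills.json entry (empty list = OK)."""
    problems = []
    # Assumed required fields, based on the example registration above.
    for field in ("id", "name", "description", "category", "path", "status", "platforms"):
        if field not in entry:
            problems.append(f"missing field: {field}")
    if entry.get("status") not in STATUS_EMOJI:
        problems.append(f"unknown status: {entry.get('status')!r}")
    return problems
```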
## Reference

See [`skills/detection/yolo-detection-2026/`](../skills/detection/yolo-detection-2026/) for a complete working example.

skills.json

Lines changed: 48 additions & 0 deletions
@@ -48,6 +48,54 @@

```json
      "ui_unlocks": [
        "benchmark_report"
      ]
    },
    {
      "id": "yolo-detection-2026",
      "name": "YOLO 2026 Object Detection",
      "description": "State-of-the-art real-time object detection — 80+ COCO classes, bounding box overlays, multi-size model selection.",
      "version": "1.0.0",
      "category": "detection",
      "path": "skills/detection/yolo-detection-2026",
      "tags": ["detection", "yolo", "object-detection", "real-time", "coco"],
      "platforms": ["linux-x64", "linux-arm64", "darwin-arm64", "darwin-x64", "win-x64"],
      "requirements": {
        "python": ">=3.9",
        "ram_gb": 2
      },
      "capabilities": ["live_detection", "bbox_overlay"],
      "ui_unlocks": ["detection_overlay", "detection_results"],
      "fps_presets": [0.2, 0.5, 1, 3, 5, 15],
      "model_sizes": ["nano", "small", "medium", "large"]
    }
  ]
}
```

skills/analysis/home-security-benchmark/SKILL.md

Lines changed: 10 additions & 3 deletions
@@ -5,7 +5,7 @@ version: 2.0.0

category: analysis
runtime: node
entry: scripts/run-benchmark.cjs
-install: none
+install: npm
---

# Home Security AI Benchmark

@@ -14,7 +14,7 @@ Comprehensive benchmark suite evaluating LLM and VLM models on **131 tests** acr

## Setup

-**No installation required.** This skill has zero external dependencies — it uses only Node.js built-in modules. No `npm install` needed.
+**Requires `npm install`.** This skill has a `package.json` with dependencies (e.g. `openai` SDK). The deployment agent should check for `package.json` and run `npm install` in the skill directory before launching.

Entry script: `scripts/run-benchmark.cjs`

@@ -53,7 +53,13 @@ node scripts/run-benchmark.cjs --no-open

| Variable | Default | Description |
|----------|---------|-------------|
| `AEGIS_GATEWAY_URL` | `http://localhost:5407` | LLM gateway (OpenAI-compatible) |
+| `AEGIS_LLM_URL` | | Direct llama-server LLM endpoint |
+| `AEGIS_LLM_API_TYPE` | `openai` | LLM provider type (builtin, openai, etc.) |
+| `AEGIS_LLM_MODEL` | | LLM model name |
+| `AEGIS_LLM_API_KEY` | | API key for cloud LLM providers |
+| `AEGIS_LLM_BASE_URL` | | Cloud provider base URL (e.g. `https://api.openai.com/v1`) |
| `AEGIS_VLM_URL` | *(disabled)* | VLM server base URL |
+| `AEGIS_VLM_MODEL` | | Loaded VLM model ID |
| `AEGIS_SKILL_ID` | | Skill identifier (enables skill mode) |
| `AEGIS_SKILL_PARAMS` | `{}` | JSON params from skill config |

@@ -129,5 +135,6 @@ Results are saved to `~/.aegis-ai/benchmarks/` as JSON. An HTML report with cros

## Requirements

- Node.js ≥ 18
-- Running LLM server (llama-cpp, vLLM, or any OpenAI-compatible API)
+- `npm install` (for `openai` SDK dependency)
+- Running LLM server (llama-server, OpenAI API, or any OpenAI-compatible endpoint)
- Optional: Running VLM server for scene analysis tests (35 tests)

skills/analysis/home-security-benchmark/package-lock.json

Lines changed: 37 additions & 0 deletions
*(Generated file; diff not rendered.)*
