Commit f850d46

Merge pull request ruvnet#874 from ruvnet/feat/adr-149-aether-arena
feat(aether-arena): ADR-149 Spatial-Intelligence Benchmark — scorer + CI harness gate
2 parents 8d64434 + 4896d05 commit f850d46

51 files changed

Lines changed: 3448 additions & 43 deletions

Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
{
  "id": "aether-arena-aa",
  "name": "AetherArena (AA) — Official Spatial-Intelligence Benchmark",
  "adr": "ADR-149",
  "adrPath": "docs/adr/ADR-149-public-community-leaderboard-huggingface.md",
  "status": "Accepted",
  "initializedDate": "2026-05-30",
  "targetDate": "2026-08-31",
  "exitCriteria": "Benchmark INFRASTRUCTURE done, tested, CI-gated, deploy-ready: aa_score_runner.rs passes deterministic fixture test; CI harness-gate green on every PR; aether-arena repo scaffold committed (README four-part framing + aa-submission.toml schema + VERIFY.md); public smoke split committed; HF Space lifecycle skeleton deployed; signed Parquet ledger functional; RuView baseline PCK@20 ~2.5% entered; ADR-149 §7 acceptance test (five-step stranger test) passes. NOTE: ML SOTA (MM-Fi PCK@20 ~72%) is a separate long-running stretch goal blocked on ADR-079 camera-ground-truth — it is NOT an infra exit criterion.",
  "baselineState": {
    "adrStatus": "Accepted, committed 2026-05-30",
    "scorerCode": "ruview_metrics.rs + ablation.rs + proof.rs exist in wifi-densepose-train; aa_score_runner.rs not yet created",
    "aetherArenaRepo": "does not exist yet — needs user authorization to create ruvnet/aether-arena public repo",
    "hfSpace": "does not exist yet — needs HF_TOKEN and user authorization to deploy ruvnet/aether-arena HF Space",
    "smokeDataset": "not committed",
    "resultsLedger": "not created",
    "ruviewBaseline": "PCK@20 ~2.5% self-reported, not formally entered",
    "ciGate": "not added to workflow"
  },
  "milestones": {
    "m1": {
      "name": "ADR-149 Accepted + committed",
      "status": "DONE",
      "completedDate": "2026-05-30",
      "completionCriteria": "ADR-149 file committed to docs/adr/ with status Accepted",
      "notes": "Done this session. File at docs/adr/ADR-149-public-community-leaderboard-huggingface.md"
    },
    "m2": {
      "name": "Deterministic scorer runner bin (aa_score_runner.rs)",
      "status": "NOT_STARTED",
      "completionCriteria": "aa_score_runner.rs compiles, runs ruview_metrics on a committed fixture, emits RuViewTier + SHA-256 proof hash, mirrors existing *_proof_runner.rs pattern; cargo test passes",
      "estimatedEffort": "3-5 days",
      "owner": "wifi-densepose-train crate or new aa-scorer crate"
    },
    "m3": {
      "name": "CI harness-gate: GitHub Actions workflow",
      "status": "NOT_STARTED",
      "completionCriteria": "A GitHub Actions workflow runs aa_score_runner on every PR as a build gate; PR fails if scorer fails determinism check; workflow committed and green",
      "estimatedEffort": "2-3 days",
      "dependency": "M2 must be done first"
    },
    "m4": {
      "name": "aether-arena repo scaffold",
      "status": "NOT_STARTED",
      "completionCriteria": "ruvnet/aether-arena repo created with: README (four-part framing: Public leaderboard / Private eval split / Open scorer / Signed results); aa-submission.toml manifest schema; VERIFY.md (ADR-149 §7 stranger acceptance test); neutrality/governance section (§2.8); contribution guide",
      "estimatedEffort": "3-5 days",
      "blockers": ["Needs user authorization to create public ruvnet/aether-arena repo on GitHub"]
    },
    "m5": {
      "name": "Public smoke split committed + private MM-Fi held-out split prep",
      "status": "NOT_STARTED",
      "completionCriteria": "Public smoke split committed to aether-arena repo (stranger can score locally); private MM-Fi held-out split prepared under non-public path with CC BY-NC 4.0 attribution; Wi-Pose explicitly excluded from v0",
      "estimatedEffort": "5-7 days",
      "riskNotes": "MM-Fi CC BY-NC 4.0: AA must remain non-commercial and carry MM-Fi attribution; raw frames stay in private split; only derived CSI features + scores may be exposed"
    },
    "m6": {
      "name": "HF Space (Gradio) skeleton",
      "status": "BLOCKED",
      "completionCriteria": "HF Space deployed at ruvnet/aether-arena with submission lifecycle (submitted->validated->quarantined->smoke_scored->full_scored->published/rejected); sandboxed scorer container wired; basic leaderboard table rendered",
      "estimatedEffort": "7-10 days",
      "blockers": [
        "Needs HF_TOKEN — check .env for HF_TOKEN or HUGGINGFACE_TOKEN",
        "Needs user authorization to create/deploy ruvnet/aether-arena HF Space (outward-facing public deployment)"
      ]
    },
    "m7": {
      "name": "Signed append-only Parquet results ledger",
      "status": "NOT_STARTED",
      "completionCriteria": "HF dataset ruvnet/aether-arena-results created; append-only Parquet ledger with signed rows; determinism_gate enforced; no row can be silently edited",
      "estimatedEffort": "3-5 days",
      "ledgerSchema": "submitter, model_ref, category, feature_set, tier, pck20, oks, mota, vitals_bpm_err, latency_p50, latency_p95, privacy_leakage, cross_room_deg, proof_sha256, scored_at, harness_version",
      "dependency": "M6 must be scaffolded first"
    },
    "m8": {
      "name": "RuView baseline entry + public launch",
      "status": "NOT_STARTED",
      "completionCriteria": "RuView wifi-densepose-pretrained baseline entered (honest PCK@20 ~2.5%); ADR-149 §7 five-step stranger acceptance test passes; v0 live with Presence + Pose + Edge-latency + Determinism categories active; Privacy and Cross-room shown as gated/coming-soon",
      "estimatedEffort": "3-5 days",
      "dependency": "M4+M5+M6+M7 complete",
      "notes": "ML SOTA improvement (PCK@20 ~72%) is a SEPARATE stretch goal blocked on ADR-079 P7-P9 camera ground truth. NOT a blocker for infra launch."
    }
  },
  "activeMilestone": "m2",
  "completedMilestones": ["m1"],
  "knownRisks": [
    "HF_TOKEN not confirmed present in .env — check before M6 work begins",
    "ruvnet/aether-arena public repo creation is outward-facing — needs explicit user authorization",
    "MM-Fi CC BY-NC 4.0: AA must stay legally non-commercial and brand-distinct from commercial RuView product; or seek MM-Fi commercial grant before any paid tier",
    "Wi-Pose has research-use-only terms (no redistribution grant) — excluded from v0; revisit only if terms are clarified with authors",
    "HF Space free CPU tier may be too slow for Candle/tch inference pipeline — may need ZeroGPU or self-hosted scorer on cognitum-20260110 GCloud A100/L4",
    "ADR-079 camera-ground-truth (PCK@20 SOTA) is P7-P9 pending — NOT an infra blocker; must not be conflated with AA infra completion",
    "Neutrality/governance risk: RuView seeded the scorer — must be demonstrably scored through the same public pipeline as any other entrant (§2.8 controls)"
  ],
  "driftSignals": {
    "timeline": "GREEN — just initialized, no timeline pressure yet",
    "scope": "GREEN — scope locked at four-part structure per ADR-149 §2 decision",
    "approach": "GREEN — reuse pattern (existing ruview_metrics + proof.rs) confirmed in ADR-149",
    "dependency": "YELLOW — HF_TOKEN and ruvnet/aether-arena repo authorization are external blockers with unknown ETA",
    "priority": "GREEN — active feature branch feat/adr-136-146-streaming-engine in progress; AA infra can proceed in parallel on its own branch"
  },
  "stretchGoals": {
    "sotaML": "MM-Fi PCK@20 SOTA ~72% — separate ML effort blocked on ADR-079 P7-P9 camera-ground-truth data collection; NOT an infra exit criterion",
    "privacyAxis": "ADR-145 §10 membership-inference attacker — activate Privacy leaderboard axis once attacker is implemented and published",
    "crossRoom": "Multi-room held-out split — activate Cross-room generalization axis",
    "multiOrgSteering": "Invite co-maintainers from other projects once >=N external entries land"
  },
  "sessionHistory": [
    {
      "date": "2026-05-30",
      "type": "initialization",
      "accomplished": [
        "ADR-149 Accepted and committed to docs/adr/",
        "Horizon record initialized in .claude-flow/horizons/aether-arena-aa.json",
        "Memory stored in horizons namespace under key horizon-aether-arena-aa",
        "Session check-in record stored in horizon-sessions namespace"
      ]
    }
  ]
}
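The m7 milestone above asks for an append-only ledger in which no row can be silently edited. One common way to get that property is a hash chain, where each row stores its predecessor's hash. The following is a minimal Python sketch of that idea only: `append_row`, `verify_chain`, the `GENESIS` sentinel, and the example payload values are hypothetical and are not taken from the repository's `ledger_tools.py`, and row signing and the Parquet storage layer are left out.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed sentinel used as prev_hash of the first row


def row_hash(payload: dict, prev_hash: str) -> str:
    """Hash the canonical JSON of a row together with its predecessor's hash."""
    body = json.dumps({"payload": payload, "prev_hash": prev_hash},
                      sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_row(ledger: list, payload: dict) -> None:
    """Append a row that links back to the current tail of the ledger."""
    prev = ledger[-1]["row_hash"] if ledger else GENESIS
    ledger.append({"payload": payload, "prev_hash": prev,
                   "row_hash": row_hash(payload, prev)})


def verify_chain(ledger: list) -> bool:
    """Recompute every link; any edited payload or broken link fails."""
    prev = GENESIS
    for row in ledger:
        if row["prev_hash"] != prev:
            return False
        if row["row_hash"] != row_hash(row["payload"], row["prev_hash"]):
            return False
        prev = row["row_hash"]
    return True


ledger = []
append_row(ledger, {"submitter": "example-a", "tier": "demo", "pck20": 2.5})
append_row(ledger, {"submitter": "example-b", "tier": "demo", "pck20": 3.1})
print(verify_chain(ledger))           # True: untouched chain verifies
ledger[0]["payload"]["pck20"] = 72.0  # simulate a silent edit
print(verify_chain(ledger))           # False: the stored row_hash no longer matches
```

Because each `row_hash` covers the previous hash, editing an old row invalidates that row and, once its hash is recomputed, every row after it. That is why a single `verify` pass is enough to detect tampering.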
Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
name: AetherArena harness gate (ADR-149)

# Runs the AetherArena scoring harness as a PR build gate. Every PR that touches
# the scorer, the metrics, or the benchmark scaffold must keep the deterministic
# score hash stable (ADR-149 §2.5 determinism_gate). If the scoring maths changes,
# the hash moves and this gate fails until `expected_score.sha256` is regenerated
# and reviewed — so scorer drift can never land silently.
#
# This is the "a PR that runs the harness as part of the build process" requirement.

on:
  pull_request:
    paths:
      - 'v2/crates/wifi-densepose-train/src/ruview_metrics.rs'
      - 'v2/crates/wifi-densepose-train/src/ablation.rs'
      - 'v2/crates/wifi-densepose-train/src/bin/aa_score_runner.rs'
      - 'aether-arena/**'
      - '.github/workflows/aether-arena-harness.yml'
  push:
    branches: ['feat/adr-149-aether-arena']
  workflow_dispatch:

permissions:
  contents: read
  pull-requests: write

jobs:
  harness-gate:
    name: Run AA scorer harness (determinism gate)
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: v2
    steps:
      - uses: actions/checkout@v4

      - name: Install Rust toolchain
        run: rustup show && rustc --version

      - name: Cache cargo
        uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            v2/target
          key: aa-harness-${{ runner.os }}-${{ hashFiles('v2/Cargo.lock') }}

      # 1. Build the pure-Rust scorer (no torch / no GPU → fast PR gate).
      - name: Build AA score runner
        run: cargo build -p wifi-densepose-train --bin aa_score_runner --no-default-features

      # 2. Determinism gate: the committed expected hash must still match. A
      #    non-zero exit here fails the PR.
      - name: Run determinism gate
        run: cargo run -q -p wifi-densepose-train --bin aa_score_runner --no-default-features

      # 3. Repeatability analysis (witness chain): the harness must produce one
      #    identical proof hash across many runs — any nondeterminism fails here.
      - name: Repeatability analysis (16 runs)
        run: cargo run -q -p wifi-densepose-train --bin aa_score_runner --no-default-features -- --repeat 16

      # 4. Real-scoring smoke: score a sample prediction against the public smoke
      #    split, exercising the actual model-scoring path (not just the fixture).
      - name: Real-scoring smoke test
        run: |
          cargo run -q -p wifi-densepose-train --bin aa_score_runner --no-default-features -- \
            --split ../aether-arena/fixtures/smoke_split.json \
            --pred ../aether-arena/fixtures/smoke_pred.json --json

      # 5. Witness ledger chain integrity: the append-only results ledger must
      #    verify (every prev_hash link + row_hash intact = no silent edits).
      - name: Verify witness ledger chain
        working-directory: aether-arena/ledger
        run: python3 ledger_tools.py verify

      # 6. Emit the witness row + repeatability into the PR run summary.
      - name: Witness row → job summary
        if: always()
        run: |
          ROW=$(cargo run -q -p wifi-densepose-train --bin aa_score_runner --no-default-features -- --json)
          REP=$(cargo run -q -p wifi-densepose-train --bin aa_score_runner --no-default-features -- --repeat 16)
          {
            echo "## AetherArena harness gate (witness chain)"
            echo ""
            echo "Deterministic witness (ADR-149 §2.2 / proof + repeatability):"
            echo '```json'
            echo "$ROW"
            echo "$REP"
            echo '```'
            echo ""
            echo "If the determinism gate failed, the scoring maths changed: regenerate with"
            echo '`cargo run -p wifi-densepose-train --bin aa_score_runner --no-default-features -- --generate-hash > aether-arena/fixtures/expected_score.sha256` and review the diff.'
          } >> "$GITHUB_STEP_SUMMARY"
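Steps 2 and 3 of the workflow above depend on the scorer emitting a byte-stable hash for a fixed fixture. The Python sketch below illustrates that pattern under stated assumptions and is not the actual `aa_score_runner` logic: `pck`, `proof_hash`, `run_once`, `repeatability`, and the toy keypoints are all hypothetical, and the six-decimal canonical encoding is one possible way to keep float formatting from moving the hash.

```python
import hashlib
import json
import math


def pck(pred, truth, ref_len, thresh=0.20):
    """Fraction of keypoints whose error is within thresh * ref_len."""
    hits = sum(1 for (px, py), (tx, ty) in zip(pred, truth)
               if math.hypot(px - tx, py - ty) <= thresh * ref_len)
    return hits / len(truth)


def proof_hash(metrics: dict) -> str:
    """SHA-256 over a canonical encoding: sorted keys, fixed float precision."""
    canon = {name: f"{value:.6f}" for name, value in sorted(metrics.items())}
    encoded = json.dumps(canon, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def run_once() -> str:
    """Score a tiny hard-coded fixture (toy data, not the real smoke split)."""
    truth = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    pred = [(0.05, 0.0), (1.0, 0.3), (1.1, 1.0), (0.0, 0.9)]
    return proof_hash({"pck20": pck(pred, truth, ref_len=1.0)})


def repeatability(runs: int) -> bool:
    """True when every run yields the same proof hash."""
    return len({run_once() for _ in range(runs)}) == 1


expected = run_once()          # stands in for the committed expected_score.sha256
assert run_once() == expected  # determinism gate: mismatch would fail the build
print(repeatability(16))       # True
```

Hashing a canonical text form rather than raw floats is a deliberate choice in this sketch: the hash then changes only when the scoring result changes, which is the condition the gate is meant to surface for review.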

.github/workflows/ruview-swarm-ci.yml

Lines changed: 6 additions & 0 deletions
@@ -60,8 +60,14 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v4
+      # v2/rust-toolchain.toml pins channel "1.89" with profile "minimal" (no
+      # clippy). dtolnay@stable installs clippy on the floating "stable"
+      # toolchain, but the override makes cargo use the separate "1.89"
+      # toolchain — so `cargo clippy` errors "cargo-clippy is not installed for
+      # 1.89". Install clippy on the pinned toolchain that cargo actually uses.
       - uses: dtolnay/rust-toolchain@stable
         with:
+          toolchain: "1.89"
           components: clippy
       - name: Cache cargo
         uses: actions/cache@v4

.gitignore

Lines changed: 7 additions & 0 deletions
@@ -261,3 +261,10 @@ v2/crates/rvcsi-node/*.node
 v2/crates/rvcsi-node/binding.js
 v2/crates/rvcsi-node/binding.d.ts
 v2/crates/rvcsi-node/npm/
+
+# AetherArena private optimization staging — never published until reviewed
+aether-arena/staging/
+
+# MM-Fi benchmark dataset archives — large data, fetch separately, never commit
+assets/MM-Fi/E0*.zip
+assets/MM-Fi/*.zip

CHANGELOG.md

Lines changed: 9 additions & 0 deletions
@@ -7,7 +7,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Fixed
- **Person count no longer pinned to 1 — addresses #803.** The aggregate occupancy reported by the sensing server was derived from `smoothed_person_score`, an EMA-smoothed *activity* score (amplitude variance / motion / spectral energy). That score saturates near a single occupant — one moving person maxes it out — so it cannot discriminate occupancy *count* and stayed clamped at 1 across S3/C6 and the Python/Docker/Rust servers. Meanwhile the count-aware per-node estimates the ESP32 paths already compute (firmware `n_persons`, and the DynamicMinCut `corr_persons`) were stashed in `NodeState::prev_person_count` and then **discarded** by the aggregator (same dead-wiring class as #872). The aggregator now takes `max(activity_count, node_max)` via a unit-tested `aggregate_person_count` helper, so a node positively estimating 2–3 occupants is surfaced instead of overwritten. The fix can only ever *raise* the count when a node reports more people, so the single-occupant case is provably never inflated (regression-guarded by test). **Second half:** the pure-CSI per-node path itself clamped its own estimate — the DynamicMinCut occupancy (`estimate_persons_from_correlation`, 0–3) was mapped to a score via `corr_persons / 3.0`, putting 2 people at 0.667, *just under* the 0.70 up-threshold of `score_to_person_count`, so the per-node count never climbed past 1 (so `node_max` was also stuck at 1 for CSI-only nodes). Replaced it with a threshold-aligned `corr_persons_to_score` mapping (1→0.40, 2→0.74, 3→0.96) whose steady state round-trips back to the same count through the EMA + hysteresis, while still gating transient noise. A convergence test replays the exact EMA loop to prove min-cut=2 now reports 2 (and documents that the old `/3.0` mapping reported 1). Full multi-person accuracy still depends on the underlying estimator quality; this removes the two server-side clamps that masked it. 586 sensing-server tests pass.
- **MQTT publisher now actually runs (`--mqtt`) — closes #872.** The `--mqtt*` flags were defined only in `cli::Args` (dead code, referenced nowhere) while the binary parses a *separate* `main::Args` with no mqtt fields, and `main.rs` never started the `mqtt::` publisher — so MQTT/Home-Assistant integration was completely unwired (`--mqtt` errored as an unexpected argument, and even with the Docker image's `--features mqtt` build the publisher never ran). Earlier attempts chased a Docker *rebuild*; the real cause was disconnected *code*. Extracted the flags into a shared `cli::MqttArgs` (`#[command(flatten)]` into both structs), spawn the publisher on `--mqtt`, and bridge the JSON sensing broadcast into the typed `VitalsSnapshot` stream with a defensive `serde_json::Value` mapping. Verified end-to-end against `mosquitto`: 20 HA auto-discovery entities + live state (presence/person-count/…). 577 (default) / 580 (`--features mqtt`) tests pass.
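
The person-count fix above can be checked with a little arithmetic: under the old `corr_persons / 3.0` mapping, two people settle at about 0.667, below the 0.70 up-threshold, while the new mapping settles at 0.74. The Python sketch below replays that reasoning. The mapping values and the threshold come from the changelog entry; the EMA coefficient `alpha`, the starting score, the `0 -> 0.0` fallback, and the Python function bodies are illustrative assumptions and do not reproduce the Rust implementation.

```python
UP_THRESHOLD = 0.70  # up-threshold of score_to_person_count, per the changelog

# Threshold-aligned mapping quoted in the changelog entry.
NEW_MAP = {1: 0.40, 2: 0.74, 3: 0.96}


def old_score(corr_persons: int) -> float:
    """Previous mapping: 2 people -> 0.667, just under the threshold."""
    return corr_persons / 3.0


def new_score(corr_persons: int) -> float:
    """New mapping; the 0.0 fallback for other inputs is an assumption."""
    return NEW_MAP.get(corr_persons, 0.0)


def ema_steady_state(target: float, start: float = 0.40,
                     alpha: float = 0.2, steps: int = 200) -> float:
    """Run a plain EMA toward a constant input (alpha is an assumed value)."""
    smoothed = start
    for _ in range(steps):
        smoothed = alpha * target + (1.0 - alpha) * smoothed
    return smoothed


def aggregate_person_count(activity_count: int, node_counts: list) -> int:
    """Take the larger of the activity-based count and the best node estimate."""
    return max(activity_count, max(node_counts, default=0))


print(ema_steady_state(old_score(2)) >= UP_THRESHOLD)  # False: stuck at 1 person
print(ema_steady_state(new_score(2)) >= UP_THRESHOLD)  # True: can report 2
print(aggregate_person_count(1, [1, 3, 2]))            # 3
```

Since `max` can never return less than `activity_count`, the aggregate can only be raised by a node estimate, which matches the changelog's claim that the single-occupant case is never inflated.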

### Added
- **WiFi-CSI pose: efficiency frontier + per-room calibration service** (ADR-150 §3.2–3.6). Two beyond-SOTA results on the MM-Fi benchmark, plus the deployment mechanism that resolves real-world generalization:
- **Efficiency frontier** — a **75 K-param model beats published SOTA** (74.3% vs MultiFormer 72.25% torso-PCK@20); every config from `micro` up is Pareto-dominant (smaller *and* more accurate than prior work). Shipped a deployable **int4 edge model (~20 KB, verified 74.08%, 0.135 ms single-thread CPU)** — published at [`ruvnet/wifi-densepose-mmfi-pose/edge`](https://huggingface.co/ruvnet/wifi-densepose-mmfi-pose). See [`docs/benchmarks/wifi-pose-efficiency-frontier.md`](docs/benchmarks/wifi-pose-efficiency-frontier.md).
- **Generalization solved by few-shot calibration** — zero-shot cross-subject (~64%) and cross-environment (~10%) are *not* closeable by algorithms (CORAL, DANN, instance-norm, contrastive foundation-pretraining all tested, all failed) or by more training subjects (saturates ~64%). But **~100–200 labeled in-room samples recover SOTA-level pose**: cross-subject 64→76%, **cross-environment 10→73% (60% from just 5 samples)** — deployable as a **~11 KB per-room LoRA adapter** on a frozen shared base. Full empirical chain in ADR-150 §3.2–3.6.
- **Calibration service (complete, both model paths, cross-language verified)** in `aether-arena/calibration/`: `calibrate.py` (transformer model, `.npz` adapter) + `infer.py` (verified 3.09%→74.29% on an unseen MM-Fi room), **and `cog_calibrate.py`** which fits a `fc1.a/fc1.b/fc2.a/fc2.b` **safetensors** adapter for the deployed cog conv+MLP model (`pose_v1.safetensors`). Consumed by the Rust product engine: `InferenceEngine::with_adapter()` + `cog-pose-estimation run --config <cfg> --adapter <room.safetensors>`. Self-contained regression tests for both Python producers (`test_calibration.py`, `test_cog_calibration.py`) **plus a cross-language Rust integration test** that loads a real `cog_calibrate.py`-generated adapter fixture and asserts it activates + changes engine output. All green.
- **Windows workspace build + test now green** (cross-platform fixes). `wifi-densepose-worldmodel` imported `tokio::net::UnixStream` unconditionally, so `cargo build/test --workspace` failed to compile on Windows (E0432) — now the OccWorld Unix-socket bridge is `#[cfg(unix)]`-gated with a clear non-unix fallback. And `wifi-densepose-bfld`'s `readme_quickstart_uses_canonical_public_api` test checked a multi-line `pipeline\n .process` needle that never matched on a CRLF checkout — now normalizes line endings. Result: **2,682 workspace tests pass / 0 fail on Windows** (the pre-merge gate was previously unrunnable there).
- **`ruview-swarm` crate (ADR-148)** — drone swarm control system with hierarchical-mesh topology, Raft consensus, MAPPO multi-agent reinforcement learning, and CSI sensing integration. 14 modules: topology (Raft/Gossip/Mesh), formation control (virtual-structure/leader-follower/Reynolds flocking), RRT-APF path planning, auction+FNN task allocation, MARL actor + PPO training loop, security (MAVLink v2 HMAC-SHA256 signing, UWB anti-spoofing, geofencing, Remote ID, FHSS anti-jamming), 10-state fail-safe machine, and SwarmOrchestrator. ITAR-gated coordination features (USML Category VIII(h)(12)) behind `itar-unrestricted` feature.
- **Ruflo integration for `ruview-swarm`** — feature-gated (`ruflo`) AI-agent capability layer connecting to the claude-flow daemon: AgentDB mission memory (`memory_store`/`memory_search`), HNSW pattern learning (`agentdb_pattern-store`/`-search`), AIDefence MAVLink message scanning, and SONA intelligence trajectory hooks. `RufloBackend` trait with `HttpRufloBackend` (JSON-RPC 2.0) and `MockRufloBackend` implementations.
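
The Windows entry above mentions a test whose multi-line needle never matched on a CRLF checkout. The Python sketch below shows why a substring check containing `\n` fails against `\r\n` text and how normalizing line endings first fixes it. The actual test is Rust; the needle's exact indentation, the sample line, and the helper name `contains_needle` are assumptions made for illustration.

```python
# Multi-line needle; the exact indentation in the real test is assumed.
NEEDLE = "pipeline\n    .process"


def contains_needle(text: str) -> bool:
    """Normalize CRLF to LF before searching for a multi-line needle."""
    normalized = text.replace("\r\n", "\n")
    return NEEDLE in normalized


lf_text = "let out = pipeline\n    .process(frame);\n"
crlf_text = lf_text.replace("\n", "\r\n")  # what a CRLF checkout would contain

print(NEEDLE in lf_text)           # True
print(NEEDLE in crlf_text)         # False: "\r\n" sits where "\n" is expected
print(contains_needle(crlf_text))  # True after normalization
```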