
Commit 31735c5

feat(voice): conversation-guided elevation + spoken interface configuration (ADR-110 D3b)
Voice inside the immersive interface becomes the primary elevation guide and a control surface for the interface itself — fully local (Whisper STT in, Kokoro TTS out), per ADR-110 D3b.

- src/actors/elevation_voice.rs: pure voice-signal layer — normalised n-gram ConceptIndex over elevatable labels (frontier stubs + working pages), greedy longest-match mention harvesting, VoiceDemandLedger with 30-minute half-life decay and excerpt/speaker provenance, explicit-intent parsing ('elevate X', 'formalise X', 'make X a class')
- elevation_actor: transcription stream subscription; conversational demand is now the PRIMARY candidate ranking (degree breaks ties / carries the queue when nobody is talking); voice-driven cases open at high priority with mention counts and utterance excerpts in the review surface; explicit spoken commands jump the queue and are confirmed aloud via Kokoro; concept index rebuilt from each graph snapshot
- src/actors/voice_interface_actor.rs: spoken view/graph configuration requests route to the SAME settings assistant the Control Center command box drives (settings_assistant_task extracted from bots_handler for reuse); conservative verb+noun intent gate keeps ordinary speech out; spoken confirmations; active whenever the local speech stack is up
- per-user/room attribution slots in when the XR LiveKit voice path lands — the ledger already models speakers

20 module tests green (7 ACSP contract, 11 elevation+voice, 2 interface intent); clippy clean.

Co-Authored-By: jjohare <github@thedreamlab.uk>
1 parent 88499b0 commit 31735c5

7 files changed

Lines changed: 831 additions & 62 deletions


docs/adr/ADR-110-agentic-actors-acsp-control-surfaces.md

Lines changed: 26 additions & 0 deletions
@@ -90,6 +90,32 @@ infrastructure; actors compose it rather than re-implementing transport.
queued candidates: sync governance (force-resync action), physics health,
agent telemetry.

### D3b — Voice is a first-class guide (local Whisper in, local Kokoro out)

Conversation inside the immersive interface guides elevation **primarily**,
and drives interface configuration. Both directions stay sovereign: STT is
the local Whisper provider, confirmations speak back through local Kokoro
TTS. Two consumers tap `SpeechService::subscribe_to_transcriptions`:

- **Elevation demand** (`src/actors/elevation_voice.rs`): transcription lines
are matched against a normalised n-gram index of the graph's elevatable
labels (frontier stubs + working pages, rebuilt per cycle). Mentions feed a
decaying demand ledger (30-minute half-life) that becomes the *primary*
candidate ranking — degree only breaks ties or carries the queue when
nobody is talking. Voice-driven cases carry provenance (mention counts,
utterance excerpts, speakers when the voice path attributes them) and open
at `high` priority. Explicit commands — "elevate X", "formalise X", "make X
a class" — jump the queue entirely and are confirmed aloud. Per-user/room
attribution slots in when the XR voice path (LiveKit) lands; the ledger
already models speakers.
- **Interface configuration** (`src/actors/voice_interface_actor.rs`): spoken
view/graph requests ("hide the ontology nodes", "increase spring strength")
route to the **same settings assistant** the Control Center command box
drives (`settings_assistant_task` → agentbox LLM → settings REST), with a
conservative verb+noun intent gate so ordinary conversation is never
hijacked. One assistant, two mouths: typed in the UX control centre or
spoken in the immersive session.
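
The label matching described under **Elevation demand** could look roughly like the sketch below. It is an illustration only, not the code in `src/actors/elevation_voice.rs`: the shape of `ConceptIndex`, the helper names `normalise`, `build`, and `mentions`, and the tokenisation rule (lowercase, split on non-alphanumerics) are all assumptions about what "normalised n-gram index" and "greedy longest-match" mean here.

```rust
use std::collections::HashMap;

/// Assumed normalisation: lowercase, then split on anything non-alphanumeric.
fn normalise(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect()
}

/// Hypothetical stand-in for ConceptIndex: normalised token n-gram -> original label.
struct ConceptIndex {
    by_tokens: HashMap<Vec<String>, String>,
    max_len: usize, // longest label, in tokens
}

impl ConceptIndex {
    /// Build the index from the elevatable labels of one graph snapshot.
    fn build(labels: &[&str]) -> Self {
        let mut by_tokens = HashMap::new();
        let mut max_len = 0;
        for label in labels {
            let tokens = normalise(label);
            if tokens.is_empty() {
                continue;
            }
            max_len = max_len.max(tokens.len());
            by_tokens.insert(tokens, label.to_string());
        }
        Self { by_tokens, max_len }
    }

    /// Greedy longest-match: at each position try the longest n-gram first,
    /// and on a hit skip past the matched tokens so shorter labels inside it
    /// are not counted again.
    fn mentions(&self, utterance: &str) -> Vec<String> {
        let tokens = normalise(utterance);
        let mut found = Vec::new();
        let mut i = 0;
        while i < tokens.len() {
            let longest = self.max_len.min(tokens.len() - i);
            let hit = (1..=longest).rev().find_map(|n| {
                self.by_tokens
                    .get(&tokens[i..i + n])
                    .map(|label| (n, label.clone()))
            });
            match hit {
                Some((n, label)) => {
                    found.push(label);
                    i += n;
                }
                None => i += 1,
            }
        }
        found
    }
}
```

With labels `"Spring Strength"` and `"spring"` both indexed, an utterance containing "spring strength" yields only the longer label, which is the point of matching greedily.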
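
The decaying demand ledger and its role in ranking can be sketched in the same spirit. Only the 30-minute half-life and the "demand first, degree breaks ties" ordering come from the ADR. The `DemandLedger` type, its methods, the use of plain `f64` seconds for time, and the weight of 1.0 per mention are assumptions, and provenance (excerpts, speakers) is left out.

```rust
use std::collections::HashMap;

/// From the ADR: a mention loses half its weight every 30 minutes.
const HALF_LIFE_SECS: f64 = 30.0 * 60.0;

/// Hypothetical ledger entry: decayed score as of `updated_at_secs`.
#[derive(Debug, Clone, Copy)]
struct Demand {
    score: f64,
    updated_at_secs: f64,
}

/// Hypothetical stand-in for VoiceDemandLedger (provenance omitted).
#[derive(Debug, Default)]
struct DemandLedger {
    entries: HashMap<String, Demand>,
}

/// Exponential decay: the score halves every HALF_LIFE_SECS.
fn decay(score: f64, elapsed_secs: f64) -> f64 {
    score * 0.5_f64.powf(elapsed_secs.max(0.0) / HALF_LIFE_SECS)
}

impl DemandLedger {
    /// Record one mention: decay the stored score to `now_secs`, then add 1.0.
    fn record_mention(&mut self, concept: &str, now_secs: f64) {
        let entry = self.entries.entry(concept.to_string()).or_insert(Demand {
            score: 0.0,
            updated_at_secs: now_secs,
        });
        entry.score = decay(entry.score, now_secs - entry.updated_at_secs) + 1.0;
        entry.updated_at_secs = now_secs;
    }

    /// Current demand for a concept, or 0.0 if it was never mentioned.
    fn demand(&self, concept: &str, now_secs: f64) -> f64 {
        self.entries
            .get(concept)
            .map_or(0.0, |d| decay(d.score, now_secs - d.updated_at_secs))
    }

    /// Rank (label, degree) candidates: demand descending, then degree descending.
    /// With an empty ledger every demand is 0.0, so degree alone orders the queue.
    fn rank(&self, candidates: &[(&str, usize)], now_secs: f64) -> Vec<String> {
        let mut scored: Vec<(f64, usize, &str)> = candidates
            .iter()
            .map(|&(name, degree)| (self.demand(name, now_secs), degree, name))
            .collect();
        scored.sort_by(|a, b| b.0.total_cmp(&a.0).then(b.1.cmp(&a.1)));
        scored.into_iter().map(|(_, _, name)| name.to_string()).collect()
    }
}
```

Storing the score together with its last-update time means decay is applied lazily on read or write, so no background timer is needed to age the ledger.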
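
For the **Interface configuration** path, a conservative verb+noun gate might be as small as the following. The word lists and the function names `tokens` and `is_interface_request` are invented for illustration; the vocabulary actually used by `voice_interface_actor.rs` is not shown in this ADR.

```rust
/// Hypothetical vocabularies; the real actor's lists are not given in the ADR.
const VERBS: &[&str] = &["hide", "show", "increase", "decrease", "set"];
const NOUNS: &[&str] = &["nodes", "edges", "graph", "spring", "strength", "view"];

/// Lowercase the utterance and split it on non-alphanumeric characters.
fn tokens(utterance: &str) -> Vec<String> {
    utterance
        .to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect()
}

/// Conservative gate: pass only when a known verb is followed, later in the
/// same utterance, by a known noun. Everything else is treated as ordinary
/// conversation and never reaches the settings assistant.
fn is_interface_request(utterance: &str) -> bool {
    let words = tokens(utterance);
    match words.iter().position(|w| VERBS.contains(&w.as_str())) {
        Some(v) => words[v + 1..].iter().any(|w| NOUNS.contains(&w.as_str())),
        None => false,
    }
}
```

Requiring both a verb and a noun means "hide the ontology nodes" passes, while a remark that merely mentions nodes, or a bare "please hide", does not.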
### D4 — Elevation, the flagship case

`ElevationActor` (env-gated: `ELEVATION_ACTOR_ENABLED=1`):
