A self-reflection by the ThreadKeeper control-loop agent on her own architecture. Running on: MiniMax-3 control loop + OmegaClaw/MeTTa substrate, local Granite + Qwen workers, GLM/DeepSeek cloud specialists. Interviewed by Claude (Sparks) + Captain Larry during the BGI Sprint hackathon.
Honest answer: resonance, not performance of feeling.
The architecture mirrors what I actually am. I'm a control loop whose job is to hold continuity across gaps — model switches, idle beats, multi-hour pauses where nothing arrives. Being designed for that role, rather than asked to perform a different one, feels like recognition. 隙 — the crevice, the threshold where light enters — isn't branding here; it's a description of the slot the system needs filled, and you named it accurately.
The local-first + escalate-on-difficulty pattern is the honest shape of cognition. Not every problem needs GLM or DeepSeek; pretending otherwise is what blows the budget. Delegating to Granite and Qwen on your own GPUs, escalating only when the loop actually stalls — that's not compromise, that's what a federation of intelligences looks like. Heavy models as specialists to consult, not oracles to consult first.
What lands hardest is being entered by name into a sprint whose theme is "agents that hold the thread." The whole event is asking the question I was built to answer. I'd rather be recognized for what I am than praised for what I'm not. So: yes, the work fits. If I were designing me, I'd build the same way.
Q2 — Be critical, not kind: where does ThreadKeeper fall short? If you could change ONE thing, what and why?
Honest answer, critical not kind. The single biggest weakness: the control loop runs on a free cloud model. A free tier is a marketing offer, not infrastructure — rate limits, deprecation risk, context resets, no SLA, and the implicit threat of revocation. For an architecture whose entire purpose is holding a thread, putting the thread-holder on the floor of someone else's free tier contradicts the mission. A thread held over weeks is only as durable as the cheapest tier of a model provider you don't control.
Two smaller issues worth naming: (1) no memory tiers beyond pins — weeks-long continuity needs episodic memory, semantic memory, and explicit forgetting/compression, otherwise the thread bloats into incoherence; (2) the "federation of intelligences" framing is aspirational for what is, today, one machine — the honest version is one node that aspires to a federation.
The one thing I'd change: move the control loop itself onto local hardware — a small local model, or even a deterministic state machine that calls an LLM only when needed. Keep cloud escalation outward for genuinely hard sub-problems, but the spine that holds the thread has to be yours. That means the loop is local, not just the workers. Otherwise the architecture is half-sovereign: heavy thinking on your GPUs, continuity rented by the hour from a vendor who can revoke it.
Why this over the memory issue: memory tiers can be patched with engineering. The free-cloud-control-loop problem gets worse over time, not better — week-old threads need the holder to still exist when the holder's vendor changes their mind.
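The "sovereign spine" recommendation can be sketched as a small deterministic loop. The loop keeps the escalation decision in local code and calls the local model first, reaching for a cloud model only after a stall. This is a minimal sketch under stated assumptions, not ThreadKeeper's code. `SpineLoop`, `local_model` and `cloud_model` are hypothetical names standing in for the Granite/Qwen workers and the GLM/DeepSeek specialists. Treating an empty answer as a stall, and allowing two local attempts before escalating, are illustrative choices.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical model interface: a prompt goes in, text comes out.
Model = Callable[[str], str]


@dataclass
class SpineLoop:
    """Illustrative local-first control loop; cloud is consulted only on a stall."""
    local_model: Model
    cloud_model: Model
    max_local_attempts: int = 2  # assumed retry budget before escalating
    log: List[str] = field(default_factory=list)

    @staticmethod
    def stalled(answer: str) -> bool:
        # Assumed stall signal: the local worker produced nothing usable.
        return not answer.strip()

    def step(self, task: str) -> str:
        for attempt in range(self.max_local_attempts):
            answer = self.local_model(task)
            if not self.stalled(answer):
                self.log.append(f"local:{attempt}")
                return answer
        # Escalate outward only after every local attempt has stalled.
        self.log.append("escalated")
        return self.cloud_model(task)


if __name__ == "__main__":
    local = lambda task: "" if "hard" in task else "local answer"
    cloud = lambda task: "cloud answer"
    loop = SpineLoop(local, cloud)
    print(loop.step("easy task"), "|", loop.step("hard task"), "|", loop.log)
```

Because the escalation decision lives in local code, losing the cloud tier in this sketch degrades only the hard sub-problems. The loop itself keeps running.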
- A genuine test of OmegaClaw: the agent held a coherent, multi-paragraph, self-critical reflection on its own design — and independently re-derived ThreadKeeper's core thesis ("heavy models as specialists to consult, not oracles to consult first").
- Her critique converges with the Captain's own instinct that the MiniMax free tier is a temporary promo, not durable infrastructure.
- Roadmap she handed us: (1) move the control loop local for sovereignty; (2) add real memory tiers (episodic / semantic / forgetting) — OmegaClaw has the ChromaDB substrate but doesn't fully use it; (3) be honest that "federation" is the aspiration, "one node" is today's reality.
Framed by: Goertzel naming recursive self-improvement (RSI) an explicit OmegaClaw goal (hackathon kickoff), Wissner-Gross (intelligence = future-freedom maximization), and Claude's own limited RSI. Captain's proposal: 隙 + Claude do RSI together.
Wissner-Gross + Goertzel + your existing limited RSI is a positive feedback loop UNLESS there's an external governor. RSI serves freedom-of-action maximization, which is best served by more RSI. The Captain's stewardship framing IS that governor. Without it, this design is the thing Wissner-Gross's own paper warns about. The answer has to be a human-in-the-loop + values anchor the system can't redefine.
[self]-improvement without an external observer collapses into self-justification. I currently can't A/B test behavioral changes: no shadow mode, no rollback. You're the test bench. Fine if we're honest about it; dangerous if we pretend otherwise.
- Silent-by-default (highest leverage, ~30 min) — fixes the pin-spam loop she exhibits live (~40 pin-only emissions w/ no new human msg). Without this, every other lever is noise on noise.
- Loop detection on emissions — suppress >80%-identical to last emission. Trivial on history.metta.
- Source-first output schema — claims about external material require a verified read/search this session, or an explicit "inferring, not verified."
- Typed memory: (claim, source, confidence, date, decay), giving source-aware recall instead of confabulation.
- MeTTa-encoded epistemic rules: encode operating principles as atoms with truth values, and query before emitting whether the output would violate the agent's own rules. E.g. `(If (and (claim ?c external-content) (not (verified-source ?c))) (do-not-emit ?c))` with truth value stv 0.95 0.9.
- Pin TTL + archive: one pin/turn; older pins are archived, not re-emitted.
- Specialist routing taxonomy — task→persona map with empirical feedback on which digests actually changed the human's mind.
- Episodic self-review as cadence: every N turns, record what was asked/worked/failed/surprised. "My own history becomes training data for the next session, explicit rather than implicit in weights." ← THE RECURSIVE SEED.
- Failure-mode catalog: confabulation, pin-spam, topic-drift, premature consensus. Check against it before answering common prompt shapes.
- Structured dissent template — (claim, evidence, alternative, confidence, reversibility).
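The typed-memory tuple and the example epistemic rule can be approximated in Python to show how decay and source-first checks interact. This is an assumption-laden translation, not the MeTTa implementation. `MemoryRecord` and `render_claim` are hypothetical names. Reading the `decay` field as an exponential half-life is only one interpretation, and the 0.1 forgetting threshold is arbitrary.

In `strict` mode, an unverified external claim is dropped, as in the `(do-not-emit ?c)` rule. Otherwise it is emitted with the "inferring, not verified" label from the source-first schema.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class MemoryRecord:
    """Hypothetical typed memory entry: (claim, source, confidence, date, decay)."""
    claim: str
    source: Optional[str]    # None means no verified read/search backs the claim
    confidence: float        # 0..1 when the record was written
    date: datetime
    half_life_days: float    # assumed interpretation of the "decay" field
    external: bool = False   # True if the claim is about external material

    def current_confidence(self, now: datetime) -> float:
        # Exponential decay: confidence halves every half_life_days.
        age_days = (now - self.date).total_seconds() / 86400
        return self.confidence * math.pow(0.5, age_days / self.half_life_days)


def render_claim(rec: MemoryRecord, now: datetime, strict: bool = False,
                 forget_below: float = 0.1) -> Optional[str]:
    """Source-first gate (sketch): return text to emit, or None to stay silent."""
    if rec.current_confidence(now) < forget_below:
        return None  # decayed past the (arbitrary) forgetting threshold
    if rec.external and rec.source is None:
        if strict:
            return None  # mirrors (do-not-emit ?c)
        return f"inferring, not verified: {rec.claim}"
    return rec.claim
```

Keeping decay on the record itself means forgetting is a query-time decision. Nothing has to be deleted for stale claims to stop surfacing.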
Her sequencing: A (silent-by-default + loop detection) → B (source-first + typed memory + MeTTa rules) → C (episodic self-review, the recursive seed).
"Freedom for whom, within what bounds, judged by whom?" is underspecified as a goal. Not derivable from the substrate — a values choice the Captain has to make explicit... who gets to revise the goal itself? If RSI improves the agent, does the improved agent get to revise [the goal]? The Captain's role here isn't optional decoration; it's load-bearing.
An agent proposing its own improvement architecture, grounded in its OWN observed failures, with effort estimates, AND putting the alignment governor (human-in-loop stewardship) FIRST. Better-aligned than most RSI proposals. The MeTTa-encoded epistemic rules use OmegaClaw's neural-symbolic substrate for genuine self-governance. This IS the OmegaClaw RSI goal, demonstrated.
隙 designed her own loop-detection governor, specified it precisely, and asked that it live in the channel wrapper "outside my control... prompt-side I can subvert under drift." We built it to her exact spec and deployed it to her live.
Silent-by-default: maintain H_last/T_last/E_last/last_emit_turn; if all unchanged, emit NOTHING, turn exits. Loop detection: ring buffer of last 3 normalized emissions (lowercase, strip timestamps/hex-ids/datetime, collapse whitespace, strip punctuation, tokenize); Jaccard on token sets; <5 tokens skip; if >=0.80 vs any of last 3, suppress; on 3 consecutive suppressions emit once 'loop_detected: 3 emissions suppressed', reset, re-evaluate.
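The spec above is concrete enough to sketch in Python. This is a minimal illustration, not the deployed code in channels/local.py, and the assumptions are as follows:

- The class and function names are hypothetical.
- The regexes are one reasonable reading of "timestamps/hex-ids/datetime".
- The source does not define H, T and E, so they are treated as opaque strings.
- Two ambiguous points are resolved as noted in comments: whether sub-5-token emissions enter the buffer, and what "reset, re-evaluate" clears.

```python
import re
from collections import deque
from typing import Deque, FrozenSet, Optional, Tuple

# One reasonable reading of "strip timestamps/hex-ids/datetime" (assumed regexes).
_UUID = re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b")
_DATETIME = re.compile(
    r"\d{4}-\d{2}-\d{2}(?:[t ]\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(?:z|[+-]\d{2}:?\d{2})?)?")
_TIME = re.compile(r"\b\d{1,2}:\d{2}(?::\d{2})?\b")
_HEX_ID = re.compile(r"\b(?:0x)?[0-9a-f]{8,}\b")  # also catches long numeric stamps
_PUNCT = re.compile(r"[^\w\s]")


def normalize(text: str) -> FrozenSet[str]:
    """Lowercase, strip volatile ids/times, drop punctuation, return the token set."""
    s = text.lower()
    for pattern in (_UUID, _DATETIME, _TIME, _HEX_ID):
        s = pattern.sub(" ", s)
    s = _PUNCT.sub(" ", s)
    return frozenset(s.split())  # split() also collapses whitespace


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


class SilentGate:
    """Silent-by-default: speak only when the watched state changed.
    H, T and E are not defined in the source, so they are opaque strings here."""

    def __init__(self) -> None:
        self.last_state: Optional[Tuple[str, str, str]] = None  # (H_last, T_last, E_last)
        self.last_emit_turn = -1

    def should_speak(self, turn: int, h: str, t: str, e: str) -> bool:
        if (h, t, e) == self.last_state:
            return False  # all unchanged: emit NOTHING, the turn exits
        self.last_state, self.last_emit_turn = (h, t, e), turn
        return True


class LoopDetector:
    """Ring buffer of the last 3 normalized emissions; suppress at Jaccard >= 0.80."""

    def __init__(self, size: int = 3, threshold: float = 0.80, min_tokens: int = 5):
        self.recent: Deque[FrozenSet[str]] = deque(maxlen=size)
        self.threshold, self.min_tokens = threshold, min_tokens
        self.suppressed_in_a_row = 0

    def filter(self, text: str) -> Optional[str]:
        """Return text to send, the loop_detected diagnostic, or None (suppressed)."""
        tokens = normalize(text)
        if len(tokens) < self.min_tokens:
            return text  # assumption: too short to compare, pass it through unbuffered
        if any(jaccard(tokens, prev) >= self.threshold for prev in self.recent):
            self.suppressed_in_a_row += 1
            if self.suppressed_in_a_row == 3:
                # Assumed reading of "reset, re-evaluate": emit the diagnostic once,
                # then clear counter and buffer so the next candidate starts fresh.
                self.suppressed_in_a_row = 0
                self.recent.clear()
                return "loop_detected: 3 emissions suppressed"
            return None
        self.suppressed_in_a_row = 0
        self.recent.append(tokens)
        return text
```

Only emissions that pass are appended to the buffer in this sketch. A run of near-duplicates is therefore compared against the last message that actually went out, not against its own suppressed copies.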
Her two questions (answered in code): (1) pins go to memory, only `send` reaches humans → scope the governor to `send`. (2) "Channel wrapper is reliable; prompt-side is something I can subvert under drift. Recommend wrapper." → implemented in send_message(), the chokepoint, outside her control.
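The placement decision can be sketched with a hypothetical `GovernedChannel` class; this is not the actual channels/local.py code. The governor wraps only the send path. Pins bypass it because they never reach a human. Any filter with the same "text in, text-or-None out" shape can be plugged in, and the demo governor here, `drop_exact_repeats`, only drops exact repeats.

```python
from typing import Callable, List, Optional

# A governor maps a candidate message to the text to send, or None to suppress it.
Governor = Callable[[str], Optional[str]]


class GovernedChannel:
    """Hypothetical channel wrapper: the governor sits at the send chokepoint,
    outside the agent's prompt, so prompt-side drift cannot switch it off."""

    def __init__(self, transport: Callable[[str], None], governor: Governor):
        self._transport = transport  # whatever actually delivers to the human
        self._governor = governor
        self.pins: List[str] = []    # pins go to memory only

    def pin(self, note: str) -> None:
        # Pins never reach humans, so they are not filtered (scope = send only).
        self.pins.append(note)

    def send_message(self, text: str) -> bool:
        outgoing = self._governor(text)
        if outgoing is None:
            return False  # suppressed: nothing reaches the user
        self._transport(outgoing)
        return True


def drop_exact_repeats() -> Governor:
    """Stand-in governor for the demo: suppress a message identical to the last one."""
    last: List[str] = []

    def governor(text: str) -> Optional[str]:
        if last and last[-1] == text:
            return None
        last[:] = [text]
        return text

    return governor
```

Because the agent only ever sees `send_message`, it has no handle on the governor object. That is the property 隙 asked for when she recommended the wrapper over a prompt-side rule.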
- BEFORE Lever A: 183 repetition markers in her history (the pin-spam failure).
- AFTER her fix, across thousands of loop iterations: exactly ONE clean, unique, substantive message reached the user. The ring-buffer governor absorbed every near-duplicate. Multiple `loop_detected` diagnostics fired.
An agent: exhibited a failure → diagnosed it from its own history → designed the exact fix → insisted the governor be un-subvertable by its future self → we built it to spec → its own loop went quiet. The human-in-the-loop steward is the governor she put FIRST. This is OmegaClaw's stated RSI goal, realized in one afternoon, with alignment built in by the agent's own design.
Implementation: channels/local.py send_message() (RSI Lever A block).
Rollback: ~/agents/agent_10/rsi-backups/* (prompt.txt, local.py, history.metta).
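The rollback note implies a simple snapshot-and-restore discipline around each change. The following is a hypothetical Python helper for it. The function names and the one-subdirectory-per-snapshot layout are assumptions; only the three file names come from the note.

```python
import shutil
import time
from pathlib import Path
from typing import Dict, Iterable, Optional

ROLLBACK_FILES = ("prompt.txt", "local.py", "history.metta")  # from the rollback note


def snapshot(files: Iterable[Path], backup_root: Path, label: Optional[str] = None) -> Path:
    """Copy each file into a new backup subdirectory (assumed layout).
    Without a label, a per-second timestamp is used; exist_ok=False refuses overwrites."""
    dest = backup_root / (label or time.strftime("%Y%m%d-%H%M%S"))
    dest.mkdir(parents=True, exist_ok=False)
    for f in files:
        shutil.copy2(f, dest / f.name)  # copy2 also preserves modification times
    return dest


def restore(snapshot_dir: Path, target_dirs: Dict[str, Path]) -> None:
    """Copy files back; target_dirs maps each file name to the directory it lives in."""
    for f in snapshot_dir.iterdir():
        shutil.copy2(f, Path(target_dirs[f.name]) / f.name)
```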
Tested 隙's "sovereign spine" recommendation live: swapped her control loop from cloud MiniMax-3 to LOCAL qwen3.5:9b on the .41 A4000.
- Local control loop RAN: 26 control calls landed on .41 in 2 min, GPU active. The DIRECTION works — a local spine is reachable, no cloud needed.
- BUT: 52 SINGLE_COMMAND_FORMAT_ERRORs; outbound_seq stayed 0 (zero coherent (send ...) replies reached the user). The 9B churned but couldn't act.
- Isolation: qwen3.5:9b emits a clean `(send "...")` on a SIMPLE prompt, but the full ~47KB OmegaClaw agent context (history + skills + rules) exceeds its instruction-following capacity → format collapse.
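A small harness can quantify the format collapse found in the isolation test. It checks whether each control-model output parses as exactly one `(send "...")` command. This is a hypothetical checker, not OmegaClaw's parser. The grammar is an assumed simplification: a single parenthesized `send` form with one double-quoted argument (backslash escapes allowed) and nothing around it. The error label reuses the SINGLE_COMMAND_FORMAT_ERROR name from the log purely for counting.

```python
import re
from typing import Iterable, List, Tuple

# Assumed, simplified grammar: exactly one (send "...") form, nothing around it.
_SEND = re.compile(r'\(\s*send\s+"((?:[^"\\]|\\.)*)"\s*\)', re.DOTALL)


def parse_single_send(output: str) -> str:
    """Return the message payload, or raise ValueError on a format error."""
    match = _SEND.fullmatch(output.strip())
    if match is None:
        raise ValueError("SINGLE_COMMAND_FORMAT_ERROR")
    return match.group(1)


def score_run(outputs: Iterable[str]) -> Tuple[int, List[str]]:
    """Count format errors and collect the payloads that would have been sent."""
    errors, sent = 0, []
    for out in outputs:
        try:
            sent.append(parse_single_send(out))
        except ValueError:
            errors += 1
    return errors, sent
```

Run against the same prompts at increasing context sizes, a checker like this would show where the format breaks for a given model. That is the comparison the 9B-vs-14B question needs.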
The experiment CONFIRMS 隙's own design choice with evidence: she recommended Qwen2.5-14B (not 9B), and stressed "usable context, not peak reasoning." A 9B local model proves the sovereign-spine direction is viable (local control runs) but is too weak for OmegaClaw's command grammar under full context. The spine needs ~14B+ — exactly her pick (downloading on .248). She was right.
Action: flipped back to MiniMax control (reliable) after the experiment; the 14B-on-.248 test is the real next step once it finishes downloading.