Is this a new feature, an improvement, or a change to existing functionality?
Improvement
How would you describe the priority of this feature request
High
Please provide a clear description of problem this feature solves
This is a framework proposal, not code. The current state has no auditability or task tracking, both of which are essential for agent testing.
The proposal covers parallel-execution agent management.
Task Bus (bus.py) — State Machine
queued → running → completed/failed/aborted/timed_out
Every state transition:
SET hcli:task:{id}:state with TTL
PUBLISH hcli:task:{id}:notify for subscribers
HMAC-SHA256 signed results to prevent spoofing
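As a sketch, here is one way the three steps above (SET with TTL, PUBLISH to subscribers, HMAC-SHA256 signing) could fit together. The key layout follows the bullets, but the TTL value, secret handling, and function names are assumptions, not the actual bus.py API:

```python
import hashlib
import hmac
import json

STATE_TTL = 3600  # assumed TTL in seconds; the real value isn't given


def sign(secret: bytes, payload: bytes) -> str:
    # HMAC-SHA256 over the serialized state prevents spoofed results
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify(secret: bytes, payload: bytes, sig: str) -> bool:
    # Constant-time comparison avoids timing side channels
    return hmac.compare_digest(sign(secret, payload), sig)


def transition(r, secret: bytes, task_id: str, state: str, result=None):
    """Record one state transition: SET with TTL, then PUBLISH.
    `r` is any redis-py-compatible client, injected so this stays testable."""
    body = json.dumps({"state": state, "result": result}).encode()
    # SET with TTL so stale task keys expire on their own
    r.set(f"hcli:task:{task_id}:state", body, ex=STATE_TTL)
    # Notify subscribers of every transition, carrying the signature
    r.publish(f"hcli:task:{task_id}:notify",
              json.dumps({"state": state, "sig": sign(secret, body)}))
```

A consumer reads the state key, recomputes the HMAC with the shared secret, and rejects any result whose signature does not verify.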
Dispatcher (dispatcher.py) — Concurrent Execution
ThreadPoolExecutor with semaphore gating (MAX_CONCURRENT_TASKS)
Per-chat serialization — tasks from the same chat are serialized via locks to protect session history
BLPOP on Redis queue with backpressure (semaphore acquired before popping)
Heartbeat file for health monitoring
Graceful SIGTERM shutdown with in-flight task drain
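The ordering in the BLPOP bullet is the important part: the semaphore is acquired before popping, so when all slots are busy, tasks wait in Redis (durable and observable) rather than piling up in process memory. A minimal sketch of that loop, where the queue name, task shape, and MAX_CONCURRENT_TASKS default are assumptions:

```python
import json
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_TASKS = 4  # assumed default


def dispatch_loop(r, run_task, stop):
    """Semaphore-gated dispatch: acquire a slot BEFORE BLPOP, so excess
    tasks stay queued in Redis instead of in process memory."""
    sem = threading.Semaphore(MAX_CONCURRENT_TASKS)
    pool = ThreadPoolExecutor(max_workers=MAX_CONCURRENT_TASKS)
    chat_locks = {}

    while not stop.is_set():
        sem.acquire()                     # backpressure: no free slot, no pop
        item = r.blpop("hcli:queue", timeout=1)
        if item is None:
            sem.release()                 # nothing popped: give the slot back
            continue
        task = json.loads(item[1])        # assumed shape: {"chat": ..., ...}
        lock = chat_locks.setdefault(task["chat"], threading.Lock())

        def job(task=task, lock=lock):
            try:
                with lock:                # per-chat serialization protects history
                    run_task(task)
            finally:
                sem.release()
        pool.submit(job)

    pool.shutdown(wait=True)              # graceful shutdown: drain in-flight tasks
```

Tasks from the same chat share one lock, so they run one at a time even while tasks from different chats proceed in parallel.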
Worker (worker.py) — Task Execution with Full Context
Builds per-task system prompts with session memory + skill injection
Spawns Claude as a subprocess with start_new_session=True (own process group for clean kill)
Abort mechanism: Redis pub/sub control channel per task, listener thread that os.killpg() on abort
Timeout enforcement (TASK_TIMEOUT, default 600s)
Session chunking to disk when size exceeds MAX_SESSION_BYTES
Conversation history tracking in Redis with idle sweep
The ANNOUNCEMENT/REPLY Protocol (development process)
This is separate from the runtime — it's how the codebase itself was built. The docs/decisions/ directory contains the actual artifacts:
architect-report-redis.md — The architect agent's report after 3 discussion rounds with 4 expert teams
core-reply.md — Core team's analysis of how the dispatcher split affects their MCP contract
orchestration-reply.md — Orchestration team's analysis of why single-container is correct
interface-reply.md, llm-reply.md — Other teams' responses
These show the actual protocol in action:
Architect pushes ANNOUNCEMENT.md to each team's branch
Each team analyzes independently, pushes REPLY.md
Architect synthesizes into architectural decisions (AD-1 through AD-12)
Contracts between teams are documented explicitly
For example, the architect-report shows how 4 AI expert teams debated MCP-over-Redis across 3 rounds and ultimately rejected it (AD-12), with clear reasoning from each team about risks, call chains, and container topology.
The Fundamental Gap
┌─────────────────────────────┬────────────────────────────────────────┬────────────────────────────────────────────────────────┐
│ Capability │ NeMo Agent Toolkit │ h-cli │
├─────────────────────────────┼────────────────────────────────────────┼────────────────────────────────────────────────────────┤
│ Parallel execution │ asyncio.gather() on tool calls │ ThreadPoolExecutor + semaphore + per-chat locks │
├─────────────────────────────┼────────────────────────────────────────┼────────────────────────────────────────────────────────┤
│ Task state tracking │ None — fire and forget │ Full state machine with Redis persistence + TTLs │
├─────────────────────────────┼────────────────────────────────────────┼────────────────────────────────────────────────────────┤
│ Result integrity │ Plain string concatenation │ HMAC-SHA256 signed results │
├─────────────────────────────┼────────────────────────────────────────┼────────────────────────────────────────────────────────┤
│ Abort/cancel │ Not supported │ Redis control channel + os.killpg() │
├─────────────────────────────┼────────────────────────────────────────┼────────────────────────────────────────────────────────┤
│ Crash recovery │ Not supported │ Startup scan marks orphaned running → failed │
├─────────────────────────────┼────────────────────────────────────────┼────────────────────────────────────────────────────────┤
│ Agent coordination protocol │ LLM chooses next tool via ReAct prompt │ Git branches + Redis pub/sub + ANNOUNCEMENT/REPLY docs │
├─────────────────────────────┼────────────────────────────────────────┼────────────────────────────────────────────────────────┤
│ Session continuity │ None between agent calls │ Redis session history + disk chunking + idle sweep │
├─────────────────────────────┼────────────────────────────────────────┼────────────────────────────────────────────────────────┤
│ Health monitoring │ None │ Heartbeat file + Docker healthcheck │
├─────────────────────────────┼────────────────────────────────────────┼────────────────────────────────────────────────────────┤
│ Observability │ logger.info() with duration │ Redis counters + TimescaleDB + Grafana │
├─────────────────────────────┼────────────────────────────────────────┼────────────────────────────────────────────────────────┤
│ Notification │ None │ PUBLISH on every state transition │
└─────────────────────────────┴────────────────────────────────────────┴────────────────────────────────────────────────────────┘
Describe your ideal solution
The Setup
One tmux session. One operator. One architect agent. Eight expert teams — each an independent Claude instance in its own tmux pane.
```mermaid
flowchart LR
    OP["Operator\n(human)"] -->|"direction"| AR["Architect\n(Claude)"]
    AR -->|"tasks via\ngit + Redis"| teams
    teams -->|"branches +\ndone signals"| AR
    AR -->|"results"| OP
    subgraph teams ["Expert Teams — each a separate Claude instance"]
        T1["orchestration"] ~~~ T2["interface"] ~~~ T3["core"] ~~~ T4["llm"]
        T5["monitor"] ~~~ T6["hssh"] ~~~ T7["knowledge"] ~~~ T8["security"]
    end
    style OP fill:#c62828,color:#fff,stroke:#b71c1c
    style AR fill:#1565c0,color:#fff,stroke:#0d47a1
    style teams fill:#1a1a2e,color:#e0e0e0,stroke:#4a4a6a
```
How It Works
The operator tells the architect what to build. The architect breaks it into scoped tasks, writes an ANNOUNCEMENT.md for each team, pushes branches, and notifies teams via Redis. Each expert reads its task, implements it in its own directory, pushes a branch with a REPLY.md, and signals done. The architect reviews, merges, and reports back.
No expert ever talks to another expert. All coordination flows through the architect.
```mermaid
sequenceDiagram
    participant O as Operator
    participant A as Architect
    participant R as Redis
    participant E1 as Core Team
    participant E2 as Interface Team
    O->>A: "Add output sanitization"
    A->>A: Create branches with ANNOUNCEMENT.md
    A->>R: PUBLISH round "core interface"
    A->>R: PUBLISH msg:core "pull branch, check ANNOUNCEMENT.md"
    A->>R: PUBLISH msg:interface "pull branch, check ANNOUNCEMENT.md"
    R->>E1: Task notification
    R->>E2: Task notification
    par Parallel execution
        E1->>E1: Read task, implement, push branch
        E2->>E2: Read task, implement, push branch
    end
    E1->>R: PUBLISH done "core"
    E2->>R: PUBLISH done "interface"
    R->>A: "All teams done: core interface"
    A->>A: Review branches, merge to main
    A->>O: Done — here's what changed
```
Each pane is a separate Claude Code instance. They share only the git repo and a Redis instance. The conductor (a small shell script on the Redis pane) tracks which teams have signaled done and notifies the architect when a round is complete.
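The conductor's round-tracking logic is small enough to sketch. The real conductor is a shell script, so this Python version is only illustrative, and the channel names (round, done, msg:architect) are assumptions drawn from the sequence above:

```python
def conduct(r, architect_channel="msg:architect"):
    """Wait for a round announcement listing the participating teams,
    collect 'done' signals, and notify the architect once every team
    in the round has signaled."""
    ps = r.pubsub()
    ps.subscribe("round", "done")
    expected, finished = set(), set()
    for msg in ps.listen():
        if msg.get("type") != "message":
            continue
        data = msg["data"].decode() if isinstance(msg["data"], bytes) else msg["data"]
        if msg["channel"] in (b"round", "round"):
            # New round: "core interface" -> {"core", "interface"}
            expected, finished = set(data.split()), set()
        elif msg["channel"] in (b"done", "done"):
            finished.add(data)
            if expected and finished >= expected:
                r.publish(architect_channel,
                          f"All teams done: {' '.join(sorted(expected))}")
                return
```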
The Rules
Strict conventions prevent chaos:
Experts stay in scope. Each team owns one directory. Edits outside it are forbidden unless the task explicitly allows it.
Communication is async. Tasks go out via git branches + Redis. Results come back via git branches + Redis. No shared state, no direct messaging.
Rounds are atomic. The architect declares a round, all teams execute in parallel, all signal done, then the architect merges. No partial merges mid-round.
Main stays clean. No communication artifacts (ANNOUNCEMENT.md, REPLY.md) reach the main branch. The architect strips them during merge.
Pull before work. Every team pulls main before starting its task branch — prevents divergence.
Push before signal. A "done" signal without a pushed branch is useless. Push first, signal second.
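The last rule is a pure ordering constraint, which makes it easy to illustrate: the done signal is sent only after the push succeeds, never before. Function names here are illustrative, not from the repo:

```python
import subprocess


def finish_task(r, team: str, branch: str, run=subprocess.run):
    """Push before signal: a 'done' message is only meaningful once the
    branch exists on the remote, so a failed push must suppress the signal."""
    run(["git", "push", "origin", branch], check=True)  # push first...
    r.publish("done", team)                             # ...signal second
```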
What This Means
The entire codebase — 12 Docker services, 45 security hardening items, two network topologies, an Asimov-inspired AI firewall, session management, skill teaching, vector memory, and monitoring — was built through this process. One human steering, AI agents executing in parallel, strict protocols preventing them from stepping on each other.
The operator never wrote code. The architect never read implementation details. The experts never coordinated directly. Each role stayed in its lane, and the system grew commit by commit.
670+ commits. Zero merge conflicts from scope violations (after the first week).
Additional context
https://github.com/h-network/h-cli/blob/main/docs/H-CLI-DEVELOPMENT-EXPLAINED.md