Skip to content

Commit f0bcba3

Browse files
johnteeecursoragent
andcommitted
Complete P3 remediation: docs, CI governance gate, and docker-smoke policy.
Align README validation/tournament with Beta maturity, fix architecture section numbering, link threat-model rows to tests, expand governance-gate Phase 5 unit coverage, and document advisory docker-smoke in CONTRIBUTING. Constraint: docker-smoke remains continue-on-error; merge not blocked on Docker. Tested: governance-gate pytest subset (65 passed); pre-commit on changed files. Confidence: high Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent 849eaf5 commit f0bcba3

6 files changed

Lines changed: 47 additions & 35 deletions

File tree

.github/workflows/ci.yml

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -191,12 +191,16 @@ jobs:
191191
- name: Check plan-before-write enforcement
192192
run: pytest tests/test_tranche_b_governance.py -k "plan" -v
193193

194+
- name: Run Phase 5 unit tests
195+
run: pytest tests/test_phase5_context_bus.py tests/test_phase5_workflow_engine.py tests/test_phase5_jit_approval_server.py tests/test_federated_sync.py tests/test_remediation_p1_p2.py -v
196+
194197
- name: Run Phase 4-5 acceptance and adversarial governance tests
195-
run: pytest tests/acceptance/test_consensus_flow.py tests/acceptance/test_sandbox_enhancement_flow.py tests/test_governance_adversarial_runtime.py tests/test_skill_executor.py tests/test_phase6_docker.py -v
198+
run: pytest tests/acceptance/test_consensus_flow.py tests/acceptance/test_sandbox_enhancement_flow.py tests/test_governance_adversarial_runtime.py tests/test_skill_executor.py -v
196199

197200
docker-smoke:
198201
runs-on: ubuntu-latest
199202
needs: lint
203+
# Advisory: Docker/Podman not guaranteed on all forks; failures do not block merge.
200204
continue-on-error: true
201205

202206
steps:

CONTRIBUTING.md

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,23 @@ Run the same checks as CI before opening a pull request:
2424
.venv/bin/pytest -q
2525
```
2626

27+
Pre-commit runs a smoke subset by default. For the full test suite locally:
28+
29+
```bash
30+
TEAAGENT_PRECOMMIT_FULL=1 pre-commit run --all-files
31+
```
32+
33+
### CI jobs
34+
35+
| Job | Blocks merge? | Purpose |
36+
|-----|---------------|---------|
37+
| `lint` | Yes | Ruff, mypy, format |
38+
| `governance-gate` | Yes | Governance fuzz, plan gate, Phase 4–5 acceptance, Phase 5 unit tests |
39+
| `docker-smoke` | **No** (`continue-on-error`) | `tests/test_phase6_docker.py` when Docker/Podman is available |
40+
| `package` | Yes | Wheel/sdist build |
41+
42+
Run `pytest tests/test_phase6_docker.py` locally before relying on container Code Mode in production.
43+
2744
## Pull Requests
2845

2946
- Keep changes small and focused.

README.md

Lines changed: 10 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -105,29 +105,32 @@ TeaAgent includes persistent memory features to learn from past mistakes and syn
105105
- Commands: `/pin <file>`, `/unpin <file>`, `/pinned` (list)
106106
- Visual indicator in prompt shows pinned file count (e.g., `teaagent📌2>`)
107107

108-
### 8. Self-Healing Validation (Foundation)
108+
### 8. Self-Healing Validation (Beta)
109109

110-
TeaAgent includes LSP/static analysis validation foundation for code correctness:
110+
LSP/static analysis validation is integrated with the agent runner and workflow engine:
111111

112112
**Validation Tools:**
113113
- Auto-detects available tools (ruff, mypy, tsc, eslint)
114114
- Validates code before committing changes
115115
- Supports Python, TypeScript, and JavaScript projects
116-
- Enable with `--validate` flag (opt-in)
116+
- Enable with `--validate` on `agent run` or via workflow self-healing steps
117117

118-
### 9. Tournament Selection (Foundation)
118+
See [maturity-matrix.md](docs/maturity-matrix.md) for surface status and test pointers.
119119

120-
TeaAgent includes tournament-style parallel execution foundation for architecture exploration:
120+
### 9. Tournament Selection (Beta)
121+
122+
Tournament-style parallel execution runs in `SwarmManager` with git worktree isolation,
123+
security-weighted scoring, and centralized approval queue integration:
121124

122125
**Parallel Execution:**
123126
- Create isolated git sandbox branches for multiple approaches
124127
- Auto-generate approach hints based on task keywords
125128
- Execute subagents in parallel with resource limits
126129
- Benchmark correctness, performance, and code quality
127130
- Compare approaches with weighted scoring
128-
- Enable with `--parallel N` flag (opt-in)
131+
- Enable with `--parallel N` on `agent run` (read-only analysis) or swarm/tournament modes
129132

130-
**Status:** Foundation implemented. Full integration with agent runner and human approval workflow documented in spec files for future implementation.
133+
**Status:** Beta — shipped in harness with governance gates; hosted tournament dashboards remain future work. See [maturity-matrix.md](docs/maturity-matrix.md).
131134

132135
### 10. Cognitive Swarm Evolution (Phase 5)
133136

docs/architecture.md

Lines changed: 6 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -80,8 +80,8 @@ The TeaAgent governance system has been hardened through a comprehensive 5-loop
8080

8181
Core Phase 4 (consensus) and Phase 5 (sandbox routing/execution) modules are shipped
8282
with CLI, unit tests, and E2E acceptance. Optional hardening (async vote polling,
83-
WASM skill execution, docker-smoke CI) is shipped; remaining Beta work is native
84-
WASM modules and deeper tournament benchmarks. See [backlog-priority.md](backlog-priority.md).
83+
WASM skill execution) is shipped; `docker-smoke` CI is advisory (see [CONTRIBUTING.md](../CONTRIBUTING.md#ci-jobs)).
84+
Remaining Beta work is native WASM modules and deeper tournament benchmarks. See [backlog-priority.md](backlog-priority.md).
8585

8686
### Phase 4: Federated Swarm Consensus & Peer Attestations — **Beta**
8787
- `ConsensusEngine`, peer registry, voting mechanisms, and attestation trail
@@ -193,18 +193,6 @@ Restricted Python execution with AST allow-list validation:
193193
Code Mode allows only a fixed set of AST nodes and builtin functions — no
194194
imports, no attributes, no arbitrary calls.
195195

196-
### 5. Code Mode
197-
198-
Restricted Python execution with AST allow-list validation:
199-
200-
| Backend | Isolation Level |
201-
|-----------------------|------------------------------------------------|
202-
| Child process (default)| `RLIMIT_CPU`, wall-clock timeout, advisory `RLIMIT_AS` |
203-
| Container | Docker/Podman: `--network none`, `--read-only`, `--cap-drop=ALL`, non-root, tmpfs, CPU/memory/PID limits, streaming output cap, image digest pinning, image allowlist |
204-
205-
Code Mode allows only a fixed set of AST nodes and builtin functions — no
206-
imports, no attributes, no arbitrary calls.
207-
208196
### 6. Governance Hardening (Tranche B)
209197

210198
**Plan-before-Write Enforcement:**
@@ -235,7 +223,7 @@ imports, no attributes, no arbitrary calls.
235223
- Tests conservative defaults and path filtering
236224
- Integrated into CI governance gate
237225

238-
### 8. OAuth 2.1 / DPoP
226+
### 7. OAuth 2.1 / DPoP
239227

240228
`OAuth21AuthorizationServer` and `OAuth21ResourceServer` implement the
241229
authorization code grant with PKCE (S256) and optional DPoP proof-of-possession:
@@ -247,7 +235,7 @@ authorization code grant with PKCE (S256) and optional DPoP proof-of-possession:
247235
- `SQLiteOAuthStore` provides durable client/authorization-code/nonce storage
248236
with PBKDF2-SHA256 client-secret hashing.
249237

250-
### 9. MCP Transport
238+
### 8. MCP Transport
251239

252240
Two transports share the same `handle_mcp_request()` dispatch:
253241

@@ -261,7 +249,7 @@ The HTTP server enforces:
261249
- `Mcp-Session-Id` session tracking.
262250
- Body size limits with `413` for oversized payloads.
263251

264-
### 8. LLM Integration
252+
### 9. LLM Integration
265253

266254
`teaagent.llm` provides a unified adapter layer (`LLMAdapter`) across 13
267255
registered providers in `PROVIDER_CONFIGS`: `claude`, `gpt`, `gemini`,
@@ -275,7 +263,7 @@ adapter implements `chat()` returning an `LLMResponse`. Features include:
275263
- Cost budget pre-flight.
276264
- Streaming via `stream=True` and `on_chunk` callbacks.
277265

278-
### 9. External Federation Boundary (ANP Adapter)
266+
### 10. External Federation Boundary (ANP Adapter)
279267

280268
TeaAgent treats ANP as an optional external federation surface through a
281269
bidirectional adapter boundary:

docs/plans/remediation-roadmap.md

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -48,12 +48,12 @@ Principle: **smallest verifiable step** per phase; no big-bang refactors.
4848

4949
| # | Task | Output |
5050
|---|------|--------|
51-
| P3.1 | Reconcile README Foundation vs maturity Beta | README + matrix sync |
52-
| P3.2 | Fix `architecture.md` duplicate section numbers | Editorial |
53-
| P3.3 | threat-model verification columns for rows 26, 32 | Link to new tests |
54-
| P3.4 | governance-gate: add phase5/6 unit files | `.github/workflows/ci.yml` |
51+
| P3.1 | Reconcile README Foundation vs maturity Beta | README + matrix sync |
52+
| P3.2 | Fix `architecture.md` duplicate section numbers | Editorial |
53+
| P3.3 | threat-model verification columns (Context Bus, async P2P, swarm, workflow rollback) | `docs/threat-model.md` |
54+
| P3.4 | governance-gate: Phase 5 unit files; docker only in `docker-smoke` | `.github/workflows/ci.yml` |
5555
| P3.5 | ✅ Optional pre-commit smoke (`TEAAGENT_PRECOMMIT_FULL=1` for full suite) | `.pre-commit-config.yaml` |
56-
| P3.6 | docker-smoke: decide block vs advisory | CI policy doc |
56+
| P3.6 | docker-smoke advisory (`continue-on-error`); documented in CONTRIBUTING | `CONTRIBUTING.md` |
5757
| P3.7 | ✅ Plugin fail-closed when `TEAAGENT_PLUGINS_STRICT=1` | S-H8 | `plugins.py` |
5858

5959
---

docs/threat-model.md

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -22,16 +22,16 @@ This document maps threats to mitigations and verification. It complements [tool
2222
| Provider response schema drift | Low | JSON schema for model decisions | `test_live_provider_conformance_flow.py` | Provider-specific quirks remain |
2323
| Unbounded run cost | Medium | `RunBudget`; iteration/tool/cost caps | `test_p0_harness.py`, `test_p0_slo_flow.py` | User must configure caps |
2424
| JIT approval server unresponsive during wait | High | Async `_wait_for_approval` using `asyncio.Event` + `asyncio.wait_for`; SSE server remains responsive | `tests/test_phase6_jit_server.py` | Fixed synchronous `time.sleep` spin-lock blocking event loop |
25-
| Context Bus SQLite lock contention / transaction leaks | Medium | Per-thread SQLite connections (`threading.local`); `timeout=5.0` on connect; WAL pragmas on each new connection; `_execute_with_retry` with exponential backoff (5 retries) + generation-based reconnect; explicit `conn.rollback()` on write failure; `cleanup_old_deltas` scoped to `workflow_id` | `tests/test_phase5_context_bus.py` (incl. parallel publish + workflow-scoped cleanup) | Global `_reconnect()` still closes all thread handles; see `docs/plans/remediation-roadmap.md` P2.1 |
25+
| Context Bus SQLite lock contention / transaction leaks | Medium | Per-thread SQLite connections (`threading.local`); `timeout=5.0` on connect; WAL pragmas on each new connection; `_execute_with_retry` with exponential backoff (5 retries) + generation-based reconnect (per-thread only); explicit `conn.rollback()` on write failure; `cleanup_old_deltas` scoped to `workflow_id` | `tests/test_phase5_context_bus.py` (incl. parallel publish + workflow-scoped cleanup), `tests/test_remediation_p1_p2.py` | |
2626
| Federated sync state corruption on crash | Medium | `atomic_write_text` + file lock on `federated_sync_state.json`; lock on pending changes | `tests/test_federated_sync.py` | File-based multi-sig quorum still experimental |
2727
| JIT approval server race on approve/reject | Medium | `threading.Lock` on `_requests` / `_pending_events` | `tests/test_phase5_jit_approval_server.py` | Approve from thread without running event loop still drops SSE broadcast |
28-
| Asyncio event loop starvation from synchronous P2P approval polling | High | `collect_approval_signatures` converted to `async def` with `asyncio.sleep`; `_collect_peer_signatures` dispatches via `run_coroutine_threadsafe` or `asyncio.run()` | | Previously 5-minute synchronous `time.sleep` polling blocked event loop — now fully non-blocking |
28+
| Asyncio event loop starvation from synchronous P2P approval polling | High | `collect_approval_signatures` is `async def` with `asyncio.sleep`; blocking I/O uses `run_in_executor`; `_collect_peer_signatures` dispatches via `run_coroutine_threadsafe` or `asyncio.run()` | `tests/test_federated_sync.py`, `tests/test_policy.py` (`MultiSigQuorumTests`) | Dedicated async poll unit test still optional |
2929
| Shell normalization bypass via brace expansion / process substitution | High | Multi-pass `_normalize_shell_arg` now handles `{a,b}` expansion, `<()` process substitution, and non-string/non-list fallback | `tests/test_policy.py` | Catches `/pr{od,oduction}`, `<(echo /prod)`, dict-type command args |
3030
| Protected directory bypass via alternate write tools | High | `workspace_write_*` tool pattern + `.git*` argument pattern covers all write tools and subdirectory contents | `tests/test_policy.py`, `tests/test_file_policy.py` | Previously only `workspace_write_file` was covered |
31-
| Swarm hang / undetected thread deadlock | High | `ThreadPoolExecutor.as_completed(timeout=...)` with partial result collection; `Subagent` tracks `is_running`/`last_heartbeat`; `_heartbeat_monitor_loop` uses thread-ref liveness instead of defunct PID-based `is_process_alive` | `tests/test_swarm.py` | Previously heartbeat monitor checked parent PID (always alive) — now detects actual thread hangs |
31+
| Swarm hang / undetected thread deadlock | High | `ThreadPoolExecutor.as_completed(timeout=...)` with partial result collection; `Subagent` tracks `is_running`/`last_heartbeat`; `_heartbeat_monitor_loop` uses thread-ref liveness instead of defunct PID-based `is_process_alive`; heartbeat hangs merged into swarm `results` | `tests/test_swarm.py`, `tests/test_remediation_p1_p2.py` | Previously heartbeat monitor checked parent PID (always alive) — now detects actual thread hangs |
3232
| Git stash stack corruption in parallel sandboxes | Critical | `stash_save` returns actual stash reflog selector; `stash_pop` accepts specific ref | `tests/test_sandbox.py` | Previously hardcoded `stash@{0}` caused cross-agent stash confusion |
3333
| Workflow self-healing infinite recursion | High | `_execute_step` accepts `current_attempt` parameter preserved across recursive re-execution; abort guard checks attempts against max before proceeding | `tests/test_phase5_workflow_engine.py` | Previously `self_healing_attempts` reset to 0 on new `StepExecution` — now passed through recursion chain |
34-
| Workflow strict validation rollback never executed | High | `execute_workflow` integrates `UndoJournal` + `AuditLogger`; checks `result.requires_rollback` and calls `journal.restore()` on strict validation failure | | Previously `requires_rollback` flag set but never consumed — now triggers full workspace undo |
34+
| Workflow strict validation rollback never executed | High | `execute_workflow` integrates `UndoJournal` + `AuditLogger`; checks `result.requires_rollback` and calls `journal.restore()` on strict validation failure | `tests/test_remediation_p1_p2.py` (`WorkflowRollbackTests`), `tests/test_phase5_workflow_engine.py` | Previously `requires_rollback` flag set but never consumed — now triggers full workspace undo |
3535

3636
## Trust Boundaries
3737

0 commit comments

Comments
 (0)