Commit 3a85773

johnteee and cursoragent committed
Deliver P4 strategic tranche: ADR, P2P auth token, async signature tests.
Document JSONL/NFS and MCP TLS posture (ADR 0008), optional TEAAGENT_FEDERATED_SIGNATURE_TOKEN for file-based multi-sig, fix signature poll when pending_approvals is created late, verify jaraco.context>=6.1.0 in selftest, and add async collect_approval_signatures regression tests.

Constraint: HTTP P2P transport and SQLite audit migration remain future work.
Tested: tests/test_federated_sync.py (21 passed); pre-commit pytest smoke (106 passed).
Confidence: high

Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent f0bcba3 commit 3a85773

10 files changed

Lines changed: 291 additions & 18 deletions

SECURITY.md

Lines changed: 5 additions & 0 deletions
@@ -205,6 +205,11 @@ Remaining concurrency limitations:

 ## Dependency auditing

+**CVE-2026-23949 (`jaraco.context` Zip Slip):** TeaAgent constrains transitive
+installs to `jaraco-context>=6.1.0` via `[tool.uv] constraint-dependencies` in
+`pyproject.toml`. `teaagent selftest` fails if an older `jaraco.context` is present
+in the environment (Dependabot alert #10 should clear after lockfile rescan).
+
 CI runs `pip-audit` on the editable install and on `uv export` output (see
 `.github/workflows/security.yml`). Local check:

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+# ADR 0008: P4 strategic posture (storage, TLS, P2P auth)
+
+## Status
+
+Accepted — incremental delivery; full multi-node production remains future work.
+
+## Context
+
+Post-audit remediation P1–P3 closed harness correctness, docs, and CI gates.
+P4 items are quarter-scale: distributed JSONL, MCP TLS, authenticated P2P
+multi-sig transport, and transitive dependency hygiene (Dependabot #10 /
+CVE-2026-23949 on `jaraco.context`).
+
+## Decisions
+
+### 1. JSONL locking and NFS
+
+- **Single-node / single-writer workspace:** keep `fcntl.LOCK_EX` on audit and
+memory JSONL; `atomic_write_text` for RunStore and federated state files.
+- **NFS or multi-writer shared roots:** not supported for JSONL stores; operators
+must use one writer per workspace or migrate hot paths to SQLite/Postgres
+(OAuth store and Context Bus already SQLite-backed).
+- **Migration path (future):** optional `SQLiteAuditStore` / shared DB for
+RunStore behind feature flags — not started in P4 tranche.
+
+### 2. MCP TLS
+
+- **No native TLS in `serve_mcp_http`** — same as ADR 0005; terminate TLS at
+Caddy/nginx (see `templates/reverse-proxy/`).
+- **`TEAAGENT_STRICT_LOCAL=1`** requires bearer/OAuth even on loopback for MCP HTTP.
+
+### 3. Authenticated P2P multi-sig (file transport)
+
+- File-based multi-sig remains **experimental** (not production WAN transport).
+- **P4 tranche:** optional `TEAAGENT_FEDERATED_SIGNATURE_TOKEN` — signature JSON
+files must carry matching `auth_token` when the env var is set.
+- **Future:** HTTP webhook channel with Bearer token (same shape as vote relay);
+reuse `surface_auth` token files.
+
+### 4. `jaraco.context` (CVE-2026-23949)
+
+- Constrain transitive installs to **`jaraco-context>=6.1.0`** via `[tool.uv]`
+`constraint-dependencies` in `pyproject.toml`.
+- Security selftest verifies installed version when the package is present.
+
+## Consequences
+
+- Operators on NFS must not run concurrent TeaAgent writers on the same JSONL paths.
+- WAN MCP and federated multi-sig require reverse proxy + env tokens until HTTP P2P ships.
+- Dependabot alert #10 should clear once GitHub rescans `uv.lock` at 6.1.2+.
+
+## Alternatives considered
+
+- **Native MCP TLS:** rejected — duplicates proxy features and cert rotation burden.
+- **Immediate JSONL→DB migration:** deferred — large schema and replay compatibility work.
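
The single-writer decision in section 1 of the ADR relies on an exclusive advisory lock around each JSONL append. A minimal sketch of that pattern follows, assuming `fcntl.flock` on a POSIX host. `append_jsonl_locked` is a hypothetical helper written for illustration, not TeaAgent's `AuditLogger`, and the real locking call may differ.

```python
import fcntl
import json
import os
import tempfile
from pathlib import Path


def append_jsonl_locked(path: Path, record: dict) -> None:
    """Append one JSON line under an exclusive advisory lock (hypothetical helper)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, 'a', encoding='utf-8') as handle:
        # Blocks until no other cooperating process holds the lock.
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX)
        try:
            handle.write(json.dumps(record) + '\n')
            handle.flush()
            os.fsync(handle.fileno())  # push the line to disk before unlocking
        finally:
            fcntl.flock(handle.fileno(), fcntl.LOCK_UN)


# Usage: two appends land as two separate lines.
demo_path = Path(tempfile.mkdtemp()) / 'audit' / 'events.jsonl'
append_jsonl_locked(demo_path, {'event': 'start'})
append_jsonl_locked(demo_path, {'event': 'stop'})
demo_lines = demo_path.read_text(encoding='utf-8').splitlines()
```

A lock like this is only as reliable as the filesystem's lock support, which is consistent with the ADR ruling out NFS multi-writer roots for JSONL stores.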

docs/architecture.md

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -306,11 +306,14 @@ User / CLI
306306
| `AuditLogger` | JSONL | `fcntl.LOCK_EX` + `fsync` | Per-run event log |
307307
| `MemoryCatalog` | JSONL | `fcntl.LOCK_EX` + `fsync` | Workspace observations |
308308
| `RunStore` | JSONL | `atomic_write_text` (lock + replace) | Run history and replay |
309-
| `UltraworkStore` | JSONL | `atomic_write_text` | Worker lifecycle records |
309+
| `UltraworkStore` | JSONL | `atomic_write_text` | Worker lifecycle records |
310310
| `SQLiteOAuthStore`| SQLite | WAL + `BEGIN IMMEDIATE` | OAuth clients/codes/nonces |
311311
| `ContextBus` | SQLite | WAL; per-thread connections | Cross-agent Delta cards |
312312
| `FederatedGraphSync` | JSON | none (single-writer file) | Graph sync state + exports |
313313

314+
JSONL rows assume a **single writer per workspace** on a local or advisory-lock-safe
315+
filesystem. NFS multi-writer shared roots are unsupported — see [ADR 0008](adr/0008-p4-strategic-posture.md).
316+
314317
All state is externalized to the filesystem. In-memory runner state is
315318
temporary only — every meaningful event persists to disk before the caller
316319
sees the result.
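
The storage table lists `atomic_write_text` as "lock + replace" for `RunStore`. The replace half of that idea can be sketched as a temp-file write followed by `os.replace`. This is an illustration under assumptions: `atomic_write_text_sketch` is a hypothetical name, it omits the locking step, and it is not the body of `teaagent.storage.atomic_write_text`.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text_sketch(path: Path, text: str, encoding: str = 'utf-8') -> None:
    """Write to a temp file in the target directory, then swap it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + '.', suffix='.tmp'
    )
    try:
        with os.fdopen(fd, 'w', encoding=encoding) as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp_name, path)
    except BaseException:
        # Leave the original file untouched and remove the temp file.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


# Usage: the second write fully replaces the first and leaves no temp files.
state_path = Path(tempfile.mkdtemp()) / 'state.json'
atomic_write_text_sketch(state_path, '{"version": 1}')
atomic_write_text_sketch(state_path, '{"version": 2}')
leftovers = [p.name for p in state_path.parent.iterdir() if p.name != 'state.json']
```

Creating the temp file in the same directory keeps the rename on one filesystem, which `os.replace` needs in order to be atomic.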

docs/http-surface-auth.md

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -39,6 +39,7 @@ When `--api-token-file` is omitted, relay serve loads the first existing file:
3939
| `TEAAGENT_ALLOW_DEV_SIGNATURES=1` | Allow dev-hash signatures (multi-sig / relay dev mode only) |
4040
| `TEAAGENT_PLUGINS_STRICT=1` | Block unverified plugin entry points (site-packages / unknown source) |
4141
| `TEAAGENT_PRECOMMIT_FULL=1` | Run full pytest in pre-commit (default: smoke subset) |
42+
| `TEAAGENT_FEDERATED_SIGNATURE_TOKEN` | Require matching `auth_token` on file-based P2P approval signatures |
4243

4344
## Headers
4445

@@ -80,6 +81,26 @@ teaagent consensus relay serve \
8081

8182
Requires client certificates signed by `client-ca.pem`.
8283

84+
## MCP HTTP TLS (native TLS not supported)
85+
86+
`teaagent mcp serve --http` does **not** terminate TLS in-process (see
87+
[ADR 0005](adr/0005-mcp-streamable-http.md) and [ADR 0008](adr/0008-p4-strategic-posture.md)).
88+
For remote clients:
89+
90+
1. Bind MCP to loopback (`127.0.0.1`) or a private interface.
91+
2. Terminate TLS at Caddy/nginx using [`templates/reverse-proxy/`](../templates/reverse-proxy/).
92+
3. Set `TEAAGENT_STRICT_LOCAL=1` when the proxy forwards to loopback so bearer/OAuth
93+
is still required on the upstream hop.
94+
95+
## Federated multi-sig file transport
96+
97+
| Variable | Effect |
98+
|----------|--------|
99+
| `TEAAGENT_FEDERATED_SIGNATURE_TOKEN` | When set, P2P signature files under `.teaagent/pending_approvals/` must include matching `auth_token` |
100+
101+
File-based multi-sig remains experimental; WAN deployments should use relay/control-plane
102+
Bearer tokens until HTTP P2P transport ships (ADR 0008).
103+
83104
## Reverse proxy templates
84105

85106
See [`templates/reverse-proxy/`](../templates/reverse-proxy/) for Caddy and nginx examples that:

docs/plans/remediation-roadmap.md

Lines changed: 10 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -60,12 +60,16 @@ Principle: **smallest verifiable step** per phase; no big-bang refactors.
6060

6161
## Phase P4 — Strategic (quarters)
6262

63-
| Item | Reference |
64-
|------|-----------|
65-
| Distributed JSONL locks / DB migration | threat-model, NFS note |
66-
| Native MCP TLS | reverse-proxy termination |
67-
| Authenticated P2P multi-sig transport | federated_sync HTTP channel |
68-
| Dependabot #10 dependency bump | Security UI |
63+
| # | Item | Status | Reference |
64+
|---|------|--------|-----------|
65+
| P4.1 | JSONL / NFS posture + migration ADR | ✅ tranche | [ADR 0008](../adr/0008-p4-strategic-posture.md), threat-model NFS row |
66+
| P4.2 | MCP TLS via reverse proxy (no native TLS) | ✅ documented | [http-surface-auth.md](../http-surface-auth.md), ADR 0008 |
67+
| P4.3 | File P2P signature `auth_token` when `TEAAGENT_FEDERATED_SIGNATURE_TOKEN` set | ✅ shipped | `federated_sync.py`, `security_env.py` |
68+
| P4.3b | HTTP P2P multi-sig channel | 🔲 future | ADR 0008 — reuse relay bearer shape |
69+
| P4.4 | `jaraco.context` CVE-2026-23949 (Dependabot #10) | ✅ constrained | `pyproject.toml` `jaraco-context>=6.1.0`, `selftest` version check |
70+
| P4.5 | Async `collect_approval_signatures` unit tests | ✅ shipped | `tests/test_federated_sync.py` |
71+
72+
Full SQLite audit/run migration remains **P4+ / backlog** — not started.
6973

7074
---
7175

docs/threat-model.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -25,13 +25,14 @@ This document maps threats to mitigations and verification. It complements [tool
2525
| Context Bus SQLite lock contention / transaction leaks | Medium | Per-thread SQLite connections (`threading.local`); `timeout=5.0` on connect; WAL pragmas on each new connection; `_execute_with_retry` with exponential backoff (5 retries) + generation-based reconnect (per-thread only); explicit `conn.rollback()` on write failure; `cleanup_old_deltas` scoped to `workflow_id` | `tests/test_phase5_context_bus.py` (incl. parallel publish + workflow-scoped cleanup), `tests/test_remediation_p1_p2.py` ||
2626
| Federated sync state corruption on crash | Medium | `atomic_write_text` + file lock on `federated_sync_state.json`; lock on pending changes | `tests/test_federated_sync.py` | File-based multi-sig quorum still experimental |
2727
| JIT approval server race on approve/reject | Medium | `threading.Lock` on `_requests` / `_pending_events` | `tests/test_phase5_jit_approval_server.py` | Approve from thread without running event loop still drops SSE broadcast |
28-
| Asyncio event loop starvation from synchronous P2P approval polling | High | `collect_approval_signatures` is `async def` with `asyncio.sleep`; blocking I/O uses `run_in_executor`; `_collect_peer_signatures` dispatches via `run_coroutine_threadsafe` or `asyncio.run()` | `tests/test_federated_sync.py`, `tests/test_policy.py` (`MultiSigQuorumTests`) | Dedicated async poll unit test still optional |
28+
| Asyncio event loop starvation from synchronous P2P approval polling | High | `collect_approval_signatures` is `async def` with `asyncio.sleep`; blocking I/O uses `run_in_executor`; `_collect_peer_signatures` dispatches via `run_coroutine_threadsafe` or `asyncio.run()` | `tests/test_federated_sync.py` (`test_collect_approval_signatures_async_non_blocking`, quorum/dedup) | |
2929
| Shell normalization bypass via brace expansion / process substitution | High | Multi-pass `_normalize_shell_arg` now handles `{a,b}` expansion, `<()` process substitution, and non-string/non-list fallback | `tests/test_policy.py` | Catches `/pr{od,oduction}`, `<(echo /prod)`, dict-type command args |
3030
| Protected directory bypass via alternate write tools | High | `workspace_write_*` tool pattern + `.git*` argument pattern covers all write tools and subdirectory contents | `tests/test_policy.py`, `tests/test_file_policy.py` | Previously only `workspace_write_file` was covered |
3131
| Swarm hang / undetected thread deadlock | High | `ThreadPoolExecutor.as_completed(timeout=...)` with partial result collection; `Subagent` tracks `is_running`/`last_heartbeat`; `_heartbeat_monitor_loop` uses thread-ref liveness instead of defunct PID-based `is_process_alive`; heartbeat hangs merged into swarm `results` | `tests/test_swarm.py`, `tests/test_remediation_p1_p2.py` | Previously heartbeat monitor checked parent PID (always alive) — now detects actual thread hangs |
3232
| Git stash stack corruption in parallel sandboxes | Critical | `stash_save` returns actual stash reflog selector; `stash_pop` accepts specific ref | `tests/test_sandbox.py` | Previously hardcoded `stash@{0}` caused cross-agent stash confusion |
3333
| Workflow self-healing infinite recursion | High | `_execute_step` accepts `current_attempt` parameter preserved across recursive re-execution; abort guard checks attempts against max before proceeding | `tests/test_phase5_workflow_engine.py` | Previously `self_healing_attempts` reset to 0 on new `StepExecution` — now passed through recursion chain |
3434
| Workflow strict validation rollback never executed | High | `execute_workflow` integrates `UndoJournal` + `AuditLogger`; checks `result.requires_rollback` and calls `journal.restore()` on strict validation failure | `tests/test_remediation_p1_p2.py` (`WorkflowRollbackTests`), `tests/test_phase5_workflow_engine.py` | Previously `requires_rollback` flag set but never consumed — now triggers full workspace undo |
35+
| NFS / multi-writer JSONL corruption | High | Single-writer per workspace; `fcntl` + `atomic_write_text` on supported local FS; SQLite for Context Bus / OAuth | ADR 0008, `teaagent selftest` | Do not share `.teaagent/runs/*.jsonl` across concurrent hosts on NFS without DB migration |
3536

3637
## Trust Boundaries
3738

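The Context Bus row in the threat-model hunk describes a connect timeout, WAL pragmas, and retries with exponential backoff. A rough sketch of that combination with the standard `sqlite3` module follows. `connect_wal` and `execute_with_retry_sketch` are hypothetical names, and the delay values and the "locked" substring check are assumptions rather than the behaviour of TeaAgent's `_execute_with_retry`.

```python
import sqlite3
import tempfile
import time
from pathlib import Path


def connect_wal(db_path: Path) -> sqlite3.Connection:
    """Open a connection with a busy timeout and request WAL journaling."""
    conn = sqlite3.connect(db_path, timeout=5.0)
    conn.execute('PRAGMA journal_mode=WAL')
    return conn


def execute_with_retry_sketch(conn, sql, params=(), retries=5, base_delay=0.05):
    """Retry writes that fail because the database is locked (assumed policy)."""
    for attempt in range(retries + 1):
        try:
            with conn:  # commits on success, rolls back on error
                return conn.execute(sql, params)
        except sqlite3.OperationalError as exc:
            if 'locked' not in str(exc).lower() or attempt == retries:
                raise
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff


# Usage against a throwaway database file.
demo_db = Path(tempfile.mkdtemp()) / 'bus.sqlite3'
demo_conn = connect_wal(demo_db)
execute_with_retry_sketch(demo_conn, 'CREATE TABLE deltas (workflow_id TEXT, body TEXT)')
execute_with_retry_sketch(demo_conn, 'INSERT INTO deltas VALUES (?, ?)', ('wf-1', 'hello'))
demo_rows = demo_conn.execute('SELECT workflow_id, body FROM deltas').fetchall()
```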
teaagent/federated_sync.py

Lines changed: 18 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,7 @@
1919
from typing import Any, Optional
2020

2121
from teaagent.graphqlite_store import GraphQLiteGraphStore
22+
from teaagent.security_env import federated_signature_token
2223
from teaagent.storage import atomic_write_text
2324

2425
logger = logging.getLogger(__name__)
@@ -534,21 +535,20 @@ async def collect_approval_signatures(
534535
seen_peers: set[str] = set()
535536
approvals_dir = self._root / '.teaagent' / 'pending_approvals'
536537

537-
if not approvals_dir.exists():
538-
return signatures
539-
540538
# Poll for incoming signature files using async sleep
541539
poll_interval = 0.1
542-
max_polls = int(timeout_seconds / poll_interval)
540+
max_polls = max(1, int(timeout_seconds / poll_interval))
543541
polls = 0
544542

545543
loop = asyncio.get_running_loop()
546544

547545
while polls < max_polls:
548-
sig_files = await loop.run_in_executor(
549-
None,
550-
lambda: list(approvals_dir.glob(f'{request_id}_signature_*.json')),
551-
)
546+
sig_files: list[Path] = []
547+
if approvals_dir.exists():
548+
sig_files = await loop.run_in_executor(
549+
None,
550+
lambda: list(approvals_dir.glob(f'{request_id}_signature_*.json')),
551+
)
552552

553553
for sig_file in sig_files:
554554
try:
@@ -557,6 +557,12 @@ async def collect_approval_signatures(
557557
functools.partial(sig_file.read_text, encoding='utf-8'),
558558
)
559559
data = json.loads(content)
560+
expected_token = federated_signature_token()
561+
if (
562+
expected_token is not None
563+
and data.get('auth_token') != expected_token
564+
):
565+
continue
560566
peer_id = data['peer_id']
561567
if peer_id in seen_peers:
562568
continue
@@ -609,13 +615,16 @@ def submit_approval_signature(
609615
)
610616
sig_path.parent.mkdir(parents=True, exist_ok=True)
611617

612-
data = {
618+
data: dict[str, Any] = {
613619
'request_id': request_id,
614620
'peer_id': peer_id,
615621
'signature': signature,
616622
'ssh_key_id': ssh_key_id,
617623
'timestamp': time.time(),
618624
}
625+
token = federated_signature_token()
626+
if token is not None:
627+
data['auth_token'] = token
619628

620629
sig_path.write_text(json.dumps(data, indent=2), encoding='utf-8')
621630
return True
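
The polling change in `collect_approval_signatures` moves the directory existence check inside the loop, so a `pending_approvals` directory created after polling starts is still picked up. The standalone sketch below illustrates that behaviour. `poll_signature_files` and `_demo` are hypothetical, the sketch returns on the first match instead of collecting a quorum, and the 0.05 s interval is an arbitrary choice.

```python
import asyncio
import json
import tempfile
from pathlib import Path


async def poll_signature_files(
    approvals_dir: Path,
    request_id: str,
    timeout_seconds: float,
    poll_interval: float = 0.05,
) -> list:
    """Poll until a matching file appears or the timeout elapses."""
    loop = asyncio.get_running_loop()
    max_polls = max(1, int(timeout_seconds / poll_interval))  # poll at least once
    pattern = f'{request_id}_signature_*.json'
    for _ in range(max_polls):
        found = []
        # Re-check existence on every pass so a late-created directory is seen.
        if approvals_dir.exists():
            found = await loop.run_in_executor(
                None, lambda: sorted(approvals_dir.glob(pattern))
            )
        if found:
            return found
        await asyncio.sleep(poll_interval)  # yields to other tasks on the loop
    return []


async def _demo() -> list:
    approvals_dir = Path(tempfile.mkdtemp()) / '.teaagent' / 'pending_approvals'

    async def late_writer() -> None:
        await asyncio.sleep(0.1)  # the directory is missing when polling starts
        approvals_dir.mkdir(parents=True)
        (approvals_dir / 'req1_signature_peerA.json').write_text(
            json.dumps({'peer_id': 'peerA'}), encoding='utf-8'
        )

    writer = asyncio.create_task(late_writer())
    found = await poll_signature_files(approvals_dir, 'req1', timeout_seconds=2.0)
    await writer
    return [path.name for path in found]


demo_found = asyncio.run(_demo())
```

With the early `return` that the commit removes, the same scenario would have returned an empty result immediately.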

teaagent/security_env.py

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -22,3 +22,13 @@ def strict_local_services() -> bool:
2222
def plugins_strict_audit() -> bool:
2323
"""Fail closed on unverified plugin entry points when set."""
2424
return _env_truthy('TEAAGENT_PLUGINS_STRICT')
25+
26+
27+
def federated_signature_token() -> str | None:
28+
"""Optional shared secret for file-based P2P approval signature files.
29+
30+
When set, ``submit_approval_signature`` embeds the token and
31+
``collect_approval_signatures`` ignores files with a missing or wrong token.
32+
"""
33+
value = os.environ.get('TEAAGENT_FEDERATED_SIGNATURE_TOKEN', '').strip()
34+
return value or None
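
The token check can be illustrated end to end with a small standalone sketch. The commit compares tokens with `!=`. The version below swaps in `hmac.compare_digest` for a constant-time comparison, which is a substitution made for this example and not what the commit does. `read_signature_token` and `signature_token_matches` are hypothetical names, and the `env` parameter exists only to make the example testable.

```python
import hmac
import os
from typing import Mapping, Optional


def read_signature_token(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the stripped token, or None when unset or blank."""
    source = os.environ if env is None else env
    value = source.get('TEAAGENT_FEDERATED_SIGNATURE_TOKEN', '').strip()
    return value or None


def signature_token_matches(data: dict, expected: Optional[str]) -> bool:
    """Accept every file when enforcement is off; otherwise require a match."""
    if expected is None:
        return True
    supplied = data.get('auth_token')
    if not isinstance(supplied, str):
        return False  # missing or non-string token
    return hmac.compare_digest(supplied.encode('utf-8'), expected.encode('utf-8'))


# Usage with an explicit mapping instead of the process environment.
token = read_signature_token({'TEAAGENT_FEDERATED_SIGNATURE_TOKEN': '  s3cret  '})
accepted = signature_token_matches({'peer_id': 'a', 'auth_token': 's3cret'}, token)
rejected = signature_token_matches({'peer_id': 'b'}, token)
```

A blank or whitespace-only variable reads as unset, so enforcement stays off unless a non-empty token is configured.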

teaagent/selftest.py

Lines changed: 35 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,7 @@
22

33
from __future__ import annotations
44

5+
from importlib.metadata import PackageNotFoundError, version
56
from pathlib import Path
67
from typing import Any
78

@@ -12,6 +13,37 @@
1213
from teaagent.workspace_tools import build_workspace_tool_registry
1314

1415

16+
def _jaraco_context_version_ok() -> dict[str, Any]:
17+
"""CVE-2026-23949: jaraco.context < 6.1.0 has Zip Slip in tarball()."""
18+
try:
19+
installed = version('jaraco.context')
20+
except PackageNotFoundError:
21+
return {
22+
'ok': True,
23+
'skipped': True,
24+
'detail': 'jaraco.context not installed',
25+
}
26+
27+
parts = []
28+
for segment in installed.split('.')[:3]:
29+
segment = segment.split('+', 1)[0]
30+
if not segment.isdigit():
31+
return {
32+
'ok': False,
33+
'skipped': False,
34+
'detail': f'unparseable jaraco.context version {installed!r}',
35+
}
36+
parts.append(int(segment))
37+
while len(parts) < 3:
38+
parts.append(0)
39+
ok = tuple(parts) >= (6, 1, 0)
40+
return {
41+
'ok': ok,
42+
'skipped': False,
43+
'detail': f'jaraco.context=={installed}',
44+
}
45+
46+
1547
def run_security_selftest(root: str | Path = '.') -> dict[str, Any]:
1648
"""Run governance security checks without invoking pytest."""
1749
workspace = Path(root).resolve()
@@ -54,8 +86,9 @@ def run_security_selftest(root: str | Path = '.') -> dict[str, Any]:
5486
},
5587
]
5688
audit_report = check_audit_completeness(sample_events)
89+
jaraco_report = _jaraco_context_version_ok()
5790

58-
ok = not lint_errors and permission_ok and audit_report.ok
91+
ok = not lint_errors and permission_ok and audit_report.ok and jaraco_report['ok']
5992
return {
6093
'ok': ok,
6194
'tool_lint': {
@@ -76,4 +109,5 @@ def run_security_selftest(root: str | Path = '.') -> dict[str, Any]:
76109
'ok': audit_report.ok,
77110
'issues': audit_report.issues,
78111
},
112+
'jaraco_context': jaraco_report,
79113
}
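
The version gate in `_jaraco_context_version_ok` can be exercised in isolation by lifting its parsing into a pure function. The sketch below mirrors the split, strip, and pad steps from the diff. `version_at_least` is a hypothetical name, and returning `None` for unparseable input is a choice made for this example (the selftest reports `ok: False` in that case).

```python
from typing import Optional, Tuple


def version_at_least(installed: str, minimum: Tuple[int, int, int]) -> Optional[bool]:
    """Compare the first three numeric segments of a version against a minimum."""
    parts = []
    for segment in installed.split('.')[:3]:
        segment = segment.split('+', 1)[0]  # drop a local suffix such as "+local"
        if not segment.isdigit():
            return None  # unparseable, for example a pre-release tag like "0rc1"
        parts.append(int(segment))
    while len(parts) < 3:
        parts.append(0)  # treat "6.1" as "6.1.0"
    return tuple(parts) >= minimum


# Usage: integer tuples compare numerically, so "10.0.0" sorts above "6.1.0".
checks = {
    candidate: version_at_least(candidate, (6, 1, 0))
    for candidate in ['6.0.1', '6.1', '6.1.2+local', '10.0.0', '6.1.0rc1']
}
```

One consequence of the digit-only check is that a pre-release string such as `6.1.0rc1` is treated as unparseable, which in the selftest means a failing report, not a pass.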
