TeaAgent is a governance-first agent harness. Every architectural decision subordinates raw capability to auditability, confinement, and human oversight. The harness is intentionally thin: orchestration, tool governance, state boundaries, audit, and validation belong here; domain reasoning belongs in the model or reviewed skills.
The project's classification in pyproject.toml is Development Status :: 3 - Alpha. Breaking changes are expected; stability guarantees are per-ADR.
Tool permissions are declared at registration, not inferred at runtime. Approval gates are hard code paths, not soft suggestions. Every destructive action must be explicitly approved or pre-authorised; no tool silently assumes permission.
Every tool call, approval decision, iteration, and final result is written to a JSONL audit log (mode 0600, fcntl.LOCK_EX, fsync after each append) before the next action proceeds. Audit records are append-only and cannot be modified by normal harness operation.
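The write path described above can be sketched with the standard library alone. This is a minimal illustration, not the harness's actual logger: the function name `append_audit_record` is hypothetical, and using `fcntl.flock` (rather than `fcntl.lockf`) to take `LOCK_EX` is an assumption.

```python
import fcntl
import json
import os
from typing import Any

AUDIT_FILE_MODE = 0o600  # owner read/write only (value assumed from "mode 0600")


def append_audit_record(path: str, record: dict[str, Any]) -> None:
    """Append one JSON line and force it to disk before returning."""
    line = json.dumps(record, sort_keys=True) + '\n'
    # O_APPEND makes every write land at the current end of the file.
    # The mode argument only takes effect when the file is created.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, AUDIT_FILE_MODE)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # serialise concurrent writers
        try:
            os.write(fd, line.encode('utf-8'))
            os.fsync(fd)  # flush to disk before the next action proceeds
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Because the function returns only after `os.fsync`, a caller that logs first and acts second cannot run ahead of its own audit trail. `fcntl` is available on Unix-like systems only.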
Workspace tools reject ../ escapes, absolute paths, and symlink escapes unconditionally — not as a configurable option. Shell commands are classified inspect or mutate before execution; inspect commands run with shell=False.
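The three rejection rules for paths can be sketched as follows. The names `confine_to_workspace` and `PathEscapeError` are hypothetical, and the error class derives from `ValueError` only to keep the example self-contained; it is not the harness's real resolver.

```python
from pathlib import Path


class PathEscapeError(ValueError):
    """Raised when a requested path would leave the workspace."""


def confine_to_workspace(workspace: Path, requested: str) -> Path:
    """Return the resolved path, or raise if it escapes the workspace."""
    root = workspace.resolve()
    candidate = Path(requested)
    if candidate.is_absolute():
        raise PathEscapeError(f'absolute path rejected: {requested}')
    if '..' in candidate.parts:
        raise PathEscapeError(f'parent traversal rejected: {requested}')
    # resolve() follows symlinks, so a link that points outside the
    # workspace ends up outside root and is caught by the check below.
    resolved = (root / candidate).resolve()
    if not resolved.is_relative_to(root):
        raise PathEscapeError(f'symlink escape rejected: {requested}')
    return resolved
```

There is deliberately no flag to relax any of the checks, which mirrors the "not as a configurable option" rule.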
An unregistered tool must be denied, not allowed. Unknown policy combinations default to denial. When a classifier cannot determine the safety of an input, it must treat the input as unsafe.
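A fail-closed lookup might look like the sketch below. Everything in it is illustrative: `ToolSpec`, `Decision`, `decide`, the mode names, and the policy table are assumptions made for the example, not the harness's real types or permission modes.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = 'allow'
    REQUIRE_APPROVAL = 'require_approval'
    DENY = 'deny'


@dataclass(frozen=True)
class ToolSpec:
    name: str
    destructive: bool  # declared at registration, never inferred later


# Hypothetical policy table keyed by (permission mode, destructive flag).
POLICY: dict[tuple[str, bool], Decision] = {
    ('read-only', False): Decision.ALLOW,
    ('read-only', True): Decision.DENY,
    ('workspace-write', False): Decision.ALLOW,
    ('workspace-write', True): Decision.REQUIRE_APPROVAL,
}


def decide(registry: dict[str, ToolSpec], mode: str, tool_name: str) -> Decision:
    spec = registry.get(tool_name)
    if spec is None:
        return Decision.DENY  # unregistered tools are denied
    # Any combination missing from the table falls through to DENY.
    return POLICY.get((mode, spec.destructive), Decision.DENY)
```

The important design choice is that `DENY` is the default of both lookups, so forgetting to add a table entry can only make the harness stricter.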
The harness provides primitives: registry, budget, approval, audit, sandbox. It does not encode task-domain knowledge. If a change to the harness requires reasoning about what a specific task does, the change is probably in the wrong layer.
Security properties must be testable, not aspirational. Every security invariant defined in SECURITY.md has corresponding property tests (see tests/test_workspace_tools.py:ShellClassifierPropertyTests).
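To illustrate what a property test for a classifier could look like, here is a dependency-free sketch that uses seeded random inputs. The toy `classify` function, its allowlist, and its metacharacter set are assumptions made for the example; they are not the classifier or the tests in `tests/test_workspace_tools.py`.

```python
import random
import shlex
import string

INSPECT_COMMANDS = frozenset({'ls', 'cat', 'pwd', 'head'})  # hypothetical allowlist
SHELL_METACHARACTERS = frozenset(';|&><`$()\n')


def classify(command: str) -> str:
    """Return 'inspect' only when the command is clearly read-only."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return 'mutate'
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes: safety cannot be determined
        return 'mutate'
    if not argv or argv[0] not in INSPECT_COMMANDS:
        return 'mutate'
    return 'inspect'


def check_inspect_only_for_allowlisted_commands(trials: int = 2000) -> None:
    """Property: 'inspect' implies an allowlisted program and no metacharacters."""
    rng = random.Random(0)  # fixed seed keeps any failure reproducible
    alphabet = string.ascii_letters + string.digits + ' -/.;|&$"\''
    for _ in range(trials):
        length = rng.randint(0, 20)
        command = ''.join(rng.choice(alphabet) for _ in range(length))
        if classify(command) == 'inspect':
            assert shlex.split(command)[0] in INSPECT_COMMANDS, command
            assert not SHELL_METACHARACTERS & set(command), command
```

The property is stated over arbitrary strings rather than a handful of hand-picked cases, which is what separates it from an example-based test.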
| Standard | Threshold | Enforced by |
|---|---|---|
| Test coverage (unit) | ≥ 75% lines | pytest --cov-fail-under=75 (CI test job) |
| Skill test coverage | ≥ 80% lines | docs/skill-governance.md manual gate |
| Type coverage | All production code annotated | mypy --disallow-untyped-defs (CI lint job) |
| Lint | Zero errors | ruff check . (CI lint job) |
| Format | Canonical | ruff format --check . (CI lint job) |
| Dependency CVEs | Zero known CVEs in base PR gate; optional-extra CVEs governed separately | pip-audit lanes in CI security job |
| Acceptance suite | Count matches docs/acceptance.md | acceptance-p0 / acceptance-p1 CI jobs |
| Docs consistency | Scripts pass | validate_docs_consistency.py (CI use-case-matrix job) |
Thresholds are hard failures in CI. There are no exceptions for in-flight work — raise coverage or fix lint before merging.
TeaAgent uses Semantic Versioning 2.0 (MAJOR.MINOR.PATCH):
- `PATCH`: backwards-compatible bug fixes that do not change tool schemas, audit event shapes, or permission semantics.
- `MINOR`: backwards-compatible additions (new tools, new optional config keys, new audit event types, new permission modes).
- `MAJOR`: breaking changes to the public API, tool schema, audit format, or permission model. Requires an ADR.
The version field in pyproject.toml is the single source of truth. Tags are v{MAJOR}.{MINOR}.{PATCH}.
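As a small illustration of the tag format and the three bump rules, the following sketch parses a `v{MAJOR}.{MINOR}.{PATCH}` tag and computes the next version. The helper names `parse_tag` and `bump` are hypothetical; the project may not ship any such tooling.

```python
import re
from typing import NamedTuple

# SemVer numeric identifiers have no leading zeros.
TAG_PATTERN = re.compile(r'v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)')


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_tag(tag: str) -> Version:
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f'not a release tag: {tag!r}')
    return Version(*(int(part) for part in match.groups()))


def bump(version: Version, change: str) -> Version:
    """Return the next version; lower-order fields reset to zero."""
    if change == 'major':
        return Version(version.major + 1, 0, 0)
    if change == 'minor':
        return Version(version.major, version.minor + 1, 0)
    if change == 'patch':
        return version._replace(patch=version.patch + 1)
    raise ValueError(f'unknown change type: {change!r}')
```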
Significant decisions — tool governance changes, audit semantics, OAuth, MCP transport, Code Mode isolation, async patterns — are recorded as ADRs in docs/adr/. The current range is ADR 0001–0020. When a PR changes one of these domains, it must either cite an existing ADR or create a new one.
Format: docs/adr/NNNN-short-title.md with Status, Decision, Rationale, and Implementation sections.
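A check for this format could be sketched as below. It assumes that each section is a Markdown heading (for example `## Status`) and that the short title is lowercase and hyphen-separated; neither detail is stated above, and `adr_problems` is a hypothetical helper.

```python
import re

# Assumption: four digits, then a lowercase hyphen-separated title.
ADR_FILENAME = re.compile(r'\d{4}-[a-z0-9]+(?:-[a-z0-9]+)*\.md')
REQUIRED_SECTIONS = ('Status', 'Decision', 'Rationale', 'Implementation')


def adr_problems(filename: str, text: str) -> list[str]:
    """Return a list of format problems; an empty list means the ADR passes."""
    problems: list[str] = []
    if ADR_FILENAME.fullmatch(filename) is None:
        problems.append(f'bad filename: {filename}')
    # Assumption: sections are Markdown headings of any level.
    headings = re.findall(r'^#+[ \t]+(.+?)[ \t]*$', text, flags=re.MULTILINE)
    found = [heading for heading in headings if heading in REQUIRED_SECTIONS]
    if found != list(REQUIRED_SECTIONS):
        problems.append(f'sections missing or out of order: {found}')
    return problems
```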
- Python ≥ 3.10 (tested on 3.10, 3.11, 3.12)
- No runtime dependencies by default; optional extras declared in `pyproject.toml`
- `py.typed` marker is present; downstream consumers can rely on type information
Every production module starts with `from __future__ import annotations`.
This enables postponed evaluation of annotations (PEP 563) and is required for forward-reference type hints on Python 3.10.
- Quotes: single (`'`), not double. Exception: docstrings use triple-double (`"""`).
- Indent: 4 spaces, no tabs.
- Line length: soft limit of 88 characters; `E501` (line too long) is suppressed in ruff, so prefer shorter lines but don't break for it.
- Import sorting: isort-compatible (`I` ruff rules). Standard library, then third-party, then local.
| Kind | Convention | Example |
|---|---|---|
| Module | `snake_case` | `audit.py`, `tool_call_context.py` |
| Class | `PascalCase` | `ApprovalPolicy`, `AuditLogger` |
| Function / method | `snake_case` | `check_tool_access`, `resolve_workspace_path` |
| Private function / method | `_snake_case` | `_apply_audit_level`, `_validate_relay_url` |
| Constant (module-level) | `UPPER_SNAKE_CASE` | `AUDIT_FILE_MODE`, `MAX_AUDIT_STRING_LENGTH` |
| Type alias | `PascalCase` | `AuditLevel`, `PermissionMode` |
| Test function | `test_<unit>_<scenario>_<outcome>` | `test_check_tool_access_unregistered_tool_denied` |
All production code (everything under teaagent/, excluding paths in [tool.coverage.run] omit) must have complete type annotations enforced by:
mypy --disallow-untyped-defs --disallow-incomplete-defs --check-untyped-defs
- Use `Optional[X]` (not bare `X | None`) for Python 3.10 compatibility, or use `from __future__ import annotations` and write `X | None`.
- Use `from __future__ import annotations` to avoid circular import issues with type hints.
- Prefer `from collections.abc import Callable, Iterator` over `typing.Callable`, `typing.Iterator`.
- Use `TYPE_CHECKING` guard for imports needed only for type hints.
- Do not add `# type: ignore` without an explanatory comment on the same line.
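The annotation rules above can be seen together in one short module. This is an illustrative sketch: `apply_all` and `describe` are hypothetical functions, and `Decimal` merely stands in for any type that is needed only in annotations.

```python
from __future__ import annotations

from collections.abc import Callable, Iterator
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by the type checker only; this import never runs.
    from decimal import Decimal


def apply_all(funcs: list[Callable[[int], int]], value: int) -> Iterator[int]:
    """Yield the result of each function applied to value."""
    for func in funcs:
        yield func(value)


def describe(amount: Decimal | None) -> str:
    # The annotation is stored as a string, so Decimal need not exist at runtime.
    return 'none' if amount is None else str(amount)
```

Because annotations are not evaluated at import time, the guarded import cannot take part in a circular import at runtime.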
Use @dataclass(frozen=True) for value objects and policy objects. Use @dataclass (mutable) only when the object has meaningful lifecycle state. Use field(default_factory=...) for mutable default values.
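A brief sketch of the two styles follows. `RetryPolicy` and `RunState` are hypothetical classes invented for the example, not types from the harness.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class RetryPolicy:
    """Immutable value object: equal fields mean equal policies."""

    max_attempts: int = 3
    allowed_tools: tuple[str, ...] = ()


@dataclass
class RunState:
    """Mutable because it tracks the lifecycle of a single run."""

    iteration: int = 0
    # default_factory gives every instance its own list.
    pending_calls: list[str] = field(default_factory=list)


def try_to_mutate(policy: RetryPolicy) -> bool:
    """Return True if the assignment was blocked, as frozen=True intends."""
    try:
        policy.max_attempts = 10  # type: ignore[misc]  # deliberate: shows the runtime error
    except FrozenInstanceError:
        return True
    return False
```

Frozen dataclasses are also hashable by default, so a policy object can be used as a dictionary key or set member.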
Every module that emits log output declares `logger = logging.getLogger(__name__)`.
Use `logger.debug/info/warning/error/exception`; never `print()` in production code. Do not log credential values; the audit logger's redaction applies to audit events, not Python logger output.
- Default: no comments. Add a comment only when the why is non-obvious (hidden constraint, workaround for a specific bug, invariant that would surprise a reader).
- Do not comment what the code does — well-named identifiers do that.
- Do not reference the current task, ticket number, or caller in comments — those belong in the commit message or PR description.
- Public classes should have a one-line docstring stating the class's invariant or purpose. Public methods need a docstring only if their contract is non-obvious from the signature.
- Never write multi-paragraph docstrings for internal helpers.
- Raise `AgentHarnessError` subclasses (defined in `teaagent/errors.py`) for harness-layer errors. Do not raise bare `Exception` or `RuntimeError` from governance code.
- Do not silently swallow exceptions in governance, audit, or approval code paths.
- Only validate at system boundaries (user input, LLM output, external API responses). Trust internal invariants — do not add defensive checks for states that cannot occur.
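The error-handling rules above might look like this in practice. The `AgentHarnessError` class here is a local stand-in so the example runs on its own (the real base class lives in `teaagent/errors.py`), and `ApprovalDeniedError`, `require_approval`, and `run_tool` are hypothetical names.

```python
class AgentHarnessError(Exception):
    """Local stand-in for the base class in teaagent/errors.py."""


class ApprovalDeniedError(AgentHarnessError):
    """Hypothetical subclass: a required approval was refused."""


def require_approval(tool_name: str, approved: bool) -> None:
    if not approved:
        # A specific subclass, never a bare Exception or RuntimeError.
        raise ApprovalDeniedError(f'approval refused for tool {tool_name!r}')


def run_tool(tool_name: str, approved: bool) -> str:
    # No try/except here: a governance failure must reach the caller intact.
    require_approval(tool_name, approved)
    return f'{tool_name}: ok'
```

Callers can catch `AgentHarnessError` to handle every harness-layer failure in one place while still distinguishing subclasses when needed.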
| Change type | Required doc update |
|---|---|
| New CLI command or flag | docs/cli.md |
| New audit event type or shape change | docs/audit-events.md |
| New permission mode or approval semantic | SECURITY.md + relevant ADR |
| New acceptance flow test | docs/acceptance.md (count + description) |
| New optional extra / dependency | docs/USAGE.md or README.md extras table |
| Breaking API change | CHANGELOG.md ### Breaking section + migration note |
| New ADR | docs/adr/NNNN-short-title.md |
No doc update is required for:
- Bug fixes with no observable API change.
- Internal refactors with no user-visible effect.
- Test additions that don't introduce new acceptance flows.
Documentation must describe the code as it currently works, not as it was planned to work. If you change behaviour, update the relevant doc in the same PR. Stale docs are treated as bugs.
Docs with Last reviewed: YYYY-MM-DD headers must be updated when the content they describe changes. The scripts/validate_docs_consistency.py script checks a subset of these.
- Markdown, GitHub-flavoured (`*.md`).
- One blank line between sections.
- Use fenced code blocks with language tags (`python`, `bash`, `json`).
- Tables for structured comparisons (permission matrices, job listings, etc.).
- No trailing whitespace. No "smart" quotes; use ASCII `'` and `"`.
- ADR format: `Status`, `Decision`, `Rationale`, `Implementation` sections, in that order.
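Two of the rules above (no trailing whitespace, no smart quotes) are mechanical enough to check in a few lines. The sketch below is illustrative only; `style_problems` is a hypothetical helper and is not claimed to be part of `scripts/validate_docs_consistency.py`.

```python
# Left/right single and double curly quotes.
SMART_QUOTES = '\u2018\u2019\u201c\u201d'


def style_problems(markdown: str) -> list[str]:
    """Report trailing whitespace and smart quotes, one entry per finding."""
    problems: list[str] = []
    for number, line in enumerate(markdown.splitlines(), start=1):
        if line != line.rstrip():
            problems.append(f'line {number}: trailing whitespace')
        if any(ch in SMART_QUOTES for ch in line):
            problems.append(f'line {number}: smart quote')
    return problems
```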