System Transparency Engineering Plan - 2026-05-31

Supersession note, 2026-06-05: This file is historical evidence from the Phase 0 transparency pass. The transparency items were absorbed into P0-B (cost/budget truth), P0-D (root/approval truth), P1-B (run evidence), and P2-B (docs as control plane). For current status, use docs/daily-driver-current-status.md and docs/governance/documentation-operating-model-2026-06-04.md.

This plan turns the 2026-05-31 transparency risk audit into reviewable work. The goal is to make TeaAgent easier to inspect, safer to operate, and harder to misrepresent through stale claims.

North Star

TeaAgent should be able to answer four operator questions with evidence:

What can this run do?
Why was this action allowed or denied?
What evidence proves the stated maturity claim?
Which risks are open, accepted, mitigated, or release-blocking?

Non-Goals

Do not add a second agent framework.
Do not add new dependencies unless a task explicitly justifies them.
Do not make marketing claims that are not backed by acceptance tests or documented operational constraints.
Do not treat local developer convenience as the release default.

Engineering Principles

Prefer existing registry, policy, audit, and acceptance patterns.
Convert hidden state into explicit state, scoped by run, root, or profile.
Make every trust boundary visible in code, docs, and tests.
Keep release claims generated from or checked against evidence.
Fix small consistency failures immediately; use ADRs for architectural choices.
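
As an illustration of the second principle, the sketch below replaces a single module-level graph with state keyed by workspace root. `GraphStore`, `graph_for`, and `add_edge` are hypothetical names chosen for this example; they are not taken from the TeaAgent codebase, and a real implementation might key by run ID instead.

```python
from __future__ import annotations

from pathlib import Path


class GraphStore:
    """Hypothetical store: one graph per workspace root, no global graph."""

    def __init__(self) -> None:
        self._graphs: dict[Path, dict[str, set[str]]] = {}

    def graph_for(self, root: str | Path) -> dict[str, set[str]]:
        # Resolve the path so equivalent spellings share one scope key.
        key = Path(root).resolve()
        return self._graphs.setdefault(key, {})

    def add_edge(self, root: str | Path, src: str, dst: str) -> None:
        self.graph_for(root).setdefault(src, set()).add(dst)


store = GraphStore()
store.add_edge("/tmp/repo-a", "main", "helper")
print(store.graph_for("/tmp/repo-a"))  # {'main': {'helper'}}
print(store.graph_for("/tmp/repo-b"))  # {}
```

Because the scope key is an explicit argument, a two-root isolation test only needs to ingest into one root and assert that the other stays empty.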

Acceptance Criteria

ID	Acceptance Criterion	Evidence
AC-001	`ToolRegistry.execute` applies pre-hook argument mutations and post-hook result mutations.	New acceptance or unit test fails before fix and passes after.
AC-002	Hook vetoes, mutations, and post-processing decisions are auditable without leaking secrets.	Audit fixture includes hook lifecycle records.
AC-003	Code-analysis graph state is scoped by workspace root or run ID.	Two-root isolation test.
AC-004	Stateful non-file tools are explicitly annotated or capability-gated.	Tool lint rule or dedicated registry test.
AC-005	`cx` and `qmd` backend calls have bounded timeouts.	Timeout fixture with fake slow executable.
AC-006	`workspace_code_parse` validates action-specific fields before execution.	Missing-field tests return classified actionable errors.
AC-007	Remote MCP tools without trusted annotations are conservative by default.	Remote MCP fixture with mutation-like tool name and missing hints.
AC-008	Plugin strict mode is default for CI/release profiles.	Profile test blocks unknown plugin source.
AC-009	Audit L3 privacy wording matches implementation, or encryption is implemented.	Doc/code consistency test.
AC-010	Release maturity status has one canonical source.	Validator checks README, maturity matrix, and package metadata.
AC-011	Docs consistency and competitive docs refresh pass locally and in CI.	`validate_docs_consistency.py` and refresh `--check`.
AC-012	Tool lint warnings are either zero or tracked in a warning budget.	Tool lint report plus warning-budget check.
AC-013	Public claims map to acceptance tests or explicit known gaps.	Claim-to-evidence matrix.
AC-014	Risk register includes owner, status, due date, and release-blocking flag.	Machine-readable risk register or validated Markdown table.
AC-015	Managed, CI, and local profiles have separate documented gates.	Verification profile table and CI jobs.
AC-016	Multi-writer audit storage support is documented or tested.	ADR plus stress/probe test if supported.
AC-017	Policy denials include operator-readable reason codes.	Policy acceptance tests assert reason codes.
AC-018	External tooling diagnostics explain `cx` database permission failures.	CLI diagnostic test and docs.
AC-019	Acceptance tiers are mapped to risk severity.	Risk-to-test matrix links P0/P1/P2 coverage.
AC-020	Release candidate verification produces an evidence bundle.	Script output includes commands, versions, test counts, and artifact paths.
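
AC-001 is the criterion most easily misread, so here is a minimal sketch of what "applies hook mutations" could mean in practice. `MiniRegistry`, `clamp_limit`, `redact`, and `search` are hypothetical stand-ins written for this example; the real `ToolRegistry.execute`, `run_pre_hooks`, and `run_post_hooks` signatures are not shown in this plan and may differ.

```python
from typing import Any, Callable

PreHook = Callable[[str, dict[str, Any]], dict[str, Any]]
PostHook = Callable[[str, Any], Any]


class MiniRegistry:
    """Hypothetical stand-in for a tool registry; not TeaAgent's real class."""

    def __init__(self) -> None:
        self.tools: dict[str, Callable[..., Any]] = {}
        self.pre_hooks: list[PreHook] = []
        self.post_hooks: list[PostHook] = []

    def execute(self, name: str, args: dict[str, Any]) -> Any:
        # Each pre-hook's return value replaces the arguments seen downstream.
        for pre in self.pre_hooks:
            args = pre(name, args)
        result = self.tools[name](**args)
        # The caller receives the post-hook output, not the raw tool result.
        for post in self.post_hooks:
            result = post(name, result)
        return result


def clamp_limit(name: str, args: dict[str, Any]) -> dict[str, Any]:
    return {**args, "limit": min(args.get("limit", 10), 100)}


def redact(name: str, result: list[str]) -> list[str]:
    return [item.replace("secret", "[redacted]") for item in result]


def search(query: str, limit: int) -> list[str]:
    return [f"limit={limit}", f"{query} secret"]


registry = MiniRegistry()
registry.tools["search"] = search
registry.pre_hooks.append(clamp_limit)
registry.post_hooks.append(redact)
print(registry.execute("search", {"query": "q", "limit": 5000}))
# ['limit=100', 'q [redacted]']
```

A regression test in this shape fails whenever `execute` calls the hooks but discards what they return, which is the gap the criterion targets.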

Test Matrix

Tier	Purpose	Command
P0	Fast governance smoke: policy, approvals, audit, destructive guardrails.	`python3 scripts/run_acceptance_tier.py --tier p0`
P1	Main-branch safety: common workflows plus governance regressions.	`python3 scripts/run_acceptance_tier.py --tier p1`
P2 / all	Release confidence across documented acceptance stories.	`python3 scripts/run_acceptance_tier.py --tier all`
Docs	Verify public claims and generated competitive docs.	`python3 scripts/refresh_competitive_docs.py --check && python3 scripts/validate_docs_consistency.py`
Tool governance	Verify tool metadata and registry invariants.	`teaagent tool lint --root .`
Collection	Verify acceptance discovery count and import health.	`python3 -m pytest tests/acceptance --collect-only -q`

Verification Profiles

Profile	When	Required Gates
Local edit	Before finishing a narrow change.	Targeted tests, docs check if docs changed, tool lint if tools changed.
Pull request	Every PR.	P0, docs checks, targeted tests, tool lint.
Main branch	After merge or scheduled run.	P0, P1, docs checks, tool lint.
Release candidate	Before publishing.	P0, P1, all acceptance, docs checks, tool lint, risk-register review, evidence bundle.
Managed runtime	Before enabling hosted or shared use.	Release candidate gates plus strict plugin profile, remote MCP trust review, audit storage review.

Task Plan

ID	Task	Risk Covered	Acceptance	Tests
TASK-001	Fix `docs/use-cases.md` survey marker so docs validators pass.	RSK-002	AC-011	Docs checks.
TASK-002	Add a failing integration test for ToolRegistry pre-hook argument mutation.	RSK-001	AC-001	New hook registry integration test.
TASK-003	Add a failing integration test for ToolRegistry post-hook result mutation.	RSK-001	AC-001	New hook registry integration test.
TASK-004	Wire `run_pre_hooks` and `run_post_hooks` return values through `ToolRegistry.execute`.	RSK-001	AC-001	Hook integration tests plus existing acceptance.
TASK-005	Add audit fields for hook veto, mutation, and post-processing.	RSK-001, RSK-015	AC-002	Audit lifecycle fixture.
TASK-006	Scope code-analysis graph state by root or run ID.	RSK-003	AC-003	Two-root graph isolation test.
TASK-007	Decide whether stateful in-memory graph ingestion is destructive, stateful, or capability-gated.	RSK-004	AC-004	Tool annotation test.
TASK-008	Add a registry lint rule for stateful non-idempotent tools without explicit governance.	RSK-004, RSK-010	AC-004, AC-012	Tool lint tests.
TASK-009	Add per-action validation for `workspace_code_parse`.	RSK-006	AC-006	Missing `name`, `path`, `line`, and `symbol` tests.
TASK-010	Add timeout config to external backend invocations.	RSK-005	AC-005	Fake slow executable test.
TASK-011	Classify backend timeouts as actionable tool errors.	RSK-005	AC-005	Timeout error assertion.
TASK-012	Document `cx` local database requirements and sandbox failure mode.	RSK-020	AC-018	Diagnostic command docs check.
TASK-013	Add a `teaagent code-analysis doctor` or equivalent diagnostic path.	RSK-020	AC-018	CLI diagnostic test.
TASK-014	Add conservative default policy for unannotated remote MCP tools.	RSK-007	AC-007	Remote MCP fixture test.
TASK-015	Add trust profile config for remote MCP servers.	RSK-007, RSK-016	AC-007	Policy and config tests.
TASK-016	Require explicit manifest coverage for remote tools in strict profiles.	RSK-016	AC-007	Manifest coverage report test.
TASK-017	Decide audit L3 path: implement encryption or rename the privacy level.	RSK-008	AC-009	ADR plus doc/code consistency test.
TASK-018	Add audit storage stress/probe for multi-writer behavior.	RSK-012	AC-016	Multi-process writer test or documented unsupported fixture.
TASK-019	Make plugin strict mode default in CI and release profiles.	RSK-009	AC-008	Strict profile plugin test.
TASK-020	Add release-channel status source file.	RSK-011	AC-010	Metadata validator.
TASK-021	Validate README, maturity matrix, and package classifier against release status.	RSK-011	AC-010	Docs metadata test.
TASK-022	Add a warning-budget file for tool lint warnings.	RSK-010	AC-012	Tool lint budget check.
TASK-023	Convert remaining tool lint warnings into fixes or tracked exceptions.	RSK-010	AC-012	`teaagent tool lint --root .`.
TASK-024	Create a claim-to-evidence matrix for public safety and maturity claims.	RSK-013, RSK-019	AC-013	Docs validator.
TASK-025	Link each risk register row to owner, status, due date, and release-blocking flag.	RSK-018	AC-014	Risk-register validation.
TASK-026	Map P0/P1/P2 acceptance tests to risk severity.	RSK-013	AC-019	Risk-to-test matrix check.
TASK-027	Add release evidence bundle script.	RSK-017	AC-020	Script smoke test.
TASK-028	Store command versions and test counts in the evidence bundle.	RSK-017	AC-020	Snapshot test.
TASK-029	Add CI job or scheduled workflow for full acceptance tier.	RSK-017	AC-015	CI config check.
TASK-030	Add ADR for managed-runtime trust boundaries.	RSK-007, RSK-009, RSK-012	AC-015	ADR presence check.
TASK-031	Add policy denial reason codes to audit events.	RSK-015	AC-017	Policy denial acceptance test.
TASK-032	Add operator-facing "why denied" explain command or output path.	RSK-015	AC-017	CLI acceptance test.
TASK-033	Add docs generator guardrails for date markers and generated-matrix links.	RSK-014	AC-011	Generator unit test.
TASK-034	Add docs pre-commit or local verification target.	RSK-002, RSK-014	AC-011	Make/script smoke test.
TASK-035	Run release-candidate verification and publish residual risks.	All	AC-020	Full profile gates.
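
TASK-010 and TASK-011 pair a bounded timeout with a classified error. The sketch below shows one way to do that with the standard library; `BackendTimeout`, `run_backend`, the `backend_timeout` code, and the 30-second default are assumptions made for this example, not the values used by the `cx` or `qmd` integrations.

```python
import subprocess
import sys


class BackendTimeout(RuntimeError):
    """Hypothetical classified error for a backend that exceeded its budget."""

    def __init__(self, command: list[str], timeout_s: float) -> None:
        self.code = "backend_timeout"
        super().__init__(
            f"{command[0]} did not finish within {timeout_s}s; "
            "raise the timeout or check that the backend is healthy"
        )


def run_backend(command: list[str], timeout_s: float = 30.0) -> str:
    try:
        # subprocess.run kills the child and raises TimeoutExpired on expiry.
        # check=True surfaces non-zero exits as CalledProcessError (not classified here).
        done = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s, check=True
        )
    except subprocess.TimeoutExpired as exc:
        raise BackendTimeout(command, timeout_s) from exc
    return done.stdout


if __name__ == "__main__":
    # Fake slow executable: a Python child that sleeps past the budget.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    try:
        run_backend(slow, timeout_s=0.2)
    except BackendTimeout as exc:
        print(exc.code)
```

Using the current interpreter as the fake slow executable keeps the fixture portable and avoids depending on a real backend binary.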

Implementation status (2026-05-31)

ID	Status
TASK-001	Done (docs validators green)
TASK-002–004	Done (`ToolRegistry.execute` + `tests/test_hooks.py`)
TASK-005	Open (audit hook mutation fields)
TASK-006–008	Done (per-root graph, `stateful`, lint)
TASK-009–011	Done (`workspace_code_parse` validation, backend timeouts)
TASK-012–013	Open (cx doctor CLI)
TASK-014–016	Open (remote MCP trust defaults)
TASK-017	Done Phase A (L3 docfix; encryption optional)
TASK-018–021	Open
TASK-022	Open (warning budget file)
TASK-023	Done (tool lint warnings at 0)
TASK-024–026	Open
TASK-027–028	Done (evidence bundle MVP; counts-only profile)
TASK-029–030	Open
TASK-031–032	Done (denial `reason_code`, `approval why-denied`)
TASK-033	Done (docs generator guardrails)
TASK-034	Done (`scripts/verify_docs.sh`)
TASK-035	Open

Suggested Sequencing

Phase 0 - Restore green documentation gates

TASK-001
TASK-033
TASK-034

Exit criteria: docs consistency and competitive docs refresh pass locally.

Phase 1 - Close high-risk execution contract gaps

TASK-002
TASK-003
TASK-004
TASK-005
TASK-009
TASK-010
TASK-011

Exit criteria: hook behavior and backend failures are covered by tests and classified errors.

Phase 2 - Make hidden state and trust boundaries explicit

TASK-006
TASK-007
TASK-008
TASK-014
TASK-015
TASK-016
TASK-031
TASK-032

Exit criteria: stateful tools, remote MCP tools, and policy denials have visible governance controls.

Phase 3 - Align claims, profiles, and release evidence

TASK-017
TASK-019
TASK-020
TASK-021
TASK-022
TASK-023
TASK-024
TASK-025
TASK-026
TASK-027
TASK-028
TASK-029
TASK-030
TASK-035

Exit criteria: release claims have evidence, release profiles are explicit, and accepted risks are visible.

Risk-to-Test Backlog

Risk	New Test Needed
RSK-001	`test_tool_registry_applies_hook_mutations`.
RSK-003	`test_code_graph_is_scoped_by_workspace_root`.
RSK-004	`test_stateful_non_idempotent_tools_require_governance_annotation`.
RSK-005	`test_external_backend_timeout_is_classified`.
RSK-006	`test_workspace_code_parse_missing_action_args_are_actionable`.
RSK-007	`test_remote_mcp_unannotated_mutation_requires_trust_profile`.
RSK-008	`test_audit_level_privacy_claim_matches_behavior`.
RSK-009	`test_release_profile_enables_plugin_strict_mode`.
RSK-010	`test_tool_lint_warning_budget_is_enforced`.
RSK-011	`test_release_status_metadata_is_consistent`.
RSK-012	`test_audit_writer_concurrency_contract`.
RSK-013	`test_public_claims_have_acceptance_evidence`.
RSK-014	`test_generated_docs_markers_are_canonical`.
RSK-015	`test_policy_denial_audit_includes_reason_code`.
RSK-016	`test_remote_tool_manifest_coverage_report`.
RSK-017	`test_release_profile_contains_required_gates`.
RSK-018	`test_risk_register_rows_have_owner_status_due_date`.
RSK-019	`test_claim_to_evidence_matrix_has_no_unowned_claims`.
RSK-020	`test_code_analysis_doctor_reports_cx_database_issue`.
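
To show the intended shape of one backlog entry, here is a sketch of the RSK-006 test together with a toy validator to run it against. The action names in `REQUIRED_FIELDS`, the `ToolArgumentError` class, and the `missing_argument` code are all hypothetical; the plan names only the fields `name`, `path`, `line`, and `symbol`, not which action requires which.

```python
from typing import Any


class ToolArgumentError(ValueError):
    """Hypothetical classified error listing the fields the caller must add."""

    def __init__(self, action: str, missing: list[str]) -> None:
        self.code = "missing_argument"
        self.missing = missing
        super().__init__(f"action '{action}' requires: {', '.join(missing)}")


# Assumed mapping for illustration; the real actions are not listed in this plan.
REQUIRED_FIELDS = {
    "find_symbol": ("name",),
    "outline": ("path",),
    "definition_at": ("path", "line"),
    "references": ("symbol",),
}


def validate_action(action: str, args: dict[str, Any]) -> None:
    required = REQUIRED_FIELDS.get(action)
    if required is None:
        raise ValueError(f"unknown action '{action}'")
    missing = [field for field in required if args.get(field) is None]
    if missing:
        raise ToolArgumentError(action, missing)


def test_workspace_code_parse_missing_action_args_are_actionable() -> None:
    try:
        validate_action("definition_at", {"path": "src/app.py"})
    except ToolArgumentError as exc:
        # Actionable means the error carries a code and names the missing field.
        assert exc.code == "missing_argument"
        assert exc.missing == ["line"]
        assert "line" in str(exc)
    else:
        raise AssertionError("expected ToolArgumentError")


test_workspace_code_parse_missing_action_args_are_actionable()
```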

Definition of Done

A task is done only when:

The code or docs change is narrow and linked to a risk ID.
Acceptance criteria are updated or already cover the change.
Relevant tests pass locally.
New residual risks are added to the risk register.
Public docs avoid claims that the tests do not support.

Open Decisions

Decision	Options	Recommended Path
Audit L3 privacy	Implement encryption, or reword level.	Reword immediately, then evaluate encryption as a separate ADR.
Stateful graph governance	Mark destructive, add stateful annotation, or require capability gate.	Add explicit stateful annotation and root/run scoping.
Remote MCP unknown tools	Allow by policy mode, deny by default, or require manifest.	Require trust profile or manifest for mutation-capable unknowns.
Release maturity status	Keep Alpha classifier, promote classifier, or split package/product status.	Create canonical release status file before changing classifier.
Full acceptance cadence	Every PR, main only, nightly, release only.	P0 on every PR, P1 on main/protected branches, the full tier (`all`) nightly and before release.
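
The recommended path for remote MCP unknown tools can be sketched as a small decision function. The `readOnlyHint` key follows the MCP tool-annotation naming as I understand it; `is_mutation_capable`, `decide`, and the three decision labels are hypothetical and only illustrate the "conservative by default" rule.

```python
from __future__ import annotations

from typing import Any


def is_mutation_capable(annotations: dict[str, Any] | None, hints_trusted: bool) -> bool:
    # Missing or untrusted hints never lower the assumed risk.
    if not hints_trusted or not annotations:
        return True
    return annotations.get("readOnlyHint") is not True


def decide(annotations: dict[str, Any] | None, hints_trusted: bool, in_manifest: bool) -> str:
    """Return 'allow', 'allow_with_approval', or 'deny' (illustrative labels)."""
    if not is_mutation_capable(annotations, hints_trusted):
        return "allow"
    # Mutation-capable unknowns need explicit manifest or trust-profile coverage.
    return "allow_with_approval" if in_manifest else "deny"


print(decide(None, hints_trusted=False, in_manifest=False))                    # deny
print(decide({"readOnlyHint": True}, hints_trusted=True, in_manifest=False))   # allow
print(decide({"readOnlyHint": True}, hints_trusted=False, in_manifest=False))  # deny
print(decide({}, hints_trusted=True, in_manifest=True))                        # allow_with_approval
```

The third case is the important one: a read-only hint from an untrusted server is ignored, so a remote tool cannot talk its way into a lower risk class.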

Immediate Next PR Candidate

The smallest high-value follow-up PR should include:

Hook registry integration tests.
`ToolRegistry.execute` hook return-value wiring.
Docs consistency regression for the survey marker.
Action-specific validation tests for missing `workspace_code_parse` arguments.

This PR would close the most concrete correctness gap while keeping the change small enough for focused review.
