Commit 4ff2a0a

docs(docs): add in-repo testing guide (#134)
## Summary

- add a repo-local `TESTING.md` derived from the docs source so testing guidance is available in-repo
- update `AGENTS.md` to reference the local testing guide instead of the external docs site
- fold in evaluator-specific testing expectations so the local guide stands on its own

## Why

The current `AGENTS.md` points to an external webpage for core testing conventions. This makes important contributor guidance harder to use offline and less visible during local code work.

## Verification

- [x] Reviewed the extracted content against `../docs-agent-control/testing.mdx`
- [x] Ran `git diff --check`
- [ ] Ran broader repo checks (not needed for this docs-only change)
1 parent 8b6e3f5 commit 4ff2a0a

2 files changed

Lines changed: 159 additions & 1 deletion

AGENTS.md

Lines changed: 1 addition & 1 deletion
@@ -56,7 +56,7 @@ Server API endpoints, engine evaluation, and SDK evaluation are latency-sensitiv

## Testing conventions

- All testing guidance (including behavior changes require tests) lives in [Agent Control Testing Guide](https://docs.agentcontrol.dev/testing).
+ All testing guidance (including "behavior changes require tests") lives in `TESTING.md`.

## Common change map

TESTING.md

Lines changed: 158 additions & 0 deletions
@@ -0,0 +1,158 @@
# Testing Guide

This file is the in-repo copy of the testing guidance used by the Agent Control docs site. Keep it in sync with the published testing guide when testing conventions change.

## Goals

- Make tests readable and reviewable.
- Prefer verifying behavior through the public contract over internal details.
- Keep the suite reliable: deterministic, minimal flake, clear failures.
- Any behavior change should include a test change. Pure refactors only require test changes if behavior changes.

## What "public contract" means

Public contract is anything exposed to end users and not an internal implementation detail.

Practical mapping in this repo:

- **Server**: HTTP endpoints, request/response schemas, and documented behavior.
- **SDK**: symbols exported from `sdks/python/src/agent_control/__init__.py` and documented behavior.
- **Models**: Pydantic models, fields, validation, and serialization in `models/src/agent_control_models/`.
- **Engine**: stable entrypoints for evaluation behavior. Avoid asserting on private helpers or module structure.

## Prefer testing via public contract

Choose the narrowest user-facing interface that can express the scenario:

1. Server behavior: drive via HTTP endpoints, creating setup state through the API where feasible.
2. SDK behavior: drive via exported SDK APIs.
3. Engine behavior: drive via the engine's stable entrypoints.
4. Only if needed: test internal helpers for hard-to-reach edge cases or performance-sensitive parsing or validation.

Why this rule exists:

- Contract tests survive refactors.
- They catch integration mismatches between models, server, and SDK layers.
- They better reflect how users experience failures.

It is acceptable to use internals when:

- The public route to set up state is disproportionately slow or complex.
- You need to force an otherwise unreachable error path.
- You are testing a pure function where the public API adds no value.

If you use internals, say so explicitly in the test's `# Given:` block. Example: `# Given: seeded DB row directly for speed`.

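As a hedged illustration of disclosing internals in the `# Given:` block, the sketch below seeds rows directly and says so. The `controls` table and the `count_enabled_controls` function are hypothetical; they use an in-memory SQLite database only to keep the example self-contained and do not reflect this repo's schema.

```python
import sqlite3


def count_enabled_controls(conn: sqlite3.Connection) -> int:
    """Hypothetical function under test: counts rows flagged as enabled."""
    row = conn.execute("SELECT COUNT(*) FROM controls WHERE enabled = 1").fetchone()
    return row[0]


def test_count_ignores_disabled_controls() -> None:
    # Given: seeded DB rows directly for speed (bypasses the public API)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE controls (name TEXT, enabled INTEGER)")
    conn.executemany(
        "INSERT INTO controls (name, enabled) VALUES (?, ?)",
        [("pii-protection", 1), ("legacy-rule", 0)],
    )

    # When: counting enabled controls
    count = count_enabled_controls(conn)

    # Then: only the enabled control is counted
    assert count == 1
```

The comment tells a reviewer at a glance that the setup skipped the public route, and why.
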
## Given / When / Then style

Use `# Given`, `# When`, and `# Then` comments to separate intent from mechanics.

Guidelines:

- **Given**: inputs, state, fixtures, mocks, and preconditions.
- **When**: the single action under test.
- **Then**: assertions about outcomes, errors, and side effects.
- Prefer one `When` per test. Split tests unless multiple actions are inseparable.
- Keep comments short and specific.

### Example: unit-level validation

```python
def test_scope_rejects_invalid_step_name_regex() -> None:
    # Given: a scope with an invalid regex
    scope = {"step_name_regex": "("}

    # When: constructing the model
    with pytest.raises(ValueError):
        ControlScope.model_validate(scope)

    # Then: a clear validation error is raised
```

### Example: API-level behavior

```python
def test_create_control_returns_id(client: TestClient) -> None:
    # Given: a valid control payload
    payload = {"name": "pii-protection"}

    # When: creating the control via the public API
    response = client.put("/api/v1/controls", json=payload)

    # Then: the response contains the control id
    assert response.status_code == 200
    assert "control_id" in response.json()
```

### Example: SDK-level behavior

```python
async def test_sdk_denies_on_local_control() -> None:
    # Given: an SDK client and a local deny control
    client = AgentControlClient(base_url="http://localhost:8000")
    controls = [{"execution": "sdk", "action": {"decision": "deny"}, ...}]

    # When: evaluating via the SDK public API
    result = await check_evaluation_with_local(
        client=client,
        agent_name="demo-agent",
        step=Step(type="tool", name="db_query", input={"sql": "SELECT 1"}, output=None),
        stage="pre",
        controls=controls,
    )

    # Then: the evaluation is unsafe
    assert result.is_safe is False
```

## Setup guidance

- Prefer creating records via public endpoints rather than writing DB rows directly.
- Prefer invoking behavior via public entrypoints.
- Avoid asserting on internal or private fields unless they are part of the contract.

Specific guidance:

- **Server**: use HTTP endpoints when practical. The service layer is internal.
- **SDK**: use symbols exported from `sdks/python/src/agent_control/__init__.py`.
- **Database seeding**: direct row insertion is acceptable for migration tests, otherwise prefer public setup flows.

## Evaluator-specific expectations

When adding or changing evaluators, tests should cover at least these three cases:

1. Null or empty input: returns `matched=False` and no error.
2. Normal evaluation: returns the correct `matched` result for the configured threshold or predicate.
3. Infrastructure failure: returns `matched=False` with `error` set, unless the evaluator intentionally uses a different documented error policy.

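The three cases above might look like the following sketch. `EvalResult` and `BlocklistEvaluator` are hypothetical stand-ins written for this example, not the repo's evaluator interface; only the `matched` and `error` field names come from this guide. The injected `fetch_terms` callable is an assumption that stands in for an infrastructure dependency.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass
class EvalResult:
    """Hypothetical result shape; only `matched` and `error` come from the guide."""

    matched: bool
    error: Optional[str] = None


class BlocklistEvaluator:
    """Hypothetical evaluator that matches text against externally fetched terms."""

    def __init__(self, fetch_terms: Callable[[], Set[str]]) -> None:
        self._fetch_terms = fetch_terms  # stands in for an infrastructure dependency

    def evaluate(self, text: Optional[str]) -> EvalResult:
        if not text:
            return EvalResult(matched=False)
        try:
            terms = self._fetch_terms()
        except ConnectionError as exc:
            # Infrastructure failure: report it via `error` and do not match.
            return EvalResult(matched=False, error=str(exc))
        return EvalResult(matched=any(term in text for term in terms))


def test_empty_input_does_not_match() -> None:
    # Given: an evaluator with a working term source
    evaluator = BlocklistEvaluator(lambda: {"password"})

    # When: evaluating empty input
    result = evaluator.evaluate("")

    # Then: no match and no error
    assert result == EvalResult(matched=False, error=None)


def test_blocked_term_matches() -> None:
    # Given: an evaluator configured with one blocked term
    evaluator = BlocklistEvaluator(lambda: {"password"})

    # When: evaluating text that contains the term
    result = evaluator.evaluate("my password is hunter2")

    # Then: the input matches and no error is reported
    assert result.matched is True
    assert result.error is None


def test_term_source_failure_sets_error() -> None:
    # Given: a term source that fails like an unreachable service
    def failing_fetch() -> Set[str]:
        raise ConnectionError("term service unavailable")

    evaluator = BlocklistEvaluator(failing_fetch)

    # When: evaluating otherwise matching text
    result = evaluator.evaluate("my password is hunter2")

    # Then: no match, and the failure is surfaced in `error`
    assert result.matched is False
    assert result.error == "term service unavailable"
```
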
Additional evaluator rules worth testing when relevant:

- `error` is for infrastructure failures, not normal evaluation outcomes.
- Evaluators are reused across concurrent requests, so avoid request-scoped state on `self`.
- Pre-compiled patterns, timeout handling, and async boundaries should be covered when they are part of the evaluator behavior.

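A minimal sketch of the shared-instance rule, assuming a hypothetical `KeywordEvaluator` that compiles its pattern once and keeps per-request data in local variables. A test like this cannot prove the absence of races; it only catches results that leak between concurrent calls.

```python
import re
from concurrent.futures import ThreadPoolExecutor


class KeywordEvaluator:
    """Hypothetical evaluator: pattern compiled once, no per-request state on self."""

    def __init__(self, pattern: str) -> None:
        self._pattern = re.compile(pattern)  # shared and read-only after init

    def evaluate(self, text: str) -> bool:
        # Per-request data stays in local variables, never on self.
        return self._pattern.search(text) is not None


def test_shared_evaluator_is_consistent_under_concurrency() -> None:
    # Given: one evaluator instance shared by many callers
    evaluator = KeywordEvaluator(r"\bsecret\b")
    inputs = ["a secret value", "nothing here"] * 50

    # When: evaluating concurrently from a thread pool
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(evaluator.evaluate, inputs))

    # Then: each result depends only on its own input
    assert results == [True, False] * 50
```
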
## Running tests

Prefer Makefile targets when available:

- All tests: `make test`
- All checks: `make check`
- Server tests: `make server-test`
- Engine tests: `make engine-test`
- SDK tests: `make sdk-test`

If there is no Makefile target for the task, run the underlying command directly.

Package-specific notes:

- Server tests use the configured test database in `server/Makefile`.
- SDK tests start a local server and wait on `/health`.
- Models tests currently run directly from the `models/` package.

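For readers unfamiliar with the "wait on `/health`" pattern, here is one way such a readiness wait could be written. This is a sketch, not the repo's actual SDK test setup: `http_ok` and `wait_until_healthy` are hypothetical names, and the timeout and interval defaults are arbitrary.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=1.0) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


def wait_until_healthy(
    probe: Callable[[], bool], timeout: float = 10.0, interval: float = 0.1
) -> bool:
    """Call `probe` until it succeeds or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A fixture could call `wait_until_healthy(lambda: http_ok("http://localhost:8000/health"))` after starting the server, where the host and port are assumed. Taking the probe as a parameter lets the wait loop itself be tested without a network.
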
## Practical defaults

- New behavior should come with a focused test.
- Bug fixes should include a regression test when practical.
- Prefer small, specific test fixtures over broad shared setup.
- Keep tests deterministic. Avoid timing-sensitive assertions and unnecessary sleeps.
- When changing shared contracts in `models/`, expect corresponding server and SDK test updates.
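
One common way to keep time-dependent tests deterministic is to pass the current time in instead of reading the clock or sleeping. The sketch below assumes a hypothetical `is_expired` helper; it is not taken from this repo.

```python
from datetime import datetime, timedelta, timezone


def is_expired(created_at: datetime, ttl: timedelta, now: datetime) -> bool:
    """Hypothetical helper: `now` is a parameter so tests control time."""
    return now - created_at >= ttl


def test_not_expired_just_before_ttl() -> None:
    # Given: a fixed creation time and a 30-second TTL
    created_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
    ttl = timedelta(seconds=30)

    # When: checking one second before the deadline
    expired = is_expired(created_at, ttl, now=created_at + timedelta(seconds=29))

    # Then: the item is still valid
    assert expired is False


def test_expired_exactly_at_ttl() -> None:
    # Given: a fixed creation time and a 30-second TTL
    created_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
    ttl = timedelta(seconds=30)

    # When: checking exactly at the deadline
    expired = is_expired(created_at, ttl, now=created_at + timedelta(seconds=30))

    # Then: the item has expired
    assert expired is True
```

Neither test sleeps or depends on wall-clock time, so both produce the same result on every run.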
