Skip to content

Commit 843c253

Browse files
authored
chore: src/ README audit gate + 5 package READMEs (#126, #152) (#71)
1 parent caf93d3 commit 843c253

11 files changed

Lines changed: 255 additions & 2 deletions

File tree

.github/branch-protection/develop.json

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,7 @@
1212
"Version bump check",
1313
"Action pinning audit",
1414
"Tests required",
15+
"src/ README audit",
1516
"Frontend Build",
1617
"Frontend Quality",
1718
"Branch-protection contexts sync",

.github/branch-protection/main.json

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,7 @@
1212
"Version bump check",
1313
"Action pinning audit",
1414
"Tests required",
15+
"src/ README audit",
1516
"Frontend Build",
1617
"Frontend Quality",
1718
"Branch-protection contexts sync",
Lines changed: 142 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,142 @@
1+
#!/usr/bin/env python3
2+
"""Enforce the per-package README rule from `CLAUDE.md`.
3+
4+
`CLAUDE.md` *Code standards*: *"Each `src/` directory has a README
5+
explaining its purpose and key interfaces."* As `src/` grows, READMEs
6+
go missing silently — and even when one exists, it can degrade into
7+
unstructured prose that no contributor reads. This script audits both
8+
shape and substance.
9+
10+
Behaviour:
11+
12+
- Walks every subdirectory under `src/` (recursive, skipping
13+
`__pycache__`).
14+
- A subdirectory must have a `README.md` when it contains at least one
15+
`.py` file other than `__init__.py`. Empty directories and
16+
`__init__.py`-only namespace packages are exempt — they have no
17+
surface to document.
18+
- The README must be non-trivial: at least `MIN_BYTES = 200` bytes
19+
after stripping leading/trailing whitespace. Catches the empty-stub
20+
failure mode (#126).
21+
- The README must contain a `## Key interfaces` heading (or its
22+
documented synonym `## Public surface`). #152 promoted the gate
23+
from presence + size to presence + structure: a 200-byte unrelated
24+
paragraph passed pre-#152, but doesn't actually document the
25+
package's public surface. The Purpose statement is encoded
26+
positionally as the H1 heading + immediately-following paragraph
27+
rather than as an explicit `## Purpose` heading — that's the
28+
existing convention across all seven packages and the natural
29+
markdown shape; a separate `## Purpose` heading would duplicate
30+
what the H1 already establishes.
31+
32+
There is **no exemption mechanism**, mirroring `check_file_length.py`
33+
(see `feedback_no_noqa`). If a future package legitimately needs a
34+
shorter README (e.g. a single-file helper with a self-explanatory
35+
filename), restructure rather than carve out an allowlist entry.
36+
37+
Exit codes:
38+
0 — every package with code has a non-trivial, structured README
39+
1 — at least one package is missing a README, has a stub one, or
40+
lacks the required heading
41+
2 — `src/` does not exist (run from the wrong directory?)
42+
43+
Usage (from repo root):
44+
45+
python .github/scripts/check_src_readmes.py
46+
"""
47+
48+
from __future__ import annotations
49+
50+
import re
51+
import sys
52+
from pathlib import Path
53+
54+
SRC_ROOT = Path("src")
55+
MIN_BYTES = 200
56+
57+
# Required heading shapes. Each tuple is "any-of" — a README satisfies the
58+
# rule if it contains at least one matching heading. Match is anchored to
59+
# line start, case-insensitive, allows `#` levels 2-4 so a deeply nested
60+
# subsection still counts.
61+
KEY_INTERFACES_HEADING = re.compile(
62+
r"^#{2,4}\s+(Key interfaces|Public surface)\b",
63+
re.IGNORECASE | re.MULTILINE,
64+
)
65+
66+
67+
def _normalised(path: Path) -> str:
68+
return path.as_posix()
69+
70+
71+
def _has_documentable_code(directory: Path) -> bool:
72+
"""True when the directory has at least one `.py` file beyond `__init__.py`."""
73+
for entry in directory.iterdir():
74+
if entry.is_file() and entry.suffix == ".py" and entry.name != "__init__.py":
75+
return True
76+
return False
77+
78+
79+
def _readme_failure(directory: Path) -> str | None:
80+
"""Return an error message if the directory's README is missing or stub-sized."""
81+
readme = directory / "README.md"
82+
if not readme.is_file():
83+
return (
84+
f"::error file={_normalised(directory)}::missing README.md. "
85+
"`CLAUDE.md` requires every `src/` package to document its "
86+
"purpose and key interfaces."
87+
)
88+
body = readme.read_text(encoding="utf-8").strip()
89+
if len(body.encode("utf-8")) < MIN_BYTES:
90+
return (
91+
f"::error file={_normalised(readme)}::README.md is shorter than "
92+
f"{MIN_BYTES} bytes after stripping whitespace. Add purpose + "
93+
"key-interfaces text — a single heading does not satisfy the rule."
94+
)
95+
if not KEY_INTERFACES_HEADING.search(body):
96+
return (
97+
f"::error file={_normalised(readme)}::README.md missing a "
98+
"`## Key interfaces` heading (or the synonym `## Public surface`). "
99+
"Per CLAUDE.md the README must document the package's public "
100+
"surface; the heading anchors that section so contributors can "
101+
"find it. Add the heading and list the package's exported names."
102+
)
103+
return None
104+
105+
106+
def main() -> int:
107+
if not SRC_ROOT.is_dir():
108+
print(f"::error::{SRC_ROOT.as_posix()} not found; run from repo root")
109+
return 2
110+
111+
audited: list[Path] = []
112+
failures: list[str] = []
113+
114+
for directory in sorted(p for p in SRC_ROOT.rglob("*") if p.is_dir()):
115+
if directory.name == "__pycache__":
116+
continue
117+
if not _has_documentable_code(directory):
118+
continue
119+
audited.append(directory)
120+
message = _readme_failure(directory)
121+
if message is not None:
122+
failures.append(message)
123+
124+
if failures:
125+
for line in failures:
126+
print(line)
127+
print(
128+
f"\n{len(failures)} package(s) failed the README audit. "
129+
"Fix in this PR — there is no exemption mechanism, see the "
130+
"module docstring."
131+
)
132+
return 1
133+
134+
print(
135+
f"src/ README audit OK — {len(audited)} package(s) documented "
136+
f"(min {MIN_BYTES} bytes, `## Key interfaces` heading required)."
137+
)
138+
return 0
139+
140+
141+
if __name__ == "__main__":
142+
sys.exit(main())

.github/workflows/ci.yml

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -151,6 +151,19 @@ jobs:
151151
python-version: "3.14"
152152
- run: python .github/scripts/check_tests_present.py
153153

154+
src-readmes:
155+
name: src/ README audit
156+
runs-on: ubuntu-latest
157+
# CLAUDE.md: every `src/` package documents its purpose + key
158+
# interfaces. The audit checks shape (presence + min 200 bytes) and
159+
# structure (`## Key interfaces` heading). No exemption mechanism.
160+
steps:
161+
- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
162+
- uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5
163+
with:
164+
python-version: "3.14"
165+
- run: python .github/scripts/check_src_readmes.py
166+
154167
frontend-build:
155168
name: Frontend Build
156169
runs-on: ubuntu-latest

pyproject.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
[project]
22
name = "harness-python-react"
3-
version = "0.2.0"
3+
version = "0.2.1"
44
description = "Production-quality LLM-driven coding harness — Python (FastAPI) backend, Vite + React + TypeScript frontend."
55
readme = "README.md"
66
requires-python = ">=3.14"

src/api/README.md

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,18 @@
1+
# `src/api/`
2+
3+
FastAPI HTTP surface for the harness scaffold. Owns request handling, route registration, lifespan hooks, and the in-memory session store. Imports from `src/observability/` (instrumentation), `src/models/` (Pydantic contracts) — never the other way around (enforced by `import-linter`).
4+
5+
## Key interfaces
6+
7+
- **`main.app`** — the `FastAPI()` instance with CORS middleware, OTel auto-instrumentation, and the v1 router mounted. Exposed for `uvicorn src.api.main:app` and the FastAPI `TestClient` fixtures in `tests/conftest.py`.
8+
- **`main.lifespan(app)`** — async context manager that runs `setup_tracing``setup_logging``instrument_httpx``instrument_fastapi` once on startup, then constructs the `SessionStore`. Single point of "what is process-globally configured?"
9+
- **`routes.router`**`APIRouter(prefix="/api/v1")`. Every route a real client should hit lives here; un-versioned routes live on `app` directly (only the FastAPI auto-generated docs at `/docs` and `/redoc` qualify).
10+
- **`routes.health`**`GET /api/v1/health``HealthResponse` (status + `importlib.metadata.version`-derived build version).
11+
- **`routes.echo`**`GET /api/v1/echo?msg=...``EchoResponse`. Demonstrates the request/response contract pattern that real domain endpoints follow.
12+
- **`sessions.SessionStore`** — single-process in-memory dict mapping session id → conversation history. Plumbed onto `app.state.session_store` in lifespan; replace with a Redis or DB-backed implementation for multi-process deployments.
13+
14+
## Conventions
15+
16+
- Every route returns a `StrictModel` subclass — never a raw dict. The `extra="forbid"` config makes typos and renamed fields fail at construction, not three calls deep.
17+
- Routes are versioned under `/api/v1/`. Adding a v2 endpoint = a new router, never a breaking change to v1.
18+
- The CORS policy is wide-open in the scaffold so the Vite dev server proxy works on first run; tighten via config when a real auth layer arrives.

src/eval/README.md

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
1+
# `src/eval/`
2+
3+
Provider-agnostic evaluation harness — golden QA cases driven by a caller-supplied `answer_fn`, three tolerance modes (exact / numeric / semantic), and a `Protocol`-based LLM-judge seam so no SDK leaks into this layer. Designed to be runnable on a fresh clone without LLM credentials (the example case uses `exact_match`); the nightly workflow opt-in is documented in `docs/EVAL_HARNESS.md`.
4+
5+
## Key interfaces
6+
7+
- **`models.EvalCase`** — Pydantic shape of an entry in `eval/golden_qa.json`: `id`, `question`, `expected_answer`, `tolerance`, plus optional `category` / `difficulty` / `expected_tools` / `notes`.
8+
- **`models.EvalResult`** — per-case outcome: actual answer, latency, pass/fail, tolerance score, failure reason.
9+
- **`runner.load_golden_dataset(path=None)`** — pure JSON loader. Standalone so call sites can introspect the dataset without paying for an `EvalRunner`.
10+
- **`runner.EvalRunner(answer_fn, judge_client=None, judge_model="")`** — the orchestrator. `answer_fn: Callable[[str], str]` is the single seam — wire your agent loop, direct LLM client, or stub. `judge_client` implements the `judge.LLMClient` Protocol and is only consulted for `semantic_similar` cases.
11+
- **`evaluate(case)`** — run one case; returns `EvalResult`.
12+
- **`evaluate_all()`** — load the golden dataset, evaluate every case, return the list.
13+
- **`judge.LLMClient`**`Protocol` with one method: `complete_json(*, model, prompt) -> str`. Concrete adapters live in your agent code; the harness never imports OpenAI/Anthropic/Azure SDKs.
14+
- **`judge.evaluate_semantic_similarity(question, expected, actual, client, model)`** — calls the judge, returns `(score, explanation)`. Returns `(None, "no LLM client configured")` when `client is None` (inconclusive — runner treats it as pass with an explanatory note rather than a hard fail).
15+
- **`report.generate_report(results)`** — markdown summary: overall accuracy, per-category, per-difficulty, failure analysis with reason text.
16+
- **`__main__`**`python -m src.eval` runs the dataset with an identity `answer_fn` (echoes the question) and prints the markdown report.
17+
18+
## Conventions
19+
20+
- **`coverage` `omit`s this package** — the eval suite is exercised by `eval/test_golden_qa.py`, not by `tests/`. Counting it would inflate misses on every PR that touches behaviour without re-running the eval workflow.
21+
- **No domain coupling.** The harness ships one trivial echo case; real users replace `eval/golden_qa.json` with their domain dataset and wire `answer_fn` to their agent.
22+
- **Tolerance picks happen per case**, not per runner. A dataset can mix `exact_match`, `numeric_close` (within 1%), and `semantic_similar` (LLM judge ≥ 0.8) cases freely.

src/models/README.md

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,17 @@
1+
# `src/models/`
2+
3+
Pydantic contracts that cross module or process seams. Import-linter enforces that nothing outside this package matters here — `src.models` depends on **nothing in `src/`** (forbidden contract in `pyproject.toml`). The point: schema bugs surface at construction with clear `ValidationError`s, not as `AttributeError`s deep in the request path.
4+
5+
## Key interfaces
6+
7+
- **`_base.StrictModel`**`BaseModel` subclass with `model_config = ConfigDict(extra="forbid")`. Inherit this for any contract that crosses a boundary; opt into stricter type coercion with `class Foo(StrictModel, strict=True)` when JSON coercion (UUID, int) is undesirable.
8+
- **`health.HealthResponse`**`status: Literal["ok"]` + `version: str`. Returned by `GET /api/v1/health`.
9+
- **`session.SessionCreate`** / **`session.SessionInfo`** — the create-empty / public-info shapes used by `src.api.sessions.SessionStore`.
10+
- **`config.Settings`**`pydantic_settings.BaseSettings` reading the four `LLM_*` env vars (`LLM_PROVIDER`, `LLM_API_KEY`, `LLM_BASE_URL`, `LLM_MODEL`). Provider-pluggable seam — wire OpenAI, Anthropic, Azure, or vLLM by swapping the values without touching this code.
11+
- **`config.get_settings()`** — fresh `Settings` constructor (deliberately unmemoised so tests can re-construct after `monkeypatch.setenv`). Real callers should hold a single instance at startup.
12+
13+
## Conventions
14+
15+
- One module per logical contract group — `health`, `session`, `config`. Add a new file rather than appending to an existing one.
16+
- Per-class `strict=True` is the call site for any model that should reject `"3.14"` for a float; HTTP-boundary models often skip it because JSON requires the coercion.
17+
- `pyproject.toml` `[tool.ruff.lint.per-file-ignores]` adds `"src/models/**" = ["TCH003"]` because Pydantic needs runtime imports for type annotations.

src/observability/README.md

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,20 @@
1+
# `src/observability/`
2+
3+
OpenTelemetry SDK setup, OTLP exporter wiring, structured-JSON logging with trace correlation, and span helpers built around the OTel GenAI semantic conventions. Wired once from `src.api.main.lifespan` so the rest of the codebase calls into a configured tracer / logger without orchestrating startup.
4+
5+
## Key interfaces
6+
7+
- **`tracing.setup_tracing()`** — creates the global `TracerProvider`, attaches a `BatchSpanProcessor` with the OTLP gRPC exporter (or `ConsoleSpanExporter` when `OTEL_EXPORTER=console`). Reads `OTEL_SERVICE_NAME` (default `harness-python-react`) and `OTEL_EXPORTER_OTLP_ENDPOINT` (default `http://localhost:4317`).
8+
- **`tracing.instrument_fastapi(app)`** + **`tracing.instrument_httpx()`** — auto-instrumentation hooks. Each FastAPI request and outbound httpx call gets a span with the standard semantic-convention attributes.
9+
- **`tracing.get_tracer(name)`** — retrieve a tracer by module name when manual instrumentation is needed.
10+
- **`logging.setup_logging(level=None)`** — configures stdlib `logging` to emit single-line JSON via `_JSONFormatter`. Reads `LOG_LEVEL` env when no argument is passed. Loops in `LoggingInstrumentor` so the `otelTraceID` / `otelSpanID` fields land on every record.
11+
- **`logging._JSONFormatter`** — the formatter; tested directly in `tests/test_observability.py` for the trace-correlation contract.
12+
- **`spans.agent_span(name, attributes=None)`** — context manager that opens a span, sets initial attributes from a `Mapping`, yields the live span for further mutation. Use this rather than calling `tracer.start_as_current_span` directly so the attribute-shape stays consistent.
13+
- **`spans.set_span_attributes(span, **kwargs)`** — set multiple attributes; `None` values are silently skipped.
14+
- **`spans.GENAI_*` / `DB_*` constants** — exported attribute-key constants for the OTel GenAI + database semantic conventions. Use these instead of raw strings so a typo is a `NameError`, not a silently-different attribute.
15+
16+
## Conventions
17+
18+
- **Semconv keys only.** The constants at the top of `spans.py` are the single source of attribute names. Adding a custom `agent.foo` attribute is a hard no — extend the semconv list (or wait for upstream OTel to bless one).
19+
- **Provider-agnostic exporter.** `OTEL_EXPORTER_OTLP_ENDPOINT` is the standard variable; `docker-compose.yml` sets it to `http://jaeger:4317` in the compose network.
20+
- **Lifespan-only setup.** Don't call `setup_tracing()` from anywhere except `src.api.main.lifespan` — duplicate calls trigger OTel's "Overriding of current TracerProvider" warning, and tests rely on the test-fixture variant that attaches an in-memory exporter to whatever provider exists.

src/tools/README.md

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,19 @@
1+
# `src/tools/`
2+
3+
Generic tool registry — the dispatcher pattern an LLM-driven agent uses to call typed tools. Each tool is a function `StrictModel -> StrictModel`; the registry maps a name to `(input_schema, callable)` and validates the dict-shaped input against the schema before dispatch. Layer-wise sits below `agent` / `api` / `eval`, above `data` / `models` (enforced by `import-linter`).
4+
5+
## Key interfaces
6+
7+
- **`registry.Registry`** — instance class. Construct one per agent/test fixture; the module also exposes a global `registry` singleton that the example `echo_tool` self-registers into at import time.
8+
- **`register(name, input_schema)`** — decorator factory. Wraps a function and inserts the `(input_schema, fn)` pair under `name`. Raises `ValueError` on duplicate registration.
9+
- **`dispatch(name, raw_input)`**`name`-lookup → `input_schema.model_validate(raw_input)` → call. Pydantic `ValidationError` propagates on bad input (wrong type or unknown keys via `StrictModel.extra="forbid"`).
10+
- **`names()`** — sorted list of registered tool names; cheap introspection for the agent's tool-listing prompt.
11+
- **`registry.UnknownToolError`**`KeyError` subclass raised by `dispatch` when *name* isn't registered. Caught and rendered as a tool-call-error event by the agent loop.
12+
- **`registry.registry`** — the module-global `Registry` instance. Real agents use this so the `echo_tool` is reachable from any consumer; tests construct private `Registry()` instances to avoid cross-test contamination.
13+
- **`registry.echo_tool`** + **`EchoToolInput`** + **`EchoToolOutput`** — example tool demonstrating the layer. Ships pre-registered under name `"echo"` to give a working dispatch path on a fresh clone.
14+
15+
## Conventions
16+
17+
- **Inputs and outputs are `StrictModel`s.** Avoid raw `dict`s — the `extra="forbid"` posture is the entire reason this pattern beats hand-rolled `if/elif` dispatch over JSON.
18+
- **Tools are pure functions** of their input. State the tool needs (DB connection, HTTP client) is injected via partial / closure at registration time, not via module globals.
19+
- **Add a tool = add a module.** Real tools beyond the example get their own file under `src/tools/` and self-register the same way `echo_tool` does. Keep `registry.py` itself focused on the dispatcher.

0 commit comments

Comments
 (0)