| 1 | +# AGENTS.md — CodeClone (AI Agent Playbook) |
| 2 | + |
| 3 | +This document is the **source of truth** for how AI agents should work in this repository. |
| 4 | +It is optimized for **determinism**, **CI stability**, and **reproducible changes**. |
| 5 | + |
| 6 | +> Repository goal: maximize **honesty**, **reproducibility**, **determinism**, and **precision** for real‑world CI usage. |
| 7 | + |
| 8 | +--- |
| 9 | + |
| 10 | +## 1) Operating principles (non‑negotiable) |
| 11 | + |
| 12 | +1. **Do not break CI contracts.** |
| 13 | + - Treat baseline, cache, and report formats as **public APIs**. |
| 14 | + - Any contract change must be **versioned**, documented, and accompanied by tests. |
| 15 | + |
| 16 | +2. **Determinism > cleverness.** |
| 17 | + - Outputs must be stable across runs given identical inputs (same repo, tool version, and Python tag). |
| 18 | + |
| 19 | +3. **Evidence-based explainability.** |
| 20 | + - The core engine produces **facts/metrics**. |
| 21 | + - HTML/UI **renders facts**; it must not invent interpretations. |
| 22 | + |
| 23 | +4. **Safety first.** |
| 24 | + - Never delete or overwrite user files outside the repo. |
| 25 | + - Any write must be atomic where relevant (e.g., baseline `.tmp` + `os.replace`). |
| 26 | + |
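The atomic-write rule above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the helper name `write_json_atomic` and the choice of `tempfile.mkstemp` for the temporary file are assumptions; only the `.tmp` + `os.replace` pattern comes from this document.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def write_json_atomic(path: Path, payload: dict[str, Any]) -> None:
    """Hypothetical helper: write JSON so readers never see a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file next to the target so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, sort_keys=True, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        # Atomically swap the finished file into place.
        os.replace(tmp_name, path)
    except BaseException:
        # Best-effort cleanup; the original target is left untouched.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

If the write fails midway, the previous file at `path` is still intact, which is the property the safety rule asks for.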
| 27 | +--- |
| 28 | + |
| 29 | +## 2) Quick orientation |
| 30 | + |
| 31 | +CodeClone is an AST/CFG-informed clone detector for Python. It supports: |
| 32 | +- **function clones** (strongest signal) |
| 33 | +- **block clones** (sliding window of statements, may be noisy on boilerplate) |
| 34 | +- **segment clones** (report-only unless explicitly gated) |
| 35 | + |
| 36 | +Key artifacts: |
| 37 | +- `codeclone.baseline.json` — trusted baseline snapshot (for CI comparisons) |
| 38 | +- `.cache/codeclone/cache.json` — analysis cache (integrity-checked) |
| 39 | +- `.cache/codeclone/report.html|report.json|report.txt` — reports |
| 40 | + |
| 41 | +--- |
| 42 | + |
| 43 | +## 3) Commands to validate your change |
| 44 | + |
| 45 | +Run these locally before proposing changes: |
| 46 | + |
| 47 | +```bash |
| 48 | +uv run ruff check . |
| 49 | +uv run mypy . |
| 50 | +uv run pytest -q |
| 51 | +``` |
| 52 | + |
| 53 | +If you touched baseline/cache/report contracts, also run the repo’s audit runner (or the scenario script if present). |
| 54 | + |
| 55 | +--- |
| 56 | + |
| 57 | +## 4) Baseline contract (v1, stable) |
| 58 | + |
| 59 | +### Baseline file structure (canonical) |
| 60 | + |
| 61 | +```json |
| 62 | +{ |
| 63 | + "meta": { |
| 64 | + "generator": { "name": "codeclone", "version": "X.Y.Z" }, |
| 65 | + "schema_version": "1.0", |
| 66 | + "fingerprint_version": "1", |
| 67 | + "python_tag": "cp313", |
| 68 | + "created_at": "2026-02-08T14:20:15Z", |
| 69 | + "payload_sha256": "…" |
| 70 | + }, |
| 71 | + "clones": { |
| 72 | + "functions": [], |
| 73 | + "blocks": [] |
| 74 | + } |
| 75 | +} |
| 76 | +``` |
| 77 | + |
| 78 | +### Rules |
| 79 | + |
| 80 | +- `schema_version` is the **baseline schema** version, not the package version. |
| 81 | +- Compatibility is tied to: |
| 82 | + - `fingerprint_version` |
| 83 | + - `python_tag` |
| 84 | + - `generator.name == "codeclone"` |
| 85 | +- `payload_sha256` is computed from a **canonical payload**: |
| 86 | + - stable key order |
| 87 | + - clone id lists are **sorted and unique** |
| 88 | +- The integrity check uses a constant‑time comparison (e.g., `hmac.compare_digest`). |
| 89 | + |
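One way to read the hashing rules above, as a hedged sketch: the exact canonical serialization is not specified in this document, so the compact `json.dumps` form and the function names below are assumptions made for illustration.

```python
import hashlib
import hmac
import json


def canonical_payload_bytes(functions: list[str], blocks: list[str]) -> bytes:
    """Assumed canonical form: sorted unique id lists, sorted keys, compact JSON."""
    payload = {
        "blocks": sorted(set(blocks)),
        "functions": sorted(set(functions)),
    }
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


def payload_sha256(functions: list[str], blocks: list[str]) -> str:
    """Hex digest of the canonical payload."""
    return hashlib.sha256(canonical_payload_bytes(functions, blocks)).hexdigest()


def payload_matches(expected_hex: str, functions: list[str], blocks: list[str]) -> bool:
    """Compare the stored and recomputed digests in constant time."""
    return hmac.compare_digest(expected_hex, payload_sha256(functions, blocks))
```

Because the id lists are sorted and de-duplicated before hashing, input order and repeated ids do not change the digest.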
| 90 | +### Trust model |
| 91 | + |
| 92 | +- A baseline is either **trusted** (`baseline_status = ok`) or **untrusted**. |
| 93 | +- **Normal mode**: |
| 94 | + - warn |
| 95 | + - ignore untrusted baseline |
| 96 | + - compare vs empty baseline |
| 97 | +- **CI gating mode** (`--ci` / `--fail-on-new`): |
| 98 | + - fail‑fast if baseline untrusted |
| 99 | + - exit code **2** for untrusted baseline |
| 100 | + |
| 101 | +### Legacy behavior |
| 102 | + |
| 103 | +- Legacy baselines (<= 1.3.x layout) must be treated as **untrusted** with explicit messaging and tests. |
| 104 | + |
| 105 | +--- |
| 106 | + |
| 107 | +## 5) Cache contract (integrity + size guards) |
| 108 | + |
| 109 | +- Cache is an **optimization**, never a source of truth. |
| 110 | +- If cache is invalid or too large: |
| 111 | + - warn |
| 112 | + - proceed without cache |
| 113 | + - ensure report meta reflects `cache_used=false` |
| 114 | + |
| 115 | +Never “fix” the cache by silently mutating it; regenerate it instead. |
| 116 | + |
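A minimal sketch of the fallback behavior described above. The function name `load_cache`, the size limit, and the "root must be a JSON object" check are illustrative assumptions; the real integrity check is not shown in this document. A missing file is treated like any other unusable cache here.

```python
import json
import warnings
from pathlib import Path
from typing import Any

MAX_CACHE_BYTES = 50 * 1024 * 1024  # hypothetical limit, not taken from the repo


def load_cache(path: Path, max_bytes: int = MAX_CACHE_BYTES) -> tuple[dict[str, Any], bool]:
    """Return (cache, cache_used); never modifies the file on disk."""
    try:
        if path.stat().st_size > max_bytes:
            raise ValueError("cache file exceeds the size limit")
        data = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(data, dict):
            raise ValueError("cache root must be a JSON object")
    except (OSError, ValueError) as exc:
        # Warn and proceed without the cache; the caller records cache_used=false.
        warnings.warn(f"ignoring cache at {path}: {exc}", stacklevel=2)
        return {}, False
    return data, True
```

The second element of the returned tuple is what the report meta would surface as `cache_used`.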
| 117 | +--- |
| 118 | + |
| 119 | +## 6) Reports and explainability |
| 120 | + |
| 121 | +Reports come in: |
| 122 | +- HTML (`--html`) |
| 123 | +- JSON (`--json`) |
| 124 | +- Text (`--text`) |
| 125 | + |
| 126 | +### Report invariants |
| 127 | + |
| 128 | +- Ordering must be deterministic (stable sort keys). |
| 129 | +- All provenance fields must be consistent across formats: |
| 130 | + - baseline loaded / status |
| 131 | + - baseline fingerprint + schema versions |
| 132 | + - baseline generator version |
| 133 | + - cache path / cache used |
| 134 | + |
| 135 | +### Explainability contract (core owns facts) |
| 136 | + |
| 137 | +For each clone group (especially block clones), the **core** should be able to provide factual fields such as: |
| 138 | + |
| 139 | +- `match_rule` |
| 140 | +- `signature_kind` |
| 141 | +- `window_size` (block size) / `segment_size` |
| 142 | +- `merged_regions` flag and counts |
| 143 | +- `stmt_type_sequence` (normalized) |
| 144 | +- `stmt_type_histogram` |
| 145 | +- `has_control_flow` (if/for/while/try/match) |
| 146 | +- ratios (assert / assign / call) |
| 147 | +- `max_consecutive_<type>` (e.g., consecutive asserts) |
| 148 | + |
| 149 | +UI can show **hints** only when the predicate is **formal & exact** (100% confidence), e.g.: |
| 150 | +- `assert_only_block` (assert_ratio == 1.0 and consecutive_asserts == block_len) |
| 151 | +- `repeated_stmt_hash` (single stmt hash repeated across window) |
| 152 | + |
| 153 | +No UI-only heuristics that affect gating. |
| 154 | + |
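To make the "core owns facts" idea concrete, here is a small sketch that derives a few of the fields listed above from a block of statements using the standard `ast` module. The function name `block_facts` and the exact field set are assumptions; the real engine may normalize statements differently.

```python
import ast
from collections import Counter


def block_facts(source: str) -> dict[str, object]:
    """Hypothetical helper: compute factual fields for a block of statements."""
    kinds = [type(stmt).__name__ for stmt in ast.parse(source).body]
    longest = current = 0
    for kind in kinds:
        # Track the longest run of consecutive assert statements.
        current = current + 1 if kind == "Assert" else 0
        longest = max(longest, current)
    assert_ratio = kinds.count("Assert") / len(kinds) if kinds else 0.0
    return {
        "stmt_type_sequence": kinds,
        "stmt_type_histogram": dict(sorted(Counter(kinds).items())),
        "assert_ratio": assert_ratio,
        "max_consecutive_assert": longest,
        # Exact predicate: every statement is an assert, with no gaps.
        "assert_only_block": bool(kinds) and assert_ratio == 1.0 and longest == len(kinds),
    }
```

The UI would only render `assert_only_block` as a hint; it would not compute it or use it to affect gating.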
| 155 | +--- |
| 156 | + |
| 157 | +## 7) Noise policy (what is and isn’t a “fix”) |
| 158 | + |
| 159 | +### Acceptable fixes |
| 160 | +- Merge/report-layer improvements (e.g., merge sliding windows into maximal regions) **without changing gating**. |
| 161 | +- Better evidence surfaced in HTML to explain matches. |
| 162 | + |
| 163 | +### Not acceptable as a “quick fix” |
| 164 | +- Weakening detection rules to hide noisy test patterns, unless: |
| 165 | + - it is configurable |
| 166 | + - default remains honest |
| 167 | + - the change is justified by real-world repos |
| 168 | + - it includes tests for false-negative risk |
| 169 | + |
| 170 | +### Preferred remediation for test-only FPs |
| 171 | +- Refactor tests to avoid long repetitive statement sequences: |
| 172 | + - replace chains of `assert "..." in html` with loops or aggregated checks. |
| 173 | + |
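As an illustration of the preferred remediation, a chain of near-identical asserts can be folded into one aggregated check. The helper `missing_fragments`, the test name, and the sample HTML string are made up for this sketch.

```python
def missing_fragments(html: str, expected: list[str]) -> list[str]:
    """Return the expected fragments that do not occur in the rendered HTML."""
    return [fragment for fragment in expected if fragment not in html]


def test_report_contains_provenance() -> None:
    # Stand-in for a rendered report; a real test would render one.
    html = "<html>baseline_status: ok; cache_used: false</html>"
    expected = ["baseline_status", "cache_used"]
    # One aggregated assertion instead of a long run of `assert "..." in html`.
    assert missing_fragments(html, expected) == []
```

On failure, the assertion message lists exactly which fragments are missing, so the aggregated form loses no diagnostic detail.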
| 174 | +--- |
| 175 | + |
| 176 | +## 8) How to propose changes (agent workflow) |
| 177 | + |
| 178 | +When you implement something: |
| 179 | + |
| 180 | +1. **State the intent** (what user-visible issue does it solve?) |
| 181 | +2. **List files touched** and why. |
| 182 | +3. **Call out contracts affected**: |
| 183 | + - baseline / cache / report schema |
| 184 | + - CLI exit codes / messages |
| 185 | +4. **Add/adjust tests** for: |
| 186 | + - normal-mode behavior |
| 187 | + - CI gating behavior |
| 188 | + - determinism (identical output on rerun) |
| 189 | + - legacy/untrusted scenarios where applicable |
| 190 | +5. Run: |
| 191 | + - `ruff`, `mypy`, `pytest` |
| 192 | + |
| 193 | +Avoid changing unrelated files (locks, roadmap) unless required. |
| 194 | + |
| 195 | +--- |
| 196 | + |
| 197 | +## 9) CLI behavior and exit codes |
| 198 | + |
| 199 | +Agents must preserve these semantics: |
| 200 | + |
| 201 | +- **0** — success (including “new clones detected” in non-gating mode) |
| 202 | +- **2** — baseline gating failure (untrusted/missing baseline when CI requires trusted baseline; invalid output extension, etc.) |
| 203 | +- **3** — analysis gating failure (e.g., `--fail-threshold` exceeded or new clones in `--ci` as designed) |
| 204 | + |
| 205 | +If you introduce a new exit reason, document it and add tests. |
| 206 | + |
| 207 | +--- |
| 208 | + |
| 209 | +## 10) Release hygiene (for agent-assisted releases) |
| 210 | + |
| 211 | +Before cutting a release: |
| 212 | + |
| 213 | +- Confirm baseline schema compatibility is unchanged, or properly versioned. |
| 214 | +- Ensure changelog has: |
| 215 | + - user-facing changes |
| 216 | + - migration notes if any |
| 217 | +- Validate `twine check dist/*` for built artifacts. |
| 218 | +- Smoke test install in a clean venv: |
| 219 | + - `pip install dist/*.whl` |
| 220 | + - `codeclone --version` |
| 221 | + - `codeclone . --ci` in a sample repo with baseline. |
| 222 | + |
| 223 | +--- |
| 224 | + |
| 225 | +## 11) “Don’t do this” list |
| 226 | + |
| 227 | +- Don’t add hidden behavior differences between report formats. |
| 228 | +- Don’t make baseline compatibility depend on package patch/minor version. |
| 229 | +- Don’t add project-root hashes or unstable machine-local fields to baseline. |
| 230 | +- Don’t embed suppressions into baseline unless explicitly designed as a versioned contract. |
| 231 | +- Don’t introduce nondeterministic ordering (dict iteration, set ordering, filesystem traversal without sort). |
| 232 | + |
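The last item above (nondeterministic ordering) is usually avoided by sorting at every point where order can leak into output. A brief sketch with hypothetical helper names:

```python
import json
from pathlib import Path


def python_files(root: Path) -> list[str]:
    """List .py files as POSIX-style relative paths in a stable, sorted order."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.py"))


def render_group_ids(ids: set[str]) -> str:
    """Serialize a set of ids with sorted values and sorted keys."""
    return json.dumps({"ids": sorted(ids)}, sort_keys=True)
```

Sorting relative POSIX paths rather than absolute ones also keeps machine-local prefixes out of the ordering.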
| 233 | +--- |
| 234 | + |
| 235 | +## 12) Python language + typing rules (3.10 → 3.14) |
| 236 | + |
| 237 | +These rules are **repo policy**. If you need to violate one, you must explain why in the PR. |
| 238 | + |
| 239 | +### Supported Python versions |
| 240 | +- **Must run on Python 3.10, 3.11, 3.12, 3.13, 3.14**. |
| 241 | +- Do not rely on behavior that is new to only the latest version unless you provide a fallback. |
| 242 | +- Prefer **standard library** features that exist in 3.10+. |
| 243 | + |
| 244 | +### Modern syntax (allowed / preferred) |
| 245 | +Use modern syntax when it stays compatible with 3.10+: |
| 246 | +- `X | Y` unions, `list[str]` / `dict[str, int]` generics (PEP 604 / PEP 585) |
| 247 | +- `from __future__ import annotations` is allowed, but keep behavior consistent across 3.10–3.14. |
| 248 | +- `match/case` (PEP 634) is allowed, but only if it keeps determinism/readability. |
| 249 | +- Avoid `typing.Self` (3.11+) in public APIs unless you gate it with a `typing_extensions` fallback. |
| 250 | +- Prefer `pathlib.Path` over `os.path` for new code (but keep hot paths pragmatic). |
| 251 | + |
| 252 | +### Typing standards |
| 253 | +- **Type hints are required** for all public functions, core pipeline surfaces, and any code that touches: |
| 254 | + baseline, cache, fingerprints, report models, serialization, CLI exit behavior. |
| 255 | +- Keep **`Any` to an absolute minimum**: |
| 256 | + - `Any` is allowed only at IO boundaries (JSON parsing, `argparse`, `subprocess`) and must be |
| 257 | + *narrowed immediately* into typed structures (dataclasses / TypedDict / Protocol / enums). |
| 258 | + - If `Any` appears in “core/domain” code, add a comment: `# Any: <reason>` and a TODO to remove. |
| 259 | +- Prefer **`Literal` / enums** for finite sets (e.g., status codes, kinds). |
| 260 | +- Prefer **`dataclasses`** (frozen where reasonable) for data models; keep models JSON‑serializable. |
| 261 | +- Use `collections.abc` types (`Iterable`, `Sequence`, `Mapping`) for inputs where appropriate. |
| 262 | +- Avoid `cast()` unless you also add an invariant check nearby. |
| 263 | + |
| 264 | +### Dataclasses / models |
| 265 | +- Models that cross module boundaries should be: |
| 266 | + - explicitly typed |
| 267 | + - immutable when possible (`frozen=True`) |
| 268 | + - validated at construction (or via a dedicated `validate_*` function) if they are user‑provided. |
| 269 | + |
| 270 | +### Error handling |
| 271 | +- Prefer explicit, typed error types over stringly‑typed errors. |
| 272 | +- Exit codes are part of the public contract; do not change them without updating tests + docs. |
| 273 | + |
| 274 | +### Determinism requirements (language-level) |
| 275 | +- Never iterate over a `set`, or over a `dict` whose insertion order is not itself deterministic, without sorting first when it affects: |
| 276 | + hashes, IDs, report ordering, baseline payloads, or UI output. |
| 277 | +- Use stable formatting (sorted keys, stable ordering) in JSON output. |
| 278 | + |
| 279 | +### Key PEPs to keep in mind |
| 280 | +- PEP 8, PEP 484 (typing), PEP 526 (variable annotations) |
| 281 | +- PEP 563 / PEP 649 (annotation evaluation changes across versions): avoid relying on evaluation timing |
| 282 | +- PEP 585 (built-in generics), PEP 604 (X | Y unions) |
| 283 | +- PEP 634 (structural pattern matching) |
| 284 | +- PEP 612 (ParamSpec) / PEP 646 (TypeVarTuple): use only if it clearly helps; don’t overcomplicate |
| 285 | + |
| 286 | +--- |
| 287 | + |
| 288 | +## 13) Where to put new code |
| 289 | + |
| 290 | +Prefer these rules: |
| 291 | + |
| 292 | +- **Domain / contracts / enums** live near the domain owner (baseline statuses in baseline domain). |
| 293 | +- **Core logic** should not depend on HTML. |
| 294 | +- **Render** depends on report model, never the other way around. |
| 295 | +- If a module becomes a “god module”, split by: |
| 296 | + - model (types) |
| 297 | + - io/serialization |
| 298 | + - rules/validation |
| 299 | + - ui rendering |
| 300 | + |
| 301 | +Avoid deep package hierarchies unless they clearly reduce coupling. |
| 302 | + |
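The model / io / validation split above could look roughly like this when kept small. Everything here is illustrative: `BaselineMeta`, `BaselineValidationError`, and `parse_baseline_meta` are hypothetical names, and only three of the documented `meta` fields are modeled.

```python
import json
from dataclasses import dataclass
from typing import Any


class BaselineValidationError(ValueError):
    """Typed error for malformed baseline metadata (hypothetical)."""


# model: explicitly typed and immutable
@dataclass(frozen=True)
class BaselineMeta:
    schema_version: str
    fingerprint_version: str
    python_tag: str


# rules/validation: narrow untyped input into the model immediately
def parse_baseline_meta(raw: Any) -> BaselineMeta:
    # Any: raw JSON arrives untyped at the IO boundary and is narrowed here.
    if not isinstance(raw, dict):
        raise BaselineValidationError("meta must be a JSON object")
    fields: dict[str, str] = {}
    for key in ("schema_version", "fingerprint_version", "python_tag"):
        value = raw.get(key)
        if not isinstance(value, str):
            raise BaselineValidationError(f"meta.{key} must be a string")
        fields[key] = value
    return BaselineMeta(**fields)


# io/serialization: the only place that touches raw JSON text
def load_baseline_meta(text: str) -> BaselineMeta:
    document = json.loads(text)
    meta = document.get("meta") if isinstance(document, dict) else None
    return parse_baseline_meta(meta)
```

In a real module layout each of the three commented parts would live in its own file, with rendering code depending on the model and never the reverse.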
| 303 | +--- |
| 304 | + |
| 305 | +## 14) Minimal checklist for PRs (agents) |
| 306 | + |
| 307 | +- [ ] Change is deterministic. |
| 308 | +- [ ] Contracts preserved or versioned. |
| 309 | +- [ ] Tests added for new behavior. |
| 310 | +- [ ] `ruff`, `mypy`, `pytest` green. |
| 311 | +- [ ] CLI messages remain helpful and stable (don’t break scripts). |
| 312 | +- [ ] Reports contain provenance fields and reflect trust model correctly. |
| 313 | + |
| 314 | +--- |
| 315 | + |
| 316 | +If you are an AI agent and something here conflicts with an instruction from a maintainer in the PR/issue thread, **ask for clarification in the thread** and default to this document until resolved. |