Commit b697804

Feat/1.4.0 (#6)
* feat(core): stabilize baseline v1 contract and report explainability
* docs: align 1.4.0 changelog and contracts
* test(ui): add block clone fixture module
* fix(ui): restore trustworthy report UX and harden explainability rendering
* test(baseline): harden v1 integrity invariants and add golden fixture
* feat(cli): formalize exit-code contract and help diagnostics
* feat(report): render explainability from core contract facts
* docs: align CI contract guidance and add AGENTS playbook
* chore(baseline): refresh repository baseline snapshot
* test(golden): run canonical fixture only on cp313
* test(golden): derive canonical tag from fixture metadata
* test(golden): use core python_tag source in canonical snapshot test
* refactor(core): use single python_tag source across cli and tests
* refactor(core): expose current_python_tag and remove tag derivation drift
* refactor(explain): centralize group labels and compare-note rules in contract
* fix(ui): increase spacing between group header and explain badges
* feat(ui): wire help modal project links and align explainability labels
* fix(ui): prevent provenance panel clipping for cache metadata
* docs: optimize and refine README structure and clarity
* fix(baseline): decouple schema_version from integrity hash
* fix(core): harden cache secret load and centralize report UI labels
* fix(cli): enforce unreadable source contract errors in gating
* test(clone): dedupe CLI contract assertions and keep baseline clean
* chore(docs): update docs
* fix(test): fix errors in CI (exit code 152)
* fix(core): keep only golden clone debt and finalize report/audit metadata
* test(report): extract shared report fixtures and de-duplicate tests
* docs: add contract book and align baseline/cache/report documentation
* chore(deps): update lockfile
* chore(baseline): update baseline snapshot
1 parent 8124008 commit b697804

66 files changed: 11192 additions, 3103 deletions

AGENTS.md

Lines changed: 316 additions & 0 deletions
@@ -0,0 +1,316 @@
1+
# AGENTS.md — CodeClone (AI Agent Playbook)
2+
3+
This document is the **source of truth** for how AI agents should work in this repository.
4+
It is optimized for **determinism**, **CI stability**, and **reproducible changes**.
5+
6+
> Repository goal: maximize **honesty**, **reproducibility**, **determinism**, and **precision** for real‑world CI usage.
7+
8+
---
9+
10+
## 1) Operating principles (non‑negotiable)
11+
12+
1. **Do not break CI contracts.**
13+
- Treat baseline, cache, and report formats as **public APIs**.
14+
- Any contract change must be **versioned**, documented, and accompanied by tests.
15+
16+
2. **Determinism > cleverness.**
17+
- Outputs must be stable across runs given identical inputs (same repo, tool version, python tag).
18+
19+
3. **Evidence-based explainability.**
20+
- The core engine produces **facts/metrics**.
21+
- HTML/UI **renders facts**; it must not invent interpretations.
22+
23+
4. **Safety first.**
24+
- Never delete or overwrite user files outside the repo.
25+
- Any write must be atomic where relevant (e.g., baseline `.tmp` + `os.replace`).
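
The `.tmp` + `os.replace` pattern mentioned above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the helper name `write_json_atomic` and the choice to place the temp file next to the target are assumptions.

```python
import json
import os
from pathlib import Path


def write_json_atomic(path: Path, payload: dict[str, object]) -> None:
    """Write `payload` so readers never observe a half-written file (hypothetical helper)."""
    # Keep the temp file in the same directory so the final rename stays on one filesystem.
    tmp_path = path.with_name(path.name + ".tmp")
    text = json.dumps(payload, sort_keys=True, indent=2) + "\n"
    try:
        with open(tmp_path, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the swap
        # os.replace overwrites the destination; on POSIX the rename is atomic.
        os.replace(tmp_path, path)
    finally:
        # If anything failed before the swap, do not leave the temp file behind.
        if tmp_path.exists():
            tmp_path.unlink()
```

If the process dies before `os.replace`, the previous file is left untouched, which is the property the rule asks for.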
26+
27+
---
28+
29+
## 2) Quick orientation
30+
31+
CodeClone is an AST/CFG-informed clone detector for Python. It supports:
32+
- **function clones** (strongest signal)
33+
- **block clones** (sliding window of statements, may be noisy on boilerplate)
34+
- **segment clones** (report-only unless explicitly gated)
35+
36+
Key artifacts:
37+
- `codeclone.baseline.json` — trusted baseline snapshot (for CI comparisons)
38+
- `.cache/codeclone/cache.json` — analysis cache (integrity-checked)
39+
- `.cache/codeclone/report.html|report.json|report.txt` — reports
40+
41+
---
42+
43+
## 3) Commands to validate your change
44+
45+
Run these locally before proposing changes:
46+
47+
```bash
uv run ruff check .
uv run mypy .
uv run pytest -q
```
52+
53+
If you touched baseline/cache/report contracts, also run the repo’s audit runner (or the scenario script if present).
54+
55+
---
56+
57+
## 4) Baseline contract (v1, stable)
58+
59+
### Baseline file structure (canonical)
60+
61+
```json
{
  "meta": {
    "generator": { "name": "codeclone", "version": "X.Y.Z" },
    "schema_version": "1.0",
    "fingerprint_version": "1",
    "python_tag": "cp313",
    "created_at": "2026-02-08T14:20:15Z",
    "payload_sha256": ""
  },
  "clones": {
    "functions": [],
    "blocks": []
  }
}
```
77+
78+
### Rules
79+
80+
- `schema_version` is the **baseline schema** version, not the package version.
81+
- Compatibility is tied to:
82+
- `fingerprint_version`
83+
- `python_tag`
84+
- `generator.name == "codeclone"`
85+
- `payload_sha256` is computed from a **canonical payload**:
86+
- stable key order
87+
- clone id lists are **sorted and unique**
88+
- integrity check uses constant‑time compare (e.g., `hmac.compare_digest`)
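
A rough sketch of the canonical-payload rules above, for orientation only. Which fields actually enter the hashed payload is defined by the repository, not by this example: hashing only the two clone id lists is an assumption, and the function names are hypothetical.

```python
import hashlib
import hmac
import json
from collections.abc import Iterable


def canonical_payload(functions: Iterable[str], blocks: Iterable[str]) -> bytes:
    """Serialize clone ids with stable key order and sorted, unique id lists."""
    document = {
        "blocks": sorted(set(blocks)),
        "functions": sorted(set(functions)),
    }
    # sort_keys + fixed separators keep the byte output identical across runs.
    return json.dumps(document, sort_keys=True, separators=(",", ":")).encode("utf-8")


def compute_payload_sha256(functions: Iterable[str], blocks: Iterable[str]) -> str:
    return hashlib.sha256(canonical_payload(functions, blocks)).hexdigest()


def payload_matches(stored_digest: str, functions: Iterable[str], blocks: Iterable[str]) -> bool:
    """Constant-time comparison of the stored digest against a fresh one."""
    expected = compute_payload_sha256(functions, blocks)
    # Compare as bytes so a non-ASCII stored value cannot raise TypeError.
    return hmac.compare_digest(stored_digest.encode("utf-8"), expected.encode("utf-8"))
```

Because the lists are sorted and de-duplicated before hashing, input order and repeated ids do not change the digest.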
89+
90+
### Trust model
91+
92+
- A baseline is either **trusted** (`baseline_status = ok`) or **untrusted**.
93+
- **Normal mode**:
94+
- warn
95+
- ignore untrusted baseline
96+
- compare vs empty baseline
97+
- **CI gating mode** (`--ci` / `--fail-on-new`):
98+
- fail‑fast if baseline untrusted
99+
- exit code **2** for untrusted baseline
100+
101+
### Legacy behavior
102+
103+
- Legacy baselines (<= 1.3.x layout) must be treated as **untrusted** with explicit messaging and tests.
104+
105+
---
106+
107+
## 5) Cache contract (integrity + size guards)
108+
109+
- Cache is an **optimization**, never a source of truth.
110+
- If cache is invalid or too large:
111+
- warn
112+
- proceed without cache
113+
- ensure report meta reflects `cache_used=false`
114+
115+
Never “fix” the cache by silently mutating it; prefer regenerating it.
116+
117+
---
118+
119+
## 6) Reports and explainability
120+
121+
Reports come in:
122+
- HTML (`--html`)
123+
- JSON (`--json`)
124+
- Text (`--text`)
125+
126+
### Report invariants
127+
128+
- Ordering must be deterministic (stable sort keys).
129+
- All provenance fields must be consistent across formats:
130+
- baseline loaded / status
131+
- baseline fingerprint + schema versions
132+
- baseline generator version
133+
- cache path / cache used
134+
135+
### Explainability contract (core owns facts)
136+
137+
For each clone group (especially block clones), the **core** should be able to provide factual fields such as:
138+
139+
- `match_rule`
140+
- `signature_kind`
141+
- `window_size` (block size) / `segment_size`
142+
- `merged_regions` flag and counts
143+
- `stmt_type_sequence` (normalized)
144+
- `stmt_type_histogram`
145+
- `has_control_flow` (if/for/while/try/match)
146+
- ratios (assert / assign / call)
147+
- `max_consecutive_<type>` (e.g., consecutive asserts)
148+
149+
UI can show **hints** only when the predicate is **formal & exact** (100% confidence), e.g.:
150+
- `assert_only_block` (assert_ratio == 1.0 and consecutive_asserts == block_len)
151+
- `repeated_stmt_hash` (single stmt hash repeated across window)
152+
153+
No UI-only heuristics that affect gating.
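
To make the split between core facts and UI hints concrete, here is a small sketch. It assumes statements are already reduced to their AST type names (for example via `type(node).__name__`); `BlockFacts`, `compute_block_facts`, and `is_assert_only_block` are hypothetical names, and only a subset of the fields listed above is modeled.

```python
from collections import Counter
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockFacts:
    block_len: int
    stmt_type_histogram: dict[str, int]
    assert_ratio: float
    max_consecutive_assert: int


def max_run(stmt_types: Sequence[str], wanted: str) -> int:
    """Length of the longest consecutive run of `wanted` in the sequence."""
    best = current = 0
    for name in stmt_types:
        current = current + 1 if name == wanted else 0
        best = max(best, current)
    return best


def compute_block_facts(stmt_types: Sequence[str]) -> BlockFacts:
    """Core side: derive plain facts from the statement type sequence."""
    histogram = Counter(stmt_types)
    total = len(stmt_types)
    return BlockFacts(
        block_len=total,
        stmt_type_histogram=dict(sorted(histogram.items())),  # stable key order
        assert_ratio=(histogram["Assert"] / total) if total else 0.0,
        max_consecutive_assert=max_run(stmt_types, "Assert"),
    )


def is_assert_only_block(facts: BlockFacts) -> bool:
    """UI side: an exact predicate over core facts, with no heuristics."""
    return (
        facts.block_len > 0
        and facts.assert_ratio == 1.0
        and facts.max_consecutive_assert == facts.block_len
    )
```

The renderer only evaluates `is_assert_only_block` on facts it was handed; it never inspects source code itself, which keeps the hint reproducible from the JSON report alone.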
154+
155+
---
156+
157+
## 7) Noise policy (what is and isn’t a “fix”)
158+
159+
### Acceptable fixes
160+
- Merge/report-layer improvements (e.g., merge sliding windows into maximal regions) **without changing gating**.
161+
- Better evidence surfaced in HTML to explain matches.
162+
163+
### Not acceptable as a “quick fix”
164+
- Weakening detection rules to hide noisy test patterns, unless:
165+
- it is configurable
166+
- default remains honest
167+
- the change is justified by real-world repos
168+
- it includes tests for false-negative risk
169+
170+
### Preferred remediation for test-only FPs
171+
- Refactor tests to avoid long repetitive statement sequences:
172+
- replace chains of `assert "... in html"` with loops or aggregated checks.
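
One possible shape for such a refactor is sketched below. `render_report_html` is a hypothetical stand-in for whatever produces the HTML under test, and the fragment list is invented for illustration.

```python
from collections.abc import Iterable


def render_report_html() -> str:
    """Hypothetical stand-in for the real HTML renderer."""
    return (
        "<html><body><h1>CodeClone report</h1>"
        '<section id="provenance"></section>'
        '<section id="function-clones"></section>'
        "</body></html>"
    )


EXPECTED_FRAGMENTS = (
    "<h1>CodeClone report</h1>",
    'id="provenance"',
    'id="function-clones"',
)


def missing_fragments(html: str, fragments: Iterable[str]) -> list[str]:
    """Return every expected fragment that does not occur in the HTML."""
    return [fragment for fragment in fragments if fragment not in html]


def test_report_contains_expected_fragments() -> None:
    html = render_report_html()
    # One aggregated assertion instead of a long chain of `assert "..." in html`.
    assert missing_fragments(html, EXPECTED_FRAGMENTS) == []
```

On failure the assertion message lists exactly which fragments are missing, so the aggregated check loses no diagnostic value compared with the chain.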
173+
174+
---
175+
176+
## 8) How to propose changes (agent workflow)
177+
178+
When you implement something:
179+
180+
1. **State the intent** (what user-visible issue does it solve?)
181+
2. **List files touched** and why.
182+
3. **Call out contracts affected**:
183+
- baseline / cache / report schema
184+
- CLI exit codes / messages
185+
4. **Add/adjust tests** for:
186+
- normal-mode behavior
187+
- CI gating behavior
188+
- determinism (identical output on rerun)
189+
- legacy/untrusted scenarios where applicable
190+
5. Run:
191+
- `ruff`, `mypy`, `pytest`
192+
193+
Avoid changing unrelated files (locks, roadmap) unless required.
194+
195+
---
196+
197+
## 9) CLI behavior and exit codes
198+
199+
Agents must preserve these semantics:
200+
201+
- **0** — success (including “new clones detected” in non-gating mode)
202+
- **2** — baseline gating failure (untrusted/missing baseline when CI requires trusted baseline; invalid output extension, etc.)
203+
- **3** — analysis gating failure (e.g., `--fail-threshold` exceeded or new clones in `--ci` as designed)
204+
205+
If you introduce a new exit reason, document it and add tests.
206+
207+
---
208+
209+
## 10) Release hygiene (for agent-assisted releases)
210+
211+
Before cutting a release:
212+
213+
- Confirm baseline schema compatibility is unchanged, or properly versioned.
214+
- Ensure changelog has:
215+
- user-facing changes
216+
- migration notes if any
217+
- Validate `twine check dist/*` for built artifacts.
218+
- Smoke test install in a clean venv:
219+
- `pip install dist/*.whl`
220+
- `codeclone --version`
221+
- `codeclone . --ci` in a sample repo with baseline.
222+
223+
---
224+
225+
## 11) “Don’t do this” list
226+
227+
- Don’t add hidden behavior differences between report formats.
228+
- Don’t make baseline compatibility depend on package patch/minor version.
229+
- Don’t add project-root hashes or unstable machine-local fields to baseline.
230+
- Don’t embed suppressions into baseline unless explicitly designed as a versioned contract.
231+
- Don’t introduce nondeterministic ordering (dict iteration, set ordering, filesystem traversal without sort).
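
For the last point, filesystem traversal is a common source of run-to-run differences because directory listing order is not guaranteed. A minimal sketch of a sorted traversal follows; `iter_python_files` is a hypothetical helper name.

```python
from pathlib import Path


def iter_python_files(root: Path) -> list[Path]:
    """Collect *.py files under `root` in an order independent of the filesystem."""
    files = [path for path in root.rglob("*.py") if path.is_file()]
    # Sort on the POSIX-style relative path so Windows and Unix agree on the order.
    return sorted(files, key=lambda path: path.relative_to(root).as_posix())
```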
232+
233+
---
234+
235+
## 12) Where to put new code
236+
237+
## 13) Python language + typing rules (3.10 → 3.14)
238+
239+
These rules are **repo policy**. If you need to violate one, you must explain why in the PR.
240+
241+
### Supported Python versions
242+
- **Must run on Python 3.10, 3.11, 3.12, 3.13, 3.14**.
243+
- Do not rely on behavior that is new to only the latest version unless you provide a fallback.
244+
- Prefer **standard library** features that exist in 3.10+.
245+
246+
### Modern syntax (allowed / preferred)
247+
Use modern syntax when it stays compatible with 3.10+:
248+
- `X | Y` unions, `list[str]` / `dict[str, int]` generics (PEP 604 / PEP 585)
249+
- `from __future__ import annotations` is allowed, but keep behavior consistent across 3.10–3.14.
250+
- `match/case` (PEP 634) is allowed, but only if it keeps determinism/readability.
251+
- **Avoid** `typing.Self` (3.11+) in public APIs unless you import it from `typing_extensions` on older versions.
252+
- Prefer `pathlib.Path` over `os.path` for new code (but keep hot paths pragmatic).
253+
254+
### Typing standards
255+
- **Type hints are required** for all public functions, core pipeline surfaces, and any code that touches:
256+
baseline, cache, fingerprints, report models, serialization, CLI exit behavior.
257+
- Keep **`Any` to an absolute minimum**:
258+
- `Any` is allowed only at IO boundaries (JSON parsing, `argparse`, `subprocess`) and must be
259+
*narrowed immediately* into typed structures (dataclasses / TypedDict / Protocol / enums).
260+
- If `Any` appears in “core/domain” code, add a comment: `# Any: <reason>` and a TODO to remove.
261+
- Prefer **`Literal` / enums** for finite sets (e.g., status codes, kinds).
262+
- Prefer **`dataclasses`** (frozen where reasonable) for data models; keep models JSON‑serializable.
263+
- Use `collections.abc` types (`Iterable`, `Sequence`, `Mapping`) for inputs where appropriate.
264+
- Avoid `cast()` unless you also add an invariant check nearby.
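
As a sketch of "narrow `Any` immediately", the example below turns parsed JSON into a frozen dataclass at the IO boundary. The field names mirror the baseline `meta` block shown earlier; `BaselineMeta`, `BaselineFormatError`, and `parse_baseline_meta` are hypothetical names, not the repository's actual types.

```python
import json
from dataclasses import dataclass
from typing import Any


class BaselineFormatError(ValueError):
    """Raised when raw JSON does not match the expected shape."""


@dataclass(frozen=True)
class BaselineMeta:
    schema_version: str
    fingerprint_version: str
    python_tag: str


def parse_baseline_meta(raw: Any) -> BaselineMeta:
    """Narrow untyped JSON into a typed, immutable model or fail loudly."""
    if not isinstance(raw, dict):
        raise BaselineFormatError("meta must be a JSON object")
    values: dict[str, str] = {}
    for key in ("schema_version", "fingerprint_version", "python_tag"):
        value = raw.get(key)
        if not isinstance(value, str):
            raise BaselineFormatError(f"meta.{key} must be a string")
        values[key] = value
    return BaselineMeta(**values)


# Any exists only on this one line; everything downstream sees BaselineMeta.
raw_meta: Any = json.loads(
    '{"schema_version": "1.0", "fingerprint_version": "1", "python_tag": "cp313"}'
)
meta = parse_baseline_meta(raw_meta)
```

The `isinstance` checks double as the invariant checks that would otherwise be needed next to a `cast()`.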
265+
266+
### Dataclasses / models
267+
- Models that cross module boundaries should be:
268+
- explicitly typed
269+
- immutable when possible (`frozen=True`)
270+
- validated at construction (or via a dedicated `validate_*` function) if they are user‑provided.
271+
272+
### Error handling
273+
- Prefer explicit, typed error types over stringly‑typed errors.
274+
- Exit codes are part of the public contract; do not change them without updating tests + docs.
275+
276+
### Determinism requirements (language-level)
277+
- Never iterate over a `set`, or over a `dict` whose insertion order is not itself deterministic, without sorting first when it affects:
278+
hashes, IDs, report ordering, baseline payloads, or UI output.
279+
- Use stable formatting (sorted keys, stable ordering) in JSON output.
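
A short sketch of these two rules applied together: intermediate `dict` and `set` containers are fine as long as every iteration that reaches the output is sorted. The tuple layout `(group_hash, path, start_line)` and the function names are assumptions for illustration.

```python
import json
from collections import defaultdict
from collections.abc import Iterable


def group_occurrences(
    occurrences: Iterable[tuple[str, str, int]],
) -> list[dict[str, object]]:
    """Group (group_hash, path, start_line) tuples into a deterministically ordered list."""
    by_hash: dict[str, set[tuple[str, int]]] = defaultdict(set)
    for group_hash, path, line in occurrences:
        by_hash[group_hash].add((path, line))
    return [
        {
            "hash": group_hash,
            # Members come out of a set, so sort them before they reach the output.
            "members": [{"path": p, "line": n} for p, n in sorted(by_hash[group_hash])],
        }
        for group_hash in sorted(by_hash)  # never rely on insertion order here
    ]


def to_json(groups: list[dict[str, object]]) -> str:
    return json.dumps(groups, sort_keys=True, indent=2) + "\n"
```

Feeding the same occurrences in a different order yields byte-identical JSON, which is the property a determinism test can assert directly.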
280+
281+
### Key PEPs to keep in mind
282+
- PEP 8, PEP 484 (typing), PEP 526 (variable annotations)
283+
- PEP 563 / PEP 649 (annotation evaluation changes across versions) — avoid relying on evaluation timing
284+
- PEP 585 (built-in generics), PEP 604 (X | Y unions)
285+
- PEP 634 (structural pattern matching)
286+
- PEP 612 (ParamSpec) / PEP 646 (TypeVarTuple) — only if it clearly helps, don’t overcomplicate
287+
288+
289+
290+
When deciding where to put new code (section 12), prefer these rules:
291+
292+
- **Domain / contracts / enums** live near the domain owner (baseline statuses in baseline domain).
293+
- **Core logic** should not depend on HTML.
294+
- **Render** depends on report model, never the other way around.
295+
- If a module becomes a “god module”, split by:
296+
- model (types)
297+
- io/serialization
298+
- rules/validation
299+
- ui rendering
300+
301+
Avoid deep package hierarchies unless they clearly reduce coupling.
302+
303+
---
304+
305+
## 14) Minimal checklist for PRs (agents)
306+
307+
- [ ] Change is deterministic.
308+
- [ ] Contracts preserved or versioned.
309+
- [ ] Tests added for new behavior.
310+
- [ ] `ruff`, `mypy`, `pytest` green.
311+
- [ ] CLI messages remain helpful and stable (don’t break scripts).
312+
- [ ] Reports contain provenance fields and reflect trust model correctly.
313+
314+
---
315+
316+
If you are an AI agent and something here conflicts with an instruction from a maintainer in the PR/issue thread, **ask for clarification in the thread** and default to this document until resolved.
