Commit 18a96b6

feat: complete spec 2.0.0 architecture and UX updates
1 parent a328bd1 commit 18a96b6

108 files changed

Lines changed: 16789 additions & 2716 deletions


.gitignore

Lines changed: 4 additions & 1 deletion
@@ -32,4 +32,7 @@ htmlcov/
3232
.DS_Store
3333

3434
# Logs
35-
*.log
35+
*.log
36+
/.claude/
37+
/docs/SPEC-2.0.0.md
38+
/.uv-cache/

AGENTS.md

Lines changed: 90 additions & 60 deletions
@@ -3,37 +3,44 @@
33
This document is the **source of truth** for how AI agents should work in this repository.
44
It is optimized for **determinism**, **CI stability**, and **reproducible changes**.
55

6-
> Repository goal: maximize **honesty**, **reproducibility**, **determinism**, and **precision** for real‑world CI usage.
6+
> Repository goal: maximize **honesty**, **reproducibility**, **determinism**, and **precision** for real‑world CI
7+
> usage.
78
89
---
910

1011
## 1) Operating principles (non‑negotiable)
1112

1213
1. **Do not break CI contracts.**
13-
- Treat baseline, cache, and report formats as **public APIs**.
14-
- Any contract change must be **versioned**, documented, and accompanied by tests.
14+
- Treat baseline, cache, and report formats as **public APIs**.
15+
- Any contract change must be **versioned**, documented, and accompanied by tests.
1516

1617
2. **Determinism > cleverness.**
17-
- Outputs must be stable across runs given identical inputs (same repo, tool version, python tag).
18+
- Outputs must be stable across runs given identical inputs (same repo, tool version, python tag).
1819

1920
3. **Evidence-based explainability.**
20-
- The core engine produces **facts/metrics**.
21-
- HTML/UI **renders facts**, it must not invent interpretations.
21+
- The core engine produces **facts/metrics**.
22+
- HTML/UI **renders facts**; it must not invent interpretations.
2223

2324
4. **Safety first.**
24-
- Never delete or overwrite user files outside repo.
25-
- Any write must be atomic where relevant (e.g., baseline `.tmp` + `os.replace`).
25+
- Never delete or overwrite user files outside the repo.
26+
- Any write must be atomic where relevant (e.g., baseline `.tmp` + `os.replace`).
27+
28+
5. **Golden tests are contract sentinels.**
29+
- Do not update golden snapshots to “fix” failing tests unless the contract change is intentional, versioned where
30+
required, documented, and explicitly approved.
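
The atomic-write rule in principle 4 could look roughly like the sketch below. It is an illustration only, not the repository's implementation: the helper name `write_json_atomic` and the choice to `fsync` before the swap are assumptions.

```python
import json
import os
from pathlib import Path


def write_json_atomic(path: Path, payload: dict[str, object]) -> None:
    """Write JSON to a sibling `.tmp` file, then swap it into place.

    Hypothetical helper: the name and signature are assumptions.
    """
    tmp_path = path.with_name(path.name + ".tmp")
    # Sorted keys keep the serialized form stable across runs.
    text = json.dumps(payload, sort_keys=True, indent=2) + "\n"
    with tmp_path.open("w", encoding="utf-8") as handle:
        handle.write(text)
        handle.flush()
        os.fsync(handle.fileno())
    # os.replace is atomic when both paths live on the same filesystem,
    # so readers see either the old file or the new one, never a partial write.
    os.replace(tmp_path, path)
```

Keeping the temporary file next to the target (rather than in a system temp directory) is what keeps both paths on one filesystem.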
2631

2732
---
2833

2934
## 2) Quick orientation
3035

3136
CodeClone is an AST/CFG-informed clone detector for Python. It supports:
37+
3238
- **function clones** (strongest signal)
3339
- **block clones** (sliding window of statements, may be noisy on boilerplate)
3440
- **segment clones** (report-only unless explicitly gated)
3541

3642
Key artifacts:
43+
3744
- `codeclone.baseline.json` — trusted baseline snapshot (for CI comparisons)
3845
- `.cache/codeclone/cache.json` — analysis cache (integrity-checked)
3946
- `.cache/codeclone/report.html|report.json|report.txt` — reports
@@ -54,15 +61,18 @@ If you touched baseline/cache/report contracts, also run the repo’s audit runn
5461

5562
---
5663

57-
## 4) Baseline contract (v1, stable)
64+
## 4) Baseline contract (v2, stable)
5865

5966
### Baseline file structure (canonical)
6067

6168
```json
6269
{
6370
"meta": {
64-
"generator": { "name": "codeclone", "version": "X.Y.Z" },
65-
"schema_version": "1.0",
71+
"generator": {
72+
"name": "codeclone",
73+
"version": "X.Y.Z"
74+
},
75+
"schema_version": "2.0",
6676
"fingerprint_version": "1",
6777
"python_tag": "cp313",
6878
"created_at": "2026-02-08T14:20:15Z",
@@ -71,32 +81,37 @@ If you touched baseline/cache/report contracts, also run the repo’s audit runn
7181
"clones": {
7282
"functions": [],
7383
"blocks": []
84+
},
85+
"metrics": {
86+
"...": "optional embedded snapshot"
7487
}
7588
}
7689
```
7790

7891
### Rules
7992

8093
- `schema_version` is **baseline schema**, not package version.
94+
- Runtime writes baseline schema `2.0`.
95+
- Runtime accepts baseline schema `1.x` and `2.x` for compatibility checks.
8196
- Compatibility is tied to:
82-
- `fingerprint_version`
83-
- `python_tag`
84-
- `generator.name == "codeclone"`
97+
- `fingerprint_version`
98+
- `python_tag`
99+
- `generator.name == "codeclone"`
85100
- `payload_sha256` is computed from a **canonical payload**:
86-
- stable key order
87-
- clone id lists are **sorted and unique**
88-
- integrity check uses constant‑time compare (e.g., `hmac.compare_digest`)
101+
- stable key order
102+
- clone id lists are **sorted and unique**
103+
- integrity check uses constant‑time compare (e.g., `hmac.compare_digest`)
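
A minimal sketch of the `payload_sha256` rules above, under stated assumptions: the exact set of fields in the canonical payload and the JSON separators are not given in this section, so the layout used here (only the two clone id lists) and both function names are hypothetical.

```python
import hashlib
import hmac
import json


def canonical_payload_sha256(functions: list[str], blocks: list[str]) -> str:
    """Hash a canonical form of the clone payload (assumed field layout)."""
    payload = {
        # Clone id lists are sorted and de-duplicated before hashing.
        "blocks": sorted(set(blocks)),
        "functions": sorted(set(functions)),
    }
    # sort_keys plus fixed separators give a stable byte representation.
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def payload_matches(expected_sha256: str, functions: list[str], blocks: list[str]) -> bool:
    """Compare the stored and recomputed digests in constant time."""
    actual = canonical_payload_sha256(functions, blocks)
    return hmac.compare_digest(actual.encode("utf-8"), expected_sha256.encode("utf-8"))
```

Because the lists are normalized first, input order and duplicates do not change the digest.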
89104

90105
### Trust model
91106

92107
- A baseline is either **trusted** (`baseline_status = ok`) or **untrusted**.
93108
- **Normal mode**:
94-
- warn
95-
- ignore untrusted baseline
96-
- compare vs empty baseline
109+
- warn
110+
- ignore untrusted baseline
111+
- compare vs empty baseline
97112
- **CI gating mode** (`--ci` / `--fail-on-new`):
98-
- fail‑fast if baseline untrusted
99-
- exit code **2** for untrusted baseline
113+
- fail‑fast if baseline untrusted
114+
- exit code **2** for untrusted baseline
100115

101116
### Legacy behavior
102117

@@ -108,9 +123,9 @@ If you touched baseline/cache/report contracts, also run the repo’s audit runn
108123

109124
- Cache is an **optimization**, never a source of truth.
110125
- If cache is invalid or too large:
111-
- warn
112-
- proceed without cache
113-
- ensure report meta reflects `cache_used=false`
126+
- warn
127+
- proceed without cache
128+
- ensure report meta reflects `cache_used=false`
114129

115130
Never “fix” cache by silently mutating it; prefer regenerate.
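
As a rough illustration of the cache rules above, the loader below warns and falls back to an empty cache without touching the file on disk. The size limit, the function name, and the shape check are assumptions; the real integrity check is not described in this section and is not reproduced here.

```python
import json
import warnings
from pathlib import Path

MAX_CACHE_BYTES = 50 * 1024 * 1024  # hypothetical limit, not from the source


def load_cache(path: Path) -> tuple[dict[str, object], bool]:
    """Return (cache, cache_used); never modifies the cache file."""
    try:
        if path.stat().st_size > MAX_CACHE_BYTES:
            raise ValueError("cache file is too large")
        data = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(data, dict):
            raise ValueError("cache root must be a JSON object")
    except (OSError, ValueError) as exc:
        # Invalid or oversized cache: warn and proceed without it.
        warnings.warn(f"ignoring cache at {path}: {exc}", stacklevel=2)
        return {}, False
    return data, True
```

The returned flag is what a report would surface as `cache_used`.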
116131

@@ -119,6 +134,7 @@ Never “fix” cache by silently mutating it; prefer regenerate.
119134
## 6) Reports and explainability
120135

121136
Reports come in:
137+
122138
- HTML (`--html`)
123139
- JSON (`--json`)
124140
- Text (`--text`)
@@ -127,10 +143,10 @@ Reports come in:
127143

128144
- Ordering must be deterministic (stable sort keys).
129145
- All provenance fields must be consistent across formats:
130-
- baseline loaded / status
131-
- baseline fingerprint + schema versions
132-
- baseline generator version
133-
- cache path / cache used
146+
- baseline loaded / status
147+
- baseline fingerprint + schema versions
148+
- baseline generator version
149+
- cache path / cache used
134150

135151
### Explainability contract (core owns facts)
136152

@@ -147,6 +163,7 @@ For each clone group (especially block clones), the **core** should be able to p
147163
- `max_consecutive_<type>` (e.g., consecutive asserts)
148164

149165
UI can show **hints** only when the predicate is **formal & exact** (100% confidence), e.g.:
166+
150167
- `assert_only_block` (assert_ratio == 1.0 and consecutive_asserts == block_len)
151168
- `repeated_stmt_hash` (single stmt hash repeated across window)
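
To make the "formal & exact" requirement concrete, here is a sketch of how the core might compute assert facts and how the `assert_only_block` predicate follows from them. The `AssertFacts` model and both function names are hypothetical; only the predicate itself comes from the text above.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class AssertFacts:
    block_len: int
    assert_ratio: float
    max_consecutive_asserts: int


def compute_assert_facts(statements: list[ast.stmt]) -> AssertFacts:
    """Derive assert-related facts for one block of statements."""
    assert_count = 0
    longest = 0
    current = 0
    for stmt in statements:
        if isinstance(stmt, ast.Assert):
            assert_count += 1
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    block_len = len(statements)
    ratio = assert_count / block_len if block_len else 0.0
    return AssertFacts(block_len, ratio, longest)


def is_assert_only_block(facts: AssertFacts) -> bool:
    """Exact predicate: every statement in a non-empty block is an assert."""
    return (
        facts.block_len > 0
        and facts.assert_ratio == 1.0
        and facts.max_consecutive_asserts == facts.block_len
    )
```

The UI would only read `AssertFacts` and evaluate the predicate; it would not inspect the AST itself.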
152169

@@ -157,19 +174,22 @@ No UI-only heuristics that affect gating.
157174
## 7) Noise policy (what is and isn’t a “fix”)
158175

159176
### Acceptable fixes
177+
160178
- Merge/report-layer improvements (e.g., merge sliding windows into maximal regions) **without changing gating**.
161179
- Better evidence surfaced in HTML to explain matches.
162180

163181
### Not acceptable as a “quick fix”
182+
164183
- Weakening detection rules to hide noisy test patterns, unless:
165-
- it is configurable
166-
- default remains honest
167-
- the change is justified by real-world repos
168-
- it includes tests for false-negative risk
184+
- it is configurable
185+
- default remains honest
186+
- the change is justified by real-world repos
187+
- it includes tests for false-negative risk
169188

170189
### Preferred remediation for test-only FPs
190+
171191
- Refactor tests to avoid long repetitive statement sequences:
172-
- replace chains of `assert "... in html"` with loops or aggregated checks.
192+
- replace chains of `assert "... in html"` with loops or aggregated checks.
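
One possible shape for the aggregated check, as a sketch: the helper name, the test name, and the stand-in HTML string are all made up for illustration.

```python
def missing_fragments(html: str, expected: tuple[str, ...]) -> list[str]:
    """Return the expected fragments that do not occur in the HTML."""
    return [fragment for fragment in expected if fragment not in html]


def test_report_contains_provenance() -> None:
    # Stand-in for a rendered report; a real test would render one.
    html = "<html><body>baseline status: ok; cache used: false</body></html>"
    expected = ("baseline status", "cache used")
    # One aggregated assertion replaces a long chain of `assert "..." in html`,
    # and the failure message lists every missing fragment at once.
    assert missing_fragments(html, expected) == []
```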
173193

174194
---
175195

@@ -180,15 +200,15 @@ When you implement something:
180200
1. **State the intent** (what user-visible issue does it solve?)
181201
2. **List files touched** and why.
182202
3. **Call out contracts affected**:
183-
- baseline / cache / report schema
184-
- CLI exit codes / messages
203+
- baseline / cache / report schema
204+
- CLI exit codes / messages
185205
4. **Add/adjust tests** for:
186-
- normal-mode behavior
187-
- CI gating behavior
188-
- determinism (identical output on rerun)
189-
- legacy/untrusted scenarios where applicable
206+
- normal-mode behavior
207+
- CI gating behavior
208+
- determinism (identical output on rerun)
209+
- legacy/untrusted scenarios where applicable
190210
5. Run:
191-
- `ruff`, `mypy`, `pytest`
211+
- `ruff`, `mypy`, `pytest`
192212

193213
Avoid changing unrelated files (locks, roadmap) unless required.
194214

@@ -199,7 +219,8 @@ Avoid changing unrelated files (locks, roadmap) unless required.
199219
Agents must preserve these semantics:
200220

201221
- **0** — success (including “new clones detected” in non-gating mode)
202-
- **2** — baseline gating failure (untrusted/missing baseline when CI requires trusted baseline; invalid output extension, etc.)
222+
- **2** — baseline gating failure (untrusted/missing baseline when CI requires trusted baseline; invalid output
223+
extension, etc.)
203224
- **3** — analysis gating failure (e.g., `--fail-threshold` exceeded or new clones in `--ci` as designed)
204225

205226
If you introduce a new exit reason, document it and add tests.
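
The exit-code semantics above might be encoded as follows. This is a simplified sketch: the enum member names and `exit_code_for` are hypothetical, and the function covers only the baseline-trust and new-clone cases, not `--fail-threshold` or invalid output extensions.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0
    BASELINE_GATING_FAILURE = 2
    ANALYSIS_GATING_FAILURE = 3


def exit_code_for(baseline_trusted: bool, new_clone_count: int, gating: bool) -> ExitCode:
    """Map a run outcome to an exit code (subset of the documented cases)."""
    if not gating:
        # Non-gating mode reports new clones but still exits 0.
        return ExitCode.SUCCESS
    if not baseline_trusted:
        return ExitCode.BASELINE_GATING_FAILURE
    if new_clone_count > 0:
        return ExitCode.ANALYSIS_GATING_FAILURE
    return ExitCode.SUCCESS
```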
@@ -212,13 +233,13 @@ Before cutting a release:
212233

213234
- Confirm baseline schema compatibility is unchanged, or properly versioned.
214235
- Ensure changelog has:
215-
- user-facing changes
216-
- migration notes if any
236+
- user-facing changes
237+
- migration notes if any
217238
- Validate `twine check dist/*` for built artifacts.
218239
- Smoke test install in a clean venv:
219-
- `pip install dist/*.whl`
220-
- `codeclone --version`
221-
- `codeclone . --ci` in a sample repo with baseline.
240+
- `pip install dist/*.whl`
241+
- `codeclone --version`
242+
- `codeclone . --ci` in a sample repo with baseline.
222243

223244
---
224245

@@ -239,64 +260,70 @@ Before cutting a release:
239260
These rules are **repo policy**. If you need to violate one, you must explain why in the PR.
240261

241262
### Supported Python versions
263+
242264
- **Must run on Python 3.10, 3.11, 3.12, 3.13, 3.14**.
243265
- Do not rely on behavior that is new to only the latest version unless you provide a fallback.
244266
- Prefer **standard library** features that exist in 3.10+.
245267

246268
### Modern syntax (allowed / preferred)
269+
247270
Use modern syntax when it stays compatible with 3.10+:
271+
248272
- `X | Y` unions, `list[str]` / `dict[str, int]` generics (PEP 604 / PEP 585)
249273
- `from __future__ import annotations` is allowed, but keep behavior consistent across 3.10–3.14.
250274
- `match/case` (PEP 634) is allowed, but only if it keeps determinism/readability.
251275
- `typing.Self` (3.11+): **avoid** it in public APIs unless you gate it with `typing_extensions`.
252276
- Prefer `pathlib.Path` over `os.path` for new code (but keep hot paths pragmatic).
253277

254278
### Typing standards
279+
255280
- **Type hints are required** for all public functions, core pipeline surfaces, and any code that touches:
256281
baseline, cache, fingerprints, report models, serialization, CLI exit behavior.
257282
- Keep **`Any` to an absolute minimum**:
258-
- `Any` is allowed only at IO boundaries (JSON parsing, `argparse`, `subprocess`) and must be
259-
*narrowed immediately* into typed structures (dataclasses / TypedDict / Protocol / enums).
260-
- If `Any` appears in “core/domain” code, add a comment: `# Any: <reason>` and a TODO to remove.
283+
- `Any` is allowed only at IO boundaries (JSON parsing, `argparse`, `subprocess`) and must be
284+
*narrowed immediately* into typed structures (dataclasses / TypedDict / Protocol / enums).
285+
- If `Any` appears in “core/domain” code, add a comment: `# Any: <reason>` and a TODO to remove.
261286
- Prefer **`Literal` / enums** for finite sets (e.g., status codes, kinds).
262287
- Prefer **`dataclasses`** (frozen where reasonable) for data models; keep models JSON‑serializable.
263288
- Use `collections.abc` types (`Iterable`, `Sequence`, `Mapping`) for inputs where appropriate.
264289
- Avoid `cast()` unless you also add an invariant check nearby.
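
A small sketch of narrowing `Any` at an IO boundary, using the `generator` object from the baseline `meta` as the example. The `GeneratorInfo` model and `parse_generator` are hypothetical names, not taken from the codebase.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class GeneratorInfo:
    name: str
    version: str


def parse_generator(raw_json: str) -> GeneratorInfo:
    """Parse untyped JSON and narrow it into a typed model immediately."""
    raw: Any = json.loads(raw_json)  # Any: JSON boundary, narrowed below
    if not isinstance(raw, dict):
        raise ValueError("generator must be a JSON object")
    name = raw.get("name")
    version = raw.get("version")
    if not isinstance(name, str) or not isinstance(version, str):
        raise ValueError("generator.name and generator.version must be strings")
    return GeneratorInfo(name=name, version=version)
```

After this function returns, no caller has to handle `Any`.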
265290

266291
### Dataclasses / models
292+
267293
- Models that cross module boundaries should be:
268-
- explicitly typed
269-
- immutable when possible (`frozen=True`)
270-
- validated at construction (or via a dedicated `validate_*` function) if they are user‑provided.
294+
- explicitly typed
295+
- immutable when possible (`frozen=True`)
296+
- validated at construction (or via a dedicated `validate_*` function) if they are user‑provided.
271297

272298
### Error handling
299+
273300
- Prefer explicit, typed error types over stringly‑typed errors.
274301
- Exit codes are part of the public contract; do not change them without updating tests + docs.
275302

276303
### Determinism requirements (language-level)
304+
277305
- Never iterate over unordered containers (`set`, `dict`) without sorting first when it affects:
278306
hashes, IDs, report ordering, baseline payloads, or UI output.
279307
- Use stable formatting (sorted keys, stable ordering) in JSON output.
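
For example, a renderer that receives sets could sort them before serializing. This is a sketch; `render_groups` and its input shape are assumptions.

```python
import json


def render_groups(groups: dict[str, set[str]]) -> str:
    """Serialize clone groups with a stable order for keys and members."""
    # Sets have no guaranteed iteration order, so sort members explicitly.
    ordered = {key: sorted(groups[key]) for key in sorted(groups)}
    return json.dumps(ordered, sort_keys=True, indent=2)
```

Two inputs that differ only in insertion order then produce byte-identical output.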
280308

281309
### Key PEPs to keep in mind
310+
282311
- PEP 8, PEP 484 (typing), PEP 526 (variable annotations)
283312
- PEP 563 / PEP 649 (annotation evaluation changes across versions) — avoid relying on evaluation timing
284313
- PEP 585 (built-in generics), PEP 604 (X | Y unions)
285314
- PEP 634 (structural pattern matching)
286315
- PEP 612 (ParamSpec) / PEP 646 (TypeVarTuple) — only if it clearly helps, don’t overcomplicate
287316

288-
289-
290317
Prefer these rules:
291318

292319
- **Domain / contracts / enums** live near the domain owner (baseline statuses in baseline domain).
293320
- **Core logic** should not depend on HTML.
294321
- **Render** depends on report model, never the other way around.
295322
- If a module becomes a “god module”, split by:
296-
- model (types)
297-
- io/serialization
298-
- rules/validation
299-
- ui rendering
323+
- model (types)
324+
- io/serialization
325+
- rules/validation
326+
- ui rendering
300327

301328
Avoid deep package hierarchies unless they clearly reduce coupling.
302329

@@ -310,7 +337,10 @@ Avoid deep package hierarchies unless they clearly reduce coupling.
310337
- [ ] `ruff`, `mypy`, `pytest` green.
311338
- [ ] CLI messages remain helpful and stable (don’t break scripts).
312339
- [ ] Reports contain provenance fields and reflect trust model correctly.
340+
- [ ] Golden snapshots were **not** updated just to satisfy failing tests.
341+
- [ ] If any golden snapshot changed, the corresponding contract change is intentional, documented, and approved.
313342

314343
---
315344

316-
If you are an AI agent and something here conflicts with an instruction from a maintainer in the PR/issue thread, **ask for clarification in the thread** and default to this document until resolved.
345+
If you are an AI agent and something here conflicts with an instruction from a maintainer in the PR/issue thread, **ask
346+
for clarification in the thread** and default to this document until resolved.
