
Commit fee1818

Ship CodeClone 2.0.0b5 with coverage-aware metrics and baseline-honest review surfaces (#21)
## Summary

Ship `2.0.0b5` as the next `v2` beta milestone. This release expands the canonical report with adoption, API-surface, and coverage-join layers, tightens cache and baseline-aware runtime behavior, and brings the MCP/HTML/client surfaces into closer agreement with the core contracts.

## Highlights

- add canonical `coverage_adoption`, `api_surface`, and `coverage_join` metrics/report layers
- add `golden_fixture_paths` to exclude intentional fixture clone groups from health/gates while preserving them as suppressed facts
- separate measured coverage hotspots from coverage scope gaps
- surface adoption/API/coverage facts across CLI, MCP, HTML, VS Code, Claude Desktop, and Codex plugin flows
- make cache profile compatibility API-surface-aware (`Cache 2.5`) and keep warm/cold API behavior honest
- stabilize benchmark and CLI baseline-path handling
- refine HTML review surfaces, provenance badges, empty states, filters, and mobile behavior
- add compact MCP threshold context for empty design checks so agents can tell "quiet" from "just below threshold"

## Validation

- `uv run pytest --cov=codeclone --cov-report=term-missing --cov-fail-under=99 -q`
- `uv run pre-commit run --all-files`
- `uv run --with mkdocs --with mkdocs-material mkdocs build --strict`
- MCP service/server tests
- VS Code extension checks, tests, and `.vsix` packaging
- Claude Desktop bundle checks, tests, and `.mcpb` build
- Codex plugin manifest checks and tests
- benchmark workflow green with strong warm-cache speedup

## Notes

- no baseline update is included in this PR
- `coverage_join` remains a current-run external signal, not baseline truth
- `golden_fixture_paths` affects health/gates only for fully matching fixture clone groups; suppressed facts remain visible in the canonical report
1 parent bcbaca6 commit fee1818

142 files changed (+15984 -1427 lines)

.github/workflows/codeclone.yml

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ jobs:
 runs-on: ubuntu-latest
 steps:
 - name: Checkout
-uses: actions/checkout@v4
+uses: actions/checkout@v6.0.2
 with:
 fetch-depth: 0

.github/workflows/tests.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -40,7 +40,7 @@ jobs:
4040
- name: Run tests
4141
# Smoke CLI tests intentionally disable subprocess coverage collection
4242
# to avoid runner-specific flakiness while keeping parent-process coverage strict.
43-
run: uv run pytest --cov=codeclone --cov-report=term-missing --cov-fail-under=98
43+
run: uv run pytest --cov=codeclone --cov-report=term-missing --cov-fail-under=99
4444

4545
- name: Verify baseline exists
4646
if: ${{ matrix.python-version == '3.13' }}

.gitignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -39,3 +39,4 @@ site/
3939
/.uv-cache/
4040
/package-lock.json
4141
extensions/vscode-codeclone/node_modules
42+
/coverage.xml

AGENTS.md

Lines changed: 90 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -135,6 +135,22 @@ uv run pytest -q tests/test_codex_plugin.py
135135

136136
## 4) Baseline contract (v2, stable)
137137

138+
### Versioned constants (single source of truth)
139+
140+
All schema/version constants live in `codeclone/contracts.py`. **Always read them from code, never copy
141+
from another doc.** Current values (verified at write time):
142+
143+
| Constant | Source | Current value |
144+
|-----------------------------------|------------------------------|---------------|
145+
| `BASELINE_SCHEMA_VERSION` | `codeclone/contracts.py` | `2.1` |
146+
| `BASELINE_FINGERPRINT_VERSION` | `codeclone/contracts.py` | `1` |
147+
| `CACHE_VERSION` | `codeclone/contracts.py` | `2.5` |
148+
| `REPORT_SCHEMA_VERSION` | `codeclone/contracts.py` | `2.8` |
149+
| `METRICS_BASELINE_SCHEMA_VERSION` | `codeclone/contracts.py` | `1.2` |
150+
151+
When updating any doc that mentions a version, re-read `codeclone/contracts.py` first. Do not derive
152+
versions from another document.
153+
138154
### Baseline file structure (canonical)
139155

140156
```json
@@ -144,7 +160,7 @@ uv run pytest -q tests/test_codex_plugin.py
 "name": "codeclone",
 "version": "X.Y.Z"
 },
-"schema_version": "2.0",
+"schema_version": "2.1",
 "fingerprint_version": "1",
 "python_tag": "cp313",
 "created_at": "2026-02-08T14:20:15Z",
@@ -163,8 +179,9 @@ uv run pytest -q tests/test_codex_plugin.py
 ### Rules

 - `schema_version` is **baseline schema**, not package version.
-- Runtime writes baseline schema `2.0`.
-- Runtime accepts baseline schema `1.x` and `2.x` for compatibility checks.
+- Runtime writes baseline schema `2.1`.
+- Runtime accepts baseline schema `1.0` and `2.0`–`2.1` (governed by
+`_BASELINE_SCHEMA_MAX_MINOR_BY_MAJOR` in `codeclone/baseline.py`).
 - Compatibility is tied to:
 - `fingerprint_version`
 - `python_tag`
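
The acceptance rule in this hunk (schema `1.0` and `2.0` through `2.1`) reads like a "highest accepted minor per major" lookup. The sketch below illustrates that idea only. The mapping contents, the helper names, and the `MAJOR.MINOR` parsing are assumptions; the actual shape of `_BASELINE_SCHEMA_MAX_MINOR_BY_MAJOR` in `codeclone/baseline.py` is not shown in this diff.

```python
# Hypothetical stand-in for the real constant: highest accepted minor per major.
MAX_MINOR_BY_MAJOR = {1: 0, 2: 1}


def _is_plain_number(text: str) -> bool:
    """True only for non-empty ASCII digit strings such as '2' or '10'."""
    return text.isascii() and text.isdigit()


def is_supported_baseline_schema(version: str) -> bool:
    """Return True if a 'MAJOR.MINOR' string falls inside the accepted range."""
    major_text, separator, minor_text = version.partition(".")
    if not separator:
        return False  # no dot at all, e.g. "2"
    if not (_is_plain_number(major_text) and _is_plain_number(minor_text)):
        return False  # rejects "2.x", "2.1.3", and empty parts
    max_minor = MAX_MINOR_BY_MAJOR.get(int(major_text))
    return max_minor is not None and int(minor_text) <= max_minor


if __name__ == "__main__":
    for candidate in ("1.0", "2.0", "2.1", "2.2", "3.0"):
        print(candidate, is_supported_baseline_schema(candidate))
```

A table keyed by major version keeps the read-compatibility window explicit: widening it is a one-line change, and unknown majors are rejected by default.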
@@ -358,8 +375,8 @@ Architecture is layered, but grounded in current code (not aspirational diagrams
 `codeclone/grouping.py`, `codeclone/scanner.py`) produces normalized structural facts and clone candidates.
 - **Domain/contracts layer** (`codeclone/models.py`, `codeclone/contracts.py`, `codeclone/errors.py`,
 `codeclone/domain/*.py`) defines typed entities and stable enums/constants used across layers.
-- **Persistence contracts** (`codeclone/baseline.py`, `codeclone/cache.py`, `codeclone/metrics_baseline.py`) store
-trusted comparison state and optimization state.
+- **Persistence contracts** (`codeclone/baseline.py`, `codeclone/cache.py`, `codeclone/cache_io.py`,
+`codeclone/metrics_baseline.py`) store trusted comparison state and optimization state.
 - **Canonical report + projections** (`codeclone/report/json_contract.py`, `codeclone/report/*.py`) converts analysis
 facts to deterministic, contract-shaped outputs.
 - **HTML/UI rendering** (`codeclone/html_report.py`, `codeclone/_html_report/*`, `codeclone/_html_*.py`,
@@ -411,8 +428,12 @@ Use this map to route changes to the right owner module.
 deterministic.
 - `codeclone/baseline.py` — baseline schema/trust/integrity/compatibility contract; all baseline format changes go here
 with explicit contract process.
-- `codeclone/cache.py` — cache schema/integrity/profile compatibility and serialization; cache remains
+- `codeclone/cache.py` — cache schema/status/profile compatibility and high-level serialization policy; cache remains
 optimization-only.
+- `codeclone/cache_io.py` — IO-layer helpers for the cache: atomic JSON read/write
+(`read_json_document`, `write_json_document_atomically`), canonical JSON (`canonical_json`), and
+HMAC signing/verification (`sign_cache_payload`, `verify_cache_payload_signature`); attribute these
+functions to `cache_io.py`, not `cache.py`.
 - `codeclone/report/json_contract.py` — canonical report schema builder/integrity payload; any JSON contract shape
 change belongs here.
 - `codeclone/report/*.py` (other modules) — deterministic projections/format transforms (
@@ -529,7 +550,7 @@ Policy:
 ### Public / contract-sensitive surfaces

 - CLI flags, defaults, exit codes, and stable script-facing messages.
-- Baseline schema/trust semantics/integrity compatibility (`2.0` baseline contract family).
+- Baseline schema/trust semantics/integrity compatibility (`BASELINE_SCHEMA_VERSION` contract family).
 - Cache schema/status/profile compatibility/integrity (`CACHE_VERSION` contract family).
 - Canonical report JSON schema/payload semantics (`REPORT_SCHEMA_VERSION` contract family).
 - Documented report projections and their machine/user-facing semantics (HTML/Markdown/SARIF/Text).
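
The cache integrity surface listed above rests on two ideas that the AGENTS.md module map attributes to `codeclone/cache_io.py`: canonical JSON and HMAC signing. The sketch below shows how those two pieces typically fit together. The function names here are hypothetical and deliberately differ from the repo's helpers, whose signatures are not visible in this diff; the choice of SHA-256 and the key handling are assumptions as well.

```python
import hashlib
import hmac
import json


def canonical_json_bytes(payload: object) -> bytes:
    """Serialize with sorted keys and fixed separators so equal data gives equal bytes."""
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return text.encode("utf-8")


def sign_payload(payload: object, key: bytes) -> str:
    """Return a hex HMAC over the canonical bytes (SHA-256 is an assumed choice)."""
    return hmac.new(key, canonical_json_bytes(payload), hashlib.sha256).hexdigest()


def verify_signature(payload: object, signature: str, key: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_payload(payload, key).encode("ascii")
    return hmac.compare_digest(expected, signature.encode("utf-8"))


if __name__ == "__main__":
    demo_key = b"demo-key-not-a-real-secret"
    entry = {"version": "2.5", "files": {"b.py": 2, "a.py": 1}}
    signature = sign_payload(entry, demo_key)
    print(verify_signature(entry, signature, demo_key))
```

Signing the canonical bytes instead of `repr()` output or insertion-ordered JSON means that two payloads with the same content always verify against the same signature, regardless of dictionary ordering.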
@@ -621,7 +642,68 @@ Avoid deep package hierarchies unless they clearly reduce coupling.

 ---

-## 20) Minimal checklist for PRs (agents)
+## 20) Agent safety rules
+
+These rules exist because of real incidents in this repo. They are non-negotiable.
+
+### Scope discipline
+
+- Touch only files directly related to your current task.
+- Do not "clean up", reformat, or refactor code in files outside your task scope.
+- Do not delete functions, classes, blocks, or whole files written by other contributors unless
+deletion is the explicit goal of your task.
+- If you discover unrelated issues, report them in your final message — do not fix them silently.
+- Before starting work, run `git status` and review uncommitted/untracked changes. They may belong
+to a parallel agent or to the maintainer; do not delete or overwrite them without explicit approval.
+
+### Documentation hygiene
+
+- Every doc claim about code (schema version, module path, function name, MCP tool count, exit code,
+CLI flag) must be verified against the **current** code before writing or editing.
+- Always read version constants from `codeclone/contracts.py` (see Section 4 table), never from
+another doc.
+- When updating a file that mentions schema versions, verify **every** version reference in that
+file — not only the one you came to change.
+- Do not remove narrative content from docs you did not author. Add or correct only.
+- Do not replace a multi-section doc with a "pointer" stub unless the maintainer explicitly asks for it.
+- Do not create new `*.md` design specs ("PROPOSED", "FUTURE", "RFC") inside `docs/`. Use the
+maintainer's planning channel instead — orphaned specs become stale and misleading.
+
+### Audit completeness
+
+- When the maintainer asks to audit "all" of something, list every file you actually opened in your
+final report. Selective audits silently skip the most error-prone files.
+- Prefer parallel `Explore` agents partitioned by file group over a single sequential pass —
+coverage is the contract, not effort.
+
+### Shared helpers
+
+- HTML/UI helpers (`_html_badges.py`, `_html_css.py`, `_html_js.py`, `_html_escape.py`,
+`_html_report/_glossary.py`) are imported, not duplicated locally inside `_html_report/_sections/*`.
+If you need a helper that doesn't exist, add it to the shared module.
+- Glossary terms used in stat-card labels live in `codeclone/_html_report/_glossary.py`. Adding a
+new label without a glossary entry is a contract gap.
+
+### Conflict avoidance
+
+- Do not force-push, `git reset --hard`, or `git checkout --` over uncommitted work without
+explicit maintainer approval.
+- If your changes conflict with recent commits or other agents' work, rebase or merge cleanly —
+never silently drop the other side.
+- Never use `--no-verify` to bypass pre-commit hooks; fix the underlying issue.
+
+### Verification before "done"
+
+- A task that touches HTML rendering is not complete until
+`pytest tests/test_html_report.py -x -q` is green.
+- A task that touches MCP is not complete until
+`pytest tests/test_mcp_service.py tests/test_mcp_server.py -x -q` is green.
+- A task that touches docs schema/version claims is not complete until you have grep'd the whole
+file for *all* version-shaped strings and verified each against `codeclone/contracts.py`.
+
+---
+
+## 21) Minimal checklist for PRs (agents)

 - [ ] Change is deterministic.
 - [ ] Contracts preserved or versioned.
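
The "Verification before done" rule about grepping a doc for every version-shaped string can be approximated with a small script. This is a rough sketch under stated assumptions: the known values are copied from the "Versioned constants" table in this diff (a real check would read them from `codeclone/contracts.py`), only backticked `MAJOR.MINOR` strings are matched, and the function name is hypothetical.

```python
import re

# Assumed snapshot of the values in the "Versioned constants" table of this diff.
# A real audit would import them from codeclone/contracts.py instead.
KNOWN_VERSIONS = {"2.1", "2.5", "2.8", "1.2"}

# Matches backticked MAJOR.MINOR strings such as `2.8`; anything else is ignored.
VERSION_IN_BACKTICKS = re.compile(r"`(\d+\.\d+)`")


def find_unverified_versions(doc_text: str, known: set[str]) -> list[tuple[int, str]]:
    """Return (line_number, version) for each backticked version not in `known`."""
    findings = []
    for line_number, line in enumerate(doc_text.splitlines(), start=1):
        for match in VERSION_IN_BACKTICKS.finditer(line):
            if match.group(1) not in known:
                findings.append((line_number, match.group(1)))
    return findings


if __name__ == "__main__":
    sample = "| Baseline | `2.1` |\n| Report | `2.8` |\n| Cache | `2.4` |"
    for line_number, version in find_unverified_versions(sample, KNOWN_VERSIONS):
        print(f"line {line_number}: {version} does not match a known constant")
```

Run against the schema table in the CONTRIBUTING.md part of this commit, a check like this would flag the Cache row (`2.4`), which differs from the `2.5` listed for `CACHE_VERSION` in AGENTS.md. It would also flag legitimate mentions of older read-compatible versions, so its output is a list to review by hand, not a gate.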

CHANGELOG.md

Lines changed: 41 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,46 @@
11
# Changelog
22

3-
## [2.0.0b4]
3+
## [2.0.0b5] - 2026-04-16
4+
5+
Expands the canonical contract with adoption, API-surface, and coverage-join layers; clarifies run interpretation
6+
across MCP/HTML/clients; tightens MCP launcher/runtime behavior.
7+
8+
### Contracts, metrics, and review surfaces
9+
10+
- Report schema `2.8`: add `coverage_adoption`, `api_surface`, `coverage_join`, and optional
11+
`clones.suppressed.*` (for `golden_fixture_paths`); separate coverage hotspots vs scope gaps.
12+
- Baselines: clone `2.1`, metrics `1.2`; compact `api_surface` payload (`local_name` on disk, qualnames at runtime);
13+
read-compatible with `2.0` / `1.1`.
14+
- Add public/private visibility classification for public-symbol metrics (no clone/fingerprint changes).
15+
- Add annotation/docstring adoption coverage: parameter, return, public docstrings, explicit `Any`.
16+
- Add opt-in API surface inventory + baseline diff (snapshots, additions, breaking changes).
17+
- Add coverage join (`--coverage`): per-function facts + findings for below-threshold or missing-in-scope functions;
18+
current-run only (not baseline truth, no fingerprint impact).
19+
- Add `golden_fixture_paths`: exclude matching clone groups from health/gates while keeping suppressed facts.
20+
- Add gates: `--min-typing-coverage`, `--min-docstring-coverage`, `--fail-on-typing-regression`,
21+
`--fail-on-docstring-regression`, `--fail-on-api-break`, `--fail-on-untested-hotspots`, `--coverage-min`.
22+
- Surface adoption/API/coverage-join in MCP, CLI Metrics, report payloads, and HTML (Overview + Quality subtab).
23+
- Preserve embedded metrics and optional `api_surface` in unified baselines.
24+
- Cache `2.5`: make analysis-profile compatibility API-surface-aware; invalidate stale non-API warm caches; preserve parameter order; align warm/cold API diffs.
25+
26+
### MCP, HTML, and client interpretation
27+
28+
- Surface effective analysis profile in report meta, MCP summary/triage, and HTML subtitle.
29+
- Add `health_scope`, `focus`, `new_by_source_kind` to MCP summary/triage.
30+
- Make baseline mismatch explicit (python tags + no-valid-baseline signal).
31+
- Surface `Coverage Join` facts and the optional `coverage` MCP help topic in
32+
the VS Code extension when the connected server supports them.
33+
- Prefer workspace-local launchers over `PATH` (Poetry fallback).
34+
- Add `workspace_root` to force project `.venv` selection.
35+
36+
### Safety and maintenance
37+
38+
- Validate `git_diff_ref` as safe single-revision expressions.
39+
- Replace segment digest `repr()` with canonical JSON bytes (determinism).
40+
- Align CI coverage gate (`fail_under = 99`) and refresh `actions/checkout` pin.
41+
- Refresh branch metadata/docs for `2.0.0b5`; update README badge to `89 (B)`.
42+
43+
## [2.0.0b4] - 2026-04-05
444

545
### MCP server
646
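
The changelog entry "Validate `git_diff_ref` as safe single-revision expressions" does not say which rules the project applies. As one possible reading, the sketch below uses a conservative allow-list: the value must start with an alphanumeric character (so it cannot be parsed as a command-line option), may contain only common ref characters, and may not contain a `..` range. The pattern and the function name are assumptions for illustration, not the repo's validator.

```python
import re

# Assumed allow-list: starts alphanumeric, then letters, digits, and . _ / ~ ^ -
_SAFE_REF = re.compile(r"[A-Za-z0-9][A-Za-z0-9._/~^-]*")


def is_safe_single_revision(ref: str) -> bool:
    """Accept only one plain revision; reject options, ranges, and shell-like input."""
    if ".." in ref:
        return False  # "a..b" and "a...b" name ranges, not a single revision
    return _SAFE_REF.fullmatch(ref) is not None


if __name__ == "__main__":
    for candidate in ("HEAD~1", "origin/main", "main..HEAD", "--output=/tmp/x"):
        print(candidate, is_safe_single_revision(candidate))
```

An allow-list like this is stricter than what git itself accepts (for example, it rejects `HEAD@{1}`), which is usually the safer trade-off when the value comes from configuration or an MCP client.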

CONTRIBUTING.md

Lines changed: 25 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -138,10 +138,10 @@ CodeClone maintains several versioned schema contracts:
138138

139139
| Schema | Current version | Owner |
140140
|------------------|-----------------|-------------------------------------|
141-
| Baseline | `2.0` | `codeclone/baseline.py` |
142-
| Report | `2.1` | `codeclone/report/json_contract.py` |
143-
| Cache | `2.2` | `codeclone/cache.py` |
144-
| Metrics baseline | `1.0` | `codeclone/metrics_baseline.py` |
141+
| Baseline | `2.1` | `codeclone/baseline.py` |
142+
| Report | `2.8` | `codeclone/report/json_contract.py` |
143+
| Cache | `2.4` | `codeclone/cache_io.py` |
144+
| Metrics baseline | `1.2` | `codeclone/metrics_baseline.py` |
145145

146146
Any change to schema shape or semantics requires version review, documentation, and tests.
147147

@@ -209,6 +209,27 @@ uv run pytest -q tests/test_mcp_service.py tests/test_mcp_server.py

 ---

+## Commit Messages
+
+Use the repository's existing **Conventional Commits** style:
+
+- format: `type(scope): imperative summary`
+- keep `type` lowercase (`feat`, `fix`, `docs`, `chore`, ...)
+- keep the summary short, imperative, and specific to the user-visible change
+- use a narrow scope when it helps (`metrics`, `mcp,vscode`, `core,ci`, ...)
+- split unrelated changes into separate commits instead of writing one broad summary
+
+Examples from the current history:
+
+- `fix(core,ci): harden git diff validation, make segment digests canonical, and align CI policy`
+- `feat(metrics): add adoption and public API baselines with compact schema-aware storage`
+- `chore(docs): align AGENTS and contract docs with current code`
+
+If a commit needs extra context, keep the subject line concise and explain the
+rest in the commit body.
+
+---
+
 ## Code Style

 - Python **3.10 – 3.14**
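
The `type(scope): imperative summary` format from the Commit Messages section added in this file can be checked mechanically. The sketch below is one way to do it. The regular expression, the treatment of the scope as optional, and the function name are assumptions drawn from the bullets and examples in that section, not from any tooling in the repo.

```python
import re
from typing import Optional

# Assumed grammar: lowercase type, optional comma-separated scope, ": ", summary.
SUBJECT_PATTERN = re.compile(
    r"(?P<type>[a-z]+)"
    r"(?:\((?P<scope>[a-z0-9,_-]+)\))?"
    r": (?P<summary>\S.*)"
)


def parse_commit_subject(subject: str) -> Optional[dict]:
    """Return the parsed parts of a subject line, or None if it does not match."""
    match = SUBJECT_PATTERN.fullmatch(subject)
    return match.groupdict() if match else None


if __name__ == "__main__":
    examples = [
        "feat(metrics): add adoption and public API baselines with compact schema-aware storage",
        "chore(docs): align AGENTS and contract docs with current code",
        "Fix: something vague",
    ]
    for subject in examples:
        print(parse_commit_subject(subject))
```

A check like this only covers the shape of the subject line; whether the summary is short, imperative, and specific still needs a human reviewer.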
