Commit a328bd1

fix(cache): add analysis_profile to cache payload for min-loc/min-stmt compat (v1.3) (#10)
1 parent 43ec09d commit a328bd1

21 files changed, with 811 additions and 291 deletions.

CHANGELOG.md

Lines changed: 31 additions & 4 deletions
@@ -1,5 +1,32 @@
 # Changelog

+## [1.4.3] - 2026-03-03
+
+### Cache Contract
+
+- Cache schema bumped from `v1.2` to `v1.3`.
+- Added signed analysis profile to cache payload:
+    - `payload.ap.min_loc`
+    - `payload.ap.min_stmt`
+- Cache compatibility now requires `payload.ap` to match current CLI analysis thresholds. On mismatch, cache is ignored
+  with `cache_status=analysis_profile_mismatch` and analysis continues without cache.
+
+### CLI
+
+- CLI now constructs cache context with effective `--min-loc` and `--min-stmt` values, so cache reuse is consistent
+  with active analysis thresholds.
+
+### Tests
+
+- Added regression coverage for analysis-profile cache mismatch/match behavior in:
+    - `tests/test_cache.py`
+    - `tests/test_cli_inprocess.py`
+
+### Contract Notes
+
+- Baseline contract is unchanged (`schema v1.0`, `fingerprint version 1`).
+- Report schema is unchanged (`v1.1`); cache metadata adds a new `cache_status` enum value.
+
 ## [1.4.2] - 2026-02-17

 ### Overview
@@ -44,10 +71,10 @@ unchanged.
 ### Notes

 - No changes to:
-  - detection semantics / fingerprints
-  - baseline hash inputs (`payload_sha256` semantic payload)
-  - exit code contract and precedence
-  - schema versions (baseline v1.0, cache v1.2, report v1.1)
+    - detection semantics / fingerprints
+    - baseline hash inputs (`payload_sha256` semantic payload)
+    - exit code contract and precedence
+    - schema versions (baseline v1.0, cache v1.2, report v1.1)

 ---
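
The compatibility rule described in the 1.4.3 entry can be pictured with a small sketch. It assumes the decoded cache payload is a plain mapping whose `ap` key holds `min_loc` and `min_stmt`. The helper name `cache_status_for` is hypothetical and is not part of CodeClone; malformed profiles, which the real parser reports separately, are not modelled here.

```python
import json


def cache_status_for(payload: dict, *, min_loc: int, min_stmt: int) -> str:
    """Hypothetical helper: return a cache_status-like string for a decoded payload."""
    current = {"min_loc": min_loc, "min_stmt": min_stmt}
    # The stored profile must equal the thresholds of the current run.
    if payload.get("ap") != current:
        return "analysis_profile_mismatch"
    return "ok"


# A payload as it might look after a run with thresholds 15/6 ("files" left empty).
stored = json.loads(json.dumps({"ap": {"min_loc": 15, "min_stmt": 6}, "files": {}}))

print(cache_status_for(stored, min_loc=15, min_stmt=6))
print(cache_status_for(stored, min_loc=10, min_stmt=6))
```

Under this reading, changing either threshold between runs is enough for the cache to be skipped, which matches the fail-open behavior the changelog describes.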

README.md

Lines changed: 8 additions & 7 deletions
@@ -117,12 +117,12 @@ Full contract details: [`docs/book/06-baseline.md`](docs/book/06-baseline.md)

 CodeClone uses a deterministic exit code contract:

-| Code | Meaning |
-|------|-----------------------------------------------------------------------------|
-| `0` | Success — run completed without gating failures |
+| Code | Meaning |
+|------|-------------------------------------------------------------------------------------------------------------------------------------|
+| `0` | Success — run completed without gating failures |
 | `2` | Contract error — baseline missing/untrusted, invalid output extensions, incompatible versions, unreadable source files in CI/gating |
-| `3` | Gating failure — new clones detected or threshold exceeded |
-| `5` | Internal error — unexpected exception |
+| `3` | Gating failure — new clones detected or threshold exceeded |
+| `5` | Internal error — unexpected exception |

 **Priority:** Contract errors (`2`) override gating failures (`3`) when both occur.

@@ -182,7 +182,7 @@ Canonical report contract: [`docs/book/08-report.md`](docs/book/08-report.md)
     "cache_path": "/path/to/.cache/codeclone/cache.json",
     "cache_used": true,
     "cache_status": "ok",
-    "cache_schema_version": "1.2",
+    "cache_schema_version": "1.3",
     "files_skipped_source_io": 0,
     "groups_counts": {
       "functions": {
@@ -263,7 +263,8 @@ Canonical report contract: [`docs/book/08-report.md`](docs/book/08-report.md)
 Cache is an optimization layer only and is never a source of truth.

 - Default path: `<root>/.cache/codeclone/cache.json`
-- Schema version: **v1.2**
+- Schema version: **v1.3**
+- Compatibility includes analysis profile (`min_loc`, `min_stmt`)
 - Invalid or oversized cache is ignored with warning and rebuilt (fail-open)

 Full contract details: [`docs/book/07-cache.md`](docs/book/07-cache.md)

codeclone/cache.py

Lines changed: 59 additions & 0 deletions
@@ -39,6 +39,7 @@ class CacheStatus(str, Enum):
     VERSION_MISMATCH = "version_mismatch"
     PYTHON_TAG_MISMATCH = "python_tag_mismatch"
     FINGERPRINT_MISMATCH = "mismatch_fingerprint_version"
+    ANALYSIS_PROFILE_MISMATCH = "analysis_profile_mismatch"
     INTEGRITY_FAILED = "integrity_failed"


@@ -84,15 +85,22 @@ class CacheEntry(TypedDict):
     segments: list[SegmentDict]


+class AnalysisProfile(TypedDict):
+    min_loc: int
+    min_stmt: int
+
+
 class CacheData(TypedDict):
     version: str
     python_tag: str
     fingerprint_version: str
+    analysis_profile: AnalysisProfile
     files: dict[str, CacheEntry]


 class Cache:
     __slots__ = (
+        "analysis_profile",
         "cache_schema_version",
         "data",
         "fingerprint_version",
@@ -112,14 +120,21 @@ def __init__(
         *,
         root: str | Path | None = None,
         max_size_bytes: int | None = None,
+        min_loc: int = 15,
+        min_stmt: int = 6,
     ):
         self.path = Path(path)
         self.root = _resolve_root(root)
         self.fingerprint_version = BASELINE_FINGERPRINT_VERSION
+        self.analysis_profile: AnalysisProfile = {
+            "min_loc": min_loc,
+            "min_stmt": min_stmt,
+        }
         self.data: CacheData = _empty_cache_data(
             version=self._CACHE_VERSION,
             python_tag=current_python_tag(),
             fingerprint_version=self.fingerprint_version,
+            analysis_profile=self.analysis_profile,
         )
         self.legacy_secret_warning = self._detect_legacy_secret_warning()
         self.cache_schema_version: str | None = None
@@ -164,6 +179,7 @@ def _ignore_cache(
             version=self._CACHE_VERSION,
             python_tag=current_python_tag(),
             fingerprint_version=self.fingerprint_version,
+            analysis_profile=self.analysis_profile,
         )

     def _sign_data(self, data: Mapping[str, object]) -> str:
def _sign_data(self, data: Mapping[str, object]) -> str:
@@ -309,6 +325,28 @@ def _parse_cache_document(self, raw_obj: object) -> CacheData | None:
             )
             return None

+        analysis_profile = _as_analysis_profile(payload.get("ap"))
+        if analysis_profile is None:
+            self._ignore_cache(
+                "Cache format invalid; ignoring cache.",
+                status=CacheStatus.INVALID_TYPE,
+                schema_version=version,
+            )
+            return None
+
+        if analysis_profile != self.analysis_profile:
+            self._ignore_cache(
+                "Cache analysis profile mismatch "
+                f"(found min_loc={analysis_profile['min_loc']}, "
+                f"min_stmt={analysis_profile['min_stmt']}; "
+                f"expected min_loc={self.analysis_profile['min_loc']}, "
+                f"min_stmt={self.analysis_profile['min_stmt']}); "
+                "ignoring cache.",
+                status=CacheStatus.ANALYSIS_PROFILE_MISMATCH,
+                schema_version=version,
+            )
+            return None
+
         files_obj = payload.get("files")
         files_dict = _as_str_dict(files_obj)
         if files_dict is None:
@@ -337,6 +375,7 @@ def _parse_cache_document(self, raw_obj: object) -> CacheData | None:
             "version": self._CACHE_VERSION,
             "python_tag": runtime_tag,
             "fingerprint_version": self.fingerprint_version,
+            "analysis_profile": self.analysis_profile,
             "files": parsed_files,
         }

@@ -356,6 +395,7 @@ def save(self) -> None:
             payload: dict[str, object] = {
                 "py": current_python_tag(),
                 "fp": self.fingerprint_version,
+                "ap": self.analysis_profile,
                 "files": wire_files,
             }
             signed_doc = {
@@ -371,6 +411,7 @@ def save(self) -> None:
             self.data["version"] = self._CACHE_VERSION
             self.data["python_tag"] = current_python_tag()
             self.data["fingerprint_version"] = self.fingerprint_version
+            self.data["analysis_profile"] = self.analysis_profile

         except OSError as e:
             raise CacheError(f"Failed to save cache: {e}") from e
@@ -508,11 +549,13 @@ def _empty_cache_data(
     version: str,
     python_tag: str,
     fingerprint_version: str,
+    analysis_profile: AnalysisProfile,
 ) -> CacheData:
     return {
         "version": version,
         "python_tag": python_tag,
         "fingerprint_version": fingerprint_version,
+        "analysis_profile": analysis_profile,
         "files": {},
     }

@@ -542,6 +585,22 @@ def _as_str_dict(value: object) -> dict[str, object] | None:
     return value


+def _as_analysis_profile(value: object) -> AnalysisProfile | None:
+    obj = _as_str_dict(value)
+    if obj is None:
+        return None
+
+    if set(obj.keys()) != {"min_loc", "min_stmt"}:
+        return None
+
+    min_loc = _as_int(obj.get("min_loc"))
+    min_stmt = _as_int(obj.get("min_stmt"))
+    if min_loc is None or min_stmt is None:
+        return None
+
+    return {"min_loc": min_loc, "min_stmt": min_stmt}
+
+
 def _decode_wire_file_entry(value: object, filepath: str) -> CacheEntry | None:
     obj = _as_str_dict(value)
     if obj is None:
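
To see what the strict `ap` validation accepts and rejects, the parser added in this file can be exercised in isolation. The sketch below is self-contained, so `_as_int` and `_as_str_dict` are stand-ins written for illustration: their bodies do not appear in this diff, and the behavior shown (rejecting `bool`, requiring string keys) is an assumption rather than the project's confirmed implementation.

```python
from __future__ import annotations

from typing import TypedDict


class AnalysisProfile(TypedDict):
    min_loc: int
    min_stmt: int


def _as_int(value: object) -> int | None:
    # Assumed behavior: accept real ints only; bool is excluded because it subclasses int.
    if isinstance(value, bool) or not isinstance(value, int):
        return None
    return value


def _as_str_dict(value: object) -> dict[str, object] | None:
    # Assumed behavior: a dict whose keys are all strings.
    if not isinstance(value, dict) or not all(isinstance(k, str) for k in value):
        return None
    return value


def as_analysis_profile(value: object) -> AnalysisProfile | None:
    obj = _as_str_dict(value)
    # Exactly two keys are allowed, so unknown or missing fields invalidate the profile.
    if obj is None or set(obj) != {"min_loc", "min_stmt"}:
        return None
    min_loc = _as_int(obj["min_loc"])
    min_stmt = _as_int(obj["min_stmt"])
    if min_loc is None or min_stmt is None:
        return None
    return {"min_loc": min_loc, "min_stmt": min_stmt}
```

Requiring the exact key set means a future profile field would make older caches fail validation instead of being silently compared on a subset of thresholds.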

codeclone/cli.py

Lines changed: 2 additions & 0 deletions
@@ -310,6 +310,8 @@ def _main_impl() -> None:
         cache_path,
         root=root_path,
         max_size_bytes=args.max_cache_size_mb * 1024 * 1024,
+        min_loc=args.min_loc,
+        min_stmt=args.min_stmt,
     )
     cache.load()
     if cache.load_warning:
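
The two added keyword arguments carry the effective thresholds from the parsed CLI arguments into the cache. A minimal sketch of that hand-off follows, using `argparse` from the standard library. The `CacheContext` dataclass and the parser setup are hypothetical; the defaults of 15 and 6 mirror `Cache.__init__` in this commit, and it is an assumption that the real CLI uses the same defaults.

```python
import argparse
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheContext:
    """Hypothetical stand-in for the threshold arguments passed to Cache(...)."""

    min_loc: int
    min_stmt: int


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="codeclone-sketch")
    # Defaults copied from Cache.__init__ in the diff; the real CLI defaults are assumed.
    parser.add_argument("--min-loc", type=int, default=15)
    parser.add_argument("--min-stmt", type=int, default=6)
    return parser


def cache_context_from_args(argv: list[str]) -> CacheContext:
    args = build_parser().parse_args(argv)
    # argparse maps "--min-loc" to the attribute name "min_loc".
    return CacheContext(min_loc=args.min_loc, min_stmt=args.min_stmt)


print(cache_context_from_args(["--min-loc", "10"]))
```

Passing the parsed values rather than the defaults is what keeps a cache written under one threshold from being reused under another.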

codeclone/contracts.py

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@
 BASELINE_SCHEMA_VERSION: Final = "1.0"
 BASELINE_FINGERPRINT_VERSION: Final = "1"

-CACHE_VERSION: Final = "1.2"
+CACHE_VERSION: Final = "1.3"
 REPORT_SCHEMA_VERSION: Final = "1.1"

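
Bumping `CACHE_VERSION` presumably means caches written under `1.2` are no longer accepted. The sketch below illustrates one way such a check could work, assuming an exact string comparison. The `check_version` function and the `OK` member name are hypothetical; only the string values `"ok"` and `"version_mismatch"` appear in this commit.

```python
from enum import Enum
from typing import Final

CACHE_VERSION: Final = "1.3"


class CacheStatus(str, Enum):
    OK = "ok"  # member name assumed; the value "ok" appears in the README sample
    VERSION_MISMATCH = "version_mismatch"


def check_version(found: object) -> CacheStatus:
    # Assumption: versions must match exactly, with no range or prefix logic.
    if found == CACHE_VERSION:
        return CacheStatus.OK
    return CacheStatus.VERSION_MISMATCH


print(check_version("1.2").value)
print(check_version("1.3").value)
```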

docs/README.md

Lines changed: 2 additions & 2 deletions
@@ -19,7 +19,7 @@ This directory has two documentation layers.
 - Config and defaults: [`docs/book/04-config-and-defaults.md`](book/04-config-and-defaults.md)
 - Core pipeline and invariants: [`docs/book/05-core-pipeline.md`](book/05-core-pipeline.md)
 - Baseline contract (schema v1): [`docs/book/06-baseline.md`](book/06-baseline.md)
-- Cache contract (schema v1.2): [`docs/book/07-cache.md`](book/07-cache.md)
+- Cache contract (schema v1.3): [`docs/book/07-cache.md`](book/07-cache.md)
 - Report contract (schema v1.1): [`docs/book/08-report.md`](book/08-report.md)

 ## Interfaces
@@ -44,4 +44,4 @@ This directory has two documentation layers.

 - Status enums and typed contracts: [`docs/book/appendix/a-status-enums.md`](book/appendix/a-status-enums.md)
 - Schema layouts (baseline/cache/report): [`docs/book/appendix/b-schema-layouts.md`](book/appendix/b-schema-layouts.md)
-- Error catalog (contract vs internal): [`docs/book/appendix/c-error-catalog.md`](book/appendix/c-error-catalog.md)
+- Error catalog (contract vs internal): [`docs/book/appendix/c-error-catalog.md`](book/appendix/c-error-catalog.md)

docs/book/01-architecture-map.md

Lines changed: 29 additions & 15 deletions
@@ -1,74 +1,88 @@
 # 01. Architecture Map

 ## Purpose
+
 Document the current module boundaries and ownership in the codebase.

 ## Public surface
+
 Main ownership layers:
+
 - Core detection pipeline: scanner → extractor → cfg/normalize → grouping.
 - Contracts/IO: baseline, cache, CLI validation, exit semantics.
 - Report model/serialization: JSON/TXT generation and explainability facts.
 - Render layer: HTML rendering and template assets.

 ## Data model
-| Layer | Modules | Responsibility |
-| --- | --- | --- |
-| Contracts | `codeclone/contracts.py`, `codeclone/errors.py` | Shared schema versions, URLs, exit-code enum, typed exceptions |
-| Discovery + parsing | `codeclone/scanner.py`, `codeclone/extractor.py` | Enumerate files, parse AST, extract function/block/segment units |
-| Structural analysis | `codeclone/cfg.py`, `codeclone/normalize.py`, `codeclone/blockhash.py`, `codeclone/fingerprint.py`, `codeclone/blocks.py` | CFG, normalization, statement hashes, block/segment windows |
-| Grouping + report core | `codeclone/_report_grouping.py`, `codeclone/_report_blocks.py`, `codeclone/_report_segments.py`, `codeclone/_report_explain.py` | Build groups, merge windows, suppress segment noise, compute explainability facts |
-| Report serialization | `codeclone/_report_serialize.py`, `codeclone/_cli_meta.py` | Canonical JSON/TXT schema + shared report metadata |
-| Rendering | `codeclone/html_report.py`, `codeclone/_html_escape.py`, `codeclone/_html_snippets.py`, `codeclone/templates.py` | HTML-only view layer over report model |
-| Runtime orchestration | `codeclone/cli.py`, `codeclone/_cli_args.py`, `codeclone/_cli_paths.py`, `codeclone/_cli_summary.py`, `codeclone/ui_messages.py` | CLI UX, status handling, outputs, error category markers |
+
+| Layer | Modules | Responsibility |
+|------------------------|----------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------|
+| Contracts | `codeclone/contracts.py`, `codeclone/errors.py` | Shared schema versions, URLs, exit-code enum, typed exceptions |
+| Discovery + parsing | `codeclone/scanner.py`, `codeclone/extractor.py` | Enumerate files, parse AST, extract function/block/segment units |
+| Structural analysis | `codeclone/cfg.py`, `codeclone/normalize.py`, `codeclone/blockhash.py`, `codeclone/fingerprint.py`, `codeclone/blocks.py` | CFG, normalization, statement hashes, block/segment windows |
+| Grouping + report core | `codeclone/_report_grouping.py`, `codeclone/_report_blocks.py`, `codeclone/_report_segments.py`, `codeclone/_report_explain.py` | Build groups, merge windows, suppress segment noise, compute explainability facts |
+| Report serialization | `codeclone/_report_serialize.py`, `codeclone/_cli_meta.py` | Canonical JSON/TXT schema + shared report metadata |
+| Rendering | `codeclone/html_report.py`, `codeclone/_html_escape.py`, `codeclone/_html_snippets.py`, `codeclone/templates.py` | HTML-only view layer over report model |
+| Runtime orchestration | `codeclone/cli.py`, `codeclone/_cli_args.py`, `codeclone/_cli_paths.py`, `codeclone/_cli_summary.py`, `codeclone/ui_messages.py` | CLI UX, status handling, outputs, error category markers |

 Refs:
+
 - `codeclone/report.py`
 - `codeclone/cli.py:_main_impl`

 ## Contracts
+
 - Core pipeline does not depend on HTML modules.
 - HTML rendering receives already-computed report data/facts.
 - Baseline and cache contracts are validated before being trusted.

 Refs:
+
 - `codeclone/report.py`
 - `codeclone/html_report.py:build_html_report`
 - `codeclone/baseline.py:Baseline.load`
 - `codeclone/cache.py:Cache.load`

 ## Invariants (MUST)
+
 - Report serialization is deterministic and schema-versioned.
 - UI is render-only and must not recompute detection semantics.
 - Status enums are domain-owned in baseline/cache modules.

 Refs:
+
 - `codeclone/_report_serialize.py:to_json_report`
 - `codeclone/_report_explain.py:build_block_group_facts`
 - `codeclone/baseline.py:BaselineStatus`
 - `codeclone/cache.py:CacheStatus`

 ## Failure modes
-| Condition | Layer |
-| --- | --- |
+
+| Condition | Layer |
+|----------------------------------------|---------------------------------------------------|
 | Invalid CLI args / invalid output path | Runtime orchestration (`_cli_args`, `_cli_paths`) |
-| Baseline schema/integrity mismatch | Baseline contract layer |
-| Cache corruption/version mismatch | Cache contract layer (fail-open) |
-| HTML snippet read failure | Render layer fallback snippet |
+| Baseline schema/integrity mismatch | Baseline contract layer |
+| Cache corruption/version mismatch | Cache contract layer (fail-open) |
+| HTML snippet read failure | Render layer fallback snippet |

 ## Determinism / canonicalization
+
 - File iteration and group key ordering are explicit sorts.
 - Report serializer uses fixed record layouts and sorted keys.

 Refs:
+
 - `codeclone/scanner.py:iter_py_files`
 - `codeclone/_report_serialize.py:GROUP_ITEM_LAYOUT`

 ## Locked by tests
+
 - `tests/test_report.py::test_report_json_compact_v11_contract`
 - `tests/test_html_report.py::test_html_report_uses_core_block_group_facts`
-- `tests/test_cache.py::test_cache_v12_uses_relpaths_when_root_set`
+- `tests/test_cache.py::test_cache_v13_uses_relpaths_when_root_set`
 - `tests/test_cli_unit.py::test_argument_parser_contract_error_marker_for_invalid_args`

 ## Non-guarantees
+
 - Internal module split may change in v1.x if public contracts are preserved.
 - Import tree acyclicity is a policy goal, not currently enforced by tooling.
