|
4 | 4 |
|
5 | 5 | ### Overview |
6 | 6 |
|
7 | | -This release improves clone-detection precision and explainability with deterministic |
8 | | -normalization and CFG upgrades, adds segment-level internal clone reporting, refreshes |
9 | | -the HTML report UI, and introduces baseline versioning. |
10 | | - |
11 | | -**Breaking change:** CI workflows that reuse old baselines must regenerate them. |
12 | | - |
13 | | -### Clone Detection Accuracy |
14 | | - |
15 | | -- **Commutative normalization** |
16 | | - Canonicalized operand order for `+`, `*`, `|`, `&`, `^` only for provably safe constant |
17 | | - domains. Symbolic operands are no longer reordered. |
18 | | - |
19 | | -- **Local logical equivalence** |
20 | | - Normalized `not (x in y)` to `x not in y` and `not (x is y)` to `x is not y` without |
21 | | - De Morgan transformations or broader boolean rewrites. |
22 | | - |
23 | | -- **Call-target preservation** |
24 | | - Kept symbolic call targets during normalization to avoid conflating different APIs |
25 | | - (for example, `load_user(...)` vs `delete_user(...)`). |
26 | | - |
27 | | -### CFG Precision |
28 | | - |
29 | | -- **Short‑circuit modeling** |
30 | | - Represented `and`/`or` as micro‑CFGs with explicit branch splits after each operand. |
31 | | - |
32 | | -- **Exception linking** |
33 | | - Linked `try/except` only to statements that may raise (calls, attribute access, indexing, |
34 | | - `await`, `yield from`, `raise`) instead of blanket links. |
35 | | - |
36 | | -### Detection Integrity |
37 | | - |
38 | | -- **Internal CFG marker hardening** |
39 | | - Switched CFG metadata markers to an internal namespace (`__CC_META__::...`) emitted as |
40 | | - synthetic AST names, preventing collisions with user string literals. |
41 | | - |
42 | | -- **Ordered control-flow semantics** |
43 | | - Modeled `break`/`continue` as terminating loop transitions, added correct `for/while ... else` |
44 | | - semantics, preserved `match case` evaluation order, and preserved `except` handler order. |
45 | | - |
46 | | -- **Deterministic traversal order** |
47 | | - Sorted Python file discovery to stabilize processing and report ordering across runs/platforms. |
48 | | - |
49 | | -### Segment‑Level Detection |
50 | | - |
51 | | -- **Window fingerprints** |
52 | | - Added deterministic segment windows inside functions for internal clone discovery. |
53 | | - |
54 | | -- **Candidate generation** |
55 | | - Used an order‑insensitive signature for candidate grouping and a strict segment hash for |
56 | | - final confirmation. Segment matches do not affect baseline or CI failure logic. |
57 | | - |
58 | | -- **Noise reduction (report‑only)** |
59 | | - Merged overlapping segment windows into a single span per function and suppressed |
60 | | - boilerplate-only groups (attribute assignment wiring) with deterministic AST criteria. |
| 7 | +This release improves detection precision, determinism, and auditability, adds |
| 8 | +segment-level reporting, refreshes the HTML report UI, and hardens baseline/cache |
| 9 | +contracts for CI usage. |
| 10 | + |
| 11 | +**Breaking (CI):** baseline contract checks are stricter. Legacy or mismatched baselines |
| 12 | +must be regenerated. |
| 13 | + |
| 14 | +### Detection Engine |
| 15 | + |
| 16 | +- Safe normalization upgrades: local logical equivalence, proven-domain commutative |
| 17 | + canonicalization, and preserved symbolic call targets. |
| 18 | +- Internal CFG metadata markers were moved to the `__CC_META__::...` namespace and emitted |
| 19 | + as synthetic AST names to prevent collisions with user string literals. |
| 20 | +- CFG precision upgrades: short-circuit micro-CFG, selective `try/except` raise-linking, |
| 21 | + loop `break`/`continue` jump semantics, `for/while ... else`, and ordered `match`/`except`. |
| 22 | +- Deterministic traversal and ordering improvements for stable clone grouping/report output. |
| 23 | +- Added segment-level internal detection: candidates are grouped by an order-insensitive
| 24 | + signature and confirmed with a strict segment hash. Results remain report-only (not part of baseline/CI fail criteria).
| 25 | +- Segment report noise reduction: overlapping windows are merged and boilerplate-only groups |
| 26 | + are suppressed using deterministic AST criteria. |
61 | 27 |
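To illustrate the "local logical equivalence" item above, here is a minimal sketch of how `not (x in y)` and `not (x is y)` could be rewritten with the standard `ast` module (Python 3.9+ for `ast.unparse`). The class and function names are hypothetical and are not taken from the codeclone source; the actual normalizer may be structured differently.

```python
import ast


class LocalNotNormalizer(ast.NodeTransformer):
    """Rewrite `not (x in y)` -> `x not in y` and `not (x is y)` -> `x is not y`.

    Only a single, non-chained comparison directly under `not` is touched;
    no De Morgan or broader boolean rewrites are attempted.
    """

    _FLIP = {ast.In: ast.NotIn, ast.Is: ast.IsNot}

    def visit_UnaryOp(self, node: ast.UnaryOp) -> ast.AST:
        self.generic_visit(node)  # normalize nested expressions first
        operand = node.operand
        if (
            isinstance(node.op, ast.Not)
            and isinstance(operand, ast.Compare)
            and len(operand.ops) == 1
            and type(operand.ops[0]) in self._FLIP
        ):
            flipped = self._FLIP[type(operand.ops[0])]()
            rewritten = ast.Compare(
                left=operand.left, ops=[flipped], comparators=operand.comparators
            )
            return ast.copy_location(rewritten, node)
        return node


def normalize(source: str) -> str:
    """Parse, apply the local rewrite, and return the normalized source text."""
    tree = LocalNotNormalizer().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

With this sketch, `normalize("not (x in y)")` yields `x not in y`, while a boolean expression such as `not (a and b)` is left as it is.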
|
62 | 28 | ### Baseline & CI |
63 | 29 |
|
64 | | -- Baselines are now **versioned** and include a schema version. |
65 | | -- Mismatched baseline versions **fail fast** and require regeneration. |
66 | | -- Added baseline tamper-evident integrity for v1.3+ files (`generator`, `payload_sha256`) |
67 | | - while keeping legacy baseline behavior as explicit regeneration-required fail-fast. |
68 | | -- Added configurable size guards (`--max-baseline-size-mb`, `--max-cache-size-mb`): |
69 | | - oversized cache is ignored with warning; oversized/invalid/untrusted baseline is ignored |
70 | | - outside gating mode and treated as empty baseline. |
71 | | -- Behavioral hardening (CLI): baseline validation is now an explicit contract |
72 | | - (legacy/version/schema/python/integrity/size states). In `--fail-on-new`/`--ci`, |
73 | | - untrusted baseline states fail fast with deterministic exit codes. |
| 30 | +- Baseline format is versioned (`baseline_version`, `schema_version`) and legacy baselines |
| 31 | + fail fast with regeneration guidance. |
| 32 | +- Added tamper-evident baseline integrity for v1.3+ (`generator`, `payload_sha256`). |
| 33 | +- Added configurable size guards: `--max-baseline-size-mb`, `--max-cache-size-mb`. |
| 34 | +- Behavioral hardening: in normal mode, untrusted baseline states are ignored with a warning
| 35 | + and treated as an empty baseline; in `--fail-on-new` / `--ci`, they fail fast with deterministic exit codes.
74 | 36 |
|
75 | | -**Breaking (CI):** baseline version mismatch now fails hard; CI requires baseline regeneration on upgrade. |
76 | | - |
77 | | -Update the baseline: |
| 37 | +Update baseline after upgrade: |
78 | 38 |
|
79 | 39 | ```bash |
80 | 40 | codeclone . --update-baseline |
81 | 41 | ``` |
82 | 42 |
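The tamper-evident integrity item can be pictured roughly as follows. This is a sketch under stated assumptions: only the field names `generator` and `payload_sha256` come from the changelog. The `payload` key, the canonical JSON encoding, and the helper names are hypothetical and may not match the real baseline layout.

```python
import hashlib
import hmac
import json


def payload_digest(payload: dict) -> str:
    """Hash a canonical JSON encoding so key order does not change the digest."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def seal(payload: dict, generator: str = "codeclone") -> dict:
    """Build a baseline document that carries its own payload digest.

    The top-level "payload" key is an assumption made for this sketch.
    """
    return {
        "generator": generator,
        "payload_sha256": payload_digest(payload),
        "payload": payload,
    }


def is_trusted(baseline: dict) -> bool:
    """Return True only if the stored digest matches the recomputed one."""
    expected = baseline.get("payload_sha256")
    payload = baseline.get("payload")
    if not isinstance(expected, str) or not isinstance(payload, dict):
        return False
    actual = payload_digest(payload)
    # Compare as bytes in constant time to avoid leaking match position.
    return hmac.compare_digest(expected.encode("utf-8"), actual.encode("utf-8"))
```

A baseline edited by hand after generation would no longer satisfy `is_trusted`, which is the condition the changelog describes as requiring regeneration.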
|
83 | | -### CLI UX (CI) |
84 | | - |
85 | | -- Added `--version` for standard version output. |
86 | | -- Added `--cache-path` (legacy alias: `--cache-dir`) and clarified cache help text. |
87 | | -- Added `--ci` preset (`--fail-on-new --no-color --quiet`). |
88 | | -- Improved `--fail-on-new` output with aggregated counts and clear next steps. |
89 | | -- Added strict report output extension validation (`.html`, `.json`, `.txt`). |
90 | | -- Centralized user-facing CLI strings in `codeclone/ui_messages.py` to keep text contracts |
91 | | - consistent and maintainable. |
92 | | -- Refined Summary output: a single compact table with deterministic metric order and |
93 | | - explicit `Files analyzed` semantics (cache-aware), plus stable compact output for |
94 | | - `--quiet/--ci`. |
95 | | - |
96 | | -### HTML Report UI |
97 | | - |
98 | | -- **Visual refresh** |
99 | | - Introduced a modernized HTML report layout with a sticky top bar and improved spacing. |
100 | | - |
101 | | -- **Interactive tooling** |
102 | | - Added a command palette, keyboard shortcuts, toast notifications, and quick actions |
103 | | - (export, stats, charts, navigation). |
| 43 | +### CLI & Reports |
104 | 44 |
|
105 | | -- **Reporting widgets** |
106 | | - Added a stats dashboard and chart container for high-level clone metrics. |
| 45 | +- Added `--version`, `--cache-path` (legacy alias: `--cache-dir`), and `--ci` preset. |
| 46 | +- Added strict output extension validation: `--html` requires `.html`, `--json` requires `.json`, `--text` requires `.txt`.
| 47 | +- Summary output was redesigned for deterministic, cache-aware metrics across standard and CI modes. |
| 48 | +- User-facing CLI messages were centralized in `codeclone/ui_messages.py`. |
| 49 | +- HTML/TXT/JSON reports now include consistent provenance metadata (baseline/cache status fields). |
| 50 | +- Clone group/report ordering is deterministic and aligned across HTML/TXT/JSON outputs. |
107 | 51 |
|
108 | | -- **Icon system** |
109 | | - Replaced emoji glyphs with inline SVG icons for consistent rendering and a fully |
110 | | - self-contained UI. |
| 52 | +### HTML UI |
111 | 53 |
|
112 | | -- **Segment reporting** |
113 | | - Added a dedicated “Segment clones” section and summary metric in HTML/TXT/JSON outputs. |
| 54 | +- Refreshed layout with improved navigation and dashboard widgets. |
| 55 | +- Added command palette and keyboard shortcuts. |
| 56 | +- Replaced emoji icons with inline SVG icons. |
| 57 | +- Hardened escaping (text + attribute context) and snippet fallback behavior. |
114 | 58 |
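The distinction between text and attribute escaping mentioned above can be sketched with the standard library's `html.escape`. The helper names and the fallback string are hypothetical; the report renderer itself may use different functions.

```python
import html
from typing import Optional


def escape_text(value: str) -> str:
    """Escape for element content: &, <, > are replaced; quotes are left alone."""
    return html.escape(value, quote=False)


def escape_attr(value: str) -> str:
    """Escape for a quoted attribute value: also replaces " and '."""
    return html.escape(value, quote=True)


def render_snippet(path: str, snippet: Optional[str]) -> str:
    """Render a code snippet, falling back to a placeholder when it is unavailable."""
    body = escape_text(snippet) if snippet is not None else "(source unavailable)"
    return f'<pre title="{escape_attr(path)}">{body}</pre>'
```

Keeping the two contexts separate matters because a value that is safe as element text can still break out of a quoted attribute if it contains a quote character.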
|
115 | | -- **Escaping and snippet resilience** |
116 | | - Hardened HTML escaping for text and attribute contexts, and added a safe fallback when |
117 | | - source snippets are unavailable during report rendering. |
118 | | - |
119 | | -### Cache & Internals |
120 | | - |
121 | | -- Extended cache schema to store segment fingerprints (cache version bump). |
122 | | -- Default cache location moved to `<root>/.cache/codeclone/cache.json` (project‑local). |
123 | | -- Added a legacy cache warning for `~/.cache/codeclone/cache.json` with guidance to |
124 | | - delete it and add `.cache/` to `.gitignore`. |
125 | | -- Strengthened cache integrity handling with constant-time signature checks and explicit |
126 | | - warnings for oversized cache files. |
127 | | -- Added deterministic deep-schema cache entry validation (`stat/units/blocks/segments`); |
128 | | - invalid cache entries are ignored instead of affecting analysis results. |
129 | | - |
130 | | -### Packaging |
131 | | - |
132 | | -- Removed an invalid PyPI classifier from the package metadata. |
133 | | - |
134 | | -### Documentation |
| 59 | +### Cache & Security |
135 | 60 |
|
136 | | -- Updated architecture and CFG documentation to reflect new normalization, CFG, and |
137 | | - segment‑level detection behavior. |
138 | | -- Updated README, SECURITY, and CONTRIBUTING guidance for 1.3.0. |
| 61 | +- Cache default moved to `<root>/.cache/codeclone/cache.json`; the legacy `~/.cache/codeclone/cache.json` path triggers a warning.
| 62 | +- Cache schema was extended to include segment data (`CACHE_VERSION=1.1`). |
| 63 | +- Cache integrity uses constant-time signature checks and deep schema validation. |
| 64 | +- Invalid/oversized cache is ignored deterministically and rebuilt from source. |
| 65 | +- Added security regression tests for traversal safety, report escaping, baseline/cache
| 66 | + integrity, and deterministic report ordering across formats.
| 67 | +- Fixed POSIX parser CPU guard to avoid lowering `RLIMIT_CPU` hard limit. |
139 | 68 |
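For the `RLIMIT_CPU` fix above, a CPU guard that leaves the hard limit untouched could look roughly like this on POSIX systems. The function name is hypothetical and this is not the project's actual guard. The point of the sketch is that only the soft limit is changed, since an unprivileged process cannot raise a hard limit again once it has been lowered.

```python
import resource


def set_cpu_soft_limit(seconds: int) -> tuple[int, int]:
    """Tighten the soft CPU limit to at most `seconds`, keeping the hard limit.

    Returns the (soft, hard) pair that was applied.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
    candidates = [seconds]
    # Never raise an existing soft limit, and never exceed the hard limit.
    if soft != resource.RLIM_INFINITY:
        candidates.append(soft)
    if hard != resource.RLIM_INFINITY:
        candidates.append(hard)
    new_soft = min(candidates)
    resource.setrlimit(resource.RLIMIT_CPU, (new_soft, hard))
    return new_soft, hard
```

Because the hard limit is passed back unchanged, later code in the same process can still adjust the soft limit within the original ceiling.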
|
140 | | -### Testing & Security |
| 69 | +### Documentation & Packaging |
141 | 70 |
|
142 | | -- Expanded security tests (HTML escaping and safety checks). |
143 | | -- Added regression tests for deterministic report ordering across HTML/TXT/JSON, |
144 | | - baseline/cache integrity edge cases, and symlink traversal/loop safety. |
145 | | -- Fixed POSIX parser CPU guard to avoid lowering `RLIMIT_CPU` hard limit, preventing |
146 | | - potential process termination in long CI test sessions. |
| 71 | +- Updated README and docs (`architecture`, `cfg`, `SECURITY`, `CONTRIBUTING`) to reflect |
| 72 | + current contracts and behaviors. |
| 73 | +- Removed an invalid PyPI classifier from package metadata. |
147 | 74 |
|
148 | 75 | --- |
149 | 76 |
|
|