2.0.1 is a focused stability release for dead-code precision and cache/report
contract parity after the 2.0 line.
- Add framework-aware runtime reachability for dead-code analysis: FastAPI/Starlette routes and Annotated[..., Depends/Security(...)] dependencies, Django URL patterns, Dependency Injector providers, Typer/Click commands, Celery tasks, top-level __all__ exports, package entry points, and Pydantic validator/serializer hooks. Supported registrations suppress false dead-code findings without framework execution or name-only heuristics.
- Treat typing.Protocol and typing_extensions.Protocol declarations, including generic Protocol[T], as type-only contracts so structural interfaces do not produce false-positive dead-code findings.
- Show a one-time interactive CLI migration note for projects upgrading from the 2.0.0 line when the refined reachability model may reduce dead-code findings.
- Bump cache schema to 2.7 and report schema to 2.11 to carry reachability facts for cold/warm parity and report explainability.
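To illustrate the Protocol rule above, here is a minimal sketch of a structural interface. SupportsClose, FileLike, and close_all are hypothetical names used only for illustration; the point is that the Protocol method has no direct callers and no executed body, which is the kind of declaration this release treats as a type-only contract.

```python
from typing import Protocol


class SupportsClose(Protocol):
    """Hypothetical structural interface: declared, never called directly."""

    def close(self) -> str: ...


class FileLike:
    """Satisfies SupportsClose structurally, without inheriting from it."""

    def close(self) -> str:
        return "closed"


def close_all(items: list[SupportsClose]) -> list[str]:
    # Only concrete implementations run; the Protocol stub is never executed.
    return [item.close() for item in items]


results = close_all([FileLike(), FileLike()])
```

A name-based liveness check would see no call to SupportsClose.close, which is why such stubs are easy to misreport without a dedicated rule.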
2.0.0 promotes the completed 2.0 release line to the stable public contract.
- Mark the Python package as stable (2.0.0) while keeping the established baseline, cache, report, and metrics baseline schemas unchanged.
- Make stable install guidance the default across README, docs, MCP guides, and local integration surfaces; prerelease installs remain available only as explicit version pins.
- Align VS Code, Claude Desktop, and Codex integration metadata with the final CodeClone 2.0 MCP package.
- Preserve the 2.0 behavior set: canonical package layout, adaptive dependency depth profiling, Coverage Join, report-only Security Surfaces, read-only MCP, and native IDE/agent projections.
2.0.0b7 is a beta hotfix for packaging-only issues found after the 2.0.0b6 publish.
- Constrain the optional MCP extra to httpx>=0.27.1,<1 so prerelease install flows such as uv tool install --pre "codeclone[mcp]" do not resolve incompatible httpx 1.0.dev* builds through the upstream MCP dependency graph.
- Pin the preview VS Code extension packaging tool to @vscode/vsce@2.25.0, removing the vulnerable transitive uuid<14 chain from package-lock.json while preserving .vsix packaging.
- Keep local pre-commit runs stable after package builds by letting mypy use the configured source roots and ignoring generated build/ and site/ artifacts.
The global package refactor lands here: the entire runtime moves onto the
canonical module layout and legacy shims are removed for good. On top of that,
dependency-depth scoring is replaced with an adaptive project-relative model,
and the report/cache contracts advance to surface the new depth profile and the
report-only security_surfaces layer.
- Move the runtime fully onto the canonical package layout: main + surfaces/cli, surfaces/mcp, core, analysis, baseline, cache, contracts, report/document, report/renderers, and report/html.
- Remove remaining legacy root shims and stale compatibility modules in favor of direct canonical imports.
- Remove stale deleted-file cache entries and trim post-refactor import tails that were inflating dependency depth and clone pressure.
- Bump report schema to 2.10 and cache schema to 2.6 for additive dependency depth profile fields and security_surfaces facts; keep clone baseline schema 2.1 and metrics-baseline schema 1.2 unchanged.
- Preserve deterministic contracts and read-only MCP semantics across the new layout.
- Replace the old fixed dependency-depth penalty (max_depth > 8) with an adaptive internal-graph profile based on avg_depth, p95_depth, and max_depth.
- Keep dependency cycles as the hard signal; treat acyclic depth as adaptive pressure relative to the project's own dependency profile.
- Limit dependency-depth scoring to the internal module graph instead of external imports such as typing or argparse.
- Surface the dependency depth profile in the canonical report, HTML Dependencies tab, and CLI/CI summaries.
- Add metrics.families.security_surfaces: a report-only exact inventory of security-relevant capability surfaces and trust-boundary code.
- Surface compact security_surfaces facts in canonical report JSON, CLI Metrics, HTML Quality, text/markdown projections, and MCP summaries / metrics_detail.
- Keep the layer honest: no vulnerability claims, no score impact, no gates, no SARIF security findings, and no baseline truth.
- Refresh AGENTS, docs/book, and changelog content for the b6 package layout and report schema 2.10.
- Tighten preview client metadata and install guidance for VS Code, Claude Desktop, and Codex.
- Replace the Codex plugin shell snippet with a repo-local shell-free launcher, and parallelize VS Code post-run MCP artifact hydration.
- Add a quiet one-time VS Code extension hint in interactive VS Code terminals, tracked per CodeClone version next to the resolved project cache path.
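The adaptive depth profile described in this release (avg_depth, p95_depth, max_depth over the internal module graph) can be sketched as follows. This is an illustration under stated assumptions, not CodeClone's implementation: depth is taken to be the longest chain of internal imports below a module, the percentile uses the nearest-rank definition, the graph is assumed to be non-empty and acyclic, and depth_profile and the module names are hypothetical.

```python
import math


def depth_profile(graph: dict[str, list[str]]) -> dict[str, float]:
    """Compute avg/p95/max import depth for an acyclic internal module graph.

    graph maps a module to the internal modules it imports; external imports
    (typing, argparse, ...) are assumed to be filtered out beforehand.
    """
    memo: dict[str, int] = {}

    def depth(module: str) -> int:
        # Longest chain of internal imports below this module (leaf = 0).
        if module not in memo:
            deps = graph.get(module, [])
            memo[module] = max((depth(dep) + 1 for dep in deps), default=0)
        return memo[module]

    depths = sorted(depth(module) for module in graph)
    # Nearest-rank 95th percentile: one common definition among several.
    rank = max(1, math.ceil(0.95 * len(depths)))
    return {
        "avg_depth": sum(depths) / len(depths),
        "p95_depth": depths[rank - 1],
        "max_depth": depths[-1],
    }


profile = depth_profile({
    "app.cli": ["app.core"],
    "app.core": ["app.models", "app.utils"],
    "app.models": ["app.utils"],
    "app.utils": [],
})
```

Comparing a module's depth against the project's own p95_depth, rather than a fixed cutoff such as 8, is what makes the pressure project-relative.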
Expands the canonical contract with adoption, API-surface, and coverage-join layers; clarifies run interpretation across MCP/HTML/clients; tightens MCP launcher/runtime behavior.
- Report schema 2.8: add coverage_adoption, api_surface, coverage_join, and optional clones.suppressed.* (for golden_fixture_paths); separate coverage hotspots vs scope gaps.
- Baselines: clone 2.1, metrics 1.2; compact api_surface payload (local_name on disk, qualnames at runtime); read-compatible with 2.0/1.1.
- Add public/private visibility classification for public-symbol metrics (no clone/fingerprint changes).
- Add annotation/docstring adoption coverage: parameter, return, public docstrings, explicit Any.
- Add opt-in API surface inventory + baseline diff (snapshots, additions, breaking changes).
- Add coverage join (--coverage): per-function facts + findings for below-threshold or missing-in-scope functions; current-run only (not baseline truth, no fingerprint impact).
- Add golden_fixture_paths: exclude matching clone groups from health/gates while keeping suppressed facts.
- Add gates: --min-typing-coverage, --min-docstring-coverage, --fail-on-typing-regression, --fail-on-docstring-regression, --fail-on-api-break, --fail-on-untested-hotspots, --coverage-min.
- Surface adoption/API/coverage-join in MCP, CLI Metrics, report payloads, and HTML (Overview + Quality subtab).
- Preserve embedded metrics and optional api_surface in unified baselines.
- Cache 2.5: make analysis-profile compatibility API-surface-aware; invalidate stale non-API warm caches; preserve parameter order; align warm/cold API diffs.
- Surface effective analysis profile in report meta, MCP summary/triage, and HTML subtitle.
- Add health_scope, focus, new_by_source_kind to MCP summary/triage.
- Make baseline mismatch explicit (python tags + no-valid-baseline signal).
- Surface Coverage Join facts and the optional coverage MCP help topic in the VS Code extension when the connected server supports them.
- Prefer workspace-local launchers over PATH (Poetry fallback).
- Add workspace_root to force project .venv selection.
- Validate git_diff_ref as safe single-revision expressions.
- Replace segment digest repr() with canonical JSON bytes (determinism).
- Align CI coverage gate (fail_under = 99) and refresh actions/checkout pin.
- Refresh branch metadata/docs for 2.0.0b5; update README badge to 89 (B).
- Add help(topic=...) tool for workflow guidance, baseline semantics, analysis profile, and review-state routing (tool count: 20 → 21).
- Add analysis_profile help topic for explicit conservative-first / deeper-review threshold guidance.
- Enrich _SERVER_INSTRUCTIONS with triage-first workflow, budget-aware drill-down, and conservative-first threshold guidance so MCP-capable clients receive structured behavioral context on connect.
- Optimize MCP payloads: short finding IDs (sha256-based for block clones), compact derived section projection, bounded metrics_detail with pagination.
- Fix MCP initialize metadata so serverInfo.version reports the CodeClone package version rather than the underlying mcp runtime version.
- Bump canonical report schema to 2.3.
- Add metrics.overloaded_modules: a report-only module-hotspot ranking by size, complexity, and coupling pressure.
- Surface Overloaded Modules across JSON, text/markdown, HTML, and MCP without affecting findings, health, or gates.
- Normalize the canonical family name and MCP/report output to overloaded_modules; god_modules remains accepted as a read-only MCP input alias during transition.
- Align CLI and HTML scope summaries with canonical inventory totals.
- Redesign Overview tab: Executive Summary becomes 2-column (Issue Breakdown + Source Breakdown) with scan scope in the section subtitle; Overloaded Modules section replaces the earlier stretched module-hotspot layout.
- Add Health Score chapter: scoring inputs, report-only layers, phased expansion policy.
- Document that future releases may lower scores due to broader scoring model, not only worse code.
- Add VS Code extension (codeclone-mcp client) with baseline-aware triage, source drill-down, Explorer decorations, and HTML-report bridging.
- Add conservative, deeper-review, and custom analysis profiles to the VS Code extension and pass them through to MCP.
- Add limited Restricted Mode: onboarding works in untrusted workspaces, analysis stays gated until trust is granted.
- Add Node unit tests, extension-host smoke tests, and .vsix packaging.
- Tighten the VS Code extension to current VS Code UX guidance: one primary editor action, titled Quick Picks, per-view icons, non-button tree details, and a hard minimum local CodeClone version gate (>= 2.0.0b4).
- Add Claude Desktop .mcpb bundle wrapper for the local codeclone-mcp launcher with pre-loaded review instructions, explicit launcher settings, platform auto-discovery (macOS, Linux, Windows), local-stdio enforcement, signal forwarding, and deterministic package build smoke.
- Add a native Codex plugin with repo-local discovery metadata, bundled codeclone-mcp config, pre-loaded instructions, and two skills: conservative-first full review and quick hotspot discovery.
- Extract shared _json_io module for deterministic JSON serialization across baseline, cache, and report paths.
- Remove low-signal structural clone noise surfaced by stricter analysis passes without touching golden fixture debt.
2.0.0b3 is the release where CodeClone stops looking like "a strong analyzer with extras" and starts looking like a coherent platform: canonical-report-first, agent-facing, CI-native, and product-grade.
- Re-license source code to MPL-2.0 while keeping documentation under MIT.
- Ship dual LICENSE / LICENSE-docs files and sync SPDX headers.
- Add optional codeclone[mcp] extra with codeclone-mcp launcher (stdio and streamable-http).
- Introduce a read-only MCP surface with 20 tools, fixed resources, and run-scoped URIs for analysis, changed-files review, run comparison, findings / hotspots / remediation, granular checks, and gate preview.
- Add bounded run retention (--history-limit), --allow-remote guard, and reject cache_policy=refresh to preserve read-only semantics.
- Optimize MCP payloads for agents with short ids, compact summaries/cards, bounded metrics_detail, and slim changed-files / compare-runs responses, without changing the canonical report contract.
- Make MCP explicitly triage-first and budget-aware: clients are guided toward summary/triage → hotspots / check_* → single-finding drill-down instead of broad early listing.
- Add cache.freshness marker and get_production_triage / codeclone://latest/triage for compact production-first overview.
- Improve run-comparison honesty: compare_runs now reports mixed / incomparable, and clones_only runs surface health: unavailable instead of placeholder values.
- Harden repository safety: MCP analysis now requires an absolute repository root and rejects relative roots like . to avoid analyzing the wrong directory.
- Fix hotlist key resolution for production_hotspots and test_fixture_hotspots.
- Bump cache schema to 2.3 (stale metric entries rebuilt, not reused).
- Bump canonical report schema to 2.2.
- Add canonical meta.analysis_thresholds.design_findings provenance and move threshold-aware design findings fully into the canonical report, so MCP and HTML read the same design-finding universe.
- Add derived.overview.directory_hotspots and render it in the HTML Overview tab as Hotspots by Directory.
- Add --changed-only, --diff-against, and --paths-from-git-diff for changed-scope review and gating with first-class summary output.
- Stabilize primaryLocationLineHash (line numbers excluded), add run-unique automationDetails.id / startTimeUtc, set explicit kind: "fail", and move ancillary fields to properties.
- Add Hotspots by Directory to the Overview tab, surfacing directory-level concentration for all, clones, and low-cohesion findings with scope-aware badges and compact counts.
- Add IDE picker (PyCharm, IDEA, VS Code, Cursor, Fleet, Zed) with persistent selection.
- Add clickable file-path deep links across all tabs and stable finding-{id} anchors.
- Ship Composite Action v2 with configurable quality gates, SARIF upload to Code Scanning, and PR summary comments.
- Upgrade requests (dev dep) to 2.33.0 for extract_zipped_paths security fix (CVE-2026-25645)
- Fix page-level horizontal scrolling in wide table tabs by constraining overflow to local table wrappers (#14).
- Fix mobile header brand block layout on narrow viewports (#15).
- Make mobile navigation tabs sticky and horizontally scrollable with scroll-shadow affordance.
- Keep Overview KPI micro-badges inside cards at extreme browser/mobile widths.
- Restyle Report Provenance summary badges to match the card-style badge language used across the report.
Major upgrade: CodeClone evolves from a structural clone detector into a baseline-aware code-health and CI governance tool for Python.
- Stage-based pipeline (pipeline.py): discovery → processing → analysis → reporting → gating.
- Domain layers: models.py, metrics/, report/, grouping.py.
- Baseline schema 2.0, report schema 2.1, cache schema 2.2; fingerprint_version remains 1.
- Seven health dimensions: clones, complexity, coupling, cohesion, dead code, dependencies, coverage.
- Piecewise clone scoring curve: mild penalty below 5% density, steep 5–20%, aggressive above 20%.
- Dimension weights: clones 25%, complexity 20%, cohesion 15%, coupling 10%, dead code 10%, dependencies 10%, coverage 10%.
- Grade bands: A ≥90, B ≥75, C ≥60, D ≥40, F <40.
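The weights and grade bands above can be combined in a short sketch. The weights and band floors come from this changelog; everything else is an assumption made for illustration: each dimension is taken to yield a 0-100 score, the overall score is taken to be their weighted mean, and health_score and grade are hypothetical helper names.

```python
# Weights in percent, as listed in the changelog (they sum to 100).
WEIGHTS = {
    "clones": 25, "complexity": 20, "cohesion": 15, "coupling": 10,
    "dead_code": 10, "dependencies": 10, "coverage": 10,
}
# Grade floors, checked from highest to lowest; anything below 40 is F.
GRADE_BANDS = [(90, "A"), (75, "B"), (60, "C"), (40, "D")]


def health_score(dimension_scores: dict[str, float]) -> float:
    """Weighted mean of per-dimension scores (assumed to be on a 0-100 scale)."""
    total = sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)
    return total / 100


def grade(score: float) -> str:
    for floor, letter in GRADE_BANDS:
        if score >= floor:
            return letter
    return "F"


example = {
    "clones": 80, "complexity": 90, "cohesion": 100, "coupling": 100,
    "dead_code": 100, "dependencies": 100, "coverage": 90,
}
score = health_score(example)
letter = grade(score)
```

With these inputs the clone and complexity dimensions dominate the loss, which reflects their combined 45% weight.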
- Lowered function-level --min-loc from 15 to 10 (configurable via CLI / pyproject.toml).
- Lowered block fragment gate from loc≥40/stmt≥10 to loc≥20/stmt≥8.
- Lowered segment fragment gate from loc≥30/stmt≥12 to loc≥20/stmt≥10.
- All six thresholds configurable via [tool.codeclone] in pyproject.toml.
- Conservative dead-code detector: skips tests, dunders, visitors, protocol stubs.
- Module-level PEP 562 hooks (__getattr__, __dir__) are treated as non-actionable dead-code candidates.
- Exact qualname-based liveness with import-alias resolution.
- Canonical inline suppression syntax: # codeclone: ignore[dead-code] on declarations.
- Structural finding families: duplicated_branches, clone_guard_exit_divergence, clone_cohort_drift.
- Config from pyproject.toml under [tool.codeclone]; precedence: CLI > pyproject.toml > defaults.
- Optional-value report flags: --html, --json, --md, --sarif, --text with deterministic default paths.
- --open-html-report, --timestamped-report-paths, --ci preset.
- Explicit --no-progress / --progress, --no-color / --color flag pairs.
- Overview: KPI grid with health gauge (baseline delta arc), Executive Summary (issue breakdown + source breakdown), Health Profile radar chart.
- KPI cards show baseline-aware tone: ✓ baselined pill when all items are accepted debt, +N red badge for regressions.
- Get Badge modal: grade-only and score+grade variants, shields.io preview, Markdown/HTML embeds, copy feedback.
- Report Provenance modal with section cards, SVG icons, boolean badges.
- Responsive layout with dark/light theme toggle and system theme detection.
- Unified baseline flow: clone keys + optional metrics in one file.
- Metrics snapshot integrity via meta.metrics_payload_sha256.
- Report contract: canonical meta / inventory / findings / metrics + derived suggestions / overview + integrity.
- SARIF: %SRCROOT% anchoring, baselineState, rich rule metadata.
- Cache compatibility now keys off the full six-threshold analysis profile (function + block + segment thresholds), not only the top-level function gate.
- Unified AST collection pass (merged 3 separate walks).
- Suppression fast-path: skip tokenization when codeclone: is absent.
- Cache dirty flag: skip save() on warm path when nothing changed.
- Adaptive multiprocessing, batch statement hashing, deferred HTML import.
- MkDocs site with Material theme and GitHub Pages workflow.
- Live sample reports (HTML, JSON, SARIF).
- PyPI-facing README now uses published docs URLs instead of repo-relative doc links.
- Package metadata stays explicitly beta (2.0.0b1, Development Status :: 4 - Beta).
- pyproject.toml moved to SPDX-style license = "MIT" and project.license-files for modern setuptools builds without release-time deprecation warnings.
- Exit codes unchanged: 0 / 2 / 3 / 5.
- Fingerprint contract unchanged: BASELINE_FINGERPRINT_VERSION = "1".
- Coverage gate: >=99%.
- Backported report hot-path optimizations from 2.0.0b1 to the 1.4.x line:
  - file snippets now reuse cached full-file lines and slice ranges without repeated full-file scans
  - Pygments modules are loaded once per importer identity instead of re-importing for each snippet
- Optimized block explainability range stats:
  - replaced repeated full ast.walk() scans per range with a per-file statement index + bisect window lookup
- Preserved existing golden/contract behavior for 1.4.x and kept report output semantics unchanged while improving runtime overhead.
- No baseline/cache/report schema changes.
- No clone detection or fingerprint semantic changes.
- Cache schema bumped from v1.2 to v1.3.
- Added signed analysis profile to cache payload:
  - payload.ap.min_loc
  - payload.ap.min_stmt
- Cache compatibility now requires payload.ap to match current CLI analysis thresholds. On mismatch, cache is ignored with cache_status=analysis_profile_mismatch and analysis continues without cache.
- CLI now constructs cache context with effective --min-loc and --min-stmt values, so cache reuse is consistent with active analysis thresholds.
- Added regression coverage for analysis-profile cache mismatch/match behavior in:
  - tests/test_cache.py
  - tests/test_cli_inprocess.py
- Baseline contract is unchanged (schema v1.0, fingerprint version 1).
- Report schema is unchanged (v1.1); cache metadata adds a new cache_status enum value.
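The analysis-profile compatibility rule can be sketched as a plain comparison. Only the payload.ap.min_loc / payload.ap.min_stmt fields and the analysis_profile_mismatch status come from the notes above; the resolve_cache_status helper, the dict layout, and the "ok" label are assumptions for illustration.

```python
def resolve_cache_status(payload: dict, min_loc: int, min_stmt: int) -> str:
    """Compare the stored analysis profile with the thresholds of the current run."""
    stored = payload.get("ap")
    current = {"min_loc": min_loc, "min_stmt": min_stmt}
    if stored != current:
        # The cache was built under different thresholds: ignore it and re-analyze.
        return "analysis_profile_mismatch"
    return "ok"  # hypothetical label for a compatible cache


cached = {"ap": {"min_loc": 15, "min_stmt": 6}}
same_profile = resolve_cache_status(cached, min_loc=15, min_stmt=6)
changed_profile = resolve_cache_status(cached, min_loc=10, min_stmt=6)
```

A cache written before the profile existed has no "ap" key and therefore also falls into the mismatch branch in this sketch.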
This patch release is a maintenance update. Determinism remains guaranteed: reports are stable and ordering is unchanged.
- process_file() now uses a single os.stat() call to obtain both size (size guard) and st_mtime_ns / st_size (file stat signature), removing a redundant os.path.getsize() call.
- Discovery logic was deduplicated by extracting _discover_files(); quiet/non-quiet behavior differs only by UI status wrapper, not by semantics or filtering.
- Cache path wiring now precomputes wire_map so _wire_filepath_from_runtime() is evaluated once per key.
- extract_blocks() and extract_segments() accept optional precomputed_hashes. When provided, they reuse hashes instead of recomputing.
- The extractor computes function body hashes once and passes them to both block and segment extraction when both analyses run for the same function.
- iter_py_files() now filters candidates before sorting, so only valid candidates are sorted. The final order remains deterministic and equivalent to previous behavior.
- precomputed_hashes type strengthened: list[str] | None → Sequence[str] | None (read-only intent in the type contract).
- Added assert len(precomputed_hashes) == len(body) in both extract_blocks() and extract_segments() to catch mismatched inputs early (development-time invariant).
- Byte-identical JSON reports verified across repeated runs; differences, when present, are limited to volatile/provenance meta fields (e.g., cache status/path, timestamps), while semantic payload remains stable.
- Unit tests updated to mock os.stat instead of os.path.getsize where applicable (test_process_file_stat_error, test_process_file_size_limit).
- No changes to:
  - detection semantics / fingerprints
  - baseline hash inputs (payload_sha256 semantic payload)
  - exit code contract and precedence
  - schema versions (baseline v1.0, cache v1.2, report v1.1)
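The single-stat change in this release can be illustrated with a short sketch. stat_signature and MAX_FILE_SIZE are hypothetical names and values; the sketch only shows how one os.stat() result can serve both the size guard and the stat signature.

```python
import os
import tempfile
from typing import Optional, Tuple

MAX_FILE_SIZE = 10 * 1024 * 1024  # hypothetical limit used only in this sketch


def stat_signature(path: str) -> Optional[Tuple[int, int]]:
    """Return (st_mtime_ns, st_size), or None when the file exceeds the size guard."""
    st = os.stat(path)  # one call supplies both the size and the modification time
    if st.st_size > MAX_FILE_SIZE:
        return None
    return (st.st_mtime_ns, st.st_size)


# Demo on a throwaway file containing six bytes.
with tempfile.NamedTemporaryFile(delete=False) as fh:
    fh.write(b"x = 1\n")
    demo_path = fh.name
signature = stat_signature(demo_path)
os.unlink(demo_path)
```

Calling os.path.getsize() in addition would issue a second stat on the same path, which is the redundancy the release removes.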
- Semantic summary colors: clone counts → bold yellow, file metrics → neutral bold
- Phase separator, bold report paths, "Done in X.Xs" timing line
- HiDPI chart canvas, hit-line markers with Pygments, cross-browser <select>
- Platform-aware shortcut labels (⌘ / Ctrl+), color-coded section borders
- Compact code lines, proper tab-bar for novelty filter, polished transitions
- Rounded-rect badges (6px), tighter card radii (10px), cleaner empty states
This release stabilizes the baseline contract for long-term CI reuse without changing clone-detection semantics. Key improvements include baseline schema standardization, enhanced cache efficiency, and hardened IO/contract behavior for CI environments.
Stable v1 Schema
- Baseline now uses stable v1 schema with strict top-level meta + clones objects
- Compatibility gated by schema_version, fingerprint_version, and python_tag (independent of package patch/minor version)
- Trust validation requires meta.generator.name to be codeclone
- Legacy 1.3 baseline layouts treated as untrusted with explicit regeneration guidance
Integrity & Hash Calculation
- Baseline integrity uses canonical payload_sha256 over semantic payload (functions, blocks, fingerprint_version, python_tag)
- Intentionally excluded from payload_sha256:
  - schema_version (compatibility gate only)
  - meta.generator.name (trust gate only)
  - meta.generator.version and meta.created_at (informational only)
- Hash inputs remain stable across future 1.x patch/minor releases
- Baseline regeneration required only when fingerprint_version or python_tag changes
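The hash inputs listed above can be illustrated with a short sketch. The included and excluded fields follow the list; where each field lives in the baseline file, the canonical JSON encoding, and the payload_sha256 function itself are assumptions made for illustration, not the documented algorithm.

```python
import hashlib
import json


def payload_sha256(baseline: dict) -> str:
    """Hash only the semantic payload; informational metadata is left out."""
    meta = baseline["meta"]
    semantic = {
        "functions": sorted(baseline["clones"]["functions"]),
        "blocks": sorted(baseline["clones"]["blocks"]),
        "fingerprint_version": meta["fingerprint_version"],
        "python_tag": meta["python_tag"],
    }
    # Sorted keys and fixed separators give the same bytes for the same content.
    data = json.dumps(semantic, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def make_baseline(created_at: str, python_tag: str) -> dict:
    # Hypothetical layout, for demonstration only.
    return {
        "meta": {
            "schema_version": "1.0",
            "fingerprint_version": "1",
            "python_tag": python_tag,
            "created_at": created_at,
            "generator": {"name": "codeclone", "version": "1.4.0"},
        },
        "clones": {"functions": ["f2", "f1"], "blocks": ["b1"]},
    }


hash_a = payload_sha256(make_baseline("2026-01-01T00:00:00Z", "cp313"))
hash_b = payload_sha256(make_baseline("2026-02-01T00:00:00Z", "cp313"))
hash_c = payload_sha256(make_baseline("2026-01-01T00:00:00Z", "cp312"))
```

Here hash_a and hash_b agree because only created_at differs, while hash_c differs because python_tag is part of the hashed payload.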
Migration Notes
- Early 1.4.0 development snapshots (before integrity canonicalization fix) may require one-time codeclone . --update-baseline
- After this one-time update, baselines are stable for long-term CI use
Atomic Operations
- Baseline writes use atomic *.tmp + os.replace pattern (same filesystem requirement)
- Configurable size guards:
  - --max-baseline-size-mb
  - --max-cache-size-mb
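The *.tmp + os.replace pattern can be sketched as follows. write_json_atomic is a hypothetical helper; the fsync call is an extra durability step chosen for this sketch and is not claimed by the changelog.

```python
import json
import os


def write_json_atomic(path: str, data: dict) -> None:
    """Write JSON to a sibling temp file, then swap it into place in one step."""
    tmp_path = path + ".tmp"  # same directory, hence the same filesystem
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, sort_keys=True)
        fh.flush()
        os.fsync(fh.fileno())  # push the bytes to disk before the rename
    # Readers see either the old file or the complete new one, never a partial write.
    os.replace(tmp_path, path)
```

The temp file must sit on the same filesystem as the target, because a rename across filesystems cannot be performed as a single operation.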
Baseline Trust Model
- Normal mode: Untrusted baseline triggers warning and comparison against empty baseline
- CI preset (--ci): Untrusted baseline causes fast-fail with exit code 2
- Deterministic behavior ensures predictable CI outcomes
Exit Code Contract (explicit and stable)
- 0 - Success
- 2 - Contract error (unreadable files, untrusted baseline, integrity failures)
- 3 - Gating failure (new clones, threshold violations)
- 5 - Internal error
Exit Code Priority
- Contract errors (exit 2) override gating failures (exit 3) when both conditions present
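The exit-code contract and its priority rule can be expressed compactly. The numeric codes and the precedence come from the lists above; ExitCode and resolve_exit_code are hypothetical names used for this sketch.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0
    CONTRACT_ERROR = 2
    GATING_FAILURE = 3
    INTERNAL_ERROR = 5


def resolve_exit_code(contract_error: bool, gating_failure: bool) -> ExitCode:
    """Pick the exit code for a finished run; contract errors take priority."""
    if contract_error:
        return ExitCode.CONTRACT_ERROR
    if gating_failure:
        return ExitCode.GATING_FAILURE
    return ExitCode.SUCCESS


both_failed = resolve_exit_code(contract_error=True, gating_failure=True)
```

Checking the contract error first means a run with an untrusted baseline cannot be reported as an ordinary gating failure.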
CI/Gating Modes
- In CI/gating modes (--ci, --fail-on-new, --fail-threshold):
  - Unreadable or decode-failed source files treated as contract errors (exit 2)
  - Prevents incomplete analysis from passing CI checks
Error Handling
- Standardized internal error UX: INTERNAL ERROR with reason and actionable next steps
- New --debug flag (also CODECLONE_DEBUG=1) includes traceback + runtime environment details
- CLI help now includes canonical exit-code descriptions plus Repository / Issues / Docs links
JSON Report (v1.1 Schema)
- Compact deterministic layout with top-level meta + files + groups
- Explicit group_item_layout for array-based group records
- New groups_split structure with new / known keys per section
- Deterministic meta.groups_counts aggregates
- Legacy alias sections removed (function_clones, block_clones, segment_clones)
TXT Report (aligned to report meta v1.1)
- Normalized metadata/order as stable contract
- Explicit section metrics: loc for functions, size for blocks/segments
- Sections split into (NEW) and (KNOWN) for functions/blocks/segments
- With untrusted baseline: (KNOWN) sections empty, all groups in (NEW)
HTML Report (aligned to report meta v1.1)
- New baseline split controls: New duplicates / Known duplicates
- Consistent filtering behavior across report types
- Block explainability now core-owned (block_group_facts)
- Expanded Report Provenance section displays full meta information block
Cross-Format Metadata
- All formats (HTML/TXT/JSON) now include:
  - baseline_payload_sha256 and baseline_payload_sha256_verified for audit traceability
  - Cache contract fields: cache_schema_version, cache_status, cache_used
  - Baseline audit fields and trust status
- Added the contract documentation book docs/book/.
Baseline Contract Testing
- Expanded matrix coverage:
- Legacy format handling
- Type/shape validation
- Compatibility mismatch scenarios
- Integrity failure cases
- Canonical hash determinism
Golden Snapshot Testing
- New detector golden snapshot fixture with canonical runtime policy
- Golden assertions run on cp313 (consistency)
- Full invariant suite maintains matrix-wide coverage
- Golden tests use same core python_tag source as CLI/baseline checks (prevents cross-layer drift)
Version 1.4.0 establishes a stable baseline/CI contract, but it also revealed that the internal structure needs cleanup. Version 1.5 will focus on architecture refactoring for maintainability and orchestration, with strict constraints:
No changes to:
- Detection semantics
- Fingerprint algorithms
- Baseline hash inputs
- Determinism guarantees
The 1.4.0 contract remains stable and reliable for long-term CI integration.
This release improves detection precision, determinism, and auditability, adds segment-level reporting, refreshes the HTML report UI, and hardens baseline/cache contracts for CI usage.
Breaking (CI): baseline contract checks are stricter. Legacy or mismatched baselines must be regenerated.
- Safe normalization upgrades: local logical equivalence, proven-domain commutative canonicalization, and preserved symbolic call targets.
- Internal CFG metadata markers were moved to the __CC_META__::... namespace and emitted as synthetic AST names to prevent collisions with user string literals.
- CFG precision upgrades: short-circuit micro-CFG, selective try/except raise-linking, loop break / continue jump semantics, for/while ... else, and ordered match / except.
- Deterministic traversal and ordering improvements for stable clone grouping/report output.
- Segment-level internal detection added with strict candidate->hash confirmation; remains report-only (not part of baseline/CI fail criteria).
- Segment report noise reduction: overlapping windows are merged and boilerplate-only groups are suppressed using deterministic AST criteria.
- Baseline format is versioned (baseline_version, schema_version) and legacy baselines fail fast with regeneration guidance.
- Added tamper-evident baseline integrity for v1.3+ (generator, payload_sha256).
- Added configurable size guards: --max-baseline-size-mb, --max-cache-size-mb.
- Behavioral hardening: in normal mode, untrusted baseline states are ignored with warning and compared as empty; in --fail-on-new / --ci, they fail fast with deterministic exit codes.
Update baseline after upgrade: codeclone . --update-baseline
- Added --version, --cache-path (legacy alias: --cache-dir), and --ci preset.
- Added strict output extension validation for --html / .html, --json / .json, --text / .txt.
- Summary output was redesigned for deterministic, cache-aware metrics across standard and CI modes.
- User-facing CLI messages were centralized in codeclone/ui_messages.py.
- HTML/TXT/JSON reports now include consistent provenance metadata (baseline/cache status fields).
- Clone group/report ordering is deterministic and aligned across HTML/TXT/JSON outputs.
- Refreshed layout with improved navigation and dashboard widgets.
- Added command palette and keyboard shortcuts.
- Replaced emoji icons with inline SVG icons.
- Hardened escaping (text + attribute context) and snippet fallback behavior.
- Cache default moved to <root>/.cache/codeclone/cache.json with legacy path warning.
- Cache schema moved to compact signed payload format (CACHE_VERSION=1.2) with relative file keys and fixed-array entries for faster IO and smaller files.
- Cache integrity uses constant-time signature checks and deep schema validation.
- Legacy .cache_secret is now treated as obsolete and triggers an explicit cleanup warning.
- Invalid/oversized cache is ignored deterministically and rebuilt from source.
- Added security regressions for traversal safety, report escaping, baseline/cache integrity, and deterministic report ordering across formats.
- Fixed POSIX parser CPU guard to avoid lowering RLIMIT_CPU hard limit.
- Updated README and docs (architecture, cfg, SECURITY, CONTRIBUTING) to reflect current contracts and behaviors.
- Removed an invalid PyPI classifier from package metadata.
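The constant-time signature check mentioned for the cache can be sketched with the standard library. This is an illustration only: using HMAC-SHA256 over canonical JSON is an assumption for this release, the key below is a placeholder, and how the real cache obtains its key is not described in these notes.

```python
import hashlib
import hmac
import json


def sign(payload: dict, key: bytes) -> str:
    """Sign the canonical JSON encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify(payload: dict, signature: str, key: bytes) -> bool:
    # compare_digest takes time independent of where the strings first differ.
    return hmac.compare_digest(sign(payload, key), signature)


demo_key = b"placeholder-key"  # hypothetical; never hard-code a real key
demo_payload = {"v": "1.2", "files": {"pkg/mod.py": [1, 2, 3]}}
demo_signature = sign(demo_payload, demo_key)
```

A plain == comparison can return early at the first differing character, which is the timing side channel that compare_digest is designed to avoid.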
This release focuses on security hardening, robustness, and long-term maintainability. No breaking API changes were introduced.
The goal of this release is to provide users with a safe, deterministic, and CI-friendly tool suitable for security-sensitive and large-scale environments.
- Path Traversal Protection: Implemented strict path validation to prevent scanning outside the project root or accessing sensitive system directories, including macOS /private paths.
- Cache Integrity Protection: Added HMAC-SHA256 signing for cache files to prevent cache poisoning and detect tampering.
- Parser Safety Limits: Introduced AST parsing time limits to mitigate risks from pathological or adversarial inputs.
- Resource Exhaustion Protection: Enforced a maximum file size limit (10MB) and a maximum file count per scan to prevent excessive memory or CPU usage.
- Structured Error Handling: Introduced a dedicated exception hierarchy (ParseError, CacheError, etc.) and replaced broad exception handling with graceful, user-friendly failure reporting.
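A root-containment check of the kind described under Path Traversal Protection can be sketched with pathlib. is_within_root is a hypothetical helper; the real validation also covers sensitive system directories, which this sketch does not attempt.

```python
import tempfile
from pathlib import Path


def is_within_root(candidate: str, root: str) -> bool:
    """Return True when candidate resolves to the root or a path beneath it."""
    root_path = Path(root).resolve()
    # resolve() collapses ".." segments and follows symlinks before comparing.
    target = (root_path / candidate).resolve()
    return target == root_path or root_path in target.parents


demo_root = tempfile.mkdtemp()
inside = is_within_root("pkg/mod.py", demo_root)
outside = is_within_root("../outside.py", demo_root)
```

Resolving both sides first matters on systems where temporary or home directories are themselves symlinks, such as /var pointing into /private on macOS.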
- Optimized AST Normalization: Replaced expensive deepcopy operations with in-place AST normalization, significantly reducing CPU and memory overhead.
- Improved Memory Efficiency: Added an LRU cache for file reading and optimized string concatenation during fingerprint generation.
- HTML Report Memory Bounds: HTML reports now read only the required line ranges instead of entire files, reducing peak memory usage on large codebases.
- Strict Type Safety: Migrated all optional typing to Python 3.10+ | None syntax and achieved 100% mypy strict compliance.
- Modular CFG Design: Split CFG data structures and builder logic into separate modules (cfg_model.py and cfg.py) for improved clarity and extensibility.
- Template Extraction: Extracted HTML templates into a dedicated templates.py module.
- Added a py.typed marker for downstream type checkers.
- Added __slots__ to performance-critical classes to reduce per-object memory overhead.
- Added a sequential execution fallback when process pools are unavailable (for example, in restricted or sandboxed environments).
- Emit clear, user-visible warnings when cache validation fails instead of silently ignoring corrupted state.
- Hardened HTML report template to safely embed JavaScript template literals and aligned it with linting requirements.
- Expanded unit and integration test coverage across the CLI, CFG construction, cache handling, scanner, and HTML reporting paths.
- Added security regression tests for dot-dot traversal and symlinked sensitive directories.
- Tightened cache mismatch assertions to verify full state reset.
- Achieved and enforced 98%+ line coverage, with coverage configuration added to pyproject.toml.
- Added GitHub Actions workflow with Python 3.10–3.14 test matrix, including ruff and mypy checks.
- CI baseline enforcement now runs on a single pinned Python version to avoid AST dump differences across interpreter versions.
Due to inherent differences in Python’s AST between interpreter versions, baseline generation and verification must be performed using the same Python version.
The baseline file now stores the Python version (major.minor) used during generation.
When running with --fail-on-new, codeclone verifies that the current interpreter version
matches the baseline and exits with code 2 if they differ.
This design ensures deterministic and reproducible clone detection results while preserving support for Python 3.10–3.14 across the test matrix.
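The interpreter check described above can be sketched in a few lines. The major.minor comparison and exit code 2 come from the text; version_exit_code and its parameters are hypothetical.

```python
import sys
from typing import Optional, Tuple


def version_exit_code(baseline_python: str, current: Optional[Tuple[int, int]] = None) -> int:
    """Return 0 when the running interpreter matches the baseline, otherwise 2."""
    major, minor = current or sys.version_info[:2]
    running = f"{major}.{minor}"
    # Exact string equality, so "3.1" is never confused with "3.10".
    return 0 if running == baseline_python else 2


matching = version_exit_code("3.12", current=(3, 12))
mismatched = version_exit_code("3.10", current=(3, 12))
```

Passing current explicitly keeps the function testable; when it is omitted, the running interpreter's version is used.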
- CFG Exception Handling: Fixed incorrect control-flow linking for try/except blocks.
- Pattern Matching Support: Added missing structural handling for match/case statements in the CFG.
- Block Detection Scaling: Made MIN_LINE_DISTANCE dynamic based on block size to improve clone detection accuracy across differently sized functions.
- CLI Arguments: Renamed output flags for brevity and consistency:
  - --json-out → --json
  - --text-out → --text
  - --html-out → --html
  - --cache → --cache-dir
- Baseline Behavior:
  - The default baseline file location changed from ~/.config/codeclone/baseline.json to ./codeclone.baseline.json.
  - The CLI now warns if a baseline file is expected but missing (unless --update-baseline is used).
- Detection Engine:
  - Deep CFG analysis for try/except/finally, with/async with, and match/case (Python 3.10+) statements.
  - Normalization for augmented assignments (x += 1 vs x = x + 1).
- Rich Output:
  - Color-coded status messages.
  - Progress indicators for long-running tasks.
  - Formatted summary tables.
- CI/CD Improvements:
  - Clearer argument grouping in --help output.
- Baseline:
  - Safer JSON loading.
  - Improved typing and cleaner construction API.
- Cache:
  - Graceful recovery from corrupted cache files.
  - Updated typing to modern Python standards.
- Typing:
  - General typing improvements across reporting and normalization modules.
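The augmented-assignment normalization listed under Detection Engine can be sketched with ast.NodeTransformer. This is an illustration of the idea rather than CodeClone's normalizer; AugAssignNormalizer and normalized_dump are hypothetical names.

```python
import ast
import copy


class AugAssignNormalizer(ast.NodeTransformer):
    """Rewrite `target op= value` as `target = target op value`."""

    def visit_AugAssign(self, node: ast.AugAssign) -> ast.AST:
        self.generic_visit(node)
        # The right-hand copy of the target is read, not written.
        load_target = copy.deepcopy(node.target)
        load_target.ctx = ast.Load()
        new_node = ast.Assign(
            targets=[node.target],
            value=ast.BinOp(left=load_target, op=node.op, right=node.value),
        )
        return ast.copy_location(new_node, node)


def normalized_dump(source: str) -> str:
    tree = AugAssignNormalizer().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    # ast.dump omits line and column attributes by default, so only structure is compared.
    return ast.dump(tree)


same_shape = normalized_dump("x += 1") == normalized_dump("x = x + 1")
```

The two forms are not always equivalent at runtime (in-place operators can mutate objects), so this rewrite is meant for structural comparison only, not as a code transformation.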
- Control Flow Graph (CFG v1) for structural clone detection.
- Deterministic CFG-based function fingerprints.
- Interactive HTML report with syntax highlighting.
- Block-level clone visualization.
- Function clone detection now based on CFG instead of pure AST.
- Improved robustness against refactoring and control-flow changes.
- Added docs/cfg.md with CFG semantics and limitations.
- Added docs/architecture.md describing system design.
- AST-based function clone detection.
- Block-level clone detection (Type-3-lite).
- Baseline workflow for CI.
- JSON and text reports.