
CodeClone

Structural code quality analysis for Python



CodeClone provides deterministic structural code quality analysis for Python. It detects architectural duplication, computes quality metrics, and enforces CI gates — all with baseline-aware governance that separates known technical debt from new regressions. An optional MCP interface exposes the same canonical analysis pipeline to AI agents and IDEs.

Docs: orenlab.github.io/codeclone · Live sample report: orenlab.github.io/codeclone/examples/report/

Note

This README and docs site track the in-development v2.0.x line from main. For the latest stable CodeClone documentation (v1.4.4), see the v1.4.4 README and the v1.4.4 docs tree.

Features

  • Clone detection — function (CFG fingerprint), block (statement windows), and segment (report-only) clones
  • Structural findings — duplicated branch families, clone guard/exit divergence, and clone-cohort drift
  • Quality metrics — cyclomatic complexity, coupling (CBO), cohesion (LCOM4), dependency cycles, dead code, health score, and overloaded-module profiling
  • Adoption & API — type/docstring annotation coverage, public API surface inventory and baseline diff
  • Coverage Join — fuse external Cobertura XML into the current run to surface coverage hotspots and scope gaps
  • Baseline governance — separates accepted legacy debt from new regressions; CI fails only on what changed
  • Reports — interactive HTML, JSON, Markdown, SARIF, and text from one canonical report
  • MCP server — optional read-only surface for AI agents and IDEs
  • IDE & agent clients — VS Code extension, Claude Desktop bundle, and Codex plugin over the same MCP contract
  • CI-first — deterministic output, stable ordering, exit code contract, pre-commit support
  • Fast — incremental caching, parallel processing, warm-run optimization

Quick Start

uv tool install codeclone      # use --pre for beta

codeclone .                    # analyze
codeclone . --html             # HTML report
codeclone . --html --open-html-report  # open in browser
codeclone . --json --md --sarif --text # all formats
codeclone . --ci               # CI mode
More examples
# timestamped report snapshots
codeclone . --html --json --timestamped-report-paths

# changed-scope gating against git diff
codeclone . --changed-only --diff-against main

# shorthand: diff source for changed-scope review
codeclone . --paths-from-git-diff HEAD~1
Run without install
uvx codeclone@latest .

CI Integration

# 1. Generate baseline (commit to repo)
codeclone . --update-baseline

# 2. Add to CI pipeline
codeclone . --ci
What --ci enables: the --ci preset is equivalent to --fail-on-new --no-color --quiet. When a trusted metrics baseline is loaded, CI mode also enables --fail-on-new-metrics.
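The preset relationship above can be modeled as a simple argument expansion. This is a hypothetical sketch (the function name `expand_ci_preset` and the `trusted_metrics_baseline` parameter are illustrative, not CodeClone's argument parser); it only shows which flags --ci stands for and when --fail-on-new-metrics is added.

```python
def expand_ci_preset(argv: list[str], trusted_metrics_baseline: bool) -> list[str]:
    """Hypothetical expansion of --ci into the flags it is documented to imply."""
    if "--ci" not in argv:
        return list(argv)
    expanded = [arg for arg in argv if arg != "--ci"]
    preset = ["--fail-on-new", "--no-color", "--quiet"]
    if trusted_metrics_baseline:
        # Only added when a trusted metrics baseline was loaded.
        preset.append("--fail-on-new-metrics")
    # Keep flags the user already passed; append only the missing ones.
    for flag in preset:
        if flag not in expanded:
            expanded.append(flag)
    return expanded
```

For example, `expand_ci_preset([".", "--ci"], trusted_metrics_baseline=True)` yields `[".", "--fail-on-new", "--no-color", "--quiet", "--fail-on-new-metrics"]`.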

GitHub Action

CodeClone also ships a composite GitHub Action for PR and CI workflows:

- uses: orenlab/codeclone/.github/actions/codeclone@main
  with:
    fail-on-new: "true"
    sarif: "true"
    pr-comment: "true"

It can:

  • run baseline-aware gating
  • generate JSON and SARIF reports
  • upload SARIF to GitHub Code Scanning
  • post or update a PR summary comment

Action docs: .github/actions/codeclone/README.md

Quality Gates

# Metrics thresholds
codeclone . --fail-complexity 20 --fail-coupling 10 --fail-cohesion 4 --fail-health 60

# Structural policies
codeclone . --fail-cycles --fail-dead-code

# Regression detection vs baseline
codeclone . --fail-on-new-metrics

# Adoption and API governance
codeclone . --min-typing-coverage 80 --min-docstring-coverage 60
codeclone . --fail-on-typing-regression --fail-on-docstring-regression
codeclone . --api-surface --update-metrics-baseline
codeclone . --fail-on-api-break

# Coverage Join — fuse external Cobertura XML into the review
codeclone . --coverage coverage.xml --fail-on-untested-hotspots --coverage-min 50

Gate details: Metrics and quality gates
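To make the complexity and Coverage Join gates concrete, the sketch below computes an approximate McCabe complexity per function with the standard `ast` module, reads per-line hit counts from a Cobertura XML string, and flags functions that are both complex and poorly covered. This is not CodeClone's implementation: the set of counted decision points, the helper names, and the "untested hotspot" rule (complexity at or above a threshold and line coverage below a percentage, loosely mirroring --coverage-min) are all assumptions for illustration.

```python
import ast
import xml.etree.ElementTree as ET

# Decision points counted here: an approximation of McCabe's metric,
# not CodeClone's exact rule set.
_BRANCHES = (ast.If, ast.IfExp, ast.For, ast.AsyncFor, ast.While,
             ast.ExceptHandler, ast.comprehension, ast.Assert)


def cyclomatic_complexity(func: ast.AST) -> int:
    """1 + number of decision points; nested functions are included."""
    score = 1
    for node in ast.walk(func):
        if isinstance(node, _BRANCHES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            score += len(node.values) - 1  # `a and b and c` adds two paths
    return score


def line_hits(cobertura_xml: str, filename: str) -> dict[int, int]:
    """Map line number -> hit count for one file from a Cobertura report."""
    hits: dict[int, int] = {}
    for cls in ET.fromstring(cobertura_xml).iter("class"):
        if cls.get("filename") == filename:
            for line in cls.iter("line"):
                hits[int(line.get("number"))] = int(line.get("hits"))
    return hits


def untested_hotspots(source: str, filename: str, cobertura_xml: str,
                      min_complexity: int, coverage_min: float) -> list[tuple[str, int, float]]:
    """Functions at or above min_complexity whose line coverage (%) is below coverage_min."""
    hits = line_hits(cobertura_xml, filename)
    hotspots = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        measured = [n for n in range(node.lineno, node.end_lineno + 1) if n in hits]
        if not measured:
            continue  # no coverage data for this function
        covered = sum(1 for n in measured if hits[n] > 0)
        percent = round(100.0 * covered / len(measured), 1)
        complexity = cyclomatic_complexity(node)
        if complexity >= min_complexity and percent < coverage_min:
            hotspots.append((node.name, complexity, percent))
    return hotspots
```

Combining both signals in one check reflects the idea behind --fail-on-untested-hotspots: a complex function is riskier when tests barely exercise it.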

Pre-commit

repos:
  - repo: local
    hooks:
      - id: codeclone
        name: CodeClone
        entry: codeclone
        language: system
        pass_filenames: false
        args: [ ".", "--ci" ]
        types: [ python ]

MCP Server

Optional read-only MCP server for AI agents and IDE clients. Never mutates source, baselines, or repo state.

uv tool install --pre "codeclone[mcp]"       # or: uv pip install --pre "codeclone[mcp]"

codeclone-mcp --transport stdio            # local (Claude Code, Codex, Copilot, Gemini CLI)
codeclone-mcp --transport streamable-http  # remote / HTTP-only clients

MCP usage guide · MCP interface contract

Native Client Surfaces

| Surface | Location | Purpose |
|---|---|---|
| VS Code extension | VS Code Marketplace | Triage-first structural review in the editor |
| Claude Desktop bundle | extensions/claude-desktop-codeclone/ | Local .mcpb install with pre-loaded instructions |
| Codex plugin | plugins/codeclone/ | Native discovery, two skills, and MCP definition |

All three are thin wrappers over the same codeclone-mcp contract — no second analysis engine.

VS Code extension docs · Claude Desktop docs · Codex plugin docs

Configuration

CodeClone can load project-level configuration from pyproject.toml:

[tool.codeclone]
min_loc = 10
min_stmt = 6
baseline = "codeclone.baseline.json"
golden_fixture_paths = ["tests/fixtures/golden_*"]
skip_metrics = false
quiet = false
html_out = ".cache/codeclone/report.html"
json_out = ".cache/codeclone/report.json"
md_out = ".cache/codeclone/report.md"
sarif_out = ".cache/codeclone/report.sarif"
text_out = ".cache/codeclone/report.txt"
block_min_loc = 20
block_min_stmt = 8
segment_min_loc = 20
segment_min_stmt = 10

Precedence: CLI flags > pyproject.toml > built-in defaults.

Config reference: Config and defaults

Baseline Workflow

Baselines capture the current clone and metrics state. Once committed, they become the reference point for CI.

  • Clones are classified as NEW (not in baseline) or KNOWN (accepted debt)
  • --update-baseline writes both clone and metrics snapshots
  • Trust is verified via generator, fingerprint_version, and payload_sha256
  • In --ci mode, an untrusted baseline is a contract error (exit 2)

Full contract: Baseline contract
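A minimal sketch of the two ideas in the bullets above: a baseline is trusted only if its recorded generator, fingerprint version, and payload digest all check out, and current clones are then split into NEW and KNOWN by set difference. The baseline layout (`meta`, `payload`, `clone_fingerprints`) and the canonical-JSON rule used for the digest are hypothetical; consult the Baseline contract for the real format.

```python
import hashlib
import json


def payload_digest(payload: dict) -> str:
    """SHA-256 over canonical JSON (sorted keys, compact separators): an assumed rule."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_trusted(baseline: dict, generator: str, fingerprint_version: str) -> bool:
    """All three recorded trust fields must match what this run expects."""
    meta = baseline.get("meta", {})
    return (
        meta.get("generator") == generator
        and meta.get("fingerprint_version") == fingerprint_version
        and meta.get("payload_sha256") == payload_digest(baseline.get("payload", {}))
    )


def classify(baseline: dict, current: set[str], *, generator: str,
             fingerprint_version: str) -> tuple[set[str], set[str]]:
    """Return (new, known) clone fingerprints relative to a trusted baseline."""
    if not is_trusted(baseline, generator, fingerprint_version):
        # In --ci mode CodeClone treats an untrusted baseline as a contract error (exit 2).
        raise ValueError("untrusted baseline")
    accepted = set(baseline["payload"]["clone_fingerprints"])
    return current - accepted, current & accepted
```

Only the NEW set should fail the build; KNOWN entries are the accepted debt the baseline recorded.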

Exit Codes

| Code | Meaning |
|---|---|
| 0 | Success |
| 2 | Contract error: untrusted baseline, invalid config, unreadable sources in CI |
| 3 | Gating failure: new clones or metric threshold exceeded |
| 5 | Internal error |

Contract errors (2) take precedence over gating failures (3).

Full policy: Exit codes and failure policy
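The documented precedence (contract errors over gating failures) can be expressed as an ordered check. This is an illustrative model, not CodeClone's code; in particular, ranking internal errors above everything else is an assumption the README does not state.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0
    CONTRACT_ERROR = 2
    GATING_FAILURE = 3
    INTERNAL_ERROR = 5


def resolve_exit_code(*, contract_error: bool, gating_failure: bool,
                      internal_error: bool = False) -> ExitCode:
    """Pick one exit code when several conditions occur in the same run."""
    if internal_error:  # assumption: an internal error outranks everything
        return ExitCode.INTERNAL_ERROR
    if contract_error:  # documented: 2 takes precedence over 3
        return ExitCode.CONTRACT_ERROR
    if gating_failure:
        return ExitCode.GATING_FAILURE
    return ExitCode.SUCCESS
```

A CI script can therefore treat 2 as "fix the setup" and 3 as "fix the code": a run with an untrusted baseline never reports a gating failure, even if it also found new clones.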

Reports

| Format | Flag | Default path |
|---|---|---|
| HTML | --html | .cache/codeclone/report.html |
| JSON | --json | .cache/codeclone/report.json |
| Markdown | --md | .cache/codeclone/report.md |
| SARIF | --sarif | .cache/codeclone/report.sarif |
| Text | --text | .cache/codeclone/report.txt |

All formats are rendered from one canonical JSON report. --open-html-report opens the HTML in the default browser. --timestamped-report-paths appends a UTC timestamp to default filenames.

Report contract: Report contract · HTML render

Canonical JSON report shape (v2.8)
{
  "report_schema_version": "2.8",
  "meta": {
    "codeclone_version": "2.0.0b5",
    "project_name": "...",
    "scan_root": ".",
    "report_mode": "full",
    "analysis_profile": {
      "min_loc": 10,
      "min_stmt": 6,
      "block_min_loc": 20,
      "block_min_stmt": 8,
      "segment_min_loc": 20,
      "segment_min_stmt": 10
    },
    "analysis_thresholds": {
      "design_findings": {
        "...": "..."
      }
    },
    "baseline": {
      "...": "..."
    },
    "cache": {
      "...": "..."
    },
    "metrics_baseline": {
      "...": "..."
    },
    "runtime": {
      "analysis_started_at_utc": "...",
      "report_generated_at_utc": "..."
    }
  },
  "inventory": {
    "files": {
      "...": "..."
    },
    "code": {
      "...": "..."
    },
    "file_registry": {
      "encoding": "relative_path",
      "items": []
    }
  },
  "findings": {
    "summary": {
      "...": "..."
    },
    "groups": {
      "clones": {
        "functions": [],
        "blocks": [],
        "segments": []
      },
      "structural": {
        "groups": []
      },
      "dead_code": {
        "groups": []
      },
      "design": {
        "groups": []
      }
    }
  },
  "metrics": {
    "summary": {
      "...": "...",
      "coverage_adoption": { "...": "..." },
      "coverage_join": { "...": "..." },
      "api_surface": { "...": "..." }
    },
    "families": {
      "...": "...",
      "coverage_adoption": { "...": "..." },
      "coverage_join": { "...": "..." },
      "api_surface": { "...": "..." }
    }
  },
  "derived": {
    "suggestions": [],
    "overview": {
      "families": {},
      "top_risks": [],
      "source_scope_breakdown": {},
      "health_snapshot": {},
      "directory_hotspots": {}
    },
    "hotlists": {
      "most_actionable_ids": [],
      "highest_spread_ids": [],
      "production_hotspot_ids": [],
      "test_fixture_hotspot_ids": []
    }
  },
  "integrity": {
    "canonicalization": {
      "version": "1",
      "scope": "canonical_only"
    },
    "digest": {
      "algorithm": "sha256",
      "verified": true,
      "value": "..."
    }
  }
}

Full contract: Report contract
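Because every format is rendered from the canonical JSON, downstream tooling can read that file directly. The sketch below counts finding groups using only the keys shown in the shape above; the group entries themselves are elided there, so it treats them as opaque. Rejecting any schema version other than "2.8" is a deliberately strict, hypothetical choice for scripts that depend on this exact layout.

```python
import json


def summarize_report(report: dict) -> dict[str, int]:
    """Count finding groups in a canonical CodeClone JSON report (schema 2.8 layout)."""
    version = report.get("report_schema_version")
    if version != "2.8":
        raise ValueError(f"unsupported report_schema_version: {version!r}")
    groups = report["findings"]["groups"]
    clones = groups["clones"]
    return {
        "function_clone_groups": len(clones["functions"]),
        "block_clone_groups": len(clones["blocks"]),
        "segment_clone_groups": len(clones["segments"]),
        "structural_groups": len(groups["structural"]["groups"]),
        "dead_code_groups": len(groups["dead_code"]["groups"]),
        "design_groups": len(groups["design"]["groups"]),
    }


def load_summary(path: str = ".cache/codeclone/report.json") -> dict[str, int]:
    """Read the default JSON report path and summarize it."""
    with open(path, encoding="utf-8") as fh:
        return summarize_report(json.load(fh))
```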

Inline Suppressions

When a symbol is reached only through runtime dynamics (framework callbacks, plugin loading, reflection), static analysis can report it as dead code. Suppress that known false positive at the declaration site:

# codeclone: ignore[dead-code]
def handle_exception(exc: Exception) -> None:
    ...


class Middleware:  # codeclone: ignore[dead-code]
    ...

Suppression contract: Inline suppressions · Dead-code contract
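The two placements in the example (a standalone comment directly above the declaration, or a trailing comment on the declaration line) can be checked with a small helper. This is a hypothetical sketch, not CodeClone's parser: the comma-separated multi-rule form is an assumption, decorators between the comment and the `def` are not handled, and a regex can also match text inside string literals, which a production version would avoid by using the `tokenize` module.

```python
import re

# Assumed grammar: "# codeclone: ignore[rule]" or "# codeclone: ignore[rule-a, rule-b]".
_SUPPRESSION = re.compile(r"#\s*codeclone:\s*ignore\[([^\]]*)\]")


def rules_on_line(line: str) -> set[str]:
    """Rule IDs suppressed by a comment on this line (empty set if none)."""
    match = _SUPPRESSION.search(line)
    if match is None:
        return set()
    return {rule.strip() for rule in match.group(1).split(",") if rule.strip()}


def is_suppressed(source_lines: list[str], decl_lineno: int, rule: str) -> bool:
    """Check the declaration line and a standalone comment directly above it.

    decl_lineno is 1-based, as reported by the ast module.
    """
    if rule in rules_on_line(source_lines[decl_lineno - 1]):
        return True
    if decl_lineno >= 2:
        previous = source_lines[decl_lineno - 2]
        if previous.lstrip().startswith("#") and rule in rules_on_line(previous):
            return True
    return False
```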

How It Works

  1. Parse — Python source to AST
  2. Normalize — canonical structure (robust to renaming, formatting)
  3. CFG — per-function control flow graph
  4. Fingerprint — stable hash computation
  5. Group — function, block, and segment clone groups
  6. Metrics — complexity, coupling, cohesion, dependencies, dead code, health
  7. Gate — baseline comparison, threshold checks

Architecture: Architecture narrative · CFG semantics: CFG semantics
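A compressed illustration of steps 1, 2, 4, and 5: parse, normalize away identifier names, hash, and group colliding hashes. Note the swap: CodeClone fingerprints a per-function control flow graph (step 3), whereas this sketch hashes the normalized AST directly, so it is only an approximation of the idea. It also over-normalizes by renaming every `Name`, including calls such as `len(x)` versus `sum(x)`; the helper names are hypothetical.

```python
import ast
import hashlib
from collections import defaultdict


class _Normalizer(ast.NodeTransformer):
    """Replace identifiers with fixed placeholders so renaming does not change the hash."""

    def visit_Name(self, node: ast.Name) -> ast.AST:
        return ast.copy_location(ast.Name(id="_v", ctx=node.ctx), node)

    def visit_arg(self, node: ast.arg) -> ast.AST:
        node.arg = "_a"
        self.generic_visit(node)
        return node

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        node.name = "_f"
        self.generic_visit(node)
        return node


def fingerprint(func_source: str) -> str:
    tree = _Normalizer().visit(ast.parse(func_source))
    # ast.dump() omits line/column attributes by default, so layout and
    # whitespace differences do not affect the digest.
    return hashlib.sha256(ast.dump(tree).encode("utf-8")).hexdigest()


def group_clones(functions: dict[str, str]) -> list[list[str]]:
    """Group function names whose fingerprints collide, in a stable sorted order."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for name, source in functions.items():
        buckets[fingerprint(source)].append(name)
    return sorted(sorted(names) for names in buckets.values() if len(names) > 1)
```

Sorting both inside and across groups keeps the output independent of input order, in the spirit of the deterministic, stable ordering the tool advertises.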

Documentation

Full docs and contract book: orenlab.github.io/codeclone

Quick links: Baseline · Report · Metrics & gates · MCP · CLI

Benchmarking Notes

Reproducible Docker Benchmark
./benchmarks/run_docker_benchmark.sh

The wrapper builds benchmarks/Dockerfile, runs isolated container benchmarks, and writes results to .cache/benchmarks/codeclone-benchmark.json.

Use environment overrides to pin the benchmark envelope:

CPUSET=0 CPUS=1.0 MEMORY=2g RUNS=16 WARMUPS=4 \
  ./benchmarks/run_docker_benchmark.sh

Performance claims are backed by the reproducible benchmark workflow documented in the Benchmarking contract.
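To clarify what the RUNS and WARMUPS overrides control, here is a generic timing harness: warmup calls run first and are discarded, then each measured run is timed and summarized. This is not the benchmark script itself; it does not model the Docker CPU and memory pinning, and the default values simply mirror the example override above rather than the script's real defaults.

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], runs: int = 16, warmups: int = 4) -> dict[str, float]:
    """Time fn() `runs` times after `warmups` untimed calls."""
    for _ in range(warmups):
        fn()  # warm caches and imports; these timings are discarded
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
    }
```

Reporting the median rather than the mean keeps a single slow outlier run from skewing the result, which matters more when CPU and memory limits are pinned tightly.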

License

  • Code: MPL-2.0
  • Documentation: MIT

Versions released before this licensing change remain under their original license terms.
