Commit 40b73f4

feat(cli iu): improvements to the cli UI, adding pre-commit hooks, updating documentation
1 parent 18a96b6 commit 40b73f4

67 files changed

Lines changed: 502 additions & 977 deletions

.github/ISSUE_TEMPLATE/bug_report.yml

Lines changed: 1 addition & 1 deletion
@@ -62,4 +62,4 @@ body:
     id: notes
     attributes:
       label: Additional context
-      description: CFG structure, HTML screenshots, logs, etc.
+      description: CFG structure, HTML screenshots, logs, etc.

.github/ISSUE_TEMPLATE/cfg_semantics.yml

Lines changed: 1 addition & 1 deletion
@@ -43,4 +43,4 @@ body:
     attributes:
       label: Desired CFG behavior
     validations:
-      required: true
+      required: true

.github/ISSUE_TEMPLATE/false_positive.yml

Lines changed: 1 addition & 1 deletion
@@ -43,4 +43,4 @@ body:
     attributes:
       label: CFG-related?
       options:
-        - label: Control flow structure differs meaningfully
+        - label: Control flow structure differs meaningfully

.github/ISSUE_TEMPLATE/feature_request.yml

Lines changed: 1 addition & 1 deletion
@@ -43,4 +43,4 @@ body:
   - type: textarea
     id: alternatives
     attributes:
-      label: Alternatives considered
+      label: Alternatives considered

.github/actions/codeclone/README.md

Lines changed: 1 addition & 1 deletion
@@ -8,4 +8,4 @@ Runs CodeClone to detect architectural code duplication in Python projects.
 - uses: orenlab/codeclone/.github/actions/codeclone@v1
   with:
     path: .
-    fail-on-new: true
+    fail-on-new: true

.github/actions/codeclone/action.yml

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 name: CodeClone
 description: >
-  AST-based Python code clone detector focused on architectural duplication
-  and CI-friendly baseline enforcement.
+  Structural code quality analysis for Python with
+  CI-friendly baseline enforcement.

 author: OrenLab


.pre-commit-config.yaml

Lines changed: 31 additions & 7 deletions
@@ -1,31 +1,55 @@
+default_install_hook_types: [ pre-commit, pre-push ]
+
 repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v6.0.0
+    hooks:
+      - id: check-merge-conflict
+      - id: end-of-file-fixer
+      - id: trailing-whitespace
+      - id: check-added-large-files
+      - id: check-toml
+      - id: check-yaml
+
   - repo: local
     hooks:
-      - id: ruff-check
-        name: Ruff (lint)
-        entry: ruff check .
+      - id: ruff-format
+        name: Ruff (format)
+        entry: ruff format .
         language: system
         pass_filenames: false
         types: [ python ]
+        stages: [ pre-commit ]

-      - id: ruff-format
-        name: Ruff (format)
-        entry: ruff format .
+      - id: ruff-check
+        name: Ruff (lint)
+        entry: ruff check .
         language: system
         pass_filenames: false
         types: [ python ]
+        stages: [ pre-commit ]

       - id: mypy
         name: Mypy
         entry: mypy .
         language: system
         pass_filenames: false
         types: [ python ]
+        stages: [ pre-commit ]

       - id: codeclone
         name: CodeClone
         entry: codeclone
         language: system
         pass_filenames: false
         args: [ ".", "--ci" ]
-        types: [ python ]
+        types: [ python ]
+        stages: [ pre-commit ]
+
+      - id: pytest
+        name: Pytest
+        entry: pytest -q
+        language: system
+        pass_filenames: false
+        types: [ python ]
+        stages: [ pre-push ]
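
The stage split in the config above can be sketched as plain data. This is a minimal illustration that models only the `local` hooks and does not call pre-commit itself; `LOCAL_HOOK_STAGES` and `hooks_for_stage` are hypothetical names, not part of pre-commit or CodeClone.

```python
# Illustrative model of the local hooks in the config; not pre-commit's own API.
LOCAL_HOOK_STAGES: dict[str, list[str]] = {
    "ruff-format": ["pre-commit"],
    "ruff-check": ["pre-commit"],
    "mypy": ["pre-commit"],
    "codeclone": ["pre-commit"],
    "pytest": ["pre-push"],
}


def hooks_for_stage(stage: str) -> list[str]:
    """Return the local hook ids configured for a stage, in config order."""
    return [hook for hook, stages in LOCAL_HOOK_STAGES.items() if stage in stages]


if __name__ == "__main__":
    print("on commit:", hooks_for_stage("pre-commit"))
    print("on push:  ", hooks_for_stage("pre-push"))
```

Under this reading, the formatter, linter, type checker, and CodeClone run on every commit, while the test suite is deferred to push time.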

AGENTS.md

Lines changed: 14 additions & 1 deletion
@@ -28,12 +28,25 @@ It is optimized for **determinism**, **CI stability**, and **reproducible change
 5. **Golden tests are contract sentinels.**
    - Do not update golden snapshots to “fix” failing tests unless the contract change is intentional, versioned where
      required, documented, and explicitly approved.
+6. **Fingerprint-adjacent optimization policy**
+
+   - Performance work must not change AST normalization, fingerprint inputs, or clone identity semantics while
+     `FINGERPRINT_VERSION` remains unchanged.
+
+   - If a change in AST/core analysis can affect fingerprint bytes, clone identity, NEW vs KNOWN classification, or
+     baseline compatibility semantics, it is not a routine optimization. It must be treated as an explicit fingerprint
+     contract change and requires:
+       - `FINGERPRINT_VERSION` review or bump
+       - documentation updates
+       - migration/release notes
+       - explicit maintainer approval
+   - Performance alone is never a sufficient reason to change fingerprint semantics.

 ---

 ## 2) Quick orientation

-CodeClone is an AST/CFG-informed clone detector for Python. It supports:
+CodeClone provides structural code quality analysis for Python. It supports:

 - **function clones** (strongest signal)
 - **block clones** (sliding window of statements, may be noisy on boilerplate)
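
One way to enforce the fingerprint policy added above is a golden check that fails whenever fingerprints drift while the version stays the same. The sketch below is an assumption-laden illustration: `fingerprint` is a toy stand-in based on `ast.dump`, and `FINGERPRINT_VERSION = "1"` and `check_fingerprint_contract` are hypothetical, not CodeClone's actual implementation.

```python
import ast
import hashlib

# Hypothetical stand-in for the project's real constant.
FINGERPRINT_VERSION = "1"


def fingerprint(source: str) -> str:
    """Toy fingerprint: SHA-256 of the AST dump without position attributes."""
    tree = ast.parse(source)
    dumped = ast.dump(tree, include_attributes=False)
    return hashlib.sha256(dumped.encode("utf-8")).hexdigest()


def check_fingerprint_contract(
    golden: dict[str, str], samples: dict[str, str], golden_version: str
) -> list[str]:
    """Return sample names whose fingerprint drifted under an unchanged version."""
    if golden_version != FINGERPRINT_VERSION:
        # A version bump is an explicit contract change; goldens are regenerated on purpose.
        return []
    return [
        name for name, source in samples.items()
        if fingerprint(source) != golden.get(name)
    ]


if __name__ == "__main__":
    samples = {"f": "def f(x):\n    return x + 1\n"}
    golden = {name: fingerprint(src) for name, src in samples.items()}
    print(check_fingerprint_contract(golden, samples, FINGERPRINT_VERSION))
```

A performance-only change should leave the returned list empty; a non-empty list signals that the change belongs in the explicit contract-change path described in the policy.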

CHANGELOG.md

Lines changed: 16 additions & 16 deletions
@@ -473,57 +473,57 @@ codeclone . --update-baseline

 ### Overview

-This release focuses on security hardening, robustness, and long-term maintainability.
+This release focuses on security hardening, robustness, and long-term maintainability.
 No breaking API changes were introduced.

 The goal of this release is to provide users with a safe, deterministic, and CI-friendly
 tool suitable for security-sensitive and large-scale environments.

 ### Security & Robustness

-- **Path Traversal Protection**
+- **Path Traversal Protection**
   Implemented strict path validation to prevent scanning outside the project root or
   accessing sensitive system directories, including macOS `/private` paths.

-- **Cache Integrity Protection**
+- **Cache Integrity Protection**
   Added HMAC-SHA256 signing for cache files to prevent cache poisoning and detect tampering.

-- **Parser Safety Limits**
+- **Parser Safety Limits**
   Introduced AST parsing time limits to mitigate risks from pathological or adversarial inputs.

-- **Resource Exhaustion Protection**
+- **Resource Exhaustion Protection**
   Enforced a maximum file size limit (10MB) and a maximum file count per scan to prevent
   excessive memory or CPU usage.

-- **Structured Error Handling**
+- **Structured Error Handling**
   Introduced a dedicated exception hierarchy (`ParseError`, `CacheError`, etc.) and replaced
   broad exception handling with graceful, user-friendly failure reporting.

 ### Performance Improvements

-- **Optimized AST Normalization**
+- **Optimized AST Normalization**
   Replaced expensive `deepcopy` operations with in-place AST normalization, significantly
   reducing CPU and memory overhead.

-- **Improved Memory Efficiency**
+- **Improved Memory Efficiency**
   Added an LRU cache for file reading and optimized string concatenation during fingerprint
   generation.

-- **HTML Report Memory Bounds**
+- **HTML Report Memory Bounds**
   HTML reports now read only the required line ranges instead of entire files, reducing peak
   memory usage on large codebases.

 ### Architecture & Maintainability

-- **Strict Type Safety**
+- **Strict Type Safety**
   Migrated all optional typing to Python 3.10+ `| None` syntax and achieved 100% `mypy` strict
   compliance.

-- **Modular CFG Design**
+- **Modular CFG Design**
   Split CFG data structures and builder logic into separate modules (`cfg_model.py` and
   `cfg.py`) for improved clarity and extensibility.

-- **Template Extraction**
+- **Template Extraction**
   Extracted HTML templates into a dedicated `templates.py` module.

 - Added a `py.typed` marker for downstream type checkers.
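
The "Cache Integrity Protection" entry in the hunk above mentions HMAC-SHA256 signing of cache files. A minimal sketch of that general technique follows, using only the standard library. The envelope layout and the names `sign_payload` and `verify_payload` are assumptions for illustration; the changelog does not describe CodeClone's actual cache format or key handling.

```python
import hashlib
import hmac
import json


def _canonical(payload: dict) -> bytes:
    """Serialize deterministically so the same payload always signs the same way."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_payload(payload: dict, key: bytes) -> dict:
    """Wrap a payload in an envelope carrying its HMAC-SHA256 signature."""
    signature = hmac.new(key, _canonical(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify_payload(envelope: dict, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(key, _canonical(envelope["payload"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope.get("signature", ""))


if __name__ == "__main__":
    key = b"example-key"  # illustrative only; real keys need proper management
    envelope = sign_payload({"files": {"a.py": "abc123"}}, key)
    print(verify_payload(envelope, key))
    envelope["payload"]["files"]["a.py"] = "tampered"
    print(verify_payload(envelope, key))
```

Canonical serialization matters here: without `sort_keys` and fixed separators, two equal payloads could serialize differently and fail verification spuriously.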
@@ -565,13 +565,13 @@ support for Python 3.10–3.14 across the test matrix.

 ### Fixed

-- **CFG Exception Handling**
+- **CFG Exception Handling**
   Fixed incorrect control-flow linking for `try`/`except` blocks.

-- **Pattern Matching Support**
+- **Pattern Matching Support**
   Added missing structural handling for `match`/`case` statements in the CFG.

-- **Block Detection Scaling**
+- **Block Detection Scaling**
   Made `MIN_LINE_DISTANCE` dynamic based on block size to improve clone detection accuracy
   across differently sized functions.

@@ -581,7 +581,7 @@ support for Python 3.10–3.14 across the test matrix.

 ### BREAKING CHANGES

-- **CLI Arguments**
+- **CLI Arguments**
   Renamed output flags for brevity and consistency:
   - `--json-out` → `--json`
   - `--text-out` → `--text`

CONTRIBUTING.md

Lines changed: 2 additions & 2 deletions
@@ -2,8 +2,8 @@

 Thank you for your interest in contributing to **CodeClone**.

-CodeClone is an **AST + CFG-based code clone detector** focused on architectural duplication,
-not textual similarity.
+CodeClone provides **structural code quality analysis** for Python, including clone detection,
+quality metrics, and baseline-aware CI governance.

 Contributions are welcome — especially those that improve **signal quality**, **CFG semantics**,
 and **real-world CI usability**.
