Commit 20cbaad

feat(core): UI improvements, updating report schemas; adding new formats (md, sarif); multiple hot path optimizations and other improvements
1 parent e151ede commit 20cbaad

73 files changed

Lines changed: 13698 additions & 2134 deletions

CHANGELOG.md

Lines changed: 15 additions & 5 deletions
@@ -9,7 +9,7 @@ This beta introduces:

 - a new stage-based architecture
 - unified clone + metrics baseline flow
-- report schema `2.0`, cache schema `2.0`, and richer report provenance
+- report schema `2.1`, cache schema `2.1`, and richer report provenance
 - expanded code-health analysis (complexity, coupling, cohesion, dependencies, dead code, health)
 - improved HTML and CLI reporting surfaces
 - substantial performance work for faster cold and warm runs
@@ -41,8 +41,15 @@ final `2.0.0` release.
 - Added unified baseline flow with optional top-level `metrics` stored in the same baseline file as clone keys.
 - Tracked embedded metrics snapshot integrity via `meta.metrics_payload_sha256`.
 - Preserved embedded metrics payload and hash when updating clone baseline content.
-- Bumped cache schema to `2.0`.
-- Bumped report schema to `2.0`.
+- Bumped cache schema to `2.1`.
+- Bumped report schema to `2.1`.
+- Consolidated report contract around canonical sections:
+  `meta`, `inventory`, `findings`, `metrics`, with `derived` and `integrity`
+  as explicit companion layers.
+- Structural findings now deduplicate repeated occurrences and use explicit
+  `file_path` item layout instead of a sentinel `file_i=-1`.
+- Tightened `duplicated_branches` reporting to suppress trivial single-statement
+  branch boilerplate without structural mass.

 ### Configuration and CLI UX

@@ -52,6 +59,8 @@ final `2.0.0` release.
 - Added optional-value report flags with deterministic defaults when passed without a path:
   - `--html` -> `.cache/codeclone/report.html`
   - `--json` -> `.cache/codeclone/report.json`
+  - `--md` -> `.cache/codeclone/report.md`
+  - `--sarif` -> `.cache/codeclone/report.sarif`
   - `--text` -> `.cache/codeclone/report.txt`
 - Added optional-value path flags for default-path intent:
   - `--baseline`
@@ -110,7 +119,8 @@ final `2.0.0` release.
 - Improved warm-run responsiveness substantially while preserving deterministic behavior and output contracts.
 - Deferred HTML renderer import in CLI so non-HTML runs do not pay template/render startup cost.
 - Disabled transient status spinner contexts when `--no-progress` is active to reduce terminal I/O overhead.
-- Added canonical cache-entry fast-path for already validated runtime entries while preserving fallback validation for raw
+- Added canonical cache-entry fast-path for already validated runtime entries while preserving fallback validation for
+  raw
   or externally mutated payloads.
 - Reused a shared parsed baseline payload when clone and metrics baselines point to the same file to avoid duplicate
   JSON reads/parses in one run.
@@ -137,7 +147,7 @@ final `2.0.0` release.
   `pre-commit run --all-files` passes with the CI gate enabled.
 - Added targeted branch and invariant tests for `baseline`, `cache`, `cli`, `html_report`, `extractor`,
   `pipeline.process`, and metrics modules.
-- Full suite now reaches `100%` coverage.
+- Coverage gate is enforced at `>=99%` with contract-focused branch/invariant tests.

 ### Stability Notes

README.md

Lines changed: 34 additions & 36 deletions
@@ -27,7 +27,7 @@ all with baseline-aware governance that separates **known** technical debt from
 - **Quality metrics** — cyclomatic complexity, coupling (CBO), cohesion (LCOM4), dependency cycles, dead code, health
   score
 - **Baseline governance** — known debt stays accepted; CI blocks only new clones and metric regressions
-- **Reports** — interactive HTML, deterministic JSON (schema v2.0), plain text — all with NEW/KNOWN split
+- **Reports** — interactive HTML, deterministic JSON/TXT plus Markdown and SARIF projections from one canonical report
 - **CI-first** — deterministic output, stable ordering, exit code contract, pre-commit support
 - **Fast** — incremental caching, parallel processing, warm-run optimization

@@ -38,6 +38,7 @@ pip install codeclone # or: uv tool install codeclone

 codeclone . # analyze current directory
 codeclone . --html # generate HTML report
+codeclone . --json --md --sarif --text # generate machine-readable reports
 codeclone . --ci # CI mode (--fail-on-new --no-color --quiet)
 ```

@@ -103,6 +104,9 @@ skip_metrics = false
 quiet = true
 html_out = ".cache/codeclone/report.html"
 json_out = ".cache/codeclone/report.json"
+md_out = ".cache/codeclone/report.md"
+sarif_out = ".cache/codeclone/report.sarif"
+text_out = ".cache/codeclone/report.txt"
 ```

 Precedence: CLI flags > `pyproject.toml` > built-in defaults.
@@ -135,54 +139,48 @@ Contract errors (`2`) take precedence over gating failures (`3`).
 |--------|----------|--------------------------------|
 | HTML | `--html` | `.cache/codeclone/report.html` |
 | JSON | `--json` | `.cache/codeclone/report.json` |
+| Markdown | `--md` | `.cache/codeclone/report.md` |
+| SARIF | `--sarif` | `.cache/codeclone/report.sarif` |
 | Text | `--text` | `.cache/codeclone/report.txt` |

-All reports include NEW/KNOWN split, matched code snippets, and provenance metadata.
+All report formats are rendered from one canonical JSON report document.

 <details>
-<summary>JSON report shape (v2.0)</summary>
+<summary>JSON report shape (v2.1)</summary>

 ```json
 {
-  "report_schema_version": "2.0",
+  "report_schema_version": "2.1",
   "meta": {
+    "codeclone_version": "2.0.0b1",
     "project_name": "...",
-    "scan_root": "..."
+    "scan_root": "...",
+    "report_mode": "full",
+    "baseline": { "...": "..." },
+    "cache": { "...": "..." },
+    "metrics_baseline": { "...": "..." },
+    "runtime": { "report_generated_at_utc": "..." }
   },
-  "files": [],
-  "groups": {
-    "functions": {},
-    "blocks": {},
-    "segments": {}
+  "inventory": {
+    "files": { "...": "..." },
+    "code": { "...": "..." },
+    "file_registry": { "encoding": "relative_path", "items": [] }
   },
-  "groups_split": {
-    "functions": {
-      "new": [],
-      "known": []
-    },
-    "blocks": {
-      "new": [],
-      "known": []
-    },
-    "segments": {
-      "new": [],
-      "known": []
+  "findings": {
+    "summary": { "...": "..." },
+    "groups": {
+      "clones": { "functions": [], "blocks": [], "segments": [] },
+      "structural": { "groups": [] },
+      "dead_code": { "groups": [] },
+      "design": { "groups": [] }
     }
   },
-  "group_item_layout": {
-    "functions": [
-      "..."
-    ],
-    "blocks": [
-      "..."
-    ],
-    "segments": [
-      "..."
-    ]
-  },
-  "facts": {},
-  "metrics": {},
-  "suggestions": []
+  "metrics": { "summary": {}, "families": {} },
+  "derived": { "suggestions": [], "overview": {}, "hotlists": {} },
+  "integrity": {
+    "canonicalization": { "version": "1", "scope": "canonical_only" },
+    "digest": { "algorithm": "sha256", "verified": true, "value": "..." }
+  }
 }
 ```

SECURITY.md

Lines changed: 2 additions & 1 deletion
@@ -9,6 +9,7 @@ The following versions currently receive security updates:

 | Version | Supported |
 |---------|-----------|
+| 2.0.x | Yes |
 | 1.4.x | Yes |
 | 1.3.x | No |
 | 1.2.x | No |
@@ -42,7 +43,7 @@ Additional safeguards:
 - Report explainability fields are generated in Python core; UI is rendering-only and does not infer semantics.
 - Scanner traversal is root-confined and prevents symlink-based path escape.
 - Baseline files are schema/type validated with size limits and tamper-evident integrity fields
-  (`meta.generator` as trust gate, `meta.payload_sha256` as integrity hash in baseline v1).
+  (`meta.generator` as trust gate, `meta.payload_sha256` as integrity hash in baseline schema `2.0`).
 - Baseline integrity is tamper-evident (audit signal), not tamper-proof cryptographic signing.
   An actor who can rewrite baseline content and recompute `payload_sha256` can still alter it.
 - Baseline hash covers canonical payload only (`clones.functions`, `clones.blocks`,

codeclone/_cli_args.py

Lines changed: 18 additions & 0 deletions
@@ -20,6 +20,8 @@
 DEFAULT_BASELINE_PATH = "codeclone.baseline.json"
 DEFAULT_HTML_REPORT_PATH = ".cache/codeclone/report.html"
 DEFAULT_JSON_REPORT_PATH = ".cache/codeclone/report.json"
+DEFAULT_MARKDOWN_REPORT_PATH = ".cache/codeclone/report.md"
+DEFAULT_SARIF_REPORT_PATH = ".cache/codeclone/report.sarif"
 DEFAULT_TEXT_REPORT_PATH = ".cache/codeclone/report.txt"


@@ -255,6 +257,22 @@ def build_parser(version: str) -> argparse.ArgumentParser:
         const=DEFAULT_JSON_REPORT_PATH,
         help=ui.HELP_JSON,
     )
+    out_group.add_argument(
+        "--md",
+        dest="md_out",
+        nargs="?",
+        metavar="FILE",
+        const=DEFAULT_MARKDOWN_REPORT_PATH,
+        help=ui.HELP_MD,
+    )
+    out_group.add_argument(
+        "--sarif",
+        dest="sarif_out",
+        nargs="?",
+        metavar="FILE",
+        const=DEFAULT_SARIF_REPORT_PATH,
+        help=ui.HELP_SARIF,
+    )
     out_group.add_argument(
         "--text",
         dest="text_out",
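
The `--md` and `--sarif` arguments above rely on argparse's optional-value pattern: `nargs="?"` combined with `const`. The following is a minimal standalone sketch of that behavior, not the project's parser; `build_demo_parser` and `DEFAULT_MD` are hypothetical names introduced only for illustration.

```python
import argparse

# Hypothetical constant mirroring the DEFAULT_*_REPORT_PATH values in the diff.
DEFAULT_MD = ".cache/codeclone/report.md"


def build_demo_parser() -> argparse.ArgumentParser:
    """Build a tiny parser with one optional-value flag (illustrative only)."""
    parser = argparse.ArgumentParser(prog="demo")
    # nargs="?" makes the value optional; const is used when the flag
    # is passed bare, and the default (None) applies when it is absent.
    parser.add_argument(
        "--md",
        dest="md_out",
        nargs="?",
        metavar="FILE",
        const=DEFAULT_MD,
    )
    return parser


parser = build_demo_parser()
flag_absent = parser.parse_args([]).md_out
flag_bare = parser.parse_args(["--md"]).md_out
flag_with_path = parser.parse_args(["--md", "out/report.md"]).md_out
```

With this pattern, an absent flag leaves the destination as `None`, a bare flag selects the default path, and an explicit value overrides it.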

codeclone/_cli_config.py

Lines changed: 4 additions & 0 deletions
@@ -48,6 +48,8 @@ class _ConfigKeySpec:
     "skip_dependencies": _ConfigKeySpec(bool),
     "html_out": _ConfigKeySpec(str, allow_none=True),
     "json_out": _ConfigKeySpec(str, allow_none=True),
+    "md_out": _ConfigKeySpec(str, allow_none=True),
+    "sarif_out": _ConfigKeySpec(str, allow_none=True),
     "text_out": _ConfigKeySpec(str, allow_none=True),
     "no_progress": _ConfigKeySpec(bool),
     "no_color": _ConfigKeySpec(bool),
@@ -62,6 +64,8 @@ class _ConfigKeySpec:
         "metrics_baseline",
         "html_out",
         "json_out",
+        "md_out",
+        "sarif_out",
         "text_out",
     }
 )

codeclone/_cli_meta.py

Lines changed: 9 additions & 3 deletions
@@ -4,18 +4,24 @@
 from __future__ import annotations

 import sys
+from datetime import datetime, timezone
 from pathlib import Path
 from typing import TypedDict

 from .baseline import Baseline, current_python_tag
-from .contracts import REPORT_SCHEMA_VERSION
 from .metrics_baseline import MetricsBaseline


 def _current_python_version() -> str:
     return f"{sys.version_info.major}.{sys.version_info.minor}"


+def _current_report_timestamp_utc() -> str:
+    return (
+        datetime.now(timezone.utc).replace(microsecond=0).strftime("%Y-%m-%dT%H:%M:%SZ")
+    )
+
+
 class ReportMeta(TypedDict):
     """
     Canonical report metadata contract shared by HTML, JSON, and TXT reports.
@@ -28,7 +34,6 @@ class ReportMeta(TypedDict):
     - cache_*: cache status/provenance for run transparency
     """

-    report_schema_version: str
     codeclone_version: str
     project_name: str
     scan_root: str
@@ -59,6 +64,7 @@ class ReportMeta(TypedDict):
     health_grade: str | None
     analysis_mode: str
     metrics_computed: list[str]
+    report_generated_at_utc: str


 def _build_report_meta(
@@ -85,7 +91,6 @@ def _build_report_meta(
 ) -> ReportMeta:
     project_name = scan_root.name or str(scan_root)
     return {
-        "report_schema_version": REPORT_SCHEMA_VERSION,
         "codeclone_version": codeclone_version,
         "project_name": project_name,
         "scan_root": str(scan_root),
@@ -124,4 +129,5 @@ def _build_report_meta(
         "health_grade": health_grade,
         "analysis_mode": analysis_mode,
         "metrics_computed": list(metrics_computed),
+        "report_generated_at_utc": _current_report_timestamp_utc(),
     }
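
The new `report_generated_at_utc` field uses a second-precision UTC format with a trailing `Z`. As a rough illustration of what that format string produces, the sketch below applies the same `strftime` pattern to a fixed instant so the result is reproducible. `format_utc_timestamp` is a hypothetical helper that takes the instant as a parameter; the function in the diff calls `datetime.now(timezone.utc)` directly.

```python
from datetime import datetime, timezone

# Same format string as in the diff above.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def format_utc_timestamp(moment: datetime) -> str:
    """Render an aware datetime as second-precision UTC (hypothetical helper)."""
    utc_moment = moment.astimezone(timezone.utc).replace(microsecond=0)
    return utc_moment.strftime(TIMESTAMP_FORMAT)


# A fixed instant with microseconds, to show that they are dropped.
fixed = datetime(2024, 1, 2, 3, 4, 5, 678901, tzinfo=timezone.utc)
stamp = format_utc_timestamp(fixed)

# Round-trip: strptime yields a naive datetime, so re-attach UTC explicitly.
parsed = datetime.strptime(stamp, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
```

Because `strptime` does not interpret the literal `Z` as a timezone, a consumer parsing this field has to attach UTC itself, as the last line does.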

codeclone/_cli_paths.py

Lines changed: 6 additions & 3 deletions
@@ -6,19 +6,22 @@
 import sys
 from collections.abc import Callable
 from pathlib import Path
-
-from rich.console import Console
+from typing import Protocol

 from .contracts import ExitCode
 from .ui_messages import fmt_contract_error


+class _Printer(Protocol):
+    def print(self, *objects: object, **kwargs: object) -> None: ...
+
+
 def _validate_output_path(
     path: str,
     *,
     expected_suffix: str,
     label: str,
-    console: Console,
+    console: _Printer,
     invalid_message: Callable[..., str],
     invalid_path_message: Callable[..., str],
 ) -> Path:
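
Replacing the concrete `rich.console.Console` annotation with a `Protocol` means any object with a compatible `print` method is accepted, without importing `rich` at module load. A minimal sketch of that structural typing follows; `RecordingPrinter` and `report_invalid_path` are hypothetical names used only for illustration and are not part of the commit.

```python
from typing import Protocol


class Printer(Protocol):
    """Structural type: anything with a compatible print() method."""

    def print(self, *objects: object, **kwargs: object) -> None: ...


class RecordingPrinter:
    """Hypothetical test double; note it does not inherit from Printer."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def print(self, *objects: object, **kwargs: object) -> None:
        # Join positional objects the way a console would display them.
        self.lines.append(" ".join(str(obj) for obj in objects))


def report_invalid_path(console: Printer, path: str) -> None:
    """Hypothetical caller that depends only on the protocol."""
    console.print(f"invalid output path: {path}")


recorder = RecordingPrinter()
report_invalid_path(recorder, "report.exe")
```

A static type checker accepts `RecordingPrinter` here because its `print` signature matches the protocol, which also makes such helpers easy to exercise in tests without a real console.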

codeclone/_cli_summary.py

Lines changed: 11 additions & 5 deletions
@@ -4,9 +4,7 @@
 from __future__ import annotations

 from dataclasses import dataclass
-
-from rich.console import Console
-from rich.rule import Rule
+from typing import Protocol

 from . import ui_messages as ui

@@ -26,9 +24,13 @@ class MetricsSnapshot:
     health_grade: str


+class _Printer(Protocol):
+    def print(self, *objects: object, **kwargs: object) -> None: ...
+
+
 def _print_summary(
     *,
-    console: Console,
+    console: _Printer,
     quiet: bool,
     files_found: int,
     files_analyzed: int,
@@ -65,6 +67,8 @@ def _print_summary(
             )
         )
     else:
+        from rich.rule import Rule
+
         console.print()
         console.print(Rule(title=ui.SUMMARY_TITLE, style="dim", characters="\u2500"))
         console.print(
@@ -99,7 +103,7 @@ def _print_summary(


 def _print_metrics(
     *,
-    console: Console,
+    console: _Printer,
     quiet: bool,
     metrics: MetricsSnapshot,
 ) -> None:
@@ -119,6 +123,8 @@ def _print_metrics(
             )
         )
     else:
+        from rich.rule import Rule
+
         console.print()
         console.print(Rule(title=ui.METRICS_TITLE, style="dim", characters="\u2500"))
         console.print(ui.fmt_metrics_health(metrics.health_total, metrics.health_grade))
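
Moving `from rich.rule import Rule` into the non-quiet branch is a deferred (function-level) import: the module is loaded only when that branch actually runs. The sketch below shows the general pattern under stated assumptions. It uses the standard-library `textwrap` module as a stand-in for the rendering dependency so it runs without `rich`, and `render_summary` is a hypothetical function, not the project's `_print_summary`.

```python
def render_summary(title: str, *, quiet: bool) -> str:
    """Return a compact or decorated summary header (illustrative only)."""
    if quiet:
        # Quiet path: no rendering dependency is imported at all.
        return title

    # Deferred import: executed only on the decorated path.
    # textwrap stands in here for a heavier dependency such as rich.
    from textwrap import indent

    rule = "\u2500" * len(title)
    return indent(f"{title}\n{rule}", "  ")


compact = render_summary("Summary", quiet=True)
decorated = render_summary("Summary", quiet=False)
```

Python caches imported modules in `sys.modules`, so repeating a function-level import on later calls costs only a dictionary lookup.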
