Commit b5401e3

chore(docs): normalize README headings/lists and clean markdown artifacts for release-ready formatting
1 parent f91ab18 commit b5401e3

1 file changed

Lines changed: 63 additions & 34 deletions

README.md
@@ -68,7 +68,7 @@ Typical use cases:
 ### Segment-level internal clone detection

 - Detects repeated **segment windows** inside the same function.
-- Uses a twostep deterministic match (candidate signature → strict hash).
+- Uses a two-step deterministic match (candidate signature → strict hash).
 - Included in reports for explainability, **not** in baseline/CI failure logic.

 ### Control-Flow Awareness (CFG v1)
@@ -82,7 +82,7 @@ Typical use cases:
 - `with` / `async with`
 - `match` / `case` (Python 3.10+)
 - Current CFG semantics (v1):
-- `and` / `or` are modeled as shortcircuit microCFG branches,
+- `and` / `or` are modeled as short-circuit micro-CFG branches,
 - `try/except` links only from statements that may raise,
 - `break` / `continue` are modeled as terminating loop transitions with explicit targets,
 - `for/while ... else` semantics are preserved structurally,
@@ -115,9 +115,7 @@ This design keeps clone detection **stable, deterministic, and low-noise**.
 pip install codeclone
 ```

-Python **3.10+** is required.
-
----
+Python 3.10+ is required.

 ## Quick Start

@@ -142,14 +140,6 @@ codeclone . \
 --text .cache/codeclone/report.txt
 ```

-All report formats include provenance metadata for auditability:
-`codeclone_version`, `python_version`, `baseline_path`, `baseline_version`,
-`baseline_schema_version`, `baseline_python_version`, `baseline_loaded`,
-`baseline_status` (and cache metadata when available).
-`baseline_status` values: `ok`, `missing`, `legacy`, `invalid`,
-`mismatch_version`, `mismatch_schema`, `mismatch_python`,
-`generator_mismatch`, `integrity_missing`, `integrity_failed`, `too_large`.
-
 Generate an HTML report:

 ```bash
@@ -162,9 +152,35 @@ Check version:
 codeclone --version
 ```

+---
+
+## Reports and Metadata
+
+All report formats include provenance metadata for auditability:
+
+`codeclone_version`, `python_version`, `baseline_path`, `baseline_version`,
+`baseline_schema_version`, `baseline_python_version`, `baseline_loaded`,
+`baseline_status` (and cache metadata when available).
+
+baseline_status values:
+
+- `ok`
+- `missing`
+- `legacy`
+- `invalid`
+- `mismatch_version`
+- `mismatch_schema`
+- `mismatch_python`
+- `generator_mismatch`
+- `integrity_missing`
+- `integrity_failed`
+- `too_large`
+
+---
+
 ## Baseline Workflow (Recommended)

-### 1. Create a baseline
+1. Create a baseline

 Run once on your current codebase:

@@ -174,18 +190,28 @@ codeclone . --update-baseline

 Commit the generated baseline file to the repository.

-Baselines are **versioned**. If CodeClone is upgraded, regenerate the baseline to keep
+Baselines are versioned. If CodeClone is upgraded, regenerate the baseline to keep
 CI deterministic and explainable.
-Baseline format in 1.3+ is tamper-evident (`generator`, `payload_sha256`) and validated
+
+Baseline format in 1.3+ is tamper-evident (generator, payload_sha256) and validated
 before baseline comparison.

-Trusted vs untrusted baseline behavior (`invalid`, `too_large`, `generator_mismatch`,
-`integrity_missing`, `integrity_failed`):
+2. Trusted vs untrusted baseline behavior

-- ignored with warning in non-gating mode (comparison falls back to empty baseline),
-- fail-fast in `--fail-on-new` / `--ci` (exit code `2`).
+Baseline states considered untrusted:

-### 2. Use in CI
+- `invalid`
+- `too_large`
+- `generator_mismatch`
+- `integrity_missing`
+- `integrity_failed`
+
+Behavior:
+
+- in normal mode, untrusted baseline is ignored with a warning (comparison falls back to empty baseline);
+- in `--fail-on-new` / `--ci`, untrusted baseline fails fast (exit code 2).
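Reviewer note: the trusted/untrusted rule added in this hunk can be read as a small decision function. The sketch below is only an illustration of that rule, not CodeClone's code: `resolve_baseline`, its parameters, and the tuple it returns are hypothetical names. Statuses outside the untrusted list are passed through unchanged, because the README text here does not say how `missing`, `legacy`, or the `mismatch_*` states are handled.

```python
import sys

# Untrusted baseline states, copied from the list in the README change.
UNTRUSTED_STATUSES = frozenset({
    "invalid",
    "too_large",
    "generator_mismatch",
    "integrity_missing",
    "integrity_failed",
})


def resolve_baseline(status, baseline, gating):
    """Hypothetical helper: decide what to compare against.

    Returns (baseline_to_use, exit_code). exit_code is None when the
    run may continue, or 2 when a gating run must fail fast.
    """
    if status not in UNTRUSTED_STATUSES:
        # Not covered by the untrusted rule; passed through as-is (assumption).
        return baseline, None
    if gating:
        # --fail-on-new / --ci: fail fast with exit code 2.
        return None, 2
    # Normal mode: warn and fall back to an empty baseline.
    print(
        f"warning: untrusted baseline ({status}); using an empty baseline",
        file=sys.stderr,
    )
    return frozenset(), None
```

With an empty fallback baseline, every detected clone counts as new in a non-gating run, which is why the README pairs the fallback with a warning.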
+
+3. Use in CI

 ```bash
 codeclone . --ci
@@ -199,21 +225,23 @@ codeclone . --ci --html .cache/codeclone/report.html

 `--ci` is equivalent to `--fail-on-new --no-color --quiet`.

----
-
 Behavior:

 - existing clones are allowed,
-- the build fails if *new* clones appear,
+- the build fails if new clones appear,
 - refactoring that removes duplication is always allowed.

-`--fail-on-new` exits with a non-zero code when new clones are detected.
+`--fail-on-new` / `--ci` exits with a non-zero code when new clones are detected.
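Reviewer note: the three behavior bullets amount to a one-way set difference between the baseline and the current run. The sketch below assumes each clone group is identified by a stable string ID; the function name `new_clones` and that ID format are illustrative, not taken from CodeClone.

```python
def new_clones(baseline_ids, current_ids):
    """Return the clone IDs present now but absent from the baseline.

    Hypothetical sketch: a gating run would fail when this list is
    non-empty. IDs that disappeared (duplication was removed) are
    ignored, so refactoring never fails the build.
    """
    return sorted(set(current_ids) - set(baseline_ids))


# Existing clone "a" is allowed; removing "b" is allowed; "c" is new.
print(new_clones({"a", "b"}, {"a", "c"}))
```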
+
+---

 ### Cache

 By default, CodeClone stores the cache per project at:

-`<root>/.cache/codeclone/cache.json`
+```bash
+<root>/.cache/codeclone/cache.json
+```

 You can override this path with `--cache-path` (`--cache-dir` is a legacy alias).

@@ -222,10 +250,13 @@ If you used an older version of CodeClone, delete the legacy cache file at

 Cache integrity checks are strict: signature mismatch or oversized cache files are ignored
 with an explicit warning, then rebuilt from source.
+
 Cache entries are validated against expected structure/types; invalid entries are ignored
 deterministically.

-### Python Version Consistency for Baseline Checks
+---
+
+## Python Version Consistency for Baseline Checks

 Due to inherent differences in Python’s AST between interpreter versions, baseline
 generation and verification must be performed using the same Python version.
@@ -256,27 +287,25 @@ repos:

 ## What CodeClone Is (and Is Not)

-### CodeClone **is**
+### CodeClone Is

 - an architectural analysis tool,
 - a duplication radar,
 - a CI guard against copy-paste,
 - a control-flow-aware clone detector.

-### CodeClone **is not**
+### CodeClone Is Not

 - a linter,
 - a formatter,
 - a semantic equivalence prover,
 - a runtime analyzer.

----
-
 ## How It Works (High Level)

 1. Parse Python source into AST.
 2. Normalize AST (names, constants, attributes, annotations).
-3. Build a **Control Flow Graph (CFG)** per function.
+3. Build a Control Flow Graph (CFG) per function.
 4. Compute stable CFG fingerprints.
 5. Extract segment windows for internal clone discovery.
 6. Detect function-level, block-level, and segment-level clones.
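Reviewer note: steps 1 and 2 of the list above, plus a simple hash, can be sketched with the standard `ast` module. This is a simplified illustration under stated assumptions, not CodeClone's implementation: it skips the CFG step entirely and hashes the normalized AST dump instead, and `_Normalizer` and `function_fingerprints` are hypothetical names.

```python
import ast
import copy
import hashlib


class _Normalizer(ast.NodeTransformer):
    """Erase identifiers, constants, attribute names and annotations."""

    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)

    def visit_arg(self, node):
        node.arg = "_"
        node.annotation = None
        return node

    def visit_Attribute(self, node):
        self.generic_visit(node)
        node.attr = "_"
        return node

    def visit_Constant(self, node):
        return ast.copy_location(ast.Constant(value=None), node)


def function_fingerprints(source):
    """Map each function name to a hash of its normalized AST."""
    tree = ast.parse(source)
    functions = [
        n for n in ast.walk(tree)
        if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    result = {}
    for func in functions:
        # Work on a copy so the original tree stays untouched.
        normalized = _Normalizer().visit(copy.deepcopy(func))
        normalized.name = "_"
        normalized.returns = None
        dumped = ast.dump(normalized, include_attributes=False)
        result[func.name] = hashlib.sha256(dumped.encode("utf-8")).hexdigest()
    return result
```

Two functions that differ only in names and literal values get the same fingerprint under this scheme, while a function with a different statement structure does not. Because `ast.dump` output can vary between interpreter versions, fingerprints like these are only comparable when produced by the same Python version.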
@@ -290,10 +319,10 @@ See the architectural overview:

 ## Control Flow Graph (CFG)

-Starting from **version 1.1.0**, CodeClone uses a **Control Flow Graph (CFG)**
+Starting from version 1.1.0, CodeClone uses a Control Flow Graph (CFG)
 to improve structural clone detection robustness.

-The CFG is a **structural abstraction**, not a runtime execution model.
+The CFG is a structural abstraction, not a runtime execution model.

 See full design and semantics:
