Skip to content

Commit f9be591

Browse files
committed
fix(core,ci): harden git diff validation, make segment digests canonical, and align CI policy
1 parent ebcd474 commit f9be591

File tree

15 files changed

+127
-22
lines changed

15 files changed

+127
-22
lines changed

.github/workflows/codeclone.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -19,7 +19,7 @@ jobs:
1919
runs-on: ubuntu-latest
2020
steps:
2121
- name: Checkout
22-
uses: actions/checkout@v4
22+
uses: actions/checkout@v6.0.2
2323
with:
2424
fetch-depth: 0
2525

.github/workflows/tests.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -40,7 +40,7 @@ jobs:
4040
- name: Run tests
4141
# Smoke CLI tests intentionally disable subprocess coverage collection
4242
# to avoid runner-specific flakiness while keeping parent-process coverage strict.
43-
run: uv run pytest --cov=codeclone --cov-report=term-missing --cov-fail-under=98
43+
run: uv run pytest --cov=codeclone --cov-report=term-missing --cov-fail-under=99
4444

4545
- name: Verify baseline exists
4646
if: ${{ matrix.python-version == '3.13' }}

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,12 @@
1616
- Add `workspace_root` user-config field to the Claude Desktop bundle: setting it to the project directory forces the
1717
launcher to prefer `.venv` inside that path even when Claude Desktop starts with a different working directory
1818
(fixes python-tag mismatch caused by system-wide interpreter fallback).
19+
- Validate `git_diff_ref` inputs as safe single revision expressions in both
20+
CLI and MCP before invoking `git diff`.
21+
- Replace the segment-group raw digest `repr()` payload with canonical JSON
22+
bytes for cross-version-safe determinism.
23+
- Align the tests workflow coverage gate with the canonical `fail_under = 99`
24+
policy and refresh the remaining `actions/checkout` pin in `codeclone.yml`.
1925
- Refresh branch metadata and client docs for the `2.0.0b5` line.
2026
- Update the README repository health badge to `87 (B)`.
2127

codeclone/_git_diff.py

Lines changed: 44 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,44 @@
1+
# This Source Code Form is subject to the terms of the Mozilla Public
2+
# License, v. 2.0. If a copy of the MPL was not distributed with this
3+
# file, You can obtain one at https://mozilla.org/MPL/2.0/.
4+
# SPDX-License-Identifier: MPL-2.0
5+
# Copyright (c) 2026 Den Rozhnovskiy
6+
7+
from __future__ import annotations
8+
9+
import re
10+
from typing import Final
11+
12+
_SAFE_GIT_DIFF_REF_RE: Final[re.Pattern[str]] = re.compile(
13+
r"^(?![-./])[A-Za-z0-9._/@{}^~+-]+$"
14+
)
15+
16+
17+
def validate_git_diff_ref(git_diff_ref: str) -> str:
18+
"""Validate a safe, single git revision expression for `git diff`.
19+
20+
CodeClone intentionally accepts a conservative subset of git revision
21+
syntax here: common branch names, tags, revision operators (`~`, `^`),
22+
reflog selectors (`@{...}`), and dotted range expressions. Whitespace,
23+
control characters, option-like prefixes, and unsupported punctuation are
24+
rejected before any subprocess call.
25+
"""
26+
27+
if git_diff_ref != git_diff_ref.strip():
28+
raise ValueError(
29+
"Invalid git diff ref "
30+
f"{git_diff_ref!r}: surrounding whitespace is not allowed."
31+
)
32+
if not git_diff_ref:
33+
raise ValueError("Invalid git diff ref '': value must not be empty.")
34+
if any(ch.isspace() or ord(ch) < 32 or ord(ch) == 127 for ch in git_diff_ref):
35+
raise ValueError(
36+
"Invalid git diff ref "
37+
f"{git_diff_ref!r}: whitespace and control characters are not allowed."
38+
)
39+
if not _SAFE_GIT_DIFF_REF_RE.fullmatch(git_diff_ref):
40+
raise ValueError(
41+
"Invalid git diff ref "
42+
f"{git_diff_ref!r}: expected a safe revision expression."
43+
)
44+
return git_diff_ref

codeclone/cli.py

Lines changed: 7 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -92,6 +92,7 @@
9292
_print_metrics,
9393
_print_summary,
9494
)
95+
from ._git_diff import validate_git_diff_ref
9596
from .baseline import Baseline
9697
from .cache import Cache, CacheStatus, build_segment_report_projection
9798
from .contracts import (
@@ -270,16 +271,14 @@ def _normalize_changed_paths(
270271

271272

272273
def _git_diff_changed_paths(*, root_path: Path, git_diff_ref: str) -> tuple[str, ...]:
273-
if git_diff_ref.startswith("-"):
274-
console.print(
275-
ui.fmt_contract_error(
276-
f"Invalid git diff ref '{git_diff_ref}': must not start with '-'."
277-
)
278-
)
274+
try:
275+
validated_ref = validate_git_diff_ref(git_diff_ref)
276+
except ValueError as exc:
277+
console.print(ui.fmt_contract_error(str(exc)))
279278
sys.exit(ExitCode.CONTRACT_ERROR)
280279
try:
281280
completed = subprocess.run(
282-
["git", "diff", "--name-only", git_diff_ref, "--"],
281+
["git", "diff", "--name-only", validated_ref, "--"],
283282
cwd=str(root_path),
284283
check=True,
285284
capture_output=True,
@@ -294,7 +293,7 @@ def _git_diff_changed_paths(*, root_path: Path, git_diff_ref: str) -> tuple[str,
294293
console.print(
295294
ui.fmt_contract_error(
296295
"Unable to resolve changed files from git diff ref "
297-
f"'{git_diff_ref}': {exc}"
296+
f"'{validated_ref}': {exc}"
298297
)
299298
)
300299
sys.exit(ExitCode.CONTRACT_ERROR)

codeclone/mcp_service.py

Lines changed: 7 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -47,6 +47,7 @@
4747
)
4848
from ._coerce import as_float as _as_float
4949
from ._coerce import as_int as _as_int
50+
from ._git_diff import validate_git_diff_ref
5051
from .baseline import Baseline
5152
from .cache import Cache, CacheStatus
5253
from .contracts import (
@@ -793,13 +794,13 @@ def _git_diff_lines_payload(
793794
root_path: Path,
794795
git_diff_ref: str,
795796
) -> tuple[str, ...]:
796-
if git_diff_ref.startswith("-"):
797-
raise MCPGitDiffError(
798-
f"Invalid git diff ref '{git_diff_ref}': must not start with '-'."
799-
)
797+
try:
798+
validated_ref = validate_git_diff_ref(git_diff_ref)
799+
except ValueError as exc:
800+
raise MCPGitDiffError(str(exc)) from exc
800801
try:
801802
completed = subprocess.run(
802-
["git", "diff", "--name-only", git_diff_ref, "--"],
803+
["git", "diff", "--name-only", validated_ref, "--"],
803804
cwd=root_path,
804805
check=True,
805806
capture_output=True,
@@ -808,7 +809,7 @@ def _git_diff_lines_payload(
808809
)
809810
except (OSError, subprocess.CalledProcessError, subprocess.TimeoutExpired) as exc:
810811
raise MCPGitDiffError(
811-
f"Unable to resolve changed paths from git diff ref '{git_diff_ref}'."
812+
f"Unable to resolve changed paths from git diff ref '{validated_ref}'."
812813
) from exc
813814
return tuple(
814815
sorted({line.strip() for line in completed.stdout.splitlines() if line.strip()})

codeclone/pipeline.py

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,8 @@
1313
from pathlib import Path
1414
from typing import TYPE_CHECKING, Literal, cast
1515

16+
import orjson
17+
1618
from ._coerce import as_int, as_str
1719
from .cache import (
1820
Cache,
@@ -257,7 +259,7 @@ def _segment_groups_digest(segment_groups: GroupMap) -> str:
257259
for item in items
258260
]
259261
normalized_rows.append((group_key, tuple(normalized_items)))
260-
payload = repr(tuple(normalized_rows)).encode("utf-8")
262+
payload = orjson.dumps(tuple(normalized_rows), option=orjson.OPT_SORT_KEYS)
261263
return sha256(payload).hexdigest()
262264

263265

docs/architecture.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -224,7 +224,8 @@ Security boundaries:
224224
- `cache_policy=refresh` rejected to preserve read-only semantics.
225225
- Review markers are session-local in-memory state, never persisted.
226226
- Run history bounded by `--history-limit` to prevent unbounded memory growth.
227-
- `git_diff_ref` validated against strict regex to prevent injection.
227+
- `git_diff_ref` validated as a safe single revision expression before any
228+
`git diff` subprocess call.
228229

229230
---
230231

docs/book/09-cli.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -75,6 +75,8 @@ Refs:
7575
- `--changed-only` requires either `--diff-against` or `--paths-from-git-diff`.
7676
- `--diff-against` requires `--changed-only`.
7777
- `--diff-against` and `--paths-from-git-diff` are mutually exclusive.
78+
- Git diff refs are validated as safe single revision expressions before
79+
subprocess execution.
7880
- Browser-open failure after a successful HTML write is warning-only and does not change the process exit code.
7981
- Baseline update write failure is contract error.
8082
- In gating mode, unreadable source files are contract errors with higher priority than clone gating failure.

docs/book/11-security-model.md

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -33,8 +33,9 @@ Security-relevant input classes:
3333
- `cache_policy=refresh` is rejected — MCP cannot trigger cache invalidation.
3434
- Review markers (`mark_finding_reviewed`) are session-local in-memory state;
3535
they are never persisted to disk or leaked into baselines/reports.
36-
- `git_diff_ref` parameter is validated against a strict regex to prevent
37-
command injection via shell-interpreted git arguments.
36+
- `git_diff_ref` is validated as a safe single revision expression before any
37+
`git diff` subprocess call. Leading option-like prefixes, whitespace/control
38+
characters, and unsupported punctuation are rejected.
3839
- Run history is bounded by `--history-limit` (default 10) to prevent
3940
unbounded memory growth.
4041

@@ -68,7 +69,7 @@ Refs:
6869
| HTML-injected payload in metadata/source | Escaped output |
6970
| `--allow-remote` not passed for HTTP | Transport rejected |
7071
| `cache_policy=refresh` requested | Policy rejected |
71-
| `git_diff_ref` fails regex | Parameter rejected |
72+
| `git_diff_ref` fails validation | Parameter rejected |
7273

7374
## Determinism / canonicalization
7475

0 commit comments

Comments
 (0)