Commit e83c17f

feat: use git to detect non-Python dependency file changes
Replace the fixed watched-file list with git-based change detection. mutmut now uses `git diff`/`git ls-files` to find every non-.py file changed since the last full run, falling back to the curated list when git is unavailable. A default exclude set (*.md, *.rst, docs/, LICENSE, etc.) drops files that never affect tests; users can extend it with `cache_invalidation_exclude`. The git commit and file hashes are persisted together as a baseline so a later git-less environment (e.g. a separate CI stage) can still detect changes to previously-tracked files by re-hashing them. New options: `use_git_change_detection` (default true) and `cache_invalidation_exclude`.
1 parent b539531 commit e83c17f
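
The persisted-baseline idea from the commit message can be illustrated with a small, self-contained sketch. It is not the code from this commit: `hash_files` and `changed_since` are hypothetical names, and the file names are made up. It assumes only what the message states, namely that content hashes recorded on a full run are re-computed later to spot changes without git.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_files(paths):
    """Map each existing path to a short content hash; missing files are omitted."""
    hashes = {}
    for p in paths:
        path = Path(p)
        if path.is_file():
            hashes[str(p)] = hashlib.sha256(path.read_bytes()).hexdigest()[:12]
    return hashes


def changed_since(baseline):
    """Baseline paths whose content differs now, or that no longer exist."""
    current = hash_files(baseline.keys())
    return {p for p in baseline if baseline[p] != current.get(p)}


with tempfile.TemporaryDirectory() as tmp:
    lock = Path(tmp) / "poetry.lock"
    data = Path(tmp) / "cities.csv"
    lock.write_text("requests==2.0\n")
    data.write_text("oslo\n")

    # "Full run": record the baseline (the real tool would also store the git commit).
    baseline = hash_files([str(lock), str(data)])

    # Later run without git: one file edited, one deleted, one brand-new file added.
    lock.write_text("requests==2.1\n")
    data.unlink()
    (Path(tmp) / "new.sql").write_text("select 1;\n")

    changed_names = sorted(Path(p).name for p in changed_since(baseline))
    print(changed_names)  # ['cities.csv', 'poetry.lock']
```

The brand-new `new.sql` is not reported, which matches the limitation described in the README change: without git, only the previously recorded set of files can be checked.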

6 files changed

Lines changed: 415 additions & 18 deletions

README.rst

Lines changed: 31 additions & 3 deletions
@@ -409,8 +409,19 @@ Changes outside your Python source — a dependency upgrade, a data file, a
 config file — cannot be tied to a function, so they would otherwise be missed
 and you would get cached results that no longer reflect reality.
 
-To catch this, mutmut hashes a set of build and dependency files and warns you
-when any of them change since the last run. By default it watches:
+To catch this, mutmut detects non-Python files that changed since the last full
+run and warns you about them. If your project is a git repository and git is
+installed, mutmut uses git (a soft dependency; no extra package is required) to
+find every changed non-Python file, respecting your `.gitignore`. Python files
+are excluded because their changes are already tracked per function.
+
+On a full run with git available, mutmut also records the content hashes of the
+tracked non-Python files. This means a later run in an environment without git
+(for example a different CI stage) can still detect changes to that known set of
+files, even though it cannot discover brand-new ones.
+
+When git is unavailable, mutmut falls back to hashing a curated set of build and
+dependency files:
 
 - `pyproject.toml`
 - `setup.cfg`
@@ -423,12 +434,22 @@ when any of them change since the last run. By default it watches:
 
 You can watch additional files (for example data files your tests depend on)
 with the `cache_invalidation_files` config, which accepts glob patterns
-resolved against the project root:
+resolved against the project root. These are checked even when git ignores them,
+and are never dropped by the exclusions below:
 
 .. code-block:: toml
 
     cache_invalidation_files = [ "queries/*.sql", "config/*.yaml" ]
 
+Git detection reports every changed non-Python file, so mutmut drops files that
+practically never affect tests (markdown, `LICENSE`, `CHANGELOG`, `docs/`, git
+and editor metadata, ...). Exclude additional noisy files with
+`cache_invalidation_exclude` (glob patterns, `*` spans directories):
+
+.. code-block:: toml
+
+    cache_invalidation_exclude = [ "*.json", "fixtures/snapshots/*" ]
+
 When a watched file changes, `on_dependency_change` controls what happens:
 
 - `warn` (default): list the changed files and keep the cache.
@@ -439,6 +460,13 @@ When a watched file changes, `on_dependency_change` controls what happens:
 
     on_dependency_change = "warn"
 
+Git detection is on by default; disable it (forcing the curated-list fallback)
+with:
+
+.. code-block:: toml
+
+    use_git_change_detection = false
+
 Changes to mutmut's own result-affecting config (such as `pytest_add_cli_args`,
 `type_check_command`, or the timeout settings) are always detected and
 invalidate the affected cached results automatically.
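
The exclusion rules described in this README change can be tried out with a short sketch. It mirrors the `fnmatch`-based matching used in this commit, but `DEFAULT_EXCLUDE` here is only a subset of the real defaults, and `is_excluded`, `user_files`, and `user_exclude` are illustrative names, not mutmut's API.

```python
import fnmatch

# Subset of the default exclusions in this commit, for illustration only.
DEFAULT_EXCLUDE = ("*.md", "LICENSE*", "docs/*")


def is_excluded(path, user_files, user_exclude):
    """True when `path` would be dropped as noise.

    Files matching an explicitly registered pattern are never excluded.
    """
    if any(fnmatch.fnmatch(path, pat) for pat in user_files):
        return False
    patterns = list(DEFAULT_EXCLUDE) + list(user_exclude)
    return any(fnmatch.fnmatch(path, pat) for pat in patterns)


user_files = ["queries/*.sql", "docs/schema.json"]  # like cache_invalidation_files
user_exclude = ["*.json"]                           # like cache_invalidation_exclude

paths = [
    "docs/guide/intro.txt",   # excluded: `*` in "docs/*" spans the nested directory
    "src/pkg/notes.md",       # excluded: "*.md" matches at any depth
    "fixtures/a/b.json",      # excluded by the user pattern "*.json"
    "docs/schema.json",       # kept: explicitly registered, so exclusions do not apply
    "vendor/LICENSE",         # kept: "LICENSE*" must match from the start of the path
    "queries/report.sql",     # kept: registered and not matched by any exclusion
]
results = {p: is_excluded(p, user_files, user_exclude) for p in paths}
for p, excluded in results.items():
    print(f"{p}: {'excluded' if excluded else 'reported'}")
```

Because `fnmatch` patterns must match the whole path, a pattern such as `LICENSE*` only covers files at the project root; a nested copy would need a pattern like `*/LICENSE*`.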

src/mutmut/__main__.py

Lines changed: 167 additions & 15 deletions
@@ -835,18 +835,166 @@ def _invalidate_stale_dependency_edges() -> set[str]:
     "Pipfile.lock",
 )
 
+# Files that practically never affect test behavior. Git change detection otherwise
+# surfaces every non-.py file in the repo, so these are dropped to cut the noise.
+# Users extend this via the ``cache_invalidation_exclude`` config; anything they
+# explicitly register in ``cache_invalidation_files`` is never excluded. Patterns are
+# matched with fnmatch (``*`` spans path separators).
+_DEFAULT_INVALIDATION_EXCLUDE = (
+    "*.md",
+    "*.rst",
+    "LICENSE*",
+    "COPYING*",
+    "NOTICE*",
+    "AUTHORS*",
+    "CHANGELOG*",
+    "CHANGES*",
+    ".gitignore",
+    ".gitattributes",
+    ".editorconfig",
+    ".pre-commit-config.yaml",
+    "docs/*",
+    "doc/*",
+)
+
+
+def _hash_files(paths: Iterable[str]) -> dict[str, str]:
+    """Content hash each existing path; missing files are simply omitted."""
+    hashes: dict[str, str] = {}
+    for p in paths:
+        path = Path(p)
+        if path.is_file():
+            hashes[p] = hashlib.sha256(path.read_bytes()).hexdigest()[:12]
+    return hashes
+
 
 def compute_watched_file_hashes() -> dict[str, str]:
     """Map watched-file path -> content hash for the default set plus user globs."""
     patterns = list(_DEFAULT_WATCHED_FILES) + list(Config.get().cache_invalidation_files)
-    hashes: dict[str, str] = {}
-    for pattern in patterns:
-        for path in sorted(Path(".").glob(pattern)):
-            if path.is_file():
-                hashes[str(path)] = hashlib.sha256(path.read_bytes()).hexdigest()[:12]
+    paths = [str(path) for pattern in patterns for path in sorted(Path(".").glob(pattern))]
+    return _hash_files(paths)
+
+
+def _run_git(args: list[str]) -> str | None:
+    """Run a git command at the project root. Returns stdout, or None on any failure
+    (git not installed, not a repo, unknown ref, ...). Git is a soft dependency: this
+    never raises so callers can silently fall back to content hashing.
+    """
+    try:
+        result = subprocess.run(["git", *args], capture_output=True, text=True, check=False)
+    except OSError:
+        return None
+    if result.returncode != 0:
+        return None
+    return result.stdout
+
+
+def git_head() -> str | None:
+    """The current HEAD commit, or None when git / a repo / a commit is unavailable."""
+    out = _run_git(["rev-parse", "HEAD"])
+    return out.strip() if out else None
+
+
+def git_changed_non_py_files(since_ref: str) -> set[str] | None:
+    """Non-.py files changed since ``since_ref`` (tracked diffs against the working tree,
+    including uncommitted edits, plus new untracked files). ``.py`` files are excluded
+    because the per-function hashes already track them. Returns None if git cannot answer.
+    """
+    diff = _run_git(["diff", "--name-only", since_ref, "--"])
+    if diff is None:
+        return None
+    untracked = _run_git(["ls-files", "--others", "--exclude-standard"]) or ""
+    files = {line for line in (diff + "\n" + untracked).splitlines() if line}
+    return {f for f in files if not f.endswith(".py")}
+
+
+def git_tracked_non_py_files() -> set[str] | None:
+    """Every non-.py file git knows about (tracked + untracked-not-ignored), or None if
+    git cannot answer. Recorded on a full run so a later git-less run can still detect
+    changes to these files by re-hashing them.
+    """
+    out = _run_git(["ls-files", "--cached", "--others", "--exclude-standard"])
+    if out is None:
+        return None
+    return {line for line in out.splitlines() if line and not line.endswith(".py")}
+
+
+def _changed_hashed_files(restrict_to: list[str] | None = None) -> set[str]:
+    """Baseline files whose content changed, by re-hashing them now.
+
+    Re-hashes every path in the stored baseline (which, after a full run with git, is
+    the comprehensive set of non-.py files) plus any newly-appearing curated/user-glob
+    files. This is how a git-less run still detects changes to files git discovered.
+    ``restrict_to`` limits the result to paths matching those glob patterns.
+    """
+    old = state().old_watched_file_hashes
+    if not old:
+        return set()
+    new = _hash_files(old.keys())
+    new.update(compute_watched_file_hashes())  # pick up newly-added curated/user files
+    changed = {p for p in old.keys() | new.keys() if old.get(p) != new.get(p)}
+    if restrict_to is not None:
+        changed = {p for p in changed if any(fnmatch.fnmatch(p, pat) for pat in restrict_to)}
+    return changed
+
+
+def _is_excluded(path: str, config: Config) -> bool:
+    """Whether ``path`` should be dropped from change reporting as noise.
+
+    Files explicitly registered in ``cache_invalidation_files`` are never excluded.
+    """
+    if any(fnmatch.fnmatch(path, pat) for pat in config.cache_invalidation_files):
+        return False
+    patterns = list(_DEFAULT_INVALIDATION_EXCLUDE) + list(config.cache_invalidation_exclude)
+    return any(fnmatch.fnmatch(path, pat) for pat in patterns)
+
+
+def _changed_dependency_files() -> set[str]:
+    """Files changed since the last full run that the per-function hashes cannot track.
+
+    Prefers git (catches every non-.py file in the repo and respects .gitignore) and
+    falls back to hashing a curated set of build/dependency files when git is
+    unavailable. Silent on the first run (no baseline to compare against). Noisy files
+    (see ``_DEFAULT_INVALIDATION_EXCLUDE`` and ``cache_invalidation_exclude``) are dropped.
+    """
+    config = Config.get()
+    old_commit = state().old_git_commit
+    if config.use_git_change_detection and old_commit is not None:
+        git_changed = git_changed_non_py_files(old_commit)
+        if git_changed is not None:
+            # also catch explicitly-registered files that git ignores
+            changed = git_changed | _changed_hashed_files(restrict_to=config.cache_invalidation_files)
+        else:
+            changed = _changed_hashed_files()
+    else:
+        changed = _changed_hashed_files()
+    return {p for p in changed if not _is_excluded(p, config)}
+
+
+def _compute_baseline_file_hashes() -> dict[str, str]:
+    """The set of non-.py files to track, hashed. Always includes the curated/user-glob
+    files; when git is available it also records every tracked non-.py file (minus noise)
+    so a later git-less run can still detect changes to them.
+    """
+    config = Config.get()
+    hashes = compute_watched_file_hashes()
+    if config.use_git_change_detection:
+        tracked = git_tracked_non_py_files()
+        if tracked is not None:
+            hashes.update(_hash_files(sorted(p for p in tracked if not _is_excluded(p, config))))
     return hashes
 
 
+def _refresh_change_detection_baseline() -> None:
+    """Snapshot the current git commit and tracked-file hashes as the new baseline.
+
+    Only called on a full run; cached runs keep the previous baseline so a ``warn``
+    keeps firing until the cache is actually rebuilt.
+    """
+    state().git_commit = git_head()
+    state().watched_file_hashes = _compute_baseline_file_hashes()
+
+
 def _reset_mutant_results(should_reset: Callable[[str, int], bool]) -> int:
     """Reset cached verdicts to ``None`` (forcing a re-test) where ``should_reset`` holds.
 
@@ -871,27 +1019,24 @@ def _reset_mutant_results(should_reset: Callable[[str, int], bool]) -> int:
 
 
 def _report_watched_file_changes() -> bool:
-    """Surface changes to watched config/dependency files.
+    """Surface non-Python files that changed since the last full run.
 
     Returns True only when the configured policy is ``rerun`` and something changed,
-    asking the caller to reset all results. Silent when no prior hashes exist.
+    asking the caller to reset all results. Silent when there is no baseline yet.
     """
-    old = state().old_watched_file_hashes
-    if not old:
-        return False
-    new = compute_watched_file_hashes()
-    changed = sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))
+    changed = _changed_dependency_files()
     if not changed:
         return False
 
     policy = Config.get().on_dependency_change
     if policy == "ignore":
         return False
+    listed = sorted(changed)
     if policy == "rerun":
-        print(f" {len(changed)} watched file(s) changed; rerunning all mutants: {', '.join(changed)}")
+        print(f" {len(listed)} non-Python file(s) changed; rerunning all mutants: {', '.join(listed)}")
         return True
     # default: warn but keep the cache
-    print(f" Warning: {len(changed)} watched file(s) changed since the last run: {', '.join(changed)}")
+    print(f" Warning: {len(listed)} non-Python file(s) changed since the last full run: {', '.join(listed)}")
     print(" These cannot be tracked for behavioral changes, so cached results were kept.")
     print(' If the changes affect your tests, delete the mutants/ directory or set on_dependency_change = "rerun".')
     return False
@@ -945,6 +1090,8 @@ def collect_or_load_stats(
     force_full = _apply_config_change_invalidation(mutants_caught_by_type_checker or {})
 
     if not did_load or force_full:
+        # A full run rebuilds the cache, so reset the change-detection baseline to "now".
+        _refresh_change_detection_baseline()
         # Run full stats
         run_stats_collection(runner)
     else:
@@ -986,6 +1133,10 @@ def load_stats() -> bool:
                 state().function_dependencies[k] = set(v)
             state().old_config_fingerprint = data.pop("config_fingerprint", {})
             state().old_watched_file_hashes = data.pop("watched_file_hashes", {})
+            state().old_git_commit = data.pop("git_commit", None)
+            # Preserve the loaded baseline; only a full run refreshes it.
+            state().watched_file_hashes = state().old_watched_file_hashes
+            state().git_commit = state().old_git_commit
             assert not data, data
             did_load = True
     except (FileNotFoundError, JSONDecodeError):
@@ -1003,7 +1154,8 @@ def save_stats() -> None:
                 function_hashes=state().current_function_hashes,
                 function_dependencies={k: list(v) for k, v in state().function_dependencies.items()},
                 config_fingerprint=Config.get().config_fingerprint(),
-                watched_file_hashes=compute_watched_file_hashes(),
+                watched_file_hashes=state().watched_file_hashes,
+                git_commit=state().git_commit,
             ),
             f,
             indent=4,
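
The git-facing helpers in this file can be exercised in isolation with a sketch like the one below. It follows the same pattern as `_run_git` and `git_changed_non_py_files` in the diff (same git arguments, `None` on any failure), but the output parsing is pulled into a separate hypothetical function, `non_py_files`, so it can be checked without a repository. The names `run_git`, `non_py_files`, and `changed_non_py_files` are stand-ins for this illustration.

```python
import subprocess


def run_git(args):
    """Run git and return stdout, or None on any failure (git missing, bad ref, ...)."""
    try:
        result = subprocess.run(["git", *args], capture_output=True, text=True, check=False)
    except OSError:
        return None
    if result.returncode != 0:
        return None
    return result.stdout


def non_py_files(diff_output, untracked_output):
    """Merge two newline-separated path listings and drop blanks and .py files."""
    lines = (diff_output + "\n" + untracked_output).splitlines()
    return {line for line in lines if line and not line.endswith(".py")}


def changed_non_py_files(since_ref):
    """Non-.py files changed since `since_ref`, or None when git cannot answer."""
    diff = run_git(["diff", "--name-only", since_ref, "--"])
    if diff is None:
        return None
    # Untracked files never show up in `git diff`, so they are listed separately.
    untracked = run_git(["ls-files", "--others", "--exclude-standard"]) or ""
    return non_py_files(diff, untracked)


# Parsing can be checked without a repository:
sample = non_py_files("poetry.lock\nsrc/app.py\n", "data/new.csv\n")
print(sorted(sample))  # ['data/new.csv', 'poetry.lock']

# An invalid invocation yields None instead of raising, whether or not git is installed.
bad = run_git(["--no-such-option-for-this-sketch"])
print(bad)  # None
```

Returning `None` rather than raising is what lets the caller fall back to content hashing silently, as the docstring of `_run_git` describes.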

src/mutmut/configuration.py

Lines changed: 4 additions & 0 deletions
@@ -146,7 +146,9 @@ def _load_config() -> Config:
         track_dependencies=s("track_dependencies", True),
         dependency_tracking_depth=s("dependency_tracking_depth", None),
         cache_invalidation_files=s("cache_invalidation_files", []),
+        cache_invalidation_exclude=s("cache_invalidation_exclude", []),
         on_dependency_change=s("on_dependency_change", "warn"),
+        use_git_change_detection=s("use_git_change_detection", True),
     )
 
 

@@ -172,7 +174,9 @@ class Config:
     track_dependencies: bool
     dependency_tracking_depth: int | None
     cache_invalidation_files: list[str]
+    cache_invalidation_exclude: list[str]
     on_dependency_change: str
+    use_git_change_detection: bool
 
     def config_fingerprint(self) -> dict[str, str]:
         """Hash the config fields that can change cached mutant *results*, grouped so the

src/mutmut/state.py

Lines changed: 6 additions & 0 deletions
@@ -12,7 +12,13 @@ class MutmutState:
     # changes the per-function source hashes cannot see. Empty when absent (pre-upgrade
     # cache or first run), in which case no invalidation is triggered.
     old_config_fingerprint: dict[str, str] = field(default_factory=dict)
+    # Change-detection baselines describe the state at the *last full run*. The ``old_``
+    # values are what we compare against; the others are what gets persisted (only
+    # refreshed on a full run, so a ``warn`` keeps firing until the cache is rebuilt).
     old_watched_file_hashes: dict[str, str] = field(default_factory=dict)
+    watched_file_hashes: dict[str, str] = field(default_factory=dict)
+    old_git_commit: str | None = None
+    git_commit: str | None = None
 
 
 _state: MutmutState | None = None
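
The split between the `old_` fields and their persisted counterparts can be modelled with a toy simulation. This is a simplified sketch, not mutmut's control flow: `SketchState` and `one_run` are hypothetical, hashes are hard-coded strings, and only the file-hash baseline is modelled (the git commit would follow the same pattern).

```python
from dataclasses import dataclass, field


@dataclass
class SketchState:
    old_hashes: dict = field(default_factory=dict)  # baseline loaded from disk, compared against
    hashes: dict = field(default_factory=dict)      # baseline that will be written back


def one_run(saved, current, full_run):
    """Simulate one invocation; returns (changed paths, baseline to persist)."""
    state = SketchState()
    # Loading: the saved baseline is the comparison point and, by default, is re-saved as-is.
    state.old_hashes = dict(saved)
    state.hashes = dict(saved)
    changed = []
    if state.old_hashes:
        changed = sorted(
            p for p in state.old_hashes.keys() | current.keys()
            if state.old_hashes.get(p) != current.get(p)
        )
    if full_run:
        # Only a full run moves the baseline forward to "now".
        state.hashes = dict(current)
    return changed, state.hashes


on_disk = {"poetry.lock": "aaa111"}
current = {"poetry.lock": "bbb222"}

first, on_disk = one_run(on_disk, current, full_run=False)
second, on_disk = one_run(on_disk, current, full_run=False)
_, on_disk = one_run(on_disk, current, full_run=True)
after_full, on_disk = one_run(on_disk, current, full_run=False)

print(first, second, after_full)  # ['poetry.lock'] ['poetry.lock'] []
```

Two cached runs in a row report the same change because the persisted baseline is untouched; after the full run the baseline matches the current files and the report goes quiet.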
