
Commit f4fcf6d

ENH: v4 sanitize plugs ghostflow gaps for hash sidecars and *~ artifacts
Two ghostflow-check-main violations slipped through v4 sanitization on TextureFeatures ingest (PR InsightSoftwareConsortium#6238):

1. .sha512 / .cid content-link sidecars without trailing newline. is_skip_content() returned True for any single-line hex hash blob, bypassing apply_universal_text_fixers (and therefore fix_end_of_file). Hash content has zero risk from the universal fixers (rstrip + single '\n' append), so drop the hex-hash skip branch and let those blobs flow through the normal text-fixer path.

2. Editor backup files (*~) survived ingestion. Adds **/*~ to the filter-repo --invert-paths deny-pass alongside *.orig / *.rej / *.BACKUP.* / *.LOCAL.* / *.REMOTE.* / *.BASE.*.

Documented both in INGESTION_STRATEGY_v4.md's sanitizer table.
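The message describes the universal text fixers only as "rstrip + single '\n' append". The following is a minimal Python sketch of that end-of-file step. `fix_end_of_file` here is a hypothetical stand-in written from that one-line description, because the real helper in `sanitize-history.py` is not part of this diff. The sketch shows why a hex hash sidecar gains a newline while its digits stay unchanged: hex characters are never whitespace, so `rstrip()` cannot remove them.

```python
def fix_end_of_file(data: bytes) -> bytes:
    """Hypothetical stand-in for the sanitizer's EOF fixer.

    Strips trailing whitespace (spaces, tabs, CR, LF) and appends exactly one
    newline. Assumption: empty blobs are returned unchanged.
    """
    if not data:
        return data
    return data.rstrip() + b"\n"


# A .sha512 sidecar is a single 128-character hex line, here without a newline.
sha512_sidecar = b"3b1f" * 32
fixed = fix_end_of_file(sha512_sidecar)

print(len(sha512_sidecar), len(fixed))  # the fixer adds exactly one byte here
print(fixed.rstrip(b"\n") == sha512_sidecar)  # hash digits are untouched
```

Because `data.rstrip() + b"\n"` maps its own output to itself, rerunning this fixer on an already-fixed blob is harmless.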
1 parent: 6d786b5

3 files changed

Lines changed: 7 additions & 6 deletions


Utilities/Maintenance/RemoteModuleIngest/INGESTION_STRATEGY_v4.md

Lines changed: 2 additions & 1 deletion
@@ -275,7 +275,8 @@ by the v4 sanitizer; operators do not need to fix them by hand:
 | `cmake_minimum_required(VERSION X.Y.Z)` line at the top of an ingested module's `CMakeLists.txt` | `sanitize-history.py:patch_drop_cmake_minimum_required` | ITK's top-level CMakeLists pins a higher minimum; per-module declarations are redundant and frequently **lower** than the ITK floor (3.10.2 is common upstream). (@dzenanz, PR #6215 IOFDF) |
 | `README.rst` references in CMake `file(READ ...)` calls | `sanitize-history.py:patch_readme_reference` | Phase B archival promotes `MIGRATION_README.md` to `README.md`; in-tree consumers read the markdown form |
 | `get_filename_component(...) / file(READ README.md DOCUMENTATION)` preamble + `DESCRIPTION "${DOCUMENTATION}"` in `itk-module.cmake` | `sanitize-history.py:patch_dynamic_description` | Archival `README.md` contains semicolons and `[` characters that CMake list expansion splits into spurious `itk_module()` arguments, producing `CMake Warning (dev): Unknown argument` on every configure (observed: RLEImage, SplitComponents, IOFDF, IOMeshMZ3, IOMeshSTL; fixed in ITK PR #6220) |
-| `*.orig`, `*.rej`, `*.BACKUP.*`, `*.LOCAL.*`, `*.REMOTE.*`, `*.BASE.*` | deny-pass | Leftover merge-conflict artifacts |
+| `*.orig`, `*.rej`, `*.BACKUP.*`, `*.LOCAL.*`, `*.REMOTE.*`, `*.BASE.*`, `*~` | deny-pass | Leftover merge-conflict artifacts and editor backup files |
+| `.sha`, `.sha512`, `.cid` content-link sidecars missing trailing newline | `sanitize-history.py:apply_universal_text_fixers` (via `fix_end_of_file`) | ghostflow-check-main rejects commits with EOF-newline-missing in any text blob, including hash sidecars; the universal text fixers preserve hash content and append the missing `\n` (observed: TextureFeatures PR #6238) |
 | Scaffolding (`Dockerfile*`, `azure-pipelines*.yml`, `.github/`, `.travis.yml`, `.circleci/`, `tox.ini`, `pyproject.toml`, `setup.py`, `.clang-format`, `.pre-commit-config.yaml`, …) | deny-pass | Module's per-repo CI/packaging is irrelevant in-tree |
 
 Each sanitizer prints a `<count> patches` line in the run summary so

Utilities/Maintenance/RemoteModuleIngest/ingest-module-v4.sh

Lines changed: 1 addition & 0 deletions
@@ -246,6 +246,7 @@ info "Running scaffolding deny-pattern strip pass..."
 --path-glob '**/*.LOCAL.*' \
 --path-glob '**/*.REMOTE.*' \
 --path-glob '**/*.BASE.*' \
+--path-glob '**/*~' \
 --path-glob '**/.ExternalData_*' \
 --path-glob 'LICENSE' \
 --path-glob 'LICENSE.*' \
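To check which paths a deny-pass like this would strip, a small Python sketch can test candidate paths against the globs. It assumes fnmatch-style matching in which `*` also matches `/`. That is an assumption about git-filter-repo's `--path-glob` behavior and should be confirmed against its documentation. `DENY_GLOBS` holds only the subset of patterns visible in this hunk, and `denied_by` is a hypothetical helper, not part of `ingest-module-v4.sh`.

```python
from fnmatch import fnmatchcase

# Only the deny-pass globs visible in this hunk; the real list is longer.
DENY_GLOBS = [
    "**/*.LOCAL.*",
    "**/*.REMOTE.*",
    "**/*.BASE.*",
    "**/*~",
    "**/.ExternalData_*",
    "LICENSE",
    "LICENSE.*",
]


def denied_by(path: str) -> list[str]:
    """Return every glob in DENY_GLOBS that matches `path` (case-sensitive fnmatch)."""
    return [glob for glob in DENY_GLOBS if fnmatchcase(path, glob)]


for candidate in ["src/itkFoo.cxx~", "include/itkFoo.h", "test/Baseline/x.BASE.png", "LICENSE"]:
    print(candidate, "->", denied_by(candidate) or "kept")
```

Under plain `fnmatch` semantics, a top-level `foo~` contains no `/` and therefore does not match `**/*~`, because the `**/` prefix still requires a slash. Whether that edge case matters depends on how `--path-glob` actually matches paths.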

Utilities/Maintenance/RemoteModuleIngest/sanitize-history.py

Lines changed: 4 additions & 5 deletions
@@ -232,13 +232,12 @@ def is_binary(head: bytes) -> bool:
 
 
 def is_skip_content(data: bytes, head: bytes) -> bool:
-    """Return True for content that should be left untouched (CID/sha
-    content-links, VTK volumes, SVG, etc.)."""
+    """Return True for content that should be left untouched (VTK volumes,
+    SVG, etc.). Hash sidecar files (.sha / .sha512 / .cid) are NOT skipped
+    — they need universal text fixers so the hash gets a trailing newline,
+    which ghostflow-check-main rejects when missing."""
     if any(s in head[:512] for s in SKIP_HINTS):
         return True
-    # Single-token hex hash file (CID content-link sidecar)
-    if data.count(b"\n") <= 1 and HEX_HASH_RE.match(data.strip()):
-        return True
     return False
 
 
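The behavioral change can be sketched by running the old and new predicates side by side. `SKIP_HINTS` and `HEX_HASH_RE` are defined elsewhere in `sanitize-history.py` and are not shown in this diff, so the values below are hypothetical placeholders chosen only to make the sketch runnable. The `_old` / `_new` function names are likewise illustrative.

```python
import re

# Hypothetical placeholders; the real definitions live elsewhere in sanitize-history.py.
SKIP_HINTS = (b"# vtk DataFile", b"<svg")
HEX_HASH_RE = re.compile(rb"[0-9a-fA-F]+\Z")


def is_skip_content_old(data: bytes, head: bytes) -> bool:
    """Logic before this commit: format hints OR a single-line hex hash."""
    if any(s in head[:512] for s in SKIP_HINTS):
        return True
    # Single-token hex hash file (content-link sidecar) was skipped too.
    if data.count(b"\n") <= 1 and HEX_HASH_RE.match(data.strip()):
        return True
    return False


def is_skip_content_new(data: bytes, head: bytes) -> bool:
    """Logic after this commit: only the format hints trigger a skip."""
    return any(s in head[:512] for s in SKIP_HINTS)


sha512_blob = b"a" * 128  # stands in for a .sha512 sidecar with no trailing newline
vtk_blob = b"# vtk DataFile Version 3.0\nexample\n"

for name, blob in [("sha512 sidecar", sha512_blob), ("VTK volume", vtk_blob)]:
    old = is_skip_content_old(blob, blob[:512])
    new = is_skip_content_new(blob, blob[:512])
    print(f"{name}: old skip={old}, new skip={new}")
```

With these placeholders, the sidecar is skipped by the old predicate but not by the new one, so it now reaches the universal text fixers. The VTK blob is skipped by both.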