Skip to content

Commit 527c4d0

Browse files
authored
Merge pull request #6204 from hjmjohnson/ingest-strategy-v4
COMP: Refine v4 ingestion pipeline — nested-if fix and global whitelist
2 parents 64c794e + f6505da commit 527c4d0

12 files changed

Lines changed: 136 additions & 183 deletions

Utilities/Maintenance/RemoteModuleIngest/INGESTION_STRATEGY_v4.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -275,7 +275,8 @@ by the v4 sanitizer; operators do not need to fix them by hand:
275275
| `cmake_minimum_required(VERSION X.Y.Z)` line at the top of an ingested module's `CMakeLists.txt` | `sanitize-history.py:patch_drop_cmake_minimum_required` | ITK's top-level CMakeLists pins a higher minimum; per-module declarations are redundant and frequently **lower** than the ITK floor (3.10.2 is common upstream). (@dzenanz, PR #6215 IOFDF) |
276276
| `README.rst` references in CMake `file(READ ...)` calls | `sanitize-history.py:patch_readme_reference` | Phase B archival promotes `MIGRATION_README.md` to `README.md`; in-tree consumers read the markdown form |
277277
| `get_filename_component(...) / file(READ README.md DOCUMENTATION)` preamble + `DESCRIPTION "${DOCUMENTATION}"` in `itk-module.cmake` | `sanitize-history.py:patch_dynamic_description` | Archival `README.md` contains semicolons and `[` characters that CMake list expansion splits into spurious `itk_module()` arguments, producing `CMake Warning (dev): Unknown argument` on every configure (observed: RLEImage, SplitComponents, IOFDF, IOMeshMZ3, IOMeshSTL; fixed in ITK PR #6220) |
278-
| `*.orig`, `*.rej`, `*.BACKUP.*`, `*.LOCAL.*`, `*.REMOTE.*`, `*.BASE.*` | deny-pass | Leftover merge-conflict artifacts |
278+
| `*.orig`, `*.rej`, `*.BACKUP.*`, `*.LOCAL.*`, `*.REMOTE.*`, `*.BASE.*`, `*~` | deny-pass | Leftover merge-conflict artifacts and editor backup files |
279+
| `.sha`, `.sha512`, `.cid` content-link sidecars missing trailing newline | `sanitize-history.py:apply_universal_text_fixers` (via `fix_end_of_file`) | ghostflow-check-main rejects commits with EOF-newline-missing in any text blob, including hash sidecars; the universal text fixers preserve hash content and append the missing `\n` (observed: TextureFeatures PR #6238) |
279280
| Scaffolding (`Dockerfile*`, `azure-pipelines*.yml`, `.github/`, `.travis.yml`, `.circleci/`, `tox.ini`, `pyproject.toml`, `setup.py`, `.clang-format`, `.pre-commit-config.yaml`, …) | deny-pass | Module's per-repo CI/packaging is irrelevant in-tree |
280281

281282
Each sanitizer prints a `<count> patches` line in the run summary so

Utilities/Maintenance/RemoteModuleIngest/ingest-module-v4.sh

Lines changed: 11 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,7 @@
1818
# --upstream-url URL Override the GIT_REPOSITORY parsed from
1919
# Modules/Remote/<Module>.remote.cmake.
2020
# --whitelist FILE Override the default whitelist location
21-
# (whitelists/<Module>.list, this directory).
21+
# (whitelists/default.list, this directory).
2222
# --dry-run Run filter-repo + sanitize passes but do
2323
# NOT merge into ITK. Reports what would
2424
# land.
@@ -108,12 +108,13 @@ if ! $DRY_RUN && [[ -n "$(git -C "$ITK_SRC" status --porcelain)" ]]; then
108108
die "Working tree not clean; commit or stash first (or use --dry-run)"
109109
fi
110110

111-
# Resolve whitelist
111+
# Resolve whitelist. Default to the shared `whitelists/default.list`
112+
# which covers the canonical remote-module layout; modules with
113+
# unusual upstream layouts can override via --whitelist FILE.
112114
if [[ -z "$WHITELIST" ]]; then
113-
WHITELIST="$SCRIPT_DIR/whitelists/$MODULE.list"
115+
WHITELIST="$SCRIPT_DIR/whitelists/default.list"
114116
fi
115-
[[ -r "$WHITELIST" ]] || die "Whitelist not readable: $WHITELIST
116-
Create it from a sibling module's whitelist as a starting point."
117+
[[ -r "$WHITELIST" ]] || die "Whitelist not readable: $WHITELIST"
117118

118119
# Resolve clang-format / gersemi config files from the destination ITK tree
119120
CLANG_FORMAT_STYLE="$ITK_SRC/.clang-format"
@@ -246,7 +247,12 @@ info "Running scaffolding deny-pattern strip pass..."
246247
--path-glob '**/*.LOCAL.*' \
247248
--path-glob '**/*.REMOTE.*' \
248249
--path-glob '**/*.BASE.*' \
250+
--path-glob '**/*~' \
249251
--path-glob '**/.ExternalData_*' \
252+
--path-glob 'LICENSE' \
253+
--path-glob 'LICENSE.*' \
254+
--path-glob 'COPYING' \
255+
--path-glob 'COPYING.*' \
250256
--prune-empty always
251257
) || die "filter-repo deny-pass failed"
252258

Utilities/Maintenance/RemoteModuleIngest/sanitize-history.py

Lines changed: 93 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -232,13 +232,12 @@ def is_binary(head: bytes) -> bool:
232232

233233

234234
def is_skip_content(data: bytes, head: bytes) -> bool:
235-
"""Return True for content that should be left untouched (CID/sha
236-
content-links, VTK volumes, SVG, etc.)."""
235+
"""Return True for content that should be left untouched (VTK volumes,
236+
SVG, etc.). Hash sidecar files (.sha / .sha512 / .cid) are NOT skipped
237+
— they need universal text fixers so the hash gets a trailing newline,
238+
which ghostflow-check-main rejects when missing."""
237239
if any(s in head[:512] for s in SKIP_HINTS):
238240
return True
239-
# Single-token hex hash file (CID content-link sidecar)
240-
if data.count(b"\n") <= 1 and HEX_HASH_RE.match(data.strip()):
241-
return True
242241
return False
243242

244243

@@ -451,17 +450,95 @@ def patch_dynamic_description(data: bytes) -> tuple[bytes, bool]:
451450
"""Remove the file(READ README.md DOCUMENTATION) preamble and replace
452451
DESCRIPTION "${DOCUMENTATION}" with a static one-liner.
453452
454-
The archival README.md contains semicolons and bracket characters that
455-
CMake list expansion splits into spurious itk_module() arguments,
456-
producing CMake (dev) configure warnings. Returns (new_data, changed).
453+
Only rewrite `${DOCUMENTATION}` when the file(READ) preamble was the
454+
source. When DOCUMENTATION is defined via a literal `set(DOCUMENTATION
455+
"...")` block, the variable expands to safe static text and may be
456+
used as DESCRIPTION as-is.
457457
"""
458458
new_data, n1 = _README_FILE_READ_RE.subn(b"", data)
459-
new_data, n2 = _DOCUMENTATION_DESCRIPTION_RE.subn(
460-
rb'\1"Module ingested from upstream."', new_data
461-
)
459+
n2 = 0
460+
if n1 > 0:
461+
new_data, n2 = _DOCUMENTATION_DESCRIPTION_RE.subn(
462+
rb'\1"Module ingested from upstream."', new_data
463+
)
462464
return new_data, (n1 + n2) > 0
463465

464466

467+
# Strip per-module CMake scaffolding that is dead code in-tree:
468+
# * itk_module_examples() — examples/ directory not ingested
469+
# * cmake_policy(CMP...) — ITK top-level pins all policies
470+
# * doc-build option blocks — doc/ directory not ingested
471+
# Reviewers flag each of these on every ingest (e.g. PRs #6238, #6240).
472+
_ITK_MODULE_EXAMPLES_RE = re.compile(
473+
rb"^[ \t]*itk_module_examples\s*\([^)]*\)[ \t]*\n",
474+
re.IGNORECASE | re.MULTILINE,
475+
)
476+
_CMAKE_POLICY_RE = re.compile(
477+
rb"^[ \t]*cmake_policy\s*\(\s*(?:SET\s+CMP|VERSION\s)[^)]*\)[ \t]*\n",
478+
re.IGNORECASE | re.MULTILINE,
479+
)
480+
_DOC_OPTION_BLOCK_RE = re.compile(
481+
rb"^[ \t]*(?:cmake_dependent_option|option)\s*\("
482+
rb"\s*Module_\$\{[^}]+\}_BUILD_DOCUMENTATION[^)]*\)[ \t]*\n",
483+
re.IGNORECASE | re.MULTILINE,
484+
)
485+
_DOC_IF_OPEN_RE = re.compile(
486+
rb"^[ \t]*if\s*\(\s*Module_\$\{[^}]+\}_BUILD_DOCUMENTATION[^)]*\)[ \t]*\n",
487+
re.IGNORECASE | re.MULTILINE,
488+
)
489+
_CMAKE_IF_OPEN_RE = re.compile(rb"^[ \t]*if\s*\(", re.IGNORECASE | re.MULTILINE)
490+
_CMAKE_ENDIF_RE = re.compile(
491+
rb"^[ \t]*endif\s*\([^)]*\)[ \t]*\n", re.IGNORECASE | re.MULTILINE
492+
)
493+
494+
495+
def _strip_doc_if_blocks(data: bytes) -> tuple[bytes, int]:
496+
"""Depth-aware removal of `if(Module_${...}_BUILD_DOCUMENTATION) ... endif()`
497+
blocks, accounting for nested `if(...)` so the regex doesn't close on an
498+
inner endif (greptile finding on PR #6263)."""
499+
out = bytearray()
500+
pos = 0
501+
removed = 0
502+
while True:
503+
m = _DOC_IF_OPEN_RE.search(data, pos)
504+
if not m:
505+
out.extend(data[pos:])
506+
break
507+
out.extend(data[pos : m.start()])
508+
depth = 1
509+
scan = m.end()
510+
while depth > 0:
511+
next_if = _CMAKE_IF_OPEN_RE.search(data, scan)
512+
next_endif = _CMAKE_ENDIF_RE.search(data, scan)
513+
if not next_endif:
514+
out.extend(data[m.start() :])
515+
return bytes(out), removed
516+
if next_if and next_if.start() < next_endif.start():
517+
depth += 1
518+
scan = next_if.end()
519+
else:
520+
depth -= 1
521+
scan = next_endif.end()
522+
pos = scan
523+
removed += 1
524+
return bytes(out), removed
525+
526+
527+
def patch_drop_module_scaffolding(data: bytes) -> tuple[bytes, bool]:
528+
"""Strip itk_module_examples(), cmake_policy(...), and doc-build
529+
option/if blocks from CMake blobs. Returns (new_data, changed)."""
530+
total = 0
531+
new_data, n = _ITK_MODULE_EXAMPLES_RE.subn(b"", data)
532+
total += n
533+
new_data, n = _CMAKE_POLICY_RE.subn(b"", new_data)
534+
total += n
535+
new_data, n = _strip_doc_if_blocks(new_data)
536+
total += n
537+
new_data, n = _DOC_OPTION_BLOCK_RE.subn(b"", new_data)
538+
total += n
539+
return new_data, total > 0
540+
541+
465542
# Upstream remote modules wrap their top-level CMakeLists.txt in a
466543
# standalone-build guard so the same file works both as a
467544
# remote-fetched module and as a stand-alone CMake project:
@@ -579,6 +656,7 @@ def __init__(
579656
self.dynamic_description_patches = 0
580657
self.standalone_guard_patches = 0
581658
self.cmake_min_required_drops = 0
659+
self.module_scaffolding_drops = 0
582660
self.blob_count = 0
583661
self.format_changes = 0
584662
self.kind_counts: dict[str, int] = {}
@@ -607,6 +685,9 @@ def blob_callback(self, blob: fr.Blob, _metadata: dict) -> None:
607685
new_data, cmake_min_dropped = patch_drop_cmake_minimum_required(new_data)
608686
if cmake_min_dropped:
609687
self.cmake_min_required_drops += 1
688+
new_data, scaffolding_dropped = patch_drop_module_scaffolding(new_data)
689+
if scaffolding_dropped:
690+
self.module_scaffolding_drops += 1
610691
new_data = fmt_cmake(new_data, self.gersemi_bin, self.gersemi_config)
611692
new_data, readme_patched = patch_readme_reference(new_data)
612693
if readme_patched:
@@ -792,6 +873,7 @@ def main() -> int:
792873
f" dynamic DESCRIPTION rm:{sanitizer.dynamic_description_patches}\n"
793874
f" standalone-guard rm: {sanitizer.standalone_guard_patches}\n"
794875
f" cmake_min_required rm: {sanitizer.cmake_min_required_drops}\n"
876+
f" module-scaffolding rm: {sanitizer.module_scaffolding_drops}\n"
795877
f" blobs scanned: {sanitizer.blob_count}\n"
796878
f" blobs reformatted: {sanitizer.format_changes}\n"
797879
f" by-kind sniff: "

Utilities/Maintenance/RemoteModuleIngest/whitelists/IOFDF.list

Lines changed: 0 additions & 24 deletions
This file was deleted.

Utilities/Maintenance/RemoteModuleIngest/whitelists/IOMeshMZ3.list

Lines changed: 0 additions & 24 deletions
This file was deleted.

Utilities/Maintenance/RemoteModuleIngest/whitelists/IOMeshSTL.list

Lines changed: 0 additions & 25 deletions
This file was deleted.

Utilities/Maintenance/RemoteModuleIngest/whitelists/PolarTransform.list

Lines changed: 0 additions & 21 deletions
This file was deleted.

Utilities/Maintenance/RemoteModuleIngest/whitelists/PrincipalComponentsAnalysis.list

Lines changed: 0 additions & 8 deletions
This file was deleted.

Utilities/Maintenance/RemoteModuleIngest/whitelists/SplitComponents.list

Lines changed: 0 additions & 21 deletions
This file was deleted.

Utilities/Maintenance/RemoteModuleIngest/whitelists/SubdivisionQuadEdgeMeshFilter.list

Lines changed: 0 additions & 20 deletions
This file was deleted.

0 commit comments

Comments
 (0)