Refactor/api redesign by galenlynch · Pull Request #60 · AllenNeuralDynamics/aind-zarr-utils

galenlynch · 2026-04-28T20:51:36Z

No description provided.

This is the first commit of the asset-centric API redesign described in the refactor plan. It moves I/O primitives out of zarr.py and pipeline_transformed.py into a new io/ subpackage: - io/metadata.py: direction_from_*, _units_to_meter, _unit_conversion - io/zarr.py: _open_zarr, zarr_to_numpy, _zarr_to_scaled, ensure_native_endian - io/paths.py: _asset_from_zarr_*, _zarr_base_name_*, alignment_zarr_uri_and_metadata_from_zarr_or_asset_pathlike - io/processing.py: _get_processing_pipeline_data, _get_zarr_import_process, _get_image_atlas_alignment_process, image_atlas_alignment_path_relative_from_processing - io/transforms.py: TransformChain, TemplatePaths, _PIPELINE_* constants, pipeline_transforms, all _pipeline_*_local_paths and pipeline_*_local_paths The original module paths re-export the moved symbols so existing imports and test patches keep working. mock_transform_path_resolution patches the new io.transforms location; mock_zarr_operations patches both io.zarr and the legacy zarr re-export so call sites in either module are intercepted. Behavior is unchanged. All 121 tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…age (C2) Second commit of the asset-centric API redesign. Splits the overlay system into two cohesive modules: - domain/overlays.py: concrete overlay classes (SpacingScaleOverlay, FlipIndexAxesOverlay, PermuteIndexAxesOverlay, ForceCornerAnchorOverlay, ForceCornerAnchorOverlayWithVersionWarning, SetLpsWorldSpacingOverlay) plus cardinal-direction helpers (_require_cardinal, lps_world_to_index_spacing_cardinal), Vec3 / _PIPELINE_MULTISCALE_FACTOR, and estimate_pipeline_multiscale. - domain/selector.py: the Overlay protocol, OverlayRule, OverlaySelector, apply_overlays, _as_date, _base_rules, get_selector, extend_selector, make_selector. Imports the two concrete overlays referenced by the default rule set from domain.overlays — selector → overlays only. pipeline_domain_selector.py becomes a re-export shim (also re-exporting fix_corner_compute_origin so the legacy patch target still resolves). mock_overlay_selector now patches both the new domain.selector home and the legacy pipeline_transformed re-export; the corner-anchor overlay test patches both domain.overlays and pipeline_domain_selector. Behavior is unchanged. All 121 tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Third commit of the asset-centric API redesign. Splits format-specific readers into their own subpackage: - formats/neuroglancer.py: full move of neuroglancer.py contents (neuroglancer_annotations_to_*, get_image_sources, all helpers). - formats/swc.py: a pure swc_data_to_indices(spacing, ...) helper that takes level-0 spacing explicitly and performs unit conversion + axis reordering + rounding. No Zarr opening, no Asset coupling. The legacy ``aind_zarr_utils.neuroglancer`` is now a re-export shim. The orchestration wrapper ``swc_data_to_zarr_indices`` in ``pipeline_transformed`` keeps its signature but delegates to ``formats.swc.swc_data_to_indices`` after fetching the spacing from ``_zarr_to_scaled``. mock_annotation_functions in conftest.py patches both the new formats.neuroglancer home and the legacy neuroglancer re-export so that call sites in either module are intercepted. Behavior is unchanged. All 121 tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… image.py (C4) Fourth commit of the asset-centric API redesign. Splits the heaviest duplication block (sitk vs ANTs apply-overlay paths, ~320 LOC) into one type-dispatching function plus a small pure helper for the level > 0 ANTs convention conversion. New module: - image.py: - _pipeline_anatomical_check_args (moved from pipeline_transformed) - _apply_pipeline_overlays_to_header (moved) - _build_pipeline_header (moved + renamed from _mimic_pipeline_anatomical_header) - _to_ants_convention(corrected_header_sitk, level, ants_shape, ants_direction) -> (spacing, origin) — pure helper. Encodes the spacing-reverse + scale + opposite-corner origin recompute that the level>0 ANTs path requires because ANTs and SimpleITK transpose numpy arrays relative to each other. - apply_pipeline_overlays(img, ...) — type-dispatches on sitk.Image vs ANTsImage. Level=0 path uses AnatomicalHeader.from_<backend>/update_<backend>; level>0 sitk scales spacing by 2**level; level>0 ants delegates to _to_ants_convention. Direction matrix is left unchanged (assumes overlays don't modify direction; true for the default rule set). In pipeline_transformed.py: - _pipeline_anatomical_check_args, _apply_pipeline_overlays_to_header, _mimic_pipeline_anatomical_header, _build_pipeline_header become re-exports from image.py. - apply_pipeline_overlays_to_sitk and _to_ants are now 6-line wrappers calling apply_pipeline_overlays. The old ~150-LOC bodies are gone. New regression tests (tests/test_image.py, +10 tests): - _to_ants_convention math is pinned for identity and RPI directions at levels 1, 2, 3. - The opposite-corner code derivation is verified. - Tests use the original inline formula as the regression oracle. Behavior is unchanged. All 121 existing tests still pass; 131 total now. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Fifth commit of the asset-centric API redesign. Introduces the new user-facing surface as a façade over the existing free functions — no behaviour change, no public-API removal. New modules: - origin.py: Origin tagged union with classmethods Origin.default(), Origin.at(point), Origin.at_corner(code, lps_point). Internal _legacy_kwargs() translates to the existing (set_origin, set_corner, set_corner_lps) triple so Asset can delegate to zarr_to_sitk / zarr_to_ants / zarr_to_sitk_stub without those functions changing. - asset.py: - TransformPaths: small frozen dataclass holding the four resolved transform-chain values (point_paths, point_invert, image_paths, image_invert) so callers don't have to remember tuple order. - Asset: dataclass owning zarr_uri + metadata + processing plus per-asset config. Three eager constructors (from_zarr, from_root, from_neuroglancer) that pre-open the Zarr and load metadata files; bare Asset(zarr_uri, metadata, processing) does no I/O. Lazy properties opened_zarr (caches _open_zarr) and transforms (caches pipeline_transforms_local_paths). Methods image(), stub(), apply_overlays() delegate to the existing free functions, threading opened_zarr=self.opened_zarr so a single workflow opens the Zarr exactly once. overlay_selector defaults via field(default_factory=get_selector) so the singleton is resolved at instance-creation time (the lru_cache guarantees it's the same singleton every call) — avoiding the import-time binding trap of `= get_selector()` default-arg. Asset and Origin are exported from the package __init__; the legacy free-function exports are unchanged. Tests (tests/test_asset.py, +14 tests): - Origin classmethods + frozen invariant + _legacy_kwargs translation. - Explicit Asset(...) does no I/O (mocks confirm zero S3 calls). - Default overlay_selector is the cached singleton. - opened_zarr is lazy and called exactly once per asset. - from_root pre-opens the Zarr; from_neuroglancer pulls the URL. - A single image() + stub() flow opens the Zarr once and threads the cached tuple to both delegated functions. - Origin defaults / at_corner propagate correctly to zarr_to_sitk. All 145 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Sixth commit of the asset-centric API redesign. Coordinate spaces become first-class types: every (N,3) point array now carries a Space tag, and Asset.transform(points, to=Space.X) walks a small transform graph instead of dispatching among a half-dozen named free functions. New module: - points.py: - Space enum: ZARR_INDICES, LS_SCALED_MM, LS_ANATOMICAL_MM, LS_PIPELINE_ANATOMICAL_MM, CCF_MM. - Points frozen dataclass: values (dict[str, NDArray]) + space + optional per-point descriptions (list[str|None]). - Points.from_neuroglancer wraps the existing reader and types descriptions as lists. Points.from_swc converts SWC coords to LS_SCALED_MM (unit conversion + axis reorder, no Zarr access). - Eight edge functions, one per directed adjacency: INDICES↔SCALED : per-axis multiply / divide INDICES↔ANAT : annotation_indices_to_anatomical with the base stub INDICES↔PIPE_ANAT: same with the pipeline-corrected stub PIPE_ANAT→CCF : per-layer apply_ants_transforms (point chain) CCF→PIPE_ANAT : batched apply_ants_transforms (image chain) Mirrors the math in the existing indices_to_ccf / ccf_to_indices helpers, including the per-layer-vs-batched distinction across direction. - Adjacency table forms a tree rooted at ZARR_INDICES; _path() walks it via BFS. transform_points(asset, pts, to=) does the dispatch. In asset.py: - Asset.transform(points, *, to) method delegates to transform_points via a local import (breaks the points → asset cycle that points.py resolves with TYPE_CHECKING). Asset, Origin, Points, Space all exported from the package __init__. Test/fixture updates: - conftest.py mock_apply_overlays now accepts **kwargs (the real apply_overlays gained a zarr_import_version kwarg in C2 — the mock was technically broken since then but didn't surface until C6 wired apply_overlays into the new path). - mock_overlay_selector now patches aind_zarr_utils.image.apply_overlays too (C4 introduced that call site). - mock_ants_transforms now patches aind_zarr_utils.points.apply_ants_transforms_to_point_arr for the new transform-graph edges. Tests (tests/test_points.py, +26 tests): - Points construction and from_swc / from_neuroglancer. - _path resolver: identity, adjacent, through-the-tree, sibling-via-root. - Each edge with the real on-disk Zarr fixture; CCF-crossing edges use mock_ants_transforms. - Asset.transform end-to-end across multi-hop paths. - Identity transform returns the input unchanged. - Descriptions propagate across hops. - Adjacency / dispatch consistency check. Behavior unchanged for legacy code paths. All 171 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Seventh commit of the asset-centric API redesign. Most C7 work landed alongside C5 and C6 (Asset/Origin/Points/Space tests, transform-graph tests, edge tests against the real Zarr fixture). This commit closes the two outstanding gaps the plan called out: 1. Multi-call cache invariant. A single Asset workflow that touches image() + stub() + transform() (across CCF) opens the Zarr exactly once and resolves the pipeline transform-chain paths exactly once. Catches future regressions where someone forgets to thread asset.opened_zarr or asset.transforms. 2. New-vs-legacy numerical equivalence. With identical metadata, processing, and overlay-selector inputs (and ANTs mocked deterministically), Asset.transform(Points.from_neuroglancer(ng), to=Space.CCF_MM) produces numerical values identical to the legacy neuroglancer_to_ccf(ng, ...) helper. Tests added: - tests/test_asset.py::TestAssetCachesAcrossCalls::test_full_workflow_opens_once_resolves_transforms_once - tests/test_points.py::test_neuroglancer_to_ccf_numerical_equivalence_new_vs_legacy The C4 regression test (level>0 ANTs convention) was already promoted to a permanent fixture as part of C4; no further action needed. The untracked reference demos (tests/demo_ng_ccf.py, tests/demo_swc.py, scripts/neuroglancer_injections_coords_utils.py) were also rewritten in-place against the new Asset/Points/Space API but remain untracked per the plan; they're not part of this commit. All 173 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Final commit of the asset-centric API redesign branch. Marks the four auto-discovery convenience helpers as deprecated (they have clean ``Asset``-based replacements with no internal use), bumps the version, and documents the redesign in the CHANGELOG. Deprecated (DeprecationWarning, will be removed in a future release): - ``neuroglancer_to_ccf_auto_metadata`` - ``swc_data_to_ccf_auto_metadata`` - ``indices_to_ccf_auto_metadata`` - ``ccf_to_indices_auto_metadata`` Each warning message names the precise ``Asset.from_*().transform(...)`` replacement. The lower-level free functions (``zarr_to_sitk`` / ``zarr_to_ants`` / ``zarr_to_sitk_stub`` / ``mimic_pipeline_zarr_to_anatomical_stub`` / ``neuroglancer_to_ccf`` / ``ccf_to_indices``) are NOT deprecated this release: they are still called internally by the new ``Asset`` façade and the transform-graph edges. Deprecating them would generate spurious warnings from internal use. They will be deprecated once ``Asset`` calls are routed away from them in a future commit. A pyproject ``filterwarnings`` entry suppresses the new DeprecationWarnings inside our own test suite — the legacy paths are still exercised as regression coverage for the deprecation cycle. Version: 0.14.0 → 0.15.0. Stays on 0.x per the agreed BC plan. All 173 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

+    @property
+    def name(self) -> str:
+        """Human-friendly identifier for the overlay (used in logs/audits)."""
+        ...


+    @property
+    def priority(self) -> int:
+        """Execution priority; lower numbers run earlier."""
+        ...


+        zarr_import_version: str | None = None,
+    ) -> AnatomicalHeader:
+        """Apply the overlay and return a new ``AnatomicalHeader``."""
+        ...


+from aind_zarr_utils.annotations import (
+    annotation_indices_to_anatomical as annotation_indices_to_anatomical,
+)


+from aind_zarr_utils.formats.neuroglancer import (
+    _extract_spacing as _extract_spacing,
+)


+from aind_zarr_utils.formats.neuroglancer import (
+    _get_layer_by_name as _get_layer_by_name,
+)


+from aind_zarr_utils.formats.neuroglancer import (
+    _process_annotation_layers as _process_annotation_layers,
+)


+from aind_zarr_utils.formats.neuroglancer import (
+    _process_layer_and_descriptions as _process_layer_and_descriptions,
+)


+from aind_zarr_utils.formats.neuroglancer import (
+    _resolve_layer_names as _resolve_layer_names,
+)


+from aind_zarr_utils.formats.neuroglancer import (
+    _sanitize_source_url as _sanitize_source_url,
+)


galenlynch and others added 8 commits April 28, 2026 11:01

github-code-quality Bot found potential problems Apr 28, 2026

View reviewed changes

refactor: port fixes on main to refactor branch

87678cd

galenlynch force-pushed the refactor/api-redesign branch from 55d3c35 to 87678cd Compare April 29, 2026 00:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor/api redesign#60

Refactor/api redesign#60
galenlynch wants to merge 9 commits intomainfrom
refactor/api-redesign

galenlynch commented Apr 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

galenlynch commented Apr 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant