ENH: image and transform baselines and experiment notebook testing by aylward · Pull Request #26 · Project-MONAI/physiomotion4d

aylward · 2026-02-09T22:28:08Z

- Add TestTools (test_tools.py): compare 2D/3D image slices and ITK transforms to baselines with configurable tolerances; ITK .mha I/O
- Add notebook_utils.running_as_test() for reduced params when run via pytest
- Add --run-experiments flag, experiment marker, and tests/baselines
- Use TestTools in test_register_time_series_images for image and transform comparison
- Update experiment notebooks and test docs (EXPERIMENT_FLAG_USAGE, EXPERIMENT_TESTS_GUIDE)


Copilot

Pull request overview

This PR adds baseline-based regression testing utilities for images/transforms and introduces a “running as test” mechanism for experiment notebooks (via PHYSIOMOTION_RUNNING_AS_TEST) so notebooks can run with reduced parameters under pytest.

Changes:

- Added TestTools utilities to write/compare ITK images/transforms against baselines with tolerances.
- Added notebook_utils.running_as_test() and updated experiment test runner to set PHYSIOMOTION_RUNNING_AS_TEST=1 when executing notebooks.
- Updated time-series registration tests and multiple experiment notebooks/docs to use the new testing flow.
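The test-mode mechanism described above can be sketched as follows. This is an assumption about how `running_as_test()` might work (checking for the value "1"); the actual `notebook_utils` implementation may accept other values. `notebook_subprocess_env()` and `number_of_iterations` are hypothetical names used only for illustration.

```python
import os

FLAG = "PHYSIOMOTION_RUNNING_AS_TEST"


def running_as_test():
    """Return True when PHYSIOMOTION_RUNNING_AS_TEST is set to "1" (sketch only)."""
    return os.environ.get(FLAG, "") == "1"


def notebook_subprocess_env():
    """Hypothetical helper: copy the current environment and turn on test mode.

    A test runner could pass this as ``env=`` to subprocess.run() when
    executing a notebook.
    """
    return {**os.environ, FLAG: "1"}


# Inside a notebook: quick parameters under pytest, full ones otherwise
# (the parameter name and values are illustrative).
number_of_iterations = 2 if running_as_test() else 200
```

Using an environment variable rather than a notebook parameter means the same notebook file runs unchanged both interactively and under pytest.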

Reviewed changes

Copilot reviewed 39 out of 40 changed files in this pull request and generated 11 comments.


File	Description
tests/test_register_time_series_images.py	Switches from “write artifacts” to baseline comparison workflow for key outputs.
tests/test_experiments.py	Passes `PHYSIOMOTION_RUNNING_AS_TEST=1` into notebook execution subprocess env.
tests/conftest.py	Adds `tests/baselines` directory to the `test_directories` fixture; refactors a data-path fixture.
tests/baselines/.gitkeep	Ensures baselines directory is present in the repo.
tests/EXPERIMENT_TESTS_GUIDE.md	Documents `PHYSIOMOTION_RUNNING_AS_TEST` and recommended notebook checks.
tests/EXPERIMENT_FLAG_USAGE.md	Documents the new test-mode flag behavior and links to guide.
src/physiomotion4d/test_tools.py	New baseline comparison/writer utilities for ITK images/transforms.
src/physiomotion4d/physiomotion4d_base.py	Adds warning filters for specific SWIG-related DeprecationWarnings.
src/physiomotion4d/notebook_utils.py	Adds `running_as_test()` helper to detect test-mode in notebooks.
pyproject.toml	Enables always-on warnings (`-W always`) in pytest addopts.
experiments/README.md	Documents the test-mode flag for experiment notebooks.
experiments/Reconstruct4DCT/reconstruct_4d_ct_class.ipynb	Uses `running_as_test()` to select quick vs full run parameters.
experiments/Reconstruct4DCT/reconstruct_4d_ct.ipynb	Notebook execution metadata updated (timings).
experiments/Lung-GatedCT_To_USD/Experiment_SubSurfaceScatter.ipynb	Notebook execution metadata updated (timings).
experiments/Lung-GatedCT_To_USD/Experiment_SegReg.ipynb	Notebook execution metadata updated (timings).
experiments/Lung-GatedCT_To_USD/Experiment_CombineModels.ipynb	Notebook execution metadata updated (timings).
experiments/Lung-GatedCT_To_USD/Experiment_ArrangeOnStage.ipynb	Notebook execution metadata updated (timings).
experiments/Lung-GatedCT_To_USD/2-paint_dirlab_models.ipynb	Notebook execution metadata updated (timings).
experiments/Lung-GatedCT_To_USD/1-make_dirlab_models.ipynb	Notebook execution metadata updated (timings).
experiments/Lung-GatedCT_To_USD/0-register_dirlab_4dct.ipynb	Notebook execution metadata updated (timings).
experiments/Heart-VTKSeries_To_USD/1-heart_vtkseries_to_usd.ipynb	Notebook execution metadata updated (timings).
experiments/Heart-VTKSeries_To_USD/0-download_and_convert_4d_to_3d.ipynb	Notebook execution metadata updated (timings).
experiments/Heart-Statistical_Model_To_Patient/heart_model_to_patient.ipynb	Notebook execution metadata updated (timings).
experiments/Heart-Statistical_Model_To_Patient/heart_model_to_model_registration_pca.ipynb	Uses `running_as_test()` to reduce iterations when executed under pytest.
experiments/Heart-Statistical_Model_To_Patient/heart_model_to_model_icp_itk.ipynb	Notebook execution metadata updated (timings) and widget state changes.
experiments/Heart-GatedCT_To_USD/test_vista3d_inMem.ipynb	Notebook execution metadata and widget state changes.
experiments/Heart-GatedCT_To_USD/test_vista3d_class.ipynb	Notebook execution metadata and widget state changes.
experiments/Heart-GatedCT_To_USD/4-merge_dynamic_and_static_usd.ipynb	Notebook execution metadata updated (timings).
experiments/Heart-GatedCT_To_USD/3-transform_dynamic_and_static_contours.ipynb	Notebook execution metadata updated (timings).
experiments/Heart-GatedCT_To_USD/2-generate_segmentation.ipynb	Notebook execution metadata updated (timings).
experiments/Heart-GatedCT_To_USD/1-register_images.ipynb	Notebook execution metadata updated (timings).
experiments/Heart-GatedCT_To_USD/0-download_and_convert_4d_to_3d.ipynb	Now contains execution counts and outputs (not cleared).
experiments/Heart-Create_Statistical_Model/4-surfaces_aligned_correspond_to_pca_inputs.ipynb	Notebook execution metadata and widget state changes.
experiments/Heart-Create_Statistical_Model/2-input_surfaces_to_surfaces_aligned.ipynb	Notebook execution metadata and widget state changes.
experiments/Heart-Create_Statistical_Model/1-input_meshes_to_input_surfaces.ipynb	Notebook execution metadata and widget state changes.
experiments/Convert_VTK_To_USD/convert_vtk_to_usd_using_class.ipynb	Notebook execution metadata updated (timings).
experiments/Convert_VTK_To_USD/convert_chop_valve_to_usd.ipynb	Notebook execution metadata updated (timings).
experiments/Colormap-VTK_To_USD/colormap_vtk_to_usd.ipynb	Notebook execution metadata updated (timings).
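The summary does not show which SWIG-related DeprecationWarnings physiomotion4d_base.py filters. The sketch below assumes they are the common "builtin type ... has no __module__ attribute" messages emitted by SWIG-generated bindings; the pattern list and `ignore_swig_deprecation_warnings()` are hypothetical.

```python
import warnings

# Assumed message patterns; the actual filters in physiomotion4d_base.py may differ.
SWIG_DEPRECATION_PATTERNS = (
    r"builtin type SwigPyPacked has no __module__ attribute",
    r"builtin type SwigPyObject has no __module__ attribute",
    r"builtin type swigvarlink has no __module__ attribute",
)


def ignore_swig_deprecation_warnings():
    """Install 'ignore' filters for the assumed SWIG DeprecationWarning messages.

    filterwarnings() matches ``message`` as a regex against the start of the
    warning text and inserts each filter at the front of the filter list.
    """
    for pattern in SWIG_DEPRECATION_PATTERNS:
        warnings.filterwarnings("ignore", message=pattern, category=DeprecationWarning)
```

Narrow message-based filters keep unrelated DeprecationWarnings visible, which matters once pytest runs with `-W always`.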


Copilot · 2026-02-09T22:34:01Z

+            moving_image, "basic_time_series_registered_0.mha"
+        )
+        test_tools.compare_result_to_baseline_image(
+            "basic_time_series_registered_0.mha",
        )


The baseline image comparison result is ignored here as well, so the test can pass even if the registered image differs from the baseline. Please assert the boolean return value (or raise on failure in TestTools).
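A minimal sketch of the requested fix, assuming the comparison returns True on a match. `compare_image_to_baseline` is a hypothetical stand-in for `TestTools.compare_result_to_baseline_image` that works on NumPy arrays instead of .mha files; its tolerance parameters are also illustrative. The point is that the caller asserts the result, so a mismatch fails the test.

```python
import numpy as np


def compare_image_to_baseline(result, baseline, max_abs_diff=1e-3, max_num_above_tol=0):
    """Hypothetical stand-in for a baseline image comparison.

    Returns True when the arrays have the same shape and no more than
    ``max_num_above_tol`` pixels differ by more than ``max_abs_diff``.
    """
    result = np.asarray(result, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    if result.shape != baseline.shape:
        return False
    num_above_tol = int(np.sum(np.abs(result - baseline) > max_abs_diff))
    return num_above_tol <= max_num_above_tol


def test_registered_image_matches_baseline():
    baseline = np.zeros((4, 4))
    registered = baseline + 1e-5  # small numerical noise, within tolerance
    # Asserting the return value is what turns a mismatch into a test failure.
    assert compare_image_to_baseline(registered, baseline), "registered image differs from baseline"
```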

Copilot · 2026-02-09T22:34:02Z

+            forward_transforms[0], "prior_forward_transform_0.hdf"
+        )
+        test_tools.compare_result_to_baseline_transform(
+            "prior_forward_transform_0.hdf",
+        )


The baseline transform comparison returns a boolean but isn't asserted, so regressions won't fail this test. Please assert the return value (or have TestTools raise on failure).
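The alternative mentioned in these comments, having the tools raise instead of returning a flag, means a forgotten `assert` cannot hide a regression. A minimal sketch; `BaselineMismatchError` and `require_match` are hypothetical names, not part of TestTools.

```python
class BaselineMismatchError(AssertionError):
    """Raised when a result does not match its baseline.

    Subclassing AssertionError makes pytest report it as an ordinary test failure.
    """


def require_match(matched, result_name):
    """Convert a boolean comparison result into an exception on mismatch."""
    if not matched:
        raise BaselineMismatchError(f"{result_name} does not match its baseline")


# Usage inside a test (the comparison call is illustrative):
# require_match(
#     test_tools.compare_result_to_baseline_transform("prior_forward_transform_0.hdf"),
#     "prior_forward_transform_0.hdf",
# )
```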

Copilot · 2026-02-09T22:34:02Z

+            registered_image, "transform_application_time_series_0.mha"
+        )
+        test_tools.compare_result_to_baseline_image(
+            "transform_application_time_series_0.mha",
        )


The baseline comparison return value is ignored here too, so this test can pass even if the image differs from the baseline. Please assert the return value (or raise on failure).

Copilot · 2026-02-09T22:34:02Z

+        if not baseline_path.exists():
+            shutil.copy(str(results_path), str(baseline_path))
+            self.log_warning(
+                "Baseline transform did not exist; copied results transform: %s",
+                results_path,


Auto-creating the baseline by copying the current result when the baseline file is missing makes the comparison meaningless on a fresh checkout/CI run (it will always pass). Consider failing when the baseline is missing by default, and only allowing baseline generation behind an explicit flag/env var (e.g. UPDATE_BASELINES=1).
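A minimal sketch of the suggested behavior: fail when the baseline is missing, and copy the current result only when baseline creation is explicitly requested. `UPDATE_BASELINES` is the variable named in the comment; `ensure_baseline` and its `create_baselines` parameter are hypothetical (a later commit in this PR mentions `pytest --create-baselines`, which could feed that parameter).

```python
import os
import shutil
from pathlib import Path


def ensure_baseline(results_path, baseline_path, create_baselines=None):
    """Make sure a baseline exists before comparing against it.

    If the baseline is missing, raise FileNotFoundError unless creation was
    explicitly requested, via ``create_baselines=True`` or, when that argument
    is None, the ``UPDATE_BASELINES=1`` environment variable.
    Returns True if a new baseline was written, False if one already existed.
    """
    results_path = Path(results_path)
    baseline_path = Path(baseline_path)
    if create_baselines is None:
        create_baselines = os.environ.get("UPDATE_BASELINES") == "1"

    if baseline_path.exists():
        return False
    if not create_baselines:
        raise FileNotFoundError(
            f"Baseline {baseline_path} is missing; rerun with UPDATE_BASELINES=1 "
            "to create it from the current result."
        )
    baseline_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(results_path, baseline_path)
    return True
```

With this default, a fresh checkout or CI run with no baselines fails loudly instead of silently passing.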

Copilot · 2026-02-09T22:34:02Z

+        self._last_transform_num_values_above_tol = int(
+            np.sum(diff_squared > per_value_tol)
+        )


diff_squared is compared against per_value_tol here, which makes tolerance semantics inconsistent (squared values vs linear tolerance). Either compare abs(diff) > per_value_tol, or rename/document the argument as a squared tolerance.
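A minimal sketch of the first option, comparing the absolute per-parameter difference against a linear tolerance. Only `diff_squared` and `per_value_tol` appear in the PR; the function name is hypothetical.

```python
import numpy as np


def count_values_above_tol(params, baseline_params, per_value_tol):
    """Count parameters whose absolute difference exceeds a linear tolerance."""
    diff = np.asarray(params, dtype=float) - np.asarray(baseline_params, dtype=float)
    # Compare |diff| with the tolerance so both are in the same (linear) units.
    return int(np.sum(np.abs(diff) > per_value_tol))


# With a tolerance of 0.1, a difference of 0.2 exceeds it, but its square
# (0.04) does not, so comparing diff**2 to the tolerance would miss it.
print(count_values_above_tol([1.0, 2.2, 3.0], [1.0, 2.0, 3.05], 0.1))  # prints 1
```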

Copilot · 2026-02-09T22:34:03Z

+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "'./results//slice_fixed.mha'"


This notebook cell has committed execution output (non-empty outputs). Please clear cell outputs (and any widget state/output) before committing so the notebook remains deterministic and the repo doesn’t grow from embedded outputs.

Copilot · 2026-02-09T22:34:03Z

+            forward_transforms[0], "basic_forward_transform_0.hdf"
+        )
+        test_tools.compare_result_to_baseline_transform(
+            "basic_forward_transform_0.hdf",
+        )


The baseline transform comparison returns a boolean but the test doesn't assert it, so mismatches will not fail the test (only get logged). Please assert the return value (or have TestTools raise on failure).

Copilot · 2026-02-09T22:34:03Z

+            moving_image, "prior_time_series_registered_0.mha"
+        )
+        test_tools.compare_result_to_baseline_image(
+            "prior_time_series_registered_0.mha",
+        )


The baseline image comparison return value is ignored, so mismatches won't fail the test. Please assert the return value (or have TestTools raise on failure).

Copilot · 2026-02-09T22:34:03Z

+        transform = itk.transformread(str(results_path))
+        transform_params = np.array(transform[0].GetParameters())
+
+        baseline_transform = itk.transformread(str(baseline_path))
+        baseline_transform_params = np.array(baseline_transform[0].GetParameters())


transformread() returns a list; this code assumes both reads return at least one transform and that parameter vectors are the same length. Please validate the number of transforms read and that len(parameters) matches before computing the diff, and raise a clear error otherwise.
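A sketch of the validation requested here. In the PR the lists come from `itk.transformread()`; this helper only relies on each element exposing `GetParameters()`, so it can be exercised with a small stand-in class. `transform_parameter_diff` and `FakeTransform` are hypothetical.

```python
import numpy as np


def transform_parameter_diff(transforms, baseline_transforms):
    """Return the parameter difference between the first transforms of two lists.

    Raises ValueError with a descriptive message if either list is empty, the
    lists hold different numbers of transforms, or the parameter vectors have
    different lengths.
    """
    if len(transforms) == 0:
        raise ValueError("No transform was read from the results file")
    if len(baseline_transforms) == 0:
        raise ValueError("No transform was read from the baseline file")
    if len(transforms) != len(baseline_transforms):
        raise ValueError(
            f"Transform count mismatch: result has {len(transforms)}, "
            f"baseline has {len(baseline_transforms)}"
        )
    params = np.asarray(transforms[0].GetParameters(), dtype=float)
    baseline_params = np.asarray(baseline_transforms[0].GetParameters(), dtype=float)
    if params.shape != baseline_params.shape:
        raise ValueError(
            f"Parameter count mismatch: result has {params.size}, "
            f"baseline has {baseline_params.size}"
        )
    return params - baseline_params


class FakeTransform:
    """Hypothetical stand-in for an ITK transform, for illustration only."""

    def __init__(self, params):
        self._params = params

    def GetParameters(self):
        return self._params


print(transform_parameter_diff([FakeTransform([1.0, 2.0])], [FakeTransform([1.0, 1.5])]).tolist())
# prints [0.0, 0.5]
```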

Copilot · 2026-02-09T22:34:04Z

   "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 1,
   "metadata": {
    "execution": {
-     "iopub.execute_input": "2026-02-04T02:35:36.101493Z",
-     "iopub.status.busy": "2026-02-04T02:35:36.100494Z",
-     "iopub.status.idle": "2026-02-04T02:35:51.037775Z",
-     "shell.execute_reply": "2026-02-04T02:35:51.036978Z"
+     "iopub.execute_input": "2026-02-09T04:51:59.431418Z",


This notebook is committed with non-null execution_count values. The experiment test runner is designed to clear outputs/execution counts to keep the repo clean, so please reset execution_count back to null before committing.
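The notebook comments above (committed outputs, widget state, and non-null execution counts) can be handled together before committing. A minimal sketch using only the standard library; it assumes the nbformat 4 JSON layout (`cells`, `outputs`, `execution_count`, per-cell `metadata["execution"]` timings, and widget state under the notebook-level `metadata["widgets"]`). `clear_notebook` is a hypothetical name, not the experiment test runner's actual code.

```python
import json
from pathlib import Path


def clear_notebook(path):
    """Strip outputs, execution counts, timing metadata, and widget state from an .ipynb file.

    Assumes the nbformat 4 JSON layout. Returns True if the file was modified.
    """
    path = Path(path)
    nb = json.loads(path.read_text(encoding="utf-8"))
    changed = False
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        if cell.get("outputs"):
            cell["outputs"] = []
            changed = True
        if cell.get("execution_count") is not None:
            cell["execution_count"] = None
            changed = True
        # Per-cell timing metadata such as "iopub.execute_input".
        if cell.get("metadata", {}).pop("execution", None) is not None:
            changed = True
    # Saved widget state lives in the notebook-level metadata.
    if nb.get("metadata", {}).pop("widgets", None) is not None:
        changed = True
    if changed:
        path.write_text(json.dumps(nb, indent=1, ensure_ascii=False) + "\n", encoding="utf-8")
    return changed
```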


Copilot AI review requested due to automatic review settings February 9, 2026 22:28

aylward changed the title ~~ENH: image and transform baseline testing and experiment notebook tes…~~ ENH: image and transform baselines and experiment notebook testing Feb 9, 2026

Copilot started reviewing on behalf of aylward February 9, 2026 22:29

Copilot AI reviewed Feb 9, 2026


Add baseline comparison tests for time-series image registration

ee1cc10

- TestTools: compare_result_to_baseline_transform and compare_result_to_baseline_image
- TestRegisterTimeSeriesImages: baseline .hdf transforms and .mha images
- pytest --create-baselines support; CI and test docs updated

aylward merged commit a291bbf into main Feb 12, 2026
11 checks passed

aylward deleted the testing_img_tfm branch February 12, 2026 17:17

aylward merged 2 commits into main from testing_img_tfm
