ENH: image and transform baselines and experiment notebook testing #26
…t support
- Add TestTools (test_tools.py): compare 2D/3D image slices and ITK transforms to baselines with configurable tolerances; ITK .mha I/O
- Add notebook_utils.running_as_test() for reduced params when run via pytest
- Add --run-experiments flag, experiment marker, and tests/baselines
- Use TestTools in test_register_time_series_images for image and transform comparison
- Update experiment notebooks and test docs (EXPERIMENT_FLAG_USAGE, EXPERIMENT_TESTS_GUIDE)
Pull request overview
This PR adds baseline-based regression testing utilities for images/transforms and introduces a “running as test” mechanism for experiment notebooks (via PHYSIOMOTION_RUNNING_AS_TEST) so notebooks can run with reduced parameters under pytest.
Changes:
- Added TestTools utilities to write/compare ITK images/transforms against baselines with tolerances.
- Added notebook_utils.running_as_test() and updated experiment test runner to set PHYSIOMOTION_RUNNING_AS_TEST=1 when executing notebooks.
- Updated time-series registration tests and multiple experiment notebooks/docs to use the new testing flow.
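The test-mode mechanism summarized above could look roughly like the following. This is a minimal sketch and not the PR's code: it assumes running_as_test() only checks whether PHYSIOMOTION_RUNNING_AS_TEST equals "1", and the helpers select_iterations and notebook_env are hypothetical names introduced for illustration.

```python
import os

ENV_VAR = "PHYSIOMOTION_RUNNING_AS_TEST"


def running_as_test() -> bool:
    """Return True when the test runner has set the test-mode variable.

    Assumption: the runner exports the variable as the string "1".
    """
    return os.environ.get(ENV_VAR) == "1"


def select_iterations(full: int = 100, quick: int = 2) -> int:
    """Hypothetical helper: pick a reduced iteration count in test mode."""
    return quick if running_as_test() else full


def notebook_env() -> dict:
    """Hypothetical helper: environment for a notebook subprocess.

    Copies the current environment and adds the test-mode flag, so the
    parent process environment is left unchanged.
    """
    return {**os.environ, ENV_VAR: "1"}
```

A notebook cell would then call select_iterations() (or branch on running_as_test() directly) so that a full run and a pytest run share the same code path and differ only in parameters.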
Reviewed changes
Copilot reviewed 39 out of 40 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| tests/test_register_time_series_images.py | Switches from “write artifacts” to baseline comparison workflow for key outputs. |
| tests/test_experiments.py | Passes PHYSIOMOTION_RUNNING_AS_TEST=1 into notebook execution subprocess env. |
| tests/conftest.py | Adds tests/baselines directory to the test_directories fixture; refactors a data-path fixture. |
| tests/baselines/.gitkeep | Ensures baselines directory is present in the repo. |
| tests/EXPERIMENT_TESTS_GUIDE.md | Documents PHYSIOMOTION_RUNNING_AS_TEST and recommended notebook checks. |
| tests/EXPERIMENT_FLAG_USAGE.md | Documents the new test-mode flag behavior and links to guide. |
| src/physiomotion4d/test_tools.py | New baseline comparison/writer utilities for ITK images/transforms. |
| src/physiomotion4d/physiomotion4d_base.py | Adds warning filters for specific SWIG-related DeprecationWarnings. |
| src/physiomotion4d/notebook_utils.py | Adds running_as_test() helper to detect test-mode in notebooks. |
| pyproject.toml | Enables always-on warnings (-W always) in pytest addopts. |
| experiments/README.md | Documents the test-mode flag for experiment notebooks. |
| experiments/Reconstruct4DCT/reconstruct_4d_ct_class.ipynb | Uses running_as_test() to select quick vs full run parameters. |
| experiments/Reconstruct4DCT/reconstruct_4d_ct.ipynb | Notebook execution metadata updated (timings). |
| experiments/Lung-GatedCT_To_USD/Experiment_SubSurfaceScatter.ipynb | Notebook execution metadata updated (timings). |
| experiments/Lung-GatedCT_To_USD/Experiment_SegReg.ipynb | Notebook execution metadata updated (timings). |
| experiments/Lung-GatedCT_To_USD/Experiment_CombineModels.ipynb | Notebook execution metadata updated (timings). |
| experiments/Lung-GatedCT_To_USD/Experiment_ArrangeOnStage.ipynb | Notebook execution metadata updated (timings). |
| experiments/Lung-GatedCT_To_USD/2-paint_dirlab_models.ipynb | Notebook execution metadata updated (timings). |
| experiments/Lung-GatedCT_To_USD/1-make_dirlab_models.ipynb | Notebook execution metadata updated (timings). |
| experiments/Lung-GatedCT_To_USD/0-register_dirlab_4dct.ipynb | Notebook execution metadata updated (timings). |
| experiments/Heart-VTKSeries_To_USD/1-heart_vtkseries_to_usd.ipynb | Notebook execution metadata updated (timings). |
| experiments/Heart-VTKSeries_To_USD/0-download_and_convert_4d_to_3d.ipynb | Notebook execution metadata updated (timings). |
| experiments/Heart-Statistical_Model_To_Patient/heart_model_to_patient.ipynb | Notebook execution metadata updated (timings). |
| experiments/Heart-Statistical_Model_To_Patient/heart_model_to_model_registration_pca.ipynb | Uses running_as_test() to reduce iterations when executed under pytest. |
| experiments/Heart-Statistical_Model_To_Patient/heart_model_to_model_icp_itk.ipynb | Notebook execution metadata updated (timings) and widget state changes. |
| experiments/Heart-GatedCT_To_USD/test_vista3d_inMem.ipynb | Notebook execution metadata and widget state changes. |
| experiments/Heart-GatedCT_To_USD/test_vista3d_class.ipynb | Notebook execution metadata and widget state changes. |
| experiments/Heart-GatedCT_To_USD/4-merge_dynamic_and_static_usd.ipynb | Notebook execution metadata updated (timings). |
| experiments/Heart-GatedCT_To_USD/3-transform_dynamic_and_static_contours.ipynb | Notebook execution metadata updated (timings). |
| experiments/Heart-GatedCT_To_USD/2-generate_segmentation.ipynb | Notebook execution metadata updated (timings). |
| experiments/Heart-GatedCT_To_USD/1-register_images.ipynb | Notebook execution metadata updated (timings). |
| experiments/Heart-GatedCT_To_USD/0-download_and_convert_4d_to_3d.ipynb | Now contains execution counts and outputs (not cleared). |
| experiments/Heart-Create_Statistical_Model/4-surfaces_aligned_correspond_to_pca_inputs.ipynb | Notebook execution metadata and widget state changes. |
| experiments/Heart-Create_Statistical_Model/2-input_surfaces_to_surfaces_aligned.ipynb | Notebook execution metadata and widget state changes. |
| experiments/Heart-Create_Statistical_Model/1-input_meshes_to_input_surfaces.ipynb | Notebook execution metadata and widget state changes. |
| experiments/Convert_VTK_To_USD/convert_vtk_to_usd_using_class.ipynb | Notebook execution metadata updated (timings). |
| experiments/Convert_VTK_To_USD/convert_chop_valve_to_usd.ipynb | Notebook execution metadata updated (timings). |
| experiments/Colormap-VTK_To_USD/colormap_vtk_to_usd.ipynb | Notebook execution metadata updated (timings). |
        moving_image, "basic_time_series_registered_0.mha"
    )
    test_tools.compare_result_to_baseline_image(
        "basic_time_series_registered_0.mha",
    )
The baseline image comparison result is ignored here as well, so the test can pass even if the registered image differs from the baseline. Please assert the boolean return (or raise on failure in TestTools).
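One way to act on this suggestion is to wrap the comparison in an assertion. The sketch below is illustrative only: FakeTestTools is a hypothetical stand-in for the real TestTools class, and check_registered_image is a hypothetical helper. It relies only on what the review states, namely that the comparison method returns a boolean.

```python
class FakeTestTools:
    """Hypothetical stand-in for TestTools; returns a fixed comparison result."""

    def __init__(self, matches: bool) -> None:
        self._matches = matches

    def compare_result_to_baseline_image(self, filename: str) -> bool:
        # The real method is described in the review as returning a boolean.
        return self._matches


def check_registered_image(test_tools, filename: str) -> None:
    """Fail the calling test when the comparison reports a mismatch."""
    assert test_tools.compare_result_to_baseline_image(filename), (
        f"{filename} differs from its baseline"
    )
```

The same pattern applies to compare_result_to_baseline_transform. The alternative mentioned in the comment, raising inside TestTools, removes the risk of a caller forgetting the assert.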
        forward_transforms[0], "prior_forward_transform_0.hdf"
    )
    test_tools.compare_result_to_baseline_transform(
        "prior_forward_transform_0.hdf",
    )
The baseline transform comparison returns a boolean but isn't asserted, so regressions won't fail this test. Please assert the return value (or have TestTools raise on failure).
        registered_image, "transform_application_time_series_0.mha"
    )
    test_tools.compare_result_to_baseline_image(
        "transform_application_time_series_0.mha",
    )
The baseline comparison return value is ignored here too, so this test can pass even if the image differs from the baseline. Please assert the return value (or raise on failure).
    if not baseline_path.exists():
        shutil.copy(str(results_path), str(baseline_path))
        self.log_warning(
            "Baseline transform did not exist; copied results transform: %s",
            results_path,
Auto-creating the baseline by copying the current result when the baseline file is missing makes the comparison meaningless on a fresh checkout/CI run (it will always pass). Consider failing when the baseline is missing by default, and only allowing baseline generation behind an explicit flag/env var (e.g. UPDATE_BASELINES=1).
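A minimal sketch of the behavior suggested here, assuming the UPDATE_BASELINES=1 environment variable named in the comment as the opt-in switch. The function name ensure_baseline is hypothetical and is not part of TestTools.

```python
import os
import shutil
from pathlib import Path


def ensure_baseline(results_path: Path, baseline_path: Path) -> None:
    """Raise if the baseline is missing, unless UPDATE_BASELINES=1 is set."""
    if baseline_path.exists():
        return
    if os.environ.get("UPDATE_BASELINES") == "1":
        # Explicit opt-in: create the baseline from the current result.
        baseline_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(str(results_path), str(baseline_path))
        return
    raise FileNotFoundError(
        f"Baseline not found: {baseline_path}. "
        "Re-run with UPDATE_BASELINES=1 to create it."
    )
```

With this default, a fresh checkout or CI run without baselines fails loudly instead of comparing a result against a copy of itself.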
    self._last_transform_num_values_above_tol = int(
        np.sum(diff_squared > per_value_tol)
    )
diff_squared is compared against per_value_tol here, which makes tolerance semantics inconsistent (squared values vs linear tolerance). Either compare abs(diff) > per_value_tol, or rename/document the argument as a squared tolerance.
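The two consistent options can be sketched as follows. The function names are hypothetical, and the example assumes a non-negative per_value_tol expressed in the same units as the parameters.

```python
import numpy as np


def count_above_tolerance(result, baseline, per_value_tol: float) -> int:
    """Count values whose absolute difference exceeds a linear tolerance."""
    diff = np.asarray(result, dtype=float) - np.asarray(baseline, dtype=float)
    return int(np.sum(np.abs(diff) > per_value_tol))


def count_above_tolerance_squared(result, baseline, per_value_tol: float) -> int:
    """Same check on squared differences: the tolerance is squared as well."""
    diff = np.asarray(result, dtype=float) - np.asarray(baseline, dtype=float)
    return int(np.sum(diff**2 > per_value_tol**2))
```

For result values [1.0, 2.0, 3.5], baseline values [1.0, 2.3, 3.0], and a tolerance of 0.4, both functions count one value above tolerance (the difference of 0.5). Comparing the squared differences against the unsquared tolerance counts none, because 0.25 is below 0.4, which is the kind of silent mismatch the comment describes.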
    "outputs": [
     {
      "data": {
       "text/plain": [
        "'./results//slice_fixed.mha'"
This notebook cell has committed execution output (non-empty outputs). Please clear cell outputs (and any widget state/output) before committing so the notebook remains deterministic and the repo doesn’t grow from embedded outputs.
        forward_transforms[0], "basic_forward_transform_0.hdf"
    )
    test_tools.compare_result_to_baseline_transform(
        "basic_forward_transform_0.hdf",
    )
The baseline transform comparison returns a boolean but the test doesn't assert it, so mismatches will not fail the test (only get logged). Please assert the return value (or have TestTools raise on failure).
        moving_image, "prior_time_series_registered_0.mha"
    )
    test_tools.compare_result_to_baseline_image(
        "prior_time_series_registered_0.mha",
    )
The baseline image comparison return value is ignored, so mismatches won't fail the test. Please assert the return value (or have TestTools raise on failure).
    transform = itk.transformread(str(results_path))
    transform_params = np.array(transform[0].GetParameters())

    baseline_transform = itk.transformread(str(baseline_path))
    baseline_transform_params = np.array(baseline_transform[0].GetParameters())
transformread() returns a list; this code assumes both reads return at least one transform and that parameter vectors are the same length. Please validate the number of transforms read and that len(parameters) matches before computing the diff, and raise a clear error otherwise.
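The validation requested here might be sketched as below. The function parameter_diff is a hypothetical helper: it takes the two lists that the excerpt obtains from itk.transformread() and assumes only that each element exposes GetParameters(), so the sketch itself does not import ITK.

```python
import numpy as np


def parameter_diff(result_transforms, baseline_transforms) -> np.ndarray:
    """Validate two transform lists, then diff the first transforms' parameters."""
    if len(result_transforms) == 0 or len(baseline_transforms) == 0:
        raise ValueError("Expected at least one transform in each file")
    if len(result_transforms) != len(baseline_transforms):
        raise ValueError(
            f"Transform count mismatch: {len(result_transforms)} result vs "
            f"{len(baseline_transforms)} baseline"
        )
    result = np.array(result_transforms[0].GetParameters(), dtype=float)
    baseline = np.array(baseline_transforms[0].GetParameters(), dtype=float)
    if result.shape != baseline.shape:
        raise ValueError(
            f"Parameter length mismatch: {result.size} result vs "
            f"{baseline.size} baseline"
        )
    return result - baseline
```

Raising a ValueError with both counts in the message makes a changed transform type easy to tell apart from a numerical regression.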
    "cell_type": "code",
    "execution_count": null,
    "execution_count": 1,
    "metadata": {
     "execution": {
      "iopub.execute_input": "2026-02-04T02:35:36.101493Z",
      "iopub.status.busy": "2026-02-04T02:35:36.100494Z",
      "iopub.status.idle": "2026-02-04T02:35:51.037775Z",
      "shell.execute_reply": "2026-02-04T02:35:51.036978Z"
      "iopub.execute_input": "2026-02-09T04:51:59.431418Z",
This notebook is committed with non-null execution_count values. The experiment test runner is designed to clear outputs/execution counts to keep the repo clean, so please reset execution_count back to null before committing.
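For reference, resetting a notebook before committing can be done with the standard library alone, since .ipynb files are JSON. The sketch below is an assumption about what such a cleanup could look like, not the experiment test runner's actual code. The function names are hypothetical, and the keys it touches (outputs, execution_count, the per-cell execution metadata, and the notebook-level widgets metadata) are the ones the review comments call out.

```python
import json
from pathlib import Path


def clear_notebook(notebook: dict) -> dict:
    """Reset outputs, execution counts, and execution timing in code cells."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        cell["outputs"] = []
        cell["execution_count"] = None  # serialized as null
        cell.get("metadata", {}).pop("execution", None)
    # Drop saved widget state, if any.
    notebook.get("metadata", {}).pop("widgets", None)
    return notebook


def clear_notebook_file(path: Path) -> None:
    """Rewrite a notebook file in place with outputs and counts cleared."""
    data = json.loads(path.read_text(encoding="utf-8"))
    cleaned = json.dumps(clear_notebook(data), indent=1) + "\n"
    path.write_text(cleaned, encoding="utf-8")
```

Running such a step from a pre-commit hook would keep timing metadata and execution counts out of future diffs.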
- TestTools: compare_result_to_baseline_transform and compare_result_to_baseline_image
- TestRegisterTimeSeriesImages: baseline .hdf transforms and .mha images
- pytest --create-baselines support; CI and test docs updated