Date: 2026-02-08
- `python scripts/main.py --demo` -> failed immediately: `ModuleNotFoundError: No module named 'torch'`.
- Fastest declared suite from `tests/README.md` is `tests/unit/`:
  - `pytest tests/unit -q` -> command not found.
  - `python -m pytest tests/unit -q` -> `No module named pytest`.
- Existing lint/format commands configured in CI (`.github/workflows/tests.yml`) include `black`, `flake8`, `isort`, `mypy`:
  - `black --check connectomics/` -> command not found.
  - `python -m black --check connectomics/` -> `No module named black`.
Baseline conclusion: current environment is missing runtime/dev dependencies required to execute the repository's smoke/test/lint checks.
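The baseline failures come in two kinds: a console script missing from `PATH` ("command not found") and a module missing from the interpreter ("No module named ..."). A minimal sketch of a preflight probe that reports both, using only the standard library; the function name `probe_tools` is hypothetical and not part of the repository:

```python
import importlib.util
import shutil


def probe_tools(names):
    """Report, per name, whether a console script is on PATH and whether
    the module can be located by the current interpreter."""
    report = {}
    for name in names:
        report[name] = {
            # Mirrors the "command not found" failure mode.
            "on_path": shutil.which(name) is not None,
            # Mirrors the "No module named ..." failure mode.
            "importable": importlib.util.find_spec(name) is not None,
        }
    return report


if __name__ == "__main__":
    tools = ["torch", "pytest", "black", "flake8", "isort", "mypy"]
    for name, status in probe_tools(tools).items():
        print(f"{name:8s} on_path={status['on_path']} importable={status['importable']}")
```

Running a probe like this before each milestone makes it clear whether a failed check reflects the refactor or only the environment.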
- Reduce high-complexity module hotspots into smaller, single-purpose units while preserving existing behavior.
- Eliminate duplicated helper logic and centralize shared utilities.
- Remove import-cycle and hidden coupling risks in dataset/training internals.
- Keep CLI, Hydra/OmegaConf config semantics, and public imports stable via compatibility facades/shims.
- Improve test coverage around refactor-sensitive behavior before moving logic.
- No training algorithm changes, model architecture changes, or metric/decoding logic changes.
- No changes to expected outputs/checkpoint formats beyond equivalent refactor-safe behavior.
- No config key renaming/removal unless fully backward-compatible shims are added.
- No large-scale package reorganization that breaks import paths in one step.
- No new heavyweight runtime dependencies.
- `scripts/main.py` CLI contract (arguments, defaults, override passthrough, and mode behavior).
- Existing Hydra/OmegaConf structure and CLI override semantics for `system`, `data`, `model`, `optimization`, `monitor`, `inference`, `test`, and `tune`.
- Current public imports from:
  - `connectomics.config`
  - `connectomics.training.lit`
  - `connectomics.data.dataset`
- Current run/checkpoint directory behavior and config save behavior for train/test/tune/tune-test modes.
- Tight coupling / circular import:
  - Static import graph found one cycle: `connectomics.data.dataset.build <-> connectomics.data.dataset.dataset_volume`.
  - `dataset_volume.py` imports `create_data_dicts_from_paths` from `build.py`, while `build.py` lazily imports volume datasets.
- Duplicated utilities:
  - `expand_file_paths` exists in both `connectomics/training/lit/config.py` and `connectomics/training/lit/utils.py`.
  - Validation iteration auto-calculation logic appears in multiple branches of `connectomics/training/lit/config.py`.
- Unclear module boundaries:
  - `connectomics/training/lit/config.py` mixes dataset building, interactive dataset download prompting, datamodule wrappers, run directory creation, and checkpoint mutation in one file (~1000 LOC).
  - `scripts/demo.py` duplicates training/datamodule/trainer assembly logic instead of reusing the same factories.
- Configuration sprawl / compatibility drift:
  - Data location concepts are split across `data.*`, `test.data.*`, `tune.data.*`, and legacy-looking `inference.data` references in tutorial configs.
  - `justfile` defines `--mode infer` while CLI parser mode choices are `train|test|tune|tune-test`.
  - Overlap in similarly named knobs (`data.pad_size`, `data.image_transform.pad_size`, `inference.sliding_window.pad_size`) increases ambiguity.
- `connectomics/training/lit/cli.py`
  - CLI parse + high-level config assembly (`parse_args`, `setup_config`) only.
- `connectomics/training/lit/data_factory.py`
  - Datamodule/dataset assembly and mode-specific dataset selection only.
- `connectomics/training/lit/runtime.py`
  - Run directory lifecycle and checkpoint state mutation only.
- `connectomics/training/lit/utils.py`
  - Small pure helpers only (no orchestration).
- `connectomics/data/dataset/data_dicts.py`
  - Shared data-dict constructors used by both builders and datasets (break cycle).
- Compatibility policy:
  - Keep old import locations (`lit/config.py`, `dataset/build.py`) as facades that re-export moved functions.
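The cycle break and the facade policy can be illustrated together. The sketch below writes a throwaway package named `demo_pkg` (hypothetical) whose file names mirror the plan: the shared helper lives in `data_dicts.py`, `dataset_volume.py` imports from it, and `build.py` only re-exports it. The helper body and its signature are placeholders, since the real `create_data_dicts_from_paths` is not shown here.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical miniature package; bodies are placeholders, not repository code.
FILES = {
    "demo_pkg/__init__.py": "",
    "demo_pkg/data_dicts.py": """
        def create_data_dicts_from_paths(image_paths, label_paths=None):
            # Placeholder logic; the real signature is an assumption.
            labels = label_paths or [None] * len(image_paths)
            return [{"image": img, "label": lab} for img, lab in zip(image_paths, labels)]
    """,
    "demo_pkg/build.py": """
        # Facade: the old import location keeps working via re-export.
        from .data_dicts import create_data_dicts_from_paths

        __all__ = ["create_data_dicts_from_paths"]
    """,
    "demo_pkg/dataset_volume.py": """
        # Depends on the shared module, no longer on build.py.
        from .data_dicts import create_data_dicts_from_paths
    """,
}


def load_demo_helpers():
    """Write the demo package to a temp dir and import the helper via both paths."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel_path, source in FILES.items():
            target = root / rel_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(textwrap.dedent(source))
        importlib.invalidate_caches()
        sys.path.insert(0, str(root))
        try:
            old = importlib.import_module("demo_pkg.build")
            new = importlib.import_module("demo_pkg.data_dicts")
            importlib.import_module("demo_pkg.dataset_volume")
        finally:
            sys.path.remove(str(root))
    return old.create_data_dicts_from_paths, new.create_data_dicts_from_paths
```

Because the facade re-exports the same function object, callers using the old path and callers using the new path cannot drift apart.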
Files touched (planned):
- `tests/unit/test_lit_utils.py`
- `tests/unit/test_hydra_config.py`
- `tests/unit/test_main_cli_contract.py` (new)
- `tests/unit/test_run_directory_contract.py` (new)
Risk: Low
Verification commands:
- `python scripts/main.py --demo`
- `python -m pytest tests/unit/test_lit_utils.py tests/unit/test_hydra_config.py tests/unit/test_main_cli_contract.py tests/unit/test_run_directory_contract.py -q`
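A CLI contract test could pin the accepted `--mode` values and the override passthrough before any logic moves. The sketch below uses a stand-in `argparse` parser because the real `parse_args` is not shown in this plan; only the mode choices `train|test|tune|tune-test` come from the findings above, while the `overrides` positional and the example override key are assumptions.

```python
import argparse

MODES = ("train", "test", "tune", "tune-test")  # mode choices stated in this plan


def build_stand_in_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the real parser in scripts/main.py."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--mode", choices=MODES)
    # Assumed shape of override passthrough: trailing key=value tokens.
    parser.add_argument("overrides", nargs="*")
    return parser


def accepts_mode(parser: argparse.ArgumentParser, mode: str) -> bool:
    """Return True if the parser accepts the mode, False if argparse rejects it."""
    try:
        parser.parse_args(["--mode", mode])
    except SystemExit:  # argparse exits on invalid choices
        return False
    return True
```

With such a test in place, the `justfile` entry that passes `--mode infer` would surface as a contract failure rather than a runtime surprise.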
Files touched (planned):
- `connectomics/training/lit/utils.py`
- `connectomics/training/lit/config.py`
- `connectomics/training/lit/path_utils.py` (new)
- `tests/unit/test_lit_utils.py`
Scope:
- Single canonical `expand_file_paths` implementation.
- Single canonical validation-iter computation helper.
- Keep legacy function names as wrappers.
Risk: Low-Medium
Verification commands:
- `python scripts/main.py --demo`
- `python -m pytest tests/unit/test_lit_utils.py -q`
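As an illustration of what a single canonical helper in `path_utils.py` might look like, here is a minimal sketch. The behavior shown (glob expansion, per-pattern sorting, keeping unmatched patterns unchanged) is an assumption; the two existing `expand_file_paths` copies are not reproduced in this plan and should be compared against each other before one is chosen.

```python
import glob
from typing import List, Sequence, Union


def expand_file_paths(paths: Union[str, Sequence[str]]) -> List[str]:
    """Expand one pattern or a sequence of patterns into a flat list of paths.

    Assumed behavior: matches are sorted per pattern, and a pattern with no
    match is kept as-is so a later existence check can report it.
    """
    if isinstance(paths, str):
        paths = [paths]
    expanded: List[str] = []
    for pattern in paths:
        matches = sorted(glob.glob(pattern))
        expanded.extend(matches if matches else [pattern])
    return expanded


# Legacy call sites would import this object instead of keeping a copy, e.g.
#   from .path_utils import expand_file_paths   (in config.py and utils.py)
```

Re-exporting one function object from both legacy modules keeps the old names importable while removing the duplicate body.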
Files touched (planned):
- `connectomics/training/lit/config.py`
- `connectomics/training/lit/runtime.py` (new)
- `connectomics/training/lit/__init__.py`
- `scripts/main.py`
- `tests/unit/test_run_directory_contract.py`
Scope:
- Move `setup_run_directory`, `cleanup_run_directory`, `modify_checkpoint_state` into dedicated module.
- Preserve call signatures and re-export through existing import paths.
Risk: Medium
Verification commands:
- `python scripts/main.py --demo`
- `python -m pytest tests/unit/test_run_directory_contract.py tests/unit/test_lit_utils.py -q`
Files touched (planned):
- `connectomics/data/dataset/build.py`
- `connectomics/data/dataset/dataset_volume.py`
- `connectomics/data/dataset/data_dicts.py` (new)
- `connectomics/data/dataset/__init__.py`
- `tests/unit/test_monai_transforms.py`
- `tests/integration/test_dataset_multi.py`
Scope:
- Move shared data-dict helper(s) out of `build.py` to cycle-free module.
- Keep public factory API in `build.py` intact via re-export.
Risk: Medium
Verification commands:
- `python scripts/main.py --demo`
- `python -m pytest tests/integration/test_dataset_multi.py tests/unit/test_monai_transforms.py -q`
Files touched (planned):
- `connectomics/training/lit/config.py`
- `connectomics/training/lit/data_factory.py` (new)
- `connectomics/training/lit/__init__.py`
- `tests/unit/test_lit_utils.py`
- `tests/integration/test_config_integration.py`
Scope:
- Move datamodule construction logic from `lit/config.py` into `lit/data_factory.py` in small steps.
- Keep existing function entrypoint (`create_datamodule`) and behavior.
Risk: Medium-High
Verification commands:
- `python scripts/main.py --demo`
- `python -m pytest tests/unit/test_lit_utils.py tests/integration/test_config_integration.py -q`
Per milestone:
- Always run smoke command: `python scripts/main.py --demo`.
- Run smallest relevant unit/integration slice for touched area.
- Run formatter/linter commands already configured by repo CI when available in env:
  - `black --check connectomics/`
  - `flake8 connectomics/ --max-line-length=100`
  - `isort --check connectomics/`
Before final completion of all milestones:
- `python scripts/main.py --demo`
- `python -m pytest tests/unit -q`
- `python -m pytest tests/integration -q`
- Optional (if environment/time allows): `python -m pytest tests/e2e -q`
- Demo smoke passes.
- Refactor-touched tests pass.
- No CLI argument regressions in `scripts/main.py`.
- No breakage for existing config hierarchy and CLI overrides.
- No import-path breakage for existing public module entrypoints.
- No new runtime dependencies introduced.
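The import-path criterion can be checked mechanically. A minimal sketch, assuming the three public modules listed earlier in this plan are the entrypoints to guard; the function name `find_broken_imports` is hypothetical:

```python
import importlib

# Public import locations named in this plan.
PUBLIC_MODULES = (
    "connectomics.config",
    "connectomics.training.lit",
    "connectomics.data.dataset",
)


def find_broken_imports(module_names):
    """Return {module_name: error_summary} for every module that fails to import."""
    broken = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError plus errors raised at import time
            broken[name] = f"{type(exc).__name__}: {exc}"
    return broken


if __name__ == "__main__":
    for name, error in find_broken_imports(PUBLIC_MODULES).items():
        print(f"BROKEN {name}: {error}")
```

An empty result after each milestone indicates the facades still cover the old import paths; it does not confirm that every re-exported name is present, which would need a per-name check.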