# Feedback on V3 Refactor Plan

Reviewed against the current tree after commit `ba0f482`. The plan has the
right architectural direction, but it should not be implemented exactly as
written. The main issue is that several "dead code" and "zero caller" claims
are too broad, while the ordering understates breakage risk.

## Executive Summary

The strongest parts of the plan are:

- Moving tuning orchestration out of `connectomics.decoding`.
- Moving evaluation logic out of Lightning.
- Moving runtime dispatch out of `scripts/main.py`.
- Enforcing strict config loading instead of warning on unknown keys.
- Removing ghost config reads such as `inference.test_time_augmentation.act`.
- Separating raw inference artifacts from decoding and evaluation concerns.

The weakest parts are:

- Theme A overclaims dead code. Some entries are public APIs, builder paths, or
  test-covered behavior rather than unreachable code.
- The proposed PR order is unsafe. Public API trimming, strict config, schema
  migration, and runtime extraction depend on each other.
- The file-size target is too mechanical. Splits should follow ownership and
  dependency direction first, not a hard line-count rule.
- The high-performance objective is not developed enough. Most items improve
  architecture, but few directly address runtime hot spots.

Bottom line: implement v3 as a boundary-first refactor with import-boundary
tests and small benchmarks, not as a broad deletion and file-split sweep.

## What The Plan Gets Right

### Decoding/Tuning Boundary

`connectomics/decoding/tuning/optuna_tuner.py` is still the worst boundary
violation. It lazy-imports naming helpers from `connectomics.training`:

- `tta_cache_suffix`
- `tuning_best_params_filename`
- `tuning_best_params_filename_candidates`
- `tuning_study_db_filename`

It also creates a datamodule and calls `trainer.test(...)` inside the decoding
package. That orchestration belongs in `connectomics.runtime`, while decoding
tuning should operate on arrays/artifacts and metric callables.

The proposed split into pure tuner modules plus `runtime/tune_runner.py` is the
right direction.

### Inference/Decoding Boundary

`connectomics/inference/chunked.py` currently contains a streamed affinity-CC
decode path. That means inference imports decoding and performs segmentation
inside an inference module. Moving `run_chunked_affinity_cc_inference` and its
decode-kwargs resolver into a decoding/runtime module is a clean boundary fix.

Keep `run_chunked_prediction_inference` in inference. It should remain raw
prediction only.

### Evaluation Boundary

`training/lightning/test_pipeline.py` owns too much evaluation logic. Moving
binary, instance, NERL, report writing, and metric aggregation into
`connectomics.evaluation` would reduce private `module._*` coupling and make
decode-only/evaluate-only workflows possible without Lightning.

This should be one of the highest-priority v3 changes.

### Strict Config

`config/pipeline/config_io.py::_warn_unconsumed_keys` still warns for unknown
top-level keys. With "no backward compatibility", unknown keys should raise.
This is a good mechanical cleanup and would catch drift earlier.

The ghost activation reads in `inference/tta.py` should also be removed:

- `cfg.inference.test_time_augmentation.act`
- `cfg.inference.output_act`

The canonical path is `inference.test_time_augmentation.channel_activations`.

## Corrections Needed Before Implementation

### Theme A Is Too Aggressive

Do not treat all listed items as safe deletions.

Examples:

- `RandMixupd` is not dead. It is imported by tests and referenced by the
  augmentation builder. Removing it is a behavior/API removal, not dead-code
  cleanup.
- `auto_plan_config`, `AutoConfigPlanner`, and `AutoPlanResult` are not dead.
  They are intentionally tested under `connectomics.config.hardware`. Removing
  them is a product decision, not a cleanup.
- `WandbConfig` may have no active logger consumer, but deleting the schema
  field still changes the accepted config surface. That belongs in the
  schema-break PR, with tutorial/docs validation.
- `SaveVolumed` and `TileLoaderd` may have no production caller, but they are
  exported from `connectomics.data.io`. Removing exported names should be part
  of an API-trim PR, not mixed into dead-code deletion.
- `create_combined_loss`, `create_loss_from_config`, `CombinedLoss`, and
  `GANLoss` are exported through model/loss package APIs. If removed, update
  package exports and tests explicitly.
- `TestConfig.output_path` and `TestConfig.cache_suffix` are not obviously
  "never read". There is stage fallback behavior in `inference/output.py`, and
  tests currently set `cfg.test.output_path`.

The plan should split Theme A into three buckets:

1. Safe artifacts/dead modules: delete after grep and import checks.
2. Public API removals: delete only with export/test updates.
3. Product feature removals: require an explicit decision.

### PR Ordering Is Not Safe

The statement "PRs 1-7 are mergeable in any order" is not correct.

Safer dependencies:

- `runtime/output_naming.py` should land before splitting `optuna_tuner.py`,
  because it removes decoding's dependency on Lightning naming helpers.
- Public API trim should happen after target modules exist, not before.
- Strict config should avoid removing schema fields before tutorial migration.
- Schema split and tutorial migration must be one coordinated change.
- Evaluation extraction should happen before broad `test_pipeline.py` file
  splitting, because it removes the largest and most meaningful coupling first.

### File Size Should Be Secondary

"Every file < 600 lines" is a useful smell check, not a design rule. Splitting
large files without resolving ownership can create more import churn and weaker
locality.

Prioritize splits that fix dependency-direction problems:

- decoding no longer imports training
- inference no longer imports decoding
- config no longer imports data execution machinery
- scripts no longer own runtime orchestration
- evaluation no longer depends on Lightning module internals

After those are fixed, split remaining large files by stable concepts.

### Performance Work Needs More Specific Targets

The plan says "high-performance" but mostly proposes architecture cleanup. Add
concrete performance targets:

- Avoid unnecessary `tensor.cpu().float().numpy()` conversions where downstream
  operations can stay in torch or preserve dtype.
- Benchmark raw chunked inference write throughput before changing artifact
  writing paths.
- Verify HDF5 chunk shapes and compression settings for prediction artifacts.
- Reduce duplicated raw prediction read/write loops between standard and
  chunked inference.
- Measure TTA memory use and data movement, especially patch-first-local and
  distributed reductions.
- Avoid forcing every streamed path through an artifact abstraction if it hurts
  large-volume throughput.
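
Each of these targets is easier to hold if it is tied to a number. A minimal before/after timing harness is enough for the smoke benchmarks; this is a generic sketch, not the project's benchmarking code, and the workload below is only a stand-in:

```python
import time


def time_it(fn, *args, repeats: int = 5) -> float:
    """Run fn several times and return the best wall-clock duration in seconds.

    Taking the minimum over repeats reduces noise from scheduling jitter.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


# Stand-in workload; in the real suite this would be one chunked-inference
# artifact write, one TTA pass, etc., run before and after each change.
payload = list(range(100_000))
baseline = time_it(sum, payload)
print(f"baseline: {baseline:.6f}s")
```

Recording the "before" number in the PR description keeps the high-performance goal honest without building a full benchmarking rig.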

## Recommended Revised Order

### PR 0: Verification Guardrails

Add tests before moving code:

- Import-boundary tests:
  - `connectomics.decoding` must not import `connectomics.training`
  - `connectomics.inference` must not import `connectomics.decoding`
  - `connectomics.config` must not import MONAI/data execution code
- Strict-config tests for unknown top-level and nested removed keys.
- Snapshot tests for public imports that are intentionally kept.
- Small chunked-inference artifact benchmark or smoke test.
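
The import-boundary tests can each be a few lines. One way to sketch the check (the `connectomics.*` names are the plan's; the runnable demo below uses stdlib modules):

```python
import importlib
import sys


def assert_no_import(module: str, forbidden_prefix: str) -> None:
    """Import `module` and fail if anything under `forbidden_prefix`
    appears in sys.modules as a side effect of that import."""
    before = set(sys.modules)
    importlib.import_module(module)
    pulled_in = {
        name for name in set(sys.modules) - before
        if name == forbidden_prefix or name.startswith(forbidden_prefix + ".")
    }
    assert not pulled_in, f"{module} transitively imports {sorted(pulled_in)}"


# In the real suite this would be e.g.:
#   assert_no_import("connectomics.decoding", "connectomics.training")
#   assert_no_import("connectomics.inference", "connectomics.decoding")
# Stdlib demo: importing json must not drag in sqlite3.
assert_no_import("json", "sqlite3")
```

For strictness, run each check in a subprocess (`python -c "..."`) so modules imported earlier in the test session cannot mask the side effect.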

### PR 1: Safe Dead Artifacts Only

Delete only files/symbols that are truly unreachable and not exported. For
example, `decoding/tuning/auto_tuning.py` looks like a strong candidate because
it imports `from .segmentation import decode_affinity_cc`, which is broken.

Required cleanup:

- Remove its re-exports from `connectomics/decoding/tuning/__init__.py`.
- Add an import test for `connectomics.decoding.tuning`.

Do not include public API removals in this PR.
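
The one-off `rg` sweep can also be mirrored in the test suite so deletion safety is checked continuously. A sketch of such a reference scanner (the symbol and search roots are illustrative, not the project's actual helper):

```python
import re
from pathlib import Path


def find_references(root: Path, symbol: str,
                    suffixes=(".py", ".yaml", ".yml", ".rst")) -> list:
    """Return "file:line" hits for whole-word occurrences of `symbol` under root."""
    pattern = re.compile(rf"\b{re.escape(symbol)}\b")
    hits = []
    for path in root.rglob("*"):
        if path.suffix not in suffixes or not path.is_file():
            continue
        for lineno, line in enumerate(
                path.read_text(errors="ignore").splitlines(), start=1):
            if pattern.search(line):
                hits.append(f"{path}:{lineno}")
    return hits


# Before deleting a symbol, assert nothing still references it, e.g.:
#   assert not find_references(Path("connectomics"), "auto_tuning")
```

This is slower than `rg` but needs no external binary, so it can run in CI alongside the import tests.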

### PR 2: Runtime Naming Extraction

Move cache suffix, checkpoint tag, output-head tag, and tuning filename helpers
from `training/lightning/utils.py` to `connectomics/runtime/output_naming.py`.

Then update the imports in `optuna_tuner.py`, `scripts/main.py`, `model.py`, and
`test_pipeline.py`.

This is the cleanest first boundary move because it removes the lazy
decoding-to-training imports without changing tuning behavior yet.
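
The extracted module should be pure string functions with no Lightning or torch imports, which is what makes it safe to land first. A sketch of the shape, with hypothetical signatures (the real helpers in `training/lightning/utils.py` may take different arguments):

```python
def tta_cache_suffix(flips: bool, rotations: bool) -> str:
    """Cache suffix describing which TTA modes were enabled (hypothetical signature)."""
    parts = ["tta"]
    if flips:
        parts.append("flip")
    if rotations:
        parts.append("rot")
    return "_".join(parts)


def tuning_best_params_filename(study_name: str) -> str:
    """Filename for a tuning study's best-parameter record (hypothetical)."""
    return f"{study_name}_best_params.json"


def tuning_study_db_filename(study_name: str) -> str:
    """Filename for a tuning study's SQLite storage (hypothetical)."""
    return f"{study_name}.db"
```

Because these are side-effect-free string builders, both `connectomics.decoding` and the Lightning code can import them without creating a cycle.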

### PR 3: Strict Config Pass

Make unknown top-level keys raise in `load_config`.

Delete verified ghost reads:

- `inference.test_time_augmentation.act`
- `inference.output_act`

Keep genuinely optional section-root access where sections may be `None`.
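
The warn-to-raise change is mechanical: compare the loaded mapping against the schema's declared fields and raise on the difference. A dataclass-based sketch, assuming nothing about the real schema classes beyond their being field-bearing:

```python
from dataclasses import dataclass, fields


@dataclass
class InferenceConfig:  # stand-in for the real schema class
    output_path: str = ""
    test_time_augmentation: bool = False


def load_section(raw: dict, schema):
    """Strict section loader: unknown keys raise instead of warning."""
    known = {f.name for f in fields(schema)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"Unknown {schema.__name__} keys: {sorted(unknown)}")
    return schema(**raw)


load_section({"output_path": "out.h5"}, InferenceConfig)       # accepted
# load_section({"output_act": "sigmoid"}, InferenceConfig)     # -> ValueError
```

Raising with the sorted list of offending keys makes drifted tutorial configs fail with an actionable message instead of a silent warning.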

### PR 4: Evaluation Extraction

Move binary/instance/NERL metrics and report writing into
`connectomics.evaluation`.

Target layout:

```text
connectomics/evaluation/
  __init__.py
  stage.py
  metrics.py
  nerl.py
  report.py
  curvilinear.py
```

`training/lightning/test_pipeline.py` should call evaluation as a service, not
own the metric implementation.
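
"Call evaluation as a service" can be as thin as a function that takes arrays and metric callables and returns a plain dict, with no Lightning module in sight. A minimal sketch (the metric and its flat-sequence inputs are illustrative; the real metrics operate on volumes):

```python
def evaluate(pred, label, metrics: dict) -> dict:
    """Run each named metric callable on (pred, label) and collect scalar results."""
    return {name: float(fn(pred, label)) for name, fn in metrics.items()}


def dice(pred, label) -> float:
    """Binary Dice overlap on flat 0/1 sequences; stand-in for the real binary metrics."""
    inter = sum(1 for p, l in zip(pred, label) if p and l)
    return 2.0 * inter / (sum(map(bool, pred)) + sum(map(bool, label)))


report = evaluate([1, 1, 0, 0], [1, 0, 0, 0], {"dice": dice})
```

With this shape, `test_pipeline.py` only assembles inputs and forwards the report, and decode-only/evaluate-only workflows can call `evaluate` directly.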

### PR 5: Tuning Runtime Split

Move `run_tuning`, `load_and_apply_best_params`, and temporary inference
override orchestration into `connectomics/runtime/tune_runner.py`.

Keep `connectomics.decoding.tuning` focused on pure parameter search over saved
arrays/artifacts and metric callables.
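
Once orchestration moves out, the decoding-side tuner reduces to a search over decode parameters scored by a metric callable. The real tuner uses Optuna; this grid-search sketch with a toy thresholding "decoder" only illustrates the intended pure interface:

```python
from itertools import product


def tune_decode_params(decode, score, prediction, label, grid: dict):
    """Exhaustively search a parameter grid; return (best_params, best_score)."""
    best_params, best_score = None, float("-inf")
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        result = score(decode(prediction, **params), label)
        if result > best_score:
            best_params, best_score = params, result
    return best_params, best_score


# Toy example: "decoding" is thresholding, the score counts matching voxels.
prediction = [0.2, 0.6, 0.9]
label = [0, 1, 1]
decode = lambda p, threshold: [int(x > threshold) for x in p]
score = lambda seg, lab: sum(s == l for s, l in zip(seg, lab))
best, _ = tune_decode_params(decode, score, prediction, label,
                             {"threshold": [0.1, 0.5, 0.8]})
```

Nothing here touches a trainer, a datamodule, or a filename, which is exactly the property the split is meant to enforce.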

### PR 6: Chunked Decode Boundary

Move streamed affinity-CC decode out of `connectomics.inference`.

Suggested target:

- `connectomics/decoding/streamed_chunked.py`, if it is a decoder concern
- or `connectomics/runtime/chunked_decode.py`, if it remains orchestration-heavy

`connectomics/inference/chunked.py` should only produce raw prediction
artifacts.

### PR 7: Runtime CLI Extraction

Move dispatch-heavy code out of `scripts/main.py` into runtime modules:

- `runtime/cli.py`
- `runtime/checkpoint_dispatch.py`
- `runtime/cache_resolver.py`
- `runtime/sharding.py`
- `runtime/torch_safe_globals.py`

Keep `scripts/main.py` as a thin entrypoint.
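
"Thin entrypoint" means `main` only parses arguments and hands off; every branch of real work lives in a runtime function. A sketch of that shape (the mode names and `run_*` functions are hypothetical stand-ins, not the project's actual CLI):

```python
import argparse


def run_train(cfg_path: str) -> str:  # stand-in for a runtime entry function
    return f"train:{cfg_path}"


def run_test(cfg_path: str) -> str:  # stand-in for a runtime entry function
    return f"test:{cfg_path}"


def main(argv=None) -> str:
    parser = argparse.ArgumentParser(description="thin entrypoint sketch")
    parser.add_argument("mode", choices=["train", "test"])
    parser.add_argument("--config", required=True)
    args = parser.parse_args(argv)
    # main only dispatches; checkpoint resolution, sharding, cache logic
    # all belong behind the runtime functions.
    dispatch = {"train": run_train, "test": run_test}
    return dispatch[args.mode](args.config)
```

A useful review criterion for PR 7: after the extraction, `scripts/main.py` should contain no imports beyond `argparse` and the runtime modules.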

### PR 8: Schema Split And Tutorial Migration

Split decoding and evaluation config out of `config/schema/inference.py`.

Do this atomically with tutorial migration because it changes config keys:

- `inference.postprocessing.*` to `decoding.postprocessing.*`
- `inference.saved_prediction_path` to `decoding.input_prediction_path`
- `inference.decoding_path` to `decoding.output_path`
- evaluation config into its own schema module

Run tutorial config validation across all tutorial YAMLs in the same PR.
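
The tutorial migration stays mechanical if the key moves are encoded as data and applied by one script. A sketch over plain dicts (the key names are the plan's; the `migrate` helper itself is hypothetical):

```python
# (old_section, old_key) -> (new_section, new_key)
KEY_MOVES = {
    ("inference", "saved_prediction_path"): ("decoding", "input_prediction_path"),
    ("inference", "decoding_path"): ("decoding", "output_path"),
}


def migrate(cfg: dict) -> dict:
    """Move renamed keys to their new sections, leaving everything else intact."""
    out = {section: dict(body) for section, body in cfg.items()}
    for (old_sec, old_key), (new_sec, new_key) in KEY_MOVES.items():
        if old_key in out.get(old_sec, {}):
            out.setdefault(new_sec, {})[new_key] = out[old_sec].pop(old_key)
    return out


cfg = {"inference": {"saved_prediction_path": "pred.h5", "batch_size": 4}}
migrated = migrate(cfg)
```

Running `migrate` plus strict validation over every tutorial YAML in the same PR is what keeps the schema break atomic.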

### PR 9: Public API Trim

Trim `__init__.py` exports after the new homes exist.

Do this with explicit import tests so the kept public surface is intentional.
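
The "intentional surface" test is a snapshot: a literal set of kept names compared against the package's exports, so any trim shows up as a test diff. A sketch (a fake module is built inline so the demo runs anywhere; the real test would import e.g. `connectomics.data.io`):

```python
import types

# Stand-in for a package __init__; real tests import the actual package.
pkg = types.ModuleType("fake_io")
pkg.read_volume = lambda path: None
pkg.write_volume = lambda path, data: None
pkg.__all__ = ["read_volume", "write_volume"]

KEPT_PUBLIC_API = {"read_volume", "write_volume"}


def check_public_surface(module, expected: set) -> None:
    """Fail on any name added to or removed from the declared public surface."""
    actual = set(getattr(module, "__all__", []))
    extra, missing = actual - expected, expected - actual
    assert not extra and not missing, f"extra={sorted(extra)}, missing={sorted(missing)}"


check_public_surface(pkg, KEPT_PUBLIC_API)
```

Checking both directions matters: `missing` catches accidental deletions during the trim, and `extra` catches new exports sneaking in without review.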

### PR 10: File Splits

Split remaining large files only after ownership boundaries are fixed.

Focus on:

- `inference/tta.py`
- `inference/lazy.py`
- `data/augmentation/transforms.py`
- `data/processing/transforms.py`
- `training/lightning/callbacks.py`
- `config/pipeline/config_io.py`

### PR 11: Architecture Rename And Docs Refresh

Rename `nnunet_pretrained` to `nnunet` only with tutorial and docs migration.

Refresh stale docs in:

- `docs/source/tutorials/*.rst`
- `docs/source/notes/*.rst`
- `docs/source/modules/model.rst`
- `docs/source/modules/models.rst`

Delete duplicate docs only after confirming they are not linked by Sphinx
toctrees.

## Implementation Guardrails

- Every deletion must pass `rg` checks across `connectomics`, `scripts`,
  `tests`, `tutorials`, and `docs`.
- Public export removals need an explicit before/after import-surface test.
- Schema changes need tutorial validation in the same commit.
- Boundary changes need import-graph tests.
- Runtime changes need focused tests for cache-hit, tune, tune-test, and
  checkpoint-derived output paths.
- Performance-sensitive paths need at least smoke benchmarks before and after:
  chunked inference, TTA, lazy prediction, and artifact writes.

## Verdict

Use the v3 plan as a strong architectural roadmap, not as an execution spec.

The high-value sequence is:

1. Add guardrails.
2. Extract runtime naming.
3. Enforce strict config.
4. Move evaluation out of Lightning.
5. Move tuning orchestration out of decoding.
6. Move chunked decoding out of inference.
7. Split schema and migrate tutorials.
8. Trim public APIs.
9. Split large files by ownership.
10. Refresh docs.

This preserves the intended "no backward compatibility" cleanup while avoiding
both accidental deletion of still-reachable APIs and review-hostile churn.