Commit afdc0a7

Donglai Wei and claude committed
Fix lazy normalization; align BANIS inference with lib/banis
- `smart_normalize` now parses `divide-K` mode strings. The lazy inference path calls `smart_normalize` directly with the raw mode and was silently no-op'ing on `divide-255`, sending uint8 [0, 255] inputs to a model trained on [0, 1]. Symptom: all-near-1 sigmoid predictions, BCE ~3.4 on training-distribution crops despite a reported train loss of ~0.14.
- `_resolve_affinity_inference_crop` short-circuits for `affinity_mode=banis` so predictions keep the full input shape, matching lib/banis (the cc3d source-stored decoder bounds-checks; trailing-edge values are harmless).
- `base_banis.yaml`: `window_size=144` gives boundary affinities +16 voxels of real surrounding-volume context via MONAI's gaussian blending (cleaner than per-window `target_context` oversample + central crop). Keep `snap_to_edge` for the lazy path.
- Add `.claude/banis/inference.md` comparing pytc vs lib/banis inference; reorganize tutorial yamls (drop 40nm/v0 stubs; add base_banis_v1/v2, erosion2 variants, and base_banis_chunk).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent b20a08d commit afdc0a7

77 files changed

Lines changed: 2197 additions & 1308 deletions

File tree

File renamed without changes.

.claude/banis/inference.md

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
# Inference: `base_banis.yaml` vs `lib/banis`

Comparison of pytc whole-volume affinity inference (`tutorials/neuron_nisb/base_banis.yaml`) against the reference `lib/banis/inference.py` + `lib/banis/BANIS.py`.

## Match

| Aspect | Both |
| --- | --- |
| Window size | 128³ |
| Overlap | 50% (BANIS: `small_size // 2 = 64` shift) |
| Activation | `scale_sigmoid(x) = sigmoid(0.2·x)` on output channels |
| Precision | fp16 autocast |
| Input normalization | divide-255, XYZ layout, no transpose |
| Decoded channels | first 3 (short-range) via `select_channel: [0, 1, 2]` |
| Decoder | connected components, 6-connectivity, source-stored edges (`edge_offset: 0`) |
| TTA | disabled (BANIS has no flip/rotate TTA) |
| Whole-volume strategy | both load the full image, then patch |
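The shared activation and decoder rows can be illustrated with a toy decode. This is a sketch, not the lib/banis code: the foreground rule (mean short-range affinity above threshold) is a simplification of the source-stored edge decode, and scipy's `label` with a 6-connected structuring element stands in for the `cc3d` decoder both pipelines actually use.

```python
import numpy as np
from scipy.ndimage import label, generate_binary_structure

def scale_sigmoid(x: np.ndarray) -> np.ndarray:
    """The shared activation: sigmoid applied to logits scaled by 0.2."""
    return 1.0 / (1.0 + np.exp(-0.2 * x))

def decode_cc(logits: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Toy affinity -> segmentation decode for (C, Z, Y, X) logits.

    Only the first 3 (short-range) channels are used, mirroring
    `select_channel: [0, 1, 2]`; components via 6-connectivity.
    """
    aff = scale_sigmoid(logits[:3])
    mask = aff.mean(axis=0) > threshold          # simplified foreground rule
    six_conn = generate_binary_structure(3, 1)   # 6-connectivity in 3-D
    labels, _ = label(mask, structure=six_conn)
    return labels

rng = np.random.default_rng(0)
seg = decode_cc(rng.normal(size=(6, 16, 16, 16)).astype(np.float32))
```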
## Differences

1. **Blending weight**
   - pytc: `blending: gaussian, sigma_scale=0.25` (Gaussian importance map).
   - BANIS: `distance_transform_cdt` of a zero-padded ones cube → L1 chamfer distance from the surface (zero on faces, max in center). `lib/banis/inference.py:209-210`.

2. **Boundary handling**
   - pytc: `padding_mode: replicate` — MONAI pads the *whole* volume up front so windows align.
   - BANIS: no padding. `get_offsets` always sets the final offset to `big_size - small_size`, so every window fits fully inside the volume. `lib/banis/inference.py:189-191`.

3. **Threshold**
   - pytc: fixed `threshold: 0.5`.
   - BANIS: sweeps over `eval_ranges = sigmoid(0.2 · range(-1, 12))` ≈ `[0.45, 0.55, 0.65, 0.73, 0.80, 0.85, 0.89, 0.91, ...]`, picks the val-best by NERL, and reuses it on test. `lib/banis/BANIS.py:439`, `lib/banis/BANIS.py:209-211`.

4. **Patch grid**
   - pytc: regular MONAI grid at 50% stride.
   - BANIS: base grid + 7 shifted sets (all combinations of `+small_size//2` per axis), unioned and de-duplicated. `lib/banis/inference.py:154-174`. Slightly more centers near boundaries.

5. **Stored prediction channels**
   - pytc: short-range only (`select_channel: [0, 1, 2]`).
   - BANIS: all 6 channels written to `pred_aff_*.zarr`; decoding still reads `[:3]`. `lib/banis/BANIS.py:199-200`, `lib/banis/BANIS.py:217`.
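The BANIS blending weight in item 1 can be reconstructed from its description. This is a sketch under stated assumptions, not the lib/banis source: it pads a cube of ones with zeros, takes the L1 (taxicab) chamfer distance, and crops the pad back off (whether lib/banis crops, or keeps the zero faces, is an assumption here — hence faces at weight 1 rather than 0).

```python
import numpy as np
from scipy.ndimage import distance_transform_cdt

def banis_blend_weight(small_size: int = 128) -> np.ndarray:
    """L1 chamfer-distance blending weight: small on the window faces,
    maximal at the center, in contrast to MONAI's gaussian map."""
    cube = np.pad(np.ones((small_size,) * 3, dtype=np.uint8), 1)
    dist = distance_transform_cdt(cube, metric="taxicab")
    return dist[1:-1, 1:-1, 1:-1].astype(np.float32)

# tiny window for illustration; the real pipelines use small_size=128
w = banis_blend_weight(small_size=8)
```

For an 8³ cube the weight ramps linearly from 1 on each face to 4 at the center, so overlapping windows are averaged with center-heavy weights much like a gaussian taper.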
## To match BANIS exactly

- Replace `blending: gaussian` with a custom L1-distance window (or accept gaussian as a near-equivalent at 50% overlap).
- Drop `padding_mode: replicate` and use BANIS-style snap-to-edge offsets (last offset = `image_size - roi_size`).
- Run a decoding threshold sweep over BANIS' `eval_ranges` and pick the best by NERL on val before testing.

Items 1–2 are cosmetic at 50% overlap; #3 is the main accuracy lever.
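The threshold sweep (the main accuracy lever) can be sketched as follows. The candidate list follows the `eval_ranges` formula quoted above; `score_fn` is a hypothetical stand-in for decode-plus-NERL on the cached val prediction (NERL: higher is better), not an existing pytc or BANIS API.

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

# BANIS-style candidate thresholds: scale_sigmoid over integer logits -1..11
eval_ranges = sigmoid(0.2 * np.arange(-1, 12))

def pick_threshold(score_fn) -> float:
    """Sweep candidate thresholds on val, return the best-scoring one.

    The real pipeline would decode the cached val prediction at each
    threshold and evaluate NERL; the chosen value is reused on test.
    """
    scores = [score_fn(t) for t in eval_ranges]
    return float(eval_ranges[int(np.argmax(scores))])

# toy score peaking near 0.73; the sweep picks the closest candidate
best = pick_threshold(lambda t: -(t - 0.73) ** 2)
```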
## Boundary handling in pytc

Two paths matter:

- **Lazy sliding-window path** (`connectomics/inference/lazy.py`, used when `inference.sliding_window.lazy_load=true`). Honors `snap_to_edge: true` (last window at `image_size - roi_size`, no whole-volume padding) and per-window `target_context` (read `roi + 2·context`, predict, central-crop). `base_banis.yaml` uses this path.
- **Eager MONAI path** (`connectomics/inference/sliding.py`, MONAI's `SlidingWindowInferer`). Vanilla MONAI; it ignores `snap_to_edge` / `target_context`. For BANIS-flavored boundary context here, just bump `window_size` above the training patch — see below.
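The snap-to-edge offset logic shared by the lazy path and BANIS' `get_offsets` can be sketched in one dimension (a reconstruction from the description above, not the lib/banis source):

```python
def snap_to_edge_offsets(image_size: int, roi_size: int, stride: int) -> list[int]:
    """Window start offsets along one axis: regular stride, with the final
    offset snapped to `image_size - roi_size` so the last window stays
    fully inside the volume instead of relying on whole-volume padding."""
    offsets = list(range(0, image_size - roi_size + 1, stride))
    last = image_size - roi_size
    if offsets[-1] != last:
        offsets.append(last)
    return offsets

# 300-voxel axis, 128 window, 50% overlap: the last window is snapped to 172,
# so the model never sees padded data (the trade-off is extra overlap there)
offs = snap_to_edge_offsets(image_size=300, roi_size=128, stride=64)
```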
## The `window_size = roi + extra` hack

Instead of per-window `target_context` oversample + central crop (extra code, extra forward passes), set `window_size` larger than the training patch and rely on the default gaussian blending to de-emphasize the outer band:

```yaml
sliding_window:
  window_size: [144, 144, 144]  # 128 (training) + 16 context per axis
  blending: gaussian
  sigma_scale: 0.25
  overlap: 0.5
```

- Interior windows naturally pick up real surrounding-volume voxels in the +16 band.
- The default gaussian (`sigma_scale=0.125–0.25`) gives the outer band ~5× less weight than the central edge — a soft taper, no hard mask, no boundary coverage hole.
- `window_size` must be a multiple of the model's downsample stride. MedNeXt-S has 4 stages → 144 ✓ (128 + 16); 138 (the BANIS training oversample) ✗.
- Expect ~2× per-patch GPU memory at 144 vs 128. Verify it fits with fp16.

This works for both the lazy and eager paths and replaces the need for an inference-time `target_context` config.
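The taper and the divisibility constraint above can be checked numerically. This sketch builds a 1-D slice of a MONAI-style gaussian importance map directly (a gaussian centered on the window with `sigma = sigma_scale * size`); MONAI constructs the real map by gaussian-filtering a center impulse, which is equivalent up to normalization.

```python
import numpy as np

def gaussian_importance_1d(size: int, sigma_scale: float = 0.25) -> np.ndarray:
    """1-D gaussian importance profile across a sliding window."""
    center = (size - 1) / 2.0
    sigma = sigma_scale * size
    x = np.arange(size, dtype=np.float64)
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

# 144 = 128 training patch + 16 context; the outer band is the first/last
# 8 voxels, which get less blending weight than the 128-patch edge (index 8)
w = gaussian_importance_1d(144)

# divisibility: 4 downsampling stages -> stride 2**4 = 16,
# so 144 is a valid window size and 138 is not
stride = 2 ** 4
```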
## What's still BANIS-specific

`snap_to_edge: true` (in the yaml) only affects the lazy path and is the BANIS-faithful behavior — the model never sees padded volume data. The eager path uses MONAI's whole-volume padding, which is functionally close at 50% overlap.

.claude/refactor/config.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -77,7 +77,7 @@ Profiles are named YAML snippets in `tutorials/bases/*.yaml`, resolved pre-conve
 | `optimizer_profiles` | `{stage}.optimization.profile` | `{stage}.optimization` |
 | `loss_profiles` | `{stage}.model.loss.profile` | `{stage}.model.loss.losses` |
 | `label_profiles` | `{stage}.data.label_transform.profile` | `{stage}.data.label_transform` |
-| `decoding_profiles` | `{stage}.inference.decoding_profile` | `{stage}.inference.decoding` |
+| `decoding_templates` | list refs under `{stage}.decoding` | `{stage}.decoding` |
 | `activation_profiles` | `{stage}.inference.test_time_augmentation.activation_profile` | `{stage}.inference.test_time_augmentation.channel_activations` |

 Selectors are only accepted at canonical paths; non-canonical paths raise `ValueError`.
```
Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
# Inference / Decoding Split Refactor

## Problem

The current test path mixes three responsibilities:

- deep learning inference,
- prediction artifact storage,
- decoding/postprocessing/evaluation.

This is most visible in chunked inference: `run_chunked_affinity_cc_inference` predicts a chunk, immediately decodes it, stitches labels, and writes the final segmentation. That is memory efficient, but it cannot reproduce whole-volume decoding exactly, because connected components are solved per chunk and then stitched heuristically.
## Target Design

Treat model inference and decoding as separate stages.

1. Model inference writes a raw prediction artifact.
   The artifact is file-backed, chunked, and has a stable layout: `(C, Z, Y, X)` for one volume after inference-time crop/channel selection.

2. Decoding consumes a raw prediction artifact and writes a segmentation artifact.
   It should not require model construction, checkpoint loading, or GPU setup.

3. The combined test path remains a convenience wrapper.
   It can run inference, then optionally decode the just-written artifact.

4. Evaluation should become its own top-level stage.
   It should not live under `decoding`, because metrics consume decoded artifacts and labels regardless of which decoder or cache produced them. The config tree now has a dedicated top-level/default/test/tune `evaluation` section.
## Config Contract

`inference.decode_after_inference`

- `true`: current convenience behavior; decode after prediction.
- `false`: stop after writing raw predictions.

`inference.chunking.output_mode`

- `decoded`: current streaming chunk decode/stitch behavior.
- `raw_prediction`: stream chunked model predictions into one raw prediction HDF5, then optionally run the normal whole-volume decoding path.

The existing decode-only mode remains:

```yaml
inference:
  saved_prediction_path: /path/to/raw_prediction.h5

decoding:
  - name: decode_affinity_cc
    kwargs:
      threshold: 0.7
      backend: numba
      edge_offset: 0
```
## Implementation Plan

1. Add schema fields for `decode_after_inference` and chunked `output_mode`.
2. Split the chunked inference code into two entry points: `run_chunked_prediction_inference` for raw prediction writing, and `run_chunked_affinity_cc_inference` for the existing streamed decode/stitch.
3. Route `test_pipeline` chunked mode based on `chunking.output_mode`.
4. For `raw_prediction`, write the raw file first. If `decode_after_inference=true`, load that file and reuse the standard decode/postprocess/save/evaluate path.
5. Keep decode-only via `inference.saved_prediction_path` as the standalone decoding entry for now. A future CLI can expose it as `--mode decode`.
## Implemented

- `decoding` is a top-level/default/test/tune stage section.
- `evaluation` is a top-level/default/test/tune stage section.
- Tutorial YAMLs use `default.decoding`/`test.decoding` and `default.evaluation`/`test.evaluation` instead of nested inference sections.

## Follow-Ups

- Store prediction artifact metadata such as channel order, crop, activation, checkpoint, and value scale in a small sidecar or HDF5 attrs.
- Add lazy/blockwise decode readers for decoders that can operate without materializing the full prediction volume.
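The metadata follow-up can be sketched with the "small sidecar" option (the HDF5-attrs variant would store the same keys as dataset attrs). All key names and file names here are illustrative, not an existing pytc schema:

```python
import json
import tempfile
from pathlib import Path

import numpy as np

out_dir = Path(tempfile.mkdtemp())
# raw prediction artifact in the stable (C, Z, Y, X) layout
pred = np.random.rand(3, 64, 64, 64).astype(np.float16)

np.save(out_dir / "raw_prediction.npy", pred)
(out_dir / "raw_prediction.json").write_text(json.dumps({
    "layout": "CZYX",
    "channels": [0, 1, 2],          # select_channel already applied
    "activation": "scale_sigmoid",  # record whether it was applied
    "value_scale": [0.0, 1.0],
    "checkpoint": "best.ckpt",
}))

# a decode-only stage can reopen the artifact with no model/GPU setup
meta = json.loads((out_dir / "raw_prediction.json").read_text())
aff = np.load(out_dir / "raw_prediction.npy")[: len(meta["channels"])]
```

Recording the activation and value scale is what lets a later decode stage tell raw logits apart from already-activated affinities without re-running the model.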

CLAUDE.md

Lines changed: 4 additions & 4 deletions
```diff
@@ -289,7 +289,7 @@ configs/ # Canonical shared YAML registries
 │ ├── label_profiles.yaml # Label-transform presets
 │ └── ... # system, dataloader, augmentation, pipeline, tune
 └── templates/ # Explicit list-item templates
-    └── decoding_templates.yaml # `inference.decoding` templates (`template: ...`)
+    └── decoding_templates.yaml # top-level `decoding` templates (`template: ...`)

 tutorials/ # Example configurations
 ├── misc/ # Miscellaneous experiments
@@ -316,15 +316,15 @@ The project uses **Hydra/OmegaConf** with dataclass-based configs for type safet
 Canonical YAML layout:

 - `connectomics/config/profiles/*.yaml`: section-level registries selected by `*.profile`
-- `connectomics/config/templates/*.yaml`: explicit list-item templates, currently for `inference.decoding`
+- `connectomics/config/templates/*.yaml`: explicit list-item templates, currently for top-level `decoding`
 - `tutorials/*.yaml`: runnable experiments that `_base_` the shared registries

 Canonical merge semantics:

 - Profile payloads are merged into the target section as the base config.
 - Explicit keys override profile keys.
 - Explicit lists replace profile lists; list overrides are not additive.
-- Canonical decoding expansion is explicit `template:` inside `inference.decoding`.
+- Canonical decoding expansion is explicit `template:` inside top-level `decoding`.
 - Do not introduce `decoding_profile` or `- profile: decoding_*` usages.

 **Config File Example** (`tutorials/lucchi.yaml`):
@@ -757,7 +757,7 @@ All previously identified technical debt items have been addressed. Below is the
 31. ~~Pass-through `create_volume_data_dicts()`~~ ✅ (removed)
 32. ~~Python 2 `__future__` imports~~ ✅ (removed)
 33. ~~`cfg.inference.*` references~~ ✅ (valid InferenceConfig in TestConfig, not legacy)
-34. ~~Legacy `test.decoding` fallback~~ ✅ (uses `inference.decoding` directly)
+34. ~~Legacy `test.decoding` fallback~~ ✅ (uses top-level `decoding` directly)
 35. ~~Unnecessary try-except for RSUNet import~~ ✅ (removed)
 36. ~~Hardcoded architecture list~~ ✅ (queries registry dynamically)
 37. ~~Duplicate `_to_plain_dict`/`_as_dict`~~ ✅ (consolidated to `config/dict_utils.py`)
```

README.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -186,14 +186,14 @@ just test lucchi++ outputs/lucchi++/$EXPERIMENT_DATE/checkpoints/best.ckpt

 - `tutorials/*.yaml`: runnable experiment configs
 - `connectomics/config/profiles/*.yaml`: section-level registries selected by `*.profile`
-- `connectomics/config/templates/*.yaml`: explicit list-item templates, currently used for `inference.decoding`
+- `connectomics/config/templates/*.yaml`: explicit list-item templates, currently used for top-level `decoding`

 Merge rule:

 - Profile payloads are merged into the target section as the base config.
 - Explicit keys in the tutorial/config override profile keys.
 - Explicit lists replace profile lists; they are not additive.
-- Canonical decoding syntax is explicit list expansion, for example `inference.decoding: [{template: decoding_waterz}]`.
+- Canonical decoding syntax is explicit list expansion, for example `decoding: [{template: decoding_waterz}]`.

 ---
```

connectomics/config/pipeline/config_io.py

Lines changed: 6 additions & 3 deletions
```diff
@@ -393,6 +393,9 @@ def validate_config(cfg: Config) -> None:
     axes = str(getattr(chunking_cfg, "axes", "all")).lower()
     if axes not in {"all", "z"}:
         raise ValueError("inference.chunking.axes must be 'all' or 'z'")
+    output_mode = str(getattr(chunking_cfg, "output_mode", "decoded")).lower()
+    if output_mode not in {"decoded", "raw_prediction"}:
+        raise ValueError("inference.chunking.output_mode must be 'decoded' or 'raw_prediction'")
     chunk_size = getattr(chunking_cfg, "chunk_size", None)
     if not chunk_size or len(chunk_size) != 3:
         raise ValueError("inference.chunking.chunk_size must be a length-3 ZYX list")
@@ -615,7 +618,7 @@ def _validate_label_channel_capacity(selector_value: Any, *, path: str) -> None:
         )

     # 2d) Decoding kwargs channel selectors (*_channels)
-    decoding_cfg = getattr(cfg.inference, "decoding", None)
+    decoding_cfg = getattr(cfg, "decoding", None)
     decode_has_channel_selection = False
     decode_output_head = None
     decode_available_channels = out_channels
@@ -660,10 +663,10 @@ def _validate_label_channel_capacity(selector_value: Any, *, path: str) -> None:
             continue
         min_channels = infer_min_required_channels(
             value,
-            context=f"inference.decoding[{i}].kwargs.{key}",
+            context=f"decoding[{i}].kwargs.{key}",
         )
         if min_channels is not None:
-            path = f"inference.decoding[{i}].kwargs.{key}"
+            path = f"decoding[{i}].kwargs.{key}"
             if model_heads and decode_has_channel_selection:
                 if min_channels > decode_available_channels:
                     raise ValueError(
```

connectomics/config/pipeline/profile_engine.py

Lines changed: 3 additions & 2 deletions
```diff
@@ -467,6 +467,7 @@ def apply(self, yaml_conf: DictConfig) -> DictConfig:
 _STAGE_TRAIN = "train"
 _STAGE_TEST = "test"
 _STAGE_TUNE = "tune"
+_STAGE_ROOT = ""


 def _stage_path(stage: str, rel_path: str) -> str:
@@ -606,8 +607,8 @@ def _build_reference_profile_specs() -> List[Tuple[str, List[str]]]:
 _LIST_REFERENCE_FAMILIES: List[Tuple[str, Tuple[str, ...], str, str]] = [
     (
         "decoding_templates",
-        (_STAGE_DEFAULT, _STAGE_TUNE, _STAGE_TEST),
-        "inference.decoding",
+        (_STAGE_ROOT, _STAGE_DEFAULT, _STAGE_TUNE, _STAGE_TEST),
+        "decoding",
         "decoding",
     ),
 ]
```
