|
| 1 | +# em_pipeline Reference |
| 2 | + |
| 3 | +**Location**: `/projects/weilab/weidf/lib/em_pipeline` |
| 4 | + |
| 5 | +Multi-GPU/multi-core pipeline engine for large-scale 3D EM neuron segmentation and reconstruction. It implements a two-stage approach: voxel→supervoxel (waterz watershed), then supervoxel→instance (region graph + branch resolution). It is designed for distributed, chunked processing of datasets such as the j0126 zebra finch brain volume (5700x10913x10664 voxels). |
| 6 | + |
| 7 | +## Directory Structure |
| 8 | + |
| 9 | +``` |
| 10 | +em_pipeline/ |
| 11 | +├── main.py # CLI entry point (106 lines), SLURM integration |
| 12 | +├── test.py # Dev/visualization script (983 lines) |
| 13 | +├── conf/ |
| 14 | +│ ├── j0126.yml # Project config (zebra finch example) |
| 15 | +│ └── cluster.yml # Cluster/environment config |
| 16 | +├── db/ # Local database/cache (HDF5 files) |
| 17 | +├── em_pipeline/ |
| 18 | +│ ├── tasks/ |
| 19 | +│ │ ├── __init__.py # Task factory |
| 20 | +│ │ ├── task.py # Base Task class (34 lines) |
| 21 | +│ │ ├── waterz.py # Waterz segmentation tasks (217 lines) |
| 22 | +│ │ ├── branch.py # Branch resolution tasks (363 lines) |
| 23 | +│ │ ├── region_graph.py # Region graph / soma BFS (104 lines) |
| 24 | +│ │ └── eval.py # ERL evaluation (25 lines) |
| 25 | +│ └── lib/ |
| 26 | +│ └── rpca.py # Robust PCA (68 lines) |
| 27 | +├── setup.py |
| 28 | +├── environment.yml |
| 29 | +└── README.md |
| 30 | +``` |
| 31 | + |
| 32 | +~1,933 lines total. |
| 33 | + |
| 34 | +## Architecture |
| 35 | + |
| 36 | +### Two-Stage Pipeline |
| 37 | + |
| 38 | +``` |
| 39 | +Affinity Predictions (HDF5/Zarr) |
| 40 | + ↓ Stage 1: Voxel → Supervoxel |
| 41 | +Waterz Segmentation (per-chunk supervoxels) |
| 42 | + ↓ Chunk merging (global segment IDs) |
| 43 | + ↓ Soma constraint (BFS-based soma assignment) |
| 44 | + ↓ Stage 2: Supervoxel → Instance |
| 45 | +Branch Resolution (S1/S2/S3 IOU+affinity checks) |
| 46 | + ↓ |
| 47 | +Skeleton Generation (multi-scale morphology) |
| 48 | +``` |
| 49 | + |
| 50 | +### Task Class Hierarchy |
| 51 | + |
| 52 | +``` |
| 53 | +Task (base) |
| 54 | +├── WaterzTask # Chunk-wise waterz segmentation |
| 55 | +├── WaterzSoma2DTask # 2D soma-aware waterz (per z-slice) |
| 56 | +├── WaterzStatsTask # Consolidate stats across chunks |
| 57 | +├── BranchChunkTask # Per-chunk branch resolution |
| 58 | +│ ├── BranchBorderTask # Cross-chunk boundary handling |
| 59 | +│ └── BranchAllTask # Global aggregation + skeletonization |
| 60 | +├── RegionGraphChunkTask # Soma constraints on region graphs |
| 61 | +├── RegionGraphBorderTask |
| 62 | +└── ERLTask # Evaluation against ground truth |
| 63 | +``` |
| 64 | + |
| 65 | +## Usage |
| 66 | + |
| 67 | +```bash |
| 68 | +# CLI entry point |
| 69 | +python main.py -c conf/j0126.yml -t [task] -i [job_id] -n [job_num] -nc [num_cpu] |
| 70 | + |
| 71 | +# Stage 1: Voxel → Supervoxel |
| 72 | +python main.py -c conf/j0126.yml -t waterz |
| 73 | +python main.py -c conf/j0126.yml -t waterz-stats |
| 74 | +python main.py -c conf/j0126.yml -t rg-border |
| 75 | +python main.py -c conf/j0126.yml -t rg-all |
| 76 | + |
| 77 | +# Stage 2: Supervoxel → Instance |
| 78 | +python main.py -c conf/j0126.yml -t branch-border -o relabel |
| 79 | +python main.py -c conf/j0126.yml -t branch-all -o s2-4-8-8 |
| 80 | +``` |
| 81 | + |
| 82 | +CLI arguments: `-c` config file, `-t` task name, `-i` job index, `-n` total number of jobs, `-nc` CPUs per job, `-p` SLURM partition, `-o` task-specific option (e.g. `relabel` or `s2-4-8-8` in the commands above). |
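
To make the flag set concrete, here is a minimal sketch of a parser that accepts the short flags listed above, plus the `-o` flag used in the Stage 2 commands. It is not the code in `main.py`: the long option names, the defaults, and the round-robin `chunks_for_job` helper are assumptions made for illustration only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented short flags."""
    p = argparse.ArgumentParser(description="em_pipeline task runner (sketch)")
    p.add_argument("-c", "--conf-file", required=True, help="project config (YAML)")
    p.add_argument("-t", "--task", required=True, help="task name, e.g. waterz")
    p.add_argument("-i", "--job-id", type=int, default=0, help="index of this job")
    p.add_argument("-n", "--job-num", type=int, default=1, help="total number of jobs")
    p.add_argument("-nc", "--num-cpu", type=int, default=1, help="CPUs per job")
    p.add_argument("-p", "--partition", default="", help="SLURM partition")
    p.add_argument("-o", "--option", default="", help="task-specific option")
    return p


def chunks_for_job(num_chunks: int, job_id: int, job_num: int) -> list[int]:
    """Assumed round-robin split: job i takes chunks i, i+n, i+2n, ..."""
    if not 0 <= job_id < job_num:
        raise ValueError("job_id must satisfy 0 <= job_id < job_num")
    return list(range(job_id, num_chunks, job_num))


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["-c", "conf/j0126.yml", "-t", "waterz", "-i", "1", "-n", "4"]
)
print(args.task, chunks_for_job(10, args.job_id, args.job_num))  # → waterz [1, 5, 9]
```

With a split like this, jobs `0..n-1` together cover every chunk exactly once, which is one common way to use a job index and job count under SLURM array jobs.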
| 83 | + |
| 84 | +## Configuration (YAML) |
| 85 | + |
| 86 | +### Project Config (`conf/j0126.yml`) |
| 87 | + |
| 88 | +| Section | Keys | Purpose | |
| 89 | +|---------|------|---------| |
| 90 | +| `im` | path, shape, tile_shape, res | Input image volume | |
| 91 | +| `mask` | blood_vessel, soma, border, soma_ratio, soma_id0/id1 | Segmentation constraints | |
| 92 | +| `aff` | path, aff_shape, low, high | Affinity predictions + thresholds | |
| 93 | +| `waterz` | mf, thres, num_z, nb, opt_frag, small_size, small_aff, small_dust, bg_thres | Watershed parameters | |
| 94 | +| `branch` | s1_iou, s1_sz, s1_rg, s3_iou, s3_sz, skel_dust | Branch resolution thresholds | |
| 95 | +| `rg` | thres_z | Region graph parameters | |
| 96 | +| `output` | path | Output directory | |
| 97 | +| `eval` | val, test | Evaluation datasets | |
| 98 | + |
| 99 | +### Cluster Config (`conf/cluster.yml`) |
| 100 | + |
| 101 | +Keys: `folder`, `env` (setup commands), `python` (executable), `num_gpu`, `memory`. |
| 102 | + |
| 103 | +## Key Algorithms |
| 104 | + |
| 105 | +### Waterz Task |
| 106 | +- Applies affinity masks (blood vessel, border) |
| 107 | +- Runs waterz agglomeration with configurable merge function and threshold |
| 108 | +- Generates per-chunk region graphs |
| 109 | +- Output: HDF5 with `seg`, `id`, `score` datasets |
| 110 | + |
| 111 | +### Soma-Aware Waterz (2D) |
| 112 | +- Processes each z-slice independently |
| 113 | +- Integrates soma mask constraints via seeded watershed |
| 114 | +- Resolves soma-based false splits/merges |
| 115 | + |
| 116 | +### Branch Resolution (3 stages) |
| 117 | +- **S1**: IOU-based merge scoring within chunks |
| 118 | +- **S2**: IOU best-buddy pairing (bidirectional agreement) |
| 119 | +- **S3**: One-sided IOU + affinity validation for remaining candidates |
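
The best-buddy idea behind S2 can be sketched in a few lines. The example below is illustrative and is not taken from `branch.py`: it assumes overlap voxel counts between two sets of segments (for example, the two sides of a chunk face) are already available, and the names `iou_table` and `best_buddies` are hypothetical.

```python
def iou_table(overlap, size_a, size_b):
    """IoU for every overlapping pair: |A∩B| / (|A| + |B| - |A∩B|)."""
    return {
        (a, b): inter / (size_a[a] + size_b[b] - inter)
        for (a, b), inter in overlap.items()
    }


def best_buddies(iou, min_iou=0.0):
    """Keep pairs (a, b) where b is a's best match AND a is b's best match."""
    best_for_a, best_for_b = {}, {}
    for (a, b), score in iou.items():
        if score > best_for_a.get(a, (None, -1.0))[1]:
            best_for_a[a] = (b, score)
        if score > best_for_b.get(b, (None, -1.0))[1]:
            best_for_b[b] = (a, score)
    return sorted(
        (a, b, score)
        for a, (b, score) in best_for_a.items()
        if best_for_b[b][0] == a and score >= min_iou
    )


# Toy data (made up for illustration): voxel counts and pairwise overlaps.
size_a = {1: 100, 2: 50, 3: 20}
size_b = {10: 90, 11: 60}
overlap = {(1, 10): 80, (1, 11): 10, (2, 11): 40, (3, 11): 5}

pairs = best_buddies(iou_table(overlap, size_a, size_b))
print([(a, b) for a, b, _ in pairs])  # → [(1, 10), (2, 11)]
```

Segment 3 prefers segment 11, but 11 prefers 2, so the pair (3, 11) has only one-sided agreement. Pairs like this are the kind of leftover candidate that a later one-sided check with affinity validation, as in S3, would have to decide.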
| 120 | + |
| 121 | +### Soma BFS |
| 122 | +- Breadth-first search to grow soma regions outward |
| 123 | +- Assigns non-soma segments to nearest soma |
| 124 | +- Handles ambiguous segments (multiple soma connections) |
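
As a rough illustration of the steps above, the following sketch runs a multi-source BFS over a region graph given as an edge list. It is not the code in `region_graph.py`: "nearest" is measured in graph hops, contested segments are marked with an `AMBIGUOUS` label and not grown further, and all names are assumptions of this sketch.

```python
from collections import defaultdict, deque

AMBIGUOUS = 0  # assumed marker for segments equally close to several somas


def assign_to_soma(edges, soma_of):
    """Multi-source BFS: label each segment with the soma nearest in graph hops."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    label = dict(soma_of)            # segment -> soma id
    dist = {s: 0 for s in soma_of}   # segment -> hops from its soma
    queue = deque(sorted(soma_of))   # all soma segments start at distance 0
    while queue:
        u = queue.popleft()
        if label[u] == AMBIGUOUS:
            continue                 # do not grow from contested segments
        for v in sorted(adj[u]):
            if v not in dist:
                dist[v] = dist[u] + 1
                label[v] = label[u]
                queue.append(v)
            elif dist[v] == dist[u] + 1 and label[v] != label[u]:
                label[v] = AMBIGUOUS  # reached by two somas at equal distance
    return label


# Toy region graph: segments 1 and 2 are soma segments of somas 101 and 102.
edges = [(1, 3), (2, 4), (3, 5), (4, 5), (3, 6)]
print(assign_to_soma(edges, {1: 101, 2: 102}))
# → {1: 101, 2: 102, 3: 101, 4: 102, 5: 0, 6: 101}
```

Segment 5 is two hops from both somas, so it ends up contested. Segments reachable only through a contested segment stay unlabeled in this sketch; a real pipeline could instead break the tie with edge scores.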
| 125 | + |
| 126 | +## Data Formats |
| 127 | + |
| 128 | +- **HDF5** (.h5): Primary I/O for volumes and results |
| 129 | +- **Zarr** (.zarr): Large-scale chunked arrays |
| 130 | +- **Pickle**: Dask arrays and serialized objects |
| 131 | +- **PNG/TIFF**: 2D mask/image files |
| 132 | + |
| 133 | +## Dependencies |
| 134 | + |
| 135 | +### Core (from environment.yml) |
| 136 | +numpy 1.24.3, scipy, h5py, pyyaml 6.0.2, pillow 10.4.0, cc3d 3.18.0, fastremap 1.15.0, mahotas 1.4.18, zarr 2.16.1, dask 2023.5.0, networkx 3.1, cloudpickle 3.0.0 |
| 137 | + |
| 138 | +### External (installed separately) |
| 139 | +- **em_util**: `git clone git@github.com:PytorchConnectomics/em_util.git && cd em_util && pip install -e .` |
| 140 | +- **waterz**: `git clone -b affuint8 git@github.com:donglaiw/waterz.git && cd waterz && pip install -e .` |
| 141 | +- **zwatershed**: `git clone git@github.com:donglaiw/zwatershed.git && cd zwatershed && pip install -e .` |
| 142 | +- **em_erl**: Evaluation utilities (referenced in code) |
| 143 | + |
| 144 | +## Design Principles |
| 145 | + |
| 146 | +1. **Chunked processing**: Overlapping chunks for parallel execution on large volumes |
| 147 | +2. **Hierarchical resolution**: Sequential refinement from coarse to fine |
| 148 | +3. **Constraint integration**: Soma masks and boundary constraints guide segmentation |
| 149 | +4. **IOU-based merging**: Intersection-over-union drives segment agglomeration |
| 150 | +5. **Skeleton output**: Multi-scale neuron skeletons for morphological analysis |
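
The first principle, overlapping chunks, can be illustrated with a small helper that tiles a volume into chunks sharing a fixed number of voxels with their neighbours. The chunk shape and overlap used here are placeholders and not values from `conf/j0126.yml`; the function names are likewise hypothetical.

```python
from itertools import product


def chunk_bounds(length, chunk, overlap):
    """1D [start, stop) ranges of size `chunk` sharing `overlap` voxels with neighbours."""
    if not 0 <= overlap < chunk:
        raise ValueError("need 0 <= overlap < chunk")
    step = chunk - overlap
    bounds = []
    start = 0
    while True:
        stop = min(start + chunk, length)  # last chunk is clipped to the volume
        bounds.append((start, stop))
        if stop == length:
            return bounds
        start += step


def chunk_grid(shape, chunk_shape, overlap):
    """All 3D chunks as tuples of slices, one slice per axis."""
    per_axis = [chunk_bounds(n, c, overlap) for n, c in zip(shape, chunk_shape)]
    return [tuple(slice(a, b) for a, b in combo) for combo in product(*per_axis)]


print(chunk_bounds(10, 4, 1))                       # → [(0, 4), (3, 7), (6, 10)]
print(len(chunk_grid((10, 10, 10), (4, 4, 4), 1)))  # → 27
```

Each chunk can then be processed independently, and the shared border voxels give the border tasks something to compare when stitching neighbouring chunks together.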