Commit ecd3520

Donglai Wei and claude committed

Reorganize tutorials, remove augmentation preset, fix skeleton coordinate bug

- Move mitoEM configs to tutorials/mitoEM/ with shortened names (H, R, HR, common)
- Move neuron_nisb configs to tutorials/neuron_nisb/ subdirectory
- Move config profiles/templates from configs/ to connectomics/config/
- Remove augmentation.preset tri-state; use per-transform enabled flags directly
- Fix _batch_skeletonize: divide kimimaro physical coords by resolution to get voxel indices
- Add root_path to mito_betaseg.yaml to simplify train/val/test paths
- Add test_sdt_precomputed.py: verify precomputed skeleton SDT matches direct computation
- Clean up obsolete tutorials/misc/ files and test_hydra_config.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 6f90acc commit ecd3520

65 files changed

Lines changed: 1001 additions & 4301 deletions

.claude/reference/abiss.md

Lines changed: 0 additions & 51 deletions
This file was deleted.

.claude/reference/em_pipeline.md

Lines changed: 150 additions & 0 deletions
@@ -0,0 +1,150 @@
# em_pipeline Reference

**Location**: `/projects/weilab/weidf/lib/em_pipeline`

Multi-GPU/multi-core pipeline engine for large-scale 3D EM neuron segmentation and reconstruction. Implements a two-stage approach: voxel→supervoxel (waterz watershed) then supervoxel→instance (region graph + branch resolution). Designed for distributed, chunked processing of datasets like zebrafish brain (5700x10913x10664 voxels).

## Directory Structure

```
em_pipeline/
├── main.py                  # CLI entry point (106 lines), SLURM integration
├── test.py                  # Dev/visualization script (983 lines)
├── conf/
│   ├── j0126.yml            # Project config (zebrafish example)
│   └── cluster.yml          # Cluster/environment config
├── db/                      # Local database/cache (HDF5 files)
├── em_pipeline/
│   ├── tasks/
│   │   ├── __init__.py      # Task factory
│   │   ├── task.py          # Base Task class (34 lines)
│   │   ├── waterz.py        # Waterz segmentation tasks (217 lines)
│   │   ├── branch.py        # Branch resolution tasks (363 lines)
│   │   ├── region_graph.py  # Region graph / soma BFS (104 lines)
│   │   └── eval.py          # ERL evaluation (25 lines)
│   └── lib/
│       └── rpca.py          # Robust PCA (68 lines)
├── setup.py
├── environment.yml
└── README.md
```

~1,933 lines total.

## Architecture

### Two-Stage Pipeline

```
Affinity Predictions (HDF5/Zarr)
↓ Stage 1: Voxel → Supervoxel
Waterz Segmentation (per-chunk supervoxels)
↓ Chunk merging (global segment IDs)
↓ Soma constraint (BFS-based soma assignment)
↓ Stage 2: Supervoxel → Instance
Branch Resolution (S1/S2/S3 IOU+affinity checks)
↓
Skeleton Generation (multi-scale morphology)
```

### Task Class Hierarchy

```
Task (base)
├── WaterzTask              # Chunk-wise waterz segmentation
├── WaterzSoma2DTask        # 2D soma-aware waterz (per z-slice)
├── WaterzStatsTask         # Consolidate stats across chunks
├── BranchChunkTask         # Per-chunk branch resolution
│   ├── BranchBorderTask    # Cross-chunk boundary handling
│   └── BranchAllTask       # Global aggregation + skeletonization
├── RegionGraphChunkTask    # Soma constraints on region graphs
├── RegionGraphBorderTask
└── ERLTask                 # Evaluation against ground truth
```

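The hierarchy and the task factory in `tasks/__init__.py` could be organized roughly as in the sketch below. This is an illustration only, not the package's actual code: the constructor arguments, the `run` method, the `TASKS` registry, the `get_task` helper, and the `branch-chunk` task name are all assumptions.

```python
from abc import ABC, abstractmethod


class Task(ABC):
    """Hypothetical base class: stores the config and the job split."""

    def __init__(self, conf, job_id=0, job_num=1):
        self.conf = conf
        self.job_id = job_id
        self.job_num = job_num

    @abstractmethod
    def run(self):
        """Process the share of work assigned to this job."""


class WaterzTask(Task):
    def run(self):
        return f"waterz job {self.job_id}/{self.job_num}"


class BranchChunkTask(Task):
    def run(self):
        return f"branch-chunk job {self.job_id}/{self.job_num}"


class BranchBorderTask(BranchChunkTask):
    # Subclass of BranchChunkTask, mirroring the hierarchy above.
    def run(self):
        return f"branch-border job {self.job_id}/{self.job_num}"


# Registry keyed by CLI task name (assumed layout; "branch-chunk" is illustrative).
TASKS = {
    "waterz": WaterzTask,
    "branch-chunk": BranchChunkTask,
    "branch-border": BranchBorderTask,
}


def get_task(name, conf, job_id=0, job_num=1):
    """Factory: look up the task class by name and instantiate it."""
    try:
        cls = TASKS[name]
    except KeyError:
        raise ValueError(f"unknown task: {name!r}") from None
    return cls(conf, job_id, job_num)


task = get_task("branch-border", {}, job_id=1, job_num=4)
print(task.run())
```

A dictionary registry keeps the mapping from `-t` task names to classes in one place, so adding a task does not require touching the dispatch logic.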
## Usage

```bash
# CLI entry point
python main.py -c conf/j0126.yml -t [task] -i [job_id] -n [job_num] -nc [num_cpu]

# Stage 1: Voxel → Supervoxel
python main.py -c conf/j0126.yml -t waterz
python main.py -c conf/j0126.yml -t waterz-stats
python main.py -c conf/j0126.yml -t rg-border
python main.py -c conf/j0126.yml -t rg-all

# Stage 2: Supervoxel → Instance
python main.py -c conf/j0126.yml -t branch-border -o relabel
python main.py -c conf/j0126.yml -t branch-all -o s2-4-8-8
```

CLI arguments: `-c` config file, `-t` task name, `-i` job index, `-n` total jobs, `-nc` CPUs per job, `-p` SLURM partition.

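A parser for the documented flags might look like the following sketch. The long option names, defaults, and the round-robin `chunks_for_job` helper are assumptions made for illustration; how `main.py` actually divides chunks between jobs is not stated here.

```python
import argparse


def build_parser():
    """Parser for the documented short flags; long names are assumed."""
    p = argparse.ArgumentParser(description="em_pipeline CLI sketch")
    p.add_argument("-c", "--conf", required=True, help="config file")
    p.add_argument("-t", "--task", required=True, help="task name")
    p.add_argument("-i", "--job-id", type=int, default=0, help="job index")
    p.add_argument("-n", "--job-num", type=int, default=1, help="total jobs")
    p.add_argument("-nc", "--num-cpu", type=int, default=1, help="CPUs per job")
    p.add_argument("-p", "--partition", default=None, help="SLURM partition")
    return p


def chunks_for_job(num_chunks, job_id, job_num):
    """Round-robin split (assumed): job i takes chunks i, i+n, i+2n, ..."""
    return list(range(job_id, num_chunks, job_num))


args = build_parser().parse_args(
    ["-c", "conf/j0126.yml", "-t", "waterz", "-i", "1", "-n", "4", "-nc", "8"]
)
mine = chunks_for_job(10, args.job_id, args.job_num)
print(args.task, args.num_cpu, mine)
```

With 10 chunks and 4 jobs, job 1 would process chunks 1, 5, and 9 under this scheme.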
## Configuration (YAML)

### Project Config (`conf/j0126.yml`)

| Section | Keys | Purpose |
|---------|------|---------|
| `im` | path, shape, tile_shape, res | Input image volume |
| `mask` | blood_vessel, soma, border, soma_ratio, soma_id0/id1 | Segmentation constraints |
| `aff` | path, aff_shape, low, high | Affinity predictions + thresholds |
| `waterz` | mf, thres, num_z, nb, opt_frag, small_size, small_aff, small_dust, bg_thres | Watershed parameters |
| `branch` | s1_iou, s1_sz, s1_rg, s3_iou, s3_sz, skel_dust | Branch resolution thresholds |
| `rg` | thres_z | Region graph parameters |
| `output` | path | Output directory |
| `eval` | val, test | Evaluation datasets |

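One way to catch a misconfigured project file early is to check the parsed config against the table above. The sketch below assumes the YAML has already been loaded into a nested dictionary and covers only a subset of sections. Treating these keys as required is an assumption, as are the `REQUIRED_KEYS` and `missing_keys` names; all values are placeholders.

```python
# Section -> keys, copied from the table above (subset only).
REQUIRED_KEYS = {
    "im": {"path", "shape", "tile_shape", "res"},
    "aff": {"path", "aff_shape", "low", "high"},
    "branch": {"s1_iou", "s1_sz", "s1_rg", "s3_iou", "s3_sz", "skel_dust"},
    "rg": {"thres_z"},
    "output": {"path"},
    "eval": {"val", "test"},
}


def missing_keys(conf):
    """Return {section: sorted missing keys} for incomplete sections."""
    problems = {}
    for section, keys in REQUIRED_KEYS.items():
        present = set(conf.get(section, {}))
        missing = keys - present
        if missing:
            problems[section] = sorted(missing)
    return problems


# Placeholder config: "aff" lacks "high" and the "eval" section is absent.
conf = {
    "im": {"path": "im.zarr", "shape": [1, 1, 1], "tile_shape": [1, 1, 1], "res": [1, 1, 1]},
    "aff": {"path": "aff.zarr", "aff_shape": [3, 1, 1, 1], "low": 0},
    "branch": {"s1_iou": 0, "s1_sz": 0, "s1_rg": 0, "s3_iou": 0, "s3_sz": 0, "skel_dust": 0},
    "rg": {"thres_z": 0},
    "output": {"path": "out/"},
}
print(missing_keys(conf))
```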
### Cluster Config (`conf/cluster.yml`)

Keys: `folder`, `env` (setup commands), `python` (executable), `num_gpu`, `memory`.

## Key Algorithms

### Waterz Task
- Applies affinity masks (blood vessel, border)
- Runs waterz agglomeration with configurable merge function and threshold
- Generates per-chunk region graphs
- Output: HDF5 with `seg`, `id`, `score` datasets

### Soma-Aware Waterz (2D)
- Processes each z-slice independently
- Integrates soma mask constraints via seeded watershed
- Resolves soma-based false splits/merges

### Branch Resolution (3 stages)
- **S1**: IOU-based merge scoring within chunks
- **S2**: IOU best-buddy pairing (bidirectional agreement)
- **S3**: One-sided IOU + affinity validation for remaining candidates

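The best-buddy idea in S2 can be illustrated with a small sketch. It assumes two sets of segments (for example, on either side of a chunk face) with known voxel overlaps and sizes. The function names, the input layout, and the `min_iou` threshold are hypothetical and not taken from `branch.py`.

```python
def iou(overlap, size_a, size_b):
    """Intersection over union from an overlap count and two segment sizes."""
    return overlap / (size_a + size_b - overlap)


def best_buddy_pairs(overlaps, sizes_a, sizes_b, min_iou=0.5):
    """Return sorted (a, b) pairs where a and b each pick the other as best match.

    overlaps: {(a, b): overlapping voxel count}
    sizes_a, sizes_b: {segment id: voxel count}
    """
    best_a, best_b = {}, {}
    for (a, b), ov in overlaps.items():
        score = iou(ov, sizes_a[a], sizes_b[b])
        if score > best_a.get(a, (None, 0.0))[1]:
            best_a[a] = (b, score)
        if score > best_b.get(b, (None, 0.0))[1]:
            best_b[b] = (a, score)
    # Keep a pair only if the choice is mutual and the IOU clears the threshold.
    return sorted(
        (a, b)
        for a, (b, score) in best_a.items()
        if best_b[b][0] == a and score >= min_iou
    )


sizes_a = {1: 100, 2: 50}
sizes_b = {10: 100, 11: 60}
overlaps = {(1, 10): 80, (2, 10): 10, (2, 11): 40}
print(best_buddy_pairs(overlaps, sizes_a, sizes_b))
```

Requiring agreement in both directions is stricter than a one-sided best match, which is presumably why the one-sided check in S3 adds affinity validation.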
### Soma BFS
- Breadth-first search to grow soma regions outward
- Assigns non-soma segments to nearest soma
- Handles ambiguous segments (multiple soma connections)

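A multi-source breadth-first search over a region graph captures the three points above. The sketch below is a simplified illustration: it measures "nearest" in graph hops, flags segments reached by two somas at the same distance as ambiguous, and does not grow through them. All three choices are assumptions, and `assign_to_soma` is a hypothetical name, not the function in `region_graph.py`.

```python
from collections import deque


def assign_to_soma(adjacency, soma_of):
    """Grow soma labels outward over a segment adjacency graph.

    adjacency: {segment: iterable of neighbouring segments}
    soma_of:   {seed segment: soma id}
    Returns (owner, ambiguous): owner maps segments to a soma id,
    ambiguous is the set of segments contested by two somas.
    """
    owner = dict(soma_of)
    dist = {seg: 0 for seg in soma_of}
    ambiguous = set()
    queue = deque(soma_of)
    while queue:
        seg = queue.popleft()
        if seg in ambiguous:
            continue  # do not grow through contested segments
        for nb in adjacency.get(seg, ()):
            if nb not in owner:
                owner[nb] = owner[seg]
                dist[nb] = dist[seg] + 1
                queue.append(nb)
            elif owner[nb] != owner[seg] and dist[nb] == dist[seg] + 1:
                ambiguous.add(nb)  # reached by another soma at equal distance
    for seg in ambiguous:
        del owner[seg]
    return owner, ambiguous


# Segments 1 and 2 are soma seeds; segment 4 touches both.
adjacency = {1: [3, 4], 2: [4], 3: [1], 4: [1, 2, 5], 5: [4]}
owner, ambiguous = assign_to_soma(adjacency, {1: 100, 2: 200})
print(owner, ambiguous)
```

In this example segment 3 joins soma 100, segment 4 is flagged as ambiguous, and segment 5 stays unassigned because it is only reachable through the contested segment.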
## Data Formats

- **HDF5** (.h5): Primary I/O for volumes and results
- **Zarr** (.zarr): Large-scale chunked arrays
- **Pickle**: Dask arrays and serialized objects
- **PNG/TIFF**: 2D mask/image files

## Dependencies

### Core (from environment.yml)
numpy 1.24.3, scipy, h5py, pyyaml 6.0.2, pillow 10.4.0, cc3d 3.18.0, fastremap 1.15.0, mahotas 1.4.18, zarr 2.16.1, dask 2023.5.0, networkx 3.1, cloudpickle 3.0.0

### External (installed separately)
- **em_util**: `git clone git@github.com:PytorchConnectomics/em_util.git && cd em_util && pip install -e .`
- **waterz**: `git clone -b affuint8 git@github.com:donglaiw/waterz.git && cd waterz && pip install -e .`
- **zwatershed**: `git clone git@github.com:donglaiw/zwatershed.git && cd zwatershed && pip install -e .`
- **em_erl**: Evaluation utilities (referenced in code)

## Design Principles

1. **Chunked processing**: Overlapping chunks for parallel execution on large volumes
2. **Hierarchical resolution**: Sequential refinement from coarse to fine
3. **Constraint integration**: Soma masks and boundary constraints guide segmentation
4. **IOU-based merging**: Intersection-over-union drives segment agglomeration
5. **Skeleton output**: Multi-scale neuron skeletons for morphological analysis
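
The chunked-processing principle can be made concrete with a short sketch that tiles a volume into overlapping chunks. The function names, the uniform overlap, and the clipping of the final chunk at the volume boundary are assumptions for illustration; the pipeline's actual tiling scheme is not described here.

```python
from itertools import product


def chunk_bounds(length, chunk, overlap):
    """1D [start, stop) intervals of size `chunk` that overlap by `overlap`."""
    if not 0 <= overlap < chunk:
        raise ValueError("need 0 <= overlap < chunk")
    step = chunk - overlap
    bounds = []
    start = 0
    while True:
        stop = min(start + chunk, length)  # clip the last chunk to the volume
        bounds.append((start, stop))
        if stop == length:
            break
        start += step
    return bounds


def chunk_grid(shape, chunk_shape, overlap):
    """Cartesian product of per-axis intervals: one tuple of bounds per chunk."""
    per_axis = [chunk_bounds(n, c, overlap) for n, c in zip(shape, chunk_shape)]
    return list(product(*per_axis))


print(chunk_bounds(10, 4, 1))
print(len(chunk_grid((10, 10), (4, 4), 1)))
```

Each chunk shares `overlap` voxels with its neighbour along every axis, which gives cross-chunk steps such as border handling a region in which both sides have labels.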
