# pytc-deploy

**Location:** `/projects/weilab/weidf/lib/pytc-deploy`
**License:** MIT (2024, donglai)
**Purpose:** Deployment/workflow management for EM connectomics data-processing pipelines using PyTorch Connectomics. Orchestrates large-scale segmentation, instance merging, and visualization on SLURM clusters.

## Repository Structure

```
pytc-deploy/
├── util/              # Shared utility modules
│   ├── __init__.py
│   ├── args.py        # CLI argument parsing
│   └── task.py        # Core segmentation algorithms
├── mito-h01/          # H01 dataset mitochondria processing
│   ├── main.py        # Pipeline orchestration (305 lines)
│   ├── const.py       # Dataset constants
│   └── param.yml      # SLURM/path configuration
├── nuc-worm/          # C. elegans nucleus/worm processing
│   ├── main.py        # Pipeline orchestration (302 lines, mirrors mito-h01)
│   ├── const.py       # Dataset constants
│   └── param.yml      # SLURM/path configuration
└── syn-alzhemier/     # Alzheimer's synapse analysis
    └── main.py        # Multi-step pipeline (858 lines)
```

## CLI Entry Point

All projects use:
```bash
python main.py -t <task> [flags]
```

## util/args.py — `get_parser()`

Returns an `ArgumentParser` with these flags:

| Flag | Default | Purpose |
|------|---------|---------|
| `-t, --task` | `""` | Task name to execute |
| `-s, --cmd` | `""` | SLURM command |
| `-e, --env` | `"imu"` | Conda environment name |
| `-ji, --job-id` | `0` | Job ID for parallel processing |
| `-jn, --job-num` | `1` | Total number of jobs |
| `-cn, --chunk-num` | `1` | Number of chunks |
| `-n, --neuron` | `""` | Neuron IDs (comma-separated) |
| `-r, --ratio` | `"1,1,1"` | Downsample ratio (Z,Y,X) |
| `-cp, --partition` | `"lichtman"` | SLURM partition |
| `-cm, --memory` | `"50GB"` | Memory allocation |
| `-ct, --run-time` | `"0-12:00"` | Job runtime |
| `-cg, --num_gpu` | `-1` | Number of GPUs |

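For reference, here is a minimal sketch of how `get_parser()` could assemble these flags with `argparse`. The option strings and defaults come from the table above; the help strings, parser description, and overall structure are assumptions.

```python
# Hypothetical reconstruction of util/args.py::get_parser() from the flag table above;
# only the option strings and defaults are taken from the documentation.
import argparse

def get_parser():
    parser = argparse.ArgumentParser(description='pytc-deploy task runner')
    parser.add_argument('-t',  '--task',      type=str, default='',         help='task name to execute')
    parser.add_argument('-s',  '--cmd',       type=str, default='',         help='SLURM command')
    parser.add_argument('-e',  '--env',       type=str, default='imu',      help='conda environment name')
    parser.add_argument('-ji', '--job-id',    type=int, default=0,          help='job ID for parallel processing')
    parser.add_argument('-jn', '--job-num',   type=int, default=1,          help='total number of jobs')
    parser.add_argument('-cn', '--chunk-num', type=int, default=1,          help='number of chunks')
    parser.add_argument('-n',  '--neuron',    type=str, default='',         help='comma-separated neuron IDs')
    parser.add_argument('-r',  '--ratio',     type=str, default='1,1,1',    help='downsample ratio (Z,Y,X)')
    parser.add_argument('-cp', '--partition', type=str, default='lichtman', help='SLURM partition')
    parser.add_argument('-cm', '--memory',    type=str, default='50GB',     help='memory allocation')
    parser.add_argument('-ct', '--run-time',  type=str, default='0-12:00',  help='job runtime')
    parser.add_argument('-cg', '--num_gpu',   type=int, default=-1,         help='number of GPUs')
    return parser
```

With this parser, an invocation such as `python main.py -t seg-bbox -ji 0 -jn 8` is exposed as `args.task`, `args.job_id`, and `args.job_num` (argparse maps hyphenated long options to underscored attribute names).
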
## util/task.py — Core Algorithms

### `generate_jobs_dl(conf, neuron, job_num=1, mem='50GB', run_time='1-00:00', job_order=1)`
Generates SLURM batch scripts for deep learning inference using PyTorch Connectomics.

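The sketch below only illustrates the general pattern of emitting one sbatch script per parallel job slot. It is not the actual `generate_jobs_dl()` (which drives PyTorch Connectomics inference from `conf` and `neuron`); the template contents, file names, and re-invocation of `main.py` are assumptions.

```python
# Illustrative per-job SLURM script generation, in the spirit of generate_jobs_dl();
# the template contents, file names, and re-invocation of main.py are assumptions.
import os

SBATCH_TEMPLATE = """#!/bin/bash
#SBATCH --partition={partition}
#SBATCH --mem={mem}
#SBATCH --time={run_time}
#SBATCH --gres=gpu:{num_gpu}
#SBATCH --output={out_dir}/job_%j.out

source activate {env}
python main.py -t {task} -ji {job_id} -jn {job_num}
"""

def generate_jobs(out_dir, task, job_num=1, partition='lichtman', mem='50GB',
                  run_time='1-00:00', num_gpu=1, env='imu'):
    """Write one sbatch script per job slot and return their paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for job_id in range(job_num):
        path = os.path.join(out_dir, f'job_{job_id:04d}.sh')
        with open(path, 'w') as fid:
            fid.write(SBATCH_TEMPLATE.format(
                partition=partition, mem=mem, run_time=run_time, num_gpu=num_gpu,
                out_dir=out_dir, env=env, task=task, job_id=job_id, job_num=job_num))
        paths.append(path)
    return paths
```

Each generated script can then be submitted with `sbatch`, matching the `slurm` task listed later for `mito-h01/main.py`.
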
### `neuron_to_tile(neuron, zid, zran, f_box, f_seg)`
Maps neuron IDs to tile coordinates. Returns the neuron's bounding box and the per-tile bounding boxes.

### `seg_zran_merge(f_zran_p, job_num)`
Merges per-job Z-range (min/max Z) results. Returns the merged ID array and Z-range array.

### `seg_zran_p(f_box, job_id, job_num)`
Computes the Z-range for each segmentation ID in parallel. Returns an array of `[ID, min_z, max_z]`.

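The real `seg_zran_p()` works from the bounding-box file `f_box`; the sketch below computes the same `[ID, min_z, max_z]` quantity directly from a label volume, mainly to show the `job_id`/`job_num` work split. The HDF5 dataset name (`'main'`) and the interleaved slice assignment are assumptions.

```python
# Sketch of a parallel per-ID Z-range pass; the dataset name and slice-assignment
# scheme are assumptions, and the real function reads f_box instead of the raw labels.
import h5py
import numpy as np

def seg_zran_sketch(f_seg, job_id, job_num):
    """Return [seg_id, min_z, max_z] rows for the slices owned by this job."""
    zmin, zmax = {}, {}
    with h5py.File(f_seg, 'r') as fid:
        seg = fid['main']                                  # (Z, Y, X) label volume
        for z in range(job_id, seg.shape[0], job_num):     # interleaved slice assignment
            for sid in np.unique(seg[z]):
                if sid == 0:
                    continue
                zmin[sid] = min(zmin.get(sid, z), z)
                zmax[sid] = max(zmax.get(sid, z), z)
    return np.array([[sid, zmin[sid], zmax[sid]] for sid in sorted(zmin)], dtype=np.int64)
```

`seg_zran_merge()` would then reduce the per-job outputs by taking the element-wise minimum/maximum per ID.
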
### `seg_bbox_p(f_seg, f_box, job_id, job_num)`
Computes bounding boxes for all segmented objects in parallel, slice by slice. Writes the results to HDF5.

### `remove_small_instances(segm, thres_small=128, mode='background')`
Removes spurious small instances from a segmentation.
- **Modes:** `none`, `background` (3D), `background_2d`, `neighbor` (merge with nearest neighbor, 3D), `neighbor_2d`

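A minimal sketch of the `background` mode only, where instances smaller than `thres_small` voxels are set to 0; the `neighbor*` modes (merging small instances into adjacent labels) and the 2D variants are not shown.

```python
# 'background' mode only: zero out instances below thres_small voxels.
import numpy as np

def remove_small_background(segm, thres_small=128):
    ids, counts = np.unique(segm, return_counts=True)
    small = ids[(counts < thres_small) & (ids != 0)]
    segm[np.isin(segm, small)] = 0
    return segm
```
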
### `bc_watershed(volume, thres1=0.9, thres2=0.8, thres3=0.85, thres_small=128, scale_factors=(1.0,1.0,1.0), remove_small_mode='background', seed_thres=32, precomputed_seed=None)`
Converts foreground-probability and instance-contour maps into instance masks via marker-controlled watershed.
- `volume`: shape `(C, Z, Y, X)` with 2 channels (foreground, contour)
- `thres1`: seed threshold (default 0.9)
- `thres2`: contour threshold (default 0.8)
- `thres3`: foreground threshold (default 0.85)

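An illustrative decoding with scikit-image: seeds are connected components where the foreground probability is high and the contour probability is low, and a watershed restricted to the foreground mask assigns the remaining voxels. The thresholding and rescaling details of the real `bc_watershed` may differ, and probabilities are assumed to be floats in [0, 1].

```python
# Sketch of BC-watershed decoding (the real bc_watershed may differ in details);
# volume[0] = foreground probability, volume[1] = instance contour probability.
import numpy as np
from skimage.measure import label
from skimage.segmentation import watershed

def bc_watershed_sketch(volume, thres1=0.9, thres2=0.8, thres3=0.85, seed_thres=32):
    foreground, contour = volume[0], volume[1]
    seed_map = (foreground > thres1) & (contour < thres2)    # confident interior voxels
    seeds = label(seed_map)                                  # connected components as markers
    ids, counts = np.unique(seeds, return_counts=True)
    seeds[np.isin(seeds, ids[(counts < seed_thres) & (ids != 0)])] = 0   # drop tiny seeds
    mask = foreground > thres3                               # region the watershed may fill
    return watershed(-foreground, markers=seeds, mask=mask)
```

The result would then be cleaned with `remove_small_instances` according to `thres_small` and `remove_small_mode`.
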
### `mito_watershed_iou(f_mito_ws_func, arr_mito)`
Computes IoU between instances in adjacent tiles (across the X, Y, and Z faces) for cross-tile matching.

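A sketch of the per-face IoU computation between the labels on either side of a shared tile boundary; how `mito_watershed_iou` selects, loads, and stores the adjacent slabs for the X, Y, and Z directions is not shown.

```python
# IoU between co-located instances on a shared tile face; slab selection and
# I/O performed by the real mito_watershed_iou are omitted.
import numpy as np

def face_iou(labels_a, labels_b):
    """Return {(id_a, id_b): iou} for overlapping nonzero labels on a tile face."""
    both = (labels_a > 0) & (labels_b > 0)
    pairs, inter = np.unique(np.stack([labels_a[both], labels_b[both]]),
                             axis=1, return_counts=True)
    size_a = dict(zip(*np.unique(labels_a[labels_a > 0], return_counts=True)))
    size_b = dict(zip(*np.unique(labels_b[labels_b > 0], return_counts=True)))
    return {(a, b): c / (size_a[a] + size_b[b] - c)
            for (a, b), c in zip(pairs.T, inter)}
```
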
### `mito_neuron_sid(f_mito_ws, arr_mito, ratio=0.6)`
Finds mitochondrial instance IDs lying within a neuron mask, keeping only instances whose overlap with the mask is at least `ratio` (default 60%).

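A sketch of that overlap-ratio filter: an instance is kept only if at least `ratio` of its voxels fall inside the neuron mask. Loading of the watershed output and the neuron mask is omitted.

```python
# Keep only instance IDs whose overlap with the neuron mask is >= ratio.
import numpy as np

def ids_inside_neuron(mito_labels, neuron_mask, ratio=0.6):
    total = np.bincount(mito_labels.ravel())
    inside = np.bincount(mito_labels.ravel(),
                         weights=neuron_mask.ravel().astype(np.float64),
                         minlength=total.size)
    keep = np.nonzero((total > 0) & (inside >= ratio * total))[0]
    return keep[keep > 0]
```
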
## mito-h01/main.py — Tasks

| Task | Description |
|------|-------------|
| `seg-bbox` | Compute bounding boxes per segmentation slice |
| `seg-zran_p` | Compute Z-ranges in parallel |
| `seg-zran` | Merge Z-range data from parallel jobs |
| `neuron-tile` | Map neuron IDs to tile coordinates |
| `mito-folder` | Create output directory structure |
| `mito-ts` | Write TensorStore config pickle |
| `mito-neuron-watershed` | Decode U-Net predictions into instances via watershed |
| `mito-neuron-watershed-iou` | Compute IoU between adjacent tiles |
| `mito-neuron-check` | Verify file completeness |
| `mito-neuron-sid` | Extract mito instance IDs within the neuron mask |
| `mito-neuron-sid-count` | Cumulative count of instance IDs |
| `mito-neuron-sid-iou` | Merge instances across tiles using IoU + UnionFind |
| `mito-neuron-export` | Generate final HDF5 with instance relabeling |
| `mito-neuron-export-ds` | Downsample the exported segmentation |
| `mito-neuron-ng` | Create Neuroglancer-compatible tiles |
| `mito-neuron-mesh` | Generate 3D meshes from the segmentation |
| `mito-neuron-test` | Debugging/testing |
| `slurm` | Generate and submit SLURM batch jobs |

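These tasks are selected with the `-t/--task` flag and dispatched inside `main.py`. A schematic of that dispatch is sketched below; the path constants are hypothetical placeholders, and everything beyond the documented `util.task` signatures is an assumption.

```python
# Schematic task dispatch for mito-h01/main.py; the path constants are
# hypothetical placeholders (the real values come from param.yml / const.py).
from util.args import get_parser
from util import task

F_SEG, F_BOX, F_ZRAN = 'seg.h5', 'seg_bbox.h5', 'seg_zran.h5'

def main():
    args = get_parser().parse_args()
    if args.task == 'seg-bbox':
        task.seg_bbox_p(F_SEG, F_BOX, args.job_id, args.job_num)
    elif args.task == 'seg-zran_p':
        task.seg_zran_p(F_BOX, args.job_id, args.job_num)
    elif args.task == 'seg-zran':
        task.seg_zran_merge(F_ZRAN, args.job_num)
    else:
        raise ValueError(f'unknown task: {args.task}')

if __name__ == '__main__':
    main()
```
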
## mito-h01/const.py — Dataset Constants

- `neuron_volume_size = [1324, 15552, 27072]` (Z, Y, X voxels)
- `neuron_volume_offset = [0, 2560, 3520]`
- `neuron_tile_size = [25, 128, 128]`
- `mito_volume_ratio = [4, 16, 16]` (mito resolution relative to neuron resolution)
- `mito_tile_size = [100, 2048, 2048]`
- `neuron_id = [590612150, 36750893213]`

## nuc-worm/ — C. elegans Nucleus Processing

Mirrors `mito-h01/`: the same pipeline structure and task set, with different dataset constants and paths.

## syn-alzhemier/main.py — Alzheimer's Synapse Pipeline

A multi-step pipeline driven by numeric option codes:

| Option | Description |
|--------|-------------|
| `0.x` | Image preprocessing: frame extraction, VAST-to-HDF5 conversion, downsampling |
| `2.x` | Vesicle processing: extraction, annotation processing, mito mask application |
| `3.x` | Data export & validation: range checks, consistency checks, bbox fixes |
| `4.x` | Vesicle classification: patch extraction, Laplacian quality scores, sorting |
| `5.0-5.2` | Load TIF stacks and convert to HDF5 |
| `5.3-5.4` | Tissue sample preparation and decoding |
| `5.5` | Generate the test file list (72×9×8 = 5184 tiles) + SLURM jobs |
| `5.6x` | Instance merging: extract IDs, merge across tiles (IoU + UnionFind), relabel |
| `5.63` | TensorStore upload to Google Cloud (multi-scale pyramid: 1x, 4x, 8x) |
| `6.x` | Cell segmentation visualization, Neuroglancer setup |

### Key Functions in syn-alzhemier
- **`merge_syn_ins()`**: Merges pre-/post-synaptic instances across tile boundaries using UnionFind

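A generic sketch of the IoU + union-find pattern behind `merge_syn_ins()` (and the `mito-neuron-sid-iou` task): instances on opposite sides of a tile face whose IoU clears a threshold are unioned, and the resulting roots drive the relabeling. The hand-rolled union-find and the 0.3 threshold are illustrative assumptions; the repo relies on its own utilities.

```python
# Generic IoU + union-find merging across tile boundaries; the union-find
# implementation and the 0.3 IoU threshold are illustrative assumptions.
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def merge_across_faces(face_iou_tables, iou_thres=0.3):
    """face_iou_tables: list of {(global_id_a, global_id_b): iou}, one dict per tile face."""
    uf = UnionFind()
    for table in face_iou_tables:
        for (a, b), iou in table.items():
            if iou >= iou_thres:
                uf.union(a, b)          # same physical instance split by the tile grid
    # canonical representative per ID, used to relabel the merged export
    return {i: uf.find(i) for i in list(uf.parent)}
```
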
## Data Flow (mito-h01 Pipeline)

```
Raw segmentation → [seg-bbox] → [seg-zran_p] → [seg-zran]
  → [neuron-tile] → U-Net inference → [mito-neuron-watershed]
  → [mito-neuron-sid] → [mito-neuron-sid-iou]
  → [mito-neuron-export] → [mito-neuron-export-ds]
  → [mito-neuron-ng/mesh]
```

## Key Algorithms

1. **Watershed Segmentation**: Seed detection + watershed flooding to convert pixel-wise predictions into instances
2. **UnionFind**: Disjoint-set union for tracking/merging connected instances across tiles
3. **IoU-Based Merging**: Matches instances across tile boundaries by overlap threshold
4. **Tile-Based Parallelism**: SLURM job arrays for memory-efficient processing of large volumes
5. **Multi-Scale Pyramids**: Downsampled representations for TensorStore/Neuroglancer visualization

## Dependencies

- **Core:** numpy, scipy, h5py, cv2 (OpenCV), scikit-image, imageio
- **EM Utilities:** em_util (I/O, clustering, Neuroglancer helpers)
- **Segmentation:** cc3d, fastremap
- **Cloud Storage:** tensorstore (Google Cloud)
- **Custom:** T_util, T_util_seg