[03/12/2025] We provide VISTA3D as a baseline for the CVPR 2025 challenge "Foundation Models for Interactive 3D Biomedical Image Segmentation" ([link](https://www.codabench.org/competitions/5263/)). Simplified code based on MONAI 1.4 is provided [here](./cvpr_workshop/).
## Overview
**VISTA3D** is a foundation model trained systematically on 11,454 volumes encompassing 127 types of human anatomical structures and various lesions. It provides accurate out-of-the-box segmentation that matches state-of-the-art supervised models trained on individual datasets, and it achieves state-of-the-art zero-shot interactive segmentation in 3D, representing a promising step toward a versatile medical image foundation model.
For the 127 supported classes, the model performs highly accurate out-of-the-box segmentation. The fully automated process adopts patch-based sliding-window inference and requires only a class prompt.
This capability makes the model even more flexible and accelerates practical segmentation.
</figure>
</div>
### Fine-tuning
The VISTA3D checkpoint shows improvements when fine-tuned in few-shot settings. Once a few annotated examples are provided, users can start fine-tuning from the VISTA3D checkpoint.
Download the [model checkpoint](https://drive.google.com/file/d/1DRYA2-AI-UJ23W1VbjqHsnHENGi0ShUl/view?usp=sharing) and save it to `./models/model.pt`.
## Inference
This repository is the research codebase for the CVPR 2025 paper and is built on MONAI 1.3. We converted the model into a [MONAI bundle](https://github.com/Project-MONAI/model-zoo/tree/dev/models/vista3d) with improved GPU utilization and speed (it is the backend for the [demo](https://build.nvidia.com/nvidia/vista-3d)). The automatic segmentation label definitions can be found in [label_dict](./data/jsons/label_dict.json). For the exact number of supported automatic segmentation classes and the reasoning behind it, please refer to this [issue](https://github.com/Project-MONAI/VISTA/issues/41).
The MONAI bundle accepts multiple JSON config files and input arguments. Later configs/arguments override earlier ones when they have overlapping keys.
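The override behavior can be illustrated with a small sketch (this mimics, but does not use, the actual `monai.bundle` config merging; all names here are illustrative):

```python
# Illustrative sketch of "later configs override earlier ones" merging,
# as with MONAI bundle config files and CLI arguments (not the real implementation).

def merge_configs(*configs):
    """Merge config dicts left to right; keys in later configs win."""
    merged = {}
    for cfg in configs:
        merged.update(cfg)
    return merged

base = {"patch_size": [128, 128, 128], "device": "cuda:0"}  # e.g. inference.json
cli = {"device": "cuda:1"}                                  # e.g. a command-line override
merged = merge_configs(base, cli)
print(merged["device"])      # cuda:1 (the later value wins)
print(merged["patch_size"])  # [128, 128, 128] (untouched keys are kept)
```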
```bash
# Automatic segmentation: segment everything
python -m monai.bundle run --config_file configs/inference.json --input_dict "{'image':'spleen_03.nii.gz'}"
```
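The `--input_dict` value is a Python-literal dictionary string. A minimal sketch of how such a string can be parsed with the standard library's `ast.literal_eval` (the bundle does its own argument parsing; this is only to show the expected format):

```python
import ast

# --input_dict is passed on the command line as a Python-literal string;
# ast.literal_eval safely turns it into a dict (illustration only).
raw = "{'image':'spleen_03.nii.gz','label_prompt':[3]}"
input_dict = ast.literal_eval(raw)
print(input_dict["image"])         # spleen_03.nii.gz
print(input_dict["label_prompt"])  # [3]
```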
```bash
# Automatic segmentation of a specific class
python -m monai.bundle run --config_file configs/inference.json --input_dict "{'image':'spleen_03.nii.gz','label_prompt':[3]}"
```
```bash
# Interactive segmentation
# Points must be three-dimensional (x,y,z), given in the shape [[x,y,z],...,[x,y,z]].
# Point labels can only be -1 (ignore), 0 (negative), 1 (positive),
# 2 (negative for special overlapping classes like tumors), or 3 (positive for special classes).
# Only one class is supported per inference. The output value 255 represents NaN, i.e. a region that was not processed.
python -m monai.bundle run --config_file configs/inference.json --input_dict "{'image':'spleen_03.nii.gz','points':[[128,128,16], [100,100,16]],'point_labels':[1, 0]}"
```

For multi-GPU evaluation, the bundle config initializes and finalizes the distributed process group (fragment):

```json
    "$dist.is_initialized() or dist.init_process_group(backend='nccl')",
    "$torch.cuda.set_device(@device)"
  ],
  "run": [
    "$@evaluator.run()"
  ],
  "finalize": [
    "$dist.is_initialized() and dist.destroy_process_group()"
  ]
}
```
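The point-prompt constraints above can be checked up front with a small hypothetical validator (not part of the VISTA3D codebase; shown only for illustration):

```python
# Hypothetical validator for interactive point prompts, following the
# constraints described above (illustrative; not part of VISTA3D).

VALID_POINT_LABELS = {-1, 0, 1, 2, 3}  # ignore, negative, positive, special negative, special positive

def validate_point_prompts(points, point_labels):
    if len(points) != len(point_labels):
        raise ValueError("points and point_labels must have the same length")
    for point in points:
        if len(point) != 3:
            raise ValueError(f"each point must be (x, y, z), got {point}")
    for label in point_labels:
        if label not in VALID_POINT_LABELS:
            raise ValueError(f"invalid point label: {label}")

validate_point_prompts([[128, 128, 16], [100, 100, 16]], [1, 0])  # passes silently
```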
</details>
### 1.1 Overlapping classes and postprocessing with [ShapeKit](https://arxiv.org/pdf/2506.24003)
VISTA3D is trained with binary segmentation and may produce false positives due to weak false-positive supervision. ShapeKit addresses this with sophisticated postprocessing, and it requires a segmentation mask for each class. By default, VISTA3D performs argmax and collapses overlapping classes. Change the `monai.apps.vista3d.transforms.VistaPostTransformd` in `inference.json` to save each class segmentation as a separate channel, then follow the [ShapeKit](https://github.com/BodyMaps/ShapeKit) codebase for processing.
```json
{
    "_target_": "Activationsd",
    "sigmoid": true,
    "keys": "pred"
},
```
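Why per-channel outputs matter for overlapping structures can be seen in a toy sketch (made-up logits for a single voxel; this is not VISTA3D code):

```python
# Toy single-voxel example: argmax keeps only one class, while per-channel
# sigmoid-style thresholding lets overlapping classes (e.g. liver and liver
# tumor) coexist, as ShapeKit postprocessing requires. Illustrative only.

def argmax_label(logits):
    # one label per voxel: overlapping structures cannot both survive
    return max(range(len(logits)), key=lambda i: logits[i])

def per_channel_masks(logits, threshold=0.0):
    # one binary decision per class channel
    return [1 if logit > threshold else 0 for logit in logits]

voxel_logits = [2.1, 1.7]  # channel 0: "liver", channel 1: "liver tumor" (made up)
print(argmax_label(voxel_logits))       # 0 -> the tumor channel is collapsed away
print(per_channel_masks(voxel_logits))  # [1, 1] -> both classes kept
```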
### 2. VISTA3D Research repository (this repo)
We provide the `infer.py` script and its lightweight front-end `debugger.py`. Users can directly launch a local interface for both automatic and interactive segmentation.
```bash
python -m scripts.debugger run
```
Alternatively, call `infer.py` directly to generate automatic segmentations, e.g. to segment the liver (`label_prompt=1`, as defined in [label_dict](./data/jsons/label_dict.json)). The output path and other configs can be changed in `configs/infer.yaml`.
```
NOTE: `infer.py` does not support the `lung`, `kidney`, and `bone` classes, while the MONAI bundle supports them. The MONAI bundle also uses better memory management and will not easily run into OOM issues.
```
For zero-shot evaluation, we perform iterative point sampling.
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7; torchrun --nnodes=1 --nproc_per_node=8 -m scripts.validation.val_multigpu_point_iterative run --config_file "['configs/zeroshot_eval/infer_iter_point_hcc.yaml']"
```
### Finetune
For fine-tuning, users need to change `label_set` and `mapped_label_set` in the JSON config, where `label_set` matches the index values in the ground-truth files. The `mapped_label_set` can be selected arbitrarily, but we recommend picking the most related global indices defined in [label_dict](./data/jsons/label_dict.json). Users should modify the transforms, resolution, patch size, etc. according to their dataset for optimal fine-tuning performance; we recommend using configs generated by Auto3DSeg. A learning rate of 5e-5 should be good enough for fine-tuning purposes.
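As a hypothetical example of this mapping (the index values below are made up; check [label_dict](./data/jsons/label_dict.json) for the real global indices):

```python
# Hypothetical fine-tuning label mapping: local ground-truth indices
# (label_set) are paired with related global VISTA3D indices
# (mapped_label_set). All values here are made up for illustration.

label_set = [0, 1, 2]         # background, organ A, organ B in your ground truth
mapped_label_set = [0, 1, 3]  # assumed closest global indices from label_dict.json

local_to_global = dict(zip(label_set, mapped_label_set))
print(local_to_global)  # {0: 0, 1: 1, 2: 3}
```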
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7; torchrun --nnodes=1 --nproc_per_node=8 -m scripts.train_finetune run --config_file "['configs/finetune/train_finetune_word.yaml']"
```
We provide scripts to run the SAM2 evaluation. Modify the SAM2 source code to support background removal by adding `z_slice` to `sam2_video_predictor.py`. This requires the SAM2 package [installation](https://github.com/facebookresearch/segment-anything-2).
<figcaption> Initial comparison with SAM2's zero-shot performance. </figcaption>
</figure>
</div>
## Community
The codebase is under the Apache 2.0 license. The model weights are released under a separate NVIDIA license.
```
@article{he2024vista3d,
  title={VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging},
  author={He, Yufan and Guo, Pengfei and Tang, Yucheng and Myronenko, Andriy and Nath, Vishwesh and Xu, Ziyue and Yang, Dong and Zhao, Can and Simon, Benjamin and Belue, Mason and others},
}
```