forked from WEHI-SODA-Hub/sp_segment
{
"@context": "https://w3id.org/ro/crate/1.1/context",
"@graph": [
{
"@id": "./",
"@type": "Dataset",
"creativeWorkStatus": "InProgress",
"datePublished": "2025-08-06T02:29:12+00:00",
"description": "<h1>\n <picture>\n <source media=\"(prefers-color-scheme: dark)\" srcset=\"assets/wehi-soda-hub-sp_segment_logo_dark.png\">\n <img alt=\"WEHI-SODA-Hub/sp_segment\" src=\"assets/wehi-soda-hub-sp_segment_logo_light.png\">\n </picture>\n</h1>\n\n[](https://github.com/WEHI-SODA-Hub/sp_segment/actions/workflows/ci.yml)\n[](https://github.com/WEHI-SODA-Hub/sp_segment/actions/workflows/linting.yml)[](https://doi.org/10.5281/zenodo.17103183)\n[](https://www.nf-test.com)\n\n[](https://www.nextflow.io/)\n[](https://github.com/nf-core/tools/releases/tag/3.3.2)\n[](https://docs.conda.io/en/latest/)\n[](https://www.docker.com/)\n[](https://sylabs.io/docs/)\n[](https://cloud.seqera.io/launch?pipeline=https://github.com/WEHI-SODA-Hub/sp_segment)\n\n## Introduction\n\n**WEHI-SODA-Hub/sp_segment** is a pipeline for running cell segmentation\non COMET and MIBI data. For COMET, background subtraction can be performed\nfollowed by patched cellpose segmentation, non-patched mesmer segmentation, or\nCellSAM foundation model segmentation. For MIBI, mesmer or CellSAM segmentation\ncan be run. Whole-cell and nuclear segmentations are run separately, and then\nconsolidated into whole cells with nuclei with full shape and intensity\nmeasurements per compartment. 
The output GeoJSON files can be viewed in QuPath.\n\n<details>\n <summary>Click to view Mermaid diagram</summary>\n ```mermaid\n flowchart TD\n A(\"COMET TIFF\") --> B[\"Extract markers\"]\n B --> C[\"Background\n subtraction\"]\n C --> D{\"Segmentation\n method\"} & O[\"Backsub TIFF\"]\n N(\"COMET/MIBI TIFF\") --> D\n D -- Cellpose (COMET only) --> S[\"Combine\n channels\"]\n S --> E[\"sopa convert\"]\n E --> F[\"sopa patchify\"]\n F --> G[\"cellpose\n (nuclear)\"]\n F --> H[\"cellpose\n (whole-cell)\"]\n G --> I[\"sopa resolve\"]\n H --> I\n I --> J[\"parquet to tiff\"]\n J --> K[\"Cell measurement\"]\n D -- Mesmer (COMET/MIBI) --> L[\"mesmer\n (nuclear)\"]\n D -- Mesmer (COMET/MIBI) --> M[\"mesmer\n (whole-cell)\"]\n L --> K\n M --> K\n K --> P(\"GeoJSON\")\n K --> Q[\"segmentation\n report\"]\n Q --> R(\"html file\")\n```\n</details>\n\n\n\nThe pipeline uses the following tools:\n\n- [Background_subtraction](https://github.com/SchapiroLabor/Background_subtraction)\n -- background subtraction tool for COMET.\n- [MesmerSegmentation](https://github.com/WEHI-SODA-Hub/mesmersegmentation) -- a\n CLI for running Mesmer segmentation of MIBI and OME-XML TIFFs.\n- [CellSAM](https://github.com/vanvalenlab/cellSAM) -- a foundation model for\n cell segmentation across diverse imaging modalities.\n- [cellmeasurement](https://github.com/WEHI-SODA-Hub/cellmeasurement) -- a\n Groovy app that matches whole-cell segmentations with nuclei, and uses the\n QuPath API to calculate compartment measurements and intensities.\n- [KRONOS](https://github.com/mahmoodlab/KRONOS) -- a foundation model for\n multiplex spatial proteomics that extracts rich embeddings for each cell.\n- [sopa](https://github.com/gustaveroussy/sopa) -- we use the sopa CLI tool to\n patchify images and perform cellpose segmentation.\n- [spatialVis](https://github.com/WEHI-SODA-Hub/spatialVis) -- R package for spatial\n analyses, used to generate plots for the segmentation report.\n\nPlease see the [docs for more 
detailed information on pipeline usage and output](docs/README.md)\n\n## Usage\n\n> [!NOTE]\n> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` (to test Cellpose segmentation) or `-profile test_mesmer` (to test Mesmer segmentation) before running the workflow on actual data.\n\nIf you are running this pipeline from WEHI, it has been set up to run on [Seqera Platform](https://seqera.services.biocommons.org.au/).\n\n> [!NOTE]\n> If you don't have a `.gradle` directory in your home directory, make sure you create it with `mkdir $HOME/.gradle` before running the pipeline. You don't need to do this if you are running via WEHI's Seqera Platform mentioned above.\n\nUsage will depend on your desired steps. See [usage docs](docs/usage.md) for more detailed information.\n\n### Background subtraction\n\n> [!NOTE]\n> This step will only work with COMET OME-TIFF files.\n\nPrepare a sample sheet as follows:\n\n`samplesheet.csv`:\n\n```csv\nsample,run_backsub,run_mesmer,run_cellpose,run_cellsam,tiff\nsample1,true,true,false,false,/path/to/sample1.tiff\nsample2,true,false,false,true,/path/to/sample2.tiff\n```\n\nYou can also write your samplesheet in YAML; either format is supported:\n\n`samplesheet.yml`:\n\n```yaml\n- sample: sample1\n  run_backsub: true\n  run_mesmer: true\n  run_cellpose: false\n  run_cellsam: false\n  tiff: /path/to/sample1.tiff\n- sample: sample2\n  run_backsub: true\n  run_mesmer: true\n  run_cellpose: false\n  run_cellsam: false\n  tiff: /path/to/sample2.tiff\n```\n\n> [!WARNING]\n> Please ensure that your image name and all directories in your path do not contain spaces.\n\nIf you don't specify any segmentation algorithm to run (mesmer, cellpose, or cellsam), the pipeline will run background subtraction only.\n\nNow, you can run the pipeline using:\n\n```bash\nnextflow run 
WEHI-SODA-Hub/sp_segment \\\n -profile <docker/singularity/.../institute> \\\n --input samplesheet.csv \\\n --outdir <OUTDIR>\n```\n\n> [!WARNING]\n> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files).\n\n### Mesmer segmentation\n\nBefore running Mesmer, ensure that you have a [DeepCell access token](https://users.deepcell.org/login/)\nand that you have set it in your Nextflow secrets:\n\n```bash\nnextflow secrets set DEEPCELL_ACCESS_TOKEN $YOUR_TOKEN\n```\n\nIf you want to run Mesmer as your segmentation algorithm, you can specify a\nsamplesheet like so:\n\n```csv\nsample,run_backsub,run_mesmer,tiff,nuclear_channel,membrane_channels\nsample1,true,true,/path/to/sample1.tiff,DAPI,CD45:CD8\nsample2,false,true,/path/to/sample2.tiff,DAPI,CD45\n```\n\nThe nuclear channel supports only one entry; membrane channels may have multiple\nvalues separated by `:` characters. If your channel names contain spaces, make\nsure that you surround each such name with quotes, for example CD45:\"HLA I\".\n\nYou can also set the segmentation parameters for Mesmer either via CLI\n(e.g., `--combine_method prod`) or in a config file passed to the workflow\nvia `-c`. See [usage](docs/usage.md) for a full list.\n\n> [!NOTE]\n> You cannot run multiple segmentation methods (Mesmer, Cellpose, or CellSAM) on the same sample (with the same name). 
If you want to run multiple methods on a sample, put it on a different line and give it a different sample name.\n\n### Cellpose segmentation\n\nIf you want to run Cellpose as your segmentation algorithm, you can specify a\nsamplesheet like so:\n\n```csv\nsample,run_backsub,run_cellpose,tiff,nuclear_channel,membrane_channels\nsample1,true,true,/path/to/sample1.tiff,DAPI,CD45:CD8\nsample2,false,true,/path/to/sample2.tiff,DAPI,CD45\n```\n\nAs with Mesmer, the nuclear channel supports only one entry; membrane channels may\nhave multiple values separated by `:` characters. You can also set segmentation\nparameters, either via CLI (e.g., `--combine_method prod`) or in a config\nfile passed to the workflow via `-c`. See [usage](docs/usage.md) for a full list.\n\nCellpose will run in a parallelised patched workflow using sopa. To control the\npatching process, you can use the `patch_width_pixel` and `patch_overlap_pixel`\nparameters.\n\nIf you want to skip measurements (which may take some time for large images), you\ncan set the parameter `skip_measurements` to `true`.\n\n### KRONOS embeddings\n\nKRONOS is a foundation model for multiplex spatial proteomics that extracts rich embeddings for each cell. These embeddings capture cellular phenotype and microenvironment context, enabling downstream analysis like clustering, classification, and spatial analysis. The authors say they will keep updating and maintaining KRONOS, including updates from DINOv2 to DINOv3, etc. 
We shall see.\n\nTo enable KRONOS embeddings:\n\n```bash\nnextflow run main.nf \\\n --input samplesheet.csv \\\n --skip_kronos false \\\n --kronos_model_path /path/to/kronos_model \\\n --kronos_marker_metadata /path/to/marker_metadata.csv \\\n --kronos_merge_geojson true \\\n ...\n```\n\n#### KRONOS parameters\n\n- `--skip_kronos` (default: true): Set to `false` to enable KRONOS embedding extraction\n- `--kronos_model_path` (required): Path to the KRONOS model checkpoint (.pt file)\n- `--kronos_marker_metadata` (required): Path to marker metadata CSV file mapping marker IDs to names\n- `--kronos_merge_geojson` (default: false): Merge embeddings into the cellmeasurement GeoJSON output\n- `--kronos_patch_size` (default: 64): Patch size for cell-centered crops\n- `--kronos_batch_size` (default: 32): Batch size for model inference\n- `--kronos_num_workers` (default: 4): Number of DataLoader workers for parallel data loading\n- `--kronos_max_value` (default: 65535): Maximum intensity value for normalization\n- `--kronos_marker_mapping` (optional): JSON string mapping image marker names to KRONOS marker names\n\n#### Embeddings for filtered data with KRONOS\n\nWhen `--kronos_merge_geojson` is enabled, the pipeline automatically creates a new segmentation mask directly from the GeoJSON polygons. 
This ensures a **one-to-one match** between KRONOS embeddings and cells in the GeoJSON output, eliminating the missing embeddings that would otherwise occur due to cell filtering in upstream segmentation/measurement steps.\n\n#### Output files\n\nKRONOS produces the following outputs:\n\n- `*_kronos_embeddings.csv`: CSV file with cell IDs, centroids, and 384 embedding dimensions\n- `*_marker_report.txt`: Report showing which image channels were matched to KRONOS markers\n- `*_kronos_merged.geojson` (if `--kronos_merge_geojson=true`): GeoJSON file with embeddings added as cell properties\n\nThe merged GeoJSON file contains all original cell measurements plus additional features (`kronos_emb_0` through `kronos_emb_#`), enabling integrated analysis of morphology, intensity, and KRONOS embeddings.\n\n#### Marker matching (important)\n\nKRONOS expects specific marker names based on its training data. The pipeline automatically performs case-insensitive matching between your image channel names and the KRONOS marker metadata. For markers that don't auto-match, use `--kronos_marker_mapping`:\n\n```bash\n--kronos_marker_mapping '{\"CD3e\": \"CD3E\", \"PanCK\": \"PANCK\"}'\n```\n\nFor COMET data with fluorophore suffixes in channel names, you can map them like this:\n\n```bash\n--kronos_marker_mapping '{\"DAPI\": \"DAPI\", \"FOXP3_T - TRITC\": \"FOXP3\", \"CD3_T - Cy5\": \"CD3\"}'\n```\n\nFor more information about KRONOS, see the [KRONOS GitHub repository](https://github.com/mahmoodlab/KRONOS).\n\n### CellSAM segmentation\n\nCellSAM is a foundation model for cell segmentation that works across different\nimaging modalities. 
To use CellSAM as your segmentation algorithm, specify a\nsamplesheet like so:\n\n```csv\nsample,run_backsub,run_cellsam,tiff,nuclear_channel,membrane_channels\nsample1,true,true,/path/to/sample1.tiff,DAPI,CD45:CD8\nsample2,false,true,/path/to/sample2.tiff,DAPI,CD45\n```\n\nThe nuclear channel supports only one entry; membrane channels may have multiple\nvalues separated by `:` characters. If your channel names contain spaces, make\nsure that you surround each such name with quotes.\n\nCellSAM uses a tiling approach for large images and supports the following\nparameters:\n\n- `--cellsam_bbox_threshold` (default: 0.4): Confidence threshold for cell detection\n- `--cellsam_block_size` (default: 1024): Size of tiles for processing\n- `--cellsam_overlap` (default: 56): Tile overlap for merging\n- `--cellsam_iou_threshold` (default: 0.5): IoU threshold for label merging\n- `--cellsam_use_wsi` (default: true): Enable tiling for large images\n\n#### Model weights\n\nCellSAM can automatically download the latest model weights (v1.2) from\n[users.deepcell.org](https://users.deepcell.org). To use the latest weights:\n\n1. Create an account at [users.deepcell.org](https://users.deepcell.org)\n2. Generate your access token\n3. Set it as a Nextflow secret:\n   ```bash\n   nextflow secrets set DEEPCELL_ACCESS_TOKEN $YOUR_TOKEN\n   ```\n\nIf the token is not set, CellSAM will use the default bundled model weights.\n\n> [!NOTE]\n> You cannot run both Mesmer/Cellpose and CellSAM segmentation on the same sample\n> (with the same name). If you want to run multiple methods on a sample, put it\n> on a different line and give it a different sample name.\n\n## Dealing with large images\n\nYou can run the pipeline with different profiles for images of different sizes:\n\n- `small`: for images <150GB\n- `medium`: for images <300GB\n- `large`: for images <600GB\n\n> [!WARNING]\n> If you are combining many membrane channels, using `prod` as the combine method\n> may lead to large memory usage. 
In these cases, it is recommended to use `max`\n> instead.\n\n## Credits\n\nWEHI-SODA-Hub/sp_segment was originally written by the WEHI SODA-Hub.\n\nWe thank the following people for their extensive assistance in the development of this pipeline:\n\n- Michael McKay (@mikemcka)\n- Emma Watson\n\n## Contributions and Support\n\nIf you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).\n\n## Citations\n\nIf you use WEHI-SODA-Hub/sp_segment for your analysis, please cite it using the following DOI: [10.5281/zenodo.17103183](https://doi.org/10.5281/zenodo.17103183)\n\nAn extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.\n\nThis pipeline was created using the `nf-core` template. You can cite the `nf-core` publication as follows:\n\n> **The nf-core framework for community-curated bioinformatics pipelines.**\n>\n> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.\n>\n> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).\n",
"hasPart": [
{
"@id": "main.nf"
},
{
"@id": "assets/"
},
{
"@id": "bin/"
},
{
"@id": "conf/"
},
{
"@id": "docs/"
},
{
"@id": "docs/images/"
},
{
"@id": "modules/"
},
{
"@id": "modules/local/"
},
{
"@id": "modules/nf-core/"
},
{
"@id": "workflows/"
},
{
"@id": "subworkflows/"
},
{
"@id": "nextflow.config"
},
{
"@id": "README.md"
},
{
"@id": "nextflow_schema.json"
},
{
"@id": "CHANGELOG.md"
},
{
"@id": "LICENSE"
},
{
"@id": "CODE_OF_CONDUCT.md"
},
{
"@id": "CITATIONS.md"
},
{
"@id": "modules.json"
},
{
"@id": "docs/usage.md"
},
{
"@id": "docs/output.md"
},
{
"@id": ".nf-core.yml"
},
{
"@id": ".pre-commit-config.yaml"
},
{
"@id": ".prettierignore"
}
],
"isBasedOn": "https://github.com/WEHI-SODA-Hub/sp_segment",
"license": "MIT",
"mainEntity": {
"@id": "main.nf"
},
"name": "WEHI-SODA-Hub/sp_segment"
},
{
"@id": "ro-crate-metadata.json",
"@type": "CreativeWork",
"about": {
"@id": "./"
},
"conformsTo": [
{
"@id": "https://w3id.org/ro/crate/1.1"
},
{
"@id": "https://w3id.org/workflowhub/workflow-ro-crate/1.0"
}
]
},
{
"@id": "main.nf",
"@type": [
"File",
"SoftwareSourceCode",
"ComputationalWorkflow"
],
"dateCreated": "",
"dateModified": "2025-08-06T12:29:12Z",
"dct:conformsTo": "https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE/",
"keywords": [
"nf-core",
"nextflow"
],
"license": [
"MIT"
],
"name": [
"WEHI-SODA-Hub/sp_segment"
],
"programmingLanguage": {
"@id": "https://w3id.org/workflowhub/workflow-ro-crate#nextflow"
},
"sdPublisher": {
"@id": "https://nf-co.re/"
},
"url": [
"https://github.com/WEHI-SODA-Hub/sp_segment",
"https://nf-co.re/WEHI-SODA-Hub/sp_segment/dev/"
],
"version": [
"1.0.0dev"
]
},
{
"@id": "https://w3id.org/workflowhub/workflow-ro-crate#nextflow",
"@type": "ComputerLanguage",
"identifier": {
"@id": "https://www.nextflow.io/"
},
"name": "Nextflow",
"url": {
"@id": "https://www.nextflow.io/"
},
"version": "!>=24.04.2"
},
{
"@id": "assets/",
"@type": "Dataset",
"description": "Additional files"
},
{
"@id": "bin/",
"@type": "Dataset",
"description": "Scripts that must be callable from a pipeline process"
},
{
"@id": "conf/",
"@type": "Dataset",
"description": "Configuration files"
},
{
"@id": "docs/",
"@type": "Dataset",
"description": "Markdown files for documenting the pipeline"
},
{
"@id": "docs/images/",
"@type": "Dataset",
"description": "Images for the documentation files"
},
{
"@id": "modules/",
"@type": "Dataset",
"description": "Modules used by the pipeline"
},
{
"@id": "modules/local/",
"@type": "Dataset",
"description": "Pipeline-specific modules"
},
{
"@id": "modules/nf-core/",
"@type": "Dataset",
"description": "nf-core modules"
},
{
"@id": "workflows/",
"@type": "Dataset",
"description": "Main pipeline workflows to be executed in main.nf"
},
{
"@id": "subworkflows/",
"@type": "Dataset",
"description": "Smaller subworkflows"
},
{
"@id": "nextflow.config",
"@type": "File",
"description": "Main Nextflow configuration file"
},
{
"@id": "README.md",
"@type": "File",
"description": "Basic pipeline usage information"
},
{
"@id": "nextflow_schema.json",
"@type": "File",
"description": "JSON schema for pipeline parameter specification"
},
{
"@id": "CHANGELOG.md",
"@type": "File",
"description": "Information on changes made to the pipeline"
},
{
"@id": "LICENSE",
"@type": "File",
"description": "The license - should be MIT"
},
{
"@id": "CODE_OF_CONDUCT.md",
"@type": "File",
"description": "The nf-core code of conduct"
},
{
"@id": "CITATIONS.md",
"@type": "File",
"description": "Citations needed when using the pipeline"
},
{
"@id": "modules.json",
"@type": "File",
"description": "Version information for modules from nf-core/modules"
},
{
"@id": "docs/usage.md",
"@type": "File",
"description": "Usage documentation"
},
{
"@id": "docs/output.md",
"@type": "File",
"description": "Output documentation"
},
{
"@id": ".nf-core.yml",
"@type": "File",
"description": "nf-core configuration file, configuring template features and linting rules"
},
{
"@id": ".pre-commit-config.yaml",
"@type": "File",
"description": "Configuration file for pre-commit hooks"
},
{
"@id": ".prettierignore",
"@type": "File",
"description": "Ignore file for prettier"
},
{
"@id": "https://nf-co.re/",
"@type": "Organization",
"name": "nf-core",
"url": "https://nf-co.re/"
}
]
}