Commit 257a2e2

Add info on collections to the README
1 parent 91d74fd commit 257a2e2

2 files changed

Lines changed: 24 additions & 7 deletions

README.md

Lines changed: 23 additions & 6 deletions
@@ -79,12 +79,29 @@ You can see the options by running `--help` when calling the submodule. E.g.:
 python -m idc_annotation_conversion.pan_cancer_nuclei_seg --help
 ```
 
-### Modules
+### Collection Details
 
 The following modules are currently available:
 
-- `pan_cancer_nuclei_seg`: Conversion of Pan Cancer Nuclei segmentations from
-XML to ANN and SEGs for various TCGA collections.
-- `rms`: Conversion of annotations related to the "RMS-Mutation-Prediction"
-collection. Specifically conversion of hand annotated regions to SR, and
-ML generated segmentations to SEG.
+- `pan_cancer_nuclei_seg`: This module implements conversion of the Pan Cancer
+Nuclei Segmentations for several collections within TCGA. The original data are supplied
+in a non-standard CSV format giving the image coordinates of points on the contours of
+nuclei as segmented by a deep-learning-based segmentation model. These data were previously released
+[here](https://www.cancerimagingarchive.net/analysis-result/pan-cancer-nuclei-seg/) as part of
+The Cancer Imaging Archive. These coordinates are converted to DICOM Microscopy Bulk Simple
+Annotation objects, and in addition, the contours are converted to masks and stored as
+a pyramid of binary DICOM Segmentation objects. Since this "raster conversion" takes place at the
+highest resolution, this process is very slow and memory intensive.
+
+- `rms`: Conversion of annotations related to the rhabdomyosarcoma mutation prediction project from the Frederick National Laboratory.
+Both hand-annotated regions (used as training data in the project) and model-generated prediction results are available.
+Hand-annotated regions are provided as ImageScope format XML annotations and are converted to DICOM Structured Report objects with the `convert-xml-annotations` sub-command.
+Model-generated prediction results, in the form of probabilistic segmentation maps, are provided as serialized NumPy arrays (`.npy` files) and converted to both binary and fractional DICOM Segmentation objects with the `convert-segmentations` sub-command.
+
+- `tcga_til_maps`: There are two related collections here, both containing patch-level maps of tumor-infiltrating lymphocytes (TILs) predicted by a neural network for several collections within TCGA. The two collections correspond to two different versions of the model, published in 2018 and 2022 by the same lab at Stony Brook University. Their conversion routines are implemented as two separate sub-commands within this module.
+
+The 2018 set covers a smaller subset of the TCGA collections. The algorithm is published in [this paper](https://www.cell.com/cell-reports/pdf/S2211-1247(18)30447-9.pdf) and the source files are available [here](https://stonybrookmedicine.app.box.com/v/cellreportspaper). The collection was also described by TCIA on [this page](https://www.cancerimagingarchive.net/analysis-result/til-wsi-tcga/). The files are supplied as low-resolution PNG images, where each pixel in the PNG corresponds to a 50-micron patch in the original slide and the pixel value indicates the presence of TILs within the patch. The `convert-2018` command converts these to binary DICOM Segmentation objects.
+
+The 2022 set covers a wider range of TCGA images and also provides probabilistic segmentations (before thresholding) alongside the binarized versions. This algorithm is described in [this paper](https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2021.806603/full) and the source data are available [here](https://stonybrookmedicine.box.com/v/til-results-new-model). These are supplied as a non-standard text file containing a list of patch coordinates and associated binary or probabilistic pixel values. The `convert-2022` command converts these to pixel arrays and stores them as DICOM Segmentation objects, giving one binary and one fractional (probabilistic) segmentation object for each slide.
+
+- `gbm_transcriptional_subtypes`: This module relates to a collection of results from [this paper](https://www.nature.com/articles/s41467-023-39933-0) from Stanford University on transcriptional subtypes within glioblastoma. There are two data types of interest here: transcriptional subtype maps classifying an image patch into a set of transcriptional subtypes, and aggressiveness maps giving the aggressiveness of each image patch. While the conversion process is implemented for both, only the aggressiveness maps have been released at this time. The source data are not publicly available elsewhere. The aggressiveness maps are supplied as arrays of image coordinates and corresponding aggressiveness scores (between 0 and 1) within an h5 format file, with one aggressiveness score for an entire image patch. These are converted to DICOM Parametric Map objects using the `convert-aggressiveness-maps` sub-command of this module.
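
The `pan_cancer_nuclei_seg` entry above says that contours are converted to masks (the "raster conversion") at the highest resolution, and that this is slow and memory intensive. The README does not state how the module performs this step, so the following is only a minimal sketch of the general idea, using an even-odd point-in-polygon test on pixel centres. The names `point_in_polygon` and `rasterize_contour` are hypothetical and are not taken from the repository.

```python
def point_in_polygon(px, py, contour):
    """Even-odd rule: count contour edges crossed by a ray going right from (px, py)."""
    inside = False
    n = len(contour)
    for i in range(n):
        x1, y1 = contour[i]
        x2, y2 = contour[(i + 1) % n]  # wrap around to close the contour
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def rasterize_contour(contour, height, width):
    """Return a height x width binary mask (nested lists) for one closed contour.

    `contour` is a list of (x, y) vertices in pixel coordinates (an assumed
    layout, chosen for illustration). A pixel is set when its centre lies
    inside the contour.
    """
    return [
        [1 if point_in_polygon(col + 0.5, row + 0.5, contour) else 0
         for col in range(width)]
        for row in range(height)
    ]


# A square contour covering the four central pixels of a 4 x 4 image.
mask = rasterize_contour([(1, 1), (3, 1), (3, 3), (1, 3)], height=4, width=4)
```

In this naive form every pixel is tested against every contour edge, so the work grows with the number of pixels, which gives some intuition for why rasterizing at full slide resolution is costly.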

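The `tcga_til_maps` entry above describes the 2022 source data as a list of patch coordinates with associated values that are converted to pixel arrays, with a binary and a fractional result per slide. The sketch below illustrates that kind of conversion under stated assumptions: the `(x, y, score)` record layout, the top-left coordinate convention, the 0.5 threshold, and the function names are all hypothetical, and the module's actual parsing and thresholding are not shown in this commit.

```python
def patches_to_grid(records, patch_size):
    """Place per-patch scores on a low-resolution grid, one cell per patch.

    `records` is an iterable of hypothetical (x, y, score) tuples, where x and y
    are assumed to be the top-left pixel coordinates of a patch in the slide.
    Cells with no record are left at 0.0.
    """
    records = list(records)
    if not records:
        return []
    n_cols = max(x for x, _, _ in records) // patch_size + 1
    n_rows = max(y for _, y, _ in records) // patch_size + 1
    grid = [[0.0] * n_cols for _ in range(n_rows)]
    for x, y, score in records:
        grid[y // patch_size][x // patch_size] = score
    return grid


def binarize(grid, threshold=0.5):
    """Threshold a fractional grid into a binary one (threshold is an assumption)."""
    return [[1 if value >= threshold else 0 for value in row] for row in grid]


records = [(0, 0, 0.9), (100, 0, 0.2), (0, 100, 0.5)]
fractional = patches_to_grid(records, patch_size=100)
binary = binarize(fractional)
```

Keeping one grid cell per patch, rather than expanding each patch to full-resolution pixels, keeps the arrays small; the fractional grid and its thresholded counterpart correspond loosely to the two segmentation objects the README mentions for each slide.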
src/idc_annotation_conversion/gbm_transcriptional_subtypes/__main__.py

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ def run_subtype_map_blob(
     store_wsi_dicom: bool = False,
     output_bucket: str | None = None,
 ) -> str | None:
-    """Convert a single PNG blob for the 2018 TIL Maps.
+    """Convert a single transcriptional subtype map blob.
 
     Parameters
     ----------
