Add imaging studies for all TCGA Pancan separated by modality#2260

Open
rmadupuri wants to merge 5 commits into master from pancan_idc

rmadupuri (Collaborator) commented Jan 22, 2026

What?

Enhancement to the previous PR #2015

  • Adds multi-modal imaging data for TCGA Pan Cancer studies, organized into separate resource tabs.
  • Enables filtering of imaging resources by modality.

Details on how the data is generated:
https://github.com/cBioPortal/datahub/blob/2615fc6c90fb0e30994d43a09c13d1f2abb81122/docs/tcga_pan_can_atlas/ohif-viewer.md

Available Imaging Modalities:

| Code | Modality | Viewer | Resource ID | Resource Tab Name |
|------|----------|--------|-------------|-------------------|
| CR | Computed Radiography | OHIF | IDC_OHIF_CR | Computed Radiography |
| CT | Computed Tomography | OHIF | IDC_OHIF_CT | CT Scan |
| DX | Digital Radiography | OHIF | IDC_OHIF_DX | Digital Radiography |
| MG | Mammography | OHIF | IDC_OHIF_MG | Mammography |
| MR | Magnetic Resonance | OHIF | IDC_OHIF_MR | Magnetic Resonance |
| NM | Nuclear Medicine | OHIF | IDC_OHIF_NM | Nuclear Medicine |
| PT | Positron Emission Tomography | OHIF | IDC_OHIF_PT | PET Scan |
| SM | Slide Microscopy (H&E) | SLIM | IDC_SLIM | H&E Slide |
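
The table above amounts to a lookup from DICOM modality code to cBioPortal resource. A minimal sketch of that lookup, and of splitting per-patient links by modality, might look like the following. `MODALITY_RESOURCES` and `group_by_resource` are hypothetical names used only for illustration, and the record layout (patient ID, modality code, URL) is an assumption rather than the format used by the actual generation script.

```python
# Modality code -> (resource ID, resource tab name), copied from the table above.
MODALITY_RESOURCES = {
    "CR": ("IDC_OHIF_CR", "Computed Radiography"),
    "CT": ("IDC_OHIF_CT", "CT Scan"),
    "DX": ("IDC_OHIF_DX", "Digital Radiography"),
    "MG": ("IDC_OHIF_MG", "Mammography"),
    "MR": ("IDC_OHIF_MR", "Magnetic Resonance"),
    "NM": ("IDC_OHIF_NM", "Nuclear Medicine"),
    "PT": ("IDC_OHIF_PT", "PET Scan"),
    "SM": ("IDC_SLIM", "H&E Slide"),
}


def group_by_resource(records):
    """Group (patient_id, modality, url) tuples by resource ID.

    Hypothetical helper: records with a modality code that is not in the
    table are skipped.
    """
    grouped = {}
    for patient_id, modality, url in records:
        if modality not in MODALITY_RESOURCES:
            continue
        resource_id, _tab_name = MODALITY_RESOURCES[modality]
        grouped.setdefault(resource_id, []).append((patient_id, url))
    return grouped
```

Each key of the returned dictionary corresponds to one resource tab, so a patient with both CT and slide microscopy data appears under both `IDC_OHIF_CT` and `IDC_SLIM`.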

Testing:

  1. BLCA pancan : https://triage.cbioportal.mskcc.org/study/summary?id=blca_tcga_pan_can_test
  2. LIHC pancan : https://triage.cbioportal.mskcc.org/study/summary?id=lihc_tcga_pan_can_atlas_2018
  3. BRCA pancan : https://triage.cbioportal.mskcc.org/study/summary?id=brca_pancan_test1

H&E Slide using SLIM viewer:
[Screenshot 2026-01-23 at 13 57 27]

CT Scan using OHIF viewer:
[Screenshot 2026-01-23 at 14 36 58]

dippindots (Contributor) left a comment

Looks good to me!

```bash
# For each TCGA study in the IDC table (COAD and READ excluded), rebuild its patient resource file.
for f in $(cut -f2 ~/Downloads/idc_ohif.tsv | gsort | uniq | grep tcga_ | grep -v Filters | grep -v coad | grep -v read); do
  (
    # Reuse the header line from an existing resource file.
    head -1 coadread_tcga_pan_can_atlas_2018/data_resource_patient.txt
    # Emit patient ID (first 12 characters of the barcode), resource ID, and viewer URL.
    cut -f1,2,4 ~/Downloads/idc_ohif.tsv | tail -n +9 | grep "$f" | cut -f1,3 |
      awk -vFS='\t' -vOFS='\t' '{$1=substr($1,1,12); $3="https://viewer.imaging.datacommons.cancer.gov/viewer/"$2; $2="IDC_OHIF_V2"; print $0}' |
      gsort -k1,1 | uniq | rev | uniq -f2 | rev
  ) > ${f/tcga_/}_tcga_pan_can_atlas_2018/*data_resource*patient*
done
```
**Note**: While a single study may contain multiple imaging modalities, data was extracted at the study level and then split by modality for cBioPortal integration. This modality-level organization allows users to selectively access specific imaging types (e.g., CT scans vs. H&E slides) for each patient.

The URL patterns you provide at the bottom of this document are at the study level, so this paragraph is confusing.

1. Downloading TCGA imaging metadata from IDC using the idc-index package
2. Linking patients to their imaging studies via OHIF and SLIM viewer URLs

**Script**: [generate_imaging_resources.py](https://github.com/cBioPortal/datahub-study-curation-tools/tree/master/generate_imaging_resources)

The script referenced uses custom code to generate viewer URLs. I recommend you use this function in IDCClient: https://idc-index.readthedocs.io/en/latest/api/idc_index.html#idc_index.index.IDCClient.get_viewer_URL. Note that your URLs point to the legacy OHIF Viewer v2, which will be deprecated. You should instead use OHIF Viewer v3. The function will generate OHIF v3 viewer links for radiology studies.
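
A minimal sketch of that recommendation, assuming the `idc-index` package is installed. `study_viewer_urls` and `default_client` are hypothetical helper names. No `viewer_selector` is passed, so the choice of viewer is left to the library.

```python
def study_viewer_urls(client, study_uids):
    """Map each StudyInstanceUID to the viewer URL produced by idc-index.

    `client` is expected to be an idc_index.IDCClient, or any object with a
    compatible get_viewer_URL method.
    """
    return {uid: client.get_viewer_URL(studyInstanceUID=uid) for uid in study_uids}


def default_client():
    """Create a real client; requires `pip install idc-index`."""
    # Imported lazily so study_viewer_urls has no hard dependency on the package.
    from idc_index import IDCClient

    return IDCClient()
```

With a real client this would be called as `study_viewer_urls(default_client(), uids)`, where `uids` is a list of StudyInstanceUID values taken from the IDC index.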

### Generate Resource Files

Patient-level resource files are generated for each TCGA Pan Cancer study by:
1. Downloading TCGA imaging metadata from IDC using the idc-index package

You should keep track of the version of IDC data you use for generating those links. You can get it with https://idc-index.readthedocs.io/en/latest/api/idc_index.html#idc_index.index.IDCClient.get_idc_version. It would also be best if you updated your links after each IDC release. Although data removal is rare, it can happen, and would lead to broken viewer links.
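
One way to track the version is to write it next to the generated resource files and compare it before reusing old links. This is a minimal sketch under stated assumptions: the JSON layout, the file path, and the helper names `write_provenance` and `links_may_be_stale` are hypothetical, and `client` is assumed to be an `idc_index.IDCClient` (or any object exposing `get_idc_version()`).

```python
import json
from datetime import date
from pathlib import Path


def write_provenance(client, path):
    """Record the IDC data version used to generate the viewer links."""
    record = {
        "idc_version": client.get_idc_version(),
        "generated_on": date.today().isoformat(),
    }
    Path(path).write_text(json.dumps(record, indent=2))
    return record


def links_may_be_stale(client, path):
    """Return True if the recorded IDC version differs from the client's version."""
    recorded = json.loads(Path(path).read_text())["idc_version"]
    return recorded != client.get_idc_version()
```

A check like `links_may_be_stale` could run after each IDC release to decide whether the resource files need regenerating.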

fedorov commented Feb 26, 2026

I added a custom Parquet file that maps between GDC and IDC in ImagingDataCommons/idc-index-data#111. The resulting file is available at https://storage.googleapis.com/idc-index-data-artifacts/current/release_artifacts/gdc_idc_mapping.parquet for the latest release and at https://storage.googleapis.com/idc-index-data-artifacts/23.6.1/release_artifacts/gdc_idc_mapping.parquet for the versioned release. I can modify CORS to allow your dev/prod URLs to access the file. I think this may be the best approach for integration, and we can customize the content of the Parquet file for cBioPortal as needed.

rmadupuri (Collaborator, Author) commented

Thanks @fedorov, will look into it.

@n1zea144 n1zea144 requested review from n1zea144 and removed request for n1zea144 March 18, 2026 16:48