CHANGELOG.md (27 additions, 0 deletions)
@@ -5,6 +5,33 @@ All notable changes to the IDC Claude Skill are documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/),
 and this project adheres to [Semantic Versioning](https://semver.org/).
 
+## [1.6.0] - 2026-05-07
+
+### Added
+
+- `tests/test_bq_snippets.py`: BigQuery snippet validation using `bq query --dry_run` — 33 tests covering all SQL examples in `references/bigquery_guide.md` (dicom_all, original_collections_metadata, segmentations, quantitative_measurements, qualitative_measurements, private elements, and clinical tables); skips automatically when `bq` CLI is unavailable or unauthenticated
+
+### Security
+
+- Fixed auto-upgrade subprocess call to pin `idc-index` to `REQUIRED_VERSION` (was `"idc-index"`, now `f"idc-index=={REQUIRED_VERSION}"`), ensuring the installed version always matches the tested version declared in the frontmatter
+- Added network access transparency note to Overview documenting expected external endpoints (GCS, S3, BigQuery, DICOMweb proxy, Google Healthcare API) and clarifying that no credentials or environment variables are accessed by the skill
+- Added tested-with version comment to optional dependency install block (`pandas>=1.5, numpy>=1.23, pydicom>=2.3`)
+
+### Changed
+
+- Updated frontmatter description to be directive about skill triggering: now explicitly instructs invocation for IDC-related queries even without the word "IDC" in the prompt
+- Extracted "Batch Processing and Filtering" (section 6) from SKILL.md to `references/use_cases.md` (Use Case 5); replaced inline code block with a 2-sentence summary and pointer
+- Extracted "Integration with Analysis Pipelines" (section 9) from SKILL.md to `references/use_cases.md` (Use Case 6); replaced inline pydicom/SimpleITK code blocks with a 2-sentence summary and pointer
+- SKILL.md reduced from 865 → 775 lines (−90 lines); `references/use_cases.md` expanded from 187 → 278 lines
+- Updated to idc-index 0.12.1 (idc-index-data 24.0.4, IDC data version v24)
+- IDC v24 adds 15 new collections (161 → 176), ~39K new series, ~4 TB new data (99.27 TB total, 85,682 cases)
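
The Security entry describes pinning the auto-upgrade call to `REQUIRED_VERSION`. A minimal sketch of how such a pinned install command could be assembled, assuming `REQUIRED_VERSION = "0.12.1"` (the tested version named in this release); `pinned_install_cmd` and `install_pinned` are hypothetical helper names, not the skill's actual code:

```python
import subprocess
import sys

# Assumption: the tested version declared in the skill frontmatter (0.12.1 per this release).
REQUIRED_VERSION = "0.12.1"


def pinned_install_cmd(version: str = REQUIRED_VERSION) -> list[str]:
    """Build a pip command that installs exactly the given idc-index version.

    Hypothetical helper: an '==' pin keeps the installed package in lockstep
    with the tested version, whereas a bare "idc-index" requirement resolves
    to whatever release is newest at install time.
    """
    return [sys.executable, "-m", "pip", "install", f"idc-index=={version}"]


def install_pinned(version: str = REQUIRED_VERSION) -> None:
    # check_call raises CalledProcessError if pip exits with a non-zero status.
    subprocess.check_call(pinned_install_cmd(version))


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(pinned_install_cmd()))
```

Building the argument list separately from executing it keeps the pin easy to inspect and test without touching the environment.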
-description: Query and download public cancer imaging data from NCI Imaging Data Commons using idc-index. Use for accessing large-scale radiology (CT, MR, PET) and pathology datasets for AI training or research. No authentication required. Query by metadata, visualize in browser, check licenses.
+description: Query and download public cancer imaging data from NCI Imaging Data Commons using idc-index. Invoke for any question about IDC collections, cancer imaging datasets, DICOM data access, radiology (CT, MR, PET) or pathology AI training sets, metadata queries, visualization, or license checks — even when the user doesn't explicitly mention "IDC". No authentication required.
 license: This skill is provided under the MIT License. IDC data itself has individual licensing (mostly CC-BY, some CC-NC) that must be respected when using the data.
 Use the `idc-index` Python package to query and download public cancer imaging data from the National Cancer Institute Imaging Data Commons (IDC). No authentication required for data access.
 
-**Current IDC Data Version: v23** (always verify with `IDCClient().get_idc_version()`)
+**Expected network access:** `idc-index` queries a local DuckDB index (no network for metadata). File downloads use public GCS (`storage.googleapis.com`) and AWS S3 (`s3.amazonaws.com`) — no authentication required. DICOMweb access uses either the public IDC proxy (`proxy.imaging.datacommons.cancer.gov`, no auth) or the Google Cloud Healthcare API (`healthcare.googleapis.com`, requires GCP authentication). Optional BigQuery queries (`bigquery.googleapis.com`) also require GCP authentication. No credentials or environment variables are accessed by this skill.
+
+**Current IDC Data Version: v24** (always verify with `IDCClient().get_idc_version()`)
 **Important:** New IDC data release will always trigger a new version of `idc-index`. Always use `--upgrade` flag while installing, unless an older version is needed for reproducibility.
 
-**IMPORTANT:** IDC data version v23 is current. Always verify your version:
+**IMPORTANT:** IDC data version v24 is current. Always verify your version:
 ```python
-print(client.get_idc_version()) # Should return "v23"
+print(client.get_idc_version()) # Should return "v24"
 ```
 If you see an older version, upgrade with: `pip install --upgrade idc-index`
 
-**Tested with:** idc-index 0.11.14 (IDC data version v23)
+**Tested with:** idc-index 0.12.1 (IDC data version v24)
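
The version checks above compare the string returned by `client.get_idc_version()` (for example `"v24"`) against the expected release. A small sketch of that comparison, assuming version strings always take the form `v<integer>`; `parse_idc_version` and `needs_upgrade` are hypothetical helper names, and the idc-index call is left commented out so the snippet runs without the package installed:

```python
import re

EXPECTED_IDC_VERSION = "v24"  # assumption: the current release according to this commit


def parse_idc_version(version: str) -> int:
    """Turn an IDC data version string such as 'v24' into the integer 24."""
    match = re.fullmatch(r"v(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unexpected IDC version string: {version!r}")
    return int(match.group(1))


def needs_upgrade(current: str, expected: str = EXPECTED_IDC_VERSION) -> bool:
    """Return True when the installed index predates the expected IDC release."""
    return parse_idc_version(current) < parse_idc_version(expected)


# With idc-index installed, the current version would come from the client:
# from idc_index import IDCClient
# current = IDCClient().get_idc_version()
current = "v23"  # illustrative value only
if needs_upgrade(current):
    print("Older IDC data version detected; run: pip install --upgrade idc-index")
```

Comparing parsed integers rather than raw strings avoids lexicographic surprises such as `"v9" > "v24"`.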
-Process large datasets efficiently with filtering:
+For large downloads, query first to build a manifest, save it to CSV for reproducibility, then iterate over slices of the result DataFrame with `download_from_selection()` using a `batch_size` of 10–20 series to avoid timeouts.
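
The manifest-then-batches advice can be sketched as follows. The slicing and CSV logic is plain Python; `download_batch` is a stand-in callable for a call such as `client.download_from_selection(seriesInstanceUID=batch, downloadDir=...)`, and the helper names (`iter_batches`, `save_manifest`, `download_in_batches`) are illustrative assumptions, not part of idc-index:

```python
import csv
from typing import Callable, Iterator, Sequence


def iter_batches(items: Sequence[str], batch_size: int = 15) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size items (10-20 series suggested)."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def save_manifest(series_uids: Sequence[str], path: str) -> None:
    """Write the selected SeriesInstanceUIDs to CSV so the selection can be reproduced."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["SeriesInstanceUID"])
        writer.writerows([uid] for uid in series_uids)


def download_in_batches(
    series_uids: Sequence[str],
    download_batch: Callable[[list[str]], None],
    batch_size: int = 15,
) -> int:
    """Call download_batch once per slice and return the number of batches issued."""
    count = 0
    for batch in iter_batches(series_uids, batch_size):
        # Assumed real call (verify against your idc-index version):
        # client.download_from_selection(seriesInstanceUID=batch, downloadDir="./idc_data")
        download_batch(batch)
        count += 1
    return count
```

With a query result `df`, the UID list would typically be `df["SeriesInstanceUID"].tolist()`; saving it with `save_manifest` before downloading lets an interrupted run resume from the same selection.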
-volume = np.stack([s.pixel_array for s in slices])
+After downloading DICOM files, use `pydicom` to read individual files or build 3D numpy arrays sorted by `ImagePositionPatient`. For a more robust reader with automatic series sorting and ITK image output, use `SimpleITK.ImageSeriesReader`.
 
-return volume, slices[0] # Return volume and first slice for metadata
+See `references/use_cases.md` (Use Case 6) for code examples reading DICOM with pydicom, building 3D CT volumes, and integrating with SimpleITK.
 
 ## Common Use Cases
 
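
The new summary mentions building 3D arrays sorted by `ImagePositionPatient`. One common way to get that order, sketched here without pydicom or NumPy so it runs anywhere, projects each slice position onto the slice normal (the cross product of the row and column direction cosines in `ImageOrientationPatient`). It assumes every slice object exposes those two attributes, as pydicom datasets do; `slice_normal` and `sort_slices` are illustrative names, and this is not necessarily the approach used in `references/use_cases.md`:

```python
from types import SimpleNamespace
from typing import Sequence


def slice_normal(orientation: Sequence[float]) -> tuple[float, float, float]:
    """Cross product of the row and column direction cosines from ImageOrientationPatient."""
    rx, ry, rz, cx, cy, cz = (float(v) for v in orientation)
    return (ry * cz - rz * cy, rz * cx - rx * cz, rx * cy - ry * cx)


def sort_slices(slices: list) -> list:
    """Order slices along the normal; works for pydicom datasets or any object
    exposing ImagePositionPatient and ImageOrientationPatient."""
    if not slices:
        return []
    normal = slice_normal(slices[0].ImageOrientationPatient)

    def position_along_normal(ds) -> float:
        # Signed distance of the slice origin along the normal direction.
        return sum(float(p) * n for p, n in zip(ds.ImagePositionPatient, normal))

    return sorted(slices, key=position_along_normal)


# With pydicom and NumPy available, the sorted list could then be stacked:
# volume = np.stack([s.pixel_array for s in sort_slices(slices)])
axial = [1, 0, 0, 0, 1, 0]
demo = [
    SimpleNamespace(ImagePositionPatient=[0, 0, z], ImageOrientationPatient=axial, name=z)
    for z in (10.0, -5.0, 2.5)
]
print([s.name for s in sort_slices(demo)])  # [-5.0, 2.5, 10.0]
```

Projecting onto the normal, rather than sorting on the z coordinate alone, also keeps the order correct for sagittal, coronal or oblique acquisitions.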
@@ -753,7 +664,7 @@ See `references/use_cases.md` for complete end-to-end workflow examples includin
 
 ## Best Practices
 
-- **Verify IDC version before generating responses** - Always call `client.get_idc_version()` at the start of a session to confirm you're using the expected data version (currently v23). If using an older version, recommend `pip install --upgrade idc-index`
+- **Verify IDC version before generating responses** - Always call `client.get_idc_version()` at the start of a session to confirm you're using the expected data version (currently v24). If using an older version, recommend `pip install --upgrade idc-index`
 - **Check licenses before use** - Always query the `license_short_name` field and respect licensing terms (CC BY vs CC BY-NC)
 - **Generate citations for attribution** - Use `citations_from_selection()` to get properly formatted citations from `source_DOI` values; include these in publications
 - **Start with small queries** - Use `LIMIT` clause when exploring to avoid long downloads and understand data structure
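
To act on the "Check licenses before use" practice, a selection can be partitioned on `license_short_name` before any download. A minimal sketch over plain dictionaries (for example, rows obtained from a query result with `DataFrame.to_dict("records")`), assuming that an `NC` component in the short name, as in `CC BY-NC 4.0`, marks non-commercial terms; `split_by_license` is a hypothetical helper, and missing values are set aside for manual review rather than guessed:

```python
def split_by_license(rows: list[dict]) -> tuple[list[dict], list[dict], list[dict]]:
    """Partition rows into (commercial use allowed, non-commercial only, unknown).

    Assumption: a license_short_name containing an 'NC' component, such as
    'CC BY-NC 4.0' or 'CC BY-NC-SA 4.0', restricts use to non-commercial
    purposes. Anything unrecognised should still be checked by hand.
    """
    allowed, non_commercial, unknown = [], [], []
    for row in rows:
        name = (row.get("license_short_name") or "").strip()
        if not name:
            unknown.append(row)  # no license recorded: review manually
        elif "NC" in name.upper().replace("-", " ").split():
            non_commercial.append(row)
        else:
            allowed.append(row)
    return allowed, non_commercial, unknown
```

Keeping an explicit "unknown" bucket makes the check conservative: a row is never treated as commercially usable just because its license field is empty.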
references/bigquery_guide.md (1 addition, 1 deletion)
@@ -1,6 +1,6 @@
 # BigQuery Guide for IDC
 
-**Tested with:** IDC data version v23
+**Tested with:** idc-index 0.12.1 (IDC data version v24)
 
 For most queries and downloads, use `idc-index` (see main SKILL.md). This guide covers BigQuery for advanced use cases requiring full DICOM metadata or complex joins.
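
The new `tests/test_bq_snippets.py` is described as skipping when the `bq` CLI is missing or unauthenticated. One way such a guard could look, assuming a trivial `SELECT 1` dry run is an acceptable authentication probe; `dry_run_cmd` and `bq_usable` are hypothetical helper names, not taken from the test file:

```python
import shutil
import subprocess


def dry_run_cmd(sql: str) -> list[str]:
    """Build a `bq query` invocation that validates standard SQL without running it."""
    return ["bq", "query", "--use_legacy_sql=false", "--dry_run", sql]


def bq_usable(timeout: float = 30.0) -> bool:
    """Return True only if the bq CLI exists and a trivial dry run succeeds."""
    if shutil.which("bq") is None:
        return False  # CLI not installed
    try:
        result = subprocess.run(
            dry_run_cmd("SELECT 1"), capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # A non-zero exit usually means missing credentials or no default project.
    return result.returncode == 0
```

In a pytest module, the result could feed a module-level marker such as `pytest.mark.skipif(not bq_usable(), reason="bq CLI unavailable or unauthenticated")`, so every snippet test is skipped cleanly instead of failing.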
references/clinical_data_guide.md (1 addition, 1 deletion)
@@ -1,6 +1,6 @@
 # Clinical Data Guide for IDC
 
-**Tested with:** idc-index 0.11.7 (IDC data version v23)
+**Tested with:** idc-index 0.12.1 (IDC data version v24)
 
 Clinical data (demographics, diagnoses, therapies, lab tests, staging) accompanies many IDC imaging collections. This guide covers how to discover, access, and integrate clinical data with imaging data using `idc-index`.
references/digital_pathology_guide.md (6 additions, 6 deletions)
@@ -1,6 +1,6 @@
 # Digital Pathology Guide for IDC
 
-**Tested with:** IDC data version v23, idc-index 0.11.10
+**Tested with:** idc-index 0.12.1 (IDC data version v24)
 
 For general IDC queries and downloads, use `idc-index` (see main SKILL.md). This guide covers slide microscopy (SM) imaging, microscopy bulk simple annotations (ANN), and segmentations (SEG) in the context of digital pathology in IDC.
 
@@ -251,12 +251,12 @@ client.sql_query("""
 SELECT
 ar.analysis_result_id,
 ar.analysis_result_title,
-ar.Modalities,
-ar.Subjects,
-ar.Collections
+ar.modalities,
+ar.subjects,
+ar.collections
 FROM analysis_results_index ar
-WHERE ar.Modalities LIKE '%ANN%' OR ar.Modalities LIKE '%SEG%'
-ORDER BY ar.Subjects DESC
+WHERE ar.modalities LIKE '%ANN%' OR ar.modalities LIKE '%SEG%'
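
After this change the `analysis_results_index` query refers to lowercase `modalities`, `subjects` and `collections` columns. The filtering logic can be tried locally with Python's built-in `sqlite3` as a stand-in for the DuckDB engine behind `client.sql_query()`. The table below only mimics those column names, and its rows are invented placeholders, not IDC data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in table mirroring the lowercase column names used after this change.
conn.execute(
    """CREATE TABLE analysis_results_index (
        analysis_result_id TEXT, analysis_result_title TEXT,
        modalities TEXT, subjects INTEGER, collections TEXT)"""
)
# Placeholder rows for illustration only.
conn.executemany(
    "INSERT INTO analysis_results_index VALUES (?, ?, ?, ?, ?)",
    [
        ("result_a", "Example annotations", "ANN", 40, "collection_x"),
        ("result_b", "Example segmentations", "SEG,SR", 120, "collection_y"),
        ("result_c", "Example CT-only result", "CT", 75, "collection_z"),
    ],
)
# LIKE with wildcards matches a modality anywhere in a comma-separated string.
rows = conn.execute(
    """SELECT ar.analysis_result_id, ar.modalities, ar.subjects
       FROM analysis_results_index ar
       WHERE ar.modalities LIKE '%ANN%' OR ar.modalities LIKE '%SEG%'
       ORDER BY ar.subjects DESC"""
).fetchall()
conn.close()
print(rows)  # [('result_b', 'SEG,SR', 120), ('result_a', 'ANN', 40)]
```

Against the real index, the same SQL text would be passed to `client.sql_query()` after `client.fetch_index("analysis_results_index")`.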
references/index_tables_guide.md (2 additions, 2 deletions)
@@ -1,6 +1,6 @@
 # Index Tables Guide for IDC
 
-**Tested with:** idc-index 0.11.14 (IDC data version v23)
+**Tested with:** idc-index 0.12.1 (IDC data version v24)
 
 This guide covers the structure and access patterns for IDC index tables: programmatic schema discovery, DataFrame access, and join column references. For the overview of available tables and their purposes, see the "Index Tables" section in the main SKILL.md.
 
@@ -34,7 +34,7 @@ results = client.sql_query("SELECT * FROM index WHERE Modality = 'CT' LIMIT 10")
 
 # Fetch and query additional indices
 client.fetch_index("collections_index")
-collections = client.sql_query("SELECT collection_id, CancerTypes, TumorLocations FROM collections_index")
+collections = client.sql_query("SELECT collection_id, cancer_types, tumor_locations FROM collections_index")
 
 client.fetch_index("analysis_results_index")
 analysis = client.sql_query("SELECT * FROM analysis_results_index LIMIT 5")
references/sql_patterns.md (2 additions, 2 deletions)
@@ -1,6 +1,6 @@
 # SQL Query Patterns for IDC
 
-**Tested with:** idc-index 0.11.14 (IDC data version v23)
+**Tested with:** idc-index 0.12.1 (IDC data version v24)
 
 Quick reference for common SQL query patterns when working with IDC data. For detailed examples with context, see the "Core Capabilities" section in the main SKILL.md.
 
@@ -74,7 +74,7 @@ client.sql_query("""
 # List analysis result collections (curated derived datasets)