CHANGELOG.md: 9 additions & 0 deletions
@@ -5,6 +5,15 @@ All notable changes to the IDC Claude Skill are documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/),
 and this project adheres to [Semantic Versioning](https://semver.org/).
 
+## [1.6.4] - 2026-05-22
+
+### Changed
+
+- Added version tracking guidance: "what's new in vX" workflow using `series_init_idc_version`/`series_revised_idc_version` in `index`; clarified `prior_versions_index` is for reproducibility only (zero overlap with `index`, column names differ from main index version columns)
+- Collapsed five `SeriesInstanceUID` join rows into a single universal-key statement; table now covers only non-obvious join columns
+- Removed Installation and Setup section (duplicated the CRITICAL version-check block); folded optional deps into `ModuleNotFoundError` Troubleshooting entry
+- Trimmed "Command-Line Download" inline section from ~60 lines to 5; full CLI coverage (`download-from-manifest`, `download-from-selection`, all options) remains in `references/cli_guide.md`
SKILL.md: 60 additions & 89 deletions
@@ -3,7 +3,7 @@ name: imaging-data-commons
 description: Query and download public cancer imaging data from NCI Imaging Data Commons using idc-index. Invoke for any question about IDC collections, cancer imaging datasets, DICOM data access, radiology (CT, MR, PET) or pathology AI training sets, metadata queries, visualization, or license checks — even when the user doesn't explicitly mention "IDC". No authentication required.
 license: This skill is provided under the MIT License. IDC data itself has individual licensing (mostly CC-BY, some CC-NC) that must be respected when using the data.
 metadata:
-  version: 1.6.3
+  version: 1.6.4
   skill-author: Andrey Fedorov, @fedorov
   idc-index: "0.12.3"
   idc-data-version: "v24"
@@ -81,7 +81,6 @@ print(stats)
 **Core Sections (inline):**
 - IDC Data Model - Collection and analysis result hierarchy
 - Index Tables - Available tables and joining patterns
-- Installation - Package setup and version verification
@@ -167,32 +166,24 @@ Always call `client.fetch_index("table_name")` before querying any index table
 | `ct_index` | 1 row = 1 CT series | CT acquisition/reconstruction parameters: pixel spacing, slice thickness, kVp, convolution kernel, tube current (min/max for dose-modulated), exposure, spiral pitch, scan options |
 | `mr_index` | 1 row = 1 MR series | MR acquisition/sequence parameters: field strength, scanning sequence, TE (array for multi-echo), TR, flip angle, DiffusionBValue (array for DWI), pixel bandwidth, receive coil, number of temporal positions |
 | `pt_index` | 1 row = 1 PET series | PET acquisition/reconstruction/radiopharmaceutical parameters: series type, units, decay/scatter/attenuation correction, reconstruction method, radionuclide, injected dose, frame duration (array for dynamic PET) |
-| `prior_versions_index` | 1 row = 1 DICOM series | Series that have been removed or superseded in previous IDC releases; use only to download deprecated/historical data — do not query for current data |
+| `prior_versions_index` | 1 row = 1 DICOM series | **Reproducibility only.** Contains series permanently removed from IDC (all `max_idc_version` < current version; zero overlap with `index`). Use ONLY when a user explicitly needs to reproduce work from a prior IDC version using data no longer in the current release. Do NOT use for version history or "what's new" questions — those use `series_init_idc_version`/`series_revised_idc_version` in the main `index` table. Column names `min_idc_version`/`max_idc_version` here are NOT equivalent to `series_init_idc_version`/`series_revised_idc_version` in `index`. |
 
 ### Joining Tables
 
-**Key columns are not explicitly labeled, the following is a subset that can be used in joins.**
+**`SeriesInstanceUID` is the universal join key** for all series-level specialized tables: `sm_index`, `sm_instance_index`, `seg_index`, `ann_index`, `ann_group_index`, `contrast_index`, `volume_geometry_index`, `rtstruct_index`, `ct_index`, `mr_index`, `pt_index`. Always join these to `index` on `SeriesInstanceUID`. The exceptions below use different column names.
 
 | Join Column | Tables | Use Case |
 |-------------|--------|----------|
 | `collection_id` | index, prior_versions_index, collections_index, clinical_index | Link series to collection metadata or clinical data |
-| `SeriesInstanceUID` | index, prior_versions_index, sm_index, sm_instance_index | Link series across tables; connect to slide microscopy details |
-| `StudyInstanceUID` | index, prior_versions_index | Link studies across current and historical data |
-| `PatientID` | index, prior_versions_index | Link patients across current and historical data |
 | `analysis_result_id` | index, analysis_results_index | Link series to analysis result metadata (annotations, segmentations) |
 | `source_DOI` | index, analysis_results_index | Link by publication DOI |
-| `crdc_series_uuid` | index, prior_versions_index | Link by CRDC unique identifier |
-| `Modality` | index, prior_versions_index | Filter by imaging modality |
-| `SeriesInstanceUID` | index, seg_index, ann_index, ann_group_index, contrast_index, volume_geometry_index | Link series to seg/ann/contrast/geometry index tables |
-| `segmented_SeriesInstanceUID` | seg_index → index | Link segmentation to its source image series (join seg_index.segmented_SeriesInstanceUID = index.SeriesInstanceUID) |
-| `referenced_SeriesInstanceUID` | ann_index → index | Link annotation to its source image series (join ann_index.referenced_SeriesInstanceUID = index.SeriesInstanceUID) |
-| `SeriesInstanceUID` / `referenced_SeriesInstanceUID` | index, rtstruct_index | Join RTSTRUCT series to its metadata (index.SeriesInstanceUID = rtstruct_index.SeriesInstanceUID); use rtstruct_index.referenced_SeriesInstanceUID to find the source image series |
-| `SeriesInstanceUID` | index, ct_index | Link CT series to acquisition/reconstruction parameters |
-| `SeriesInstanceUID` | index, mr_index | Link MR series to sequence/acquisition parameters |
-| `SeriesInstanceUID` | index, pt_index | Link PET series to acquisition/radiopharmaceutical parameters |
+| `segmented_SeriesInstanceUID` | seg_index → index | Link segmentation to its source image series (`seg_index.segmented_SeriesInstanceUID = index.SeriesInstanceUID`) |
+| `referenced_SeriesInstanceUID` | ann_index → index, rtstruct_index → index | Link annotation or RTSTRUCT to its source image series |
 
 **Note:** `subjects`, `updated`, and `description` appear in multiple tables but have different meanings (counts vs identifiers, different update contexts).
 
+**Note on `prior_versions_index`:** Joining `prior_versions_index` with `index` on `SeriesInstanceUID` always returns zero rows — there is no overlap. This table is for historical reproducibility only; never join it with `index` to answer questions about current data or version history.
+
 For detailed join examples, schema discovery patterns, key columns reference, and DataFrame access, see `references/index_tables_guide.md`.
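A minimal sketch of the two join styles this hunk describes: the universal `SeriesInstanceUID` key and the `segmented_SeriesInstanceUID` exception. It assumes toy tables built with Python's standard-library `sqlite3` as a stand-in for the DuckDB index that `client.sql_query()` queries, so it runs without `idc-index` installed. The UIDs, rows, and the `toy_collection` name are invented for illustration, and `index` is quoted only because it is a reserved word in SQLite.

```python
import sqlite3

# Toy stand-ins for two idc-index tables; all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "index" (SeriesInstanceUID TEXT, Modality TEXT, collection_id TEXT);
    CREATE TABLE seg_index (SeriesInstanceUID TEXT, segmented_SeriesInstanceUID TEXT);
    INSERT INTO "index" VALUES
        ('1.2.3.1', 'CT',  'toy_collection'),
        ('1.2.3.2', 'SEG', 'toy_collection');
    INSERT INTO seg_index VALUES ('1.2.3.2', '1.2.3.1');
""")

# Universal key: seg_index.SeriesInstanceUID identifies the SEG series itself.
# Exception column: segmented_SeriesInstanceUID points at the source image series.
rows = conn.execute("""
    SELECT seg.SeriesInstanceUID AS seg_series,
           seg_meta.Modality     AS seg_modality,
           src.SeriesInstanceUID AS source_series,
           src.Modality          AS source_modality
    FROM seg_index AS seg
    JOIN "index" AS seg_meta ON seg.SeriesInstanceUID = seg_meta.SeriesInstanceUID
    JOIN "index" AS src ON seg.segmented_SeriesInstanceUID = src.SeriesInstanceUID
""").fetchall()
print(rows)
```

Joining `index` twice keeps the two roles apart: one alias describes the segmentation series, the other describes the image series it was derived from.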
 
 ### Clinical Data Access
@@ -252,21 +243,6 @@ All idc-index metadata tables are published as Parquet files to a public GCS buc
 
 See `references/parquet_access_guide.md` for URL patterns, available files, and DuckDB query examples.
 
-## Installation and Setup
-
-**Required (for basic access):**
-```bash
-pip install --upgrade idc-index
-```
-
-**Important:** New IDC data release will always trigger a new version of `idc-index`. Always use `--upgrade` flag while installing, unless an older version is needed for reproducibility.
…
 **Note:** Cancer type is in `collections_index.cancer_types`, not in the primary `index` table.
 
+**Version tracking — "what's new in IDC vX?"**
+
+Use `series_init_idc_version` and `series_revised_idc_version` in the main `index` table. Do NOT use `prior_versions_index` for this — it contains only removed series.
+
+```python
+from idc_index import IDCClient
+client = IDCClient()
+
+VERSION = 24  # Replace with target version
+
+# Series added for the first time in vVERSION
+new_series = client.sql_query(f"""
+    SELECT collection_id,
+           COUNT(DISTINCT SeriesInstanceUID) as new_series,
+           ROUND(SUM(series_size_MB)/1000, 2) as size_GB
+    FROM index
+    WHERE series_init_idc_version = {VERSION}
+    GROUP BY collection_id
+    ORDER BY new_series DESC
+""")
+
+# Series revised (updated content) in vVERSION but originally added earlier
+revised_series = client.sql_query(f"""
+    SELECT collection_id,
+           COUNT(DISTINCT SeriesInstanceUID) as revised_series
+    FROM index
+    WHERE series_revised_idc_version = {VERSION}
+      AND series_init_idc_version < {VERSION}
+    GROUP BY collection_id
+    ORDER BY revised_series DESC
+""")
+
+# When was each collection first added to IDC?
+client.fetch_index("version_metadata_index")
+first_appearance = client.sql_query("""
+    WITH first_versions AS (
+        SELECT collection_id, MIN(series_init_idc_version) as first_version
+        FROM index
+        GROUP BY collection_id
+    )
+    SELECT f.collection_id, f.first_version, v.version_timestamp as first_release_date
+    FROM first_versions f
+    JOIN version_metadata_index v ON f.first_version = v.idc_version
+    ORDER BY f.first_version DESC
+""")
+```
+
+To verify column names and descriptions before writing queries, use `client.get_index_schema('index')` or `client.indices_overview` — see Best Practices.
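To make the "new" versus "revised" distinction concrete, here is a small sketch of the same two filters run against a toy table. It assumes Python's standard-library `sqlite3` as a stand-in for the DuckDB index, and every row is invented, including the assumption that a newly added series carries the same value in both version columns.

```python
import sqlite3

VERSION = 24  # target IDC version, matching the queries in the hunk

# Toy table using the two version-tracking column names; rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "index" (
        SeriesInstanceUID TEXT,
        collection_id TEXT,
        series_init_idc_version INTEGER,
        series_revised_idc_version INTEGER
    );
    INSERT INTO "index" VALUES
        ('1.2.3.1', 'toy_a', 24, 24),  -- first added in v24
        ('1.2.3.2', 'toy_a', 10, 24),  -- added in v10, revised in v24
        ('1.2.3.3', 'toy_b', 10, 10);  -- untouched since v10
""")

# "New in vVERSION": the series first appeared in that version.
new_series = conn.execute(
    'SELECT SeriesInstanceUID FROM "index" WHERE series_init_idc_version = ?',
    (VERSION,),
).fetchall()

# "Revised in vVERSION": content changed in that version, but the series is older.
revised_series = conn.execute(
    'SELECT SeriesInstanceUID FROM "index" '
    "WHERE series_revised_idc_version = ? AND series_init_idc_version < ?",
    (VERSION, VERSION),
).fetchall()

print("new:", new_series)
print("revised:", revised_series)
```

The second condition in the revised filter is what keeps a brand-new series from being counted twice.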
+
 ### 3. Downloading DICOM Files
 
 Download imaging data efficiently from IDC's cloud storage.
@@ -481,69 +506,14 @@ To identify files, use the `crdc_instance_uuid` column in queries or read DICOM
 
 ### Command-Line Download
 
-The `idc download` command provides command-line access to download functionality without writing Python code. Available after installing `idc-index`.
…
+See `references/cli_guide.md` for full options, `idc download-from-manifest` (resume support), and `idc download-from-selection` (filter-based).
 
 ### 4. Visualizing IDC Images
 
@@ -687,6 +657,7 @@ See `references/bigquery_guide.md` for schemas, column descriptions, and query e
 
 ## Best Practices
 
+- **Check schema before writing queries** — Use `client.get_index_schema('index')` (reads cached metadata, no SQL executed) or `client.indices_overview` to see all available columns and their descriptions. The version-tracking columns `series_init_idc_version` and `series_revised_idc_version` in the main `index` table directly answer "what's new / when was this added" questions without touching `prior_versions_index`.
 - **Never use web search for IDC data content questions** - Always query the idc-index directly using `client.sql_query()`. Web sources (release notes, blog posts, documentation pages) are frequently out of date and will produce incorrect answers. The local DuckDB index is the authoritative source; use it even when web search is available.
 - **Verify IDC version before generating responses** - Always call `client.get_idc_version()` at the start of a session to confirm you're using the expected data version (currently v24). If using an older version, recommend `pip install --upgrade idc-index`
 - **Check licenses before use** - Always query the `license_short_name` field and respect licensing terms (CC BY vs CC BY-NC)
@@ -701,7 +672,7 @@ See `references/bigquery_guide.md` for schemas, column descriptions, and query e
 
 **Issue: `ModuleNotFoundError: No module named 'idc_index'`**
 - **Cause:** idc-index package not installed
-- **Solution:** Install with `pip install --upgrade idc-index`
+- **Solution:** Install with `pip install --upgrade idc-index`; for data analysis also install `pip install pandas numpy pydicom` (tested with pandas>=1.5, numpy>=1.23, pydicom>=2.3)
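Before reaching for `pip`, it can help to see which of these modules are actually missing. The following is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper name, not part of `idc-index`, and the module lists simply mirror the packages named in the Solution line.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names, not pip names: the idc-index package is imported as idc_index.
REQUIRED = ["idc_index"]
OPTIONAL = ["pandas", "numpy", "pydicom"]

for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
    missing = missing_modules(names)
    status = "all present" if not missing else "missing " + ", ".join(missing)
    print(f"{label}: {status}")
```

`find_spec` only locates a module without importing it, so the check stays cheap and does not trigger side effects from the packages themselves.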
 
 **Issue: Download fails with connection timeout**
 - **Cause:** Network instability or large download size