CHANGELOG.md: 21 additions, 0 deletions
````diff
@@ -5,6 +5,27 @@ All notable changes to the IDC Claude Skill are documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/),
 and this project adheres to [Semantic Versioning](https://semver.org/).
 
+## [1.6.3] - 2026-05-09
+
+### Added
+
+- `ct_index`, `mr_index`, `pt_index` tables (idc-index 0.12.3 / idc-index-data 24.2.0): modality-specific acquisition and reconstruction parameter indices, one row per series, all joining on `SeriesInstanceUID`
+- `mr_index` (22 columns): field strength, scanning sequence, TE (array for multi-echo), TR, flip angle, DiffusionBValue (array for DWI), pixel bandwidth, receive coil, number of temporal positions
+- `pt_index` (21 columns): radionuclide, injected dose, reconstruction method, decay/scatter/attenuation correction, frame duration (array for dynamic PET), number of time slices
+- SQL query patterns for all three new tables in `references/sql_patterns.md`
+- Join column entries for `ct_index`, `mr_index`, `pt_index` in `references/index_tables_guide.md` and SKILL.md
+- Parquet file entries for `ct_index.parquet`, `mr_index.parquet`, `pt_index.parquet` in `references/parquet_access_guide.md`
+
+### Changed
+
+- Added concrete `indices_overview` code example showing how to search for a column across all tables and read column schemas without fetching the table; directly addresses the failure mode where agents query `index` for modality-specific parameters (SliceThickness, KVP, etc.) instead of using `ct_index`/`mr_index`/`pt_index`
+- Added troubleshooting entry "Column not found in `index` table" with a working `indices_overview` search snippet and join example, covering common acquisition/reconstruction parameters that live in the modality-specific index tables
+- Updated idc-index reference to 0.12.3
+- Clarified `download_from_selection` API: added explicit warning that it takes filter keyword arguments (not a DataFrame), comparison table vs `download_dicom_series` (which has a different first-argument order), and restructured the download example as a step-by-step query → extract UIDs → pass list flow
+- Documented `download_dicom_series` as an alternative download method with its own signature (`seriesInstanceUID` as first arg, then `downloadDir`)
+- Reduced redundancy and duplication in SKILL.md for cleaner reading
````
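All three new tables are described as one row per series, joined on `SeriesInstanceUID`. A minimal sketch of that join, assuming the `client.fetch_index()` / `client.sql_query()` behaviour documented in SKILL.md. `MODALITY_TABLES`, `build_join_sql`, and `fetch_acquisition_params` are hypothetical helpers, not part of idc-index, and any column names passed in should first be checked against `client.indices_overview`.

```python
# Hypothetical helpers, not part of idc-index: map a DICOM Modality value to the
# modality-specific table added in 1.6.3 and join it to `index` on SeriesInstanceUID.
MODALITY_TABLES = {"CT": "ct_index", "MR": "mr_index", "PT": "pt_index"}


def build_join_sql(modality: str, columns: list, collection_id: str) -> str:
    """Return DuckDB SQL selecting `columns` from the modality table for one collection."""
    table = MODALITY_TABLES[modality]  # raises KeyError for modalities without a table
    selected = ", ".join(f"m.{col}" for col in columns)
    # collection_id is interpolated directly, so only pass trusted values here
    return (
        f"SELECT i.collection_id, i.SeriesInstanceUID, {selected}\n"
        "FROM index AS i\n"
        f"JOIN {table} AS m ON i.SeriesInstanceUID = m.SeriesInstanceUID\n"
        f"WHERE i.collection_id = '{collection_id}'"
    )


def fetch_acquisition_params(client, modality: str, columns: list, collection_id: str):
    """Run the join with an IDCClient (or any object exposing the same two methods)."""
    client.fetch_index(MODALITY_TABLES[modality])  # documented as safe and idempotent
    return client.sql_query(build_join_sql(modality, columns, collection_id))
```

With a real client, `fetch_acquisition_params(IDCClient(), "CT", ["SliceThickness"], "rider_pilot")` (after `from idc_index import IDCClient`) returns one row per CT series in that collection. Passing the client in, rather than creating it inside the helper, keeps the SQL construction testable without network access.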
SKILL.md: 87 additions, 74 deletions
````diff
@@ -3,9 +3,9 @@ name: imaging-data-commons
 description: Query and download public cancer imaging data from NCI Imaging Data Commons using idc-index. Invoke for any question about IDC collections, cancer imaging datasets, DICOM data access, radiology (CT, MR, PET) or pathology AI training sets, metadata queries, visualization, or license checks — even when the user doesn't explicitly mention "IDC". No authentication required.
 license: This skill is provided under the MIT License. IDC data itself has individual licensing (mostly CC-BY, some CC-NC) that must be respected when using the data.
````

````diff
 | `clinical_data_guide.md` | Clinical/tabular data, imaging+clinical joins, value mapping |
 | `cloud_storage_guide.md` | Direct S3/GCS access, versioning, UUID mapping |
````
````diff
@@ -126,6 +126,25 @@ The `idc-index` package provides multiple metadata index tables, accessible via
 
 **Important:** Use `client.indices_overview` to get current table descriptions and column schemas. This is the authoritative source for available columns and their types — always query it when writing SQL or exploring data structure.
 
+```python
+from idc_index import IDCClient
+
+client = IDCClient()
+
+# Find which table(s) contain a specific column (no fetch required)
+target = "SliceThickness"
+for table_name, info in client.indices_overview.items():
+    if any(c["name"] == target for c in info["schema"]["columns"]):
+        print(f"'{target}' is in: {table_name}")
+# → 'SliceThickness' is in: ct_index
+
+# List all columns in a table from the schema (no fetch required)
+ct_cols = [c["name"] for c in client.indices_overview["ct_index"]["schema"]["columns"]]
 Always call `client.fetch_index("table_name")` before querying any index table — it is safe and idempotent for all tables, including those loaded automatically at startup.
````
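The committed snippet checks one column name at a time and requires an exact match. When several parameters are needed, one pass over the same `indices_overview` layout (`info["schema"]["columns"]`, each entry carrying a `"name"`) can map all of them at once. This is a sketch: `locate_columns` is a hypothetical helper, matching is case-insensitive by choice, and overview entries without a `"schema"` key are skipped rather than assumed to have one.

```python
def locate_columns(overview, targets):
    """Map each name in `targets` to the list of tables whose schema has that column.

    `overview` is expected to look like client.indices_overview:
    {table_name: {"schema": {"columns": [{"name": ...}, ...]}, ...}, ...}.
    Matching ignores case, so "kvp" also finds a column named "KVP".
    """
    wanted = {name.lower(): name for name in targets}
    found = {name: [] for name in targets}
    for table_name, info in overview.items():
        schema = info.get("schema") or {}  # skip entries without a schema
        for column in schema.get("columns", []):
            requested = wanted.get(column["name"].lower())
            if requested is not None:
                found[requested].append(table_name)
    return found
```

With a live client, `locate_columns(client.indices_overview, ["SliceThickness", "EchoTime", "KVP"])` scans the schemas once; an empty list for a name means no installed table declares that column.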
````diff
@@ -145,6 +164,9 @@ Always call `client.fetch_index("table_name")` before querying any index table
 | `contrast_index` | 1 row = 1 series with contrast info | Contrast agent metadata: agent name, ingredient, administration route (CT, MR, PT, XA, RF) |
 | `volume_geometry_index` | 1 row = 1 CT/MR/PT series | 3D volume geometry validation for single-frame CT, MR, and PT series; boolean checks for orientation, spacing, dimensions, and slice positions; composite `regularly_spaced_3d_volume` flag |
 | `rtstruct_index` | 1 row = 1 RTSTRUCT series | RT Structure Set metadata: total ROI count, ROI names, generation algorithms, interpreted types, and the referenced image series UID |
+| `ct_index` | 1 row = 1 CT series | CT acquisition/reconstruction parameters: pixel spacing, slice thickness, kVp, convolution kernel, tube current (min/max for dose-modulated), exposure, spiral pitch, scan options |
+| `mr_index` | 1 row = 1 MR series | MR acquisition/sequence parameters: field strength, scanning sequence, TE (array for multi-echo), TR, flip angle, DiffusionBValue (array for DWI), pixel bandwidth, receive coil, number of temporal positions |
+| `pt_index` | 1 row = 1 PET series | PET acquisition/reconstruction/radiopharmaceutical parameters: series type, units, decay/scatter/attenuation correction, reconstruction method, radionuclide, injected dose, frame duration (array for dynamic PET) |
 | `prior_versions_index` | 1 row = 1 DICOM series | Series that have been removed or superseded in previous IDC releases; use only to download deprecated/historical data — do not query for current data |
 
 ### Joining Tables
````
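The `mr_index` row notes that TE is stored as an array for multi-echo series. A sketch of filtering on that, assuming the column is named `EchoTime` (the name used in the troubleshooting entry) and is a DuckDB `LIST`, so DuckDB's `len()` counts its elements; if the schema reports a scalar or text type instead, this filter does not apply. `multi_echo_sql` and `find_multi_echo_series` are hypothetical helpers.

```python
def multi_echo_sql(min_echoes: int = 2, limit: int = 20) -> str:
    """SQL for MR series whose EchoTime list holds at least `min_echoes` values."""
    return (
        "SELECT i.collection_id, m.SeriesInstanceUID, m.EchoTime,\n"
        "       len(m.EchoTime) AS n_echoes\n"
        "FROM mr_index AS m\n"
        "JOIN index AS i ON i.SeriesInstanceUID = m.SeriesInstanceUID\n"
        f"WHERE len(m.EchoTime) >= {int(min_echoes)}\n"
        f"LIMIT {int(limit)}"
    )


def find_multi_echo_series(client, min_echoes: int = 2, limit: int = 20):
    """Fetch mr_index, then run the query through an IDCClient-like object."""
    client.fetch_index("mr_index")  # modality tables may not be loaded at startup
    return client.sql_query(multi_echo_sql(min_echoes, limit))
```

The same `len()` pattern would apply to the other array-valued columns the table descriptions mention (DiffusionBValue for DWI, frame duration for dynamic PET), once their exact names and types are confirmed in the schema.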
````diff
@@ -161,11 +183,13 @@ Always call `client.fetch_index("table_name")` before querying any index table
 | `source_DOI` | index, analysis_results_index | Link by publication DOI |
 | `crdc_series_uuid` | index, prior_versions_index | Link by CRDC unique identifier |
 | `Modality` | index, prior_versions_index | Filter by imaging modality |
-| `SeriesInstanceUID` | index, seg_index, ann_index, ann_group_index, contrast_index | Link segmentation/annotation/contrast series to its index metadata |
+| `SeriesInstanceUID` | index, seg_index, ann_index, ann_group_index, contrast_index, volume_geometry_index | Link series to seg/ann/contrast/geometry index tables |
 | `segmented_SeriesInstanceUID` | seg_index → index | Link segmentation to its source image series (join seg_index.segmented_SeriesInstanceUID = index.SeriesInstanceUID) |
 | `referenced_SeriesInstanceUID` | ann_index → index | Link annotation to its source image series (join ann_index.referenced_SeriesInstanceUID = index.SeriesInstanceUID) |
-| `SeriesInstanceUID` | index, volume_geometry_index | Link series to its 3D geometry validation result (join index.SeriesInstanceUID = volume_geometry_index.SeriesInstanceUID) |
 | `SeriesInstanceUID` / `referenced_SeriesInstanceUID` | index, rtstruct_index | Join RTSTRUCT series to its metadata (index.SeriesInstanceUID = rtstruct_index.SeriesInstanceUID); use rtstruct_index.referenced_SeriesInstanceUID to find the source image series |
+| `SeriesInstanceUID` | index, ct_index | Link CT series to acquisition/reconstruction parameters |
+| `SeriesInstanceUID` | index, mr_index | Link MR series to sequence/acquisition parameters |
+| `SeriesInstanceUID` | index, pt_index | Link PET series to acquisition/radiopharmaceutical parameters |
 
 **Note:** `subjects`, `updated`, and `description` appear in multiple tables but have different meanings (counts vs identifiers, different update contexts).
````
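The join table pairs `seg_index.segmented_SeriesInstanceUID` with the source image series, and the new rows add `ct_index` on the same key. Chaining the two answers questions such as "which segmentations were drawn on thin-slice CT?". A sketch, assuming `ct_index.SliceThickness` is numeric and in millimetres like the DICOM attribute; the 1.5 mm default is an arbitrary illustration, and both helper names are hypothetical.

```python
def thin_slice_segmentations_sql(max_thickness_mm: float = 1.5) -> str:
    """SQL pairing each SEG series with its source CT series, filtered by slice thickness."""
    return (
        "SELECT s.SeriesInstanceUID AS seg_series,\n"
        "       s.segmented_SeriesInstanceUID AS ct_series,\n"
        "       c.SliceThickness\n"
        "FROM seg_index AS s\n"
        # segmented_SeriesInstanceUID points at the image series the SEG was drawn on
        "JOIN ct_index AS c ON s.segmented_SeriesInstanceUID = c.SeriesInstanceUID\n"
        f"WHERE c.SliceThickness <= {float(max_thickness_mm)}"
    )


def thin_slice_segmentations(client, max_thickness_mm: float = 1.5):
    """Fetch both tables, then run the query through an IDCClient-like object."""
    for table in ("seg_index", "ct_index"):
        client.fetch_index(table)
    return client.sql_query(thin_slice_segmentations_sql(max_thickness_mm))
```

Because the inner join keeps only pairs present in both tables, segmentations of non-CT series drop out without an explicit `Modality` filter.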
````diff
 **Important:** New IDC data release will always trigger a new version of `idc-index`. Always use `--upgrade` flag while installing, unless an older version is needed for reproducibility.
 
-**IMPORTANT:** IDC data version v24 is current. Always verify your version:
-```python
-print(client.get_idc_version()) # Should return "v24"
-```
-If you see an older version, upgrade with: `pip install --upgrade idc-index`
-
-**Tested with:** idc-index 0.12.2 (IDC data version v24)
````
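This hunk drops the pinned "v24 / 0.12.2" note, and the changelog now references idc-index 0.12.3. A minimal check of the installed package against that minimum, using the standard library's `importlib.metadata`; the data release itself is still reported separately by `client.get_idc_version()`, as in the removed lines. The helper names are hypothetical, and the naive parser only compares numeric components, so `packaging.version.Version` is the safer choice when pre-release tags matter.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

MIN_IDC_INDEX = "0.12.3"  # idc-index release referenced in the 1.6.3 changelog entry


def version_tuple(v: str) -> tuple:
    """'0.12.3' -> (0, 12, 3); stops at the first non-numeric part such as 'rc1'."""
    parts = []
    for piece in v.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_idc_index_version() -> Optional[str]:
    """Installed idc-index version string, or None if the package is missing."""
    try:
        return version("idc-index")
    except PackageNotFoundError:
        return None


def idc_index_is_current(installed: str, minimum: str = MIN_IDC_INDEX) -> bool:
    """True if `installed` is at least `minimum`, comparing numeric components only."""
    return version_tuple(installed) >= version_tuple(minimum)
```

If `idc_index_is_current(installed_idc_index_version() or "0")` is false, `pip install --upgrade idc-index` is the fix the context line already recommends.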
````diff
-Download imaging data efficiently from IDC's cloud storage:
+Download imaging data efficiently from IDC's cloud storage.
+
+**IMPORTANT — two download methods with different signatures:**
+
+| Method | First arg | Second arg | Use when |
+|--------|-----------|------------|----------|
+| `download_from_selection` | `downloadDir` (required) | filter kwargs (optional) | Filtering by collection, patient, study, or series |
+| `download_dicom_series` | `seriesInstanceUID` (required) | `downloadDir` (required) | Downloading specific series by UID only |
+
+**`download_from_selection` takes filter keyword arguments, NOT a DataFrame.** The name "from_selection" refers to filtering the IDC index by criteria — not accepting a pandas DataFrame. To download the results of a query, extract UIDs from the DataFrame and pass them as a list.
 
 **Download entire collection:**
 ```python
@@ -381,15 +406,16 @@ from idc_index import IDCClient
 client = IDCClient()
 
 # Download small collection (RIDER Pilot ~1GB)
+# downloadDir is the FIRST positional argument
 client.download_from_selection(
-    collection_id="rider_pilot",
-    downloadDir="./data/rider"
+    downloadDir="./data/rider",
+    collection_id="rider_pilot"
 )
 ```
 
-**Download specific series:**
+**Download specific series (from a query result):**
````
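The body of the "from a query result" example lies outside this hunk; the changelog describes it as query → extract UIDs → pass list. A sketch of that flow, combined with the batching advice from the removed "Batch Processing" section (10 to 20 series per call). `download_query_results` is a hypothetical helper, and it assumes `download_from_selection` accepts a list of UIDs for `seriesInstanceUID`, as the rewritten text implies.

```python
def download_query_results(client, sql: str, download_dir: str, batch_size: int = 20) -> int:
    """Run `sql`, then download the distinct SeriesInstanceUIDs it returns in batches.

    The query must select a SeriesInstanceUID column. Returns the number of series requested.
    """
    df = client.sql_query(sql)
    # dict.fromkeys de-duplicates while keeping query order (also works on a pandas Series)
    uids = list(dict.fromkeys(df["SeriesInstanceUID"]))
    for start in range(0, len(uids), batch_size):
        batch = uids[start:start + batch_size]
        # downloadDir plus a filter kwarg holding a list of UIDs, never the DataFrame itself
        client.download_from_selection(downloadDir=download_dir, seriesInstanceUID=batch)
    return len(uids)
```

With a real client this would be called as `download_query_results(IDCClient(), "SELECT SeriesInstanceUID FROM index WHERE collection_id = 'rider_pilot'", "./data/rider")`. For a single known UID, `client.download_dicom_series(uid, "./data")` follows the second row of the table, with the UID first.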
````diff
 **Best practice:** When publishing results using IDC data, include the generated citations to properly attribute the data sources and satisfy license requirements.
 
-### 6. Batch Processing and Filtering
-
-For large downloads, query first to build a manifest, save it to CSV for reproducibility, then iterate over slices of the result DataFrame with `download_from_selection()` using a `batch_size` of 10–20 series to avoid timeouts.
-
-See `references/use_cases.md` (Use Case 5) for a complete worked example with manufacturer filtering, manifest saving, and batched downloads.
-
-### 7. Advanced Queries with BigQuery
+### 6. Advanced Queries with BigQuery
 
 For queries requiring full DICOM metadata, complex JOINs, clinical data tables, or private DICOM elements, use Google BigQuery. Requires GCP account with billing enabled.
````
````diff
 See `references/bigquery_guide.md` for schemas, column descriptions, and query examples for these tables.
 
-### 8. Tool Selection Guide
+### 7. Tool Selection Guide
 
 | Task | Tool | Reference |
 |------|------|-----------|
````
````diff
@@ -649,20 +685,6 @@ See `references/bigquery_guide.md` for schemas, column descriptions, and query e
 
 **Default choice:** Use `idc-index` for most tasks (no auth, easy API, batch downloads).
 
-### 9. Integration with Analysis Pipelines
-
-After downloading DICOM files, use `pydicom` to read individual files or build 3D numpy arrays sorted by `ImagePositionPatient`. For a more robust reader with automatic series sorting and ITK image output, use `SimpleITK.ImageSeriesReader`.
-
-See `references/use_cases.md` (Use Case 6) for code examples reading DICOM with pydicom, building 3D CT volumes, and integrating with SimpleITK.
-
-## Common Use Cases
-
-See `references/use_cases.md` for complete end-to-end workflow examples including:
-- Building deep learning training datasets from lung CT scans
-- Comparing image quality across scanner manufacturers
-- Previewing data in browser before downloading
-- License-aware batch downloads for commercial use
-
 ## Best Practices
 
 - **Never use web search for IDC data content questions** - Always query the idc-index directly using `client.sql_query()`. Web sources (release notes, blog posts, documentation pages) are frequently out of date and will produce incorrect answers. The local DuckDB index is the authoritative source; use it even when web search is available.
````
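The best-practice bullet says to answer content questions from the local index through `client.sql_query()` rather than from web pages. A minimal sketch of one such question, distinct series counts per collection for a given modality, using the `index` columns `collection_id`, `Modality`, and `SeriesInstanceUID`. `series_per_collection` is a hypothetical helper, and the modality string is interpolated into SQL, so only trusted values should be passed.

```python
def series_per_collection(client, modality: str = "CT", top: int = 10) -> dict:
    """Distinct series counts per collection for one modality, from the local index."""
    sql = (
        "SELECT collection_id, COUNT(DISTINCT SeriesInstanceUID) AS n_series\n"
        "FROM index\n"
        f"WHERE Modality = '{modality}'\n"
        "GROUP BY collection_id\n"
        "ORDER BY n_series DESC\n"
        f"LIMIT {int(top)}"
    )
    df = client.sql_query(sql)
    # zip over the two result columns works for a pandas DataFrame or a dict of lists
    return dict(zip(df["collection_id"], df["n_series"]))
```

The answer reflects whichever IDC data release the installed idc-index ships, which is exactly why it is preferable to a number copied from release notes.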
````diff
@@ -700,6 +722,25 @@ See `references/use_cases.md` for complete end-to-end workflow examples includin
 - Use `LIMIT 5` to test query first
 - Check field names against metadata schema documentation
 
+**Issue: Column not found in `index` table (e.g., `SliceThickness`, `PixelSpacing`, `KVP`, `EchoTime`, `InjectedDose`)**
+- **Cause:** The `index` table contains series-level metadata only; modality-specific acquisition and reconstruction parameters live in dedicated tables (`ct_index`, `mr_index`, `pt_index`)
+- **Solution:** Search `client.indices_overview` to find the right table, then fetch and join on `SeriesInstanceUID`:
+```python
+target = "SliceThickness"
+for table_name, info in client.indices_overview.items():
+    if any(c["name"] == target for c in info["schema"]["columns"]):
````

````diff
 - **Cause:** Corrupted download or incompatible viewer
 - **Solution:**
````
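The checklist in this hunk says to test queries with `LIMIT 5`, and the new troubleshooting entry says to fetch the modality table before joining it. A small sketch combining both steps; `preview_table` is a hypothetical helper, and `client` is assumed to be an `idc_index.IDCClient` or any object with the same `fetch_index` and `sql_query` methods.

```python
def preview_table(client, table_name: str, columns=("*",), n: int = 5):
    """Fetch `table_name` if needed, then return its first `n` rows."""
    client.fetch_index(table_name)  # documented as safe to repeat, even for preloaded tables
    return client.sql_query(f"SELECT {', '.join(columns)} FROM {table_name} LIMIT {int(n)}")
```

`preview_table(client, "ct_index")` returns five rows of every column, which is usually enough to confirm a column's exact name and value format before writing the full join.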
````diff
@@ -718,38 +759,10 @@ See `references/sql_patterns.md` for quick-reference SQL patterns including:
 - Download size estimation
 - Clinical data linking
 
-For segmentation and annotation details, also see `references/digital_pathology_guide.md`.
-
-## Related Skills
-
-The following skills complement IDC workflows for downstream analysis and visualization:
-
-### DICOM Processing
-- **pydicom** - Read, write, and manipulate downloaded DICOM files. Use for extracting pixel data, reading metadata, anonymization, and format conversion. Essential for working with IDC radiology data (CT, MR, PET).
-
-### Pathology and Slide Microscopy
-See `references/digital_pathology_guide.md` for DICOM-compatible tools (highdicom, wsidicom, TIA-Toolbox, Slim viewer).
-
-### Metadata Visualization
-- **matplotlib** - Low-level plotting for full customization. Use for creating static figures summarizing IDC query results (bar charts of modalities, histograms of series counts, etc.).
-- **seaborn** - Statistical visualization with pandas integration. Use for quick exploration of IDC metadata distributions, relationships between variables, and categorical comparisons with attractive defaults.
-- **plotly** - Interactive visualization. Use when you need hover info, zoom, and pan for exploring IDC metadata, or for creating web-embeddable dashboards of collection statistics.
-
-### Data Exploration
-- **exploratory-data-analysis** - Comprehensive EDA on scientific data files. Use after downloading IDC data to understand file structure, quality, and characteristics before analysis.
+For digital pathology related see `references/digital_pathology_guide.md`.
 
 ## Resources
 
-### Schema Reference (Primary Source)
-
-**Always use `client.indices_overview` for current column schemas.** This ensures accuracy with the installed idc-index version:
````