Commit 5074ab2

fedorov and claude committed
Improve version tracking guidance and reduce redundancy (v1.6.4)
- Add "what's new in vX" workflow using series_init/revised_idc_version
- Clarify prior_versions_index is reproducibility-only, never for version history
- Collapse SeriesInstanceUID join rows into a universal-key statement
- Remove Installation section (duplicated CRITICAL block at top)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent bd31529 commit 5074ab2

2 files changed

Lines changed: 66 additions & 32 deletions

CHANGELOG.md

Lines changed: 8 additions & 0 deletions
@@ -5,6 +5,14 @@ All notable changes to the IDC Claude Skill are documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/),
 and this project adheres to [Semantic Versioning](https://semver.org/).
 
+## [1.6.4] - 2026-05-22
+
+### Changed
+
+- Added version tracking guidance: "what's new in vX" workflow using `series_init_idc_version`/`series_revised_idc_version` in `index`; clarified `prior_versions_index` is for reproducibility only (zero overlap with `index`, column names differ from main index version columns)
+- Collapsed five `SeriesInstanceUID` join rows into a single universal-key statement; table now covers only non-obvious join columns
+- Removed Installation and Setup section (duplicated the CRITICAL version-check block); folded optional deps into `ModuleNotFoundError` Troubleshooting entry
+
 ## [1.6.3] - 2026-05-09
 
 ### Added

SKILL.md

Lines changed: 58 additions & 32 deletions
@@ -3,7 +3,7 @@ name: imaging-data-commons
 description: Query and download public cancer imaging data from NCI Imaging Data Commons using idc-index. Invoke for any question about IDC collections, cancer imaging datasets, DICOM data access, radiology (CT, MR, PET) or pathology AI training sets, metadata queries, visualization, or license checks — even when the user doesn't explicitly mention "IDC". No authentication required.
 license: This skill is provided under the MIT License. IDC data itself has individual licensing (mostly CC-BY, some CC-NC) that must be respected when using the data.
 metadata:
-  version: 1.6.3
+  version: 1.6.4
   skill-author: Andrey Fedorov, @fedorov
   idc-index: "0.12.3"
   idc-data-version: "v24"
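The front matter pins `idc-index: "0.12.3"` next to data release `v24`. As a minimal sketch (not part of the skill), the installed package can be compared against that pin with the standard library's `importlib.metadata`. The `as_tuple` and `idc_index_status` helpers are hypothetical, and the parser only understands the leading numeric `X.Y.Z` part of a version string; the skill's own data-version check remains `client.get_idc_version()`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

EXPECTED_IDC_INDEX = "0.12.3"  # value of `idc-index` in the skill metadata


def as_tuple(v):
    """Hypothetical helper: parse the leading numeric part, e.g. '0.12.3.dev1' -> (0, 12, 3)."""
    m = re.match(r"\d+(?:\.\d+)*", v)
    return tuple(int(part) for part in m.group(0).split(".")) if m else ()


def idc_index_status(expected=EXPECTED_IDC_INDEX):
    """Hypothetical helper: report whether idc-index is missing, older than expected, or OK."""
    try:
        installed = version("idc-index")  # distribution name as used with pip
    except PackageNotFoundError:
        return "not installed: pip install --upgrade idc-index"
    if as_tuple(installed) < as_tuple(expected):
        return f"outdated ({installed} < {expected}): pip install --upgrade idc-index"
    return f"ok ({installed})"


print(idc_index_status())
```

Comparing integer tuples avoids the string-comparison trap in which `"0.9.0"` sorts after `"0.12.3"`.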
@@ -81,7 +81,6 @@ print(stats)
 **Core Sections (inline):**
 - IDC Data Model - Collection and analysis result hierarchy
 - Index Tables - Available tables and joining patterns
-- Installation - Package setup and version verification
 - Core Capabilities - Essential API patterns (query, download, visualize, license, citations)
 - Best Practices - Usage guidelines
 - Troubleshooting - Common issues and solutions
@@ -167,32 +166,24 @@ Always call `client.fetch_index("table_name")` before querying any index table
 | `ct_index` | 1 row = 1 CT series | CT acquisition/reconstruction parameters: pixel spacing, slice thickness, kVp, convolution kernel, tube current (min/max for dose-modulated), exposure, spiral pitch, scan options |
 | `mr_index` | 1 row = 1 MR series | MR acquisition/sequence parameters: field strength, scanning sequence, TE (array for multi-echo), TR, flip angle, DiffusionBValue (array for DWI), pixel bandwidth, receive coil, number of temporal positions |
 | `pt_index` | 1 row = 1 PET series | PET acquisition/reconstruction/radiopharmaceutical parameters: series type, units, decay/scatter/attenuation correction, reconstruction method, radionuclide, injected dose, frame duration (array for dynamic PET) |
-| `prior_versions_index` | 1 row = 1 DICOM series | Series that have been removed or superseded in previous IDC releases; use only to download deprecated/historical data — do not query for current data |
+| `prior_versions_index` | 1 row = 1 DICOM series | **Reproducibility only.** Contains series permanently removed from IDC (all `max_idc_version` < current version; zero overlap with `index`). Use ONLY when a user explicitly needs to reproduce work from a prior IDC version using data no longer in the current release. Do NOT use for version history or "what's new" questions — those use `series_init_idc_version`/`series_revised_idc_version` in the main `index` table. Column names `min_idc_version`/`max_idc_version` here are NOT equivalent to `series_init_idc_version`/`series_revised_idc_version` in `index`. |
 
 ### Joining Tables
 
-**Key columns are not explicitly labeled, the following is a subset that can be used in joins.**
+**`SeriesInstanceUID` is the universal join key** for all series-level specialized tables: `sm_index`, `sm_instance_index`, `seg_index`, `ann_index`, `ann_group_index`, `contrast_index`, `volume_geometry_index`, `rtstruct_index`, `ct_index`, `mr_index`, `pt_index`. Always join these to `index` on `SeriesInstanceUID`. The exceptions below use different column names.
 
 | Join Column | Tables | Use Case |
 |-------------|--------|----------|
 | `collection_id` | index, prior_versions_index, collections_index, clinical_index | Link series to collection metadata or clinical data |
-| `SeriesInstanceUID` | index, prior_versions_index, sm_index, sm_instance_index | Link series across tables; connect to slide microscopy details |
-| `StudyInstanceUID` | index, prior_versions_index | Link studies across current and historical data |
-| `PatientID` | index, prior_versions_index | Link patients across current and historical data |
 | `analysis_result_id` | index, analysis_results_index | Link series to analysis result metadata (annotations, segmentations) |
 | `source_DOI` | index, analysis_results_index | Link by publication DOI |
-| `crdc_series_uuid` | index, prior_versions_index | Link by CRDC unique identifier |
-| `Modality` | index, prior_versions_index | Filter by imaging modality |
-| `SeriesInstanceUID` | index, seg_index, ann_index, ann_group_index, contrast_index, volume_geometry_index | Link series to seg/ann/contrast/geometry index tables |
-| `segmented_SeriesInstanceUID` | seg_index → index | Link segmentation to its source image series (join seg_index.segmented_SeriesInstanceUID = index.SeriesInstanceUID) |
-| `referenced_SeriesInstanceUID` | ann_index → index | Link annotation to its source image series (join ann_index.referenced_SeriesInstanceUID = index.SeriesInstanceUID) |
-| `SeriesInstanceUID` / `referenced_SeriesInstanceUID` | index, rtstruct_index | Join RTSTRUCT series to its metadata (index.SeriesInstanceUID = rtstruct_index.SeriesInstanceUID); use rtstruct_index.referenced_SeriesInstanceUID to find the source image series |
-| `SeriesInstanceUID` | index, ct_index | Link CT series to acquisition/reconstruction parameters |
-| `SeriesInstanceUID` | index, mr_index | Link MR series to sequence/acquisition parameters |
-| `SeriesInstanceUID` | index, pt_index | Link PET series to acquisition/radiopharmaceutical parameters |
+| `segmented_SeriesInstanceUID` | seg_index → index | Link segmentation to its source image series (`seg_index.segmented_SeriesInstanceUID = index.SeriesInstanceUID`) |
+| `referenced_SeriesInstanceUID` | ann_index → index, rtstruct_index → index | Link annotation or RTSTRUCT to its source image series |
 
 **Note:** `subjects`, `updated`, and `description` appear in multiple tables but have different meanings (counts vs identifiers, different update contexts).
 
+**Note on `prior_versions_index`:** Joining `prior_versions_index` with `index` on `SeriesInstanceUID` always returns zero rows — there is no overlap. This table is for historical reproducibility only; never join it with `index` to answer questions about current data or version history.
+
 For detailed join examples, schema discovery patterns, key columns reference, and DataFrame access, see `references/index_tables_guide.md`.
 
 ### Clinical Data Access
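The join rules in this hunk can be checked on toy data without network access. The sketch below uses Python's built-in `sqlite3` with a few invented rows standing in for `index`, `ct_index`, `seg_index`, and `prior_versions_index`; the UIDs and the column subsets (including `SliceThickness`) are hypothetical, and the real tables are queried through `client.sql_query()` against the local DuckDB index. It exercises the universal `SeriesInstanceUID` join, the `segmented_SeriesInstanceUID` exception, and the empty result of joining `prior_versions_index` to `index`.

```python
import sqlite3

# Hypothetical, minimal stand-ins for idc-index tables (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "index" is quoted because INDEX is a reserved word in SQLite
    CREATE TABLE "index" (SeriesInstanceUID TEXT, collection_id TEXT, Modality TEXT);
    CREATE TABLE ct_index (SeriesInstanceUID TEXT, SliceThickness REAL);
    CREATE TABLE seg_index (SeriesInstanceUID TEXT, segmented_SeriesInstanceUID TEXT);
    CREATE TABLE prior_versions_index (SeriesInstanceUID TEXT, max_idc_version INTEGER);

    INSERT INTO "index" VALUES
        ('1.2.ct', 'demo_collection', 'CT'),
        ('1.2.seg', 'demo_collection', 'SEG');
    INSERT INTO ct_index VALUES ('1.2.ct', 1.25);
    INSERT INTO seg_index VALUES ('1.2.seg', '1.2.ct');
    -- A removed series: present only in prior_versions_index
    INSERT INTO prior_versions_index VALUES ('1.2.old', 20);
""")

# Universal key: a specialized series-level table joins on SeriesInstanceUID.
ct_rows = conn.execute("""
    SELECT i.collection_id, c.SliceThickness
    FROM "index" i JOIN ct_index c ON i.SeriesInstanceUID = c.SeriesInstanceUID
""").fetchall()

# Exception: seg_index points at its source image via segmented_SeriesInstanceUID.
seg_sources = conn.execute("""
    SELECT s.SeriesInstanceUID, i.Modality
    FROM seg_index s
    JOIN "index" i ON s.segmented_SeriesInstanceUID = i.SeriesInstanceUID
""").fetchall()

# prior_versions_index shares no series with index, so this join is empty.
overlap = conn.execute("""
    SELECT COUNT(*) FROM prior_versions_index p
    JOIN "index" i ON p.SeriesInstanceUID = i.SeriesInstanceUID
""").fetchone()[0]

print(ct_rows)      # [('demo_collection', 1.25)]
print(seg_sources)  # [('1.2.seg', 'CT')]
print(overlap)      # 0
```

The quoting of `"index"` is a SQLite requirement only; the DuckDB queries in the skill use the bare table name.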
@@ -252,21 +243,6 @@ All idc-index metadata tables are published as Parquet files to a public GCS buc
 
 See `references/parquet_access_guide.md` for URL patterns, available files, and DuckDB query examples.
 
-## Installation and Setup
-
-**Required (for basic access):**
-```bash
-pip install --upgrade idc-index
-```
-
-**Important:** New IDC data release will always trigger a new version of `idc-index`. Always use `--upgrade` flag while installing, unless an older version is needed for reproducibility.
-
-**Optional (for data analysis):**
-```bash
-# Tested with: pandas>=1.5, numpy>=1.23, pydicom>=2.3
-pip install pandas numpy pydicom
-```
-
 ## Core Capabilities
 
 ### 1. Data Discovery and Exploration
@@ -386,6 +362,55 @@ results = client.sql_query("""
 
 **Note:** Cancer type is in `collections_index.cancer_types`, not in the primary `index` table.
 
+**Version tracking — "what's new in IDC vX?"**
+
+Use `series_init_idc_version` and `series_revised_idc_version` in the main `index` table. Do NOT use `prior_versions_index` for this — it contains only removed series.
+
+```python
+from idc_index import IDCClient
+client = IDCClient()
+
+VERSION = 24 # Replace with target version
+
+# Series added for the first time in vVERSION
+new_series = client.sql_query(f"""
+    SELECT collection_id,
+           COUNT(DISTINCT SeriesInstanceUID) as new_series,
+           ROUND(SUM(series_size_MB)/1000, 2) as size_GB
+    FROM index
+    WHERE series_init_idc_version = {VERSION}
+    GROUP BY collection_id
+    ORDER BY new_series DESC
+""")
+
+# Series revised (updated content) in vVERSION but originally added earlier
+revised_series = client.sql_query(f"""
+    SELECT collection_id,
+           COUNT(DISTINCT SeriesInstanceUID) as revised_series
+    FROM index
+    WHERE series_revised_idc_version = {VERSION}
+      AND series_init_idc_version < {VERSION}
+    GROUP BY collection_id
+    ORDER BY revised_series DESC
+""")
+
+# When was each collection first added to IDC?
+client.fetch_index("version_metadata_index")
+first_appearance = client.sql_query("""
+    WITH first_versions AS (
+        SELECT collection_id, MIN(series_init_idc_version) as first_version
+        FROM index
+        GROUP BY collection_id
+    )
+    SELECT f.collection_id, f.first_version, v.version_timestamp as first_release_date
+    FROM first_versions f
+    JOIN version_metadata_index v ON f.first_version = v.idc_version
+    ORDER BY f.first_version DESC
+""")
+```
+
+To verify column names and descriptions before writing queries, use `client.get_index_schema('index')` or `client.indices_overview` — see Best Practices.
+
 ### 3. Downloading DICOM Files
 
 Download imaging data efficiently from IDC's cloud storage.
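The two "what's new" filters can be sanity-checked on hypothetical rows. This sketch uses `sqlite3` as a stand-in for the DuckDB-backed `client.sql_query()` and assumes, as the toy rows do, that a series first added in a release carries that release number in both `series_init_idc_version` and `series_revised_idc_version`. Under that assumption, the extra `series_init_idc_version < VERSION` condition is what keeps newly added series out of the revised count. Values are bound with `?` placeholders instead of an f-string, and ties are ordered by `collection_id` so the output is deterministic.

```python
import sqlite3

VERSION = 24  # target IDC release, as in the queries above

conn = sqlite3.connect(":memory:")
# "index" is quoted because INDEX is a reserved word in SQLite.
conn.execute(
    'CREATE TABLE "index" (collection_id TEXT, SeriesInstanceUID TEXT, '
    "series_init_idc_version INTEGER, series_revised_idc_version INTEGER)"
)
# Hypothetical rows, not real IDC data.
conn.executemany(
    'INSERT INTO "index" VALUES (?, ?, ?, ?)',
    [
        ("coll_a", "s1", 24, 24),  # first added in v24
        ("coll_a", "s2", 24, 24),  # first added in v24
        ("coll_a", "s3", 18, 24),  # added in v18, revised in v24
        ("coll_b", "s4", 10, 10),  # unchanged since v10
        ("coll_b", "s5", 12, 24),  # added in v12, revised in v24
    ],
)

new_series = conn.execute(
    """
    SELECT collection_id, COUNT(DISTINCT SeriesInstanceUID) AS new_series
    FROM "index"
    WHERE series_init_idc_version = ?
    GROUP BY collection_id
    ORDER BY new_series DESC, collection_id
    """,
    (VERSION,),
).fetchall()

revised_series = conn.execute(
    """
    SELECT collection_id, COUNT(DISTINCT SeriesInstanceUID) AS revised_series
    FROM "index"
    WHERE series_revised_idc_version = ?
      AND series_init_idc_version < ?
    GROUP BY collection_id
    ORDER BY revised_series DESC, collection_id
    """,
    (VERSION, VERSION),
).fetchall()

print(new_series)      # [('coll_a', 2)]
print(revised_series)  # [('coll_a', 1), ('coll_b', 1)]
```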
@@ -687,6 +712,7 @@ See `references/bigquery_guide.md` for schemas, column descriptions, and query e
 
 ## Best Practices
 
+- **Check schema before writing queries** — Use `client.get_index_schema('index')` (reads cached metadata, no SQL executed) or `client.indices_overview` to see all available columns and their descriptions. The version-tracking columns `series_init_idc_version` and `series_revised_idc_version` in the main `index` table directly answer "what's new / when was this added" questions without touching `prior_versions_index`.
 - **Never use web search for IDC data content questions** - Always query the idc-index directly using `client.sql_query()`. Web sources (release notes, blog posts, documentation pages) are frequently out of date and will produce incorrect answers. The local DuckDB index is the authoritative source; use it even when web search is available.
 - **Verify IDC version before generating responses** - Always call `client.get_idc_version()` at the start of a session to confirm you're using the expected data version (currently v24). If using an older version, recommend `pip install --upgrade idc-index`
 - **Check licenses before use** - Always query the `license_short_name` field and respect licensing terms (CC BY vs CC BY-NC)
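The "check schema before writing queries" practice amounts to confirming that every column a query names actually exists. A minimal sketch of that idea follows, with SQLite's `PRAGMA table_info` standing in for `client.get_index_schema()` (whose return format is not shown in this commit, so it is not used here). `missing_columns` is a hypothetical helper, and the toy table holds only the columns needed for the example. Checking the `prior_versions_index` column names against it flags them as absent, consistent with the note that `min_idc_version`/`max_idc_version` are not the main index's version columns.

```python
import sqlite3


def missing_columns(conn, table, required):
    """Hypothetical helper: return the names in `required` that `table` lacks, sorted.

    The table name is interpolated directly, so only pass trusted names.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    present = {row[1] for row in conn.execute(f"PRAGMA table_info('{table}')")}
    return sorted(set(required) - present)


conn = sqlite3.connect(":memory:")
# Toy stand-in for the main index with only the version-tracking columns.
conn.execute(
    'CREATE TABLE "index" (SeriesInstanceUID TEXT, '
    "series_init_idc_version INTEGER, series_revised_idc_version INTEGER)"
)

print(missing_columns(conn, "index", ["series_init_idc_version", "series_revised_idc_version"]))  # []
print(missing_columns(conn, "index", ["min_idc_version", "max_idc_version"]))  # ['max_idc_version', 'min_idc_version']
```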
@@ -701,7 +727,7 @@ See `references/bigquery_guide.md` for schemas, column descriptions, and query e
 
 **Issue: `ModuleNotFoundError: No module named 'idc_index'`**
 - **Cause:** idc-index package not installed
-- **Solution:** Install with `pip install --upgrade idc-index`
+- **Solution:** Install with `pip install --upgrade idc-index`; for data analysis also install `pip install pandas numpy pydicom` (tested with pandas>=1.5, numpy>=1.23, pydicom>=2.3)
 
 **Issue: Download fails with connection timeout**
 - **Cause:** Network instability or large download size

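The folded `ModuleNotFoundError` entry can also be turned into an up-front check. This minimal sketch uses only `importlib.util.find_spec` to report which of the modules named in that entry cannot be imported, and prints a matching `pip install` line. The mapping from import names to pip names is written out by hand, and the tested minimum versions (pandas>=1.5, numpy>=1.23, pydicom>=2.3) are not checked.

```python
import importlib.util

# Import name -> pip distribution name for the packages in the troubleshooting entry.
PIP_NAMES = {
    "idc_index": "idc-index",
    "pandas": "pandas",
    "numpy": "numpy",
    "pydicom": "pydicom",
}


def missing_modules(names):
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(PIP_NAMES)
if missing:
    print("pip install --upgrade " + " ".join(PIP_NAMES[name] for name in missing))
else:
    print("idc-index and optional analysis packages are importable")
```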