ImagingDataCommons
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 21 additions & 0 deletions
diff --git a/SKILL.md b/SKILL.md
Lines changed: 87 additions & 74 deletions
@@ -5,6 +5,27 @@ All notable changes to the IDC Claude Skill are documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/),
 and this project adheres to [Semantic Versioning](https://semver.org/).
 
+## [1.6.3] - 2026-05-09
+
+### Added
+
+- `ct_index`, `mr_index`, `pt_index` tables (idc-index 0.12.3 / idc-index-data 24.2.0): modality-specific acquisition and reconstruction parameter indices, one row per series, all joining on `SeriesInstanceUID`
+  - `ct_index` (21 columns): pixel spacing, slice thickness, kVp, convolution kernel, tube current min/max (dose-modulated), exposure, spiral pitch, scan options
+  - `mr_index` (22 columns): field strength, scanning sequence, TE (array for multi-echo), TR, flip angle, DiffusionBValue (array for DWI), pixel bandwidth, receive coil, number of temporal positions
+  - `pt_index` (21 columns): radionuclide, injected dose, reconstruction method, decay/scatter/attenuation correction, frame duration (array for dynamic PET), number of time slices
+- SQL query patterns for all three new tables in `references/sql_patterns.md`
+- Join column entries for `ct_index`, `mr_index`, `pt_index` in `references/index_tables_guide.md` and SKILL.md
+- Parquet file entries for `ct_index.parquet`, `mr_index.parquet`, `pt_index.parquet` in `references/parquet_access_guide.md`
+
+### Changed
+
+- Added a concrete `indices_overview` code example showing how to search for a column across all tables and read column schemas without fetching the table; this directly addresses the failure mode where agents query `index` for modality-specific parameters (SliceThickness, KVP, etc.) instead of using `ct_index`/`mr_index`/`pt_index`
+- Added troubleshooting entry "Column not found in `index` table" with a working `indices_overview` search snippet and join example, covering common acquisition/reconstruction parameters that live in the modality-specific index tables
+- Updated idc-index reference to 0.12.3
+- Clarified the `download_from_selection` API: added an explicit warning that it takes filter keyword arguments (not a DataFrame), added a comparison table against `download_dicom_series` (whose arguments come in a different order), and restructured the download example as a step-by-step flow: query → extract UIDs → pass list
+- Documented `download_dicom_series` as an alternative download method with its own signature (`seriesInstanceUID` as first arg, then `downloadDir`)
+- Reduced redundancy and duplication in SKILL.md for cleaner reading
+
 ## [1.6.2] - 2026-05-08
 
 ### Changed
 
@@ -3,9 +3,9 @@ name: imaging-data-commons
 description: Query and download public cancer imaging data from NCI Imaging Data Commons using idc-index. Invoke for any question about IDC collections, cancer imaging datasets, DICOM data access, radiology (CT, MR, PET) or pathology AI training sets, metadata queries, visualization, or license checks — even when the user doesn't explicitly mention "IDC". No authentication required.
 license: This skill is provided under the MIT License. IDC data itself has individual licensing (mostly CC-BY, some CC-NC) that must be respected when using the data.
 metadata:
-    version: 1.6.2
+    version: 1.6.3
     skill-author: Andrey Fedorov, @fedorov
-    idc-index: "0.12.2"
+    idc-index: "0.12.3"
     idc-data-version: "v24"
     repository: https://github.com/ImagingDataCommons/idc-claude-skill
 ---
@@ -82,7 +82,7 @@ print(stats)
 - IDC Data Model - Collection and analysis result hierarchy
 - Index Tables - Available tables and joining patterns
 - Installation - Package setup and version verification
-- Core Capabilities - Essential API patterns (query, download, visualize, license, citations, batch)
+- Core Capabilities - Essential API patterns (query, download, visualize, license, citations)
 - Best Practices - Usage guidelines
 - Troubleshooting - Common issues and solutions
 
@@ -91,7 +91,7 @@ print(stats)
 | Guide | When to Load |
 |-------|--------------|
 | `index_tables_guide.md` | Complex JOINs, schema discovery, DataFrame access |
-| `use_cases.md` | End-to-end workflow examples (training datasets, batch downloads) |
+| `use_cases.md` | End-to-end workflows: training datasets, batch downloads, DICOM reading with pydicom/SimpleITK, pipeline integration |
 | `sql_patterns.md` | Quick SQL patterns for filter discovery, annotations, size estimation |
 | `clinical_data_guide.md` | Clinical/tabular data, imaging+clinical joins, value mapping |
 | `cloud_storage_guide.md` | Direct S3/GCS access, versioning, UUID mapping |
@@ -126,6 +126,25 @@ The `idc-index` package provides multiple metadata index tables, accessible via
 
 **Important:** Use `client.indices_overview` to get current table descriptions and column schemas. This is the authoritative source for available columns and their types — always query it when writing SQL or exploring data structure.
 
+```python
+from idc_index import IDCClient
+
+client = IDCClient()
+
+# Find which table(s) contain a specific column (no fetch required)
+target = "SliceThickness"
+for table_name, info in client.indices_overview.items():
+    if any(c["name"] == target for c in info["schema"]["columns"]):
+        print(f"'{target}' is in: {table_name}")
+# → 'SliceThickness' is in: ct_index
+
+# List all columns in a table from the schema (no fetch required)
+ct_cols = [c["name"] for c in client.indices_overview["ct_index"]["schema"]["columns"]]
+print("ct_index columns:", ct_cols)
+# → ['SeriesInstanceUID', 'PixelSpacing_row_mm', 'PixelSpacing_col_mm', 'Rows',
+#    'Columns', 'SliceThickness', 'KVP', 'ConvolutionKernel', ...]
+```
+
 ### Available Tables
 
 Always call `client.fetch_index("table_name")` before querying any index table — it is safe and idempotent for all tables, including those loaded automatically at startup.
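
The `indices_overview` lookup shown earlier can be wrapped in a small reusable helper. The sketch below is illustrative only: `find_tables_with_column` and `mock_overview` are hypothetical names, and the mock dictionary merely mimics the `info["schema"]["columns"]` shape used above so that the example runs without installing `idc-index` or touching the network. With a real client, one would presumably pass `client.indices_overview` instead.

```python
from typing import Any


def find_tables_with_column(overview: dict[str, Any], column: str) -> list[str]:
    """Return the names of tables whose schema lists `column` (case-insensitive)."""
    wanted = column.lower()
    matches = []
    for table_name, info in overview.items():
        # Tolerate entries that carry no schema information.
        columns = info.get("schema", {}).get("columns", [])
        if any(c.get("name", "").lower() == wanted for c in columns):
            matches.append(table_name)
    return matches


# Hypothetical stand-in for client.indices_overview, trimmed to a few columns.
mock_overview = {
    "index": {"schema": {"columns": [
        {"name": "SeriesInstanceUID"}, {"name": "Modality"}]}},
    "ct_index": {"schema": {"columns": [
        {"name": "SeriesInstanceUID"}, {"name": "SliceThickness"}, {"name": "KVP"}]}},
    "mr_index": {"schema": {"columns": [
        {"name": "SeriesInstanceUID"}, {"name": "EchoTime"}]}},
}

print(find_tables_with_column(mock_overview, "slicethickness"))
print(find_tables_with_column(mock_overview, "SeriesInstanceUID"))
```

The case-insensitive match is a design choice of this sketch, meant to catch spellings such as `kvp` versus `KVP`; an empty result suggests the parameter is not indexed in any table.
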
@@ -145,6 +164,9 @@ Always call `client.fetch_index("table_name")` before querying any index table
 | `contrast_index` | 1 row = 1 series with contrast info | Contrast agent metadata: agent name, ingredient, administration route (CT, MR, PT, XA, RF) |
 | `volume_geometry_index` | 1 row = 1 CT/MR/PT series | 3D volume geometry validation for single-frame CT, MR, and PT series; boolean checks for orientation, spacing, dimensions, and slice positions; composite `regularly_spaced_3d_volume` flag |
 | `rtstruct_index` | 1 row = 1 RTSTRUCT series | RT Structure Set metadata: total ROI count, ROI names, generation algorithms, interpreted types, and the referenced image series UID |
+| `ct_index` | 1 row = 1 CT series | CT acquisition/reconstruction parameters: pixel spacing, slice thickness, kVp, convolution kernel, tube current (min/max for dose-modulated), exposure, spiral pitch, scan options |
+| `mr_index` | 1 row = 1 MR series | MR acquisition/sequence parameters: field strength, scanning sequence, TE (array for multi-echo), TR, flip angle, DiffusionBValue (array for DWI), pixel bandwidth, receive coil, number of temporal positions |
+| `pt_index` | 1 row = 1 PET series | PET acquisition/reconstruction/radiopharmaceutical parameters: series type, units, decay/scatter/attenuation correction, reconstruction method, radionuclide, injected dose, frame duration (array for dynamic PET) |
 | `prior_versions_index` | 1 row = 1 DICOM series | Series that have been removed or superseded in previous IDC releases; use only to download deprecated/historical data — do not query for current data |
 
 ### Joining Tables
@@ -161,11 +183,13 @@ Always call `client.fetch_index("table_name")` before querying any index table
 | `source_DOI` | index, analysis_results_index | Link by publication DOI |
 | `crdc_series_uuid` | index, prior_versions_index | Link by CRDC unique identifier |
 | `Modality` | index, prior_versions_index | Filter by imaging modality |
-| `SeriesInstanceUID` | index, seg_index, ann_index, ann_group_index, contrast_index | Link segmentation/annotation/contrast series to its index metadata |
+| `SeriesInstanceUID` | index, seg_index, ann_index, ann_group_index, contrast_index, volume_geometry_index | Link series to seg/ann/contrast/geometry index tables |
 | `segmented_SeriesInstanceUID` | seg_index → index | Link segmentation to its source image series (join seg_index.segmented_SeriesInstanceUID = index.SeriesInstanceUID) |
 | `referenced_SeriesInstanceUID` | ann_index → index | Link annotation to its source image series (join ann_index.referenced_SeriesInstanceUID = index.SeriesInstanceUID) |
-| `SeriesInstanceUID` | index, volume_geometry_index | Link series to its 3D geometry validation result (join index.SeriesInstanceUID = volume_geometry_index.SeriesInstanceUID) |
 | `SeriesInstanceUID` / `referenced_SeriesInstanceUID` | index, rtstruct_index | Join RTSTRUCT series to its metadata (index.SeriesInstanceUID = rtstruct_index.SeriesInstanceUID); use rtstruct_index.referenced_SeriesInstanceUID to find the source image series |
+| `SeriesInstanceUID` | index, ct_index | Link CT series to acquisition/reconstruction parameters |
+| `SeriesInstanceUID` | index, mr_index | Link MR series to sequence/acquisition parameters |
+| `SeriesInstanceUID` | index, pt_index | Link PET series to acquisition/radiopharmaceutical parameters |
 
 **Note:** `subjects`, `updated`, and `description` appear in multiple tables but have different meanings (counts vs identifiers, different update contexts).
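
The `SeriesInstanceUID` join between `index` and a modality-specific table can be tried out offline. This is a minimal sketch under stated assumptions: it uses Python's standard-library `sqlite3` as a stand-in for the DuckDB engine behind `client.sql_query()`, the table contents and the `demo_collection` name are invented, and only a handful of columns are reproduced. Note that SQLite requires `"index"` to be quoted because it is a keyword, whereas the queries in this skill reference `index` unquoted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy tables with made-up rows; the real indices have many more columns.
conn.executescript("""
    CREATE TABLE "index" (
        SeriesInstanceUID TEXT PRIMARY KEY,
        collection_id TEXT,
        Modality TEXT
    );
    CREATE TABLE ct_index (
        SeriesInstanceUID TEXT PRIMARY KEY,
        SliceThickness REAL,
        KVP REAL
    );
    INSERT INTO "index" VALUES
        ('1.2.3.1', 'demo_collection', 'CT'),
        ('1.2.3.2', 'demo_collection', 'CT'),
        ('1.2.3.3', 'demo_collection', 'MR');
    INSERT INTO ct_index VALUES
        ('1.2.3.1', 1.25, 120.0),
        ('1.2.3.2', 5.0, 100.0);
""")

# Inner join: series without a ct_index row (the MR series) drop out.
rows = conn.execute("""
    SELECT i.SeriesInstanceUID, i.Modality, c.SliceThickness, c.KVP
    FROM "index" AS i
    JOIN ct_index AS c USING (SeriesInstanceUID)
    WHERE i.collection_id = 'demo_collection'
      AND c.SliceThickness <= 2.0
    ORDER BY i.SeriesInstanceUID
""").fetchall()

print(rows)
```

Because the join is an inner join, filtering on `Modality` is not strictly needed here: only series that have a row in `ct_index` survive. A `LEFT JOIN` would keep the other series with NULL acquisition parameters.
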
 
@@ -237,14 +261,6 @@ pip install --upgrade idc-index
 
 **Important:** A new IDC data release always triggers a new version of `idc-index`. Always use the `--upgrade` flag when installing, unless an older version is needed for reproducibility.
 
-**IMPORTANT:** IDC data version v24 is current. Always verify your version:
-```python
-print(client.get_idc_version())  # Should return "v24"
-```
-If you see an older version, upgrade with: `pip install --upgrade idc-index`
-
-**Tested with:** idc-index 0.12.2 (IDC data version v24)
-
 **Optional (for data analysis):**
 ```bash
 # Tested with: pandas>=1.5, numpy>=1.23, pydicom>=2.3
@@ -372,7 +388,16 @@ results = client.sql_query("""
 
 ### 3. Downloading DICOM Files
 
-Download imaging data efficiently from IDC's cloud storage:
+Download imaging data efficiently from IDC's cloud storage.
+
+**IMPORTANT — two download methods with different signatures:**
+
+| Method | First arg | Second arg | Use when |
+|--------|-----------|------------|----------|
+| `download_from_selection` | `downloadDir` (required) | filter kwargs (optional) | Filtering by collection, patient, study, or series |
+| `download_dicom_series` | `seriesInstanceUID` (required) | `downloadDir` (required) | Downloading specific series by UID only |
+
+**`download_from_selection` takes filter keyword arguments, NOT a DataFrame.** The name "from_selection" refers to filtering the IDC index by criteria — not accepting a pandas DataFrame. To download the results of a query, extract UIDs from the DataFrame and pass them as a list.
 
 **Download entire collection:**
 ```python
@@ -381,15 +406,16 @@ from idc_index import IDCClient
 client = IDCClient()
 
 # Download small collection (RIDER Pilot ~1GB)
+# downloadDir is the FIRST positional argument
 client.download_from_selection(
-    collection_id="rider_pilot",
-    downloadDir="./data/rider"
+    downloadDir="./data/rider",
+    collection_id="rider_pilot"
 )
 ```
 
-**Download specific series:**
+**Download specific series (from a query result):**
 ```python
-# First, query for series UIDs
+# Step 1: Query for series UIDs
 series_df = client.sql_query("""
     SELECT SeriesInstanceUID
     FROM index
@@ -399,11 +425,27 @@ series_df = client.sql_query("""
     LIMIT 5
 """)
 
-# Download only those series
+# Step 2: Extract UIDs as a list from the DataFrame
+uids = list(series_df['SeriesInstanceUID'].values)
+
+# Step 3: Pass the list to download_from_selection (NOT the DataFrame itself)
 client.download_from_selection(
-    seriesInstanceUID=list(series_df['SeriesInstanceUID'].values),
+    downloadDir="./data/lung_ct",
+    seriesInstanceUID=uids       # list of strings, not a DataFrame
+)
+
+# Alternative: download_dicom_series has seriesInstanceUID as FIRST arg (different order!)
+client.download_dicom_series(
+    seriesInstanceUID=uids,      # FIRST arg here
     downloadDir="./data/lung_ct"
 )
+
+# Download from Google Storage instead of AWS
+client.download_from_selection(
+    downloadDir="./data/lung_ct",
+    seriesInstanceUID=uids,
+    source_bucket_location="gcs"
+)
 ```
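
One way to guard against passing a DataFrame by mistake is to normalize the selection before the download call. The helper below is a hypothetical sketch, not part of `idc-index`: `normalize_series_uids` is an invented name, and the check for a `.columns` attribute is only a duck-typing heuristic for DataFrame-like objects. It uses the standard library alone, so it makes no assumption about pandas being installed.

```python
from collections.abc import Iterable


def normalize_series_uids(values) -> list[str]:
    """Hypothetical helper: return a de-duplicated list of UID strings."""
    if isinstance(values, str):
        return [values]
    # Heuristic: DataFrame-like objects expose `.columns`; iterating them
    # would yield column names rather than UIDs, so reject them early.
    if hasattr(values, "columns"):
        raise TypeError(
            "expected UIDs, got a DataFrame-like object; "
            "pass a single column such as df['SeriesInstanceUID']"
        )
    if not isinstance(values, Iterable):
        raise TypeError(f"expected an iterable of strings, got {type(values).__name__}")

    uids: list[str] = []
    seen: set[str] = set()
    for value in values:
        if not isinstance(value, str):
            raise TypeError(f"UID must be a string, got {type(value).__name__}")
        uid = str(value)  # converts str subclasses (e.g. numpy.str_) to plain str
        if uid not in seen:
            seen.add(uid)
            uids.append(uid)
    return uids


print(normalize_series_uids(["1.2.3", "1.2.4", "1.2.3"]))
```

The resulting list could then be passed as `seriesInstanceUID=` in the download calls shown above; the order of first appearance is preserved so that a saved manifest and the download request stay aligned.
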
 
 **Custom directory structure:**
@@ -413,16 +455,16 @@ Default `dirTemplate`: `%collection_id/%PatientID/%StudyInstanceUID/%Modality_%S
 ```python
 # Simplified hierarchy (omit StudyInstanceUID level)
 client.download_from_selection(
-    collection_id="tcga_luad",
     downloadDir="./data",
+    collection_id="tcga_luad",
     dirTemplate="%collection_id/%PatientID/%Modality"
 )
 # Results in: ./data/tcga_luad/TCGA-05-4244/CT/
 
 # Flat structure (all files in one directory)
 client.download_from_selection(
-    seriesInstanceUID=list(series_df['SeriesInstanceUID'].values),
     downloadDir="./data/flat",
+    seriesInstanceUID=list(series_df['SeriesInstanceUID'].values),
     dirTemplate=""
 )
 # Results in: ./data/flat/*.dcm
@@ -606,13 +648,7 @@ bibtex_citations = client.citations_from_selection(
 
 **Best practice:** When publishing results using IDC data, include the generated citations to properly attribute the data sources and satisfy license requirements.
 
-### 6. Batch Processing and Filtering
-
-For large downloads, query first to build a manifest, save it to CSV for reproducibility, then iterate over slices of the result DataFrame with `download_from_selection()` using a `batch_size` of 10–20 series to avoid timeouts.
-
-See `references/use_cases.md` (Use Case 5) for a complete worked example with manufacturer filtering, manifest saving, and batched downloads.
-
-### 7. Advanced Queries with BigQuery
+### 6. Advanced Queries with BigQuery
 
 For queries requiring full DICOM metadata, complex JOINs, clinical data tables, or private DICOM elements, use Google BigQuery. Requires GCP account with billing enabled.
 
@@ -638,7 +674,7 @@ Common specialized indices: `seg_index` (segmentations), `ann_index` / `ann_grou
 
 See `references/bigquery_guide.md` for schemas, column descriptions, and query examples for these tables.
 
-### 8. Tool Selection Guide
+### 7. Tool Selection Guide
 
 | Task | Tool | Reference |
 |------|------|-----------|
@@ -649,20 +685,6 @@ See `references/bigquery_guide.md` for schemas, column descriptions, and query e
 
 **Default choice:** Use `idc-index` for most tasks (no auth, easy API, batch downloads).
 
-### 9. Integration with Analysis Pipelines
-
-After downloading DICOM files, use `pydicom` to read individual files or build 3D numpy arrays sorted by `ImagePositionPatient`. For a more robust reader with automatic series sorting and ITK image output, use `SimpleITK.ImageSeriesReader`.
-
-See `references/use_cases.md` (Use Case 6) for code examples reading DICOM with pydicom, building 3D CT volumes, and integrating with SimpleITK.
-
-## Common Use Cases
-
-See `references/use_cases.md` for complete end-to-end workflow examples including:
-- Building deep learning training datasets from lung CT scans
-- Comparing image quality across scanner manufacturers
-- Previewing data in browser before downloading
-- License-aware batch downloads for commercial use
-
 ## Best Practices
 
 - **Never use web search for IDC data content questions** - Always query the idc-index directly using `client.sql_query()`. Web sources (release notes, blog posts, documentation pages) are frequently out of date and will produce incorrect answers. The local DuckDB index is the authoritative source; use it even when web search is available.
@@ -700,6 +722,25 @@ See `references/use_cases.md` for complete end-to-end workflow examples includin
   - Use `LIMIT 5` to test query first
   - Check field names against metadata schema documentation
 
+**Issue: Column not found in `index` table (e.g., `SliceThickness`, `PixelSpacing`, `KVP`, `EchoTime`, `InjectedDose`)**
+- **Cause:** The `index` table contains only general, modality-independent series-level metadata; modality-specific acquisition and reconstruction parameters live in dedicated tables (`ct_index`, `mr_index`, `pt_index`)
+- **Solution:** Search `client.indices_overview` to find the right table, then fetch and join on `SeriesInstanceUID`:
+  ```python
+  target = "SliceThickness"
+  for table_name, info in client.indices_overview.items():
+      if any(c["name"] == target for c in info["schema"]["columns"]):
+          print(f"Found in: {table_name}")
+  # → Found in: ct_index
+
+  client.fetch_index("ct_index")
+  result = client.sql_query("""
+      SELECT i.SeriesInstanceUID, i.Modality, c.SliceThickness, c.KVP, c.PixelSpacing_row_mm
+      FROM index i
+      JOIN ct_index c USING (SeriesInstanceUID)
+      WHERE i.collection_id = 'your_collection'
+  """)
+  ```
+
 **Issue: Downloaded DICOM files won't open**
 - **Cause:** Corrupted download or incompatible viewer
 - **Solution:**
@@ -718,38 +759,10 @@ See `references/sql_patterns.md` for quick-reference SQL patterns including:
 - Download size estimation
 - Clinical data linking
 
-For segmentation and annotation details, also see `references/digital_pathology_guide.md`.
-
-## Related Skills
-
-The following skills complement IDC workflows for downstream analysis and visualization:
-
-### DICOM Processing
-- **pydicom** - Read, write, and manipulate downloaded DICOM files. Use for extracting pixel data, reading metadata, anonymization, and format conversion. Essential for working with IDC radiology data (CT, MR, PET).
-
-### Pathology and Slide Microscopy
-See `references/digital_pathology_guide.md` for DICOM-compatible tools (highdicom, wsidicom, TIA-Toolbox, Slim viewer).
-
-### Metadata Visualization
-- **matplotlib** - Low-level plotting for full customization. Use for creating static figures summarizing IDC query results (bar charts of modalities, histograms of series counts, etc.).
-- **seaborn** - Statistical visualization with pandas integration. Use for quick exploration of IDC metadata distributions, relationships between variables, and categorical comparisons with attractive defaults.
-- **plotly** - Interactive visualization. Use when you need hover info, zoom, and pan for exploring IDC metadata, or for creating web-embeddable dashboards of collection statistics.
-
-### Data Exploration
-- **exploratory-data-analysis** - Comprehensive EDA on scientific data files. Use after downloading IDC data to understand file structure, quality, and characteristics before analysis.
+For digital pathology topics, see `references/digital_pathology_guide.md`.
 
 ## Resources
 
-### Schema Reference (Primary Source)
-
-**Always use `client.indices_overview` for current column schemas.** This ensures accuracy with the installed idc-index version:
-
-```python
-# Get all column names and types for any table
-schema = client.indices_overview["index"]["schema"]
-columns = [(c['name'], c['type'], c.get('description', '')) for c in schema['columns']]
-```
-
 ### Reference Documentation
 
 See the Quick Navigation section at the top for the full list of reference guides with decision triggers.