Commit 8300a62

Merge pull request #11 from ImagingDataCommons/update-iid-and-more
Update idc-index version and add functionality
2 parents 4a51999 + 9e7a55c commit 8300a62

12 files changed

Lines changed: 1346 additions & 15 deletions

.github/dependabot.yml

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+version: 2
+updates:
+  - package-ecosystem: "github-actions"
+    directory: ".github/workflows"
+    schedule:
+      interval: "daily"

.github/workflows/check-links.yml

Lines changed: 1 addition & 0 deletions
@@ -29,3 +29,4 @@ jobs:
 --config .lychee.toml
 '**/*.md'
 fail: true
+debug: true

.github/workflows/test-snippets.yml

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+name: Test Code Snippets
+
+on:
+  push:
+    branches:
+      - main
+    paths:
+      - 'SKILL.md'
+      - 'references/**'
+      - 'tests/**'
+      - '.github/workflows/test-snippets.yml'
+  pull_request:
+    paths:
+      - 'SKILL.md'
+      - 'references/**'
+      - 'tests/**'
+      - '.github/workflows/test-snippets.yml'
+  workflow_dispatch:
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    timeout-minutes: 30
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+
+      - name: Cache idc-index data
+        uses: actions/cache@v4
+        with:
+          path: ~/.idc
+          key: idc-index-0.11.14
+
+      - name: Install test dependencies
+        run: pip install -r tests/requirements-test.txt
+
+      - name: Run snippet tests
+        run: pytest tests/test_snippets.py -v --timeout=300
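
The workflow above runs `tests/test_snippets.py`, whose contents are not shown on this page. Purely as an illustration, one way such a test could collect Python snippets from a Markdown file is sketched below. The helper names (`extract_python_snippets`, `snippet_compiles`) and the sample text are hypothetical, and the check is limited to syntax, so this makes no claim about how the repository's actual tests work.

```python
import re

# Built programmatically so this example can itself live inside a fenced block.
FENCE = "`" * 3

# Matches a fenced block tagged "python" and captures its body (lazy, multi-line).
SNIPPET_RE = re.compile(
    rf"^{FENCE}python[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_python_snippets(markdown_text: str) -> list[str]:
    """Return the body of every python-tagged fenced block, in document order."""
    return [match.group(1) for match in SNIPPET_RE.finditer(markdown_text)]


def snippet_compiles(source: str) -> bool:
    """Syntax check only: nothing is executed and nothing is downloaded."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


# Hypothetical Markdown input: two python blocks (one broken) and one bash block.
SAMPLE = "\n".join([
    "Intro text.",
    "",
    FENCE + "python",
    "x = 1 + 1",
    "print(x)",
    FENCE,
    "",
    FENCE + "bash",
    "pip install idc-index",
    FENCE,
    "",
    FENCE + "python",
    "def broken(:",
    FENCE,
    "",
])

snippets = extract_python_snippets(SAMPLE)
results = [snippet_compiles(snippet) for snippet in snippets]
print(results)
```

A real test would more likely read `SKILL.md` and the files under `references/`, and might execute the snippets rather than only compiling them, which is presumably why the workflow caches `~/.idc` and sets a timeout.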

CHANGELOG.md

Lines changed: 22 additions & 1 deletion
@@ -5,7 +5,28 @@ All notable changes to the IDC Claude Skill are documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/),
 and this project adheres to [Semantic Versioning](https://semver.org/).
 
-## [Unreleased]
+## [1.5.0] - 2026-04-08
+
+### Added
+
+- `volume_geometry_index` table documentation: 3D geometry validation for single-frame CT, MR, and PT series; boolean checks (orientation, spacing, dimensions, slice positions) and composite `regularly_spaced_3d_volume` flag; join via `SeriesInstanceUID`
+- `rtstruct_index` table documentation: RT Structure Set metadata (total ROIs, ROI names, generation algorithms, interpreted types, referenced image series UID); join via `SeriesInstanceUID`
+- New reference guide `references/parquet_access_guide.md`: direct DuckDB queries against public GCS Parquet files without installing idc-index; URL pattern, available files, and query examples for main index, `volume_geometry_index`, and `rtstruct_index`
+- SQL patterns for `volume_geometry_index` and `rtstruct_index` in `references/sql_patterns.md`
+- Detailed documentation for BigQuery-only derived tables in `references/bigquery_guide.md`:
+  - `segmentations`: per-segment anatomy with full schema, column descriptions, and queries for discovering structures, filtering by coded concept, and linking to SR measurements; note on gap vs `seg_index` in idc-index
+  - `quantitative_measurements`: radiomics and clinical numeric measurements from DICOM SR TID1500 (volume, diameter, shape descriptors, texture, intensity statistics); full schema with column descriptions and query examples
+  - `qualitative_measurements`: coded assessments from DICOM SR TID1500 (malignancy rating, calcification, texture, margin); full schema with column descriptions and query examples
+  - `measurement_groups`: parent grouping table for SR measurements
+  - Combined example joining all three derived tables for LIDC-IDRI nodule analysis (malignancy + volume + diameter)
+- SKILL.md section 7 now explicitly lists per-segment anatomy search, quantitative SR measurements, and qualitative SR measurements as BigQuery-only use cases with no idc-index equivalent
+
+### Changed
+
+- Updated to idc-index 0.11.14 (idc-index-data 23.10.1)
+- Added `SOPClassUID` and `TransferSyntaxUID` columns to Key Columns Reference in `references/index_tables_guide.md`
+- Added Direct Parquet Access entry to Data Access Options table and pointer in SKILL.md
+- Added `parquet_access_guide.md` to Quick Navigation table in SKILL.md
 
 ## [1.4.0] - 2026-03-04
 

SKILL.md

Lines changed: 22 additions & 3 deletions
@@ -5,7 +5,7 @@ license: This skill is provided under the MIT License. IDC data itself has indiv
 metadata:
   version: 1.4.0
   skill-author: Andrey Fedorov, @fedorov
-  idc-index: "0.11.10"
+  idc-index: "0.11.14"
   idc-data-version: "v23"
   repository: https://github.com/ImagingDataCommons/idc-claude-skill
 ---
@@ -25,7 +25,7 @@ Use the `idc-index` Python package to query and download public cancer imaging d
 ```python
 import idc_index
 
-REQUIRED_VERSION = "0.11.10" # Must match metadata.idc-index in this file
+REQUIRED_VERSION = "0.11.14" # Must match metadata.idc-index in this file
 installed = idc_index.__version__
 
 if installed < REQUIRED_VERSION:
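
The check in this hunk compares version strings with `<`, which orders them character by character. That happens to work for `0.11.10` against `0.11.14`, but not in general. Below is a minimal sketch of a numeric comparison, assuming purely numeric dotted versions; `version_tuple` and `is_older` are hypothetical helpers, not part of `idc-index`.

```python
def version_tuple(version: str) -> tuple[int, ...]:
    """Parse a purely numeric dotted version such as "0.11.14" into (0, 11, 14)."""
    return tuple(int(part) for part in version.split("."))


def is_older(installed: str, required: str) -> bool:
    """True when `installed` is numerically older than `required`."""
    return version_tuple(installed) < version_tuple(required)


REQUIRED_VERSION = "0.11.14"

# Lexicographic comparison: "9" sorts after "1", so 0.11.9 looks newer than 0.11.14.
string_check = "0.11.9" < REQUIRED_VERSION
# Numeric comparison: (0, 11, 9) < (0, 11, 14), so 0.11.9 is correctly seen as older.
tuple_check = is_older("0.11.9", REQUIRED_VERSION)

print(string_check)  # False
print(tuple_check)   # True
```

For versions that carry pre-release or local suffixes, `packaging.version.Version` is a more robust choice than this hand-rolled parser.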
@@ -97,6 +97,7 @@ print(stats)
 | `digital_pathology_guide.md` | Slide microscopy (SM), annotations (ANN), pathology workflows |
 | `bigquery_guide.md` | Full DICOM metadata, private elements (requires GCP) |
 | `cli_guide.md` | Command-line tools (`idc download`, manifest files) |
+| `parquet_access_guide.md` | Direct Parquet queries via GCS (no idc-index install needed) |
 
 ## IDC Data Model
 
@@ -138,6 +139,8 @@ The `idc-index` package provides multiple metadata index tables, accessible via
 | `ann_index` | 1 row = 1 DICOM ANN series | fetch_index() | Microscopy Bulk Simple Annotations series metadata; references annotated image series |
 | `ann_group_index` | 1 row = 1 annotation group | fetch_index() | Detailed annotation group metadata: graphic type, annotation count, property codes, algorithm |
 | `contrast_index` | 1 row = 1 series with contrast info | fetch_index() | Contrast agent metadata: agent name, ingredient, administration route (CT, MR, PT, XA, RF) |
+| `volume_geometry_index` | 1 row = 1 CT/MR/PT series | fetch_index() | 3D volume geometry validation for single-frame CT, MR, and PT series; boolean checks for orientation, spacing, dimensions, and slice positions; composite `regularly_spaced_3d_volume` flag |
+| `rtstruct_index` | 1 row = 1 RTSTRUCT series | fetch_index() | RT Structure Set metadata: total ROI count, ROI names, generation algorithms, interpreted types, and the referenced image series UID |
 
 **Auto** = loaded automatically when `IDCClient()` is instantiated
 **fetch_index()** = requires `client.fetch_index("table_name")` to load
@@ -159,6 +162,8 @@ The `idc-index` package provides multiple metadata index tables, accessible via
 | `SeriesInstanceUID` | index, seg_index, ann_index, ann_group_index, contrast_index | Link segmentation/annotation/contrast series to its index metadata |
 | `segmented_SeriesInstanceUID` | seg_index → index | Link segmentation to its source image series (join seg_index.segmented_SeriesInstanceUID = index.SeriesInstanceUID) |
 | `referenced_SeriesInstanceUID` | ann_index → index | Link annotation to its source image series (join ann_index.referenced_SeriesInstanceUID = index.SeriesInstanceUID) |
+| `SeriesInstanceUID` | index, volume_geometry_index | Link series to its 3D geometry validation result (join index.SeriesInstanceUID = volume_geometry_index.SeriesInstanceUID) |
+| `SeriesInstanceUID` / `referenced_SeriesInstanceUID` | index, rtstruct_index | Join RTSTRUCT series to its metadata (index.SeriesInstanceUID = rtstruct_index.SeriesInstanceUID); use rtstruct_index.referenced_SeriesInstanceUID to find the source image series |
 
 **Note:** `Subjects`, `Updated`, and `Description` appear in multiple tables but have different meanings (counts vs identifiers, different update contexts).
 
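
The two join rules added in this hunk can be tried out on toy data. The sketch below uses Python's built-in `sqlite3` with made-up UIDs rather than the real index tables. Only the join columns and the `regularly_spaced_3d_volume` flag come from the rows above; the `Modality` column and all values are assumptions for illustration, and the real tables would be queried through idc-index or DuckDB instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy stand-ins for the real tables; the UIDs and the Modality column are made up.
conn.executescript(
    """
    CREATE TABLE "index" (SeriesInstanceUID TEXT, Modality TEXT);
    CREATE TABLE volume_geometry_index (
        SeriesInstanceUID TEXT,
        regularly_spaced_3d_volume INTEGER  -- SQLite has no boolean type
    );
    CREATE TABLE rtstruct_index (
        SeriesInstanceUID TEXT,
        referenced_SeriesInstanceUID TEXT
    );
    INSERT INTO "index" VALUES
        ('1.2.3.1', 'CT'), ('1.2.3.2', 'CT'), ('1.2.3.9', 'RTSTRUCT');
    INSERT INTO volume_geometry_index VALUES ('1.2.3.1', 1), ('1.2.3.2', 0);
    INSERT INTO rtstruct_index VALUES ('1.2.3.9', '1.2.3.1');
    """
)

# Join 1: keep only series whose geometry check marks them as regular 3D volumes.
regular_volumes = conn.execute(
    """
    SELECT i.SeriesInstanceUID
    FROM "index" AS i
    JOIN volume_geometry_index AS g
      ON i.SeriesInstanceUID = g.SeriesInstanceUID
    WHERE g.regularly_spaced_3d_volume = 1
    ORDER BY i.SeriesInstanceUID
    """
).fetchall()

# Join 2: follow referenced_SeriesInstanceUID from an RTSTRUCT to its source image series.
rtstruct_sources = conn.execute(
    """
    SELECT r.SeriesInstanceUID, src.SeriesInstanceUID, src.Modality
    FROM rtstruct_index AS r
    JOIN "index" AS src
      ON r.referenced_SeriesInstanceUID = src.SeriesInstanceUID
    ORDER BY r.SeriesInstanceUID
    """
).fetchall()

print(regular_volumes)
print(rtstruct_sources)
```

The table name `index` is quoted here because it is a reserved word in SQLite; other SQL engines may treat it differently.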
@@ -184,6 +189,7 @@ See `references/clinical_data_guide.md` for detailed workflows including value m
 | Method | Auth Required | Best For |
 |--------|---------------|----------|
 | `idc-index` | No | Key queries and downloads (recommended) |
+| Direct Parquet (GCS) | No | Quick queries without installing idc-index; always uses latest data |
 | IDC Portal | No | Interactive exploration, manual selection, browser-based download |
 | BigQuery | Yes (GCP account) | Complex queries, full DICOM metadata |
 | DICOMweb proxy | No | Tool integration via DICOMweb API |
@@ -214,6 +220,12 @@ IDC data is available via DICOMweb interface (Google Cloud Healthcare API implem
 
 See `references/dicomweb_guide.md` for endpoint URLs, code examples, supported operations, and implementation details.
 
+**Direct Parquet access**
+
+All idc-index metadata tables are published as Parquet files to a public GCS bucket (`idc-index-data-artifacts`) with unrestricted CORS. This enables DuckDB or pandas queries without installing idc-index, including cross-table joins and queries against `volume_geometry_index` and `rtstruct_index`.
+
+See `references/parquet_access_guide.md` for URL patterns, available files, and DuckDB query examples.
+
 ## Installation and Setup
 
 **Required (for basic access):**
@@ -229,7 +241,7 @@ print(client.get_idc_version()) # Should return "v23"
 ```
 If you see an older version, upgrade with: `pip install --upgrade idc-index`
 
-**Tested with:** idc-index 0.11.10 (IDC data version v23)
+**Tested with:** idc-index 0.11.14 (IDC data version v23)
 
 **Optional (for data analysis):**
 ```bash
@@ -649,6 +661,13 @@ See `references/bigquery_guide.md` for setup, table schemas, query patterns, pri
 
 Common specialized indices: `seg_index` (segmentations), `ann_index` / `ann_group_index` (microscopy annotations), `sm_index` (slide microscopy), `collections_index` (collection metadata). Only use BigQuery if you need private DICOM elements or attributes not in any index.
 
+**Use cases that require BigQuery (no idc-index equivalent):**
+- **Per-segment anatomy search** — `seg_index` gives series-level SEG metadata, but the BigQuery `segmentations` table exposes each segment individually with its DICOM coded structure name (e.g., find all SEG series containing a "Liver" or "Neoplasm" segment)
+- **Quantitative measurements from SR** — the `quantitative_measurements` BigQuery table contains pre-extracted radiomics features (volume, diameter, shape descriptors, texture, intensity statistics) from DICOM SR TID1500 objects; no idc-index equivalent
+- **Qualitative measurements from SR** — the `qualitative_measurements` BigQuery table contains coded assessments (malignancy rating, calcification, texture, margin) from DICOM SR TID1500; no idc-index equivalent
+
+See `references/bigquery_guide.md` for schemas, column descriptions, and query examples for these tables.
+
 ### 8. Tool Selection Guide
 
 | Task | Tool | Reference |
