Commit 0938570

Merge pull request #18 from ImagingDataCommons/skill-1.6.2-web-search-guard
Skill v1.6.2: prohibit web search for IDC data, improve table ordering
2 parents 915355d + 1804ddd commit 0938570

2 files changed

Lines changed: 29 additions & 23 deletions

CHANGELOG.md

Lines changed: 9 additions & 0 deletions
@@ -5,6 +5,15 @@ All notable changes to the IDC Claude Skill are documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/),
 and this project adheres to [Semantic Versioning](https://semver.org/).
 
+## [1.6.2] - 2026-05-08
+
+### Changed
+
+- Moved `version_metadata_index` to second position in Available Tables (right after `index`) to surface it alongside the primary index
+- Moved `prior_versions_index` to last position in Available Tables; updated description to clarify it contains only removed/superseded series and should not be queried for current data
+- Added explicit Best Practices rule prohibiting web search for IDC data content questions; idc-index DuckDB queries are always authoritative — web sources are stale
+- Removed "Loaded" column from Available Tables and replaced with an unconditional rule: always call `client.fetch_index("table_name")` before querying any table; `fetch_index()` is idempotent for all tables including auto-loaded ones, so no exceptions are needed
+
 ## [1.6.1] - 2026-05-08
 
 ### Added

SKILL.md

Lines changed: 20 additions & 23 deletions
@@ -3,7 +3,7 @@ name: imaging-data-commons
 description: Query and download public cancer imaging data from NCI Imaging Data Commons using idc-index. Invoke for any question about IDC collections, cancer imaging datasets, DICOM data access, radiology (CT, MR, PET) or pathology AI training sets, metadata queries, visualization, or license checks — even when the user doesn't explicitly mention "IDC". No authentication required.
 license: This skill is provided under the MIT License. IDC data itself has individual licensing (mostly CC-BY, some CC-NC) that must be respected when using the data.
 metadata:
-version: 1.6.1
+version: 1.6.2
 skill-author: Andrey Fedorov, @fedorov
 idc-index: "0.12.2"
 idc-data-version: "v24"
@@ -128,25 +128,24 @@ The `idc-index` package provides multiple metadata index tables, accessible via
 
 ### Available Tables
 
-| Table | Row Granularity | Loaded | Description |
-|-------|-----------------|--------|-------------|
-| `index` | 1 row = 1 DICOM series | Auto | Primary metadata for all current IDC data |
-| `prior_versions_index` | 1 row = 1 DICOM series | Auto | Series from previous IDC releases; for downloading deprecated data |
-| `collections_index` | 1 row = 1 collection | fetch_index() | Collection-level metadata and descriptions |
-| `analysis_results_index` | 1 row = 1 analysis result collection | fetch_index() | Metadata about derived datasets (annotations, segmentations) |
-| `clinical_index` | 1 row = 1 (collection, table, column) triple | fetch_index() | Dictionary mapping clinical data table columns to collections |
-| `sm_index` | 1 row = 1 slide microscopy series | fetch_index() | Slide Microscopy (pathology) series metadata |
-| `sm_instance_index` | 1 row = 1 slide microscopy instance | fetch_index() | Instance-level (SOPInstanceUID) metadata for slide microscopy |
-| `seg_index` | 1 row = 1 DICOM Segmentation series | fetch_index() | Segmentation metadata: algorithm, segment count, reference to source image series |
-| `ann_index` | 1 row = 1 DICOM ANN series | fetch_index() | Microscopy Bulk Simple Annotations series metadata; references annotated image series |
-| `ann_group_index` | 1 row = 1 annotation group | fetch_index() | Detailed annotation group metadata: graphic type, annotation count, property codes, algorithm |
-| `contrast_index` | 1 row = 1 series with contrast info | fetch_index() | Contrast agent metadata: agent name, ingredient, administration route (CT, MR, PT, XA, RF) |
-| `volume_geometry_index` | 1 row = 1 CT/MR/PT series | fetch_index() | 3D volume geometry validation for single-frame CT, MR, and PT series; boolean checks for orientation, spacing, dimensions, and slice positions; composite `regularly_spaced_3d_volume` flag |
-| `rtstruct_index` | 1 row = 1 RTSTRUCT series | fetch_index() | RT Structure Set metadata: total ROI count, ROI names, generation algorithms, interpreted types, and the referenced image series UID |
-| `version_metadata_index` | 1 row = 1 IDC release version | fetch_index() | IDC version release timestamps; join on `idc_version` to correlate series with their release date |
-
-**Auto** = loaded automatically when `IDCClient()` is instantiated
-**fetch_index()** = requires `client.fetch_index("table_name")` to load
+Always call `client.fetch_index("table_name")` before querying any index table — it is safe and idempotent for all tables, including those loaded automatically at startup.
+
+| Table | Row Granularity | Description |
+|-------|-----------------|-------------|
+| `index` | 1 row = 1 DICOM series | Primary metadata for all current IDC data |
+| `version_metadata_index` | 1 row = 1 IDC release version | IDC version release timestamps; join on `idc_version` to correlate series with their release date |
+| `collections_index` | 1 row = 1 collection | Collection-level metadata and descriptions |
+| `analysis_results_index` | 1 row = 1 analysis result collection | Metadata about derived datasets (annotations, segmentations) |
+| `clinical_index` | 1 row = 1 (collection, table, column) triple | Dictionary mapping clinical data table columns to collections |
+| `sm_index` | 1 row = 1 slide microscopy series | Slide Microscopy (pathology) series metadata |
+| `sm_instance_index` | 1 row = 1 slide microscopy instance | Instance-level (SOPInstanceUID) metadata for slide microscopy |
+| `seg_index` | 1 row = 1 DICOM Segmentation series | Segmentation metadata: algorithm, segment count, reference to source image series |
+| `ann_index` | 1 row = 1 DICOM ANN series | Microscopy Bulk Simple Annotations series metadata; references annotated image series |
+| `ann_group_index` | 1 row = 1 annotation group | Detailed annotation group metadata: graphic type, annotation count, property codes, algorithm |
+| `contrast_index` | 1 row = 1 series with contrast info | Contrast agent metadata: agent name, ingredient, administration route (CT, MR, PT, XA, RF) |
+| `volume_geometry_index` | 1 row = 1 CT/MR/PT series | 3D volume geometry validation for single-frame CT, MR, and PT series; boolean checks for orientation, spacing, dimensions, and slice positions; composite `regularly_spaced_3d_volume` flag |
+| `rtstruct_index` | 1 row = 1 RTSTRUCT series | RT Structure Set metadata: total ROI count, ROI names, generation algorithms, interpreted types, and the referenced image series UID |
+| `prior_versions_index` | 1 row = 1 DICOM series | Series that have been removed or superseded in previous IDC releases; use only to download deprecated/historical data — do not query for current data |
 
 ### Joining Tables
 
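The unconditional rule introduced in the hunk above (call `client.fetch_index("table_name")` before querying any table) could be captured in a small wrapper so callers cannot forget the fetch. The following is a minimal sketch, not part of the commit: `IndexClient` and `query_index_table` are hypothetical names, and the only client methods assumed are the two the skill itself names, `fetch_index()` and `sql_query()`.

```python
from typing import Any, Protocol


class IndexClient(Protocol):
    """Hypothetical minimal interface: just the two calls named in SKILL.md."""

    def fetch_index(self, index_name: str) -> Any: ...

    def sql_query(self, sql_query: str) -> Any: ...


def query_index_table(client: IndexClient, table_name: str, sql: str) -> Any:
    """Fetch `table_name` unconditionally, then run `sql` through the client."""
    # No "is it already loaded?" branch: the skill treats fetch_index()
    # as idempotent, including for tables loaded automatically at startup.
    client.fetch_index(table_name)
    return client.sql_query(sql)


# Possible usage, assuming the idc-index package is installed:
#   from idc_index import IDCClient
#   client = IDCClient()
#   df = query_index_table(client, "sm_index", "SELECT COUNT(*) AS n FROM sm_index")
```

Passing the client in, rather than constructing it inside the helper, keeps the sketch testable with a stub and leaves client lifetime to the caller.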
@@ -666,17 +665,15 @@ See `references/use_cases.md` for complete end-to-end workflow examples includin
 
 ## Best Practices
 
+- **Never use web search for IDC data content questions** - Always query the idc-index directly using `client.sql_query()`. Web sources (release notes, blog posts, documentation pages) are frequently out of date and will produce incorrect answers. The local DuckDB index is the authoritative source; use it even when web search is available.
 - **Verify IDC version before generating responses** - Always call `client.get_idc_version()` at the start of a session to confirm you're using the expected data version (currently v24). If using an older version, recommend `pip install --upgrade idc-index`
 - **Check licenses before use** - Always query the `license_short_name` field and respect licensing terms (CC BY vs CC BY-NC)
 - **Generate citations for attribution** - Use `citations_from_selection()` to get properly formatted citations from `source_DOI` values; include these in publications
 - **Start with small queries** - Use `LIMIT` clause when exploring to avoid long downloads and understand data structure
 - **Use mini-index for simple queries** - Only use BigQuery when you need comprehensive metadata or complex JOINs
 - **Organize downloads with dirTemplate** - Use meaningful directory structures like `%collection_id/%PatientID/%Modality`
-- **Cache query results** - Save DataFrames to CSV files to avoid re-querying and ensure reproducibility
 - **Estimate size first** - Check collection size before downloading - some collection sizes are in terabytes!
 - **Save manifests** - Always save query results with Series UIDs for reproducibility and data provenance
-- **Read documentation** - IDC data structure and metadata fields are documented at https://learn.canceridc.dev/
-- **Use IDC forum** - Search for questons/answers and ask your questions to the IDC maintainers and users at https://discourse.canceridc.dev/
 
 ## Troubleshooting
 
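The "Verify IDC version" practice retained in the hunk above might look roughly like the sketch below. It is illustrative only: `upgrade_hint` is a hypothetical helper, and it assumes the version string has the form `v<N>` (as in the "currently v24" wording), which this commit does not itself confirm.

```python
from typing import Optional


def upgrade_hint(reported: str, expected: str = "v24") -> Optional[str]:
    """Return an upgrade message if `reported` is older than `expected`, else None."""
    # Assumption: versions look like "v24"; drop the prefix and compare as integers.
    reported_num = int(reported.lstrip("vV"))
    expected_num = int(expected.lstrip("vV"))
    if reported_num < expected_num:
        return (
            f"idc-index reports IDC {reported}, expected {expected}; "
            "consider `pip install --upgrade idc-index`."
        )
    return None


# Possible usage at the start of a session, assuming `client` is an IDCClient:
#   hint = upgrade_hint(client.get_idc_version())
#   if hint:
#       print(hint)
```

The helper takes the version string rather than the client so the comparison can be checked without network access or an installed index.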