Add the EHRShot dataset #306

Draft
mmcdermott wants to merge 1 commit into dev from feat/add-dataset-ehrshot

Conversation

Collaborator

@mmcdermott mmcdermott commented May 13, 2026

Summary

Registers EHRShot (Stanford EHR cohort) in src/MEDS_DEV/datasets/EHRShot/.

Files added:

  • dataset.yaml: metadata only. build_full is echo "MEDS extraction is pre-done for this dataset", since Stanford doesn't ship a publicly installable MEDS extractor. There is no build_demo either, so integration tests skip EHRShot (per #312, "Make build_demo optional, skip demo-less datasets in tests").
  • predicates.yaml: OMOP-style predicate set with visit / lab / vital codes (Visit/IP, LOINC/8480-6, etc.).
  • README.md: description, access requirements, supported tasks list, source links.

No requirements.txt (no extractor to install). No refs.bib.

EHRShot is not added to any task's supported_datasets. Worth a follow-up to wire it into an existing task once we've confirmed which OMOP codes show up in real EHRShot MEDS exports.
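One way the follow-up check could be run is sketched below. It assumes the distinct values of the MEDS `code` column have already been collected from an export; the `observed` list here is a made-up placeholder, not real EHRShot data, and `check_predicate_coverage` is a hypothetical helper rather than anything in MEDS-DEV. Only the two predicate codes named in this description are used.

```python
# Predicate codes quoted in the PR description of predicates.yaml.
PREDICATE_CODES = {"Visit/IP", "LOINC/8480-6"}


def check_predicate_coverage(predicate_codes, observed_codes):
    """Map each predicate code to whether it appears among the observed codes."""
    observed = set(observed_codes)
    return {code: code in observed for code in sorted(predicate_codes)}


# Placeholder input: in practice this would be the distinct values of the
# `code` column in a real EHRShot MEDS export.
observed = ["Visit/IP", "LOINC/8462-4", "Visit/IP"]

coverage = check_predicate_coverage(PREDICATE_CODES, observed)
missing = [code for code, present in coverage.items() if not present]

print(coverage)
print("Predicate codes not seen in the export:", missing)
```

Any code reported as missing would be a predicate that cannot fire on that export, which is the signal to resolve before adding EHRShot to a task's supported_datasets.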

Depends on #312

#312 makes build_demo optional in the registry and excludes datasets that don't declare it from the integration test matrix. Targeted at feat/dataset-demo-availability for now; once #312 merges, this PR retargets to dev.

Test plan

Supersedes / refs

🤖 Generated with Claude Code

codecov Bot commented May 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/MEDS_DEV/datasets/__init__.py | 97.43% <100.00%> (+0.06%) | ⬆️ |

@mmcdermott mmcdermott changed the base branch from dev to feat/dataset-demo-availability May 13, 2026 16:33
@mmcdermott mmcdermott force-pushed the feat/add-dataset-ehrshot branch from 924d591 to 9a2d6d5 Compare May 13, 2026 16:33
mmcdermott added a commit that referenced this pull request May 13, 2026
Prerequisite for the per-dataset registration PRs (#305 AUMCdb, #306
EHRShot, #307 HIRID, #308 INSPIRE, #309 NWICU, #310 SICdb, #311 eICU).
Most of those datasets' upstream extractors don't ship a publicly
installable demo, and the existing registry validation requires every
dataset to declare a build_demo command.

Switches the convention to: a dataset has a demo iff its commands
declare build_demo. Absence is the signal — no separate metadata field.

- `test_all_datasets_have_commands` now requires `build_full` (which
  every dataset still needs) and allows missing `build_demo`.
- `tests/conftest.py` drops datasets without `build_demo` from the
  integration test matrix, so a per-dataset CI lane for one collects
  zero parametrized tests and passes cleanly rather than trying to
  build data the dataset can't produce.
- `src/MEDS_DEV/datasets/__main__.py` raises a clear error when called
  with `demo=True` against a dataset that doesn't declare a
  build_demo command (instead of the previous KeyError).

No dataset.yaml files change here — those changes ship with the sister
per-dataset PRs that depend on this one.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
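The convention described in the commit message above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from #312: the `DATASETS` mapping, the `SomeDemoDataset` entry and its command strings, the helper names, and the choice of `ValueError` are all hypothetical. The only details taken from the source are that commands live under a dataset's commands, that `build_full` is required, and that the presence of `build_demo` is the sole signal that a demo exists.

```python
# Hypothetical registry: dataset name -> parsed dataset.yaml contents.
# Entries and command strings are placeholders, not the real MEDS-DEV registry.
DATASETS = {
    "EHRShot": {
        "commands": {
            "build_full": 'echo "MEDS extraction is pre-done for this dataset"',
        },
    },
    "SomeDemoDataset": {
        "commands": {"build_full": "build --full", "build_demo": "build --demo"},
    },
}


def has_demo(dataset: dict) -> bool:
    """A dataset has a demo iff its commands declare build_demo."""
    return "build_demo" in dataset.get("commands", {})


def datasets_for_integration_tests(datasets: dict) -> list[str]:
    """Drop demo-less datasets so they contribute no parametrized tests."""
    return sorted(name for name, cfg in datasets.items() if has_demo(cfg))


def get_build_command(name: str, datasets: dict, demo: bool = False) -> str:
    """Return the build command, failing clearly when a demo is requested but absent."""
    cfg = datasets[name]
    if demo and not has_demo(cfg):
        # Assumed error type; the source only says "a clear error" replaces a KeyError.
        raise ValueError(
            f"Dataset {name!r} does not declare a build_demo command, so it has no demo."
        )
    return cfg["commands"]["build_demo" if demo else "build_full"]


print(datasets_for_integration_tests(DATASETS))
print(get_build_command("EHRShot", DATASETS))
```

Keeping the check in one predicate (`has_demo` here) means the test matrix and the command lookup cannot disagree about which datasets have a demo, which is the point of treating absence as the signal instead of adding a separate metadata field.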
@mmcdermott mmcdermott force-pushed the feat/dataset-demo-availability branch from 17019a0 to a2d4038 Compare May 13, 2026 17:20
@mmcdermott mmcdermott force-pushed the feat/add-dataset-ehrshot branch from 9a2d6d5 to c5d0a36 Compare May 13, 2026 17:21
@mmcdermott mmcdermott changed the base branch from feat/dataset-demo-availability to dev May 13, 2026 17:43
@mmcdermott mmcdermott force-pushed the feat/add-dataset-ehrshot branch from c5d0a36 to 136ed8e Compare May 13, 2026 17:43