Register MEDS-EIC-AR as a model (micro/small/medium/large) by mmcdermott · Pull Request #313 · Medical-Event-Data-Standard/MEDS-DEV

mmcdermott · 2026-05-13T20:27:14Z

Summary

Adds MEDS-EIC-AR (PyPI MEDS-EIC-AR==0.3.1) as a registered model. Resolves #302.

Registered as four capacity variants — meds_eic_ar/{micro,small,medium,large} — each selecting an upstream lightning_module=<size> preset, matching the meds_tab/tiny capacity-variant pattern. Shared README + refs.bib at the parent meds_eic_ar/ level.

| Variant | Layers | Heads × dim | Hidden size |
|---|---|---|---|
| `meds_eic_ar/micro` | 4 | 4 × 64 | 256 |
| `meds_eic_ar/small` | 8 | 8 × 64 | 512 |
| `meds_eic_ar/medium` | 12 | 12 × 64 | 768 |
| `meds_eic_ar/large` | 24 | 16 × 64 | 1024 |
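
As a quick sanity check on the table, the hidden size of each variant should equal heads × head dim. The sketch below only restates the numbers above; the `VARIANTS` dict and `check_hidden_sizes` helper are illustrative names, not part of MEDS-DEV or MEDS-EIC-AR.

```python
# Variant table from this PR: name -> (layers, heads, head_dim, hidden_size).
# The dict layout is illustrative only; it is not how the presets are stored upstream.
VARIANTS = {
    "meds_eic_ar/micro": (4, 4, 64, 256),
    "meds_eic_ar/small": (8, 8, 64, 512),
    "meds_eic_ar/medium": (12, 12, 64, 768),
    "meds_eic_ar/large": (24, 16, 64, 1024),
}


def check_hidden_sizes(variants):
    """Return the names of variants whose hidden size != heads * head_dim."""
    return [
        name
        for name, (_layers, heads, head_dim, hidden) in variants.items()
        if heads * head_dim != hidden
    ]


if __name__ == "__main__":
    print(check_hidden_sizes(VARIANTS))  # [] means every row is consistent
```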

The whole pipeline is dataset-agnostic — nothing is wired per-dataset.

What's wired

unsupervised: train — MEICAR_process_data (tokenize + tensorize) then MEICAR_pretrain. Uses the upstream demo configs under {demo}=True; selects the variant's lightning_module=<size> capacity preset in full mode.
supervised: predict (step 1 of 2) — MEICAR_generate_trajectories rolls future patient timelines forward from each task sample's prediction time, consuming the pretrained model + tensorized cohort from unsupervised: train and the task labels dir.

What's not wired — zero-shot prediction resolution

Zero-shot inference is two steps upstream: generate trajectories (wired, above) then resolve trajectories into a predictions.parquet. The resolution step is blocked on two gaps, both now filed:

MEDS-DEV gap: meds-trajectory-evaluation's ZSACES_label CLI needs the ACES task criteria and dataset predicates files, and MEDS-DEV doesn't pass those to model supervised commands (the available template vars are dataset_dir, labels_dir, model_initialization_dir, output_dir, model_dir, split, demo). The command can't be written today: str.format() raises KeyError on any placeholder outside that set. Filed as Expose ACES task criteria + dataset predicates to model supervised commands #314.
Upstream gap: even with ZSACES_label runnable, it emits per-trajectory boolean labels (valid/determinable/label), not an aggregated empirical-probability predictions.parquet. No CLI aggregates across the N sampled trajectories per task sample into a meds-evaluation-compatible file. Filed upstream as mmcdermott/MEDS_trajectory_evaluation#42.
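
To make the MEDS-DEV gap concrete, here is a minimal sketch of the failure mode. It assumes commands are rendered with plain `str.format()` over the template vars listed above; the `render` helper, the example values, the `example_cli` command, and the `{task_criteria_fp}` placeholder are all hypothetical and stand in for whatever #314 ends up exposing.

```python
# Template variables this PR lists as available to model `supervised` commands.
# The values are made-up examples.
AVAILABLE_VARS = {
    "dataset_dir": "/data/meds",
    "labels_dir": "/data/labels",
    "model_initialization_dir": "/models/init",
    "output_dir": "/out",
    "model_dir": "/models/meds_eic_ar",
    "split": "held_out",
    "demo": "True",
}

# Hypothetical command template. `example_cli` and `{task_criteria_fp}` are
# illustrative only and are not real MEDS-DEV or ZSACES_label names.
TEMPLATE = "example_cli --labels {labels_dir} --criteria {task_criteria_fp}"


def render(template, variables):
    """Render a command template, reporting the first unknown placeholder."""
    try:
        return template.format(**variables)
    except KeyError as err:
        # str.format() raises KeyError with the missing field name as its argument.
        return f"unknown placeholder: {err.args[0]}"


if __name__ == "__main__":
    print(render(TEMPLATE, AVAILABLE_VARS))
```

Any template that only uses the seven listed variables renders normally, which is why trajectory generation can be wired while the resolution step cannot.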

Once both land, each variant's supervised: predict gains the ZSACES_label + aggregation steps and produces {output_dir}/predictions.parquet. Until then the supervised lane stops after trajectory generation — see the parent README for the full writeup.
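
For reviewers unfamiliar with the missing aggregation step, the following is a rough sketch of what "empirical probability" could mean here. It is not the upstream design: the `sample_id` key, the row layout, and the choice to count only trajectories that are both valid and determinable are assumptions made for illustration. The actual semantics are left to mmcdermott/MEDS_trajectory_evaluation#42.

```python
from collections import defaultdict


def empirical_probabilities(rows):
    """Aggregate per-trajectory boolean labels into one probability per task sample.

    `rows` is an iterable of dicts with keys `sample_id` (hypothetical), `valid`,
    `determinable`, and `label`. Assumption: only trajectories that are both valid
    and determinable count toward the estimate. Samples with no usable
    trajectories are omitted from the result.
    """
    counts = defaultdict(lambda: [0, 0])  # sample_id -> [positives, usable]
    for row in rows:
        if row["valid"] and row["determinable"]:
            counts[row["sample_id"]][0] += int(row["label"])
            counts[row["sample_id"]][1] += 1
    return {sid: pos / usable for sid, (pos, usable) in counts.items()}


if __name__ == "__main__":
    rows = [
        {"sample_id": "a", "valid": True, "determinable": True, "label": True},
        {"sample_id": "a", "valid": True, "determinable": True, "label": True},
        {"sample_id": "a", "valid": True, "determinable": True, "label": False},
        {"sample_id": "a", "valid": True, "determinable": False, "label": False},
        {"sample_id": "b", "valid": True, "determinable": True, "label": False},
    ]
    print(empirical_probabilities(rows))
```

How to treat samples with no usable trajectories (omit, emit null, or fall back to a prior) is one of the decisions the upstream issue would need to settle.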

Open question (per #302) — demo-mode capacity

In {demo}=True mode all four variants currently run the upstream demo capacity (_demo_pretrain), not their own architecture — a CI-cost tradeoff. We can override lightning_module/model=<size> on top of the demo config so each variant exercises its real architecture under demo; flagging it rather than deciding unilaterally.
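
The two options can be sketched as a small selection function. This is only an illustration of the decision being asked about: the `pretrain_overrides` name and the returned dict shape are hypothetical, while the override strings follow the `lightning_module=<size>` and `lightning_module/model=<size>` forms mentioned in this PR.

```python
def pretrain_overrides(variant: str, demo: bool, size_in_demo: bool = False) -> dict:
    """Describe which config and size override a variant would use.

    `variant` is a registry name such as "meds_eic_ar/small". The returned dict
    is a hypothetical shape for illustration, not a MEDS-DEV structure.
    """
    size = variant.rsplit("/", 1)[-1]
    if not demo:
        # Full mode: the variant's own capacity preset.
        return {"demo_config": False, "overrides": [f"lightning_module={size}"]}
    if size_in_demo:
        # Proposed option: keep the demo config but restore the real architecture.
        return {"demo_config": True, "overrides": [f"lightning_module/model={size}"]}
    # Current behaviour: every variant shares the upstream demo capacity.
    return {"demo_config": True, "overrides": []}


if __name__ == "__main__":
    print(pretrain_overrides("meds_eic_ar/large", demo=True))
    print(pretrain_overrides("meds_eic_ar/large", demo=True, size_in_demo=True))
```

With `size_in_demo=False` all four variants produce identical demo runs, which is the CI-cost tradeoff described above.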

Test plan

Pre-commit passes.
Full fast test suite passes locally (53 tests; all four variants register and pass test_registry_validation.py).
model-lane (meds_eic_ar/*): unsupervised: train should run the demo pre-train end-to-end; supervised: predict runs trajectory generation. The lane is expected to stop short of a packaged result until the two gaps above close (no predictions.parquet yet).

Refs

Resolves Add MEDS-EIC-AR as a registered model #302.
Filed by this PR: Expose ACES task criteria + dataset predicates to model supervised commands #314 (MEDS-DEV side), mmcdermott/MEDS_trajectory_evaluation#42 (upstream).
Related: Separate autoregressive generation from downstream inference for AR zero-shot models #304 (separating AR generation from inference as a lane concern).

🤖 Generated with Claude Code

MEDS-EIC-AR ("Everything is Code" autoregressive) is a MEDS-native transformer LM with a fully dataset-agnostic pipeline: tokenization, tensorization, and autoregressive pre-training all run on any MEDS dataset without per-dataset wiring.

Registered as four capacity variants, `meds_eic_ar/{micro,small,medium,large}`, each selecting an upstream `lightning_module=<size>` preset, matching the `meds_tab/tiny` capacity-variant pattern. Shared README + refs.bib at the parent level. Pinned to `MEDS-EIC-AR==0.3.1`.

Wired:
- `unsupervised: train` runs `MEICAR_process_data` (tokenize + tensorize) then `MEICAR_pretrain`, switching to the upstream demo configs under `{demo}=True` and selecting the variant's capacity preset otherwise.
- `supervised: predict` runs `MEICAR_generate_trajectories`, which rolls future patient timelines forward from each task sample's prediction time.

Not yet wired (zero-shot prediction resolution), blocked on two gaps, both filed and linked from the parent README:
- MEDS-DEV does not expose the ACES task criteria / dataset predicates files to model `supervised` commands, which meds-trajectory-evaluation's `ZSACES_label` needs (#314).
- meds-trajectory-evaluation has no CLI to aggregate per-trajectory `ZSACES_label` output into an empirical-probability predictions.parquet in meds-evaluation format (mmcdermott/MEDS_trajectory_evaluation#42).

Until both land, the supervised lane stops after trajectory generation and won't produce a packaged result, so the model-lane integration test is expected to stop short of evaluation.

Resolves #302.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This was referenced May 14, 2026

Expose ACES task criteria + dataset predicates to model supervised commands #314

Open

CLI to aggregate per-trajectory ZSACES labels into an empirical-probability predictions file mmcdermott/MEDS_trajectory_evaluation#42

Open

mmcdermott force-pushed the feat/add-model-meds-eic-ar branch from f48a40a to 065399b Compare May 14, 2026 18:03

mmcdermott changed the title ~~Register MEDS-EIC-AR as a model~~ Register MEDS-EIC-AR as a model (micro/small/medium/large) May 14, 2026

mmcdermott wants to merge 1 commit into dev from feat/add-model-meds-eic-ar
