# Cached Foundation Model Summaries for Memory-Efficient Clinical Time Series Inference

Accepted at the 1st Time Series in the Age of Large Models (TSALM) Workshop, ICLR 2026. [Paper]

Rafi Al Attrach, Rajna Fani, David Restrepo, Yugang Jia, Leo Anthony Celi, Peter Schüffler
Massachusetts Institute of Technology | Technical University of Munich | CentraleSupélec
Transformer models for clinical time series face a deployment bottleneck: patient histories can span thousands of irregularly spaced events, but inference hardware imposes strict memory budgets. We study a decoupling strategy in which a pretrained foundation model compresses each patient's historical events into a fixed-size cached summary offline, and a lightweight model processes only a short recent window conditioned on that summary at inference time. Through 252 experiments on in-ICU mortality prediction (MIMIC-IV v3.1), we characterize when this is worthwhile. Cached summaries yield a 6.5% relative AUROC gain at N=8 recent events (p < 0.001), with clear diminishing returns as N grows. FiLM modulation outperforms token injection (p < 0.001), and recent-history summaries outperform distant-history summaries (p < 0.01).
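For intuition, here is a minimal PyTorch sketch of FiLM conditioning on a cached summary vector: the summary produces per-channel scale and shift terms that modulate the lightweight model's hidden states. The module and tensor names are illustrative, not the repo's actual implementation.

```python
import torch
import torch.nn as nn

class FiLMConditioner(nn.Module):
    """Feature-wise Linear Modulation: a cached summary vector produces
    per-channel scale (gamma) and shift (beta) applied to hidden states."""

    def __init__(self, summary_dim: int, hidden_dim: int):
        super().__init__()
        # One linear map emits both gamma and beta.
        self.to_film = nn.Linear(summary_dim, 2 * hidden_dim)

    def forward(self, hidden: torch.Tensor, summary: torch.Tensor) -> torch.Tensor:
        # hidden:  (batch, n_recent_events, hidden_dim)
        # summary: (batch, summary_dim), precomputed offline and loaded from cache
        gamma, beta = self.to_film(summary).chunk(2, dim=-1)
        return hidden * (1 + gamma.unsqueeze(1)) + beta.unsqueeze(1)

# Usage: condition a short recent window (N=8 events) on a 256-d cached summary.
film = FiLMConditioner(summary_dim=256, hidden_dim=128)
hidden = torch.randn(4, 8, 128)   # batch of 4 patients, 8 recent events each
summary = torch.randn(4, 256)     # fixed-size cached summaries
out = film(hidden, summary)       # same shape as hidden
```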
## Installation

Python 3.11 is required.

```bash
git clone https://github.com/rafiattrach/cached-summaries-ehr-inference.git
cd cached-summaries-ehr-inference
bash setup.sh
source venv/bin/activate
pip install meds-torch
```

Verify the installation:

```bash
python smoke_test.py
```

This runs three forward passes with random tensors (no MIMIC data required) and should print all three as PASS.
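For reference, the pattern is roughly the following (a stand-in model and shapes, not the contents of smoke_test.py):

```python
import torch
import torch.nn as nn

# Stand-in model: any module with a fixed input/output contract works here.
model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))

for i in range(3):
    x = torch.randn(16, 64)  # random tensors, no MIMIC data required
    with torch.no_grad():
        y = model(x)
    assert y.shape == (16, 1), f"forward pass {i} produced wrong shape"
    print(f"forward pass {i}: PASS")
```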
## Data preparation

MIMIC-IV requires credentialed access via PhysioNet. Convert the raw dataset to MEDS format using the MIMIC-IV MEDS pipeline, then tensorize with meds-torch:
```bash
export MEDS_DIR=/path/to/mimic-iv/meds
export MODEL_DIR=/path/to/triplet_tensors
export MEDS_COHORT_DIR=/path/to/meds_cohort
export N_PARALLEL_WORKERS=8

meds-torch-ETL \
    pipeline_config_fp=experiments/configs/triplet_config.yaml \
    input_dir=$MEDS_DIR \
    output_dir=$MODEL_DIR \
    stage_runner_fp=experiments/configs/joblib_runner.yaml \
    hydra.launcher.n_jobs=$N_PARALLEL_WORKERS
```
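The triplet representation encodes each event as a (time, code, value) tuple. A toy illustration of the idea, not meds-torch's actual on-disk schema:

```python
import numpy as np

# Toy patient history: three events as parallel arrays of
# time deltas (hours since previous event), vocabulary codes, and numeric values.
time_delta = np.array([0.0, 2.5, 24.0], dtype=np.float32)
code = np.array([101, 57, 101], dtype=np.int64)       # e.g. lab / vital codes
value = np.array([7.4, 98.6, 7.2], dtype=np.float32)  # measurement values

triplets = np.stack([time_delta, code.astype(np.float32), value], axis=-1)
print(triplets.shape)  # (3, 3): one (time, code, value) triplet per event
```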
## Running experiments

The environment variables below are assumed to be set for all steps. Adjust paths to match your storage layout.

```bash
export MEDS_DIR=/path/to/mimic-iv/meds
export MODEL_DIR=/path/to/triplet_tensors
export MEDS_COHORT_DIR=/path/to/meds_cohort
export SUMMARY_CACHE_DIR=$MODEL_DIR/hybrid_summary_cache
export TASK_NAME=mortality/in_icu/first_24h
export GPU=0
```

### Step 1: extract task cohort
```bash
aces-cli --multirun hydra/launcher=joblib \
    data=sharded data.standard=meds \
    data.root="$MEDS_DIR/data" \
    "data.shard=$(expand_shards $MEDS_DIR/data)" \
    cohort_dir="$MEDS_COHORT_DIR/tasks/" \
    cohort_name="$TASK_NAME" \
    config_path="experiments/configs/tasks/mortality/in_icu/first_24h.yaml"
```
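ACES task configs define a cohort through predicates, a trigger event, and labeled time windows. The sketch below is purely illustrative; use the actual file at experiments/configs/tasks/mortality/in_icu/first_24h.yaml, as the keys and codes here are assumptions about the schema rather than a copy of it.

```yaml
# Illustrative sketch only; see the repo's actual task config.
predicates:
  icu_admission:
    code: ICU_ADMISSION      # hypothetical event code
  death:
    code: MEDS_DEATH         # hypothetical event code

trigger: icu_admission       # predictions are anchored at ICU admission

windows:
  input:
    start: trigger           # observe the first 24 hours of the stay
    end: trigger + 24h
  target:
    start: input.end         # label: in-ICU death after the observation window
    end: null
    label: death
```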
### Step 2: precompute summary vectors (run once, reused across all training runs)

```bash
bash scripts/precompute_summaries.sh \
    --data_dir $MODEL_DIR \
    --output_dir $SUMMARY_CACHE_DIR \
    --summary_size 256 \
    --variant recent \
    --gpu $GPU
```
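Conceptually, the precompute step does something like the following (a minimal sketch with a stand-in encoder and an assumed file layout; the script's actual model and cache format may differ):

```python
from pathlib import Path

import torch

# Stand-in for the frozen, pretrained foundation-model encoder.
encoder = torch.nn.Sequential(
    torch.nn.Linear(3, 256), torch.nn.ReLU(), torch.nn.Linear(256, 256)
).eval()

# Toy histories: (num_events, features); real inputs come from the triplet tensors.
patients = {"p001": torch.randn(500, 3), "p002": torch.randn(2000, 3)}

cache = {}
with torch.no_grad():
    for pid, history in patients.items():
        # Mean-pool per-event embeddings into one fixed-size summary vector.
        cache[pid] = encoder(history).mean(dim=0)  # shape: (256,)

out_dir = Path("hybrid_summary_cache")
out_dir.mkdir(exist_ok=True)
torch.save(cache, out_dir / "summaries.pt")  # reused across all training runs
```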
### Step 3: train individual configurations

```bash
# Short flags (inferred from the long options above): -m model (baseline /
# hybrid / oracle); -n number of recent events N; -a integration method;
# -v summary variant; -s summary size; -g GPU id; -r repo root; -o output dir.
bash scripts/run_experiment.sh -m baseline -n 64 -g $GPU \
    -r $(pwd) -o $MODEL_DIR/outputs

bash scripts/run_experiment.sh -m hybrid -a film -v recent -n 64 -s 256 -g $GPU \
    -r $(pwd) -o $MODEL_DIR/outputs

bash scripts/run_experiment.sh -m oracle -n 64 -s 256 -g $GPU \
    -r $(pwd) -o $MODEL_DIR/outputs
```
### Step 4: full grid search (reproduces all 252 runs)

```bash
bash scripts/sweep_context_budget.sh \
    --gpu $GPU \
    --epochs 10 \
    --out_dir $MODEL_DIR/outputs/sweep \
    --repo_dir $(pwd)
```
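Schematically, the sweep nests run_experiment.sh inside loops over the context budget N and the summary settings. The skeleton below is illustrative only; the -a token and -v distant values are assumptions based on the paper's conditions, and the authoritative grid lives in scripts/sweep_context_budget.sh.

```bash
# Illustrative skeleton only; not the script's actual grid.
for n in 8 16 32 64 128 256; do
  bash scripts/run_experiment.sh -m baseline -n $n -g $GPU \
      -r $(pwd) -o $MODEL_DIR/outputs/sweep
  for variant in recent distant; do
    for adapter in film token; do
      bash scripts/run_experiment.sh -m hybrid -a $adapter -v $variant \
          -n $n -s 256 -g $GPU -r $(pwd) -o $MODEL_DIR/outputs/sweep
    done
  done
done
```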
### Ablations

```bash
bash scripts/ablate_integration_method.sh \
    --gpu $GPU --out_dir $MODEL_DIR/outputs/ablation_integration

bash scripts/ablate_summary_source.sh \
    --gpu $GPU --out_dir $MODEL_DIR/outputs/ablation_source
```

## Results

| Condition | AUROC gain vs baseline | p-value |
|---|---|---|
| N=8, FiLM, recent | +6.5% | < 0.001 |
| N=32, FiLM, recent | +2.1% | < 0.001 |
| N=128, FiLM, recent | +0.4% | n.s. |
| N=256, FiLM, recent | +0.1% | n.s. |
| FiLM vs token (all N) | FiLM consistently better | < 0.001 |
| Recent vs distant summary | Recent consistently better | < 0.01 |
Memory at inference scales with N only, independent of total patient history length.
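A back-of-the-envelope calculation makes this concrete (illustrative layer/head counts, fp32, attention-score memory only; these are not measurements from the repo):

```python
# Back-of-the-envelope: attention-score activation memory in fp32.
# Scores alone cost layers * heads * seq_len**2 floats.
def attn_score_floats(seq_len: int, layers: int = 6, heads: int = 8) -> int:
    return layers * heads * seq_len ** 2

full_history = attn_score_floats(4096)  # attend over the entire record
recent_only = attn_score_floats(8)      # attend over N=8 recent events
summary_floats = 256                    # fixed-size cached summary vector

print(f"full history : {full_history * 4 / 2**20:10.2f} MiB")
print(f"N=8 + summary: {(recent_only + summary_floats) * 4 / 2**20:10.4f} MiB")
```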
## Citation

```bibtex
@misc{attrach2026cached,
  title  = {Cached Foundation Model Summaries for Memory-Efficient Clinical Time Series Inference},
  author = {Al Attrach, Rafi and Fani, Rajna and Restrepo, David and Jia, Yugang
            and Celi, Leo Anthony and Sch\"{u}ffler, Peter},
  year   = {2026},
  note   = {1st ICLR Workshop on Time Series in the Age of Large Models}
}
```

## Acknowledgments

MIMIC-IV is available via PhysioNet with credentialed access. This codebase extends the meds-torch framework.

