| 1 | +# DWDP Reproduction |
| 2 | + |
| 3 | +This directory provides a thin reproduction layer on top of |
| 4 | +`examples/disaggregated/slurm/benchmark/submit_dwdp.py`. |
| 5 | +It does not modify that launcher. Instead, it combines: |
| 6 | + |
| 7 | +- `env.yaml`: cluster, container, model, and dataset inputs provided by the user |
| 8 | +- `dwdp_reproduce.yaml`: the DWDP reproduction matrix |
| 9 | +- `reproduce.py`: the script that merges both files, generates full benchmark |
| 10 | + configs, and forwards them to `submit_dwdp.py` |
| 11 | + |
| 12 | +## Files |
| 13 | + |
| 14 | +- `env.yaml` |
| 15 | + Holds environment-specific inputs such as Slurm settings, container image, |
| 16 | + mount list, model path, and dataset mapping. |
| 17 | +- `dwdp_reproduce.yaml` |
| 18 | + Holds only experiment parameters such as `isl`, `osl`, `ctx_tp`, `gen_tp`, |
| 19 | + `batch`, `prefetch`, and DWDP settings. |
| 20 | +- `generated/` |
| 21 | + Output directory for the generated full configs that are passed to |
| 22 | + `submit_dwdp.py`. |
| 23 | + |
| 24 | +## How It Works |
| 25 | + |
| 26 | +`reproduce.py` reads `env.yaml` and `dwdp_reproduce.yaml`, generates one full |
| 27 | +benchmark config per experiment, writes each config to `generated/` by default, |
| 28 | +and then invokes: |
| 29 | + |
| 30 | +```bash |
| 31 | +python examples/disaggregated/slurm/benchmark/submit_dwdp.py -c <generated_config> |
| 32 | +``` |
| 33 | + |
| 34 | +## Configure `env.yaml` |
| 35 | + |
| 36 | +Update these sections before running: |
| 37 | + |
| 38 | +- `slurm` |
| 39 | + Set `partition`, `account`, `time`, and any cluster-specific `extra_args`. |
| 40 | +- `hardware` |
| 41 | + Set `gpus_per_node` for your cluster. |
| 42 | +- `environment` |
| 43 | + Set `container_image`, `container_mount`, `model_path`, and usually |
| 44 | + `trtllm_repo`. |
| 45 | + Leave `log_dir` unset unless you intentionally want a fixed log location. |
| 46 | + When `log_dir` is omitted, `submit_dwdp.py` creates a unique per-run log |
| 47 | + directory automatically. |
| 48 | +- `datasets` |
| 49 | + Map short dataset keys to concrete dataset files. |
| 50 | + |
| 51 | +`environment.work_dir` is optional. If omitted, `reproduce.py` automatically |
| 52 | +points it to `examples/disaggregated/slurm/benchmark`, which is what |
| 53 | +`submit_dwdp.py` expects for locating the benchmark shell scripts. |
| 54 | + |
| 55 | +## Configure `dwdp_reproduce.yaml` |
| 56 | + |
| 57 | +This file can define both context-only and end-to-end reproduction experiments. |
| 58 | + |
| 59 | +The reproduction matrix is split into: |
| 60 | + |
| 61 | +- `experiment_defaults` |
| 62 | + Common fields shared across many experiments. |
| 63 | +- `experiments` |
| 64 | + One entry per benchmark case. |
| 65 | + |
| 66 | +Each experiment may reference datasets in two ways: |
| 67 | + |
| 68 | +- `dataset_key`: resolves through `env.yaml -> datasets` |
| 69 | +- `dataset_file`: directly provides the full dataset path |
| 70 | + |
| 71 | +`dataset_key` is the preferred option when several experiments share the same |
| 72 | +dataset file. |
| 73 | + |
| 74 | +## Usage |
| 75 | + |
| 76 | +Install the required Python dependency first: |
| 77 | + |
| 78 | +```bash |
| 79 | +python3 -m pip install pyyaml |
| 80 | +``` |
| 81 | + |
| 82 | +Then run `reproduce.py`: |
| 83 | +```bash |
| 84 | +python3 examples/dwdp/reproduce.py \ |
| 85 | + --env-config /path/to/env.yaml \ |
| 86 | + --reproduce-config /path/to/dwdp_reproduce.yaml \ |
| 87 | + --output-dir /path/to/generated |
| 88 | +``` |
| 89 | + |
| 90 | +Before running, update `dwdp_reproduce.yaml` as needed so it includes the |
| 91 | +reproduction experiments you want to launch. |
| 92 | + |
| 93 | +## Generated Configs |
| 94 | + |
| 95 | +Generated configs are written by default to `examples/dwdp/generated/`. |
| 96 | +The filenames include both the experiment name and the generated benchmark |
| 97 | +identifier so they can be inspected or reused directly with |
| 98 | +`submit_dwdp.py`. |
| 99 | + |
| 100 | +> **IMPORTANT:** Leave `environment.log_dir` unset by default. Logs are then written |
| 101 | +> under `examples/disaggregated/slurm/benchmark/logs/`. |
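
The merge step described in the README (shared `experiment_defaults`, per-experiment overrides, and `dataset_key` resolved through `env.yaml -> datasets`) can be illustrated with a minimal sketch. This is not the actual `reproduce.py` implementation: the function name `build_experiment_configs`, the `name` field, the sample paths, and the rule that `dataset_file` takes precedence over `dataset_key` are all assumptions made for illustration. Plain dicts stand in for the parsed YAML files.

```python
from copy import deepcopy


def build_experiment_configs(env, matrix):
    """Merge experiment_defaults into each experiment and resolve its dataset.

    Hypothetical helper: the real reproduce.py may merge fields differently.
    """
    defaults = matrix.get("experiment_defaults", {})
    datasets = env.get("datasets", {})
    configs = []
    for experiment in matrix.get("experiments", []):
        # Experiment-level fields override the shared defaults.
        merged = {**deepcopy(defaults), **deepcopy(experiment)}
        if "dataset_file" in merged:
            # Assumption: a direct path wins over any dataset_key.
            dataset_file = merged.pop("dataset_file")
            merged.pop("dataset_key", None)
        else:
            key = merged.pop("dataset_key", None)
            if key not in datasets:
                raise KeyError(f"dataset_key {key!r} is not defined under datasets")
            dataset_file = datasets[key]
        merged["dataset_file"] = dataset_file
        configs.append(merged)
    return configs


# Hypothetical inputs, shaped like the two YAML files after parsing.
env = {"datasets": {"8k1k": "/data/8k1k.json"}}
matrix = {
    "experiment_defaults": {"osl": 1024, "dataset_key": "8k1k"},
    "experiments": [
        {"name": "ctx_only", "isl": 8192},
        {"name": "custom", "isl": 4096, "dataset_file": "/data/custom.json"},
    ],
}

configs = build_experiment_configs(env, matrix)
for config in configs:
    print(config["name"], config["dataset_file"])
```

In this sketch the first experiment inherits `dataset_key` from the defaults and resolves it through the `datasets` mapping, while the second supplies `dataset_file` directly and bypasses the mapping. That mirrors why `dataset_key` is convenient when several experiments share one dataset file.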