# CI Tests

Configuration, scripts, and utilities for AutoModel's CI recipe validation pipeline.

## Directory Structure

```
ci_tests/
  configs/{test_folder}/
    nightly_recipes.yml          # Recipes included in nightly scope
    convergence_recipes.yml      # Recipes included in convergence scope (2x time)
    override_recipes.yml         # Exemptions, known issues
  scripts/
    finetune_launcher.sh         # Finetune + checkpoint robustness test runner
    vllm_launcher.sh             # vLLM deployment test runner
  golden_values/{test_folder}/
    {model}/{config}_{gpu}.jsonl # Reference loss curves
  utils/
    generate_ci_tests.py         # Generates CI pipeline YAML from recipe configs
```

## Pipeline Generation

`generate_ci_tests.py` reads recipe lists from `configs/{test_folder}/` for the given scope, reads each recipe's `ci:` section from the YAML under `examples/`, and outputs a CI pipeline YAML with one job per recipe.

**Scopes:**
- **nightly** -- Recipes listed in `nightly_recipes.yml`
- **convergence** -- Recipes in `convergence_recipes.yml`, time automatically doubled
- **release** -- All recipe YAMLs found under `examples/{test_folder}/`
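
The convergence scope's automatic time doubling amounts to scaling the recipe's HH:MM:SS wall time. A minimal sketch (the helper name `double_walltime` is illustrative; the real logic lives in `generate_ci_tests.py`):

```python
def double_walltime(hhmmss: str) -> str:
    """Double a SLURM wall time given as HH:MM:SS."""
    h, m, s = (int(part) for part in hhmmss.split(":"))
    total = 2 * (h * 3600 + m * 60 + s)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"

print(double_walltime("00:25:00"))  # -> 00:50:00
```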

**Stage assignment** is based on recipe type and configuration:

| Stage | Criteria |
|-------|----------|
| `sft` / `peft` | No `checkpoint_robustness` |
| `sft_ckpt_robustness` / `peft_ckpt_robustness` | Has `checkpoint_robustness` |
| `sft_vllm_deploy` / `peft_vllm_deploy` | Has `vllm_deploy: true` |
| `benchmark` | Filename contains `benchmark` |

SFT vs. PEFT is determined by whether `peft` appears in the recipe filename.
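
The rules in the table above can be sketched as a small function. The name `assign_stage` and the precedence between `vllm_deploy` and `checkpoint_robustness` are assumptions for illustration, not the script's actual API:

```python
def assign_stage(filename: str, ci: dict) -> str:
    """Map a recipe filename + its ci: section to a CI stage name."""
    if "benchmark" in filename:
        return "benchmark"
    # SFT vs. PEFT is decided by the recipe filename.
    prefix = "peft" if "peft" in filename else "sft"
    # Precedence between the deploy and robustness stages is assumed here.
    if ci.get("vllm_deploy"):
        return f"{prefix}_vllm_deploy"
    if "checkpoint_robustness" in ci:
        return f"{prefix}_ckpt_robustness"
    return prefix

print(assign_stage("llama_peft.yaml", {"checkpoint_robustness": {}}))  # -> peft_ckpt_robustness
```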

## Recipe CI Configuration

Each recipe YAML under `examples/` has an optional `ci:` section:

```yaml
ci:
  recipe_owner: username        # Required. Maintainer's handle
  time: "00:25:00"              # Required. SLURM wall time (HH:MM:SS)
  nodes: 2                      # Optional. SLURM node count (default: 1)
  node_multiplier: true         # Optional. Dynamic node scaling
  local_batch_size: 2           # Optional. Override batch size for CI
  vllm_deploy: true             # Optional. Enable vLLM deployment test
  checkpoint_robustness:        # Optional. Enable robustness testing
    hf_kl_threshold: 1e-3
    tokenizer_name: org/model
    no_check_resume: true       # Skip phase 5 (training resumption)
    # See the checkpoint robustness section for all options
```
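
As a sketch of what validating this section might look like (field names come from the example above; the checks themselves are an assumption, not the pipeline's code):

```python
import re

def validate_ci(ci: dict) -> list[str]:
    """Return a list of problems with a recipe's ci: section (empty if valid)."""
    errors = []
    # recipe_owner and time are the two required fields.
    for key in ("recipe_owner", "time"):
        if key not in ci:
            errors.append(f"missing required field: {key}")
    # SLURM wall time must be HH:MM:SS.
    if "time" in ci and not re.fullmatch(r"\d{2}:\d{2}:\d{2}", str(ci["time"])):
        errors.append("time must be HH:MM:SS")
    return errors

print(validate_ci({"recipe_owner": "username", "time": "00:25:00"}))  # -> []
```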

## Checkpoint Robustness

When `checkpoint_robustness` is present, the robustness test runs after the finetune under the same SLURM allocation. It trains for 5 steps, saves a checkpoint, then validates the checkpoint through five phases:

1. **Reference logits** -- Capture logits before teardown
2. **AutoModel reload** -- Reload from the consolidated checkpoint, verify KL = 0
3. **HF reload** -- Load into vanilla `transformers`/`peft`, verify KL below `hf_kl_threshold`
4. **Cross-TP** (optional) -- Reload with a different `tp_size`
5. **Training resumption** (on by default) -- Baseline + resumed run, verify loss continuity

Phase 5 is the most expensive (two additional training passes). Use `no_check_resume: true` to skip it.

`ci.time` must cover both the finetune and the robustness test. Estimated overhead:
- ~30% with `no_check_resume: true`
- ~50-60% with the resumption check (default)
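
The KL comparison in phases 2-3 reduces to computing the divergence between reference and reloaded-model output distributions and checking it against a threshold such as `hf_kl_threshold`. A pure-Python stand-in for a single logit vector (illustrative only; the real test operates on full model outputs):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kl_divergence(ref_logits, new_logits):
    """KL(p || q) between the softmax distributions of two logit vectors."""
    p, q = softmax(ref_logits), softmax(new_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

ref = [2.0, 1.0, 0.1]
assert kl_divergence(ref, ref) == 0.0                  # AutoModel reload: exact match
assert kl_divergence(ref, [2.0, 1.0, 0.1001]) < 1e-3   # HF reload: below hf_kl_threshold
```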

## How To

### Add a New Recipe to Nightly

1. Create the recipe YAML under `examples/{test_folder}/{model_family}/`
2. Add a `ci:` section with `recipe_owner` and `time`
3. Add the path to `configs/{test_folder}/nightly_recipes.yml`

### Enable Checkpoint Robustness

1. Add `checkpoint_robustness:` under `ci:` with at least `hf_kl_threshold` and `tokenizer_name`
2. Increase `ci.time` per the guidelines below
3. For large models, consider `no_check_resume: true`

### Enable vLLM Deploy

1. Add `vllm_deploy: true` under `ci:`
2. Robustness must also be enabled (the vLLM test loads from the robustness checkpoint)

### Add a New Test Folder

1. Create `examples/{new_folder}/` with recipe YAMLs
2. Create `configs/{new_folder}/` with `nightly_recipes.yml`, `convergence_recipes.yml`, and `override_recipes.yml`
3. Create `golden_values/{new_folder}/`
4. Add a CI job template for the new folder in the CI template file
5. Verify with `generate_ci_tests.py --test-folder {new_folder} --scope nightly`
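
Steps 1-3 are plain directory scaffolding; a sketch with `my_folder` as a placeholder name, following the layout in Directory Structure:

```shell
# Scaffold a hypothetical test folder "my_folder" (placeholder name;
# paths follow the directory layout shown above).
mkdir -p examples/my_folder
mkdir -p golden_values/my_folder
mkdir -p configs/my_folder
touch configs/my_folder/nightly_recipes.yml \
      configs/my_folder/convergence_recipes.yml \
      configs/my_folder/override_recipes.yml
```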

### Exempt a Recipe

Edit `configs/{test_folder}/override_recipes.yml`:

```yaml
exempt_models:
  - model_family            # Skips all recipes under this folder

exempt_configs:
  config_stem:
    reason: "Description, PIC: @owner, issue#"

known_issue:
  - config_stem             # allow_failure instead of blocking
```
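
One way to read these semantics, as a sketch (the function and the policy strings are illustrative assumptions; exempted recipes are skipped, known issues run with `allow_failure`):

```python
def job_policy(model_family: str, config_stem: str, overrides: dict) -> str:
    """Decide how a recipe's CI job is treated under override_recipes.yml."""
    if model_family in overrides.get("exempt_models", []):
        return "skip"           # whole model folder exempted
    if config_stem in overrides.get("exempt_configs", {}):
        return "skip"           # individual config exempted, with a reason
    if config_stem in overrides.get("known_issue", []):
        return "allow_failure"  # runs, but a failure does not block
    return "run"

overrides = {"exempt_models": ["model_family"], "known_issue": ["flaky_config"]}
print(job_policy("other_family", "flaky_config", overrides))  # -> allow_failure
```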

## Time Allocation Guidelines

`ci.time` covers the entire SLURM job: finetune, robustness (if enabled), model downloads, setup, and teardown.

| Model Size | Finetune Only | Robustness (`no_check_resume`) | Robustness (full) |
|------------|---------------|--------------------------------|-------------------|
| < 2B | 10 min | 15 min | 15 min |
| 2-5B | 12 min | 15 min | 20 min |
| 5-10B | 18 min | 25 min | 25-30 min |
| 10-20B | 22 min | 30 min | 35 min |
| 20-50B | 35 min | 45 min | 45 min |
| 50B+ | 50 min | 60 min | 60 min |

MoE models, multi-node jobs, and convergence scope (auto 2x) may need additional time. vLLM deploy runs as a separate job and does not consume finetune time.
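
A rough budgeting helper based on the overhead estimates above (~30% with `no_check_resume`, ~50-60% with the resumption check); the function and its use of the upper ends of those ranges are illustrative assumptions, not pipeline code:

```python
def estimate_ci_minutes(finetune_minutes: int, robustness: bool, check_resume: bool = True) -> int:
    """Return a rough wall-time budget in minutes for ci.time."""
    if not robustness:
        return finetune_minutes
    # Upper ends of the estimated overheads: 30% without the resumption
    # check, 60% with it.
    overhead_pct = 60 if check_resume else 30
    return finetune_minutes * (100 + overhead_pct) // 100

print(estimate_ci_minutes(10, robustness=True, check_resume=False))  # -> 13
```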