# Using the ModelOpt Launcher for PTQ

The launcher (`tools/launcher/`) handles SLURM and Docker execution. Read `tools/launcher/CLAUDE.md` for the full documentation; this guide covers PTQ-specific usage.

## Quick Start

```bash
cd tools/launcher
uv run launch.py --yaml <config.yaml> --yes                    # SLURM (SLURM_HOST set)
uv run launch.py --yaml <config.yaml> hf_local=<cache> --yes   # Local Docker
```

## HF Transformers PTQ Config

The launcher provides `common/hf_ptq/hf_ptq.sh`, which wraps `hf_ptq.py`. Configure it via environment variables:

```yaml
job_name: <Model>_<Format>
pipeline:
  task_0:
    script: common/hf_ptq/hf_ptq.sh
    environment:
      - HF_MODEL: <HuggingFace model ID, e.g. Qwen/Qwen3-0.6B>
      - QFORMAT: <format, e.g. nvfp4, fp8, int4_awq>
      - CALIB_SIZE: "512"
      - EXPORT_PATH: /scratchspace/exported_model
    slurm_config:
      _factory_: "slurm_factory"
      nodes: 1
      ntasks_per_node: 1
      gpus_per_node: <num_gpus>
```
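
To make the variables concrete, here is an illustrative sketch of what a wrapper like this might do. This is **not** the real `common/hf_ptq/hf_ptq.sh`; the flag names follow the ModelOpt `llm_ptq` example's `hf_ptq.py` but should be verified against your checkout:

```shell
#!/usr/bin/env bash
# Illustrative sketch only -- not the real common/hf_ptq/hf_ptq.sh.
# Shows how the environment variables above might map onto hf_ptq.py flags.
set -euo pipefail
: "${HF_MODEL:=Qwen/Qwen3-0.6B}"                  # example default
: "${QFORMAT:=nvfp4}"
: "${CALIB_SIZE:=512}"
: "${EXPORT_PATH:=/scratchspace/exported_model}"
cmd=(python hf_ptq.py
  --pyt_ckpt_path "$HF_MODEL"
  --qformat "$QFORMAT"
  --calib_size "$CALIB_SIZE"
  --export_path "$EXPORT_PATH")
echo "${cmd[@]}"   # print the assembled command instead of running it
```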

Extra `hf_ptq.py` flags can be passed via `args`:

```yaml
args:
  - --batch_size 2
  - --trust_remote_code
```
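
Placement assumption: `args` sits alongside `script` and `environment` inside the task (check a config under `tools/launcher/examples/` to confirm). A fuller hypothetical task would then look like:

```yaml
task_0:
  script: common/hf_ptq/hf_ptq.sh
  environment:
    - HF_MODEL: Qwen/Qwen3-0.6B   # example model
    - QFORMAT: nvfp4
  args:
    - --batch_size 2
    - --trust_remote_code
```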

## Output Location

`EXPORT_PATH` controls the export path inside the container (default: `/scratchspace/exported_model`). The launcher mounts `/scratchspace` to a host directory automatically — you cannot change the host path.

To find the checkpoint on the host after completion:

```bash
find tools/launcher/local_experiments -name "config.json" -path "*/exported_model/*" 2>/dev/null
```

## SLURM vs Local Docker

| Condition | Mode | Invocation |
| --- | --- | --- |
| `SLURM_HOST` env var set | SLURM | `uv run launch.py --yaml <cfg> --yes` |
| `hf_local=` passed | Local Docker | `uv run launch.py --yaml <cfg> hf_local=<cache> --yes` |

For SLURM, also set `SLURM_ACCOUNT` and optionally `SLURM_HF_LOCAL`.
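
The mode selection in the table can be sketched as follows (a simplification; the real logic in `launch.py` may consult more variables):

```shell
#!/usr/bin/env bash
# Simplified sketch of the launcher's mode choice, based on the table above.
if [[ -n "${SLURM_HOST:-}" ]]; then
  mode="SLURM"          # submit over SSH to $SLURM_HOST
else
  mode="Local Docker"   # requires hf_local=<cache> on the command line
fi
echo "$mode"
```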

## Known Issues

- **UID mapping in Docker**: may cause `getpwuid` failures. Add `USER=user` and `LOGNAME=user` to the task's environment.
- **Megatron-LM submodule**: only needed for `MegatronLMQuantizeTask` (Megatron models). HF PTQ via `common/hf_ptq/hf_ptq.sh` does not require it.
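
For the UID issue, the workaround looks like this in a task's `environment` list (the `HF_MODEL` entry is just an example; the two extra entries are the fix described above):

```yaml
environment:
  - HF_MODEL: Qwen/Qwen3-0.6B   # example; keep your existing entries
  - USER: user                  # workaround for getpwuid failures in Docker
  - LOGNAME: user
```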

## Dry Run

```bash
uv run launch.py --yaml <config> --dryrun --yes -v
```

## Examples

Check `tools/launcher/examples/` for working configs:

```bash
ls tools/launcher/examples/
```

Copy and modify the closest match for your model.