Commit 9fd6c8d

[FIX] weight_only_looper did not support multi-GPU quantization. (#2915)
* nested_move_to() now supports dict
* Revert erroneous modification to thread number.
* WeightOnlyLooper supports multi-GPU and multi-threading acceleration for quantization
* Add progress bar to the `submodule finalize`

Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai>
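The first bullet says nested_move_to() gained dict support, but this page does not show that part of the change. As a rough illustration only, the sketch below shows one way a recursive move over dicts, lists, and tuples could work. The name `nested_move_to_sketch`, its signature, and the `FakeTensor` stand-in are hypothetical and are not taken from the gptqmodel source.

```python
from typing import Any


def nested_move_to_sketch(value: Any, device: str) -> Any:
    """Hypothetical helper: move every object that has a `.to(device)` method."""
    if callable(getattr(value, "to", None)):
        return value.to(device)
    if isinstance(value, dict):
        # Dict support: keep the keys, recurse into the values.
        return {k: nested_move_to_sketch(v, device) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        # Rebuild the same container type (plain list/tuple only, not namedtuple).
        return type(value)(nested_move_to_sketch(v, device) for v in value)
    return value  # ints, strings, None, etc. pass through unchanged


class FakeTensor:
    """Stand-in for a tensor so the sketch runs without torch installed."""

    def __init__(self, device: str = "cpu"):
        self.device = device

    def to(self, device: str) -> "FakeTensor":
        return FakeTensor(device)


batch = {"input_ids": FakeTensor(), "cache": [FakeTensor(), (FakeTensor(), 3)]}
moved = nested_move_to_sketch(batch, "cuda:0")
print(moved["input_ids"].device, moved["cache"][1][0].device)
```

Returning new containers rather than mutating in place keeps the original batch untouched, which may or may not match what the real helper does.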
1 parent 7b27707 commit 9fd6c8d

5 files changed

Lines changed: 804 additions & 31 deletions

gptqmodel/__init__.py

Lines changed: 4 additions & 4 deletions
@@ -176,12 +176,12 @@ def _build_device_thread_pool():
             "cpu": WarmupTask(run_torch_linalg_warmup, scope=WarmUpCtx.THREAD_AND_DEVICE),
         },
         workers={
-            "cuda:per": 1,
+            "cuda:per": 4,
             "xpu:per": 1,
             "npu:per": 1,
-            "mps": 1,
-            "cpu": 1, # count + 1, fixed pool size > 1 check when count=3
-            "model_loader:cpu": 1,
+            "mps": 8,
+            "cpu": min(12, max(1, (os.cpu_count() or 1) + 1 // 2)), # count + 1, fixed pool size > 1 check when count=3
+            "model_loader:cpu": 2,
         },
         empty_cache_every_n=512,
     )
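
A note on the new "cpu" worker expression: in Python, `//` binds tighter than `+`, so `1 // 2` evaluates to 0 and the expression as committed reduces to `min(12, os.cpu_count() or 1)`. Whether that is the intended result is not stated in the commit. The sketch below compares the expression as written with a hypothetical parenthesized variant that halves the core count (rounded up); both function names are made up for illustration.

```python
def workers_as_written(cpu_count):
    # Mirrors the diff: `1 // 2` is evaluated first and equals 0.
    return min(12, max(1, (cpu_count or 1) + 1 // 2))


def workers_half_rounded_up(cpu_count):
    # Hypothetical alternative: add 1 first, then floor-divide by 2.
    return min(12, max(1, ((cpu_count or 1) + 1) // 2))


# `None` models os.cpu_count() returning None on platforms where it is unknown.
for n in (None, 1, 3, 8, 32):
    print(n, workers_as_written(n), workers_half_rounded_up(n))
```

On an 8-core machine the committed expression yields 8 workers, while the parenthesized variant would yield 4; both are capped at 12.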
