Commit 2402f69

WOODchen7 and woodchenwu authored
add qwen2.5 code 14B dynamic config (#56)
Co-authored-by: woodchenwu <woodchenwu@tencent.com>
1 parent 73b9fec commit 2402f69

2 files changed

Lines changed: 27 additions & 1 deletion

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+# Global configuration of pipeline
+global:
+  save_path: ./output
+
+# Simplified Configuration for LLM compression
+model:
+  name: Qwen
+  model_path: Qwen/Qwen2.5-14B-Instruct
+  trust_remote_code: true
+  low_cpu_mem_usage: true
+  use_cache: false
+  torch_dtype: auto
+  device_map: auto
+
+# Compression configuration
+compression:
+  name: PTQ
+  quantization:
+    name: fp8_dynamic    # Supported: fp8_static, fp8_dynamic, int4_awq, int4_gptq
+    bits: 8              # Quantization bits (4/8)
+    quant_method:
+      weight: "per-tensor"
+      activation: "per-tensor"
+    ignore_layers:       # Skip quantization for these layers
+      - "lm_head"
+      - "model.embed_tokens"
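For readers unfamiliar with the `per-tensor` and `ignore_layers` settings above, here is a minimal pure-Python sketch of what per-tensor FP8 quantization with a dynamically computed scale usually means. It is not this repository's implementation. It assumes the E4M3 format (maximum finite value 448) and assumes `ignore_layers` entries are matched as substrings of layer names; the helper names `round_to_e4m3`, `quantize_per_tensor`, `dequantize`, and `is_ignored` are hypothetical.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite E4M3 value (assumed target format)


def round_to_e4m3(v: float) -> float:
    """Round v to the nearest E4M3-representable value, saturating at +/-448."""
    if v == 0.0:
        return 0.0
    sign = -1.0 if v < 0 else 1.0
    a = min(abs(v), FP8_E4M3_MAX)
    # frexp returns a = m * 2**e with 0.5 <= m < 1, so floor(log2(a)) == e - 1.
    # Exponents below -6 fall into the subnormal range, which shares one step size.
    exp = max(math.frexp(a)[1] - 1, -6)
    step = 2.0 ** (exp - 3)  # 3 mantissa bits -> 8 steps per power of two
    return sign * min(round(a / step) * step, FP8_E4M3_MAX)


def quantize_per_tensor(values):
    """Quantize a flat list of floats with a single scale for the whole tensor.

    The scale is derived from the tensor itself at call time, which is the
    usual meaning of "dynamic" scaling (an assumption about fp8_dynamic here).
    """
    amax = max(abs(v) for v in values)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) for v in values], scale


def dequantize(quantized, scale):
    """Map quantized values back to the original range."""
    return [q * scale for q in quantized]


def is_ignored(layer_name, ignore_layers):
    """Hypothetical matching rule: skip a layer if any entry is a substring."""
    return any(pattern in layer_name for pattern in ignore_layers)


ignore = ["lm_head", "model.embed_tokens"]
q, scale = quantize_per_tensor([0.5, -2.0, 1.0])
print(q)                                                       # [112.0, -448.0, 224.0]
print(is_ignored("model.embed_tokens", ignore))                # True
print(is_ignored("model.layers.0.self_attn.q_proj", ignore))   # False
```

With one scale per tensor, the largest-magnitude element maps to the edge of the FP8 range and every other element shares that scale, so a single outlier reduces the resolution available to the rest of the tensor.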

configs/qwen2_5/fp8_static/qwen2_5-14b_instruct_fp8_static.yaml

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ global:
 # Simplified Configuration for LLM compression
 model:
   name: Qwen
-  model_path: Qwen/Qwen2.5-Coder-14B-Instruct
+  model_path: Qwen/Qwen2.5-14B-Instruct
   trust_remote_code: true
   low_cpu_mem_usage: true
   use_cache: false
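The one-line change above appears to bring `model_path` in line with the config's file name, which mentions `14b_instruct` but not `coder`. A mismatch like this could be caught with a small lint step. The sketch below is a hypothetical check, not part of this repository: `normalize` and `path_matches_model` are illustrative names, and the rule (the model's repo name must appear in the file name once punctuation and case are ignored) is an assumption.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def path_matches_model(config_filename: str, model_path: str) -> bool:
    """Return True if the model's repo name appears in the config file name."""
    model_name = model_path.split("/")[-1]         # e.g. "Qwen2.5-14B-Instruct"
    stem = config_filename.rsplit("/", 1)[-1]      # file name without directories
    return normalize(model_name) in normalize(stem)


config_file = "configs/qwen2_5/fp8_static/qwen2_5-14b_instruct_fp8_static.yaml"
print(path_matches_model(config_file, "Qwen/Qwen2.5-Coder-14B-Instruct"))  # False
print(path_matches_model(config_file, "Qwen/Qwen2.5-14B-Instruct"))        # True
```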
