Commit 8f447ca

fix dsv32 tokenizer for transformers 5.8.0
1 parent 466651c commit 8f447ca

3 files changed

Lines changed: 37 additions & 7 deletions

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+"""Make HuggingFace transformers recognize the ``deepseek_v32`` model_type.
+
+DeepSeek-V3.2 ships ``config.json`` with ``model_type="deepseek_v32"``, which
+transformers (>=5.x) does not know. ``AutoTokenizer``/``AutoConfig`` then fall
+back to the base ``PreTrainedConfig`` and crash during RoPE standardization
+(``'PreTrainedConfig' object has no attribute 'max_position_embeddings'``).
+
+V3.2 is architecturally a V3 variant, so we alias its config to
+``DeepseekV3Config``. lightllm uses its own model implementation and reads
+``config.json`` directly; this registration only fixes loading the HF tokenizer
+through ``AutoTokenizer`` (see ``lightllm/server/tokenizer.py``).
+"""
+from transformers import AutoConfig
+
+try:
+    from transformers.models.deepseek_v3.configuration_deepseek_v3 import DeepseekV3Config
+
+    class DeepseekV32Config(DeepseekV3Config):
+        model_type = "deepseek_v32"
+
+    AutoConfig.register("deepseek_v32", DeepseekV32Config, exist_ok=True)
+except Exception:
+    # Older transformers without deepseek_v3, or a build that already
+    # supports deepseek_v32 natively. Nothing to do in either case.
+    pass
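
A note on why the alias is a subclass rather than a direct registration of `DeepseekV3Config`: to my understanding, `AutoConfig.register` rejects a config class whose `model_type` attribute differs from the key it is registered under, so the alias needs its own `model_type = "deepseek_v32"`. The sketch below reproduces that pattern with a toy registry. `ToyConfig`, `ToyV3Config`, `ToyV32Config`, `register`, and `config_class_for` are hypothetical names made up for illustration; the registry is a simplified stand-in, not transformers code.

```python
"""Toy stand-in for the alias-by-subclass registration pattern (not transformers code)."""


class ToyConfig:
    """Hypothetical bare base class; it carries no model-specific fields."""

    model_type = ""


class ToyV3Config(ToyConfig):
    """Hypothetical known config, standing in for DeepseekV3Config."""

    model_type = "deepseek_v3"
    max_position_embeddings = 4096  # illustrative value only


# Maps a model_type string to the config class that handles it.
_REGISTRY = {"deepseek_v3": ToyV3Config}


def register(model_type, config_cls, exist_ok=False):
    """Register config_cls under model_type; the class must declare the same type."""
    if config_cls.model_type != model_type:
        raise ValueError(
            f"config declares {config_cls.model_type!r}, but key is {model_type!r}"
        )
    if model_type in _REGISTRY and not exist_ok:
        raise ValueError(f"{model_type!r} is already registered")
    _REGISTRY[model_type] = config_cls


def config_class_for(model_type):
    """Return the registered class, or the bare base class for unknown types."""
    return _REGISTRY.get(model_type, ToyConfig)


class ToyV32Config(ToyV3Config):
    """Alias: inherits every V3 field, only the model_type string differs."""

    model_type = "deepseek_v32"


# Import-time side effect, mirroring the module in this commit.
register("deepseek_v32", ToyV32Config, exist_ok=True)
```

With the alias registered, a lookup for `"deepseek_v32"` resolves to a class that has `max_position_embeddings`, which is the attribute the commit's docstring says was missing on the fallback config.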

lightllm/server/tokenizer.py

Lines changed: 1 addition & 0 deletions
@@ -33,6 +33,7 @@
 from ..models.gemma3.model import Gemma3Tokenizer
 from ..models.gemma4.tokenizer import Gemma4Tokenizer
 from ..models.qwen3_omni_moe_thinker.model import QWen3OmniTokenizer
+from ..models import deepseek3_2  # noqa: F401  # registers the deepseek_v32 config with transformers

 # A fast LLaMA tokenizer with the pre-processed `tokenizer.json` file.
 _FAST_LLAMA_TOKENIZER = "hf-internal-testing/llama-tokenizer"

skills/test_model/deepseekv32-ep/SKILL.md

Lines changed: 11 additions & 7 deletions
@@ -4,9 +4,10 @@ description: >-
   Runs LightLLM DeepSeek-V3.2 EP MoE gsm8k: api_server with --tp 8 --dp 8 --enable_ep_moe,
   tool_call_parser deepseekv32, reasoning_parser deepseek-v3, graph_max_batch_size 32,
   mem_fraction 0.8, LOADWORKER 14, port 8000 aligned with lm_eval base_url. Requires a
-  dedicated log directory, api_server and eval logs, summary.txt consolidated report,
-  tokenizer aligned with MODEL_DIR. Distinct from R1 MTP/Base flows. Use for V3.2 EP MoE
-  gsm8k accuracy on LightLLM.
+  dedicated log directory, api_server and eval logs, summary.txt consolidated report.
+  lm_eval uses tokenizer_backend=null (server-side tokenization) because local
+  transformers does not recognize model_type deepseek_v32. Distinct from R1 MTP/Base
+  flows. Use for V3.2 EP MoE gsm8k accuracy on LightLLM.
 ---

 # DeepSeek-V3.2 **EP** (`--tp 8`, `--dp 8`, `--enable_ep_moe`) local GSM8K evaluation
@@ -64,7 +65,7 @@ nohup python -m lightllm.server.api_server \
   --model_dir "${MODEL_DIR}" --tp 8 \
   --graph_max_batch_size 32 \
   --tool_call_parser deepseekv32 \
-  --mem_fraction 0.6 \
+  --mem_fraction 0.8 \
   --reasoning_parser deepseek-v3 \
   --dp 8 --enable_ep_moe \
   --port 8000 \
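@@ -73,19 +74,22 @@ nohup python -m lightllm.server.api_server \

 ### Evaluation command (run once after the service is ready)

-Run after the service is ready (if loopback traffic goes through a proxy, use `no_proxy` / `NO_PROXY` to exclude the local host). **The `tokenizer` in `model_args` must be the same path string as this server's `--model_dir` (i.e. `${MODEL_DIR}`).** **The port in `base_url` must be `8000`, matching the `--port` of `api_server`.** The **full command**, with logs written to disk, follows; `--model_args` is double-quoted so that `${MODEL_DIR}` expands:
+Run after the service is ready (if loopback traffic goes through a proxy, use `no_proxy` / `NO_PROXY` to exclude the local host). **The port in `base_url` must be `8000`, matching the `--port` of `api_server`.** The **full command**, with logs written to disk, follows:

 ```bash
 HF_ALLOW_CODE_EVAL=1 HF_DATASETS_OFFLINE=0 \
 no_proxy=127.0.0.1,localhost,::1 \
 lm_eval --model local-completions \
-  --model_args "{\"model\":\"deepseek-ai/DeepSeek-V3.2\", \"base_url\":\"http://localhost:8000/v1/completions\", \"max_length\": 16384, \"tokenizer\":\"${MODEL_DIR}\"}" \
+  --model_args '{"model":"deepseek-ai/DeepSeek-V3.2", "base_url":"http://localhost:8000/v1/completions", "tokenizer_backend":null, "eos_string":"<|end▁of▁sentence|>"}' \
   --tasks gsm8k --batch_size 500 --confirm_run_unsafe_code \
   >> "${LOG_DIR}/eval_gsm8k.log" 2>&1
 ```

+> **Why `tokenizer_backend=null` instead of `tokenizer=${MODEL_DIR}`:** by default, `local-completions` loads the HF tokenizer locally with `transformers.AutoTokenizer.from_pretrained(${MODEL_DIR})`, but the transformers in the current environment **does not recognize `model_type: deepseek_v32`** (`KeyError: 'deepseek_v32'` → rope `AttributeError`), so the evaluation crashes while loading the tokenizer and never reaches inference. With **`tokenizer_backend=null`**, lm_eval no longer loads a tokenizer locally and sends the **prompt text** straight to the server, where lightllm tokenizes it with the real deepseek_v32 tokenizer; this is closer to real usage and needs no local HF adaptation. `eos_string` explicitly supplies DeepSeek's end-of-sequence marker to silence the "Cannot determine EOS string" warning (gsm8k also carries its own stop sequences). `tokenized_requests` is turned off automatically and context-length checks are no longer performed (the gsm8k 5-shot prompt is short and needs no truncation).
+> If a future transformers upgrade recognizes `deepseek_v32`, you can switch back to the `"tokenizer":"${MODEL_DIR}"` form (the tokenizer must then be the same path as `--model_dir`).
+
 - **`LOG_DIR`**: same as in the service launch section; to debug without redirection, remove the trailing `\` continuation and the final `>> "${LOG_DIR}/eval_gsm8k.log" 2>&1` to see the output in the foreground.
-- **`MODEL_DIR`**: must match the `--model_dir` in the server launch command; for the default trial run and confirming with the user when the path varies by environment, see "Execution conventions".
+- **tokenizer**: this command uses `tokenizer_backend=null`, so the evaluation side no longer depends on the HF tokenizer under `MODEL_DIR` (tokenization happens on the server); changes to the `MODEL_DIR` path therefore do not affect the evaluation command. The `--model_dir` in the server launch command is still handled per "Execution conventions".
 - If the environment requires it, also set `NO_PROXY=127.0.0.1,localhost,::1` (or a list agreed with your team).

 ## Execution conventions (do not write an extra "dedicated launch script")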
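
A note on the eval command in this diff: the `--model_args` value is a JSON object, and `tokenizer_backend` has to be a JSON `null`, not the string `"null"`. Below is a minimal sketch of generating that string from Python, assuming the values shown in the eval command; `build_model_args` is a hypothetical helper, not part of lm_eval or lightllm. It also enforces the SKILL.md rule that the `base_url` port matches the server's `--port`.

```python
import json
from urllib.parse import urlsplit

SERVER_PORT = 8000  # matches --port in the api_server launch command
EOS_STRING = "<|end▁of▁sentence|>"  # copied verbatim from the eval command


def build_model_args(base_url, model="deepseek-ai/DeepSeek-V3.2",
                     server_port=SERVER_PORT):
    """Return the JSON string to pass to `lm_eval --model_args` (hypothetical helper)."""
    port = urlsplit(base_url).port
    if port != server_port:
        raise ValueError(
            f"base_url port {port} does not match server port {server_port}"
        )
    args = {
        "model": model,
        "base_url": base_url,
        "tokenizer_backend": None,  # json.dumps serializes None as null
        "eos_string": EOS_STRING,
    }
    # ensure_ascii=False keeps the EOS marker readable instead of \uXXXX escapes.
    return json.dumps(args, ensure_ascii=False)


if __name__ == "__main__":
    print(build_model_args("http://localhost:8000/v1/completions"))
```

The printed string can be passed as one single-quoted shell argument, as in the command in the diff; single quotes keep the shell from touching the double quotes inside the JSON.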
