Commit 9777256
convert: add MiniCPM5 tokenizer support (ggml-org#23384)
Add minicpm5 pre-tokenizer hash via convert_hf_to_gguf_update.py and
implement hardcoded regex handling in llama-vocab.cpp, consistent with
other BPE pre-tokenizers.
Co-authored-by: zhangtao <zhangtao2@modelbest.cn>
1 parent 7085492 commit 9777256
4 files changed
Lines changed: 16 additions & 0 deletions
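To illustrate the "pre-tokenizer hash" mentioned in the commit message, here is a minimal sketch. It assumes the pre-tokenizer is identified by encoding a fixed probe string and hashing the resulting token IDs. The names `toy_encode`, `CHECK_TEXT`, `KNOWN_HASHES`, and `detect_pre_tokenizer` are hypothetical and do not come from this commit; the hash in the table is computed from the toy tokenizer, not the real MiniCPM5 value.

```python
import hashlib

# Hypothetical stand-in for a real tokenizer: splits on whitespace and
# maps each piece to a deterministic integer id.
def toy_encode(text: str) -> list[int]:
    return [sum(piece.encode("utf-8")) for piece in text.split()]

# Hypothetical probe string; an actual conversion script would use its own fixed text.
CHECK_TEXT = "Hello world 123 \t héllo"

def pre_tokenizer_hash(encode, text: str = CHECK_TEXT) -> str:
    """Hash the token ids that `encode` produces for the fixed probe string."""
    token_ids = encode(text)
    return hashlib.sha256(str(token_ids).encode("utf-8")).hexdigest()

# Maps known hashes to pre-tokenizer names. The single entry is derived from
# the toy tokenizer above purely for illustration.
KNOWN_HASHES = {pre_tokenizer_hash(toy_encode): "minicpm5"}

def detect_pre_tokenizer(encode) -> str:
    """Return the pre-tokenizer name for `encode`, or fail if its hash is unknown."""
    chkhsh = pre_tokenizer_hash(encode)
    if chkhsh not in KNOWN_HASHES:
        raise NotImplementedError(f"unrecognized pre-tokenizer hash: {chkhsh}")
    return KNOWN_HASHES[chkhsh]
```

Two tokenizers that split the probe string differently yield different hashes, so a new model family typically needs its own table entry plus matching split rules on the runtime side.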
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| 1625 | 1625 | | |
| 1626 | 1626 | | |
| 1627 | 1627 | | |
| | 1628 | + | |
| | 1629 | + | |
| | 1630 | + | |
| 1628 | 1631 | | |
| 1629 | 1632 | | |
| 1630 | 1633 | | |

| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| 157 | 157 | | |
| 158 | 158 | | |
| 159 | 159 | | |
| | 160 | + | |
| 160 | 161 | | |
| 161 | 162 | | |
| 162 | 163 | | |

| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| 511 | 511 | | |
| 512 | 512 | | |
| 513 | 513 | | |
| | 514 | + | |
| | 515 | + | |
| | 516 | + | |
| | 517 | + | |
| | 518 | + | |
| | 519 | + | |
| | 520 | + | |
| | 521 | + | |
| 514 | 522 | | |
| 515 | 523 | | |
| 516 | 524 | | |
| | | | |
| 2039 | 2047 | | |
| 2040 | 2048 | | |
| 2041 | 2049 | | |
| | 2050 | + | |
| | 2051 | + | |
| | 2052 | + | |
| 2042 | 2053 | | |
| 2043 | 2054 | | |
| 2044 | 2055 | | |

| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| 60 | 60 | | |
| 61 | 61 | | |
| 62 | 62 | | |
| | 63 | + | |
| 63 | 64 | | |
| 64 | 65 | | |
| 65 | 66 | | |