Commit d99045f
llama: add DeepSeek V4 Flash model + architecture (CPU)
DeepSeek-V4-Flash model: graph (src/models/deepseek4.cpp), arch /
hparams / model-loader wiring, the dsv4_* compressed-KV extension to
llama_memory_hybrid_iswa, GGUF conversion (conversion/deepseek.py +
constants/writer keys), and the V4 chat template. Standard build_attn_mha
attention path; no DeepSeek Sparse Attention. Exercises the DSV4 ops
from the preceding commit so they are testable end-to-end on CPU.
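The message says attention takes the standard dense `build_attn_mha` path rather than DeepSeek Sparse Attention. As a rough illustration of what "dense" means, here is a single-head causal scaled dot-product attention in plain C++ in which every query scores every earlier key, with no top-k or sparse key selection. This is a minimal sketch under stated assumptions, not llama.cpp's implementation. The names `Mat` and `dense_causal_attention` are hypothetical. Real ggml graphs work on batched multi-head tensors and read keys and values from the KV cache, and none of that is modeled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix type: one inner vector per token.
using Mat = std::vector<std::vector<float>>;

// Dense causal attention for one head (illustrative only, not ggml):
//   out[i] = sum_{j <= i} softmax_j(q[i] . k[j] / sqrt(d)) * v[j]
// Every query attends to all keys at positions <= i; nothing is pruned.
Mat dense_causal_attention(const Mat& q, const Mat& k, const Mat& v) {
    const std::size_t n = q.size();
    // Assumption: self-attention over one sequence, so K and V cover every query position.
    assert(k.size() >= n && v.size() >= n);
    const std::size_t d = n > 0 ? q[0].size() : 0;
    const std::size_t dv = n > 0 ? v[0].size() : 0;
    const float scale = d > 0 ? 1.0f / std::sqrt(static_cast<float>(d)) : 1.0f;

    Mat out(n, std::vector<float>(dv, 0.0f));
    for (std::size_t i = 0; i < n; ++i) {
        // Scaled scores against every key up to and including position i (causal mask).
        std::vector<float> s(i + 1);
        for (std::size_t j = 0; j <= i; ++j) {
            float dot = 0.0f;
            for (std::size_t c = 0; c < d; ++c) dot += q[i][c] * k[j][c];
            s[j] = dot * scale;
        }
        // Numerically stable softmax: subtract the row maximum before exponentiating.
        const float m = *std::max_element(s.begin(), s.end());
        float sum = 0.0f;
        for (float& x : s) {
            x = std::exp(x - m);
            sum += x;
        }
        // Output row is the softmax-weighted sum of the value rows.
        for (std::size_t j = 0; j <= i; ++j) {
            const float w = s[j] / sum;
            for (std::size_t c = 0; c < dv; ++c) out[i][c] += w * v[j][c];
        }
    }
    return out;
}
```

A sparse scheme would shrink the inner `j` loop to a selected subset of keys. The dense version keeps the full causal range, so its cost grows linearly with context length per query.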
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1 parent 9811b19
27 files changed
Lines changed: 3596 additions & 63 deletions
File tree
- common
- conversion
- gguf-py/gguf
- models/templates
- src
  - models
- tools/imatrix