Name and Version
Mac-Pro:bin admin$ ./llama-server --version
ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.012 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name: MTL0 (Apple M3 Max)
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9 (1009)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: has tensor = false
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 126751.87 MB
version: 8927 (2f2d440)
built with AppleClang 21.0.0.21000099 for Darwin arm64
Operating systems
Mac
Which llama.cpp modules do you know to be affected?
llama-server
Command line
./llama-server -m ~/Downloads/models/DeepSeek-V4-Flash-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2.gguf -c 524288 -np 2
Problem description & steps to reproduce
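Starting llama-server with more than one parallel slot (-np 2 in the command above) aborts at startup. The backtrace below shows a failed GGML_ASSERT in ggml_reshape_3d, reached from the llm_build_deepseek4 graph builder while the context reserves its compute graph. I used the model to find the issue and will submit a PR after a bit more testing.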
First Bad Commit
No response
Relevant log output
Logs
/Users/admin/Downloads/llama.cpp-deepseek-v4-flash/ggml/src/ggml.c:3643: GGML_ASSERT(ggml_nelements(a) == ne0*ne1*ne2) failed
WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info.
See: https://github.com/ggml-org/llama.cpp/pull/17869
0 libggml-base.0.10.0.dylib 0x000000010509d3d0 ggml_print_backtrace + 276
1 libggml-base.0.10.0.dylib 0x00000001051080bc ggml_abort + 156
2 libggml-base.0.10.0.dylib 0x0000000105108b50 ggml_reshape_4d.cold.1 + 0
3 libggml-base.0.10.0.dylib 0x00000001050a45c4 ggml_reshape_3d + 312
4 libllama.0.0.8927.dylib 0x000000010581d440 _ZN19llm_build_deepseek4C2ERK11llama_modelRK16llm_graph_params + 1748
5 libllama.0.0.8927.dylib 0x00000001057c82d0 _ZNSt3__111make_uniqueB9nqe210106I19llm_build_deepseek4JRK11llama_modelRK16llm_graph_paramsELi0EEENS_10unique_ptrIT_NS_14default_deleteIS9_EEEEDpOT0_ + 52
6 libllama.0.0.8927.dylib 0x00000001057c69ac _ZNK11llama_model11build_graphERK16llm_graph_params + 1816
7 libllama.0.0.8927.dylib 0x000000010570448c _ZN13llama_context13graph_reserveEjjjPK22llama_memory_context_ibPmi + 776
8 libllama.0.0.8927.dylib 0x0000000105702dd4 _ZN13llama_context13sched_reserveEv + 616
9 libllama.0.0.8927.dylib 0x0000000105701ccc _ZN13llama_contextC2ERK11llama_model20llama_context_params + 4196
10 libllama.0.0.8927.dylib 0x000000010570ba48 llama_init_from_model + 600
11 libllama-common.0.0.8927.dylib 0x0000000105320548 _ZL29common_get_device_memory_dataPKcPK18llama_model_paramsPK20llama_context_paramsRNSt3__16vectorIP19ggml_backend_deviceNS7_9allocatorISA_EEEERjSF_SF_14ggml_log_level + 232
12 libllama-common.0.0.8927.dylib 0x000000010531ad88 _ZL22common_params_fit_implPKcP18llama_model_paramsP20llama_context_paramsPfP32llama_model_tensor_buft_overridePmj14ggml_log_level + 180
18 dyld 0x000000018dcc7da4 start + 6992
Abort trap: 6
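For anyone reading the assertion for the first time: it requires that the source tensor hold exactly ne0*ne1*ne2 elements before it can be viewed as 3D. Below is a minimal Python sketch of that invariant only. It is not ggml's code, the function name check_reshape_3d is hypothetical, and the shapes are made up for illustration; they are not the shapes from this model.

```python
from math import prod


def check_reshape_3d(src_shape, ne0, ne1, ne2):
    """Mimic the element-count check behind
    GGML_ASSERT(ggml_nelements(a) == ne0*ne1*ne2).

    Hypothetical stand-in: returns the target shape on success and
    raises ValueError where ggml would abort.
    """
    nelements = prod(src_shape)
    target = ne0 * ne1 * ne2
    if nelements != target:
        raise ValueError(
            f"reshape mismatch: source has {nelements} elements, "
            f"target {ne0}*{ne1}*{ne2} = {target}"
        )
    return (ne0, ne1, ne2)


# Illustrative shapes only (assumed, not taken from the model).
print(check_reshape_3d((64, 8, 4), 64, 32, 1))  # element counts match

try:
    # Doubling one target dimension without the source growing to match
    # breaks the invariant.
    check_reshape_3d((64, 8, 4), 64, 32, 2)
except ValueError as err:
    print(err)
```

Any target dimension that scales with a runtime parameter while the source tensor does not (or the reverse) would trip the check in this way. Whether that is what happens with -np 2 here is not confirmed by the log alone.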