Misc. bug: llama.cpp-deepseek-v4-flash aborts on launch when -np > 1 #9

Description

@kstjohn1

Name and Version

Mac-Pro:bin admin$ ./llama-server --version
ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.012 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name: MTL0 (Apple M3 Max)
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9 (1009)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: has tensor = false
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 126751.87 MB
version: 8927 (2f2d440)
built with AppleClang 21.0.0.21000099 for Darwin arm64

Operating systems

Mac

Which llama.cpp modules do you know to be affected?

llama-server

Command line

./llama-server -m ~/Downloads/models/DeepSeek-V4-Flash-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2.gguf -c 524288 -np 2

Problem description & steps to reproduce

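Starting llama-server with -np greater than 1 (for example, -np 2 as in the command above) aborts during startup with the GGML_ASSERT failure shown in the log output. I used the model to find the issue and will submit a PR after a bit more testing.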

First Bad Commit

No response

Relevant log output

 /Users/admin/Downloads/llama.cpp-deepseek-v4-flash/ggml/src/ggml.c:3643: GGML_ASSERT(ggml_nelements(a) == ne0*ne1*ne2) failed
    WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info.
    See: https://github.com/ggml-org/llama.cpp/pull/17869
    0   libggml-base.0.10.0.dylib           0x000000010509d3d0 ggml_print_backtrace + 276
    1   libggml-base.0.10.0.dylib           0x00000001051080bc ggml_abort + 156
    2   libggml-base.0.10.0.dylib           0x0000000105108b50 ggml_reshape_4d.cold.1 + 0
    3   libggml-base.0.10.0.dylib           0x00000001050a45c4 ggml_reshape_3d + 312
    4   libllama.0.0.8927.dylib             0x000000010581d440 _ZN19llm_build_deepseek4C2ERK11llama_modelRK16llm_graph_params + 1748
    5   libllama.0.0.8927.dylib             0x00000001057c82d0 _ZNSt3__111make_uniqueB9nqe210106I19llm_build_deepseek4JRK11llama_modelRK16llm_graph_paramsELi0EEENS_10unique_ptrIT_NS_14default_deleteIS9_EEEEDpOT0_ + 52
    6   libllama.0.0.8927.dylib             0x00000001057c69ac _ZNK11llama_model11build_graphERK16llm_graph_params + 1816
    7   libllama.0.0.8927.dylib             0x000000010570448c _ZN13llama_context13graph_reserveEjjjPK22llama_memory_context_ibPmi + 776
    8   libllama.0.0.8927.dylib             0x0000000105702dd4 _ZN13llama_context13sched_reserveEv + 616
    9   libllama.0.0.8927.dylib             0x0000000105701ccc _ZN13llama_contextC2ERK11llama_model20llama_context_params + 4196
    10  libllama.0.0.8927.dylib             0x000000010570ba48 llama_init_from_model + 600
    11  libllama-common.0.0.8927.dylib      0x0000000105320548 _ZL29common_get_device_memory_dataPKcPK18llama_model_paramsPK20llama_context_paramsRNSt3__16vectorIP19ggml_backend_deviceNS7_9allocatorISA_EEEERjSF_SF_14ggml_log_level + 232
    12  libllama-common.0.0.8927.dylib      0x000000010531ad88 _ZL22common_params_fit_implPKcP18llama_model_paramsP20llama_context_paramsPfP32llama_model_tensor_buft_overridePmj14ggml_log_level + 180
    18  dyld                                0x000000018dcc7da4 start + 6992
    Abort trap: 6
