Name and Version
Any version after #20503
Operating systems
Mac
GGML backends
Metal
Hardware
All hardware
Models
Possibly all models
Problem description & steps to reproduce
LLM_TN_IMPL::str() now includes a check that verifies whether a given tensor is defined in the current model architecture's model_tensors list (see the sketch after the repro command below).
If a model contains tensors that are not strictly listed in the architecture definition, such as position_embd or token_types, the console is spammed with many messages like:
str: cannot properly format tensor name position_embd with suffix=weight bid=-1 xid=-1
str: cannot properly format tensor name token_types with suffix=weight bid=-1 xid=-1
To reproduce, simply quantize a model:
llama-quantize Qwen3.5-9B-F16.gguf Model-Q4_K.gguf Q4_K 12
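For context, here is a minimal self-contained sketch of the kind of lookup involved. This is illustrative only, not the actual llama.cpp implementation: the enum values, the model_tensors map, and format_name are placeholders standing in for the real per-architecture tensor tables that LLM_TN_IMPL::str() consults.

#include <cstdio>
#include <map>
#include <string>

enum llm_tensor_t { TENSOR_TOKEN_EMBD, TENSOR_POS_EMBD, TENSOR_TOKEN_TYPES };

// Hypothetical per-architecture tensor list; a real architecture entry
// maps many more tensors.
static const std::map<llm_tensor_t, const char *> model_tensors = {
    { TENSOR_TOKEN_EMBD, "token_embd" },
    // position_embd and token_types are intentionally absent, mirroring an
    // architecture definition that does not declare them.
};

static std::string format_name(llm_tensor_t t, const char * raw, const char * suffix, int bid, int xid) {
    auto it = model_tensors.find(t);
    if (it == model_tensors.end()) {
        // The branch this issue is about: the tensor exists in the GGUF file
        // but not in the architecture's list, so every lookup logs a line.
        std::printf("str: cannot properly format tensor name %s with suffix=%s bid=%d xid=%d\n",
                raw, suffix, bid, xid);
        return "__missing__";
    }
    return std::string(it->second) + "." + suffix;
}

int main() {
    std::printf("%s\n", format_name(TENSOR_TOKEN_EMBD, "token_embd",    "weight", -1, -1).c_str());
    std::printf("%s\n", format_name(TENSOR_POS_EMBD,   "position_embd", "weight", -1, -1).c_str());
    return 0;
}

Since the lookup runs once per tensor reference, any tensor missing from the list produces the log line repeatedly during quantization, which is the spam shown in the log output below.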
First Bad Commit
#20503
Relevant log output
llama-quantize Qwen3.5-9B-F16.gguf Model-Q4_K.gguf Q4_K 12
main: build = 8563 (1f5d15e66)
main: built with AppleClang 17.0.0.17000604 for Darwin arm64
main: quantizing 'Qwen3.5-9B-F16.gguf' to 'Model-Q4_K.gguf' as Q4_K using 12 threads
llama_model_loader: loaded meta data with 41 key-value pairs and 427 tensors from Qwen3.5-9B-F16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen35
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Qwen3.5 9B
llama_model_loader: - kv 3: general.basename str = Qwen3.5
llama_model_loader: - kv 4: general.size_label str = 9B
llama_model_loader: - kv 5: general.license str = apache-2.0
llama_model_loader: - kv 6: general.license.link str = https://huggingface.co/Qwen/Qwen3.5-9...
llama_model_loader: - kv 7: general.base_model.count u32 = 1
llama_model_loader: - kv 8: general.base_model.0.name str = Qwen3.5 9B Base
llama_model_loader: - kv 9: general.base_model.0.organization str = Qwen
llama_model_loader: - kv 10: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen3.5-9...
llama_model_loader: - kv 11: general.tags arr[str,1] = ["image-text-to-text"]
llama_model_loader: - kv 12: qwen35.block_count u32 = 32
llama_model_loader: - kv 13: qwen35.context_length u32 = 262144
llama_model_loader: - kv 14: qwen35.embedding_length u32 = 4096
llama_model_loader: - kv 15: qwen35.feed_forward_length u32 = 12288
llama_model_loader: - kv 16: qwen35.attention.head_count u32 = 16
llama_model_loader: - kv 17: qwen35.attention.head_count_kv u32 = 4
llama_model_loader: - kv 18: qwen35.rope.dimension_sections arr[i32,4] = [11, 11, 10, 0]
llama_model_loader: - kv 19: qwen35.rope.freq_base f32 = 10000000.000000
llama_model_loader: - kv 20: qwen35.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 21: qwen35.attention.key_length u32 = 256
llama_model_loader: - kv 22: qwen35.attention.value_length u32 = 256
llama_model_loader: - kv 23: general.file_type u32 = 1
llama_model_loader: - kv 24: qwen35.ssm.conv_kernel u32 = 4
llama_model_loader: - kv 25: qwen35.ssm.state_size u32 = 128
llama_model_loader: - kv 26: qwen35.ssm.group_count u32 = 16
llama_model_loader: - kv 27: qwen35.ssm.time_step_rank u32 = 32
llama_model_loader: - kv 28: qwen35.ssm.inner_size u32 = 4096
llama_model_loader: - kv 29: qwen35.full_attention_interval u32 = 4
llama_model_loader: - kv 30: qwen35.rope.dimension_count u32 = 64
llama_model_loader: - kv 31: general.quantization_version u32 = 2
llama_model_loader: - kv 32: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 33: tokenizer.ggml.pre str = qwen35
llama_model_loader: - kv 34: tokenizer.ggml.tokens arr[str,248320] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 35: tokenizer.ggml.token_type arr[i32,248320] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 36: tokenizer.ggml.merges arr[str,247587] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 37: tokenizer.ggml.eos_token_id u32 = 248046
llama_model_loader: - kv 38: tokenizer.ggml.padding_token_id u32 = 248044
llama_model_loader: - kv 39: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 40: tokenizer.chat_template str = {%- set image_count = namespace(value...
llama_model_loader: - type f32: 177 tensors
llama_model_loader: - type f16: 250 tensors
str: cannot properly format tensor name position_embd with suffix=weight bid=-1 xid=-1
str: cannot properly format tensor name token_types with suffix=weight bid=-1 xid=-1
str: cannot properly format tensor name position_embd with suffix=weight bid=-1 xid=-1
str: cannot properly format tensor name token_types with suffix=weight bid=-1 xid=-1
str: cannot properly format tensor name position_embd with suffix=weight bid=-1 xid=-1
...