Name and Version
llama-server --version
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 8150 MiB):
Device 0: NVIDIA GeForce RTX 5060 Laptop GPU, compute capability 12.0, VMM: yes, VRAM: 8150 MiB
version: 8616 (ced5734)
built with MSVC 19.50.35728.0 for Windows AMD64
Operating systems
Windows
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server --grammar-file grammar.gbnf -m qwen_qwen3.5-0.8b-q8_0.gguf
Problem description & steps to reproduce
The server ignores the grammar file passed via the command-line flag, but honors API requests that include a "grammar" field.
Possibly caused by commit
5e54d51
which removed defaults.sampling.grammar from the initialization process (the field now default-initializes to an empty string), so the server appears to apply a grammar only when the "grammar" field is sent through the API.
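A minimal way to contrast the two code paths (a sketch, assuming the default server port 8080 and the /completion endpoint; the grammar below is a made-up example, not the one from my setup):

```shell
# Create a trivial GBNF grammar that only allows "yes" or "no".
cat > grammar.gbnf <<'EOF'
root ::= "yes" | "no"
EOF

# Path 1: pass the grammar file on the command line.
# Expected: output constrained to "yes"/"no". Observed: grammar is ignored.
llama-server --grammar-file grammar.gbnf -m qwen_qwen3.5-0.8b-q8_0.gguf

# Path 2: send the grammar inline in the API request instead.
# Observed: the grammar IS honored on this path.
curl http://localhost:8080/completion -d '{
  "prompt": "Is water wet? Answer: ",
  "grammar": "root ::= \"yes\" | \"no\"",
  "n_predict": 4
}'
```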
First Bad Commit
No response
Relevant log output
No response