Pull requests: ggml-org/llama.cpp
- #21544 llama-quant : remove these checks as some arches do not have these tensors (opened Apr 7, 2026 by ownia)
- #21543 fix --grammar-file commandline arg not working (and others) [examples, server] (opened Apr 7, 2026 by AUTOMATIC1111)
- #21542 metal : add CROSS_ENTROPY_LOSS and CROSS_ENTROPY_LOSS_BACK ops [Apple Metal, ggml] (opened Apr 7, 2026 by nuri-yoo)
- #21539 vulkan: Support Q1_0 [ggml, testing, Vulkan] (opened Apr 7, 2026 by jeffbolznv)
- #21537 server : fix json_schema response_format ignored by some chat templates [examples, server] (opened Apr 7, 2026 by wiktoraleksanderkaczor)
- #21535 common: fix split model loading by sorting file list [testing] (opened Apr 6, 2026 by brettp)
- #21534 YATF (Yet Another Tokenizer Fix) for Gemma 4. With tests! [python, testing] (opened Apr 6, 2026 by pwilkin)
- #21533 ggml-webgpu: parameterize submission size and add iOS specific limits [ggml, WebGPU] (opened Apr 6, 2026 by reeselevine)
- #21531 llama: remove per-arch tensor name lists [merge ready] (opened Apr 6, 2026 by JohannesGaessler)
- #21528 metal: Q1_0 backend [Apple Metal, ggml, testing] (opened Apr 6, 2026 by khosravipasha)
- #21522 common : preserve original Gemma 4 tool responses even when JSON-like (opened Apr 6, 2026 by kiwixz)
- #21521 ggml-webgpu: address quantization precision and backend lifecycle management [ggml, testing, WebGPU] (opened Apr 6, 2026 by Constannnnnt)
- #21519 ggml-cuda : fix CDNA2 compute capability constant for gfx90a (MI210) [ggml, Nvidia GPU] (opened Apr 6, 2026 by aviallon)
- #21513 kv-cache : support attention rotation for heterogeneous iSWA (opened Apr 6, 2026 by ggerganov)
- #21510 server : fix restore for checkpoints with pos_min == 0 [examples, server] (opened Apr 6, 2026 by ggerganov)
- #21509 llama-server: fix model params not propagated [examples, server] (opened Apr 6, 2026 by taronaeo)
- #21507 llama-quant : overlap compute and write with double buffering (opened Apr 6, 2026 by nuri-yoo)
- #21489 mtmd: fit_params now take into account mmproj [examples, server] (opened Apr 5, 2026 by ngxson)
- #21477 server: add null check for context to prevent segfault on init failure [examples, server] (opened Apr 5, 2026 by Anirudh171202)
- #21476 gguf-py: Fix lazy tensor handling for keyword arguments [python] (opened Apr 5, 2026 by lainon1)
- #21475 llama-quant: use LLM_KV constants instead of hardcoded strings (opened Apr 5, 2026 by lainon1)
- #21472 CUDA: make cuda graphs props check faster [ggml, Nvidia GPU] (opened Apr 5, 2026 by am17an)