Pull requests: ikawrakow/ik_llama.cpp
- #1788 fix: only inflate n_batch for GPU-offloaded mmproj, not CPU (opened May 12, 2026 by ubergarm, Contributor)
- #1787 server: reset cache tokens after pp stops halfway (opened May 12, 2026 by firecoperana, Collaborator)
- #1785 fix: Use MMQ for large-batch quantized matmuls on Volta (opened May 12, 2026 by jkyamog, Contributor)
- #1783 Gemma4: support public assistant GGUF schema (opened May 12, 2026 by joelfarthing, Contributor)
- #1782 MTP: reduce per-step qkv checkpoint rows (opened May 11, 2026 by joelfarthing, Contributor)
- #1770 Extend expiring logit bias to other sampling parameters (Draft, opened May 10, 2026 by dungquixote42, Contributor)
- #1764 Slightly expand the usage of VNNI256 (opened May 9, 2026 by XZiar, Contributor)
- #1738 runtime: add --run-time-repack auto mode for swap-bound MoE safety (opened May 4, 2026 by AndrewMoryakov, Contributor)
- #1727 Change signature of llama_set_draft_input_hidden_state (opened May 3, 2026 by ikawrakow, Owner)
- #1654 convert_hf_to_gguf: add Qwen3.5 / Qwen3.6 / Qwen3-Next support (Draft, opened Apr 18, 2026 by markaalonzo, Contributor)
- #1593 Mamba-2 + Nemotron-H MoE backport (Phase 3.x) (opened Apr 6, 2026 by AIdevsmartdata)
- #1531 Graph-based draft token loop for MTP (opened Mar 27, 2026 by SamuelOliveirads, Collaborator)