Pull requests: ggml-org/llama.cpp
- #21544 llama-quant : remove these checks as some arches do not have these tensors (opened Apr 7, 2026 by ownia)
- #21543 fix --grammar-file commandline arg not working (and others) [examples, server] (opened Apr 7, 2026 by AUTOMATIC1111)
- #21542 metal : add CROSS_ENTROPY_LOSS and CROSS_ENTROPY_LOSS_BACK ops [Apple Metal, ggml] (opened Apr 7, 2026 by nuri-yoo)
- #21539 vulkan: Support Q1_0 [ggml, testing, Vulkan] (opened Apr 7, 2026 by jeffbolznv)
- #21537 server : fix json_schema response_format ignored by some chat templates [examples, server] (opened Apr 7, 2026 by wiktoraleksanderkaczor)
- #21535 common: fix split model loading by sorting file list [testing] (opened Apr 6, 2026 by brettp)
- #21534 YATF (Yet Another Tokenizer Fix) for Gemma 4. With tests! [python, testing] (opened Apr 6, 2026 by pwilkin)
- #21533 ggml-webgpu: parameterize submission size and add iOS specific limits [ggml, WebGPU] (opened Apr 6, 2026 by reeselevine)
- #21531 llama: remove per-arch tensor name lists [merge ready] (opened Apr 6, 2026 by JohannesGaessler)
- #21528 metal: Q1_0 backend [Apple Metal, ggml, testing] (opened Apr 6, 2026 by khosravipasha)
- #21522 common : preserve original Gemma 4 tool responses even when JSON-like (opened Apr 6, 2026 by kiwixz)
- #21521 ggml-webgpu: address quantization precision and backend lifecycle management [ggml, testing, WebGPU] (opened Apr 6, 2026 by Constannnnnt)
- #21519 ggml-cuda : fix CDNA2 compute capability constant for gfx90a (MI210) [ggml, Nvidia GPU] (opened Apr 6, 2026 by aviallon)
- #21513 kv-cache : support attention rotation for heterogeneous iSWA (opened Apr 6, 2026 by ggerganov)
- #21510 server : fix restore for checkpoints with pos_min == 0 [examples, server] (opened Apr 6, 2026 by ggerganov)
- #21509 llama-server: fix model params not propagated [examples, server] (opened Apr 6, 2026 by taronaeo)
- #21507 llama-quant : overlap compute and write with double buffering (opened Apr 6, 2026 by nuri-yoo)
- #21489 mtmd: fit_params now take into account mmproj [examples, server] (opened Apr 5, 2026 by ngxson)
- #21477 server: add null check for context to prevent segfault on init failure [examples, server] (opened Apr 5, 2026 by Anirudh171202)
- #21476 gguf-py: Fix lazy tensor handling for keyword arguments [python] (opened Apr 5, 2026 by lainon1)
- #21475 llama-quant: use LLM_KV constants instead of hardcoded strings (opened Apr 5, 2026 by lainon1)
- #21472 CUDA: make cuda graphs props check faster [ggml, Nvidia GPU] (opened Apr 5, 2026 by am17an)