This would be huge for me. llama-server has a lot of features and ease of use that I'd lose by writing a wrapper (I often use LlamaSharp, which doesn't even wrap common), but it doesn't allow much control over sampling--just some built-in parameters and a grammar.
The thing I wanted most is to be able to break loops without affecting intelligence (e.g., still letting the model write code that uses the same variable on every line). Those loops happen fairly often with quantized models, and they can cost a lot of time when the client doesn't stream all the tokens, so you can't even tell it's happening. So I had Qwen3.6-27B-UD-Q6_K_XL try implementing it via OpenCode (patch attached below).
Not too complex, but I'm sure it'd need a lot more work to function correctly in various cases--speculative decoding, multiple chained samplers, optimization. But making this customization available via an extension library would mean not having to maintain and repeatedly rebase our own llama.cpp branches when we want a custom sampling method in llama-server.
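For anyone who wants the gist without reading the patch, here's a minimal sketch of what the loop-breaking part could look like against the `llama_sampler_i` interface in llama.h. This is not the attached patch, just an illustration: names like `loop_break_ctx`, `loop_break_sampler_init`, and the tuning parameters are made up, and the `llama_sampler_i` field order and `llama_sampler_init` signature are from my reading of current llama.h, so double-check against your tree. The idea is to only intervene when the tail of the history is an exact repeating cycle, which is what keeps it from hurting "legitimate" repetition like reusing the same variable name.

```cpp
// Illustrative only -- not the attached patch. A custom sampler that detects an
// exact repeating token cycle and discourages the token that would continue it.
#include "llama.h"

#include <deque>

struct loop_break_ctx {
    std::deque<llama_token> history;   // recently accepted tokens
    size_t max_history = 256;
    size_t min_cycle   = 2;            // single-token repeats are left to the normal repeat penalty
    size_t max_cycle   = 64;
    int    min_repeats = 3;            // cycle must occur this many times before we intervene
    float  penalty     = 8.0f;         // subtracted from the offending logit (discourage, don't forbid)
};

// If the tail of the history is one cycle repeated min_repeats times, return the
// token that would extend the cycle; otherwise return -1.
static llama_token find_loop_continuation(const loop_break_ctx & c) {
    const auto & h = c.history;
    for (size_t p = c.min_cycle; p <= c.max_cycle; ++p) {
        const size_t need = p * (size_t) c.min_repeats;
        if (h.size() < need) {
            break;
        }
        bool cycling = true;
        for (size_t i = h.size() - need; cycling && i + p < h.size(); ++i) {
            cycling = h[i] == h[i + p];
        }
        if (cycling) {
            return h[h.size() - p];
        }
    }
    return -1;
}

static const char * loop_break_name(const struct llama_sampler * /*smpl*/) { return "loop-break"; }

static void loop_break_accept(struct llama_sampler * smpl, llama_token token) {
    auto * c = (loop_break_ctx *) smpl->ctx;
    c->history.push_back(token);
    if (c->history.size() > c->max_history) {
        c->history.pop_front();
    }
}

static void loop_break_apply(struct llama_sampler * smpl, llama_token_data_array * cur_p) {
    const auto * c = (const loop_break_ctx *) smpl->ctx;
    const llama_token bad = find_loop_continuation(*c);
    if (bad < 0) {
        return;
    }
    for (size_t i = 0; i < cur_p->size; ++i) {
        if (cur_p->data[i].id == bad) {
            cur_p->data[i].logit -= c->penalty;
            cur_p->sorted = false;
            break;
        }
    }
}

static void loop_break_reset(struct llama_sampler * smpl) { ((loop_break_ctx *) smpl->ctx)->history.clear(); }
static void loop_break_free (struct llama_sampler * smpl) { delete (loop_break_ctx *) smpl->ctx; }

static struct llama_sampler * loop_break_clone(const struct llama_sampler * smpl);

static struct llama_sampler_i loop_break_iface = {
    /*.name   =*/ loop_break_name,
    /*.accept =*/ loop_break_accept,
    /*.apply  =*/ loop_break_apply,
    /*.reset  =*/ loop_break_reset,
    /*.clone  =*/ loop_break_clone,
    /*.free   =*/ loop_break_free,
};

static struct llama_sampler * loop_break_clone(const struct llama_sampler * smpl) {
    return llama_sampler_init(&loop_break_iface, new loop_break_ctx(*(const loop_break_ctx *) smpl->ctx));
}

struct llama_sampler * loop_break_sampler_init() {
    return llama_sampler_init(&loop_break_iface, new loop_break_ctx());
}
```

Hooking something like this in is just `llama_sampler_chain_add(chain, loop_break_sampler_init())` wherever the server builds its sampler chain; the extension-library part of the proposal is about being able to do that from outside the tree instead of patching llama-server itself.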
Speaking of rebasing, I had it implement this on #22673 at 5d5f1b4 (before that PR was recently force-pushed), merged with master at 9dcf835: server-sampler-extension.patch
Edit: I also reapplied it to that pull request and merged llama.cpp master again, in case someone wants to play with it: https://github.com/ggml-org/llama.cpp/compare/master...dpmm99:llama.cpp:mtp-clean-with-extensions-merged-23020?expand=1
Edit again: reapplied to master now that MTP is merged, and made it support speculative decoding: https://github.com/ggml-org/llama.cpp/compare/master...dpmm99:llama.cpp:master-with-sampling-extensions?expand=1