Releases: ggml-org/llama.cpp

b9245

20 May 04:13
b39a7bf

b9244

20 May 03:39
b28a2f3

b9243

20 May 03:08
17d22a3

b9240

20 May 02:44
57cb35c

b9239

20 May 02:42
7256fce

b9235

20 May 03:04
d14ce3d

llama : MTP clean-up (#23269)

  • llama : disable equal splits for recurrent memory with partial rollback

  • spec : re-enable p-min with MTP drafts

  • spec : re-enable ngram spec in combination with RS rollback

  • spec : fix ngram-map-* params

  • spec : fix acceptance logic in combined ngram + draft configs

  • graph : fix reuse for combined token + embd batches

  • spec : log parameters for each speculative implementation

  • add LOG_INF in each constructor with implementation type and parameters
  • extract device string logic into common_speculative_get_devices_str()
  • move 'adding speculative implementation' log from init into constructors

Assisted-by: llama.cpp:local pi

  • spec : extend --spec-default with ngram-map-k4v

Assisted-by: llama.cpp:local pi

  • minor : fix n_embd log

  • args : update draft.n_max == 3 + regen docs

  • spec : relax ngram-mod rejection thold to 0.25 @ 5 low

  • logs : improve

  • docs : update speculative decoding CLI argument documentation

  • Add missing draft model CPU scheduling and tensor override parameters
  • Update --spec-type to include all available types (excluding draft-eagle3 WIP)
  • Fix default values to match implementation (n_max=3, n_min=0, p_min=0.0)
  • Remove deprecated options (spec-draft-ctx-size, spec-draft-replace)
  • Add environment variables for new parameters

Assisted-by: llama.cpp:local pi

  • arg : step-back on adding k4v to the default spec config

  • cont : fix name
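The draft limits named in the notes above (n_max=3, n_min=0, p_min=0.0) and the acceptance step can be illustrated with a small, generic sketch of speculative decoding. This is not llama.cpp's implementation: `draft_tokens`, `accept_prefix`, and the `propose` callback are hypothetical names, and the exact semantics given to `p_min` and `n_min` here (stop drafting below a confidence threshold, discard drafts that are too short) are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical drafter: given the context so far, return (token, probability).
Proposer = Callable[[List[int]], Tuple[int, float]]


def draft_tokens(propose: Proposer, context: Sequence[int],
                 n_max: int = 3, n_min: int = 0, p_min: float = 0.0) -> List[int]:
    """Propose up to n_max tokens, stopping once confidence falls below p_min.

    Assumption: a draft shorter than n_min is discarded entirely.
    """
    ctx = list(context)
    drafted: List[int] = []
    for _ in range(n_max):
        token, prob = propose(ctx)
        if prob < p_min:
            break  # drafter is not confident enough to continue
        drafted.append(token)
        ctx.append(token)
    return drafted if len(drafted) >= n_min else []


def accept_prefix(drafted: Sequence[int], verified: Sequence[int]) -> int:
    """Count the leading draft tokens that match the target model's tokens."""
    accepted = 0
    for d, v in zip(drafted, verified):
        if d != v:
            break  # first mismatch ends acceptance; later tokens are dropped
        accepted += 1
    return accepted
```

With the defaults, `p_min=0.0` never cuts a draft short and `n_min=0` never discards one, so the drafter always proposes three tokens when it can. The target model then keeps only the matching prefix.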


b9222

19 May 00:29
9a532ae

hexagon: add support for TRI op (#22822)

  • Hexagon: TRI HVX Kernel addition to ggml hexagon HTP ops and context

  • addressed PR review comments for TRI op

  • hexagon: clang format

  • hex-unary: remove merge conflict markers

  • hex-ggml: remove duplicate op cases (merge conflict)

  • hex-ggml: fix editor config errors


Co-authored-by: Todor Boinovski <todorb@qti.qualcomm.com>
Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>


b9221

18 May 23:16
b734044

ggml-hexagon: add PAD op HVX kernel (#23078)

  • ggml-hexagon: add PAD op HVX kernel

Implements GGML_OP_PAD on the Hexagon HTP backend using HVX vectorized
kernels. Supports zero-padding and circular padding across all 4 tensor
dimensions.

  • hex-ggml: remove duplicate op cases (merge conflict)

  • hex-pad: fix editorconfig checks and macro alignment


Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>
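The difference between the two padding modes mentioned for the PAD op can be shown with a scalar reference sketch along a single dimension. This is plain Python for illustration only, not the HVX kernel or ggml's API: `pad_1d` and its `before`/`after` parameters are hypothetical names, and where ggml actually places the padding is not assumed here.

```python
from typing import List, Sequence


def pad_1d(row: Sequence[float], before: int, after: int,
           circular: bool = False) -> List[float]:
    """Pad one dimension with zeros, or by wrapping around the input."""
    n = len(row)
    if not circular:
        # Zero padding: new elements are filled with 0.
        return [0.0] * before + list(row) + [0.0] * after
    if n == 0:
        raise ValueError("circular padding needs a non-empty input")
    # Circular padding: output index i reads input index (i - before) mod n,
    # so values wrap around from the opposite end of the row.
    return [row[(i - before) % n] for i in range(n + before + after)]


print(pad_1d([1, 2, 3], 1, 2))                 # zero padding
print(pad_1d([1, 2, 3], 1, 2, circular=True))  # circular padding
```

Padding a multi-dimensional tensor applies the same rule along each axis in turn, which is why one per-axis routine is enough to describe all four dimensions.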


b9219

18 May 19:25
45b455e

b9216

18 May 18:23
1ff0fc1

ui: Refactor models store, MCP service, and gate logs behind VITE_DEBUG (#23236)

  • refactor: Scope console logs to DEV + VITE_DEBUG env vars

  • refactor: skip MCP proxy probe when no server requires it

  • refactor: suppress expected disconnect errors during MCP client shutdown

  • refactor: Deduplicate requests

  • refactor: deduplicate model fetching across ROUTER and MODEL modes

  • refactor: Clean up models logic

  • chore: Add .env.example file

  • refactor: replace client-side CORS proxy probe with server status flag

  • refactor: Post-review fixes

  • test: add vitest client setup with API fetch mocks
