Draft
111 commits
8fac4b1
feat: add EAGLE3 speculative decoding support
ruixiang63 Dec 14, 2025
ac5667d
fix eagle3 logits sync bug & remove ggml_set_sync()
ruixiang63 Dec 16, 2025
3e7f376
Merge branch 'master' into pr/18039
ggerganov Dec 17, 2025
5a79c19
eagle3 : improve naming
ggerganov Dec 17, 2025
c0d99e6
add eagle3 support for Qwen3 series models
ruixiang63 Jan 8, 2026
71ba283
add eagle3 support for Qwen3 MoE models
ruixiang63 Jan 9, 2026
3da288d
eagle3: load lm_head from target model if not in draft model when con…
ruixiang63 Jan 10, 2026
13a9f31
eagle3: make d2t mapping optional
ruixiang63 Jan 10, 2026
75883cd
eagle3: add support for gpt-oss-120B eagle3
ruixiang63 Jan 10, 2026
7b78bfa
eagle3: add support for RedHtAI eagle3 speculator series models
ruixiang63 Jan 16, 2026
7d4c223
Merge branch 'master' into HEAD
ggerganov Feb 5, 2026
5e224bc
Merge branch 'master' into pr/18039
ggerganov Feb 9, 2026
b353792
eagle3: fix model convert issue
ruixiang63 Feb 20, 2026
9fea243
eagle3: fix model convert code format
ruixiang63 Feb 20, 2026
b8ab2cc
Merge branch 'master' into pr/18039
ggerganov Feb 23, 2026
07e2c97
eagle3: support --eagle3 in llama-cli
ruixiang63 Feb 28, 2026
5bb2d50
Merge branch 'master' into pr/18039
ggerganov Mar 16, 2026
b94050e
CUDA: use LRU based eviction for cuda graphs (#21611)
am17an Apr 17, 2026
45cac7c
ggml-webgpu: fix compiler warnings and refactor FlashAttention encodi…
reeselevine Apr 17, 2026
fd1c0ec
llama: fit ctx size for CPU only (#21568)
JohannesGaessler Apr 18, 2026
89a5474
convert : fix (ignore for now) typings errors (#22002)
CISC Apr 18, 2026
83d58e0
ci : free disk space for rocm release (#22012)
CISC Apr 18, 2026
59accc8
ggml-backend-meta: add multi-segment read support in get_tensor (#22063)
ssam18 Apr 18, 2026
23b8cc4
android : libcommon -> libllama-common (#22076)
CISC Apr 18, 2026
4f02d47
model : refactor bias tensor variable names (#22079)
CISC Apr 18, 2026
9e5647a
server: Expose `media_tag` on /props endpoint. (#22028)
cetarthoriphros Apr 18, 2026
91fef95
rpc : refactor the RPC transport (#21998)
rgerganov Apr 19, 2026
455d8e4
server : speculative checkpointing (#19493)
srogmann Apr 19, 2026
09b4efa
cmake: remove CMP0194 policy to restore MSVC builds (#21934)
texasich Apr 19, 2026
8685e7b
convert : support sentence-transformer 5.4 config files (#22087)
Bing-su Apr 19, 2026
037bfe3
ci : install spirv-headers for vulkan-cross (#22109)
CISC Apr 19, 2026
bcdcc10
ggml : reduce CPU overhead in meta backend (#22041)
gaugarg-nv Apr 19, 2026
1912407
mtmd: add pos_0 to mtmd_image_tokens_get_decoder_pos (breaking change…
ngxson Apr 19, 2026
471540a
HIP: Remove unesscary NCCL_CHECK (#21914)
IMbackK Apr 19, 2026
d5b780a
common/autoparser : allow space after tool call (#22073)
aldehir Apr 19, 2026
4eac5b4
CUDA: refactor mma data loading for AMD (#22051)
JohannesGaessler Apr 19, 2026
e365e65
vendor : update cpp-httplib to 0.42.0 (#21781)
cabelo Apr 19, 2026
9d49acb
server: rename --clear-idle to --cache-idle-slots (#21741)
yychyo Apr 20, 2026
788fcbc
[SYCL] Fix reorder MMVQ assert on unaligned vocab sizes (#22035)
PMZFX Apr 20, 2026
de71b5f
server : refactor "use checkpoint" logic (#22114)
ggerganov Apr 20, 2026
81df3f7
fix: GLM-DSA crash in llama-tokenize when using vocab_only (#22102)
ssam18 Apr 20, 2026
a678916
mtmd: refactor mtmd_decode_use_mrope (#22161)
ngxson Apr 20, 2026
a6cc43c
ggml-webgpu: updated matrix-vector multiplication (#21738)
neha-ha Apr 20, 2026
7f251fd
ggml-cpu: Optimized x86 and generic cpu q1_0 dot (follow up) (#21636)
pl752 Apr 20, 2026
fb19f94
TP: fix 0-sized tensor slices, AllReduce fallback (#21808)
JohannesGaessler Apr 20, 2026
fd6ae4c
Tensor-parallel: Fix delayed AllReduce on Gemma-4 MoE (#22129)
gaugarg-nv Apr 20, 2026
cf8b0db
server : remove /api endpoints (#22165)
ggerganov Apr 20, 2026
86f8daa
mtmd: correct get_n_pos / get_decoder_pos (#22175)
ngxson Apr 20, 2026
9789512
ggml-cuda: flush legacy pool on OOM and retry (#22155)
leonardHONG Apr 20, 2026
ff6b106
server : fix hardcoded proxy connection timeout in router mode (#1876…
xris99 Apr 21, 2026
cfe9838
fit-params : refactor + add option to output estimated memory per dev…
ggerganov Apr 21, 2026
041fe83
ggml : bump version to 0.10.0 (ggml/1463)
ggerganov Apr 21, 2026
4889afb
sync : ggml
ggerganov Apr 21, 2026
cd03ec7
llama-ext : fix exports (#22202)
ggerganov Apr 21, 2026
9998d88
mtmd: correct mtmd_decode_use_mrope() (#22188)
ngxson Apr 21, 2026
82209ef
vulkan: Support F16 OP_FILL (#22177)
jeffbolznv Apr 21, 2026
7fc1c4e
metal : workaround macOS GPU interactivity watchdog (#22216)
ggerganov Apr 21, 2026
606fa42
vendor : update cpp-httplib to 0.43.1 (#22143)
cabelo Apr 21, 2026
52f1096
openvino: driver setup, CI split, thread safety, and NPU optimization…
wine99 Apr 21, 2026
84652b8
arg : add --spec-default (#22223)
ggerganov Apr 21, 2026
98d2d28
mtmd: Add support for Reka Edge 2603 (#21616)
kwajiehao Apr 21, 2026
72d693e
spec : reset i_last when low acceptance streak occurs (#22168)
treo Apr 21, 2026
2248799
hexagon: fix missing v79 entry in libggml-htp.inf (#22194)
mengshengwu Apr 21, 2026
5a4cd67
Hexagon: DAIG op (#22195)
shreyajn Apr 21, 2026
04fe84b
server: allow cancel loading model (#21814)
ngxson Apr 21, 2026
2799d93
ggml-webgpu: reset CPU/GPU profiling time when freeing context (#22050)
yomaytk Apr 21, 2026
0dedb9e
hexagon: add support for FILL op (#22198)
aparmp-quic Apr 21, 2026
ca7f7b7
ggml-webgpu(shader): support conv2d kernels. (#21964)
Constannnnnt Apr 22, 2026
134d6e5
common/chat, server: refactor, move all conversion functions to commo…
pwilkin Apr 22, 2026
750579f
common: Refactoring sampler parameters (#20429) (#22233)
ezturner Apr 22, 2026
7bfe60f
mtmd, llama : Update HunyuanVL vision-language model support (#22037)
ManaEstras Apr 22, 2026
17f6245
server: ignore reasoning content from transcription api (#21905)
ngxson Apr 22, 2026
82d3f4d
mtmd: also support LLAMA_ROPE_TYPE_NONE (#22242)
ngxson Apr 22, 2026
225088e
sycl: Improve mul_mat_id memory efficiency and add BF16 fast path (#2…
qnixsynapse Apr 22, 2026
bcb5eeb
speculative-simple : add checkpoint support (#22227)
ggerganov Apr 22, 2026
8bccdbb
chat: fix parallel_tool_calls default setting based on model capabili…
pwilkin Apr 22, 2026
6da7168
ggml-webgpu: Add fused RMS_NORM + MUL (#21983)
yomaytk Apr 22, 2026
0d0764d
[WebGPU] Implement async tensor api and event api (#22099)
nikhilJain17 Apr 22, 2026
6217b49
HIP: flip GGML_HIP_GRAPHS to default on (#22254)
IMbackK Apr 23, 2026
86db42e
CUDA: fuse relu + sqr (#22249)
anavp-nvidia Apr 23, 2026
b76429a
ggml-webgpu: add support for im2col (#22259)
Constannnnnt Apr 23, 2026
60b68a6
sycl : fused MoE mul_mat_vec_q for TG (#21920)
abotsis Apr 23, 2026
5eaee65
convert : Handle ModelOpt produced mixed precision model during conve…
ynankani Apr 23, 2026
4ead6fd
[SYCL] Update oneapi 2025.3.3, Seperate SYCL build, release Ubuntu 24…
NeoZhangJianyu Apr 23, 2026
96c1db2
ggml-base: use MATH_LIBRARY variable instead of hardcoded 'm' (#22239)
ggerganov Apr 23, 2026
930e021
gitignore: add AGENTS.local.md (#22246)
ggerganov Apr 23, 2026
8635e22
metal : fix event synchronization (#22260)
ggerganov Apr 23, 2026
550d684
server: Enable transcriptions API for LFM2-Audio (#22000)
tdakhran Apr 23, 2026
0dd7f91
cli : cleanup auto-completion code (#21745)
matthiasstraka Apr 23, 2026
9012c50
model-conversion : fix mmproj output file name [no ci] (#22274)
danbev Apr 23, 2026
0949beb
fix build number for sycl release (#22283)
CISC Apr 23, 2026
c807c6e
server: (anthropic API) fix prefix caching (#21793)
kvc0 Apr 23, 2026
12568ca
vendor : update LibreSSL to 4.3.1 (#22285)
angt Apr 23, 2026
c78fb90
server: fix heap-buffer-overflow from negative n_discard (CVE-2026-21…
SongTonyLi Apr 23, 2026
185cbff
server : convert_anthropic_to_oai: also copy chat_template_kwargs (#2…
Soreepeong Apr 23, 2026
187a456
Enable testing on Snapdragon devices (#21051)
shreyajn Apr 23, 2026
5d2b52d
hexagon: add support for basic and extended Op profiling (#22269)
max-krasnyansky Apr 23, 2026
fa0b8a7
cli: Remove redundant local sampling variables (#20429) (#22264)
ezturner Apr 23, 2026
e5f070a
fix(shader): handle the buffer aliasing for rms fuse (#22266)
Constannnnnt Apr 23, 2026
8bc492e
hexagon: add SOLVE_TRI op (#21974)
mengshengwu Apr 24, 2026
793d0a7
server: rename debug tags to match --cache-idle-slots naming (#22292)
yychyo Apr 24, 2026
ffdd983
server : fix swa-full logic (#22288)
ggerganov Apr 24, 2026
017f090
jinja : remove unused header (#22310)
ggerganov Apr 24, 2026
e583f3b
ggml : minor coding style (#22308)
ggerganov Apr 24, 2026
dc80c52
common : fix jinja warnings with clang 21 (#22313)
angt Apr 24, 2026
15fa3c4
metal : print GPU description (#22318)
ggerganov Apr 24, 2026
91b03e4
Merge branch 'master' into pr/18039
ggerganov Apr 24, 2026
0724d66
dflash: first working POC
ruixiang63 Apr 18, 2026
85a0089
dflash: add support for qwen3.5/3.6 moe models
ruixiang63 Apr 19, 2026
e344c4a
dflash: remove rebundant logic & correct bias naming
ruixiang63 Apr 24, 2026
67cb0d5
dflash: enable llama-cli & llama-server with np=1
ruixiang63 Apr 27, 2026
2 changes: 1 addition & 1 deletion .devops/intel.Dockerfile
@@ -1,4 +1,4 @@
-ARG ONEAPI_VERSION=2025.3.2-0-devel-ubuntu24.04
+ARG ONEAPI_VERSION=2025.3.3-0-devel-ubuntu24.04

## Build Image

50 changes: 48 additions & 2 deletions .devops/openvino.Dockerfile
@@ -2,7 +2,19 @@ ARG OPENVINO_VERSION_MAJOR=2026.0
ARG OPENVINO_VERSION_FULL=2026.0.0.20965.c6d6a13a886
ARG UBUNTU_VERSION=24.04

-# Optional proxy build arguments - empty by default
+# Intel GPU driver versions. https://github.com/intel/compute-runtime/releases
+ARG IGC_VERSION=v2.30.1
+ARG IGC_VERSION_FULL=2_2.30.1+20950
+ARG COMPUTE_RUNTIME_VERSION=26.09.37435.1
+ARG COMPUTE_RUNTIME_VERSION_FULL=26.09.37435.1-0
+ARG IGDGMM_VERSION=22.9.0
+
+# Intel NPU driver versions. https://github.com/intel/linux-npu-driver/releases
+ARG NPU_DRIVER_VERSION=v1.32.0
+ARG NPU_DRIVER_FULL=v1.32.0.20260402-23905121947
+ARG LIBZE1_VERSION=1.27.0-1~24.04~ppa2
+
+# Optional proxy build arguments
ARG http_proxy=
ARG https_proxy=

@@ -78,13 +90,47 @@ ARG http_proxy
ARG https_proxy

RUN apt-get update \
-&& apt-get install -y libgomp1 libtbb12 curl \
+&& apt-get install -y libgomp1 libtbb12 curl wget ocl-icd-libopencl1 \
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
&& find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
&& find /var/cache -type f -delete

# Install GPU drivers
ARG IGC_VERSION
ARG IGC_VERSION_FULL
ARG COMPUTE_RUNTIME_VERSION
ARG COMPUTE_RUNTIME_VERSION_FULL
ARG IGDGMM_VERSION
RUN mkdir /tmp/neo/ && cd /tmp/neo/ \
&& wget https://github.com/intel/intel-graphics-compiler/releases/download/${IGC_VERSION}/intel-igc-core-${IGC_VERSION_FULL}_amd64.deb \
&& wget https://github.com/intel/intel-graphics-compiler/releases/download/${IGC_VERSION}/intel-igc-opencl-${IGC_VERSION_FULL}_amd64.deb \
&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/intel-ocloc-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/intel-ocloc_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/intel-opencl-icd-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/intel-opencl-icd_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/libigdgmm12_${IGDGMM_VERSION}_amd64.deb \
&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/libze-intel-gpu1-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/libze-intel-gpu1_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
&& dpkg --install *.deb \
&& rm -rf /tmp/neo/

# Install NPU drivers
ARG NPU_DRIVER_VERSION
ARG NPU_DRIVER_FULL
ARG LIBZE1_VERSION
RUN mkdir /tmp/npu/ && cd /tmp/npu/ \
&& wget https://github.com/intel/linux-npu-driver/releases/download/${NPU_DRIVER_VERSION}/linux-npu-driver-${NPU_DRIVER_FULL}-ubuntu2404.tar.gz \
&& tar -xf linux-npu-driver-${NPU_DRIVER_FULL}-ubuntu2404.tar.gz \
&& dpkg --install *.deb \
&& rm -rf /tmp/npu/

RUN cd /tmp \
&& wget https://snapshot.ppa.launchpadcontent.net/kobuk-team/intel-graphics/ubuntu/20260324T100000Z/pool/main/l/level-zero-loader/libze1_${LIBZE1_VERSION}_amd64.deb \
&& dpkg --install libze1_${LIBZE1_VERSION}_amd64.deb \
&& rm libze1_${LIBZE1_VERSION}_amd64.deb

COPY --from=build /app/lib/ /app/

### Full (all binaries)
113 changes: 113 additions & 0 deletions .github/workflows/build-and-test-snapdragon.yml
@@ -0,0 +1,113 @@
name: CI (snapdragon)

on:
workflow_dispatch:
push:
branches:
- master
paths:
- '.github/workflows/build-and-test-snapdragon.yml'
- 'ggml/include/ggml-hexagon.h'
- 'ggml/src/ggml-hexagon/**'
- 'docs/backend/snapdragon/**'
- 'scripts/snapdragon/**'
- 'CMakePresets.json'

pull_request:
types: [opened, synchronize, reopened]
paths:
- '.github/workflows/build-and-test-snapdragon.yml'
- 'ggml/include/ggml-hexagon.h'
- 'ggml/src/ggml-hexagon/**'
- 'docs/backend/snapdragon/**'
- 'scripts/snapdragon/**'
- 'CMakePresets.json'

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
cancel-in-progress: true

jobs:
android-ndk-snapdragon:
runs-on: ubuntu-latest
container:
image: 'ghcr.io/snapdragon-toolchain/arm64-android:v0.3'
defaults:
run:
shell: bash

steps:
- name: Clone
uses: actions/checkout@v6
with:
fetch-depth: 0
lfs: false

- name: Build Llama.CPP for Snapdragon Android
id: build_llama_cpp_snapdragon_android
run: |
cp docs/backend/snapdragon/CMakeUserPresets.json .
cmake --preset arm64-android-snapdragon-release -B build
cmake --build build
cmake --install build --prefix pkg-adb/llama.cpp

- name: Upload Llama.CPP Snapdragon Android Build Artifact
if: ${{ always() && steps.build_llama_cpp_snapdragon_android.outcome == 'success' }}
uses: actions/upload-artifact@v6
with:
name: llama-cpp-android-arm64-snapdragon
path: pkg-adb/llama.cpp

check-secret:
runs-on: ubuntu-latest
outputs:
has-key: ${{ steps.check.outputs.has-key }}
steps:
- id: check
run: echo "has-key=${{ secrets.QDC_API_KEY != '' }}" >> "$GITHUB_OUTPUT"

test-snapdragon-qdc:
name: Test on QDC Android Device (${{ matrix.device }})
needs: [android-ndk-snapdragon, check-secret]
if: needs.check-secret.outputs.has-key == 'true'
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
device: [SM8750, SM8650, SM8850]

steps:
- name: Checkout
uses: actions/checkout@v6

- name: Download build artifact
uses: actions/download-artifact@v4
with:
name: llama-cpp-android-arm64-snapdragon
path: pkg-snapdragon/

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.x'
cache: pip

- name: Install QDC SDK wheel
run: |
curl -fSL -o qdc_sdk.zip https://softwarecenter.qualcomm.com/api/download/software/tools/Qualcomm_Device_Cloud_SDK/All/0.2.3/qualcomm_device_cloud_sdk-0.2.3.zip
unzip qdc_sdk.zip -d qdc_sdk
pip install qdc_sdk/qualcomm_device_cloud_sdk-0.2.3-py3-none-any.whl

- name: Run QDC tests (${{ matrix.device }})
run: |
python scripts/snapdragon/qdc/run_qdc_jobs.py \
--test all \
--pkg-dir pkg-snapdragon/llama.cpp \
--model-url "https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_0.gguf" \
--device ${{ matrix.device }}
env:
QDC_API_KEY: ${{ secrets.QDC_API_KEY }}

- name: Cleanup
if: always()
run: rm -rf pkg-snapdragon qdc_sdk qdc_sdk.zip
49 changes: 18 additions & 31 deletions .github/workflows/build-android.yml
@@ -1,26 +1,24 @@
name: CI (android)

on:
-workflow_dispatch: # allows manual triggering
+workflow_dispatch:
push:
branches:
- master
-paths: [
-'.github/workflows/build-android.yml',
-'**/CMakeLists.txt',
-'**/.cmake',
-'**/*.h',
-'**/*.hpp',
-'**/*.c',
-'**/*.cpp'
-]
+paths:
+- '.github/workflows/build-android.yml'
+- '**/CMakeLists.txt'
+- '**/.cmake'
+- '**/*.h'
+- '**/*.hpp'
+- '**/*.c'
+- '**/*.cpp'

pull_request:
types: [opened, synchronize, reopened]
-paths: [
-'.github/workflows/build-android.yml',
-'examples/llama.android/**'
-]
+paths:
+- '.github/workflows/build-android.yml'
+- 'examples/llama.android/**'

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
@@ -67,35 +65,24 @@ jobs:
defaults:
run:
shell: bash
strategy:
matrix:
include:
- build: 'arm64-cpu'
defines: '-D ANDROID_ABI=arm64-v8a -D ANDROID_PLATFORM=android-31 -D CMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_ROOT}/build/cmake/android.toolchain.cmake -D GGML_NATIVE=OFF -DGGML_CPU_ARM_ARCH=armv8.5-a+fp16+i8mm -G Ninja -D LLAMA_OPENSSL=OFF -D GGML_OPENMP=OFF'
- build: 'arm64-snapdragon'
defines: '--preset arm64-android-snapdragon-release'

steps:
- name: Clone
id: checkout
uses: actions/checkout@v6
with:
fetch-depth: 0
lfs: false

- name: Build Llama.CPP for Hexagon Android
id: build_llama_cpp_hexagon_android
- name: Build
id: ndk_build
run: |
if [[ "${{ matrix.build }}" == "arm64-snapdragon" ]]; then
cp docs/backend/snapdragon/CMakeUserPresets.json .
fi
cmake ${{ matrix.defines }} -B build
cmake -D ANDROID_ABI=arm64-v8a -D ANDROID_PLATFORM=android-31 -D CMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_ROOT}/build/cmake/android.toolchain.cmake -D GGML_NATIVE=OFF -DGGML_CPU_ARM_ARCH=armv8.5-a+fp16+i8mm -G Ninja -D LLAMA_OPENSSL=OFF -D GGML_OPENMP=OFF -B build
cmake --build build
cmake --install build --prefix pkg-adb/llama.cpp

- name: Upload Llama.CPP Hexagon Android Build Artifact
if: ${{ always() && steps.build_llama_cpp_hexagon_android.outcome == 'success' }}
- name: Upload Android Build Artifact
if: ${{ always() && steps.ndk_build.outcome == 'success' }}
uses: actions/upload-artifact@v6
with:
name: llama-cpp-android-${{ matrix.build }}
name: llama-cpp-android-arm64-cpu
path: pkg-adb/llama.cpp
1 change: 1 addition & 0 deletions .github/workflows/build-cross.yml
@@ -246,6 +246,7 @@ jobs:
apt-get install -y --no-install-recommends \
build-essential \
glslc \
+spirv-headers \
gcc-14-loongarch64-linux-gnu \
g++-14-loongarch64-linux-gnu \
libvulkan-dev:loong64
120 changes: 120 additions & 0 deletions .github/workflows/build-openvino.yml
@@ -0,0 +1,120 @@
name: CI (openvino)

on:
workflow_dispatch: # allows manual triggering
push:
branches:
- master
paths: [
'.github/workflows/build-openvino.yml',
'**/CMakeLists.txt',
'**/.cmake',
'**/*.h',
'**/*.hpp',
'**/*.c',
'**/*.cpp',
]

pull_request:
types: [opened, synchronize, reopened]
paths: [
'.github/workflows/build-openvino.yml',
'ggml/src/ggml-openvino/**'
]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
cancel-in-progress: true

env:
GGML_NLOOP: 3
GGML_N_THREADS: 1
LLAMA_LOG_COLORS: 1
LLAMA_LOG_PREFIX: 1
LLAMA_LOG_TIMESTAMPS: 1

jobs:
ubuntu-24-openvino:
name: ubuntu-24-openvino-${{ matrix.openvino_device }}

concurrency:
group: openvino-${{ matrix.variant }}-${{ github.head_ref || github.ref }}
cancel-in-progress: false

strategy:
matrix:
include:
- variant: cpu
runner: '"ubuntu-24.04"'
openvino_device: "CPU"
- variant: gpu
runner: '["self-hosted","Linux","Intel","OpenVINO"]'
openvino_device: "GPU"

runs-on: ${{ fromJSON(matrix.runner) }}

env:
# Sync versions in build-openvino.yml, build-self-hosted.yml, release.yml, build-cache.yml, .devops/openvino.Dockerfile
OPENVINO_VERSION_MAJOR: "2026.0"
OPENVINO_VERSION_FULL: "2026.0.0.20965.c6d6a13a886"

steps:
- name: Clone
id: checkout
uses: actions/checkout@v6

- name: ccache
if: runner.environment == 'github-hosted'
uses: ggml-org/ccache-action@v1.2.21
with:
key: ubuntu-24-openvino-${{ matrix.variant }}-no-preset-v1
evict-old-files: 1d
save: ${{ github.event_name == 'push' && github.ref == 'refs/heads/master' }}

- name: Dependencies
id: depends
run: |
sudo apt-get update
sudo apt-get install -y build-essential libssl-dev libtbb12 cmake ninja-build python3-pip
sudo apt-get install -y ocl-icd-opencl-dev opencl-headers opencl-clhpp-headers intel-opencl-icd

- name: Use OpenVINO Toolkit Cache
if: runner.environment == 'github-hosted'
uses: actions/cache@v5
id: cache-openvino
with:
path: ./openvino_toolkit
key: openvino-toolkit-v${{ env.OPENVINO_VERSION_FULL }}-${{ runner.os }}

- name: Setup OpenVINO Toolkit
if: steps.cache-openvino.outputs.cache-hit != 'true'
uses: ./.github/actions/linux-setup-openvino
with:
path: ./openvino_toolkit
version_major: ${{ env.OPENVINO_VERSION_MAJOR }}
version_full: ${{ env.OPENVINO_VERSION_FULL }}

- name: Install OpenVINO dependencies
run: |
cd ./openvino_toolkit
chmod +x ./install_dependencies/install_openvino_dependencies.sh
echo "Y" | sudo -E ./install_dependencies/install_openvino_dependencies.sh

- name: Build
id: cmake_build
run: |
source ./openvino_toolkit/setupvars.sh
cmake -B build/ReleaseOV -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DGGML_OPENVINO=ON
time cmake --build build/ReleaseOV --config Release -j $(nproc)

- name: Test
id: cmake_test
# TODO: fix and re-enable the `test-llama-archs` test below
run: |
cd ${{ github.workspace }}
if [ "${{ matrix.openvino_device }}" = "GPU" ]; then
export GGML_OPENVINO_DEVICE=GPU
fi
ctest --test-dir build/ReleaseOV -L main -E "test-llama-archs" --verbose --timeout 2000