Commit a8e894d

fix editorconfig-checker, update docs
1 parent cb92f77 commit a8e894d

7 files changed

Lines changed: 31 additions & 15 deletions

.github/workflows/build-cache.yml

Lines changed: 2 additions & 2 deletions
@@ -68,8 +68,8 @@ jobs:

  env:
  # Sync versions in build.yml, release.yml, build-cache.yml, .devops/openvino.Dockerfile
- OPENVINO_VERSION_MAJOR: "2026.0.1"
- OPENVINO_VERSION_FULL: "2026.0.1.20960.91859257636"
+ OPENVINO_VERSION_MAJOR: "2026.0.1"
+ OPENVINO_VERSION_FULL: "2026.0.1.20960.91859257636"

  steps:
  - name: Clone
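
The removed and added lines in this hunk read identically, which suggests the change is whitespace-only (for example trailing spaces), consistent with the commit title. That is an inference, not something the page states. A minimal Python sketch of how such lines could be spotted locally follows; `find_trailing_whitespace` is a hypothetical helper written for illustration and is not part of editorconfig-checker or this repository.

```python
def find_trailing_whitespace(text: str) -> list[int]:
    """Return 1-based numbers of lines that end in spaces or tabs."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        # A line is flagged when stripping trailing blanks changes it;
        # this also catches lines made up only of spaces or tabs.
        if line != line.rstrip(" \t"):
            flagged.append(number)
    return flagged


# Illustrative input: the second line carries three trailing spaces.
sample = 'env:\n  OPENVINO_VERSION_MAJOR: "2026.0.1"   \n  steps:\n'
print(find_trailing_whitespace(sample))
```

Running a check like this before committing would surface the kind of lines the linter rejects, though the real tool also enforces other `.editorconfig` rules that this sketch ignores.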

.github/workflows/build.yml

Lines changed: 7 additions & 7 deletions
@@ -748,8 +748,8 @@ jobs:

  env:
  # Sync versions in build.yml, release.yml, build-cache.yml, .devops/openvino.Dockerfile
- OPENVINO_VERSION_MAJOR: "2026.0.1"
- OPENVINO_VERSION_FULL: "2026.0.1.20960.91859257636"
+ OPENVINO_VERSION_MAJOR: "2026.0.1"
+ OPENVINO_VERSION_FULL: "2026.0.1.20960.91859257636"

  steps:
  - name: Clone
@@ -789,7 +789,7 @@ jobs:
  cd ./openvino_toolkit
  chmod +x ./install_dependencies/install_openvino_dependencies.sh
  echo "Y" | sudo -E ./install_dependencies/install_openvino_dependencies.sh
-
+
  - name: Build
  id: cmake_build
  run: |
@@ -799,10 +799,10 @@ jobs:
  -DGGML_OPENVINO=ON
  cmake --build build/ReleaseOV --config Release -j $(nproc)

- - name: Test
- id: cmake_test
- run: |
- cd ${{ github.workspace }}
+ - name: Test
+ id: cmake_test
+ run: |
+ cd ${{ github.workspace }}
  ctest --test-dir build/ReleaseOV -L main --verbose --timeout 2000

  build-linux-cross:

.github/workflows/release.yml

Lines changed: 2 additions & 2 deletions
@@ -240,7 +240,7 @@ jobs:
  env:
  # Sync versions in build.yml, release.yml, build-cache.yml, .devops/openvino.Dockerfile
  OPENVINO_VERSION_MAJOR: "2026.0.1"
- OPENVINO_VERSION_FULL: "2026.0.1.20960.91859257636"
+ OPENVINO_VERSION_FULL: "2026.0.1.20960.91859257636"

  steps:
  - name: Set OpenVINO version output
@@ -285,7 +285,7 @@ jobs:
  cd ./openvino_toolkit
  chmod +x ./install_dependencies/install_openvino_dependencies.sh
  echo "Y" | sudo -E ./install_dependencies/install_openvino_dependencies.sh
-
+
  - name: Build
  id: cmake_build
  run: |
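
The workflow comment asks that the OpenVINO version values stay in sync across `build.yml`, `release.yml`, and `build-cache.yml`. A rough Python sketch of such a consistency check is shown here. The helper names are hypothetical, the rule that `OPENVINO_VERSION_FULL` must extend `OPENVINO_VERSION_MAJOR` is an assumption read off the values in this diff, and `.devops/openvino.Dockerfile` is left out because its syntax is not shown on this page.

```python
import re

# Matches lines such as:  OPENVINO_VERSION_MAJOR: "2026.0.1"
VERSION_RE = re.compile(
    r'^\s*(OPENVINO_VERSION_(?:MAJOR|FULL)):\s*"([^"]+)"', re.MULTILINE
)


def read_versions(text: str) -> dict[str, str]:
    """Extract the OPENVINO_VERSION_* values from workflow YAML text."""
    return dict(VERSION_RE.findall(text))


def check_in_sync(files: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means all files agree."""
    problems = []
    reference = None
    for name, text in files.items():
        versions = read_versions(text)
        major = versions.get("OPENVINO_VERSION_MAJOR")
        full = versions.get("OPENVINO_VERSION_FULL")
        if major is None or full is None:
            problems.append(f"{name}: missing version variable")
            continue
        # Assumed rule: the full build string starts with the major version.
        if not full.startswith(major + "."):
            problems.append(f"{name}: FULL does not extend MAJOR")
        if reference is None:
            reference = versions  # first complete file is the baseline
        elif versions != reference:
            problems.append(f"{name}: differs from first file")
    return problems


# Illustrative input built from the values visible in this diff.
snippet = (
    'env:\n'
    '  OPENVINO_VERSION_MAJOR: "2026.0.1"\n'
    '  OPENVINO_VERSION_FULL: "2026.0.1.20960.91859257636"\n'
)
files = {"build.yml": snippet, "release.yml": snippet, "build-cache.yml": snippet}
print(check_in_sync(files))
```

In practice the dictionary would be filled by reading the real workflow files from disk; inline strings are used here only to keep the sketch self-contained.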

CODEOWNERS

Lines changed: 1 addition & 0 deletions
@@ -72,6 +72,7 @@
  /ggml/src/ggml-virtgpu/ @kpouget
  /ggml/src/ggml-webgpu/ @reeselevine
  /ggml/src/ggml-zdnn/ @taronaeo @Andreas-Krebbel @AlekseiNikiforovIBM
+ /ggml/src/ggml-openvino/ @cavusmustafa @wine99
  /ggml/src/ggml.c @ggerganov
  /ggml/src/ggml.cpp @ggerganov
  /ggml/src/gguf.cpp @JohannesGaessler @Green-Sky

README.md

Lines changed: 1 addition & 0 deletions
@@ -277,6 +277,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
  | [BLAS](docs/build.md#blas-build) | All |
  | [BLIS](docs/backend/BLIS.md) | All |
  | [SYCL](docs/backend/SYCL.md) | Intel and Nvidia GPU |
+ | [OpenVINO [In Progress]](docs/backend/openvino.md) | Intel CPUs, GPUs, and NPUs |
  | [MUSA](docs/build.md#musa) | Moore Threads GPU |
  | [CUDA](docs/build.md#cuda) | Nvidia GPU |
  | [HIP](docs/build.md#hip) | AMD GPU |

docs/backend/OPENVINO.md

Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
  # OpenVINO Backend for llama.cpp

- This document describes the OpenVINO backend for `llama.cpp`, which enables hardware-accelerated inference on **Intel® CPUs, GPUs, and NPUs** while remaining compatible with the existing **GGUF model ecosystem**.
+ This document describes the [OpenVINO](https://docs.openvino.ai/) backend for `llama.cpp`, which enables hardware-accelerated inference on **Intel® CPUs, GPUs, and NPUs** while remaining compatible with the existing **GGUF model ecosystem**.

  The backend translates GGML compute graphs into OpenVINO graphs and leverages graph compilation, kernel fusion, and device-specific optimizations to improve inference performance on supported Intel hardware.

@@ -20,7 +20,7 @@ Although OpenVINO supports a wide range of [Intel hardware](https://docs.openvin

  ## Supported Model Precisions

- - `FP16`
+ - `FP16`
  - `BF16` (on Intel Xeon)
  - `Q4_0`
  - `Q4_1`
@@ -112,7 +112,7 @@ GGML_OPENVINO_DEVICE=GPU ./llama-bench -fa 1
  - Does not support llama-server -np > 1 (multiple parallel sequences)
  - Only supports llama-perplexity -b 512 or smaller

- ## Llama.cpp Tools
+ ## Llama.cpp Tools

  The following tools work with the OpenVINO backend on CPU and GPU: llama-simple, llama-run, llama-cli, llama-server, llama-bench, llama-perplexity.

docs/build.md

Lines changed: 15 additions & 1 deletion
@@ -777,7 +777,7 @@ Follow the instructions below to install OpenVINO runtime and build llama.cpp wi
  - **Linux:**

  <details>
- <summary>📦 Click to expand OpenVINO 2025.3 installation from an archive file on Ubuntu</summary>
+ <summary>📦 Click to expand OpenVINO installation from an archive file on Ubuntu</summary>
  <br>

  ```bash
@@ -852,6 +852,20 @@ Control OpenVINO behavior using these environment variables:
  - **`GGML_OPENVINO_DUMP_CGRAPH`**: Save compute graph to `cgraph.txt`.
  - **`GGML_OPENVINO_DUMP_IR`**: Export OpenVINO IR files with timestamps.

+ | Variable | Description |
+ |--------|-------------|
+ | `GGML_OPENVINO_DEVICE` | Specify the target device for OpenVINO inference. If not set, automatically selects the first available device in priority order: GPU, CPU, NPU. When set to `NPU` to use Intel NPUs, it enables |
+ | `GGML_OPENVINO_CACHE_DIR` | Directory for OpenVINO model caching (recommended: `/tmp/ov_cache`). If set, enables model caching in OpenVINO. Note: Not supported when using NPU devices yet. |
+ | `GGML_OPENVINO_PROFILING` | Enable execution-time profiling. |
+ | `GGML_OPENVINO_DUMP_CGRAPH` | Save the GGML compute graph to `cgraph.txt`. |
+ | `GGML_OPENVINO_DUMP_IR` | Export OpenVINO IR files with timestamps. |
+ | `GGML_OPENVINO_DEBUG_INPUT` | Enable input debugging. |
+ | `GGML_OPENVINO_DEBUG_OUTPUT` | Enable output debugging. |
+ | `GGML_OPENVINO_STATEFUL_EXECUTION` | Enable stateful execution for better performance. |
+
+ > [!NOTE]
+ >`GGML_OPENVINO_STATEFUL_EXECUTION` is an **Experimental** feature to allow stateful execution for managing the KV cache internally inside the OpenVINO model, improving performance on CPUs and GPUs. Stateful execution is not effective on NPUs, and not all models currently support this feature. This feature is experimental and has been validated only with the llama-simple, llama-cli, llama-bench, and llama-run applications and is recommended to enable for the best performance. Other applications, such as llama-server and llama-perplexity, are not yet supported.
+
  ### Example with Profiling

  ```bash
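
The `GGML_OPENVINO_DEVICE` row added in this hunk describes a fallback rule: use the requested device if the variable is set, otherwise take the first available device in the order GPU, CPU, NPU. The Python sketch below only illustrates that rule as worded in the table. `pick_device` and its `available` argument are hypothetical, the backend's real logic lives in its own source and may differ, and what happens when the requested device is unavailable is not stated in the diff, so the sketch simply returns the request unchanged.

```python
import os

PRIORITY = ("GPU", "CPU", "NPU")  # order stated in the documentation table


def pick_device(available, env=None):
    """Choose a device name following the documented fallback order."""
    env = os.environ if env is None else env
    requested = env.get("GGML_OPENVINO_DEVICE")
    if requested:
        # An explicit setting wins; availability is not checked here (assumption).
        return requested
    for device in PRIORITY:
        if device in available:
            return device
    return None  # nothing usable was found


# Illustrative call: no variable set, only CPU and NPU present.
print(pick_device({"CPU", "NPU"}, env={}))
```

Passing `env` explicitly keeps the function easy to exercise without touching the real process environment.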
