fix editorconfig-checker, update docs

ravi9 · web-flow · commit a8e894d895d3 · 2026-02-19T08:03:24.000-08:00
diff --git a/.github/workflows/build-cache.yml b/.github/workflows/build-cache.yml
@@ -68,8 +68,8 @@ jobs:
 
     env:
       # Sync versions in build.yml, release.yml, build-cache.yml, .devops/openvino.Dockerfile
-      OPENVINO_VERSION_MAJOR: "2026.0.1"  
-      OPENVINO_VERSION_FULL: "2026.0.1.20960.91859257636"  
+      OPENVINO_VERSION_MAJOR: "2026.0.1"
+      OPENVINO_VERSION_FULL: "2026.0.1.20960.91859257636"
 
     steps:
       - name: Clone
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
@@ -748,8 +748,8 @@ jobs:
 
       env:
         # Sync versions in build.yml, release.yml, build-cache.yml, .devops/openvino.Dockerfile
-        OPENVINO_VERSION_MAJOR: "2026.0.1"  
-        OPENVINO_VERSION_FULL: "2026.0.1.20960.91859257636"  
+        OPENVINO_VERSION_MAJOR: "2026.0.1"
+        OPENVINO_VERSION_FULL: "2026.0.1.20960.91859257636"
 
       steps:
         - name: Clone
@@ -789,7 +789,7 @@ jobs:
             cd ./openvino_toolkit
             chmod +x ./install_dependencies/install_openvino_dependencies.sh
             echo "Y" | sudo -E ./install_dependencies/install_openvino_dependencies.sh
-            
+
         - name: Build
           id: cmake_build
           run: |
@@ -799,10 +799,10 @@ jobs:
               -DGGML_OPENVINO=ON
             cmake --build build/ReleaseOV --config Release -j $(nproc)
 
-        - name: Test  
-          id: cmake_test  
-          run: |  
-            cd ${{ github.workspace }}  
+        - name: Test
+          id: cmake_test
+          run: |
+            cd ${{ github.workspace }}
             ctest --test-dir build/ReleaseOV -L main --verbose --timeout 2000
 
   build-linux-cross:
diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml
@@ -240,7 +240,7 @@ jobs:
     env:
       # Sync versions in build.yml, release.yml, build-cache.yml, .devops/openvino.Dockerfile
       OPENVINO_VERSION_MAJOR: "2026.0.1"
-      OPENVINO_VERSION_FULL: "2026.0.1.20960.91859257636"  
+      OPENVINO_VERSION_FULL: "2026.0.1.20960.91859257636"
 
     steps:
       - name: Set OpenVINO version output
@@ -285,7 +285,7 @@ jobs:
           cd ./openvino_toolkit
           chmod +x ./install_dependencies/install_openvino_dependencies.sh
           echo "Y" | sudo -E ./install_dependencies/install_openvino_dependencies.sh
-          
+
       - name: Build
         id: cmake_build
         run: |
diff --git a/CODEOWNERS b/CODEOWNERS
@@ -72,6 +72,7 @@
 /ggml/src/ggml-virtgpu/                 @kpouget
 /ggml/src/ggml-webgpu/                  @reeselevine
 /ggml/src/ggml-zdnn/                    @taronaeo @Andreas-Krebbel @AlekseiNikiforovIBM
+/ggml/src/ggml-openvino/                @cavusmustafa @wine99
 /ggml/src/ggml.c                        @ggerganov
 /ggml/src/ggml.cpp                      @ggerganov
 /ggml/src/gguf.cpp                      @JohannesGaessler @Green-Sky
diff --git a/README.md b/README.md
@@ -277,6 +277,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
 | [BLAS](docs/build.md#blas-build) | All |
 | [BLIS](docs/backend/BLIS.md) | All |
 | [SYCL](docs/backend/SYCL.md) | Intel and Nvidia GPU |
+| [OpenVINO [In Progress]](docs/backend/OPENVINO.md) | Intel CPUs, GPUs, and NPUs |
 | [MUSA](docs/build.md#musa) | Moore Threads GPU |
 | [CUDA](docs/build.md#cuda) | Nvidia GPU |
 | [HIP](docs/build.md#hip) | AMD GPU |
diff --git a/docs/backend/OPENVINO.md b/docs/backend/OPENVINO.md
@@ -1,6 +1,6 @@
 # OpenVINO Backend for llama.cpp
 
-This document describes the OpenVINO backend for `llama.cpp`, which enables hardware-accelerated inference on **Intel® CPUs, GPUs, and NPUs** while remaining compatible with the existing **GGUF model ecosystem**.
+This document describes the [OpenVINO](https://docs.openvino.ai/) backend for `llama.cpp`, which enables hardware-accelerated inference on **Intel® CPUs, GPUs, and NPUs** while remaining compatible with the existing **GGUF model ecosystem**.
 
 The backend translates GGML compute graphs into OpenVINO graphs and leverages graph compilation, kernel fusion, and device-specific optimizations to improve inference performance on supported Intel hardware.
 
@@ -20,7 +20,7 @@ Although OpenVINO supports a wide range of [Intel hardware](https://docs.openvin
 
 ## Supported Model Precisions
 
-- `FP16` 
+- `FP16`
 - `BF16` (on Intel Xeon)
 - `Q4_0`
 - `Q4_1`
@@ -112,7 +112,7 @@ GGML_OPENVINO_DEVICE=GPU ./llama-bench -fa 1
 - Does not support llama-server -np > 1 (multiple parallel sequences)
 - Only supports llama-perplexity -b 512 or smaller
 
-## Llama.cpp Tools 
+## Llama.cpp Tools
 
 The following tools work with the OpenVINO backend on CPU and GPU: llama-simple, llama-run, llama-cli, llama-server, llama-bench, llama-perplexity.
 
diff --git a/docs/build.md b/docs/build.md
@@ -777,7 +777,7 @@ Follow the instructions below to install OpenVINO runtime and build llama.cpp wi
 - **Linux:**
 
     <details>
-    <summary>📦 Click to expand OpenVINO 2025.3 installation from an archive file on Ubuntu</summary>
+    <summary>📦 Click to expand OpenVINO installation from an archive file on Ubuntu</summary>
     <br>
 
     ```bash
@@ -852,6 +852,20 @@ Control OpenVINO behavior using these environment variables:
 -   **`GGML_OPENVINO_DUMP_CGRAPH`**: Save compute graph to `cgraph.txt`.
 -   **`GGML_OPENVINO_DUMP_IR`**: Export OpenVINO IR files with timestamps.
 
+| Variable | Description |
+|--------|-------------|
+| `GGML_OPENVINO_DEVICE` | Target device for OpenVINO inference. If not set, the first available device is selected automatically in priority order: GPU, CPU, NPU. When set to `NPU` to use Intel NPUs, it enables |
+| `GGML_OPENVINO_CACHE_DIR` | Directory for OpenVINO model caching (recommended: `/tmp/ov_cache`). If set, enables model caching in OpenVINO. Not yet supported on NPU devices. |
+| `GGML_OPENVINO_PROFILING` | Enable execution-time profiling. |
+| `GGML_OPENVINO_DUMP_CGRAPH` | Save the GGML compute graph to `cgraph.txt`. |
+| `GGML_OPENVINO_DUMP_IR` | Export OpenVINO IR files with timestamps. |
+| `GGML_OPENVINO_DEBUG_INPUT` | Enable input debugging. |
+| `GGML_OPENVINO_DEBUG_OUTPUT` | Enable output debugging. |
+| `GGML_OPENVINO_STATEFUL_EXECUTION` | Enable experimental stateful execution, in which the KV cache is managed inside the OpenVINO model, for better performance. |
+
+> [!NOTE]
+> `GGML_OPENVINO_STATEFUL_EXECUTION` is an **experimental** feature that enables stateful execution, in which the KV cache is managed internally by the OpenVINO model. This improves performance on CPUs and GPUs. Stateful execution is not effective on NPUs, and not all models support it yet. It has been validated only with llama-simple, llama-cli, llama-bench, and llama-run, where enabling it is recommended for best performance. Other applications, such as llama-server and llama-perplexity, are not yet supported.
+
 ### Example with Profiling
 
 ```bash