Commit 7d4d311

Update docs
1 parent c6ee7c5 commit 7d4d311

2 files changed

Lines changed: 22 additions & 19 deletions

docs/backend/OPENVINO.md

Lines changed: 4 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -73,16 +73,17 @@ The OpenVINO backend can be configured using the following environment variables
7373

7474
| Variable | Description |
7575
|--------|-------------|
76-
| `GGML_OPENVINO_DEVICE` | Specify the target device (`CPU`, `GPU`, `NPU`). If not set, the backend automatically selects the first available device in priority order: **GPU → CPU → NPU**. When set to `NPU`, static compilation mode is enabled for optimal performance. |
76+
| `GGML_OPENVINO_DEVICE` | Specify the target device (`CPU`, `GPU`, `NPU`). When set to `NPU`, static compilation mode is enabled for optimal performance. |
7777
| `GGML_OPENVINO_CACHE_DIR` | Directory for OpenVINO model caching (recommended: `/tmp/ov_cache`). Enables model caching when set. **Not supported on NPU devices.** |
7878
| `GGML_OPENVINO_PROFILING` | Enable execution-time profiling. |
7979
| `GGML_OPENVINO_DUMP_CGRAPH` | Dump the GGML compute graph to `cgraph.txt`. |
8080
| `GGML_OPENVINO_DUMP_IR` | Export OpenVINO IR files with timestamps. |
8181
| `GGML_OPENVINO_DEBUG_INPUT` | Enable input debugging. |
8282
| `GGML_OPENVINO_DEBUG_OUTPUT` | Enable output debugging. |
83-
| *`GGML_OPENVINO_STATEFUL_EXECUTION` | Enable stateful execution for better performance |
83+
| `GGML_OPENVINO_STATEFUL_EXECUTION` | Enable stateful execution for better performance. |
8484

85-
*`GGML_OPENVINO_STATEFUL_EXECUTION` is an **Experimental** feature to allow stateful execution for managing the KV cache internally inside the OpenVINO model, improving performance on CPUs and GPUs. Stateful execution is not effective on NPUs, and not all models currently support this feature. This feature is experimental and has been validated only with the llama-simple, llama-cli, llama-bench, and llama-run applications and is recommended to enable for the best performance. Other applications, such as llama-server and llama-perplexity, are not yet supported.
85+
> [!NOTE]
86+
> `GGML_OPENVINO_STATEFUL_EXECUTION` is an **experimental** feature that manages the KV cache internally inside the OpenVINO model, improving performance on CPUs and GPUs. Stateful execution is not effective on NPUs, and not all models currently support it. It has been validated only with the llama-simple, llama-cli, llama-bench, and llama-run applications, where enabling it is recommended for the best performance. Other applications, such as llama-server and llama-perplexity, are not yet supported.
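
The device and cache settings from the table above interact: caching is enabled by setting `GGML_OPENVINO_CACHE_DIR`, but it is not supported on NPU devices. The following is a minimal sketch for a POSIX shell that exports both variables consistently. The helper name `configure_openvino` and its argument order are hypothetical and not part of llama.cpp; only the variable names and the `/tmp/ov_cache` recommendation come from the table.

```shell
#!/bin/sh
# Hypothetical helper (not part of llama.cpp): export the OpenVINO backend
# variables for a given device, honoring the NPU caching restriction.
configure_openvino() {
    ov_device="$1"                      # CPU, GPU, or NPU
    ov_cache_dir="${2:-/tmp/ov_cache}"  # recommended location from the table

    export GGML_OPENVINO_DEVICE="$ov_device"

    if [ "$ov_device" = "NPU" ]; then
        # Model caching is not supported on NPU devices, so leave it unset.
        unset GGML_OPENVINO_CACHE_DIR
    else
        mkdir -p "$ov_cache_dir"
        export GGML_OPENVINO_CACHE_DIR="$ov_cache_dir"
    fi
}

# Example: target the GPU with the default cache directory,
# then list the OpenVINO variables now present in the environment.
configure_openvino GPU
env | grep '^GGML_OPENVINO_' | sort
```

Calling `configure_openvino NPU` afterwards switches the device and clears the cache directory, so a stale cache setting is not carried over into an NPU run.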
8687
8788
### Example Usage
8889

docs/build.md

Lines changed: 18 additions & 16 deletions
Original file line number | Diff line number | Diff line change
@@ -735,7 +735,7 @@ To read documentation for how to build on IBM Z & LinuxONE, [click here](./build
735735

736736
## OpenVINO
737737

738-
[OpenVINO](https://docs.openvino.ai/2025/index.html) is an open-source toolkit for optimizing and deploying high-performance AI inference, specifically designed for Intel hardware, including CPUs, GPUs, and NPUs, in the cloud, on-premises, and on the edge.
738+
[OpenVINO](https://docs.openvino.ai/) is an open-source toolkit for optimizing and deploying high-performance AI inference, specifically designed for Intel hardware, including CPUs, GPUs, and NPUs, in the cloud, on-premises, and on the edge.
739739
The OpenVINO backend enhances performance by leveraging hardware-specific optimizations and can be enabled for use with llama.cpp.
740740

741741
Follow the instructions below to install the OpenVINO runtime and build llama.cpp with OpenVINO support. For more detailed information on the OpenVINO backend, refer to [OPENVINO.md](backend/OPENVINO.md).
@@ -753,12 +753,11 @@ Follow the instructions below to install OpenVINO runtime and build llama.cpp wi
753753
```
754754
- OpenCL
755755
```bash
756-
sudo apt install ocl-icd-opencl-dev opencl-headers opencl-clhpp-headers intel-opencl-icd
756+
sudo apt install ocl-icd-opencl-dev opencl-headers opencl-clhpp-headers intel-opencl-icd
757757
```
758758

759759
- **Windows:**
760-
- Download Microsoft.VisualStudio.2022.BuildTools: [Visual_Studio_Build_Tools](https://aka.ms/vs/17/release/vs_BuildTools.exe)
761-
Select "Desktop development with C++" under workloads
760+
- Download Microsoft.VisualStudio.2022.BuildTools: [Visual_Studio_Build_Tools](https://aka.ms/vs/17/release/vs_BuildTools.exe) and select "Desktop development with C++" under Workloads.
762761
- Install git
763762
- Install OpenCL with vcpkg
764763
```powershell
@@ -768,7 +767,8 @@ Follow the instructions below to install OpenVINO runtime and build llama.cpp wi
768767
bootstrap-vcpkg.bat
769768
vcpkg install opencl
770769
```
771-
- Use "x64 Native Tools Command Prompt" for Build
770+
> [!NOTE]
771+
> Use the `x64 Native Tools Command Prompt` for the Windows build.
772772
773773
### 1. Install OpenVINO Runtime
774774
@@ -811,8 +811,8 @@ git switch dev_backend_openvino
811811
```
812812
813813
- **Windows:**
814-
```bash
815-
"C:\Program Files (x86)\Intel\openvino_2025.3.0\setupvars.bat"
814+
```cmd
815+
"C:\Program Files (x86)\Intel\openvino_2026.0.1\setupvars.bat"
816816
cmake -B build\ReleaseOV -G Ninja -DCMAKE_BUILD_TYPE=Release -DGGML_OPENVINO=ON -DGGML_CPU_REPACK=OFF -DLLAMA_CURL=OFF -DCMAKE_TOOLCHAIN_FILE=C:\vcpkg\scripts\buildsystems\vcpkg.cmake
817817
cmake --build build\ReleaseOV --parallel
818818
```
@@ -831,10 +831,18 @@ wget https://huggingface.co/unsloth/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llam
831831
832832
When using the OpenVINO backend, the first inference token may have slightly higher latency due to on-the-fly conversion to the OpenVINO graph. Subsequent tokens and runs will be faster.
833833
834-
```bash
835-
# If device is unset or unavailable, default to CPU.
834+
```
835+
# Linux
836+
# If device is unset or unavailable, defaults to CPU.
836837
export GGML_OPENVINO_DEVICE=GPU
837838
./build/ReleaseOV/bin/llama-simple -m ~/models/Llama-3.2-1B-Instruct-Q4_0.gguf -n 50 "The story of AI is "
839+
840+
# Windows Command Line
841+
set GGML_OPENVINO_DEVICE=GPU
842+
# Windows PowerShell
843+
$env:GGML_OPENVINO_DEVICE = "GPU"
844+
845+
build\ReleaseOV\bin\llama-simple.exe -m "C:\models\Llama-3.2-1B-Instruct-Q4_0.gguf" -n 50 "The story of AI is "
838846
```
839847
840848
To run in chat mode:
@@ -846,15 +854,9 @@ To run in chat mode:
846854
847855
Control OpenVINO behavior using these environment variables:
848856
849-
- **`GGML_OPENVINO_DEVICE`**: Specify the target device for OpenVINO inference. If not set, automatically selects the first available device in priority order: GPU, CPU, NPU. When set to `NPU` to use Intel NPUs, it enables static compilation mode for optimal performance.
850-
- **`GGML_OPENVINO_CACHE_DIR`**: Directory for model caching (recommended: `/tmp/ov_cache`). If set, enables model caching in OpenVINO. Note: Not supported when using NPU devices yet.
851-
- **`GGML_OPENVINO_PROFILING`**: Enable execution time profiling.
852-
- **`GGML_OPENVINO_DUMP_CGRAPH`**: Save compute graph to `cgraph.txt`.
853-
- **`GGML_OPENVINO_DUMP_IR`**: Export OpenVINO IR files with timestamps.
854-
855857
| Variable | Description |
856858
|--------|-------------|
857-
| `GGML_OPENVINO_DEVICE` | Specify the target device for OpenVINO inference. If not set, automatically selects the first available device in priority order: GPU, CPU, NPU. When set to `NPU` to use Intel NPUs, it enables |
859+
| `GGML_OPENVINO_DEVICE` | Specify the target device for OpenVINO inference. When set to `NPU`, static compilation mode is enabled for optimal performance. |
858860
| `GGML_OPENVINO_CACHE_DIR` | Directory for OpenVINO model caching (recommended: `/tmp/ov_cache`). If set, enables model caching in OpenVINO. Note: Not supported when using NPU devices yet. |
859861
| `GGML_OPENVINO_PROFILING` | Enable execution-time profiling. |
860862
| `GGML_OPENVINO_DUMP_CGRAPH` | Save the GGML compute graph to `cgraph.txt`. |
