
Commit ce38a4d

hexagon: enable offloading to Hexagon on Windows on Snapdragon (ggml-org#19150)
* hexagon: updates to enable offloading to HTP on WoS
* Update windows.md
* Update windows.md
* hexagon: enable -O3 optimizations
* hexagon: move all _WINDOWS conditional compilation to _WIN32
* hexagon: updates to enable offloading to HTP on WoS
* hexagon: use run-time vs load-time dynamic linking for cdsp driver interface
* refactor htp-drv
* hexagon: add run-bench.ps1 script
* hexagon: htdrv refactor
* hexagon: unify Android and Windows build readmes
* hexagon: update README.md
* hexagon: refactor htpdrv
* hexagon: drv refactor
* hexagon: more drv refactor
* hexagon: fixes for android builds
* hexagon: factor out dl into ggml-backend-dl
* hexagon: add run-tool.ps1 script
* hexagon: merge htp-utils in htp-drv and remove unused code
* wos: no need for getopt_custom.h
* wos: add missing CR in htpdrv
* hexagon: ndev enforcement applies only to the Android devices
* hexagon: add support for generating and signing .cat file
* hexagon: add .inf file
* hexagon: working auto-signing and improved windows builds
* hexagon: further improve skel build
* hexagon: add rough WoS guide
* hexagon: updated windows guide
* hexagon: improve cmake handling of certs and logging
* hexagon: improve windows setup/build doc
* hexagon: more windows readme updates
* hexagon: windows readme updates
* hexagon: windows readme updates
* hexagon: windows readme updates
* hexagon: windows readme updates
* Update windows.md
* Update windows.md
* snapdragon: rename docs/backend/hexagon to docs/backends/snapdragon
  Also added a PowerShell script to simplify build env setup.
* hexagon: remove trailing whitespace and move cmake requirement to user-presets
* hexagon: fix CMakeUserPresets path in workflow yaml
* hexagon: introduce local version of libdl.h
* hexagon: fix src1 reuse logic
  gpt-oss needs a bigger lookahead window. The check for src[1] itself being quantized was wrong.

---------

Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>
1 parent 4fdbc1e commit ce38a4d

21 files changed

Lines changed: 1326 additions & 840 deletions

.github/workflows/build.yml

Lines changed: 1 addition & 1 deletion
@@ -1371,7 +1371,7 @@ jobs:
         id: update_presets
         if: ${{ matrix.build == 'arm64-snapdragon' }}
         run: |
-          cp docs/backend/hexagon/CMakeUserPresets.json .
+          cp docs/backend/snapdragon/CMakeUserPresets.json .
 
       - name: Build
         id: ndk_build

docs/backend/hexagon/CMakeUserPresets.json renamed to docs/backend/snapdragon/CMakeUserPresets.json

Lines changed: 18 additions & 3 deletions
@@ -1,5 +1,10 @@
 {
-  "version": 4,
+  "version": 5,
+  "cmakeMinimumRequired": {
+    "major": 3,
+    "minor": 28,
+    "patch": 0
+  },
   "configurePresets": [
     {
       "name": "arm64-android-snapdragon",
@@ -16,7 +21,9 @@
         "CMAKE_CXX_FLAGS_RELEASE": "-O3 -DNDEBUG",
         "CMAKE_C_FLAGS_RELWITHDEBINFO": "-O3 -DNDEBUG -g",
         "CMAKE_CXX_FLAGS_RELWITHDEBINFO": "-O3 -DNDEBUG -g",
-        "HEXAGON_SDK_ROOT": "$env{HEXAGON_SDK_ROOT}",
+        "CMAKE_PREFIX_PATH": "$env{OPENCL_SDK_ROOT}",
+        "HEXAGON_SDK_ROOT": "$env{HEXAGON_SDK_ROOT}",
+        "HEXAGON_TOOLS_ROOT": "$env{HEXAGON_TOOLS_ROOT}",
         "PREBUILT_LIB_DIR": "android_aarch64",
         "GGML_OPENMP": "OFF",
         "GGML_LLAMAFILE": "OFF",
@@ -31,7 +38,15 @@
       "name": "arm64-windows-snapdragon",
       "inherits": [ "base", "arm64-windows-llvm" ],
       "cacheVariables": {
-        "HEXAGON_SDK_ROOT": "$env{HEXAGON_SDK_ROOT}",
+        "CMAKE_C_FLAGS": "-march=armv8.7a+fp16 -fvectorize -ffp-model=fast -flto -D_GNU_SOURCE",
+        "CMAKE_CXX_FLAGS": "-march=armv8.7a+fp16 -fvectorize -ffp-model=fast -flto -D_GNU_SOURCE",
+        "CMAKE_C_FLAGS_RELEASE": "-O3 -DNDEBUG",
+        "CMAKE_CXX_FLAGS_RELEASE": "-O3 -DNDEBUG",
+        "CMAKE_C_FLAGS_RELWITHDEBINFO": "-O3 -DNDEBUG -g",
+        "CMAKE_CXX_FLAGS_RELWITHDEBINFO": "-O3 -DNDEBUG -g",
+        "CMAKE_PREFIX_PATH": "$env{OPENCL_SDK_ROOT}",
+        "HEXAGON_SDK_ROOT": "$env{HEXAGON_SDK_ROOT}",
+        "HEXAGON_TOOLS_ROOT": "$env{HEXAGON_TOOLS_ROOT}",
         "PREBUILT_LIB_DIR": "windows_aarch64",
         "GGML_OPENMP": "OFF",
         "GGML_LLAMAFILE": "OFF",
Lines changed: 44 additions & 18 deletions
@@ -1,6 +1,8 @@
1-
# Snapdragon-based Android devices
1+
# Snapdragon-based devices
22

3-
## How to Build
3+
## Setup
4+
5+
### Android
46

57
The easiest way to build llama.cpp for a Snapdragon-based Android device is using the toolchain Docker image (see github.com/snapdragon-toolchain).
68
This image includes Android NDK, OpenCL SDK, Hexagon SDK, CMake, etc.
@@ -12,7 +14,24 @@ This method works on Linux, macOS, and Windows. macOS and Windows users should i
1214
[d]/> cd /workspace
1315
```
1416

15-
The rest of the Android build process assumes that you're running inside the toolchain container.
17+
Note: The rest of the **Android** build process assumes that you're running inside the toolchain container.
18+
19+
### Windows On Snapdragon
20+
21+
Native Windows 11 arm64 builds have the following tool dependencies:
22+
- MS Visual Studio 2026 (Community Edition or Pro)
23+
- MSVC arm64 standard and runtime libraries
24+
- UCRT and Driver Kit
25+
- LLVM core libraries and Clang compiler (winget)
26+
- CMake, Git, Python (winget)
27+
- Hexagon SDK Community Edition 6.4 or later (see windows.md)
28+
- OpenCL SDK 2.3 or later (see windows.md)
29+
30+
Note: The rest of the **Windows** build process assumes that you're running natively in PowerShell.
31+
Adapt the build commands below accordingly.
32+
33+
## How to Build
34+
1635
Let's build llama.cpp with CPU, OpenCL, and Hexagon backends via CMake presets:
1736

1837
```
@@ -49,35 +68,37 @@ Preset CMake variables:
4968
To generate an installable "package" simply use cmake --install:
5069

5170
```
52-
[d]/workspace> cmake --install build-snapdragon --prefix pkg-adb/llama.cpp
71+
[d]/workspace> cmake --install build-snapdragon --prefix pkg-snapdragon/llama.cpp
5372
-- Install configuration: "Release"
54-
-- Installing: /workspace/pkg-adb/llama.cpp/lib/libggml-cpu.so
55-
-- Installing: /workspace/pkg-adb/llama.cpp/lib/libggml-opencl.so
56-
-- Installing: /workspace/pkg-adb/llama.cpp/lib/libggml-hexagon.so
57-
-- Installing: /workspace/pkg-adb/llama.cpp/lib/libggml-htp-v73.so
58-
-- Installing: /workspace/pkg-adb/llama.cpp/lib/libggml-htp-v75.so
59-
-- Installing: /workspace/pkg-adb/llama.cpp/lib/libggml-htp-v79.so
60-
-- Installing: /workspace/pkg-adb/llama.cpp/lib/libggml-htp-v81.so
61-
-- Installing: /workspace/pkg-adb/llama.cpp/lib/libggml.so
73+
-- Installing: /workspace/pkg-snapdragon/llama.cpp/lib/libggml-cpu.so
74+
-- Installing: /workspace/pkg-snapdragon/llama.cpp/lib/libggml-opencl.so
75+
-- Installing: /workspace/pkg-snapdragon/llama.cpp/lib/libggml-hexagon.so
76+
-- Installing: /workspace/pkg-snapdragon/llama.cpp/lib/libggml-htp-v73.so
77+
-- Installing: /workspace/pkg-snapdragon/llama.cpp/lib/libggml-htp-v75.so
78+
-- Installing: /workspace/pkg-snapdragon/llama.cpp/lib/libggml-htp-v79.so
79+
-- Installing: /workspace/pkg-snapdragon/llama.cpp/lib/libggml-htp-v81.so
80+
-- Installing: /workspace/pkg-snapdragon/llama.cpp/lib/libggml.so
6281
...
63-
-- Installing: /workspace/pkg-adb/llama.cpp/bin/llama-bench
64-
-- Installing: /workspace/pkg-adb/llama.cpp/bin/llama-cli
82+
-- Installing: /workspace/pkg-snapdragon/llama.cpp/bin/llama-bench
83+
-- Installing: /workspace/pkg-snapdragon/llama.cpp/bin/llama-cli
6584
...
6685
```
6786

6887
## How to Install
6988

89+
### Android
90+
7091
For this step, your device needs to be configured for on-device development.
7192
Please see https://developer.android.com/studio/debug/dev-options for details.
7293

7394
Once ADB is enabled, use `adb push` to install `pkg-snapdragon` on the device.
7495
**Note that the toolchain Docker image doesn't have ADB and doesn't set up the ADB bridge. Please use native ADB on the host.**
7596

7697
```
77-
~/src/llama.cpp$ adb push pkg-adb/llama.cpp /data/local/tmp/
78-
pkg-adb/llama.cpp/bin/: 67 files pushed, 0 skipped. 190.2 MB/s (919095042 bytes in 4.607s)
79-
pkg-adb/llama.cpp/include/: 19 files pushed, 0 skipped. 20.5 MB/s (255173 bytes in 0.012s)
80-
pkg-adb/llama.cpp/lib/: 16 files pushed, 0 skipped. 144.4 MB/s (43801382 bytes in 0.289s)
98+
~/src/llama.cpp$ adb push pkg-snapdragon/llama.cpp /data/local/tmp/
99+
pkg-snapdragon/llama.cpp/bin/: 67 files pushed, 0 skipped. 190.2 MB/s (919095042 bytes in 4.607s)
100+
pkg-snapdragon/llama.cpp/include/: 19 files pushed, 0 skipped. 20.5 MB/s (255173 bytes in 0.012s)
101+
pkg-snapdragon/llama.cpp/lib/: 16 files pushed, 0 skipped. 144.4 MB/s (43801382 bytes in 0.289s)
81102
102 files pushed, 0 skipped. 186.9 MB/s (963151597 bytes in 4.914s)
82103
```
83104

@@ -92,6 +113,11 @@ At this point, you should also install some models:
92113
Llama-3.2-1B-Instruct-Q4_0.gguf: 1 file pushed, 0 skipped. 38.3 MB/s (773025920 bytes in 19.250s)
93114
```
94115

116+
### Windows
117+
118+
All artifacts are already installed in the `pkg-snapdragon` folder.
119+
To run, adapt the instructions below to use the PowerShell scripts in `scripts/snapdragon/windows`.
120+
95121
## How to Run
96122

97123
The easiest way to run llama.cpp cli tools is using provided wrapper scripts that properly set up all required environment variables.
File renamed without changes.

docs/backend/snapdragon/windows.md

Lines changed: 161 additions & 0 deletions
@@ -0,0 +1,161 @@
1+
## Overview
2+
3+
This document covers the procedures for installing the latest GPU and NPU drivers and the OpenCL and Hexagon SDKs.
4+
5+
6+
To use the Hexagon NPU on Snapdragon Windows devices, the underlying HTP Ops libraries (e.g., libggml-htp-v73.so)
7+
must be listed in a .cat file that is digitally signed with a trusted certificate.
8+
9+
This document covers details on how to generate personal certificate files (.pfx) and how to configure the system
10+
to allow for test signatures (aka test-signing).
11+
12+
## Install the latest Adreno OpenCL SDK
13+
14+
Either use the trimmed down version (optimized for CI) from
15+
16+
https://github.com/snapdragon-toolchain/opencl-sdk/releases/download/v2.3.2/adreno-opencl-sdk-v2.3.2-arm64-wos.tar.xz
17+
18+
Or download the complete official version from
19+
20+
https://softwarecenter.qualcomm.com/catalog/item/Adreno_OpenCL_SDK?version=2.3.2
21+
22+
Unzip/untar the archive into
23+
```
24+
c:\Qualcomm\OpenCL_SDK\2.3.2
25+
```
26+
27+
## Install the latest Hexagon SDK Community Edition
28+
29+
Either use the trimmed down version (optimized for CI) from
30+
31+
https://github.com/snapdragon-toolchain/hexagon-sdk/releases/download/v6.4.0.2/hexagon-sdk-v6.4.0.2-arm64-wos.tar.xz
32+
33+
Or download the complete official version from
34+
35+
https://softwarecenter.qualcomm.com/catalog/item/Hexagon_SDK?version=6.4.0.2
36+
37+
Unzip/untar the archive into
38+
```
39+
c:\Qualcomm\Hexagon_SDK\6.4.0.2
40+
```
41+
42+
## Install the latest Adreno GPU driver
43+
44+
Download the driver from
45+
46+
https://softwarecenter.qualcomm.com/catalog/item/Windows_Graphics_Driver
47+
48+
After the automated installation and reboot, please make sure that the GPU device shows up in the `Device Manager` (under `Display Adapters`).
49+
50+
## Install the latest Qualcomm NPU driver
51+
52+
Download the driver from
53+
54+
https://softwarecenter.qualcomm.com/catalog/item/Qualcomm_HND
55+
56+
After the automated installation and reboot please make sure that the Hexagon NPU device shows up in the `Device Manager` (under `Neural Processors`).
57+
58+
If the device is not available you can try installing all components (`qcnspmcdm8380`, `qcnspmcdm8380_ext`) manually.
59+
The components are extracted into
60+
```
61+
c:\QCDrivers\qcnspmcdm...
62+
```
63+
64+
## Enable NPU driver test signatures
65+
66+
Please note that the following steps are required only for the Hexagon NPU.
67+
Adreno GPU backend does not require test signatures.
68+
69+
### Enable testsigning
70+
71+
Use `bcdedit` to enable test-signing
72+
```
73+
> bcdedit /set TESTSIGNING ON
74+
```
75+
(Secure Boot may need to be disabled for this to work)
76+
77+
Make sure test-signing is enabled after reboot
78+
```
79+
> bcdedit /enum
80+
...
81+
testsigning Yes
82+
...
83+
```
84+
For additional details, see the Microsoft guide at
85+
86+
https://learn.microsoft.com/en-us/windows-hardware/drivers/install/the-testsigning-boot-configuration-option
87+
88+
### Create personal certificate
89+
90+
The tools required for this procedure are available as part of the Windows SDK and Windows Driver Kit, which should be
91+
installed as part of MS Visual Studio.
92+
They are typically located at
93+
```
94+
c:\Program Files (x86)\Windows Kits\10\bin\10.0.26100.0
95+
```
96+
(replace 10.0.26100.0 with correct version).
97+
98+
To create a personal self-signed certificate, run the following commands (from either cmd or PowerShell):
99+
```
100+
> cd c:\Users\MyUser
101+
> mkdir Certs
102+
> cd Certs
103+
> makecert -r -pe -ss PrivateCertStore -n CN=GGML.HTP.v1 -eku 1.3.6.1.5.5.7.3.3 -sv ggml-htp-v1.pvk ggml-htp-v1.cer
104+
> pvk2pfx.exe -pvk ggml-htp-v1.pvk -spc ggml-htp-v1.cer -pfx ggml-htp-v1.pfx
105+
```
106+
(replace `MyUser` with your username).
107+
108+
Add this certificate to `Trusted Root Certification Authorities` and `Trusted Publishers` stores.
109+
This can be done using the `certlm` Certificate Manager tool.
110+
Right click on the certificate store, select `All Tasks -> Import` and follow the prompts to import the certificate from the
111+
PFX file you created above.
112+
113+
For additional details, see the Microsoft guide at
114+
115+
https://learn.microsoft.com/en-us/windows-hardware/drivers/install/introduction-to-test-signing
116+
117+
Make sure to save the PFX file; you will need it for the build procedures.
118+
Please note that the same certificate can be used for signing any number of builds.
119+
120+
## Build Hexagon backend with signed HTP ops libraries
121+
122+
The overall Hexagon backend build procedure for Windows on Snapdragon is the same as for other platforms.
123+
However, additional settings are required for generating and signing HTP Ops libraries.
124+
```
125+
> $env:OPENCL_SDK_ROOT="C:\Qualcomm\OpenCL_SDK\2.3.2"
126+
> $env:HEXAGON_SDK_ROOT="C:\Qualcomm\Hexagon_SDK\6.4.0.2"
127+
> $env:HEXAGON_TOOLS_ROOT="C:\Qualcomm\Hexagon_SDK\6.4.0.2\tools\HEXAGON_Tools\19.0.04"
128+
> $env:HEXAGON_HTP_CERT="c:\Users\MyUser\Certs\ggml-htp-v1.pfx"
129+
> $env:WINDOWS_SDK_BIN="C:\Program Files (x86)\Windows Kits\10\bin\10.0.26100.0\arm64"
130+
131+
> cmake --preset arm64-windows-snapdragon -B build-wos
132+
...
133+
> cmake --install build-wos --prefix pkg-snapdragon
134+
```
135+
136+
Once the build is complete, the HTP ops libraries will be installed like this:
137+
```
138+
> dir pkg-snapdragon/lib
139+
...
140+
-a---- 1/22/2026 6:01 PM 187656 libggml-htp-v73.so
141+
-a---- 1/22/2026 6:01 PM 191752 libggml-htp-v75.so
142+
-a---- 1/22/2026 6:01 PM 187656 libggml-htp-v79.so
143+
-a---- 1/22/2026 6:01 PM 187656 libggml-htp-v81.so
144+
-a---- 1/22/2026 6:01 PM 4139 libggml-htp.cat
145+
```
146+
147+
The .cat file, its signature, and proper certificate installation can be verified with
148+
149+
```
150+
> signtool.exe verify /v /pa .\pkg-snapdragon\lib\libggml-htp.cat
151+
Verifying: .\pkg-snapdragon\lib\libggml-htp.cat
152+
153+
Signature Index: 0 (Primary Signature)
154+
Hash of file (sha256): 9820C664DA59D5EAE31DBB664127FCDAEF59CDC31502496BC567544EC2F401CF
155+
156+
Signing Certificate Chain:
157+
Issued to: GGML.HTP.v1
158+
...
159+
Successfully verified: .\pkg-snapdragon\lib\libggml-htp.cat
160+
...
161+
```
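
The build commands in windows.md above depend on five environment variables being set before `cmake --preset` runs. As a rough illustration only, the following C++ sketch reports which of them are unset or empty. The helper names `missing_env_vars` and `wos_build_env_vars` are hypothetical and are not part of llama.cpp or of this commit; only the variable names come from the document.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper, not part of llama.cpp: returns every name in
// `required` that is unset or empty in the current environment.
inline std::vector<std::string> missing_env_vars(const std::vector<std::string> & required) {
    std::vector<std::string> missing;
    for (const std::string & name : required) {
        assert(!name.empty());  // an empty variable name is a caller bug
        const char * value = std::getenv(name.c_str());
        if (value == nullptr || value[0] == '\0') {
            missing.push_back(name);
        }
    }
    return missing;
}

// The five variables assigned in the windows.md build commands.
inline const std::vector<std::string> & wos_build_env_vars() {
    static const std::vector<std::string> names = {
        "OPENCL_SDK_ROOT",
        "HEXAGON_SDK_ROOT",
        "HEXAGON_TOOLS_ROOT",
        "HEXAGON_HTP_CERT",
        "WINDOWS_SDK_BIN",
    };
    return names;
}
```

A caller could pass `wos_build_env_vars()` to `missing_env_vars`, print each returned name, and stop before invoking CMake if the result is non-empty.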

ggml/src/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -222,6 +222,7 @@ if (GGML_SCHED_NO_REALLOC)
 endif()
 
 add_library(ggml
+            ggml-backend-dl.cpp
             ggml-backend-reg.cpp)
 add_library(ggml::ggml ALIAS ggml)

ggml/src/ggml-backend-dl.cpp

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+#include "ggml-backend-dl.h"
+
+#ifdef _WIN32
+
+dl_handle * dl_load_library(const fs::path & path) {
+    // suppress error dialogs for missing DLLs
+    DWORD old_mode = SetErrorMode(SEM_FAILCRITICALERRORS);
+    SetErrorMode(old_mode | SEM_FAILCRITICALERRORS);
+
+    HMODULE handle = LoadLibraryW(path.wstring().c_str());
+
+    SetErrorMode(old_mode);
+
+    return handle;
+}
+
+void * dl_get_sym(dl_handle * handle, const char * name) {
+    DWORD old_mode = SetErrorMode(SEM_FAILCRITICALERRORS);
+    SetErrorMode(old_mode | SEM_FAILCRITICALERRORS);
+
+    void * p = (void *) GetProcAddress(handle, name);
+
+    SetErrorMode(old_mode);
+
+    return p;
+}
+
+const char * dl_error() {
+    return "";
+}
+
+#else
+
+dl_handle * dl_load_library(const fs::path & path) {
+    dl_handle * handle = dlopen(path.string().c_str(), RTLD_NOW | RTLD_LOCAL);
+    return handle;
+}
+
+void * dl_get_sym(dl_handle * handle, const char * name) {
+    return dlsym(handle, name);
+}
+
+const char * dl_error() {
+    const char *rslt = dlerror();
+    return rslt != nullptr ? rslt : "";
+}
+
+#endif
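
The wrappers above give callers a single interface for run-time dynamic linking: load a library by path, look up a symbol by name, and fetch an error string. The following is a minimal POSIX-only sketch of how a caller might bind one optional entry point through wrappers of that shape and degrade gracefully when the library is absent. Everything prefixed `sketch_` is a hypothetical stand-in written for this illustration; it is not the commit's API and it omits the Windows branch.

```cpp
#include <dlfcn.h>

#include <cassert>
#include <string>

// Hypothetical stand-ins that mirror the non-Windows branch above.
using sketch_handle = void;

inline sketch_handle * sketch_load(const std::string & path) {
    // RTLD_NOW resolves all undefined symbols before dlopen returns;
    // RTLD_LOCAL keeps the library's symbols out of the global scope.
    return dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL);
}

inline void * sketch_sym(sketch_handle * handle, const char * name) {
    assert(handle != nullptr);
    return dlsym(handle, name);
}

inline std::string sketch_error() {
    const char * msg = dlerror();
    return msg != nullptr ? msg : "";
}

// Outcome of trying to bind one entry point at run time.
struct sketch_binding {
    void *      fn = nullptr;  // caller casts this to the expected function type
    std::string error;         // empty on success
};

// Loads `lib` and looks up `symbol`. On failure, `fn` stays null and
// `error` explains which step failed, so the caller can fall back.
inline sketch_binding sketch_bind(const std::string & lib, const char * symbol) {
    sketch_binding out;
    sketch_handle * handle = sketch_load(lib);
    if (handle == nullptr) {
        out.error = "load failed: " + sketch_error();
        return out;
    }
    out.fn = sketch_sym(handle, symbol);
    if (out.fn == nullptr) {
        out.error = std::string("missing symbol: ") + symbol;
        dlclose(handle);  // nothing usable was bound, so release the library
    }
    return out;
}
```

With this shape, a missing driver library becomes an ordinary error value at run time rather than a failure to start the process, which is the usual motivation for preferring run-time over load-time linking.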
