ggml-zendnn: update code for latest ZenDNN API (ggml-org#19923)
- adapt ggml-zendnn.cpp to the new lowoha::matmul interface
- update the ZenDNN git tag in CMake to the latest release (ZenDNN‑2026‑WW08)
- add static lib support in CMake
-For the latest list of supported operating systems, see the [ZenDNN Supported OS](https://github.com/amd/ZenDNN/blob/zendnnl/README.md#15-supported-os).
+For the latest list of supported operating systems, see the [ZenDNN Supported OS](https://github.com/amd/ZenDNN/blob/a18adf8c605fb5f5e52cefd7eda08a7b18febbaf/README.md#15-supported-os).
 
 ## Hardware
 
@@ -44,9 +44,9 @@ ZenDNN is optimized for AMD EPYC™ processors and AMD Ryzen™ processors based
| MUL_MAT |Support| Accelerated via ZenDNN LowOHA MatMul |
 
 *Note:* Since only MUL_MAT is accelerated, models will benefit most from ZenDNN when matrix multiplications dominate the computational workload (which is typical for transformer-based LLMs).
 
@@ -104,7 +104,6 @@ If you want to build ZenDNN yourself or use a specific version:
 # Clone ZenDNN repository
 git clone https://github.com/amd/ZenDNN.git
 cd ZenDNN
-git checkout zendnnl
 
 # Build and install (requires CMake >= 3.25)
 mkdir build && cd build
@@ -114,7 +113,7 @@ cmake --build . --target all
 
 Default installation path: `ZenDNN/build/install`
 
-**For detailed build instructions**, refer to the [ZenDNN README](https://github.com/amd/ZenDNN/blob/zendnnl/README.md).
+**For detailed build instructions**, refer to the [ZenDNN README](https://github.com/amd/ZenDNN/blob/a18adf8c605fb5f5e52cefd7eda08a7b18febbaf/README.md).
 
 **Step 2: Build llama.cpp with custom ZenDNN path**
 
@@ -146,8 +145,7 @@ Run llama.cpp server with ZenDNN acceleration:
 
 ```sh
 # Set optimal configuration
-export OMP_NUM_THREADS=64 # Adjust to your CPU core count
-export ZENDNNL_MATMUL_ALGO=2 # Blocked AOCL BLIS for best performance
+export ZENDNNL_MATMUL_ALGO=1 # Blocked AOCL DLP algo for best performance
 
 # Start server
 ./build/bin/llama-server \
@@ -160,62 +158,26 @@ export ZENDNNL_MATMUL_ALGO=2 # Blocked AOCL BLIS for best performance
 Access the server at `http://localhost:8080`.
 
 **Performance tips**:
-- Set `OMP_NUM_THREADS` to match your physical core count
-- Use `ZENDNNL_MATMUL_ALGO=2` for optimal performance
+- Use `ZENDNNL_MATMUL_ALGO=1` for optimal performance
 - For NUMA systems: `numactl --cpunodebind=0 --membind=0 ./build/bin/llama-server ...`
 
 ## Environment Variable
 
-### Build Time
+For environment variables related to ZenDNN, refer to the [ZenDNN Environment Variables Documentation](https://github.com/amd/ZenDNN/blob/a18adf8c605fb5f5e52cefd7eda08a7b18febbaf/docs/runtime_env.md).
For more details on available algorithms, see the [ZenDNN MatMul Algorithm Documentation](https://github.com/amd/ZenDNN/blob/a18adf8c605fb5f5e52cefd7eda08a7b18febbaf/docs/runtime_env.md#algorithm-details).
 
 ### Profiling and Debugging
 
-For detailed profiling and logging options, refer to the [ZenDNN Logging Documentation](https://github.com/amd/ZenDNN/blob/zendnnl/docs/logging.md).
+For detailed profiling and logging options, refer to the [ZenDNN Logging Documentation](https://github.com/amd/ZenDNN/blob/a18adf8c605fb5f5e52cefd7eda08a7b18febbaf/docs/logging.md).
 
 ## Known Issues
 
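The performance tips in the hunk above come down to exporting `ZENDNNL_MATMUL_ALGO=1` and, on NUMA systems, binding the server to a single node. A minimal sketch of a launcher that applies both follows. The helper name `zendnn_launch_cmd` and the model path `model.gguf` are hypothetical placeholders, not part of this commit, and the script only prints the command so it can be inspected before running.

```sh
#!/bin/sh
# Sketch only: zendnn_launch_cmd and model.gguf are illustrative placeholders,
# not part of the commit.

# Value recommended by the updated README (Blocked AOCL DLP).
export ZENDNNL_MATMUL_ALGO=1

# Print the launch command for the given model path.
# The numactl prefix (node 0) is added only when numactl is installed.
zendnn_launch_cmd() {
    model="$1"
    cmd="./build/bin/llama-server -m $model"
    if command -v numactl >/dev/null 2>&1; then
        cmd="numactl --cpunodebind=0 --membind=0 $cmd"
    fi
    printf '%s\n' "$cmd"
}

zendnn_launch_cmd "model.gguf"
```

Printing rather than executing keeps the sketch safe to run on machines without a built `llama-server`; piping the output to `sh` would start the server.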
@@ -245,10 +207,9 @@ A: Currently, ZenDNN primarily supports FP32 and BF16 data types. Quantized mode
 
 A: Ensure:
 1. You're using an AMD EPYC or Ryzen processor (Zen 2 or newer)
-2. `OMP_NUM_THREADS` is set appropriately (physical core count)
-3. `ZENDNNL_MATMUL_ALGO=2` is set for best performance (Blocked AOCL BLIS)
-4. You're using a sufficiently large model (small models may not benefit as much)
-5. Enable profiling to verify ZenDNN MatMul is being called
+2. `ZENDNNL_MATMUL_ALGO=1` is set for best performance (Blocked AOCL DLP)
+3. You're using a sufficiently large model (small models may not benefit as much)
+4. Enable profiling to verify ZenDNN MatMul is being called
 
 ### **GitHub Contribution**:
 Please add the **[ZenDNN]** prefix/tag in issues/PRs titles to help the ZenDNN-team check/address them without delay.
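
Part of the updated troubleshooting checklist in the last hunk can be scripted. A minimal sketch follows, assuming a hypothetical function name `check_zendnn_env`. It only inspects the `ZENDNNL_MATMUL_ALGO` variable; it does not detect the CPU model or confirm that ZenDNN MatMul is actually being called.

```sh
#!/bin/sh
# Sketch only: check_zendnn_env is an illustrative name, not part of ZenDNN
# or llama.cpp.

# Return 0 when ZENDNNL_MATMUL_ALGO matches the README's recommendation (1);
# otherwise print a hint and return 1.
check_zendnn_env() {
    algo="${ZENDNNL_MATMUL_ALGO:-}"
    if [ "$algo" = "1" ]; then
        echo "ok: ZENDNNL_MATMUL_ALGO=1"
        return 0
    fi
    if [ -z "$algo" ]; then
        echo "hint: ZENDNNL_MATMUL_ALGO is unset; the README recommends 1"
    else
        echo "hint: ZENDNNL_MATMUL_ALGO=$algo; the README recommends 1"
    fi
    return 1
}

check_zendnn_env || true
```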