Commit 632219a
CANN: fix multi-thread set_tensor race conditions (#20151)
* CANN: fix multi-thread set_tensor race conditions
When ollama calls ggml_backend_tensor_set from multiple threads (each
writing a different chunk of the same tensor), the CANN backend had
three concurrency issues:
1. Quantized tensors (Q4_0/Q8_0) require a full-tensor format transform
before uploading to device. Per-chunk transforms produced corrupt data.
2. ND-to-NZ weight conversion requires complete tensor data on device.
Per-chunk conversion operated on incomplete data.
3. The global g_nz_workspaces array had unprotected concurrent access.
Fix by introducing a TensorSetTracker that accumulates write progress
per tensor. For quantized tensors, raw data is staged in a host buffer
and the transform + upload is deferred until all chunks arrive. For NZ
weights, chunks are uploaded directly but conversion is deferred. The
tracker and its staging buffer are released immediately after
post-processing completes.
Add per-device mutex to g_nz_workspaces to prevent data races.
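The deferred post-processing described above can be sketched roughly as follows. This is a minimal illustration, not the backend's actual code: the TrackerTable class, its method names, and the on_complete callback are hypothetical, and the sketch assumes chunks never overlap and together cover the tensor exactly.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical per-tensor write tracker; field names are illustrative only.
struct TensorSetTracker {
    std::vector<uint8_t> staging;        // host copy of the raw tensor bytes
    size_t               bytes_written = 0;  // progress across all chunks so far
};

// Hypothetical table of in-flight trackers, keyed by tensor address.
class TrackerTable {
  public:
    // Records one chunk. Returns true if this call completed the tensor, in
    // which case on_complete has run exactly once on the full staged buffer.
    bool write_chunk(const void * tensor_key, size_t total_size,
                     const void * data, size_t offset, size_t size,
                     const std::function<void(const std::vector<uint8_t> &)> & on_complete) {
        std::vector<uint8_t> full;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            TensorSetTracker & t = trackers_[tensor_key];
            if (t.staging.empty()) {
                t.staging.resize(total_size);
            }
            std::memcpy(t.staging.data() + offset, data, size);
            t.bytes_written += size;
            if (t.bytes_written < total_size) {
                return false;  // more chunks still expected
            }
            // Last chunk: take ownership of the buffer and drop the tracker
            // so its memory is released as soon as post-processing is done.
            full = std::move(t.staging);
            trackers_.erase(tensor_key);
        }
        on_complete(full);  // e.g. format transform + device upload
        return true;
    }

    // Number of tensors that have received some, but not all, of their chunks.
    size_t pending() {
        std::lock_guard<std::mutex> lock(mutex_);
        return trackers_.size();
    }

  private:
    std::mutex mutex_;
    std::unordered_map<const void *, TensorSetTracker> trackers_;
};
```

Only the thread that delivers the final chunk runs on_complete, so the full-tensor transform happens once regardless of the order in which chunks arrive.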
* CANN: fix L2_NORM ignoring eps parameter
The L2_NORM implementation was not using the eps parameter from
op_params, causing incorrect results when eps is large (e.g. 10.0).
The CPU reference computes scale = 1/fmaxf(norm, eps), so add a
Clamp step to clamp the norm to at least eps before dividing.
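For illustration, the formula quoted above can be written as a small host-side function. This is a sketch of the reference behaviour only, assuming a single contiguous row; l2_norm_row is a hypothetical helper, not a function from ggml or the CANN backend.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: L2-normalizes one row using scale = 1 / max(norm, eps).
std::vector<float> l2_norm_row(const std::vector<float> & x, float eps) {
    float sum_sq = 0.0f;
    for (float v : x) {
        sum_sq += v * v;
    }
    const float norm = std::sqrt(sum_sq);
    // Clamp the norm to at least eps before dividing, as in the CPU reference.
    const float scale = 1.0f / std::fmax(norm, eps);

    std::vector<float> y(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] * scale;
    }
    return y;
}
```

With x = {3, 4} the norm is 5, so a small eps leaves the divisor at 5, while eps = 10 raises it to 10 and halves every output. Ignoring eps hides exactly this difference.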
* ggml/cann: compare op_params for POOL_2D in ACL graph cache matching
When ACL graph mode is enabled, the graph LRU cache checks whether a
cached graph matches the current computation graph. Previously,
GGML_OP_POOL_2D was not included in the op_params comparison, so two
POOL_2D nodes with different pooling parameters (kernel size, stride,
padding) but identical tensor shapes and addresses could incorrectly
reuse a cached graph, leading to wrong results or aclnn errors.
Add GGML_OP_POOL_2D to the list of ops that require op_params matching
in ggml_graph_node_properties::has_matching_properties().
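A minimal sketch of why shape-only matching is not enough: two nodes can agree on op and shape yet differ in op_params. The NodeProps struct, its matches() method, and the kMaxOpParams constant below are hypothetical stand-ins, not the real ggml_graph_node_properties layout.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical parameter count, chosen for illustration only.
constexpr int kMaxOpParams = 16;

// Hypothetical, simplified stand-in for a cached node's properties.
struct NodeProps {
    int     op;                       // operation id
    int64_t ne[4];                    // tensor shape
    int32_t op_params[kMaxOpParams];  // e.g. kernel size, stride, padding

    // A cached node matches only if op, shape and op_params all agree.
    bool matches(const NodeProps & other) const {
        if (op != other.op) {
            return false;
        }
        if (std::memcmp(ne, other.ne, sizeof(ne)) != 0) {
            return false;
        }
        return std::memcmp(op_params, other.op_params, sizeof(op_params)) == 0;
    }
};
```

Dropping the final memcmp reproduces the bug: two pooling nodes with the same shape but a different kernel size would compare equal and share a cached graph.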
* cann: fix ACL graph cache matching by adding tensor type and unconditional op_params comparison
The ACL graph LRU cache was incorrectly reusing cached graphs for
operations with different tensor types or op_params, causing test
failures for CPY (f16 vs bf16), POOL_2D, L2_NORM, NORM_MUL_ADD,
RMS_NORM_MUL_ADD, and ADD_RMS_NORM.
Changes:
- Add node_type and src_type[] fields to ggml_graph_node_properties
  so the cache can distinguish tensors with different types but
  identical ne/nb (e.g. f16 and bf16 both have 2-byte elements)
- Compare op_params unconditionally for all ops instead of only for
  SCALE/UNARY/GLU/ROPE/POOL_2D
1 parent 4a00bbf commit 632219a
3 files changed
Lines changed: 145 additions & 24 deletions