Commit 91a6b57
feat(cuda): fuse narrower-than-output Dict codes and RunEnd ends (#7617)
Dict codes and RunEnd ends that are narrower than the output type (e.g.
u8 BitPacked codes in a u32 Dict) previously required a separate kernel
launch. They are now fused by decoding at the source's native width and
widening to T in shared memory.
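
The "decode at native width, then widen" step can be sketched on the CPU. This is only an illustration: it assumes the narrow codes sit at the front of a scratch buffer large enough for the widened values and that widening happens in place. The names `widen_u8_to_u32_in_place` and `dict_gather` are hypothetical and do not come from the kernel in this commit.

```rust
/// Widen `n` u8 codes stored at the front of `scratch` into `n` little-endian
/// u32 values that fill the first `4 * n` bytes, without a second buffer.
/// (Hypothetical helper; the scratch buffer stands in for shared memory.)
fn widen_u8_to_u32_in_place(scratch: &mut [u8], n: usize) {
    assert!(scratch.len() >= n * 4, "scratch must hold n u32 values");
    // Walk back to front: the write for element i covers bytes 4i..4i+4,
    // which never overlaps a still-unread narrow code (those sit below i).
    for i in (0..n).rev() {
        let code = scratch[i] as u32;
        scratch[4 * i..4 * i + 4].copy_from_slice(&code.to_le_bytes());
    }
}

/// Read the widened codes back and use them to gather dictionary values.
fn dict_gather(scratch: &[u8], n: usize, values: &[u32]) -> Vec<u32> {
    scratch[..n * 4]
        .chunks_exact(4)
        .map(|b| {
            let code = u32::from_le_bytes([b[0], b[1], b[2], b[3]]);
            values[code as usize]
        })
        .collect()
}
```

In this sketch, correctness depends on the order in which elements are widened. An in-place scheme like this one therefore needs care once several threads run it concurrently.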
---
Fixes the race from #7603 by
performing the type widening with one warp within a block. This PR also
adds benchmarks that exercise the widening logic:
```
Benchmarking dict_widen_u8_to_u32/dynamic_dispatch_u32/100M: Warming up for 1.0000 ns
Warning: Unable to complete 10 samples in 1.0ns. You may wish to increase target time to 48.4ms.
dict_widen_u8_to_u32/dynamic_dispatch_u32/100M
time: [203.97 µs 204.37 µs 204.78 µs]
thrpt: [1819.2 GiB/s 1822.8 GiB/s 1826.4 GiB/s]
Benchmarking dict_widen_u16_to_u32/dynamic_dispatch_u32/100M: Warming up for 1.0000 ns
Warning: Unable to complete 10 samples in 1.0ns. You may wish to increase target time to 50.4ms.
dict_widen_u16_to_u32/dynamic_dispatch_u32/100M
time: [203.74 µs 204.92 µs 205.15 µs]
thrpt: [1815.9 GiB/s 1817.9 GiB/s 1828.5 GiB/s]
Benchmarking dict_nowiden_u32_to_u32/dynamic_dispatch_u32/100M: Warming up for 1.0000 ns
Warning: Unable to complete 10 samples in 1.0ns. You may wish to increase target time to 49.6ms.
dict_nowiden_u32_to_u32/dynamic_dispatch_u32/100M
time: [170.86 µs 171.18 µs 171.59 µs]
thrpt: [2171.0 GiB/s 2176.2 GiB/s 2180.3 GiB/s]
```
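
The reported throughput figures appear consistent with 100,000,000 output elements of 4 bytes each (u32) divided by the measured time. The element count is inferred from the `100M` label, not stated in the log. A small sketch of that arithmetic, with a hypothetical helper name:

```rust
/// Throughput in GiB/s for `elements` values of `bytes_per_element` bytes
/// processed in `seconds`. (Hypothetical helper for checking the log above.)
fn gib_per_sec(elements: u64, bytes_per_element: u64, seconds: f64) -> f64 {
    let bytes = (elements * bytes_per_element) as f64;
    bytes / seconds / (1u64 << 30) as f64
}
```

Under that assumption, `gib_per_sec(100_000_000, 4, 204.37e-6)` comes out at about 1822.8, matching the u8-to-u32 median. Comparing the median times (204.37 µs against 171.18 µs) puts the widening path at roughly 19% slower than the no-widening baseline.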
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
1 parent 543dbe7 commit 91a6b57
5 files changed
Lines changed: 529 additions & 133 deletions
File tree
- vortex-cuda
- benches
- kernels/src
- src/dynamic_dispatch