
Commit b5c9834

🔥🔥[FLASH-ATTENTION RNG] Reducing the Cost of Dropout in Flash-Attention by Hiding RNG with GEMM
1 parent 6ad7b30 commit b5c9834

1 file changed

Lines changed: 2 additions & 1 deletion

README.md

@@ -451,7 +451,8 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 |2024.09|🔥🔥[**HiFloat8**] Ascend HiFloat8 Format for Deep Learning(@Huawei)|[[pdf]](https://arxiv.org/pdf/2409.16626)|⚠️|⭐️ |
 |2024.09|🔥🔥[**Tensor Cores**] Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores(@nju.edu.cn)|[[pdf]](https://arxiv.org/pdf/2409.17870)|⚠️|⭐️ |
 |2024.07|🔥🔥[**Tensor Product**] Acceleration of Tensor-Product Operations with Tensor Cores(@Heidelberg University)|[[pdf]](https://arxiv.org/pdf/2407.09621)|⚠️|⭐️ |
-|2024.12| 🔥🔥[**HADACORE**] HADACORE: TENSOR CORE ACCELERATED HADAMARD TRANSFORM KERNEL(@Meta)|[[pdf]](https://arxiv.org/pdf/2407.09621)|[[hadamard_transform]](https://github.com/pytorch-labs/applied-ai/tree/main/kernels/cuda/inference/hadamard_transform) ![](https://img.shields.io/github/stars/pytorch-labs/applied-ai.svg?style=social)|⭐️ |
+|2024.12| 🔥🔥[**HADACORE**] HADACORE: TENSOR CORE ACCELERATED HADAMARD TRANSFORM KERNEL(@Meta)|[[pdf]](https://arxiv.org/pdf/2407.09621)|[[hadamard_transform]](https://github.com/pytorch-labs/applied-ai/tree/main/kernels/cuda/inference/hadamard_transform) ![](https://img.shields.io/github/stars/pytorch-labs/applied-ai.svg?style=social)|⭐️ |
+|2024.10| 🔥🔥[**FLASH-ATTENTION RNG**] Reducing the Cost of Dropout in Flash-Attention by Hiding RNG with GEMM(@Princeton University)|[[pdf]](https://arxiv.org/pdf/2410.07531)|⚠️|⭐️ |
 
 ### 📖VLM/Position Embed/Others ([©️back👆🏻](#paperlist))
 <div id="Others"></div>
