
Hands-On Triton Tutorial 📖

Learn Triton: No GPU Experience Required

English | 中文

This tutorial is written for beginners who are not yet familiar with Python or GPU programming. It walks from vector addition to RoPE, matmul_ogs, Top-K, Gluon Attention, and other key GPU kernels used in Large Language Models (LLMs).

If you have not installed Python yet, or want to learn some basic Python syntax before getting started, Python.org provides a comprehensive tutorial.

Author: BobHuang - OpenMLIR

Introduction to Triton

OpenAI/Triton is a Python dialect that lets developers write efficient GPU kernels in Python syntax instead of C++. As of now, Triton supports multiple backends, including NVIDIA, AMD, Huawei Ascend (华为昇腾), Cambricon (寒武纪), Moore Threads (摩尔线程), and MetaX (沐曦). This means a kernel written in Triton can be compiled for and run on different hardware architectures. For more details, see FlagOpen/FlagGems.
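To make "GPU kernels in Python syntax" a little more concrete, here is a plain-Python simulation (no GPU or Triton install needed) of how a Triton vector-addition kernel splits its work. The helper names `cdiv`, `add_program`, and `vector_add` are hypothetical; the comments point to the Triton calls each step stands in for (`tl.program_id`, `tl.arange`, masked `tl.load`/`tl.store`, `triton.cdiv`). This is a sketch of the programming model, not Triton code: it runs the program instances one after another, whereas on a GPU they run in parallel.

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division; plays the role of triton.cdiv when sizing the launch grid."""
    return (a + b - 1) // b


def add_program(pid, x, y, out, n_elements, block_size):
    """Hypothetical stand-in for ONE program instance of a vector-add kernel.

    pid     ~ tl.program_id(axis=0)
    offsets ~ pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask    ~ offsets < n_elements
    """
    offsets = [pid * block_size + i for i in range(block_size)]
    mask = [off < n_elements for off in offsets]
    for off, valid in zip(offsets, mask):
        if valid:  # masked-off lanes neither load nor store
            out[off] = x[off] + y[off]


def vector_add(x, y, block_size=4):
    """Launch enough program instances to cover every element."""
    assert len(x) == len(y), "inputs must have the same length"
    n = len(x)
    out = [0.0] * n
    grid = cdiv(n, block_size)  # number of program instances
    for pid in range(grid):     # sequential here; parallel on a real GPU
        add_program(pid, x, y, out, n, block_size)
    return out


if __name__ == "__main__":
    # 5 elements with block_size=4 -> 2 programs; the second one masks 3 of its 4 lanes.
    print(vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))  # → [11, 22, 33, 44, 55]
```

Each program handles one fixed-size block, and the mask keeps the last, partially filled block from reading or writing past the end of the arrays. In an actual Triton kernel `BLOCK_SIZE` is a compile-time constant (`tl.constexpr`) and must be a power of two for `tl.arange`.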

Advantages: easy-to-use, Python-like syntax, while the compiler takes care of much of the low-level GPU optimization, such as memory coalescing and shared-memory management.

Applications: speeding up LLM inference by accelerating deep learning operators and making it practical to write custom operators.

This repository requires Triton 3.4.0 (released on July 31, 2025), which ships with torch 2.8.0. Because Triton has good backward compatibility, other PyTorch versions may also work if you upgrade Triton manually.
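One way to check that an environment meets this requirement is to read the installed package versions with Python's standard `importlib.metadata`. The sketch below assumes the distribution is named `triton` (some builds, such as ROCm wheels, ship it under a different name) and treats 3.4.0 as a minimum rather than an exact pin. `parse_release`, `meets_minimum`, and `installed_version` are hypothetical helper names, and the parser simply ignores pre-release and local suffixes such as `rc1` or `+cu128`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption: 3.4.0 is treated as the MINIMUM Triton version for this tutorial.
MIN_TRITON = (3, 4, 0)


def parse_release(v: str) -> tuple:
    """Turn '3.4.0', '3.5' or '2.8.0+cu128' into a (major, minor, patch) tuple."""
    m = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", v)
    if m is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    # Missing minor/patch parts count as 0; suffixes after patch are ignored.
    return tuple(int(g) if g else 0 for g in m.groups())


def meets_minimum(v: str, minimum: tuple = MIN_TRITON) -> bool:
    """True if version string v is at least `minimum` (tuple comparison)."""
    return parse_release(v) >= minimum


def installed_version(dist_name: str):
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    triton_v = installed_version("triton")
    torch_v = installed_version("torch")
    ok = triton_v is not None and meets_minimum(triton_v)
    print("triton:", triton_v, "OK" if ok else "needs >= 3.4.0")
    print("torch: ", torch_v)  # informational; 2.8.0 is the version paired with Triton 3.4.0
```

Only Triton is checked against a threshold here, since the text above notes that other PyTorch versions may work once Triton is upgraded.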
