💫 Moore Threads (摩尔线程) mthreads (Triton 3.6)
- Based on Triton 3.6, x64
- Available for S4000/S5000
If your network connection is available, you can skip the later steps 1.x, because the build fetches these dependencies automatically.
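Whether you can rely on automatic fetching depends on whether the build machine has outbound network access. Below is a minimal sketch for checking that, assuming `curl` is available. `can_reach` is a hypothetical helper, and its default URL is simply the download host used in this guide, not necessarily every host the build contacts.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an HTTP(S) HEAD request to the URL completes within 10 s.
# Any HTTP status counts as "reachable"; only connection-level failures return non-zero.
# The default URL is an assumption (the download host used elsewhere in this guide).
can_reach() {
    url=${1:-https://baai-cp-web.ks3-cn-beijing.ksyuncs.com}
    curl -sS -o /dev/null --head --max-time 10 "$url" 2>/dev/null
}

if can_reach; then
    echo "Network reachable: the build should be able to fetch dependencies itself."
else
    echo "Network unreachable: download the dependencies manually (steps 1.x)."
fi
```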
# Plan A: docker pull (60.5GB)
IMAGE=harbor.baai.ac.cn/flagtree/flagtree-mthreads3.6-py310-torch2.7.1-musa5.1.0-ubuntu22.04:202605-base
docker pull ${IMAGE}
# Plan B: docker load (17GB)
IMAGE=flagtree-mthreads3.6-py310-torch2.7.1-musa5.1.0-ubuntu22.04:202605-base
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/flagtree-mthreads3.6-py310-torch2.7.1-musa5.1.0-ubuntu22.04.202605-base.tar.gz
docker load -i flagtree-mthreads3.6-py310-torch2.7.1-musa5.1.0-ubuntu22.04.202605-base.tar.gz
mcc_version # 5.1.0

CONTAINER=flagtree-dev-xxx
docker run -dit \
--network=host --pid=host --privileged \
--cap-add=SYS_PTRACE \
--shm-size 16gb \
--security-opt seccomp=unconfined \
-e MTHREADS_VISIBLE_DEVICES=all -e MTHREADS_DRIVER_CAPABILITIES=all \
-v /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu \
-v /lib/x86_64-linux-gnu:/lib/x86_64-linux-gnu \
-v /etc/alternatives:/etc/alternatives \
-v /etc/localtime:/etc/localtime:ro \
-v /data:/data -v /home:/home -v /tmp:/tmp \
-w /root --name ${CONTAINER} ${IMAGE} bash
docker exec -it ${CONTAINER} /bin/bash

mkdir -p ~/.flagtree/mthreads; cd ~/.flagtree/mthreads
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/mthreads-llvm22-x64_v0.5.1.tar.gz
tar zxvf mthreads-llvm22-x64_v0.5.1.tar.gz

The Triton dependencies are already downloaded and installed in the image. If you do not need to build FlagTree or Triton from source, you do not need to download the Triton dependencies.
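To see whether the image's pre-downloaded Triton dependencies are in place, you can check the cache directory. This sketch assumes they live under `~/.triton` (the directory that the unpack script in the next step manages); `triton_cache_populated` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical check: is the Triton dependency cache (~/.triton) present and non-empty?
# Assumption: the pre-downloaded Triton dependencies are stored under ~/.triton,
# the same directory that unpack_triton_build_deps.sh populates.
triton_cache_populated() {
    dir="${HOME}/.triton"
    [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

if triton_cache_populated; then
    echo "~/.triton is populated; downloading the Triton dependencies is probably unnecessary."
else
    echo "~/.triton is missing or empty; download and unpack the Triton dependencies."
fi
```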
cd ${YOUR_CODE_DIR}/FlagTree
# For Triton 3.6 (x64)
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/build-deps-triton_3.6.x-linux-x64.tar.gz
sh python/scripts/unpack_triton_build_deps.sh ./build-deps-triton_3.6.x-linux-x64.tar.gz

After the script runs, the original ~/.triton directory is renamed and a new ~/.triton directory is created to hold the pre-downloaded packages. Note that the script prompts for manual confirmation while it runs.
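Because the original directory is renamed rather than deleted, you can find it again later, for example to restore it. The exact name of the renamed directory is not given here, so this hypothetical `list_triton_dirs` helper simply lists every directory in your home whose name starts with `.triton`.

```shell
#!/bin/sh
# Hypothetical helper: list ~/.triton and any sibling directories whose names start
# with ".triton", such as the renamed original left by unpack_triton_build_deps.sh.
# The backup's exact name is not documented here, so this just globs for it.
list_triton_dirs() {
    for d in "${HOME}"/.triton*; do
        [ -d "$d" ] && printf '%s\n' "$d"
    done
    return 0
}

list_triton_dirs
```

To go back to your previous cache, move the new `~/.triton` aside and rename the backup it lists back to `~/.triton` with `mv`.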
# Note: First install PyTorch, then execute the following commands
python3 -m pip uninstall -y triton # Repeat this command until Triton is fully uninstalled
RES="--index-url=https://resource.flagos.net/repository/flagos-pypi-hosted/simple"
python3.10 -m pip install flagtree===0.5.2rc1+mthreads3.6 $RES

After installing flagtree, you can check it with:
python3 -m pip show flagtree

cd ${YOUR_CODE_DIR}/FlagTree
git checkout -b triton_v3.6.x origin/triton_v3.6.x
export FLAGTREE_BACKEND=mthreads
MAX_JOBS=32 python3 -m pip install . --no-build-isolation -v

Refer to Tests of mthreads3.2 backend.
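The source build above hard-codes `MAX_JOBS=32`. On a machine with fewer cores, or when compilation runs out of memory, a smaller value can help. Below is a sketch that derives it from the online CPU count, capped at 32. `pick_max_jobs` is a hypothetical helper, and the only assumption is that the build reads `MAX_JOBS` from the environment, as the command above implies.

```shell
#!/bin/sh
# Hypothetical helper: print min(online CPUs, cap) for use as MAX_JOBS.
# The default cap of 32 mirrors the build command above; it is not a FlagTree limit.
# Fewer parallel compile jobs also lower peak memory use during the build.
pick_max_jobs() {
    cap=${1:-32}
    cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    if [ "$cpus" -lt "$cap" ]; then
        echo "$cpus"
    else
        echo "$cap"
    fi
}

echo "MAX_JOBS=$(pick_max_jobs)"
# Example: MAX_JOBS=$(pick_max_jobs) python3 -m pip install . --no-build-isolation -v
```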
💫 Moore Threads (摩尔线程) mthreads (Triton 3.2)
- Based on Triton 3.2, x64
- Available for S4000/S5000
If your network connection is available, you can skip the later steps 1.x, because the build fetches these dependencies automatically.
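Whichever of Plan A or Plan B you use, you can confirm that the image is in the local Docker image store before creating a container. `docker image inspect` exits with a non-zero status when the image is missing; `image_present` is a hypothetical wrapper around it, and the default tag below is the Plan B name from this section.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given image reference exists locally.
image_present() {
    docker image inspect "$1" >/dev/null 2>&1
}

# Falls back to the Plan B tag when IMAGE is not already set.
IMAGE=${IMAGE:-flagtree-mthreads3.2-py310-torch2.7.1-musa5.1.0-ubuntu22.04:202605-base}
if image_present "$IMAGE"; then
    echo "Found $IMAGE"
else
    echo "Image $IMAGE not found locally; run Plan A or Plan B first." >&2
fi
```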
# Plan A: docker pull (59.4GB)
IMAGE=harbor.baai.ac.cn/flagtree/flagtree-mthreads3.2-py310-torch2.7.1-musa5.1.0-ubuntu22.04:202605-base
docker pull ${IMAGE}
# Plan B: docker load (17GB)
IMAGE=flagtree-mthreads3.2-py310-torch2.7.1-musa5.1.0-ubuntu22.04:202605-base
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/flagtree-mthreads3.2-py310-torch2.7.1-musa5.1.0-ubuntu22.04.202605-base.tar.gz
docker load -i flagtree-mthreads3.2-py310-torch2.7.1-musa5.1.0-ubuntu22.04.202605-base.tar.gz
mcc_version # 5.1.0

CONTAINER=flagtree-dev-xxx
docker run -dit \
--network=host --pid=host --privileged \
--cap-add=SYS_PTRACE \
--shm-size 16gb \
--security-opt seccomp=unconfined \
-e MTHREADS_VISIBLE_DEVICES=all -e MTHREADS_DRIVER_CAPABILITIES=all \
-v /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu \
-v /lib/x86_64-linux-gnu:/lib/x86_64-linux-gnu \
-v /etc/alternatives:/etc/alternatives \
-v /etc/localtime:/etc/localtime:ro \
-v /data:/data -v /home:/home -v /tmp:/tmp \
-w /root --name ${CONTAINER} ${IMAGE} bash
docker exec -it ${CONTAINER} /bin/bash

mkdir -p ~/.flagtree/mthreads; cd ~/.flagtree/mthreads
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/mthreads-llvm20-x64_v0.5.0.tar.gz
tar zxvf mthreads-llvm20-x64_v0.5.0.tar.gz
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/mthreadsTritonPlugin-triton3.2-cpython3.10-glibc2.35-glibcxx3.4.30-cxxabi1.3.13-x64_v0.5.0.tar.gz
tar zxvf mthreadsTritonPlugin-triton3.2-cpython3.10-glibc2.35-glibcxx3.4.30-cxxabi1.3.13-x64_v0.5.0.tar.gz

The Triton dependencies are already downloaded and installed in the image. If you do not need to build FlagTree or Triton from source, you do not need to download the Triton dependencies.
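The LLVM and plugin archives in steps 1.x are large, so it can be convenient not to download them again when you re-run a step. Below is a sketch of a hypothetical `fetch_once` helper, assuming `wget` and `tar` as in the commands above: it skips the download when a non-empty file with the same name already exists in the current directory, then extracts the archive.

```shell
#!/bin/sh
# Hypothetical helper: download an archive into the current directory unless a
# non-empty file with the same name is already there, then extract it.
fetch_once() {
    url=$1
    file=$(basename "$url")
    if [ ! -s "$file" ]; then
        # Remove a failed download so the next run retries it.
        wget -O "$file" "$url" || { rm -f "$file"; return 1; }
    fi
    tar zxf "$file"
}

# Usage (inside the container):
#   mkdir -p ~/.flagtree/mthreads && cd ~/.flagtree/mthreads
#   fetch_once https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/mthreads-llvm20-x64_v0.5.0.tar.gz
```

The helper cannot tell a complete archive from a partial one, so delete the file by hand if a download was interrupted.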
cd ${YOUR_CODE_DIR}/FlagTree
# For Triton 3.2 (x64)
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/build-deps-triton_3.2.x-linux-x64.tar.gz
sh python/scripts/unpack_triton_build_deps.sh ./build-deps-triton_3.2.x-linux-x64.tar.gz

After the script runs, the original ~/.triton directory is renamed and a new ~/.triton directory is created to hold the pre-downloaded packages. Note that the script prompts for manual confirmation while it runs.
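The uninstall step that follows says to repeat `pip uninstall -y triton` until Triton is fully removed. A loop can do the repetition, stopping once `pip show` no longer finds the package. `uninstall_all` is a hypothetical helper, and the five-attempt limit is an arbitrary safeguard.

```shell
#!/bin/sh
# Hypothetical helper: run "pip uninstall -y PKG" until "pip show PKG" no longer finds it.
# The attempt limit prevents an endless loop if an uninstall keeps failing.
uninstall_all() {
    pkg=$1
    tries=0
    while python3 -m pip show "$pkg" >/dev/null 2>&1; do
        tries=$((tries + 1))
        if [ "$tries" -gt 5 ]; then
            echo "$pkg is still installed after 5 attempts" >&2
            return 1
        fi
        python3 -m pip uninstall -y "$pkg"
    done
    return 0
}

# Usage: uninstall_all triton
```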
# Note: First install PyTorch, then execute the following commands
python3 -m pip uninstall -y triton # Repeat this command until Triton is fully uninstalled
RES="--index-url=https://resource.flagos.net/repository/flagos-pypi-hosted/simple"
python3.10 -m pip install flagtree===0.5.1+mthreads3.2 $RES

After installing flagtree, you can check it with:
python3 -m pip show flagtree

cd ${YOUR_CODE_DIR}/FlagTree/python
git checkout -b triton_v3.2.x origin/triton_v3.2.x
export FLAGTREE_BACKEND=mthreads
MAX_JOBS=32 python3 -m pip install . --no-build-isolation -v

Refer to Tests of mthreads3.2 backend.
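`pip show flagtree` prints a `Version:` field whose local suffix (such as `+mthreads3.2` in the install command above) indicates which backend build is installed. Below is a sketch that extracts that field and checks the suffix; `pkg_version` and `expect_backend` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: read "pip show" output on stdin and print the Version field.
pkg_version() {
    awk -F': ' '$1 == "Version" { print $2; exit }'
}

# Hypothetical check: succeed if the installed flagtree version ends with "+SUFFIX".
# Example: expect_backend mthreads3.2
expect_backend() {
    v=$(python3 -m pip show flagtree 2>/dev/null | pkg_version)
    case "$v" in
        *"+$1") echo "flagtree $v" ;;
        *) echo "unexpected flagtree version: '${v:-not installed}'" >&2; return 1 ;;
    esac
}
```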
💫 Moore Threads (摩尔线程) mthreads (Triton 3.1)
- Based on Triton 3.1, x64/aarch64
- Available for S4000/S5000
If you use this preinstalled image, you can skip the later steps 1.x. You can also skip them if your network connection is available, because the build fetches these dependencies automatically.
# Plan A: docker pull (55.3GB)
IMAGE=harbor.baai.ac.cn/flagtree/flagtree-mthreads-py310-torch2.7.1-musa4.3.5-ubuntu22.04:202603
docker pull ${IMAGE}
# Plan B: docker load (18GB)
IMAGE=flagtree-mthreads-py310-torch2.7.1-musa4.3.5-ubuntu22.04:202603
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/flagtree-mthreads-py310-torch2.7.1-musa4.3.5-ubuntu22.04.202603.tar.gz
docker load -i flagtree-mthreads-py310-torch2.7.1-musa4.3.5-ubuntu22.04.202603.tar.gz

CONTAINER=flagtree-dev-xxx
docker run -dit \
--network=host --pid=host --privileged \
--cap-add=SYS_PTRACE \
--shm-size 16gb \
--security-opt seccomp=unconfined \
-e MTHREADS_VISIBLE_DEVICES=all -e MTHREADS_DRIVER_CAPABILITIES=all \
-v /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu \
-v /lib/x86_64-linux-gnu:/lib/x86_64-linux-gnu \
-v /etc/alternatives:/etc/alternatives \
-v /etc/localtime:/etc/localtime:ro \
-v /data:/data -v /home:/home -v /tmp:/tmp \
-w /root --name ${CONTAINER} ${IMAGE} bash
docker exec -it ${CONTAINER} /bin/bash

mkdir -p ~/.flagtree/mthreads; cd ~/.flagtree/mthreads
# x64
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/mthreads-llvm19-glibc2.35-glibcxx3.4.30-x64_v0.4.0.tar.gz
mkdir -p ./mthreads-llvm19-glibc2.35-glibcxx3.4.30
tar zxvf mthreads-llvm19-glibc2.35-glibcxx3.4.30-x64_v0.4.0.tar.gz \
-C ./mthreads-llvm19-glibc2.35-glibcxx3.4.30 --strip-components=1
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/mthreadsTritonPlugin-cpython3.10-glibc2.35-glibcxx3.4.30-cxxabi1.3.13-ubuntu-x64_v0.4.1.tar.gz
tar zxvf mthreadsTritonPlugin-cpython3.10-glibc2.35-glibcxx3.4.30-cxxabi1.3.13-ubuntu-x64_v0.4.1.tar.gz
# aarch64
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/mthreads-llvm19-glibc2.35-glibcxx3.4.30-aarch64_v0.4.0.tar.gz
mkdir -p ./mthreads-llvm19-glibc2.35-glibcxx3.4.30
tar zxvf mthreads-llvm19-glibc2.35-glibcxx3.4.30-aarch64_v0.4.0.tar.gz \
-C ./mthreads-llvm19-glibc2.35-glibcxx3.4.30 --strip-components=1
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/mthreadsTritonPlugin-cpython3.10-glibc2.35-glibcxx3.4.30-cxxabi1.3.13-ubuntu-aarch64_v0.4.0.tar.gz
tar zxvf mthreadsTritonPlugin-cpython3.10-glibc2.35-glibcxx3.4.30-cxxabi1.3.13-ubuntu-aarch64_v0.4.0.tar.gz

The Triton dependencies are already downloaded and installed in the image. If you do not need to build FlagTree or Triton from source, you do not need to download the Triton dependencies.
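The Triton 3.1 downloads come in separate x64 and aarch64 archives. Below is a sketch that picks the matching suffix from `uname -m`. `flagtree_arch` is a hypothetical helper, and the mapping covers only the two architectures this section lists.

```shell
#!/bin/sh
# Hypothetical helper: map "uname -m" output (or $1, if given) to the x64/aarch64
# suffix used in the archive names. Fails for any other architecture.
flagtree_arch() {
    case "${1:-$(uname -m)}" in
        x86_64|amd64) echo x64 ;;
        aarch64|arm64) echo aarch64 ;;
        *) return 1 ;;
    esac
}

if ARCH=$(flagtree_arch); then
    echo "Use the $ARCH archives, e.g. mthreads-llvm19-glibc2.35-glibcxx3.4.30-${ARCH}_v0.4.0.tar.gz"
else
    echo "Unsupported architecture: $(uname -m)" >&2
fi
```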
cd ${YOUR_CODE_DIR}/FlagTree
# For Triton 3.1 (x64)
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/build-deps-triton_3.1.x-linux-x64.tar.gz
sh python/scripts/unpack_triton_build_deps.sh ./build-deps-triton_3.1.x-linux-x64.tar.gz

After the script runs, the original ~/.triton directory is renamed and a new ~/.triton directory is created to hold the pre-downloaded packages. Note that the script prompts for manual confirmation while it runs.
# Note: First install PyTorch, then execute the following commands
python3 -m pip uninstall -y triton # Repeat this command until Triton is fully uninstalled
RES="--index-url=https://resource.flagos.net/repository/flagos-pypi-hosted/simple"
python3.10 -m pip install flagtree===0.5.1+mthreads3.1 $RES

flagtree is already installed in the preinstalled image. You can check it with:
python3 -m pip show flagtree

cd ${YOUR_CODE_DIR}/FlagTree/python
export FLAGTREE_BACKEND=mthreads
MAX_JOBS=32 python3 -m pip install . --no-build-isolation -v

Refer to Tests of mthreads3.1 backend.
For Triton 3.1 kernels that use tl.dot, setting the environment variable MUSA_ENABLE_SQMMA=1 (for example, export MUSA_ENABLE_SQMMA=1) can improve performance.
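An environment variable only affects the processes that inherit it, so instead of exporting `MUSA_ENABLE_SQMMA` for the whole shell you can set it for a single run. This also makes it easy to compare timings with and without it. `run_with_sqmma` is a hypothetical wrapper, and the script name in the usage comment is a placeholder.

```shell
#!/bin/sh
# Hypothetical wrapper: run one command with MUSA_ENABLE_SQMMA=1 set only for that
# command, leaving the current shell's environment unchanged.
run_with_sqmma() {
    MUSA_ENABLE_SQMMA=1 "$@"
}

# Usage (placeholder script containing your own tl.dot kernels):
#   python3 my_tl_dot_benchmark.py                  # without SQMMA
#   run_with_sqmma python3 my_tl_dot_benchmark.py   # with SQMMA
```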