# llama-cpp-python-wheels

Pre-built wheels for llama-cpp-python across platforms and CUDA versions.

## Available Wheels

### CUDA 13.0 - Latest
| File | OS | Python | Driver | GPU Support | Size |
|------|----|--------|--------|-------------|------|
| [llama_cpp_python-0.3.16+cuda13.0.sm86.ampere-cp313-cp313-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda13.0-sm86-py313/llama_cpp_python-0.3.16+cuda13.0.sm86.ampere-cp313-cp313-win_amd64.whl) | Windows 10/11 | 3.13 | 580+ | RTX 30 series (Ampere, sm_86) | 61.4 MB |
| [llama_cpp_python-0.3.16+cuda13.0.sm86.ampere-cp312-cp312-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda13.0-sm86-py312/llama_cpp_python-0.3.16+cuda13.0.sm86.ampere-cp312-cp312-win_amd64.whl) | Windows 10/11 | 3.12 | 580+ | RTX 30 series (Ampere, sm_86) | 61.4 MB |
| [llama_cpp_python-0.3.16+cuda13.0.sm86.ampere-cp311-cp311-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda13.0-sm86-py311/llama_cpp_python-0.3.16+cuda13.0.sm86.ampere-cp311-cp311-win_amd64.whl) | Windows 10/11 | 3.11 | 580+ | RTX 30 series (Ampere, sm_86) | 61.4 MB |
| [llama_cpp_python-0.3.16+cuda13.0.sm86.ampere-cp310-cp310-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda13.0-sm86-py310/llama_cpp_python-0.3.16+cuda13.0.sm86.ampere-cp310-cp310-win_amd64.whl) | Windows 10/11 | 3.10 | 580+ | RTX 30 series (Ampere, sm_86) | 61.4 MB |
| [llama_cpp_python-0.3.16+cuda13.0.sm89.ada-cp313-cp313-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda13.0-sm89-py313/llama_cpp_python-0.3.16+cuda13.0.sm89.ada-cp313-cp313-win_amd64.whl) | Windows 10/11 | 3.13 | 580+ | RTX 40 series/Ada Pro (sm_89) | 61.4 MB |
| [llama_cpp_python-0.3.16+cuda13.0.sm89.ada-cp312-cp312-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda13.0-sm89-py312/llama_cpp_python-0.3.16+cuda13.0.sm89.ada-cp312-cp312-win_amd64.whl) | Windows 10/11 | 3.12 | 580+ | RTX 40 series/Ada Pro (sm_89) | 61.4 MB |
| [llama_cpp_python-0.3.16+cuda13.0.sm89.ada-cp311-cp311-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda13.0-sm89-py311/llama_cpp_python-0.3.16+cuda13.0.sm89.ada-cp311-cp311-win_amd64.whl) | Windows 10/11 | 3.11 | 580+ | RTX 40 series/Ada Pro (sm_89) | 61.4 MB |
| [llama_cpp_python-0.3.16+cuda13.0.sm89.ada-cp310-cp310-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda13.0-sm89-py310/llama_cpp_python-0.3.16+cuda13.0.sm89.ada-cp310-cp310-win_amd64.whl) | Windows 10/11 | 3.10 | 580+ | RTX 40 series/Ada Pro (sm_89) | 61.3 MB |

### CUDA 12.1 - Recommended
| File | OS | Python | Driver | GPU Support | Size |
|------|----|--------|--------|-------------|------|
| [llama_cpp_python-0.3.16+cuda12.1.sm86.ampere-cp313-cp313-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda12.1-sm86-py313/llama_cpp_python-0.3.16+cuda12.1.sm86.ampere-cp313-cp313-win_amd64.whl) | Windows 10/11 | 3.13 | 525.60.13+ | RTX 30 series (Ampere, sm_86) | 92.2 MB |
| [llama_cpp_python-0.3.16+cuda12.1.sm86.ampere-cp312-cp312-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda12.1-sm86-py312/llama_cpp_python-0.3.16+cuda12.1.sm86.ampere-cp312-cp312-win_amd64.whl) | Windows 10/11 | 3.12 | 525.60.13+ | RTX 30 series (Ampere, sm_86) | 61.4 MB |
| [llama_cpp_python-0.3.16+cuda12.1.sm86.ampere-cp311-cp311-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda12.1-sm86-py311/llama_cpp_python-0.3.16+cuda12.1.sm86.ampere-cp311-cp311-win_amd64.whl) | Windows 10/11 | 3.11 | 525.60.13+ | RTX 30 series (Ampere, sm_86) | 61.4 MB |
| [llama_cpp_python-0.3.16+cuda12.1.sm86.ampere-cp310-cp310-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda12.1-sm86-py310/llama_cpp_python-0.3.16+cuda12.1.sm86.ampere-cp310-cp310-win_amd64.whl) | Windows 10/11 | 3.10 | 525.60.13+ | RTX 30 series (Ampere, sm_86) | 61.4 MB |
| [llama_cpp_python-0.3.16+cuda12.1.sm89.ada-cp313-cp313-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda12.1-sm89-py313/llama_cpp_python-0.3.16+cuda12.1.sm89.ada-cp313-cp313-win_amd64.whl) | Windows 10/11 | 3.13 | 525.60.13+ | RTX 40 series/Ada Pro (sm_89) | 100.6 MB |
| [llama_cpp_python-0.3.16+cuda12.1.sm89.ada-cp312-cp312-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda12.1-sm89-py312/llama_cpp_python-0.3.16+cuda12.1.sm89.ada-cp312-cp312-win_amd64.whl) | Windows 10/11 | 3.12 | 525.60.13+ | RTX 40 series/Ada Pro (sm_89) | 100.6 MB |
| [llama_cpp_python-0.3.16+cuda12.1.sm89.ada-cp311-cp311-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda12.1-sm89-py311/llama_cpp_python-0.3.16+cuda12.1.sm89.ada-cp311-cp311-win_amd64.whl) | Windows 10/11 | 3.11 | 525.60.13+ | RTX 40 series/Ada Pro (sm_89) | 100.6 MB |
| [llama_cpp_python-0.3.16+cuda12.1.sm89.ada-cp310-cp310-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda12.1-sm89-py310/llama_cpp_python-0.3.16+cuda12.1.sm89.ada-cp310-cp310-win_amd64.whl) | Windows 10/11 | 3.10 | 525.60.13+ | RTX 40 series/Ada Pro (sm_89) | 100.6 MB |

### CUDA 11.8 - Most Compatible
| File | OS | Python | Driver | GPU Support | Size |
|------|----|--------|--------|-------------|------|
| [llama_cpp_python-0.3.16+cuda11.8.sm86.ampere-cp313-cp313-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda11.8-sm86-py313/llama_cpp_python-0.3.16+cuda11.8.sm86.ampere-cp313-cp313-win_amd64.whl) | Windows 10/11 | 3.13 | 450.80.02+ | RTX 30 series (Ampere, sm_86) | 100.6 MB |
| [llama_cpp_python-0.3.16+cuda11.8.sm86.ampere-cp312-cp312-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda11.8-sm86-py312/llama_cpp_python-0.3.16+cuda11.8.sm86.ampere-cp312-cp312-win_amd64.whl) | Windows 10/11 | 3.12 | 450.80.02+ | RTX 30 series (Ampere, sm_86) | 100.6 MB |
| [llama_cpp_python-0.3.16+cuda11.8.sm86.ampere-cp311-cp311-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda11.8-sm86-py311/llama_cpp_python-0.3.16+cuda11.8.sm86.ampere-cp311-cp311-win_amd64.whl) | Windows 10/11 | 3.11 | 450.80.02+ | RTX 30 series (Ampere, sm_86) | 100.6 MB |
| [llama_cpp_python-0.3.16+cuda11.8.sm86.ampere-cp310-cp310-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda11.8-sm86-py310/llama_cpp_python-0.3.16+cuda11.8.sm86.ampere-cp310-cp310-win_amd64.whl) | Windows 10/11 | 3.10 | 450.80.02+ | RTX 30 series (Ampere, sm_86) | 100.6 MB |
| [llama_cpp_python-0.3.16+cuda11.8.sm89.ada-cp313-cp313-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda11.8-sm89-py313/llama_cpp_python-0.3.16+cuda11.8.sm89.ada-cp313-cp313-win_amd64.whl) | Windows 10/11 | 3.13 | 450.80.02+ | RTX 40 series/Ada Pro (sm_89) | 100.5 MB |
| [llama_cpp_python-0.3.16+cuda11.8.sm89.ada-cp312-cp312-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda11.8-sm89-py312/llama_cpp_python-0.3.16+cuda11.8.sm89.ada-cp312-cp312-win_amd64.whl) | Windows 10/11 | 3.12 | 450.80.02+ | RTX 40 series/Ada Pro (sm_89) | 100.5 MB |
| [llama_cpp_python-0.3.16+cuda11.8.sm89.ada-cp311-cp311-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda11.8-sm89-py311/llama_cpp_python-0.3.16+cuda11.8.sm89.ada-cp311-cp311-win_amd64.whl) | Windows 10/11 | 3.11 | 450.80.02+ | RTX 40 series/Ada Pro (sm_89) | 100.5 MB |
| [llama_cpp_python-0.3.16+cuda11.8.sm89.ada-cp310-cp310-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda11.8-sm89-py310/llama_cpp_python-0.3.16+cuda11.8.sm89.ada-cp310-cp310-win_amd64.whl) | Windows 10/11 | 3.10 | 450.80.02+ | RTX 40 series/Ada Pro (sm_89) | 100.5 MB |

## GPU Support
- **Ampere (sm_86)**: RTX 3060, 3060 Ti, 3070, 3070 Ti, 3080, 3080 Ti, 3090, 3090 Ti
- **Ada Lovelace (sm_89)**: RTX 4060, 4060 Ti, 4070, 4070 Ti, 4070 Ti Super, 4080, 4080 Super, 4090, RTX A6000 Ada, RTX 6000 Ada, RTX 5000 Ada, L40, L40S
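
If you are unsure which architecture your card is, the compute capability reported by `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` (available on recent drivers) maps directly to the `sm` tag in the wheel filenames. A minimal sketch of that mapping (the function and table names here are illustrative, not part of this repo):

```python
# Map a compute capability string, e.g. "8.6" as printed by nvidia-smi,
# to the sm tag used in the wheel filenames above.
SUPPORTED_ARCHS = {"8.6": "sm86", "8.9": "sm89"}  # Ampere, Ada Lovelace

def wheel_arch(compute_cap: str) -> str:
    cap = compute_cap.strip()
    if cap not in SUPPORTED_ARCHS:
        raise ValueError(f"no prebuilt wheel for compute capability {cap}")
    return SUPPORTED_ARCHS[cap]
```

Other architectures (e.g. sm_75 Turing or sm_90 Hopper) are not covered by these wheels; see Contributing below to request one.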

## Installation
Download the appropriate wheel from [Releases](../../releases) and install:
```bash
pip install llama_cpp_python-[version]+cuda[cuda_version].sm[arch].[gpu]-cp[python]-cp[python]-win_amd64.whl
```
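
The `cp[python]` placeholder must match the running interpreter, or pip will refuse the wheel as incompatible. A small sketch of how to compute it (the helper name `current_cp_tag` is made up for illustration):

```python
import sys

def current_cp_tag() -> str:
    """Return the CPython tag (e.g. 'cp313') of the running interpreter,
    as it appears twice in the wheel filename."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"

# On Python 3.13 this returns "cp313", selecting ...-cp313-cp313-win_amd64.whl
```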

## Verification
```python
import llama_cpp  # the import only succeeds if the wheel installed correctly
print("llama-cpp-python with CUDA support installed successfully")
```

## Build Notes
Built with:
- Visual Studio 2019/2022 Build Tools
- CUDA Toolkit 11.8, 12.1, 13.0
- CMAKE_CUDA_ARCHITECTURES=86 (Ampere) or 89 (Ada)
- 12+ hour marathon debugging session 😅
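
For reference, a source build with these settings roughly corresponds to the invocation below. This is a hedged sketch of the standard llama-cpp-python CUDA build (via the documented `CMAKE_ARGS` environment variable), not the exact command used for these wheels:

```bash
# Sketch only: assumes CUDA Toolkit and VS Build Tools are installed.
# On Windows cmd, use:  set CMAKE_ARGS=-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=86
CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=86" pip install llama-cpp-python --no-cache-dir
```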

## License
MIT

Wheels are built from [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) (MIT License)

## Contributing
**Need a different configuration?**
Open an [issue](https://github.com/dougeeai/llama-cpp-python-wheels/issues) with:
- OS (Windows/Linux/macOS)
- Python version
- CUDA version (if applicable)
I'll try to build it if I have access to similar hardware.

## Contact
Questions or issues? Open a [GitHub issue](https://github.com/dougeeai/llama-cpp-python-wheels/issues).