<div align="center">
<img src="docs/images/logo.png" alt="QuantLLM Logo" width="150"/>

# 🚀 QuantLLM v2.0

**The Ultra-Fast LLM Quantization & Export Library**

[![Downloads](https://static.pepy.tech/badge/quantllm)](https://pepy.tech/projects/quantllm)
[![PyPI](https://img.shields.io/pypi/v/quantllm?logo=pypi&label=version&color=orange)](https://pypi.org/project/quantllm/)
[![Python](https://img.shields.io/badge/python-3.10+-orange.svg)](https://www.python.org/)
[![License](https://img.shields.io/badge/license-MIT-orange.svg)](LICENSE)
[![Stars](https://img.shields.io/github/stars/codewithdark-git/QuantLLM?style=social)](https://github.com/codewithdark-git/QuantLLM)

**Load → Quantize → Fine-tune → Export** — All in One Line

[Quick Start](#-quick-start) •
[Features](#-features) •
[Export Formats](#-export-formats) •
[Examples](#-examples) •
[Documentation](https://quantllm.readthedocs.io)

</div>

---

## 🎯 Why QuantLLM?

### ❌ Without QuantLLM (50+ lines of code)

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model
import torch

# ... many lines of BitsAndBytesConfig and LoRA setup ...

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3-8B",
    quantization_config=bnb_config,
    device_map="auto",
)
# Then llama.cpp compilation for GGUF...
# Then manual tensor conversion...
```

### ✅ With QuantLLM (4 lines of code)

```python
from quantllm import turbo

model = turbo("meta-llama/Llama-3-8B")       # Auto-quantizes
model.generate("Hello!")                      # Generate text
model.export("gguf", quantization="Q4_K_M")   # Export to GGUF
```

---

## ⚡ Quick Start

### Installation

```bash
# Recommended
pip install git+https://github.com/codewithdark-git/QuantLLM.git

# With all export formats
pip install "quantllm[full] @ git+https://github.com/codewithdark-git/QuantLLM.git"
```

### Basic Usage
```python
from quantllm import turbo

# Load with automatic optimization
model = turbo("meta-llama/Llama-3.2-3B")

# Generate text
response = model.generate("Explain quantum computing simply")
print(response)

# Export to GGUF
model.export("gguf", "model.Q4_K_M.gguf", quantization="Q4_K_M")
```

---

## ✨ Features

### 🔥 TurboModel API

One unified interface for everything:

```python
model = turbo("mistralai/Mistral-7B")
model.generate("Hello!")
model.finetune(data, epochs=3)
model.export("gguf", quantization="Q4_K_M")
model.push("user/repo", format="gguf")
```

### ⚡ Performance Optimizations

- **Flash Attention 2** — Auto-enabled for speed
- **torch.compile** — 2x faster training
- **Dynamic Padding** — 50% less VRAM
- **Triton Kernels** — Fused operations

### 🧠 45+ Model Architectures

Llama 2/3, Mistral, Mixtral, Qwen 1/2, Phi 1/2/3, Gemma, Falcon, DeepSeek, Yi, StarCoder, ChatGLM, InternLM, Baichuan, StableLM, BLOOM, OPT, MPT, GPT-NeoX...

### 📦 Multi-Format Export

| Format | Use Case | Command |
|--------|----------|---------|
| **GGUF** | llama.cpp, Ollama, LM Studio | `model.export("gguf")` |
| **ONNX** | ONNX Runtime, TensorRT | `model.export("onnx")` |
| **MLX** | Apple Silicon (M1/M2/M3/M4) | `model.export("mlx")` |
| **SafeTensors** | HuggingFace | `model.export("safetensors")` |

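Picking a format is just a lookup over the table above. As a sketch (this helper is illustrative, not part of QuantLLM's API):

```python
# Illustrative mapping from target runtime to export format,
# mirroring the Multi-Format Export table above.
RUNTIME_FORMAT = {
    "ollama": "gguf",
    "llama.cpp": "gguf",
    "lm studio": "gguf",
    "onnx runtime": "onnx",
    "tensorrt": "onnx",
    "apple silicon": "mlx",
    "huggingface": "safetensors",
}

def format_for(runtime: str) -> str:
    """Return the export format for a deployment target."""
    return RUNTIME_FORMAT[runtime.lower()]

print(format_for("Ollama"))  # gguf
```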
### 🎨 Beautiful Console UI

```
╔════════════════════════════════════════════════════════════╗
║  🚀 QuantLLM v2.0.0                                        ║
║  Ultra-fast LLM Quantization & Export                      ║
║  ✓ GGUF  ✓ ONNX  ✓ MLX  ✓ SafeTensors                      ║
╚════════════════════════════════════════════════════════════╝

📊 Model: meta-llama/Llama-3.2-3B
   Parameters: 3.21B
   Memory: 6.4 GB → 1.9 GB (70% saved)
```

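The savings figure in the panel is simple arithmetic over the before/after memory numbers:

```python
# Sanity-check the "70% saved" figure from the panel above.
before_gb, after_gb = 6.4, 1.9
savings = (1 - after_gb / before_gb) * 100
print(f"{savings:.0f}% saved")  # 70% saved
```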
### 🤗 One-Click Hub Publishing

Auto-generates model cards with YAML frontmatter, usage examples, and a "Use this model" button:

```python
model.push("user/my-model", format="gguf", quantization="Q4_K_M")
```

---

## 📦 Export Formats

| Type | Bits | Quality | Use Case |
|------|------|---------|----------|
| `Q2_K` | 2-bit | 🔴 Low | Minimum size |
| `Q3_K_M` | 3-bit | 🟠 Fair | Very constrained |
| `Q4_K_M` | 4-bit | 🟢 Good | **Recommended** ⭐ |
| `Q5_K_M` | 5-bit | 🟢 High | Quality-focused |
| `Q6_K` | 6-bit | 🔵 Very High | Near-original |
| `Q8_0` | 8-bit | 🔵 Excellent | Best quality |

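A rough file-size estimate for each type is parameters × bits per weight ÷ 8, ignoring per-block scale overhead. A back-of-the-envelope sketch (this helper is illustrative, not part of QuantLLM's API):

```python
# Approximate GGUF file size: params * bits / 8 bytes.
# K-quant block scales add a few percent on top, so treat as a lower bound.
QUANT_BITS = {"Q2_K": 2, "Q3_K_M": 3, "Q4_K_M": 4, "Q5_K_M": 5, "Q6_K": 6, "Q8_0": 8}

def estimate_gguf_gb(n_params: float, quant: str) -> float:
    """Rough file size in GB for a model with n_params weights."""
    return n_params * QUANT_BITS[quant] / 8 / 1e9

for q in ("Q4_K_M", "Q8_0"):
    print(f"{q}: ~{estimate_gguf_gb(8e9, q):.1f} GB")  # 8B-parameter model
```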
---

```python
response = model.chat(messages)
print(response)
```

### Load GGUF Models

```python
from quantllm import TurboModel

model = TurboModel.from_gguf(
    "TheBloke/Llama-2-7B-Chat-GGUF",
    filename="llama-2-7b-chat.Q4_K_M.gguf"
)
print(model.generate("Hello!"))
```

### Fine-tuning

```python
from quantllm import turbo

model = turbo("mistralai/Mistral-7B")

# Simple training
model.finetune("training_data.json", epochs=3)

# Advanced configuration
model.finetune(
    "training_data.json",
    epochs=5,
    # ... more options ...
)
```

**Supported data formats:**

```json
[
    {"instruction": "What is Python?", "output": "Python is..."}
]
```
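A dataset file in this instruction/output shape takes a couple of lines of standard-library Python (the filename and example rows below are illustrative):

```python
import json

# Instruction/output pairs in the JSON shape shown above.
data = [
    {"instruction": "What is Python?", "output": "Python is a programming language."},
    {"instruction": "What is GGUF?", "output": "GGUF is a quantized model file format."},
]

with open("training_data.json", "w") as f:
    json.dump(data, f, indent=2)

# Round-trip check before training on it.
with open("training_data.json") as f:
    loaded = json.load(f)
print(len(loaded))  # 2
```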

### Push to the Hub

```python
model.push(
    "user/my-model",
    format="gguf",
    quantization="Q4_K_M",
)
```

---

## 💻 Hardware Requirements

| Configuration | GPU VRAM | Recommended Models |
|---------------|----------|--------------------|
| 🟢 **Entry** | 6-8 GB | 1-7B (4-bit) |
| 🟡 **Mid-Range** | 12-24 GB | 7-30B (4-bit) |
| 🔴 **High-End** | 24-80 GB | 70B+ |
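The tiers above reduce to simple thresholds on free VRAM. A minimal sketch (not part of QuantLLM; in practice you could feed it the free value from `torch.cuda.mem_get_info()`):

```python
def recommend_tier(free_vram_gb: float) -> str:
    """Map free GPU VRAM in GB to the model-size tier from the table above."""
    if free_vram_gb >= 24:
        return "High-End: 70B+ models"
    if free_vram_gb >= 12:
        return "Mid-Range: 7-30B models (4-bit)"
    if free_vram_gb >= 6:
        return "Entry: 1-7B models (4-bit)"
    return "Below entry tier: consider CPU inference with a small GGUF"

print(recommend_tier(16))  # Mid-Range: 7-30B models (4-bit)
```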

---

## 📦 Installation Options

```bash
# Basic
pip install git+https://github.com/codewithdark-git/QuantLLM.git

# With specific features
pip install "quantllm[full]"   # Everything
```

---

## 🏗️ Project Structure

```
quantllm/
├── core/                 # Core API
│   ├── turbo_model.py    # TurboModel unified API
│   └── smart_config.py   # Auto-configuration
├── quant/                # Quantization
│   └── llama_cpp.py      # GGUF conversion
├── hub/                  # HuggingFace
│   ├── hub_manager.py    # Push/pull models
│   └── model_card.py     # Auto model cards
├── kernels/              # Custom kernels
```

---

MIT License — see [LICENSE](LICENSE) for details.

<div align="center">

### Made with 🧡 by [Dark Coder](https://github.com/codewithdark-git)

[⭐ Star on GitHub](https://github.com/codewithdark-git/QuantLLM) •
[🐛 Report Bug](https://github.com/codewithdark-git/QuantLLM/issues) •
[💖 Sponsor](https://github.com/sponsors/codewithdark-git)

**Happy Quantizing! 🚀**

</div>