
Commit 9c456b1: Update Doc.
1 parent: 264aad1

File tree: 2 files changed, +266 −256 lines


README.md

Lines changed: 69 additions & 123 deletions
@@ -1,44 +1,33 @@
 <div align="center">
-  <img src="docs/images/logo.png" alt="QuantLLM Logo" />
+  <img src="docs/images/logo.png" alt="QuantLLM Logo" width="150"/>

 # 🚀 QuantLLM v2.0

-<p align="center">
-  <strong>The Ultra-Fast LLM Quantization & Export Library</strong>
-</p>
-
-<p align="center">
-  <a href="https://pepy.tech/projects/quantllm"><img src="https://static.pepy.tech/badge/quantllm" alt="Downloads"/></a>
-  <img alt="PyPI - Version" src="https://img.shields.io/pypi/v/quantllm?logo=pypi&label=version&color=orange"/>
-  <img alt="Python" src="https://img.shields.io/badge/python-3.10+-orange.svg"/>
-  <img alt="License" src="https://img.shields.io/badge/license-MIT-orange.svg"/>
-  <img alt="Stars" src="https://img.shields.io/github/stars/codewithdark-git/QuantLLM?style=social"/>
-</p>
-
-<p align="center">
-  <b>Load → Quantize → Fine-tune → Export</b> — All in One Line
-</p>
+**The Ultra-Fast LLM Quantization & Export Library**
+
+[![Downloads](https://static.pepy.tech/badge/quantllm)](https://pepy.tech/projects/quantllm)
+[![PyPI](https://img.shields.io/pypi/v/quantllm?logo=pypi&label=version&color=orange)](https://pypi.org/project/quantllm/)
+[![Python](https://img.shields.io/badge/python-3.10+-orange.svg)](https://www.python.org/)
+[![License](https://img.shields.io/badge/license-MIT-orange.svg)](LICENSE)
+[![Stars](https://img.shields.io/github/stars/codewithdark-git/QuantLLM?style=social)](https://github.com/codewithdark-git/QuantLLM)
+
+**Load → Quantize → Fine-tune → Export** — All in One Line

-<p align="center">
-  <a href="#-quick-start">Quick Start</a> •
-  <a href="#-features">Features</a> •
-  <a href="#-export-formats">Export Formats</a> •
-  <a href="#-examples">Examples</a> •
-  <a href="https://quantllm.readthedocs.io">Documentation</a>
-</p>
+[Quick Start](#-quick-start)
+[Features](#-features)
+[Export Formats](#-export-formats)
+[Examples](#-examples)
+[Documentation](https://quantllm.readthedocs.io)
+
 </div>

 ---

 ## 🎯 Why QuantLLM?

-<table>
-<tr>
-<td width="50%">
+### ❌ Without QuantLLM (50+ lines of code)

-### ❌ Without QuantLLM
 ```python
-# 50+ lines of configuration...
 from transformers import AutoModelForCausalLM, BitsAndBytesConfig
 from peft import LoraConfig, get_peft_model
 import torch
@@ -53,44 +42,29 @@ model = AutoModelForCausalLM.from_pretrained(
     "meta-llama/Llama-3-8B",
     quantization_config=bnb_config,
     device_map="auto",
-    # ... more config
 )
 # Then llama.cpp compilation for GGUF...
 # Then manual tensor conversion...
 ```

-</td>
-<td width="50%">
+### ✅ With QuantLLM (4 lines of code)

-### ✅ With QuantLLM
 ```python
 from quantllm import turbo

-# One line does everything
-model = turbo("meta-llama/Llama-3-8B")
-
-# Generate
-print(model.generate("Hello!"))
-
-# Fine-tune
-model.finetune(dataset, epochs=3)
-
-# Export to any format
-model.export("gguf", quantization="Q4_K_M")
+model = turbo("meta-llama/Llama-3-8B")       # Auto-quantizes
+model.generate("Hello!")                     # Generate text
+model.export("gguf", quantization="Q4_K_M")  # Export to GGUF
 ```

-</td>
-</tr>
-</table>
-
 ---

 ## ⚡ Quick Start

 ### Installation

 ```bash
-# Recommended installation
+# Recommended
 pip install git+https://github.com/codewithdark-git/QuantLLM.git

 # With all export formats
@@ -102,14 +76,14 @@ pip install "quantllm[full] @ git+https://github.com/codewithdark-git/QuantLLM.g
 ```python
 from quantllm import turbo

-# Load any model with automatic optimization
+# Load with automatic optimization
 model = turbo("meta-llama/Llama-3.2-3B")

 # Generate text
 response = model.generate("Explain quantum computing simply")
 print(response)

-# Export to GGUF for Ollama/llama.cpp
+# Export to GGUF
 model.export("gguf", "model.Q4_K_M.gguf", quantization="Q4_K_M")
 ```

@@ -123,79 +97,59 @@ model.export("gguf", "model.Q4_K_M.gguf", quantization="Q4_K_M")

 ## ✨ Features

-<table>
-<tr>
-<td width="50%">
-
 ### 🔥 TurboModel API
+
+One unified interface for everything:
+
 ```python
-# One unified API for everything
 model = turbo("mistralai/Mistral-7B")
 model.generate("Hello!")
 model.finetune(data, epochs=3)
 model.export("gguf", quantization="Q4_K_M")
 model.push("user/repo", format="gguf")
 ```

-</td>
-<td width="50%">
+### ⚡ Performance Optimizations

-### ⚡ Performance
-- **Flash Attention 2** — Auto-enabled
+- **Flash Attention 2** — Auto-enabled for speed
 - **torch.compile** — 2x faster training
 - **Dynamic Padding** — 50% less VRAM
 - **Triton Kernels** — Fused operations

-</td>
-</tr>
-<tr>
-<td>
-
 ### 🧠 45+ Model Architectures
-Llama 2/3, Mistral, Mixtral, Qwen 1/2, Phi 1/2/3, Gemma, Falcon, DeepSeek, Yi, StarCoder, ChatGLM, InternLM, Baichuan, StableLM, BLOOM, OPT, MPT, GPT-NeoX...

-</td>
-<td>
+Llama 2/3, Mistral, Mixtral, Qwen 1/2, Phi 1/2/3, Gemma, Falcon, DeepSeek, Yi, StarCoder, ChatGLM, InternLM, Baichuan, StableLM, BLOOM, OPT, MPT, GPT-NeoX...

 ### 📦 Multi-Format Export
-- **GGUF** — llama.cpp, Ollama, LM Studio
-- **ONNX** — ONNX Runtime, TensorRT
-- **MLX** — Apple Silicon (M1/M2/M3/M4)
-- **SafeTensors** — HuggingFace

-</td>
-</tr>
-<tr>
-<td>
+| Format | Use Case | Command |
+|--------|----------|---------|
+| **GGUF** | llama.cpp, Ollama, LM Studio | `model.export("gguf")` |
+| **ONNX** | ONNX Runtime, TensorRT | `model.export("onnx")` |
+| **MLX** | Apple Silicon (M1/M2/M3/M4) | `model.export("mlx")` |
+| **SafeTensors** | HuggingFace | `model.export("safetensors")` |
+
+### 🎨 Beautiful Console UI

-### 🎨 Beautiful UI
 ```
-╔════════════════════════════════════╗
-║  🚀 QuantLLM v2.0                  ║
-║  ✓ GGUF ✓ ONNX ✓ MLX               ║
-╚════════════════════════════════════╝
+╔════════════════════════════════════════════════════════════╗
+║  🚀 QuantLLM v2.0.0                                        ║
+║  Ultra-fast LLM Quantization & Export                      ║
+║  ✓ GGUF  ✓ ONNX  ✓ MLX  ✓ SafeTensors                      ║
+╚════════════════════════════════════════════════════════════╝

 📊 Model: meta-llama/Llama-3.2-3B
    Parameters: 3.21B
    Memory: 6.4 GB → 1.9 GB (70% saved)
 ```

-</td>
-<td>
-
 ### 🤗 One-Click Hub Publishing
-```python
-# Auto-generates model cards with:
-# - YAML frontmatter
-# - Usage examples
-# - "Use this model" button

-model.push("user/my-model", format="gguf")
-```
+Auto-generates model cards with YAML frontmatter, usage examples, and a "Use this model" button:

-</td>
-</tr>
-</table>
+```python
+model.push("user/my-model", format="gguf", quantization="Q4_K_M")
+```

 ---

@@ -225,12 +179,12 @@ model.export("safetensors", "./model-hf/")

 | Type | Bits | Quality | Use Case |
 |------|------|---------|----------|
-| `Q2_K` | 2-bit | Low | Minimum size |
-| `Q3_K_M` | 3-bit | Fair | Very constrained |
-| `Q4_K_M` | 4-bit | Good | **Recommended** |
-| `Q5_K_M` | 5-bit | High | Quality-focused |
-| `Q6_K` | 6-bit | Very High | Near-original |
-| `Q8_0` | 8-bit | Excellent | Best quality |
+| `Q2_K` | 2-bit | 🔴 Low | Minimum size |
+| `Q3_K_M` | 3-bit | 🟠 Fair | Very constrained |
+| `Q4_K_M` | 4-bit | 🟢 Good | **Recommended** |
+| `Q5_K_M` | 5-bit | 🟢 High | Quality-focused |
+| `Q6_K` | 6-bit | 🔵 Very High | Near-original |
+| `Q8_0` | 8-bit | 🔵 Excellent | Best quality |

 ---

@@ -260,17 +214,15 @@ response = model.chat(messages)
 print(response)
 ```

-### Load GGUF Models from HuggingFace
+### Load GGUF Models

 ```python
 from quantllm import TurboModel

-# Load any GGUF model directly
 model = TurboModel.from_gguf(
     "TheBloke/Llama-2-7B-Chat-GGUF",
     filename="llama-2-7b-chat.Q4_K_M.gguf"
 )
-
 print(model.generate("Hello!"))
 ```

@@ -281,10 +233,10 @@ from quantllm import turbo

 model = turbo("mistralai/Mistral-7B")

-# Simple — everything auto-configured
+# Simple training
 model.finetune("training_data.json", epochs=3)

-# Advanced — full control
+# Advanced configuration
 model.finetune(
     "training_data.json",
     epochs=5,
@@ -296,6 +248,7 @@ model.finetune(
 ```

 **Supported data formats:**
+
 ```json
 [
   {"instruction": "What is Python?", "output": "Python is..."},
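Editor's note on the training-data format above: since the records are plain JSON, a quick pre-flight check before handing a file to `finetune()` can catch malformed records early. A minimal sketch; the `instruction`/`output` field names follow the README's snippet, but the validator itself is hypothetical and not part of QuantLLM.

```python
import json

def validate_records(text: str) -> list[dict]:
    """Parse a JSON training file and check each record has the expected keys."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("training data must be a JSON array of records")
    for i, rec in enumerate(records):
        missing = {"instruction", "output"} - rec.keys()
        if missing:
            raise ValueError(f"record {i} is missing fields: {sorted(missing)}")
    return records

sample = '[{"instruction": "What is Python?", "output": "Python is..."}]'
records = validate_records(sample)
print(f"{len(records)} valid record(s)")  # → 1 valid record(s)
```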
@@ -320,18 +273,12 @@ model.push(
 )
 ```

-The model card includes:
-- ✅ Proper YAML frontmatter (`library_name`, `tags`, `base_model`)
-- ✅ Format-specific usage examples
-- ✅ "Use this model" button compatibility
-- ✅ Quantization details
-
 ---

 ## 💻 Hardware Requirements

-| Configuration | GPU VRAM | Models |
-|---------------|----------|--------|
+| Configuration | GPU VRAM | Recommended Models |
+|---------------|----------|-------------------|
 | 🟢 **Entry** | 6-8 GB | 1-7B (4-bit) |
 | 🟡 **Mid-Range** | 12-24 GB | 7-30B (4-bit) |
 | 🔴 **High-End** | 24-80 GB | 70B+ |
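Editor's note on the hardware table above: the tiers amount to a simple VRAM lookup. As an illustration only, the thresholds are read off the table (its 24 GB boundary appears in both the Mid-Range and High-End rows, so 24 GB is assigned to High-End here), and the function is hypothetical, not part of QuantLLM.

```python
def recommend_tier(vram_gb: float) -> str:
    """Map available GPU VRAM to the README's hardware tiers (illustrative only)."""
    if vram_gb >= 24:
        return "High-End: 70B+ models"
    if vram_gb >= 12:
        return "Mid-Range: 7-30B models (4-bit)"
    if vram_gb >= 6:
        return "Entry: 1-7B models (4-bit)"
    return "Below entry tier: consider CPU inference with a GGUF export"

print(recommend_tier(8))  # → Entry: 1-7B models (4-bit)
```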
@@ -343,7 +290,7 @@ The model card includes:
 ## 📦 Installation Options

 ```bash
-# Basic installation
+# Basic
 pip install git+https://github.com/codewithdark-git/QuantLLM.git

 # With specific features
@@ -356,17 +303,16 @@ pip install "quantllm[full]" # Everything

 ---

-## 🏗️ Architecture
+## 🏗️ Project Structure

 ```
 quantllm/
-├── core/                # Core functionality
+├── core/                # Core API
 │   ├── turbo_model.py   # TurboModel unified API
-│   ├── smart_config.py  # Auto-configuration
-│   └── export.py        # Universal exporter
+│   └── smart_config.py  # Auto-configuration
 ├── quant/               # Quantization
 │   └── llama_cpp.py     # GGUF conversion
-├── hub/                 # HuggingFace integration
+├── hub/                 # HuggingFace
 │   ├── hub_manager.py   # Push/pull models
 │   └── model_card.py    # Auto model cards
 ├── kernels/             # Custom kernels
@@ -402,12 +348,12 @@ MIT License — see [LICENSE](LICENSE) for details.

 <div align="center">

-### Made with 🧡 by [Dark Coder](https://github.com/codewithdark-git)
+### Made with 🧡 by [Dark Coder](https://github.com/codewithdark-git)

-<a href="https://github.com/codewithdark-git/QuantLLM">⭐ Star on GitHub</a>
-<a href="https://github.com/codewithdark-git/QuantLLM/issues">🐛 Report Bug</a>
-<a href="https://github.com/sponsors/codewithdark-git">💖 Sponsor</a>
+[⭐ Star on GitHub](https://github.com/codewithdark-git/QuantLLM)
+[🐛 Report Bug](https://github.com/codewithdark-git/QuantLLM/issues)
+[💖 Sponsor](https://github.com/sponsors/codewithdark-git)

-**Happy Quantizing! 🚀**
+**Happy Quantizing! 🚀**

 </div>
