@@ -1,7 +1,7 @@
 <div align="center">
   <img src="docs/images/1.png" alt="QuantLLM Logo" />

-# 🚀 QuantLLM v2.0
+# 🚀 QuantLLM v2.1 (pre-release)

 **The Ultra-Fast LLM Quantization & Export Library**

@@ -52,9 +52,12 @@ model = AutoModelForCausalLM.from_pretrained(
 ```python
 from quantllm import turbo

-model = turbo("meta-llama/Llama-3-8B")  # Auto-quantizes
+model = turbo(
+    "meta-llama/Llama-3-8B",
+    config={"format": "gguf", "quantization": "Q4_K_M", "push_format": "gguf"},
+)  # Auto-quantizes
 model.generate("Hello!")  # Generate text
-model.export("gguf", quantization="Q4_K_M")  # Export to GGUF
+model.export()  # Export to GGUF with shared config
 ```

 ---
@@ -77,14 +80,17 @@ pip install "quantllm[full] @ git+https://github.com/codewithdark-git/QuantLLM.g
 from quantllm import turbo

 # Load with automatic optimization
-model = turbo("meta-llama/Llama-3.2-3B")
+model = turbo(
+    "meta-llama/Llama-3.2-3B",
+    config={"format": "gguf", "quantization": "Q4_K_M", "push_format": "gguf"},
+)

 # Generate text
 response = model.generate("Explain quantum computing simply")
 print(response)

 # Export to GGUF
-model.export("gguf", "model.Q4_K_M.gguf", quantization="Q4_K_M")
+model.export("gguf", "model.Q4_K_M.gguf")
 ```

 **QuantLLM automatically:**
@@ -102,11 +108,14 @@ model.export("gguf", "model.Q4_K_M.gguf", quantization="Q4_K_M")
 One unified interface for everything:

 ```python
-model = turbo("mistralai/Mistral-7B")
+model = turbo(
+    "mistralai/Mistral-7B",
+    config={"format": "gguf", "quantization": "Q4_K_M", "push_format": "gguf"},
+)
 model.generate("Hello!")
 model.finetune(data, epochs=3)
-model.export("gguf", quantization="Q4_K_M")
-model.push("user/repo", format="gguf")
+model.export()
+model.push("user/repo")
 ```

 ### ⚡ Performance Optimizations
@@ -133,7 +142,7 @@ Llama 2/3, Mistral, Mixtral, Qwen 1/2, Phi 1/2/3, Gemma, Falcon, DeepSeek, Yi, S

 ```
 ╔════════════════════════════════════════════════════════════╗
-║                  🚀 QuantLLM v2.0.0                        ║
+║                  🚀 QuantLLM v2.1.0rc1                     ║
 ║            Ultra-fast LLM Quantization & Export            ║
 ║          ✓ GGUF   ✓ ONNX   ✓ MLX   ✓ SafeTensors           ║
 ╚════════════════════════════════════════════════════════════╝
@@ -148,7 +157,7 @@ Llama 2/3, Mistral, Mixtral, Qwen 1/2, Phi 1/2/3, Gemma, Falcon, DeepSeek, Yi, S
 Auto-generates model cards with YAML frontmatter, usage examples, and "Use this model" button:

 ```python
-model.push("user/my-model", format="gguf", quantization="Q4_K_M")
+model.push("user/my-model")
 ```

 ---
@@ -195,7 +204,10 @@ model.export("safetensors", "./model-hf/")
 ```python
 from quantllm import turbo

-model = turbo("meta-llama/Llama-3.2-3B")
+model = turbo(
+    "meta-llama/Llama-3.2-3B",
+    config={"format": "gguf", "quantization": "Q4_K_M", "push_format": "gguf"},
+)

 # Simple generation
 response = model.generate(
@@ -267,8 +279,6 @@ model = turbo("meta-llama/Llama-3.2-3B")
 # Push with auto-generated model card
 model.push(
     "your-username/my-model",
-    format="gguf",
-    quantization="Q4_K_M",
     license="apache-2.0"
 )
 ```
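
Taken together, these hunks move the format and quantization settings out of each `export()`/`push()` call and into a single `config` dict passed once to `turbo()`. Below is a minimal before/after sketch of that migration, using only the keys that appear in the hunks above (`format`, `quantization`, `push_format`); the per-key comments are inferences from how the diff uses them, not documented semantics.

```python
from quantllm import turbo

# Old v2.0 call pattern: settings repeated at every call site
# model = turbo("meta-llama/Llama-3.2-3B")
# model.export("gguf", quantization="Q4_K_M")
# model.push("user/repo", format="gguf", quantization="Q4_K_M")

# New v2.1 call pattern: declare the pipeline once, reuse it everywhere
model = turbo(
    "meta-llama/Llama-3.2-3B",
    config={
        "format": "gguf",          # container format used by export()
        "quantization": "Q4_K_M",  # GGUF quantization preset
        "push_format": "gguf",     # format used when pushing to the Hub
    },
)
model.export()           # inherits format + quantization from config
model.push("user/repo")  # inherits push_format from config
```

Per-call arguments still work where the diff keeps them, e.g. the explicit `model.export("gguf", "model.Q4_K_M.gguf")` in the Quick Start hunk, so the shared config acts as a default rather than a hard constraint.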