
Commit 521dfb1

documentation
1 parent 17d32f1 commit 521dfb1

1 file changed

Lines changed: 130 additions & 6 deletions


docs/source/quickstart.mdx

# Quickstart

Welcome to bitsandbytes! This library makes large language models more accessible through k-bit quantization for PyTorch, substantially reducing the memory needed for inference and training.

## Installation

```bash
pip install bitsandbytes
```

**Requirements:** Python 3.10+, PyTorch 2.3+

For detailed installation instructions, see the [Installation Guide](./installation).
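Before installing, you may want to confirm that your environment meets the minimums listed above. The sketch below uses only the standard library (`sys` and `importlib.metadata`) to compare the running Python version and any installed `torch` against Python 3.10 and PyTorch 2.3. The helper names are hypothetical, and the version parsing is deliberately simple (local suffixes such as `+cu121` are dropped and pre-release tags are ignored).

```python
import re
import sys
from importlib import metadata

# Minimum versions from the requirements line (Python 3.10+, PyTorch 2.3+).
MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 3)


def version_tuple(version: str) -> tuple:
    """Hypothetical helper: turn '2.3.1+cu121' into (2, 3, 1)."""
    release = version.split("+", 1)[0]  # drop the local build suffix
    parts = []
    for piece in release.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev2024'
        parts.append(int(match.group()))
    return tuple(parts)


def check_requirements() -> dict:
    """Hypothetical helper: report whether Python and PyTorch meet the minimums."""
    report = {"python": sys.version_info[:2] >= MIN_PYTHON}
    try:
        torch_version = metadata.version("torch")
        report["torch"] = version_tuple(torch_version)[:2] >= MIN_TORCH
    except metadata.PackageNotFoundError:
        report["torch"] = None  # PyTorch is not installed yet
    return report


if __name__ == "__main__":
    print(check_requirements())
```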

## What is bitsandbytes?

bitsandbytes provides three main features:

- **LLM.int8()**: 8-bit quantization for inference (about 50% less weight memory than 16-bit weights)
- **QLoRA**: 4-bit quantization combined with LoRA adapters for fine-tuning (about 75% less weight memory than 16-bit weights)
- **8-bit Optimizers**: optimizers that keep their state in 8 bits to reduce training memory
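The percentages in the feature list correspond to weight storage relative to 16-bit (fp16/bf16) weights: 8 bits per parameter is half of 16, and 4 bits is a quarter. The arithmetic below makes that concrete for a model with a nominal 7 billion parameters (an assumed round figure). It counts weight storage only; real usage is somewhat higher because of quantization constants, layers kept in higher precision, activations, and the KV cache.

```python
# Weight-only memory estimate for a model with a nominal 7B parameters (assumed figure).
NUM_PARAMS = 7_000_000_000
GIB = 1024 ** 3


def weight_memory_gib(num_params: int, bits_per_param: int) -> float:
    """Bytes needed to store the weights, converted to GiB (ignores all overhead)."""
    return num_params * bits_per_param / 8 / GIB


if __name__ == "__main__":
    baseline = weight_memory_gib(NUM_PARAMS, 16)
    for name, bits in [("16-bit", 16), ("8-bit (LLM.int8())", 8), ("4-bit (QLoRA)", 4)]:
        gib = weight_memory_gib(NUM_PARAMS, bits)
        saving = 1 - gib / baseline
        print(f"{name:20s} {gib:6.2f} GiB  ({saving:.0%} less than 16-bit)")
```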

## Quick Examples

### 8-bit Inference

Load and run a model using 8-bit quantization:
```py
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"

# Quantize the model's linear layers to 8-bit with LLM.int8() while loading
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
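LLM.int8() does not simply round every value to 8 bits. As described in the LLM.int8() paper, it splits each matrix multiplication: the few input feature dimensions that contain large-magnitude outliers are multiplied in 16-bit, and everything else is quantized to int8 with vector-wise absmax scaling (one scale per row of the input, one per column of the weight). The pure-Python sketch below illustrates that decomposition on tiny lists. It is a conceptual model, not the bitsandbytes kernels; the function names are hypothetical, and the threshold of 6.0 mirrors the default `llm_int8_threshold` of Transformers' `BitsAndBytesConfig`.

```python
def absmax_quantize(values):
    """Map a vector onto the int8 grid [-127, 127] using its largest magnitude."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # avoid a zero scale
    return [round(v / scale) for v in values], scale


def int8_matmul_with_outliers(x, w, threshold=6.0):
    """Approximate x @ w (x: tokens x features, w: features x outputs) the LLM.int8() way."""
    n_features, n_out = len(w), len(w[0])
    # A feature dimension is an outlier if any token has |value| >= threshold there.
    outliers = [k for k in range(n_features) if any(abs(row[k]) >= threshold for row in x)]
    regular = [k for k in range(n_features) if k not in outliers]

    # Vector-wise scaling: one scale per row of x and one per column of w.
    x_q = [absmax_quantize([row[k] for k in regular]) if regular else ([], 1.0) for row in x]
    w_q = [absmax_quantize([w[k][j] for k in regular]) if regular else ([], 1.0)
           for j in range(n_out)]

    result = []
    for row, (qx, sx) in zip(x, x_q):
        out_row = []
        for j, (qw, sw) in enumerate(w_q):
            # int8 part: integer dot product, then undo both scales.
            acc = sum(a * b for a, b in zip(qx, qw)) * sx * sw
            # Outlier part: multiplied without quantization (16-bit in the real kernels).
            acc += sum(row[k] * w[k][j] for k in outliers)
            out_row.append(acc)
        result.append(out_row)
    return result


if __name__ == "__main__":
    x = [[0.5, -1.2, 8.0], [0.1, 0.3, -7.5]]  # feature 2 holds outliers
    w = [[0.2, -0.4], [0.7, 0.1], [-0.3, 0.5]]
    print(int8_matmul_with_outliers(x, w))
```

Keeping the outlier dimensions out of the int8 path matters because a single large value would otherwise inflate the absmax scale and crush every other value in that row to a few integer levels.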

### 4-bit Quantization

For even greater memory savings:
```py
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Store the weights in 4-bit and run the matrix multiplications in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
```
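With only 16 levels available per value, 4-bit quantization is applied in small blocks, each with its own scaling constant, so that one large weight only degrades its own block. The sketch below shows blockwise absmax quantization using a plain symmetric integer grid (-7 to 7) and an assumed block size of 64. This grid is a simplified stand-in chosen for illustration; it is not the FP4 or NF4 data types that `BitsAndBytesConfig` offers through `bnb_4bit_quant_type`, and the helper names are hypothetical.

```python
import random

BLOCK_SIZE = 64  # assumed block size for this sketch
LEVELS = 7       # symmetric 4-bit integer grid: codes -7..7


def quantize_blockwise(values, block_size=BLOCK_SIZE):
    """Return (codes, scales): one absmax scale per block, codes in [-7, 7]."""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        scale = max(abs(v) for v in block) / LEVELS or 1.0  # all-zero block -> 1.0
        scales.append(scale)
        codes.extend(round(v / scale) for v in block)
    return codes, scales


def dequantize_blockwise(codes, scales, block_size=BLOCK_SIZE):
    """Multiply each code by the scale of the block it belongs to."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


if __name__ == "__main__":
    random.seed(0)
    weights = [random.gauss(0.0, 0.02) for _ in range(256)]
    codes, scales = quantize_blockwise(weights)
    restored = dequantize_blockwise(codes, scales)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{len(scales)} blocks, max abs error {worst:.5f}")
```

Each reconstructed value is off by at most half of its block's scale, so smaller blocks give tighter error bounds at the cost of storing more scales.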

### QLoRA Fine-tuning

Combine 4-bit quantization with LoRA for efficient training:

```py
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 4-bit NF4 with double quantization, as in QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare for training: freezes the quantized base weights, enables gradient checkpointing
model = prepare_model_for_kbit_training(model)

# Add trainable LoRA adapters to the attention query and value projections
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Now train with your preferred trainer
```
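LoRA keeps the quantized base weights frozen and trains only two small matrices per targeted projection: A with shape r × d_in and B with shape d_out × r, so each adapter adds r · (d_in + d_out) parameters. The arithmetic below applies that to the configuration in the example (r=16 on `q_proj` and `v_proj`), assuming Llama-2-7B's shapes of 32 decoder layers with 4096 × 4096 query and value projections; check your model's config if it differs. The helper names are hypothetical.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added by one LoRA adapter: A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r


def count_adapter_params(num_layers: int, module_shapes: dict, r: int) -> int:
    """Sum adapter sizes over every targeted module in every decoder layer."""
    per_layer = sum(lora_params(d_in, d_out, r) for d_in, d_out in module_shapes.values())
    return num_layers * per_layer


if __name__ == "__main__":
    # Assumed Llama-2-7B shapes: 32 layers, hidden size 4096, no grouped-query attention.
    shapes = {"q_proj": (4096, 4096), "v_proj": (4096, 4096)}
    trainable = count_adapter_params(num_layers=32, module_shapes=shapes, r=16)
    print(f"trainable LoRA parameters: {trainable:,}")  # → trainable LoRA parameters: 8,388,608
```

That is roughly 0.1% of the base model's nominal 7 billion parameters, which is why the gradients and optimizer state for the adapters are small compared with the frozen 4-bit weights. PEFT's `print_trainable_parameters()` is a convenient way to compare this estimate with the actual wrapped model.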

### 8-bit Optimizers

Use 8-bit optimizers to cut the memory taken by optimizer state by about 75% compared with 32-bit state:
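```py
import torch
import bitsandbytes as bnb

device = "cuda"  # a CUDA GPU is assumed here
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 10),
).to(device)

# Replace the standard optimizer (torch.optim.Adam) with its 8-bit version
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)

# Use it in the training loop as normal (random data stands in for a real dataloader)
for step in range(100):
    inputs = torch.randn(32, 1024, device=device)
    targets = torch.randint(0, 10, (32,), device=device)
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```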
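The 75% saving applies to optimizer state: Adam keeps two moment estimates per parameter, which take 8 bytes per parameter in 32-bit but about 2 bytes in 8-bit (ignoring the small per-block quantization constants). The sketch below estimates Adam state memory for a list of parameter tensor sizes. It also models the `min_8bit_size` argument of the bitsandbytes 8-bit optimizers (default 4096 elements), below which small tensors such as biases keep 32-bit state. The tensor sizes are a hypothetical example and the helper name is illustrative.

```python
def adam_state_bytes(tensor_sizes, state_bits=32, min_8bit_size=4096):
    """Estimate bytes for Adam's two states (first and second moments).

    Tensors with fewer than `min_8bit_size` elements keep 32-bit state even when
    state_bits == 8. Per-block quantization constants are ignored.
    """
    total = 0
    for numel in tensor_sizes:
        bits = state_bits if numel >= min_8bit_size else 32
        total += 2 * numel * bits // 8  # two states per element
    return total


if __name__ == "__main__":
    # Hypothetical model: Linear(1024, 1024) then Linear(1024, 10), weights and biases.
    sizes = [1024 * 1024, 1024, 1024 * 10, 10]
    full = adam_state_bytes(sizes, state_bits=32)
    eight = adam_state_bytes(sizes, state_bits=8)
    print(f"32-bit state: {full:,} bytes, 8-bit state: {eight:,} bytes")
    print(f"reduction: {1 - eight / full:.1%}")
```

Because the large weight matrix dominates, the overall reduction stays close to 75% even though the biases keep their 32-bit state.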

### Custom Quantized Layers

Use quantized linear layers directly in your models:

```py
import torch
import bitsandbytes as bnb

# 8-bit linear layer (LLM.int8()); weights are quantized when moved to the GPU
linear_8bit = bnb.nn.Linear8bitLt(1024, 1024, has_fp16_weights=False).to("cuda")

# 4-bit linear layer that computes in bfloat16; also quantized on the move to the GPU
linear_4bit = bnb.nn.Linear4bit(1024, 1024, compute_dtype=torch.bfloat16).to("cuda")
```
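Because a 4-bit code is half a byte, two codes fit in every byte of storage, which is how 4-bit layers reach a quarter of the 16-bit weight footprint. The sketch below shows one way to pack and unpack unsigned 4-bit codes (0 to 15). The nibble order used here (first code in the high nibble) is an assumption for illustration and is not a description of the internal layout bitsandbytes uses; the helper names are hypothetical.

```python
def pack_nibbles(codes):
    """Pack unsigned 4-bit codes (0..15) two per byte; pads with 0 if the count is odd."""
    if any(not 0 <= c <= 15 for c in codes):
        raise ValueError("4-bit codes must be in the range 0..15")
    padded = list(codes) + [0] * (len(codes) % 2)
    return bytes((padded[i] << 4) | padded[i + 1] for i in range(0, len(padded), 2))


def unpack_nibbles(packed, count):
    """Recover the first `count` codes from a packed byte string."""
    codes = []
    for byte in packed:
        codes.extend((byte >> 4, byte & 0x0F))  # high nibble first, then low nibble
    return codes[:count]


if __name__ == "__main__":
    codes = [3, 15, 0, 7, 9]
    packed = pack_nibbles(codes)
    print(len(packed), list(packed))  # 3 [63, 7, 144]
    assert unpack_nibbles(packed, len(codes)) == codes
```

At this density a 1024 × 1024 layer needs 524,288 bytes for its codes, versus 2 MiB for 16-bit weights, plus one scaling constant per block.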

## Next Steps

- [8-bit Optimizers Guide](./optimizers) - Detailed optimizer usage
- [FSDP-QLoRA](./fsdp_qlora) - Train 70B+ models on consumer GPUs
- [Integrations](./integrations) - Use with Transformers, PEFT, Accelerate
- [FAQs](./faqs) - Common questions and troubleshooting

## Getting Help

- Check the [FAQs](./faqs) and [Common Errors](./errors)
- Visit the [official documentation](https://huggingface.co/docs/bitsandbytes)
- Open an issue on [GitHub](https://github.com/bitsandbytes-foundation/bitsandbytes/issues)
