Commit 67755b9

Merge pull request #1872 from Abdennacer-Badaoui/quick-start-doc
[Docs] Create quickstart guide
2 parents: a4ed4f6 + fbcf010

File tree: 2 files changed, +139 -7 lines


agents/kbit_gemm_context.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -1089,7 +1089,7 @@ void kbit_gemm(
     cudaDeviceGetAttribute(&sms, cudaDevAttrMultiProcessorCount, dev);
     int max_shmem;
     cudaDeviceGetAttribute(&max_shmem,
-                           cudaDevAttrMaxSharedMemoryPerBlockOption, dev);
+                           cudaDevAttrMaxSharedMemoryPerBlockOptin, dev);
 
     // Choose M-blocking
     int m_blocks;
```
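The hunk above queries the device's SM count and opt-in shared-memory limit just before an M-blocking factor is chosen. The actual heuristic is not shown in this hunk, but an occupancy-driven choice can be sketched as follows (a hypothetical illustration: the function name, the power-of-two candidate set, and the saturation rule are assumptions, not the kernel's real logic):

```python
def choose_m_blocks(n_blocks: int, sms: int, max_m_blocks: int = 8) -> int:
    """Hypothetical M-blocking heuristic (illustrative, not from the source).

    Grow the number of row tiles by powers of two until the launch grid
    (m_blocks * n_blocks CTAs) is at least as large as the SM count, so
    every multiprocessor gets at least one block of work.
    """
    m_blocks = 1
    while m_blocks < max_m_blocks and m_blocks * n_blocks < sms:
        m_blocks *= 2
    return m_blocks

# On an A100-like device (108 SMs) with 16 column tiles:
print(choose_m_blocks(16, 108))  # small grids need more row tiles
print(choose_m_blocks(128, 108))  # a wide grid already covers every SM
```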

docs/source/quickstart.mdx

Lines changed: 138 additions & 6 deletions
````diff
@@ -1,15 +1,147 @@
 # Quickstart
 
-## How does it work?
+Welcome to bitsandbytes! This library enables accessible large language models via k-bit quantization for PyTorch, dramatically reducing memory consumption for inference and training.
 
-... work in progress ...
+## Installation
 
-(Community contributions would we very welcome!)
+```bash
+pip install bitsandbytes
+```
+
+**Requirements:** Python 3.10+, PyTorch 2.3+
+
+For detailed installation instructions, see the [Installation Guide](./installation).
+
+## What is bitsandbytes?
+
+bitsandbytes provides three main features:
+
+- **LLM.int8()**: 8-bit quantization for inference (50% memory reduction)
+- **QLoRA**: 4-bit quantization for training (75% memory reduction)
+- **8-bit Optimizers**: Memory-efficient optimizers for training
+
+## Quick Examples
+
+### 8-bit Inference
+
+Load and run a model using 8-bit quantization:
+
+```py
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+model = AutoModelForCausalLM.from_pretrained(
+    "meta-llama/Llama-2-7b-hf",
+    device_map="auto",
+    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
+)
 
-## Minimal examples
+tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
+inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
+outputs = model.generate(**inputs, max_new_tokens=20)
+print(tokenizer.decode(outputs[0]))
+```
+
+> **Learn more:** See the [Integrations guide](./integrations) for more details on using bitsandbytes with Transformers.
+
+### 4-bit Quantization
 
-The following code illustrates the steps above.
+For even greater memory savings:
 
 ```py
-code examples will soon follow
+import torch
+from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_compute_dtype=torch.bfloat16,
+    bnb_4bit_quant_type="nf4",
+)
+
+model = AutoModelForCausalLM.from_pretrained(
+    "meta-llama/Llama-2-7b-hf",
+    quantization_config=bnb_config,
+    device_map="auto",
+)
 ```
+
+### QLoRA Fine-tuning
+
+Combine 4-bit quantization with LoRA for efficient training:
+
+```py
+from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
+
+# Load 4-bit model
+bnb_config = BitsAndBytesConfig(load_in_4bit=True)
+model = AutoModelForCausalLM.from_pretrained(
+    "meta-llama/Llama-2-7b-hf",
+    quantization_config=bnb_config,
+)
+
+# Prepare for training
+model = prepare_model_for_kbit_training(model)
+
+# Add LoRA adapters
+lora_config = LoraConfig(
+    r=16,
+    lora_alpha=32,
+    target_modules=["q_proj", "v_proj"],
+    task_type="CAUSAL_LM",
+)
+model = get_peft_model(model, lora_config)
+
+# Now train with your preferred trainer
+```
+
+> **Learn more:** See the [FSDP-QLoRA guide](./fsdp_qlora) for advanced training techniques and the [Integrations guide](./integrations) for using bitsandbytes with PEFT.
+
+### 8-bit Optimizers
+
+Use 8-bit optimizers to cut optimizer-state memory by up to 75%:
+
+```py
+import bitsandbytes as bnb
+
+model = YourModel()
+
+# Replace the standard optimizer with its 8-bit version
+optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)
+
+# Use in the training loop as normal
+for batch in dataloader:
+    loss = model(batch)
+    loss.backward()
+    optimizer.step()
+    optimizer.zero_grad()
+```
+
+> **Learn more:** See the [8-bit Optimizers guide](./optimizers) for detailed usage and configuration options.
+
+### Custom Quantized Layers
+
+Use quantized linear layers directly in your models:
+
+```py
+import torch
+import bitsandbytes as bnb
+
+# 8-bit linear layer
+linear_8bit = bnb.nn.Linear8bitLt(1024, 1024, has_fp16_weights=False)
+
+# 4-bit linear layer
+linear_4bit = bnb.nn.Linear4bit(1024, 1024, compute_dtype=torch.bfloat16)
+```
+
+## Next Steps
+
+- [8-bit Optimizers Guide](./optimizers) - Detailed optimizer usage
+- [FSDP-QLoRA](./fsdp_qlora) - Train 70B+ models on consumer GPUs
+- [Integrations](./integrations) - Use with Transformers, PEFT, Accelerate
+- [FAQs](./faqs) - Common questions and troubleshooting
+
+## Getting Help
+
+- Check the [FAQs](./faqs) and [Common Errors](./errors)
+- Visit the [official documentation](https://huggingface.co/docs/bitsandbytes)
+- Open an issue on [GitHub](https://github.com/bitsandbytes-foundation/bitsandbytes/issues)
````
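The reduction figures the new quickstart quotes (50% for int8, 75% for 4-bit, relative to fp16 weights) follow directly from per-parameter storage. A quick sanity-check sketch, counting weights only (it ignores activations, the KV cache, and quantization constants; the 7B parameter count is illustrative):

```python
PARAMS = 7_000_000_000  # e.g. a Llama-2-7B-class model

# Bytes of weight storage per parameter for each format.
bytes_per_param = {"fp16": 2.0, "int8": 1.0, "nf4": 0.5}

for dtype, nbytes in bytes_per_param.items():
    gib = PARAMS * nbytes / 2**30
    saving = 1 - nbytes / bytes_per_param["fp16"]
    print(f"{dtype}: {gib:.1f} GiB of weights ({saving:.0%} smaller than fp16)")
```

For a 7B-parameter model this works out to roughly 13 GiB of fp16 weights, halved by int8 and quartered by nf4, matching the guide's percentages.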
