Error Message:
RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB
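Before trying the fixes, a rough estimate of how much memory the weights alone need can show whether a model fits at all. The sketch below is a back-of-envelope calculation under stated assumptions. It counts only the weights (parameters × bits per parameter) and ignores activations, optimizer state, LoRA adapters, the KV cache and the small per-block constants that 4-bit formats store. The 7B parameter count is just an example.

```python
# Rough weight-memory estimate: parameters x bits per parameter.
# Ignores activations, optimizer state, KV cache and quantization overhead.
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate GiB needed to hold the weights at a given precision."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1024**3


if __name__ == "__main__":
    params = 7e9  # example: a 7B-parameter model
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit: {weight_memory_gib(params, bits):.1f} GiB")
```

For a 7B model this gives about 13.0 GiB at 16-bit and about 3.3 GiB at 4-bit. 4-bit weights take a quarter of their 16-bit size, before any other memory use is counted.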
Solutions (try in order):
1. Reduce Batch Size
SFTConfig(
    per_device_train_batch_size=1,  # Start with 1
    gradient_accumulation_steps=4,  # Keeps the effective batch size at 1 x 4 = 4
)
2. Enable 4-bit Quantization
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="your_model",
    load_in_4bit=True,  # 4-bit weights take about a quarter of their 16-bit size
)
3. Lower LoRA Rank
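Lowering `r` helps because the number of trainable LoRA parameters grows linearly with the rank, and so do their gradients and optimizer state. Each adapted weight of shape d_out × d_in gains two factors, A (r × d_in) and B (d_out × r), for r × (d_in + d_out) parameters. The sketch below computes this count. The layer shapes in the example are illustrative assumptions and do not come from a specific model.

```python
# Trainable LoRA parameters: each adapted weight of shape (d_out, d_in)
# gets two low-rank factors, A (r x d_in) and B (d_out x r).
def lora_params(shapes: list[tuple[int, int]], r: int) -> int:
    """Total LoRA parameters added for the given (d_out, d_in) shapes."""
    return sum(r * (d_in + d_out) for d_out, d_in in shapes)


if __name__ == "__main__":
    # Hypothetical model: four 4096x4096 projections in each of 32 layers.
    shapes = [(4096, 4096)] * 4 * 32
    for r in (32, 16, 8):
        print(f"r={r:>2}: {lora_params(shapes, r):,} trainable params")
```

With these shapes, going from r=32 to r=8 shrinks the adapters from 33,554,432 to 8,388,608 parameters. The frozen base weights are unaffected.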
model = FastLanguageModel.get_peft_model(
    model,
    r=8,  # Down from 16 or 32
)
4. Enable Gradient Checkpointing
model = FastLanguageModel.get_peft_model(
    model,
    use_gradient_checkpointing="unsloth",  # Recomputes activations in the backward pass to save memory
)
5. Clear GPU Cache
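import gc
import torch
gc.collect()  # Drop unreferenced Python objects (and the tensors they hold) first
torch.cuda.empty_cache()  # Then release cached, unused blocks; memory held by live tensors is not freed
Symptoms: Memory usage gradually increases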
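To confirm that memory really climbs over time rather than spiking once, you can sample `torch.cuda.memory_allocated()` every few steps and look at the trend instead of a single reading. The helper below fits a plain least-squares slope to those samples. The sampling interval and the 1 MiB-per-step threshold in the example are assumptions to adapt.

```python
# Fit a least-squares line to (step, bytes_allocated) samples and report
# the growth per step. A persistently positive slope suggests tensors are
# being kept alive between steps.
def memory_growth_per_step(steps: list[int], readings: list[float]) -> float:
    """Return the least-squares slope of readings vs. steps (bytes/step)."""
    n = len(steps)
    if n != len(readings) or n < 2:
        raise ValueError("need at least two (step, reading) pairs")
    mean_x = sum(steps) / n
    mean_y = sum(readings) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(steps, readings))
    var = sum((x - mean_x) ** 2 for x in steps)
    if var == 0:
        raise ValueError("steps must not all be equal")
    return cov / var


if __name__ == "__main__":
    # In a real run, readings would come from torch.cuda.memory_allocated().
    steps = [0, 50, 100, 150, 200]
    readings = [4.0e9, 4.1e9, 4.2e9, 4.3e9, 4.4e9]  # illustrative values
    slope = memory_growth_per_step(steps, readings)
    if slope > 1024**2:  # more than ~1 MiB per step (arbitrary threshold)
        print(f"Memory grows by about {slope / 1024**2:.1f} MiB per step")
```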
Solutions:
# Add to training config
SFTConfig(
    dataloader_pin_memory=False,  # Don't page-lock host RAM for batches
    dataloader_num_workers=0,  # Load data in the main process (no worker processes)
    save_steps=50,  # Checkpoint regularly so a crash loses little progress
)
# Manual cleanup every N steps (custom training loops; `step` is the loop counter)
if step % 50 == 0:
    gc.collect()
    torch.cuda.empty_cache()
Error Message:
ERROR: Could not build wheels for unsloth
Solutions:
1. Use Colab-Specific Installation
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
2. Install Dependencies First
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
!pip install transformers accelerate peft bitsandbytes
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
3. For Local Installation
# Check CUDA version first
nvcc --version
# Install matching PyTorch
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# Install Unsloth (choose the extra that matches your CUDA version, GPU generation and PyTorch version)
pip install "unsloth[cu121-ampere-torch230] @ git+https://github.com/unslothai/unsloth.git"
Error Message:
AttributeError: 'TrainingArguments' object has no attribute 'padding_value'
Solution:
# Use exact compatible versions
!pip install transformers==4.45.2
!pip install trl==0.11.4
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
Symptoms: Training takes much longer than expected
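To judge whether a run is actually slow, it helps to know how many optimizer steps it performs. Dividing wall-clock time by that number gives seconds per step, which is easier to compare across runs than total duration. This sketch approximates the step count from the effective batch size (per-device batch × gradient accumulation × number of GPUs). It is an assumption-level estimate, because the Trainer's exact rounding of partial batches differs slightly between transformers versions.

```python
import math


def optimizer_steps(num_examples: int, per_device_batch: int,
                    grad_accum: int, num_epochs: float = 1.0,
                    num_devices: int = 1) -> int:
    """Approximate number of optimizer steps for a training run."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return math.ceil(steps_per_epoch * num_epochs)


if __name__ == "__main__":
    # 10,000 examples, batch 2, accumulation 4 -> effective batch of 8.
    print(optimizer_steps(10_000, per_device_batch=2, grad_accum=4))  # 1250
```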
Optimizations:
1. Use Unsloth Optimizations
from unsloth import FastLanguageModel  # Import before transformers/trl so its patches apply; Unsloth reports ~2x speedups
model, tokenizer = FastLanguageModel.from_pretrained(...)
2. Enable Mixed Precision
import torch
SFTConfig(
    fp16=not torch.cuda.is_bf16_supported(),  # fp16 when the GPU lacks bf16 support
    bf16=torch.cuda.is_bf16_supported(),  # bf16 where supported
)
3. Optimize Batch Size
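# Find the largest batch size that fits: try large to small, stop at the first success
import torch
batch_sizes_to_try = [8, 4, 2, 1]
for batch_size in batch_sizes_to_try:
    try:
        run_one_training_step(batch_size)  # hypothetical helper: one forward/backward pass
        print(f"Largest working batch size: {batch_size}")
        break
    except torch.cuda.OutOfMemoryError:
        torch.cuda.empty_cache()  # Release memory from the failed attempt
4. Disable Unnecessary Features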
SFTConfig(
    logging_steps=50,  # Less frequent logging
    save_strategy="no",  # No checkpointing during training
    eval_strategy="no",  # No evaluation during training (named evaluation_strategy in older transformers)
)
Symptoms: Training loss keeps increasing
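Before lowering the learning rate, check that the loss is really trending upward and not just noisy. During or after a run, `trainer.state.log_history` holds the logged metrics as a list of dicts, and training entries carry a `"loss"` key. The sketch compares the mean of the first and last few logged training losses. The window size and the 10% tolerance are arbitrary assumptions.

```python
def loss_is_diverging(log_history: list[dict], window: int = 5,
                      tolerance: float = 0.10) -> bool:
    """True if the mean of the last `window` training losses exceeds the
    mean of the first `window` by more than `tolerance` (relative)."""
    losses = [entry["loss"] for entry in log_history if "loss" in entry]
    if len(losses) < 2 * window:
        return False  # not enough points to judge a trend
    early = sum(losses[:window]) / window
    late = sum(losses[-window:]) / window
    return late > early * (1 + tolerance)


if __name__ == "__main__":
    # Same shape as trainer.state.log_history entries (values are made up).
    history = [{"loss": 1.0 + 0.1 * i, "step": 10 * i} for i in range(12)]
    history.append({"eval_loss": 2.0, "step": 120})  # ignored: no "loss" key
    print(loss_is_diverging(history))  # True
```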
Solutions:
1. Lower Learning Rate
SFTConfig(
    learning_rate=1e-5,  # Down from 2e-4
)
2. Add Gradient Clipping
SFTConfig(
    max_grad_norm=1.0,  # Clip the gradient norm to prevent explosion (1.0 is the Trainer default; lower values clip harder)
)
3. Check Data Quality
# Inspect the first few examples
for example in dataset.select(range(5)):
    print(f"Input length (characters): {len(example['text'])}")
    print(f"Sample: {example['text'][:200]}...")
Symptoms: Gibberish or incoherent responses
Debugging Steps:
1. Check Tokenizer Setup
# Ensure proper tokenizer configuration
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
2. Verify Data Formatting
# Check if chat template is applied correctly
sample = dataset[0]
print("Formatted text:")
print(sample['text'])
3. Test Base Model First
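# Before fine-tuning, test the base model
from unsloth import FastLanguageModel
base_model, tokenizer = FastLanguageModel.from_pretrained("base_model")  # placeholder model name
FastLanguageModel.for_inference(base_model)
# Generate and check output quality: if the base model is already incoherent, fine-tuning is not the cause
4. Reduce Learning Rate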
SFTConfig(
    learning_rate=5e-5,  # Much lower than the common 2e-4
    warmup_ratio=0.1,  # Warm up linearly over the first 10% of steps
)
Symptoms: Model ignores prompts or gives generic responses
Solutions:
1. Use DPO Training
# After SFT, add DPO training for better instruction following
from trl import DPOTrainer
# DPO needs a preference dataset (prompt / chosen / rejected columns) and a separate DPOTrainer run
2. Improve Dataset Quality
# Ensure clear instruction-response format
good_format = {
    "instruction": "Explain photosynthesis",
    "output": "Photosynthesis is the process...",
}
3. Use Better Chat Template
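Records in the instruction/output format from step 2 have to be mapped into a role-based `messages` list before `tokenizer.apply_chat_template` can render them. Here is a minimal sketch. It assumes the column names `instruction` and `output` and a fixed system prompt, and you should adapt both to your dataset.

```python
DEFAULT_SYSTEM = "You are a helpful assistant."  # assumed system prompt


def to_messages(example: dict, system_prompt: str = DEFAULT_SYSTEM) -> list[dict]:
    """Convert an {'instruction', 'output'} record into chat messages."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": example["instruction"]},
        {"role": "assistant", "content": example["output"]},
    ]


if __name__ == "__main__":
    record = {"instruction": "Explain photosynthesis",
              "output": "Photosynthesis is the process..."}
    for message in to_messages(record):
        print(f"{message['role']}: {message['content']}")
```

With a Hugging Face dataset, the same function can feed a `dataset.map` call that stores `tokenizer.apply_chat_template(to_messages(example), tokenize=False)` in a `text` column. Some chat templates do not accept a system role, so check your model's template first.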
# Apply the model's chat template
# `instruction` and `response` are the strings from one training example
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": instruction},
    {"role": "assistant", "content": response},
]
text = tokenizer.apply_chat_template(messages, tokenize=False)
Error Message:
DatasetGenerationError: An error occurred while generating the dataset
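When the data comes from a local JSON Lines file, a common trigger is a single malformed line or a record missing an expected key. (This assumes a local file; the error has other causes too.) The standard-library sketch below lists the offending line numbers before `load_dataset` is called. The required keys default to the `instruction`/`output` columns used in this guide, and the file path is a placeholder.

```python
import json
import os
from typing import Iterable


def find_bad_jsonl_lines(lines: Iterable[str],
                         required_keys: tuple[str, ...] = ("instruction", "output"),
                         ) -> list[tuple[int, str]]:
    """Return (line_number, problem) pairs for lines that fail to parse
    or lack a required key. Blank lines are skipped."""
    problems = []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((line_no, f"invalid JSON: {err.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((line_no, "not a JSON object"))
            continue
        missing = [key for key in required_keys if key not in record]
        if missing:
            problems.append((line_no, f"missing keys: {missing}"))
    return problems


if __name__ == "__main__":
    path = "your_data.jsonl"  # placeholder path: replace with your own file
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            for line_no, problem in find_bad_jsonl_lines(f):
                print(f"line {line_no}: {problem}")
    else:
        print(f"{path} not found")
```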
Solutions:
1. Check Dataset Format
# Verify dataset exists and is accessible
from datasets import load_dataset
dataset = load_dataset("dataset_name", split="train")
print(dataset.features)
2. Handle Missing Columns
# Check required columns exist
required_columns = ['instruction', 'output']
available_columns = dataset.column_names
missing = [col for col in required_columns if col not in available_columns]
if missing:
    print(f"Missing columns: {missing}")
3. Use Local Dataset
# If online loading fails, download the data manually and load it from a local file
dataset = load_dataset("json", data_files="your_data.jsonl", split="train")
Symptoms: Model performs badly after training
Diagnostic Steps:
1. Check Dataset Size
print(f"Dataset size: {len(dataset)}")
# Rough rule of thumb: at least 100-200 examples for meaningful results
2. Inspect Data Quality
# Look for patterns in your data (lengths are in characters, not tokens)
lengths = [len(example['text']) for example in dataset]
print(f"Average length: {sum(lengths)/len(lengths):.0f}")
print(f"Max length: {max(lengths)}")
# Check for exact duplicates
texts = [example['text'] for example in dataset]
unique_texts = set(texts)
print(f"Unique examples: {len(unique_texts)} / {len(texts)}")
3. Validate Formatting
# Check that examples follow the expected format
for i, example in enumerate(dataset.select(range(3))):
    print(f"Example {i}:")
    print(example['text'])
    print("-" * 50)
Symptoms: Training interrupted by Colab timeouts
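If a run has already been cut off, it can continue from the newest checkpoint instead of starting over. `trainer.train(resume_from_checkpoint=True)` resumes from the latest checkpoint in `output_dir`, and transformers also provides `transformers.trainer_utils.get_last_checkpoint` for the same lookup. The standard-library sketch below mirrors that lookup so you can see which folder would be used. It assumes the Trainer's `checkpoint-<step>` folder naming.

```python
import os
import re
from typing import Optional

_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def pick_latest(names: list[str]) -> Optional[str]:
    """Return the checkpoint-<step> name with the largest step, or None."""
    steps = [(int(m.group(1)), name) for name in names
             if (m := _CHECKPOINT_RE.match(name))]
    return max(steps)[1] if steps else None


def latest_checkpoint(output_dir: str) -> Optional[str]:
    """Return the path of the newest checkpoint folder in output_dir, or None."""
    if not os.path.isdir(output_dir):
        return None
    names = [n for n in os.listdir(output_dir)
             if os.path.isdir(os.path.join(output_dir, n))]
    best = pick_latest(names)
    return os.path.join(output_dir, best) if best else None


if __name__ == "__main__":
    ckpt = latest_checkpoint("checkpoints")
    print(f"Resuming from {ckpt}" if ckpt else "No checkpoint found; starting fresh")
    # With an existing SFTTrainer instance (not shown here):
    # trainer.train(resume_from_checkpoint=ckpt)  # None starts from scratch
```

The step number is compared numerically on purpose. A plain string sort would rank `checkpoint-200` above `checkpoint-1000`.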
Prevention:
1. Enable Automatic Saving
SFTConfig(
    save_strategy="steps",
    save_steps=100,
    output_dir="checkpoints",  # On Colab, use a mounted Google Drive folder; local disk is wiped when the runtime resets
)
2. Use Shorter Training Runs
SFTConfig(
    max_steps=200,  # Fixed step budget; overrides num_train_epochs
)
3. Activity Script
// Run in the browser console to prevent the idle timeout
// (the selector depends on Colab's current UI and may stop matching)
function ClickConnect() {
    console.log("Working");
    document.querySelector("colab-toolbar-button#connect")?.click();
}
setInterval(ClickConnect, 60000);
Error Message:
ModuleNotFoundError: No module named 'unsloth'
Solutions:
1. Restart Runtime
# After installation, restart the runtime from the Colab menu:
Runtime > Restart session (called "Restart runtime" in older Colab versions)
2. Check Installation
import importlib.util
# Checks the interpreter this notebook runs in (a bare `pip list` can target a different environment)
if importlib.util.find_spec("unsloth") is not None:
    print("✅ Unsloth installed")
else:
    print("❌ Unsloth not found")
3. Reinstall Clean
!pip uninstall unsloth -y
!pip cache purge
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
from unsloth import FastLanguageModel
def quick_model_test(model, tokenizer):
    """Test the model with a few simple prompts."""
    test_prompts = [
        "Hello, how are you?",
        "Explain machine learning in one sentence.",
        "Write a haiku about programming.",
    ]
    FastLanguageModel.for_inference(model)  # Enable Unsloth's inference mode
    for prompt in test_prompts:
        inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
        # do_sample=True is required for temperature to have any effect
        outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
        # Decode only the newly generated tokens, not the echoed prompt
        new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
        response = tokenizer.decode(new_tokens, skip_special_tokens=True)
        print(f"Q: {prompt}")
        print(f"A: {response}")
        print("-" * 40)
import time
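import torch
def benchmark_model(model, tokenizer, num_tests=10, max_new_tokens=50):
    """Benchmark inference speed on a CUDA GPU."""
    prompt = "Explain artificial intelligence."
    inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
    prompt_len = inputs["input_ids"].shape[1]
    # Warmup
    model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Benchmark
    times, generated = [], 0
    for _ in range(num_tests):
        torch.cuda.synchronize()  # Wait for pending GPU work before timing
        start = time.perf_counter()
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
        torch.cuda.synchronize()
        times.append(time.perf_counter() - start)
        generated += outputs.shape[1] - prompt_len  # Can be < max_new_tokens if EOS is reached
    avg_time = sum(times) / len(times)
    print(f"Average inference time: {avg_time:.2f}s")
    print(f"Tokens per second: {generated / sum(times):.1f}")
Error Message: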
AttributeError: 'FastVisionModel' object has no attribute 'from_pretrained'
Solutions:
1. Use Correct Import
from unsloth import FastVisionModel  # Not FastLanguageModel
model, tokenizer = FastVisionModel.from_pretrained(
    model_name="unsloth/Qwen2-VL-7B-Instruct",
    load_in_4bit=True,
)
2. Check Image Format
# Ensure images are PIL images in RGB mode
from PIL import Image
image = Image.open("path/to/image.jpg").convert("RGB")  # Converts RGBA / grayscale to RGB
# `image` is now a PIL.Image.Image object
Symptoms: Higher memory usage than text-only models
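Part of the extra memory comes from the images themselves, because each image becomes many tokens in the training sequence. Qwen2-VL's model card describes roughly one visual token per 28×28-pixel block (14-pixel patches merged 2×2), so the token count grows with resolution. The sketch below estimates that count under this assumption. It is simpler than the model's own image processor, which also rounds dimensions and enforces minimum and maximum pixel limits, so real counts will differ somewhat.

```python
PATCH_SIDE = 28  # assumption: pixels per visual-token side (Qwen2-VL: 14px patches merged 2x2)


def visual_tokens(width: int, height: int, patch_side: int = PATCH_SIDE) -> int:
    """Approximate number of visual tokens for a width x height image."""
    return max(1, width // patch_side) * max(1, height // patch_side)


if __name__ == "__main__":
    for w, h in [(448, 448), (1024, 768), (1920, 1080)]:
        print(f"{w}x{h}: ~{visual_tokens(w, h)} visual tokens")
```

Under this estimate, a 448×448 image costs 256 tokens and a 1920×1080 image about 2,584, before any text is added. That is why vision fine-tuning usually needs smaller batches than text-only runs.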
Solutions:
# Use more aggressive memory optimization
model = FastVisionModel.get_peft_model(
    model,
    r=8,  # Lower rank for vision models
    finetune_vision_layers=True,
    finetune_language_layers=False,  # Only vision layers
)
# Smaller batch sizes
SFTConfig(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # Effective batch size: 1 x 8 = 8
)
- Check Error Messages: Copy the full error traceback
- Verify Installation: Ensure all packages are correctly installed
- Test Minimal Example: Try the simplest possible case first
- Check Resources: Monitor GPU memory and usage
Include this information when asking for help:
# System information
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GiB")
# Package versions, read from the environment this interpreter uses
from importlib.metadata import version, PackageNotFoundError
for package in ['unsloth', 'transformers', 'trl', 'peft']:
    try:
        print(f"{package}: {version(package)}")
    except PackageNotFoundError:
        print(f"{package}: not found")
- Bug Reports: GitHub Issues
- General Questions: GitHub Discussions
- Documentation: Check the docs/ folder
- Community: Join our Discord (link in repo)
| Problem | Quick Fix |
|---|---|
| CUDA OOM | per_device_train_batch_size=1 |
| Slow training | Use Unsloth + mixed precision (bf16 if supported, else fp16) |
| Bad outputs | Check tokenizer + reduce learning rate |
| Import errors | Restart runtime after installation |
| Version conflicts | Use exact version pins |
| Memory leak | dataloader_pin_memory=False |
| Training divergence | learning_rate=1e-5 + max_grad_norm=1.0 |
Still stuck? Open an issue with the system information above and a detailed description of the error.