Model API
=========

Model
-----

.. automodule:: quantllm.model.model
   :members:
   :undoc-members:
   :show-inheritance:

Model Configuration
-------------------

.. automodule:: quantllm.model.lora_config
   :members:
   :undoc-members:
   :show-inheritance:

Example Usage
-------------

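Basic Usage
~~~~~~~~~~~

.. code-block:: python

    from quantllm import Model, ModelConfig

    # Configure the model: load facebook/opt-125m with 4-bit quantization
    config = ModelConfig(
        model_name="facebook/opt-125m",
        load_in_4bit=True
    )

    # Build the wrapper, then retrieve the underlying model object
    model = Model(config)
    model_instance = model.get_model()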
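To give a rough sense of what ``load_in_4bit=True`` saves, the sketch below estimates weight-only storage at two precisions. The helper ``estimate_weight_memory_mib`` is hypothetical and not part of quantllm, and the parameter count is an assumption read off the ``opt-125m`` name. The estimate ignores quantization metadata, activations, and optimizer state, so real usage will be higher.

```python
def estimate_weight_memory_mib(num_params: int, bits_per_param: int) -> float:
    """Approximate storage for the weights alone, in MiB.

    Hypothetical helper for illustration; not a quantllm function.
    """
    if num_params < 0 or bits_per_param <= 0:
        raise ValueError("num_params must be >= 0 and bits_per_param > 0")
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / (1024 ** 2)


# Assumed parameter count, taken from the "opt-125m" model name.
NUM_PARAMS = 125_000_000

fp16_mib = estimate_weight_memory_mib(NUM_PARAMS, 16)
int4_mib = estimate_weight_memory_mib(NUM_PARAMS, 4)

print(f"16-bit weights: {fp16_mib:.1f} MiB")
print(f" 4-bit weights: {int4_mib:.1f} MiB")
```

Under these assumptions the 4-bit weights take a quarter of the 16-bit footprint, which is the main reason to enable quantized loading on memory-constrained hardware.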

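With LoRA
~~~~~~~~~

.. code-block:: python

    from quantllm import Model, ModelConfig

    # Enable LoRA adapters on top of the 4-bit quantized model
    config = ModelConfig(
        model_name="facebook/opt-125m",
        load_in_4bit=True,
        use_lora=True
    )
    model = Model(config)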
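As a hedged illustration of why LoRA is cheap to train, the sketch below counts the extra parameters LoRA adds to a single linear layer. For a weight of shape ``(d_out, d_in)`` and rank ``r``, LoRA trains two low-rank factors with ``r * (d_in + d_out)`` parameters in total. The helper ``lora_param_count``, the 768 x 768 layer shape, and the rank of 8 are illustrative assumptions; the rank and target layers that quantllm actually uses are not shown in this page.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) linear layer.

    Hypothetical helper for illustration; not a quantllm function.
    LoRA learns A with shape (rank, d_in) and B with shape (d_out, rank).
    """
    if min(d_in, d_out, rank) <= 0:
        raise ValueError("d_in, d_out, and rank must be positive")
    return rank * (d_in + d_out)


# Illustrative values (assumptions, not read from quantllm's config).
D_IN, D_OUT, RANK = 768, 768, 8

full_params = D_IN * D_OUT
lora_params = lora_param_count(D_IN, D_OUT, RANK)

print(f"Full layer: {full_params} parameters")
print(f"LoRA adds:  {lora_params} trainable parameters "
      f"({100 * lora_params / full_params:.2f}% of the layer)")
```

The frozen base weights can stay quantized while only the small low-rank factors receive gradients, which is why ``use_lora=True`` is commonly paired with ``load_in_4bit=True``.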

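CPU Offloading
~~~~~~~~~~~~~~

.. code-block:: python

    from quantllm import Model, ModelConfig

    # Enable CPU offloading for setups with limited GPU memory
    config = ModelConfig(
        model_name="facebook/opt-125m",
        cpu_offload=True
    )
    model = Model(config)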

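Advanced Configuration
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from quantllm import Model, ModelConfig

    # Combine quantization, LoRA, and training options in one config
    config = ModelConfig(
        model_name="facebook/opt-125m",
        load_in_4bit=True,
        use_lora=True,
        gradient_checkpointing=True,
        bf16=True,
        trust_remote_code=True
    )
    model = Model(config)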