
Commit c32c63d

Merge pull request #27 from codewithdark-git/copilot/add-flexible-model-registration
Add architecture registration + fallback loading path for newly released HF model types
2 parents 4828488 + c752dfa commit c32c63d

6 files changed

Lines changed: 558 additions & 5 deletions

File tree

README.md

Lines changed: 8 additions & 0 deletions
@@ -348,6 +348,14 @@ pytest
 - 📚 Documentation
 - 🐛 Bug fixes
 
+**Quick template for new architecture support:**
+```python
+from quantllm import register_architecture, turbo
+
+register_architecture("new-arch", base_model_type="llama")
+model = turbo("org/new-arch-7b", base_model_fallback=True, trust_remote_code=True)
+```
+
 ---
 
 ## 📜 License
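
The template above registers a mapping from an unrecognized `model_type` to a known base family. As an illustrative sketch only (assumed behavior, not quantllm source), the registry it implies can be modeled as a case-insensitive dictionary with passthrough for unregistered types; `resolve_model_type` is a hypothetical helper name:

```python
# Sketch of the mapping register_architecture implies (assumption, not
# quantllm's actual implementation): new model_type -> compatible base family.
_ARCH_REGISTRY: dict[str, str] = {}

def register_architecture(model_type: str, base_model_type: str) -> None:
    """Map a new/unknown model_type onto a compatible base family."""
    _ARCH_REGISTRY[model_type.lower()] = base_model_type.lower()

def resolve_model_type(config_model_type: str) -> str:
    """Return the family to load with; unregistered types pass through."""
    key = config_model_type.lower()
    return _ARCH_REGISTRY.get(key, key)

register_architecture("new-arch", base_model_type="llama")
print(resolve_model_type("new-arch"))  # llama
print(resolve_model_type("mistral"))   # mistral (unregistered passthrough)
```

Lower-casing both sides keeps lookups consistent with the lowercase `model_type` strings HF configs typically carry.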

docs/guide/loading-models.md

Lines changed: 53 additions & 0 deletions
@@ -74,6 +74,59 @@ model = turbo(
 )
 ```
 
+### New Architecture Fallbacks (for very recent model releases)
+
+If `transformers` does not recognize a just-released architecture yet, register a fallback family:
+
+```python
+from quantllm import turbo, register_architecture
+
+# Map new architecture/model_type to a compatible base family
+register_architecture("newmodel", base_model_type="llama")
+
+model = turbo(
+    "new-model-org/NewModel-7B",
+    model_type_override="llama",  # optional explicit override
+    base_model_fallback=True,     # enabled by default; can be disabled
+    trust_remote_code=True,
+)
+```
+
+> ⚠️ **Security note:** `trust_remote_code=True` executes model-provided code.
+> Only enable it for trusted publishers, especially when loading unregistered or very new architectures.
+
+You can also load from config only (no checkpoint weights) while waiting for upstream support:
+
+```python
+model = turbo(
+    "new-model-org/NewModel-7B",
+    from_config_only=True,
+    trust_remote_code=True,
+)
+```
+
+#### Fast contribution template for new architectures
+
+1. Add a registration in your code or PR:
+   - `register_architecture("new-arch", base_model_type="llama")`
+2. Validate loading with:
+   - `turbo("org/model", base_model_fallback=True, trust_remote_code=True)`
+3. Add/extend a focused test in `tests/test_architecture_fallback.py`.
+
+#### Real-world style "released yesterday" example
+
+```python
+from quantllm import turbo, register_architecture
+
+# Example: transformers doesn't recognize Qwen3 yet
+register_architecture("qwen3", base_model_type="qwen2")
+
+model = turbo(
+    "Qwen/Qwen3-8B",
+    trust_remote_code=True,
+)
+```
+
 ### Memory Options
 
 ```python
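
The guide's options suggest a resolution order: an explicit `model_type_override` wins, then a registered fallback (while `base_model_fallback` stays enabled), then whatever `model_type` the checkpoint's config declares. This is an assumed precedence inferred from the documented parameters, not quantllm source; `pick_model_type` and the `FALLBACKS` entries are hypothetical:

```python
# Assumed precedence sketch: override > registered fallback > config value.
from typing import Optional

FALLBACKS = {"qwen3": "qwen2", "newmodel": "llama"}  # hypothetical registrations

def pick_model_type(config_model_type: str,
                    model_type_override: Optional[str] = None,
                    base_model_fallback: bool = True) -> str:
    if model_type_override is not None:
        return model_type_override          # explicit override always wins
    if base_model_fallback and config_model_type in FALLBACKS:
        return FALLBACKS[config_model_type]  # fall back to the base family
    return config_model_type                 # pass through unchanged

print(pick_model_type("qwen3"))                               # qwen2
print(pick_model_type("qwen3", model_type_override="llama"))  # llama
print(pick_model_type("qwen3", base_model_fallback=False))    # qwen3
```

Keeping the override as the highest-priority path mirrors the docs' note that `model_type_override` is an "optional explicit override" on top of the default-on fallback.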

quantllm/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -35,6 +35,7 @@
 from .core import (
     turbo,
     TurboModel,
+    register_architecture,
     SmartConfig,
     HardwareProfiler,
     ModelAnalyzer,
@@ -117,6 +118,7 @@ def show_banner(force: bool = False):
     # Main API
     "turbo",
     "TurboModel",
+    "register_architecture",
     "SmartConfig",
     "HardwareProfiler",
     "ModelAnalyzer",

quantllm/core/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -8,7 +8,7 @@
 from .hardware import HardwareProfiler
 from .smart_config import SmartConfig
 from .model_analyzer import ModelAnalyzer
-from .turbo_model import TurboModel, turbo
+from .turbo_model import TurboModel, turbo, register_architecture
 from .compilation import (
     compile_model,
     compile_for_inference,
@@ -51,6 +51,7 @@
     "ModelAnalyzer",
     "TurboModel",
     "turbo",
+    "register_architecture",
     # Compilation
     "compile_model",
     "compile_for_inference",
