Commit 7fd4dab

feat(lfm2_5_vl): add LFM2.5-VL-450M support

Bundle an architecture config for the 450M checkpoint (dim=1024, hidden_dim=4608, same layer_types as 1.6B) and auto-select the right params JSON from --model_dir in export_lfm2_5_vl.py. The existing convert_weights and model.py are already dim-agnostic, so no code changes are needed on the loading path.

Verified by exporting lfm2_5_vl_450m_quantized_xnnpack.pte (619MB, fp32 vision encoder + 8da4w decoder + int8 embedding). The 450M shares the same tokenizer, EOS/BOS tokens, and chat template as the 1.6B, so the C++ runner needs no changes.

Authored with Claude Code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 341e36e commit 7fd4dab

3 files changed

Lines changed: 92 additions & 20 deletions

examples/models/lfm2_5_vl/README.md

Lines changed: 17 additions & 6 deletions
@@ -1,35 +1,46 @@
 # LFM2.5-VL ExecuTorch Export

-Export [LiquidAI/LFM2-VL-1.6B](https://huggingface.co/LiquidAI/LFM2-VL-1.6B) to ExecuTorch as a single multi-method PTE compatible with the LLaVA C++ runner.
+Export the LFM2.5-VL family to ExecuTorch as a single multi-method PTE compatible with the LLaVA C++ runner. Both checkpoints are supported and share the same export path:
+
+- [LiquidAI/LFM2-VL-1.6B](https://huggingface.co/LiquidAI/LFM2-VL-1.6B) — text dim 2048
+- [LiquidAI/LFM2.5-VL-450M](https://huggingface.co/LiquidAI/LFM2.5-VL-450M) — text dim 1024

 LFM2.5-VL is a **hybrid SSM+attention vision-language model** — 16 decoder layers alternating between short convolution blocks and full attention blocks, paired with a SigLIP ViT vision encoder.

 ## Architecture

-Three named methods in one PTE:
+Three named methods in one PTE (`D` = text hidden dim: 2048 for 1.6B, 1024 for 450M):

 | Method | Input | Output |
 |--------|-------|--------|
-| `vision_encoder` | `[1, 3, 512, 512]` float32 NCHW pixels [0,255] | `[1, 256, 2048]` float32 |
-| `token_embedding` | `[1, seq_len]` int64 token IDs | `[1, seq_len, 2048]` float32 |
-| `text_decoder` | `([1, seq_len, 2048]` float32, `[seq_len]` int64) | `[1, 65536]` float32 |
+| `vision_encoder` | `[1, 3, 512, 512]` float32 NCHW pixels [0,255] | `[1, 256, D]` float32 |
+| `token_embedding` | `[1, seq_len]` int64 token IDs | `[1, seq_len, D]` float32 |
+| `text_decoder` | `([1, seq_len, D]` float32, `[seq_len]` int64) | `[1, 65536]` float32 |

 ## Export

 ```bash
+# 1.6B (default)
 python examples/models/lfm2_5_vl/export_lfm2_5_vl.py \
     --model_dir LiquidAI/LFM2-VL-1.6B \
     --dtype fp32
+
+# 450M — bundled config is auto-selected from --model_dir
+python examples/models/lfm2_5_vl/export_lfm2_5_vl.py \
+    --model_dir LiquidAI/LFM2.5-VL-450M \
+    --dtype fp32
 ```

 With quantization (8da4w decoder + int8 embedding + float32 vision encoder):

 ```bash
 python examples/models/lfm2_5_vl/export_lfm2_5_vl.py \
-    --model_dir LiquidAI/LFM2-VL-1.6B \
+    --model_dir LiquidAI/LFM2.5-VL-450M \
     --quantize
 ```

+The bundled architecture configs live in [config/](config/). Pass `--params /path/to/custom.json` to override.
+
 ### Required runner configuration

 - Resize image to exactly 512×512
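The I/O contract in the table above can be stated as a small shape helper. This is a hypothetical sketch for illustration only (`expected_output_shape` is not part of the repo); the shapes and method names come straight from the table:

```python
# Hypothetical helper mirroring the PTE method table above; not part of the repo.
# d is the text hidden dim: 2048 for LFM2-VL-1.6B, 1024 for LFM2.5-VL-450M.
def expected_output_shape(method: str, d: int, seq_len: int = 1) -> tuple:
    """Return the documented output shape for each named PTE method."""
    if method == "vision_encoder":
        return (1, 256, d)      # 256 image tokens, each of width d
    if method == "token_embedding":
        return (1, seq_len, d)  # one embedding row per input token
    if method == "text_decoder":
        return (1, 65536)       # logits over the shared 65536-token vocab
    raise ValueError(f"unknown method: {method}")

# The 450M differs from the 1.6B only in d; the vocab size is shared.
assert expected_output_shape("vision_encoder", 1024) == (1, 256, 1024)
assert expected_output_shape("text_decoder", 1024) == expected_output_shape("text_decoder", 2048)
```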
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+{
+    "dim": 1024,
+    "ffn_dim_multiplier": 1,
+    "hidden_dim": 4608,
+    "n_heads": 16,
+    "n_kv_heads": 8,
+    "n_layers": 16,
+    "norm_eps": 1e-5,
+    "rope_theta": 1000000.0,
+    "use_scaled_rope": false,
+    "vocab_size": 65536,
+    "use_hf_rope": true,
+    "use_qk_norm": true,
+    "qk_norm_before_rope": true,
+    "layer_types": [
+        "conv",
+        "conv",
+        "full_attention",
+        "conv",
+        "conv",
+        "full_attention",
+        "conv",
+        "conv",
+        "full_attention",
+        "conv",
+        "full_attention",
+        "conv",
+        "full_attention",
+        "conv",
+        "full_attention",
+        "conv"
+    ]
+}
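A config like the one above can be sanity-checked in a few lines. This is a standalone sketch (the dict literal just restates fields from the JSON shown; the invariant checks are illustrative assumptions, not repo code):

```python
# Standalone sanity check for the 450M architecture config shown above.
config = {
    "dim": 1024,
    "n_heads": 16,
    "n_kv_heads": 8,
    "n_layers": 16,
    "layer_types": [
        "conv", "conv", "full_attention", "conv", "conv", "full_attention",
        "conv", "conv", "full_attention", "conv", "full_attention", "conv",
        "full_attention", "conv", "full_attention", "conv",
    ],
}

# 16 decoder layers alternating short-conv and full-attention blocks.
assert len(config["layer_types"]) == config["n_layers"]
assert config["layer_types"].count("full_attention") == 6
assert config["layer_types"].count("conv") == 10

# GQA: query heads must split evenly over KV heads; dim over query heads.
assert config["n_heads"] % config["n_kv_heads"] == 0
assert config["dim"] % config["n_heads"] == 0  # head dim 64
```

The layer_types list is identical to the 1.6B's, matching the commit message; only dim and hidden_dim shrink.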

examples/models/lfm2_5_vl/export_lfm2_5_vl.py

Lines changed: 42 additions & 14 deletions
@@ -5,21 +5,25 @@
 # LICENSE file in the root directory of this source tree.

 """
-Export LFM2.5-VL-1.6B as a single multi-method PTE for ExecuTorch's
-generic MultimodalRunner (C++ llava_main).
+Export LFM2.5-VL as a single multi-method PTE for ExecuTorch's generic
+MultimodalRunner (C++ llava_main). Supports both LFM2.5-VL-1.6B (text dim
+2048) and LFM2.5-VL-450M (text dim 1024); the architecture config is picked
+from the bundled config/ directory based on --model_dir, or you can pass
+--params to point at a custom JSON.

-Methods:
-    vision_encoder  : [1, 3, 512, 512] f32 NCHW pixels [0,255] -> [1, 256, 2048] f32
-    token_embedding : [1, seq_len] i64 -> [1, seq_len, 2048] f32
-    text_decoder    : ([1, seq_len, 2048] f32, [seq_len] i64) -> [1, 65536] f32
+Methods (D = text hidden dim: 2048 for 1.6B, 1024 for 450M):
+    vision_encoder  : [1, 3, 512, 512] f32 NCHW pixels [0,255] -> [1, 256, D] f32
+    token_embedding : [1, seq_len] i64 -> [1, seq_len, D] f32
+    text_decoder    : ([1, seq_len, D] f32, [seq_len] i64) -> [1, 65536] f32

 Usage:
     python examples/models/lfm2_5_vl/export_lfm2_5_vl.py \
-        --model_dir /path/to/LFM2-VL-1.6B \
+        --model_dir LiquidAI/LFM2.5-VL-450M \
         [--dtype fp32|fp16] [--quantize] [--output lfm2_5_vl_xnnpack.pte]
 """

 import logging
+import os
 from argparse import ArgumentParser, BooleanOptionalAction
 from typing import Optional

@@ -66,6 +70,23 @@
 FORMAT = "[%(levelname)s %(asctime)s %(filename)s:%(lineno)s] %(message)s"
 logging.basicConfig(level=logging.INFO, format=FORMAT)

+_CONFIG_DIR = os.path.join(os.path.dirname(__file__), "config")
+
+
+def _resolve_params_path(model_dir: str, params: Optional[str]) -> Optional[str]:
+    """Pick a bundled config based on model_dir if --params was not provided.
+
+    Returns None to fall back to model.py's default (1.6B).
+    """
+    if params is not None:
+        return params
+    name = model_dir.lower()
+    if "450m" in name:
+        return os.path.join(_CONFIG_DIR, "lfm2_5_vl_450m_config.json")
+    if "1.6b" in name or "1_6b" in name:
+        return os.path.join(_CONFIG_DIR, "lfm2_5_vl_1_6b_config.json")
+    return None
+

 class Lfm2p5VlEdgeManager(LLMEdgeManager):
     """LLMEdgeManager subclass for LFM2.5-VL.
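The matching rules of `_resolve_params_path` can be illustrated standalone. The sketch below restates only the substring matching (the hypothetical `pick_config` returns bare filenames; the real function also joins the bundled `_CONFIG_DIR` path):

```python
from typing import Optional

# Illustrative restatement of _resolve_params_path's matching rules only;
# pick_config is a hypothetical helper, not the repo function.
def pick_config(model_dir: str) -> Optional[str]:
    name = model_dir.lower()
    if "450m" in name:
        return "lfm2_5_vl_450m_config.json"
    if "1.6b" in name or "1_6b" in name:
        return "lfm2_5_vl_1_6b_config.json"
    return None  # caller falls back to model.py's default (1.6B)

# HF model IDs and local checkout paths both match case-insensitively.
assert pick_config("LiquidAI/LFM2.5-VL-450M") == "lfm2_5_vl_450m_config.json"
assert pick_config("/ckpts/lfm2-vl-1.6b") == "lfm2_5_vl_1_6b_config.json"
assert pick_config("some/other/model") is None
```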
@@ -354,11 +375,14 @@ def export_all(


 def main():
-    parser = ArgumentParser(description="Export LFM2.5-VL-1.6B to ExecuTorch")
+    parser = ArgumentParser(description="Export LFM2.5-VL to ExecuTorch")
     parser.add_argument(
         "--model_dir",
         default="LiquidAI/LFM2-VL-1.6B",
-        help="HuggingFace model ID or local path",
+        help=(
+            "HuggingFace model ID or local path. Supported: "
+            "LiquidAI/LFM2-VL-1.6B, LiquidAI/LFM2.5-VL-450M."
+        ),
     )
     parser.add_argument(
         "--dtype",
@@ -388,8 +412,8 @@ def main():
         "--params",
         default=None,
         help=(
-            "Path to model params JSON (architecture config). "
-            "Defaults to the bundled config/lfm2_5_vl_1_6b_config.json."
+            "Path to model params JSON (architecture config). When omitted, "
+            "the bundled 1.6B or 450M config is selected from --model_dir."
         ),
     )
     parser.add_argument(
@@ -400,8 +424,12 @@ def main():
     args = parser.parse_args()

     dtype = DType.fp16 if args.dtype == "fp16" else DType.fp32
-    suffix = ("_fp16" if dtype == DType.fp16 else "") + (
-        "_quantized" if args.quantize else ""
+    params_path = _resolve_params_path(args.model_dir, args.params)
+    size_tag = "_450m" if (params_path or "").endswith("450m_config.json") else ""
+    suffix = (
+        size_tag
+        + ("_fp16" if dtype == DType.fp16 else "")
+        + ("_quantized" if args.quantize else "")
     )
     output = args.output or f"lfm2_5_vl{suffix}_xnnpack.pte"
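The suffix logic in this hunk composes the default output name from the size tag, dtype, and quantization flags. A standalone sketch (`output_name` is a hypothetical helper mirroring the lines in `main()`, not repo code):

```python
# Hypothetical standalone restatement of the output-name composition in main().
# fp16/quantize mirror the --dtype fp16 and --quantize CLI flags.
def output_name(params_path, fp16=False, quantize=False):
    size_tag = "_450m" if (params_path or "").endswith("450m_config.json") else ""
    suffix = size_tag + ("_fp16" if fp16 else "") + ("_quantized" if quantize else "")
    return f"lfm2_5_vl{suffix}_xnnpack.pte"

# Matches the artifact named in the commit message.
assert output_name("config/lfm2_5_vl_450m_config.json", quantize=True) == \
    "lfm2_5_vl_450m_quantized_xnnpack.pte"
# 1.6B (or no resolved config) gets no size tag, preserving the old names.
assert output_name(None) == "lfm2_5_vl_xnnpack.pte"
```

Tagging only the 450M keeps the 1.6B's existing artifact names stable.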

@@ -412,7 +440,7 @@ def main():
         args.quantize,
         args.max_seq_len,
         args.max_context_len,
-        args.params,
+        params_path,
     )

