ARM2 is an adaptive reasoning model with vision understanding and executable code capabilities. This repository contains the codebase for Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) based on LLaMA-Factory and VeRL.
- Data & Model
- Environment Setup
- Stage 1: Supervised Fine-Tuning (SFT)
- Stage 2: Reinforcement Learning (RL)
- Contact
- Citation
You can download our model from 🤗HuggingFace.
For SFT, please download the images from 🤗HuggingFace.
Note: After downloading, you should adjust the file paths of images in LLaMA-Factory/data/visualwebinstruct_sft.json.
This project requires two separate conda environments for SFT and RL stages.
For easy reproduction, we provide exported environment files:
environment.yml: Complete conda environment export (recommended)requirements.txt: All pip dependenciessetup_verl_env.sh: Automated setup script
# Create conda environment
conda create -n llamafactory python=3.11
conda activate llamafactory
# Install LLaMA-Factory
cd LLaMA-Factory
pip install -e ".[torch,metrics]" --no-build-isolation
pip3 install flash-attn --no-build-isolationWe provide exported environment files for easy reproduction:
# Method A: Using conda environment file (recommended)
conda env create -f environment.yml
conda activate verl
cd verl
pip install -e .
# Method B: Using automated setup script
bash setup_verl_env.sh
# Method C: Manual installation from requirements.txt
conda create -n verl python=3.11
conda activate verl
cd verl
pip install -e .
pip install flash-attn==2.7.4.post1 --no-build-isolation # Install flash-attn separately
pip install -r ../requirements.txtNote: flash-attn may need to be installed separately with --no-build-isolation flag if installation fails.
# Create conda environment
conda create -n verl python=3.11
conda activate verl
# Install VeRL and dependencies
cd verl
pip3 install -e .
pip3 install flash-attn --no-build-isolation
pip3 install fastapi uvicorn openai vllm==0.8.3 numpy<2.0.0
pip install "opentelemetry-api>=1.34.0" "opentelemetry-sdk>=1.34.0" "opentelemetry-exporter-otlp>=1.34.0"Note: The exported environment includes:
- Python 3.11.0
- PyTorch 2.6.0
- vLLM 0.8.3
- Ray 2.43.0
- Transformers 4.57.3
- Flash Attention 2.7.4.post1
- All other dependencies (see
requirements.txtfor full list)
conda activate llamafactory
cd LLaMA-Factoryllamafactory-cli train examples/train_lora/qwen2_5vl_lora_sft.yamlconda activate verl
cd verlYou can find examples in verl/verl/data.
Important: Before running the script, please adjust the paths of policy models and datasets to your own paths.
System Requirements:
- Ensure sufficient system resources (process limits, memory, etc.)
bash verl/verl/scripts/run.shTroubleshooting:
- If you see import errors for
Qwen2_5_VLFlashAttention2, this is expected in newer transformers versions (4.57+) and can be safely ignored - For Ray-related issues, check Ray logs in
/tmp/ray/session_*/logs/
If you have any problems, please contact Jian Xie.
If our paper or related resources prove valuable to your research, we kindly ask for a citation.
@article{xie2025arm2,
title={ARM2: Adaptive Reasoning Model with Vision Understanding and Executable Code},
author={Jian Xie and Zhendong Chu and Aoxiao Zhong and Kai Zhang and Mingzhe Han and Xing Fan and Jialie Shen and Qingsong Wen},
journal={arXiv preprint arXiv:2510.08163},
year={2025}
}