ARM2: Adaptive Reasoning Model with Vision Understanding and Executable Code

Overview

ARM2 is an adaptive reasoning model with vision understanding and executable code capabilities. This repository contains the training code for both stages: Supervised Fine-Tuning (SFT), based on LLaMA-Factory, and Reinforcement Learning (RL), based on VeRL.

Data & Model

Model Download

You can download our model from 🤗HuggingFace.

Dataset Download

For SFT, please download the images from 🤗HuggingFace.

Note: After downloading, update the image file paths in LLaMA-Factory/data/visualwebinstruct_sft.json so that they point to where you saved the images.
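The path adjustment can be scripted rather than done by hand. The sketch below is a minimal example, assuming the JSON file is a list of records that each keep their image paths in an "images" list (the usual LLaMA-Factory multimodal layout; check your copy of the file first). The function names and the prefix arguments are hypothetical, not part of this repository.

```python
import json
from pathlib import Path


def rewrite_image_paths(records, old_prefix, new_prefix):
    """Return copies of the records with old_prefix swapped for new_prefix."""
    updated = []
    for record in records:
        record = dict(record)  # shallow copy so the input list is not mutated
        # ASSUMPTION: image paths live in an "images" list on each record.
        if "images" in record:
            record["images"] = [
                new_prefix + p[len(old_prefix):] if p.startswith(old_prefix) else p
                for p in record["images"]
            ]
        updated.append(record)
    return updated


def rewrite_file(json_path, old_prefix, new_prefix):
    """Load the dataset file, rewrite its image paths, and save it in place."""
    path = Path(json_path)
    records = json.loads(path.read_text(encoding="utf-8"))
    rewritten = rewrite_image_paths(records, old_prefix, new_prefix)
    path.write_text(
        json.dumps(rewritten, ensure_ascii=False, indent=2), encoding="utf-8"
    )
```

Calling rewrite_file on LLaMA-Factory/data/visualwebinstruct_sft.json with the prefix currently stored in the file and your local image directory updates every matching entry; paths that do not start with the old prefix are left unchanged. Back up the file before running it.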

Environment Setup

This project requires two separate conda environments: one for the SFT stage and one for the RL stage.

Environment Files

For easy reproduction, we provide exported environment files:

environment.yml: Complete conda environment export (recommended)
requirements.txt: All pip dependencies
setup_verl_env.sh: Automated setup script

SFT Environment Setup

# Create conda environment
conda create -n llamafactory python=3.11
conda activate llamafactory

# Install LLaMA-Factory
cd LLaMA-Factory
pip install -e ".[torch,metrics]" --no-build-isolation
pip3 install flash-attn --no-build-isolation

RL Environment Setup

Option 1: Quick Setup (Recommended - Using Exported Environment)

Choose one of the three methods below; they are alternatives, not sequential steps:

# Method A: Using conda environment file (recommended)
conda env create -f environment.yml
conda activate verl
cd verl
pip install -e .

# Method B: Using automated setup script
bash setup_verl_env.sh

# Method C: Manual installation from requirements.txt
conda create -n verl python=3.11
conda activate verl
cd verl
pip install -e .
pip install flash-attn==2.7.4.post1 --no-build-isolation  # Install flash-attn separately
pip install -r ../requirements.txt

Note: If flash-attn fails to install along with the other dependencies, install it separately with the --no-build-isolation flag.

Option 2: Manual Setup

# Create conda environment
conda create -n verl python=3.11
conda activate verl

# Install VeRL and dependencies
cd verl
pip3 install -e .
pip3 install flash-attn --no-build-isolation
pip3 install fastapi uvicorn openai vllm==0.8.3 "numpy<2.0.0"  # quote the specifier so the shell does not treat < as a redirect
pip install "opentelemetry-api>=1.34.0" "opentelemetry-sdk>=1.34.0" "opentelemetry-exporter-otlp>=1.34.0"

Note: The exported environment includes:

Python 3.11.0
PyTorch 2.6.0
vLLM 0.8.3
Ray 2.43.0
Transformers 4.57.3
Flash Attention 2.7.4.post1
All other dependencies (see requirements.txt for full list)
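After setting up the RL environment, the installed versions can be compared with the list above. This is a minimal sketch using the standard library's importlib.metadata; the distribution names in EXPECTED are assumed to be the usual PyPI names, and find_mismatches is a hypothetical helper, not something shipped in this repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the list above. Distribution names are ASSUMED PyPI names.
EXPECTED = {
    "torch": "2.6.0",
    "vllm": "0.8.3",
    "ray": "2.43.0",
    "transformers": "4.57.3",
    "flash-attn": "2.7.4.post1",
}


def find_mismatches(expected, get_version=version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cu124" when comparing.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

Run it inside the activated verl environment; no output means every listed package matches.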

Stage 1: Supervised Fine-Tuning (SFT)

Activate Environment

conda activate llamafactory
cd LLaMA-Factory

Training

llamafactory-cli train examples/train_lora/qwen2_5vl_lora_sft.yaml

Stage 2: Reinforcement Learning (RL)

Activate Environment

conda activate verl
cd verl

Data Processing

You can find examples in verl/verl/data.

Training

Important: Before running the script, replace the policy model and dataset paths with your own.

System Requirements:

Ensure sufficient system resources (process limits, memory, etc.)
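The process and file limits mentioned above can be inspected before launching. The sketch below reads them with Python's standard resource module (Unix only). It reports the current values and does not assume any particular threshold, since the README does not state one; read_limits and describe are hypothetical helper names.

```python
import resource


def read_limits():
    """Return soft/hard limits for processes and open files (Unix only)."""
    names = {
        "max_processes": resource.RLIMIT_NPROC,
        "max_open_files": resource.RLIMIT_NOFILE,
    }
    limits = {}
    for label, res in names.items():
        soft, hard = resource.getrlimit(res)
        limits[label] = {"soft": soft, "hard": hard}
    return limits


def describe(value):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    for label, pair in read_limits().items():
        print(f"{label}: soft={describe(pair['soft'])} hard={describe(pair['hard'])}")
```

If a soft limit looks low for a multi-worker run, it can be raised for the current shell with ulimit before starting the training script.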

bash verl/verl/scripts/run.sh

Troubleshooting:

Import errors for Qwen2_5_VLFlashAttention2 are expected with newer transformers versions (4.57+) and can be safely ignored.
For Ray-related issues, check Ray logs in /tmp/ray/session_*/logs/
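When several Ray sessions have accumulated under /tmp/ray, finding the right log directory can be tedious. The sketch below picks the most recently modified session_* directory and returns its logs folder. The helper names are hypothetical, and the base path is a parameter in case Ray was started with a different temp directory.

```python
from pathlib import Path


def latest_ray_log_dir(base="/tmp/ray"):
    """Return the logs directory of the newest Ray session, or None."""
    root = Path(base)
    if not root.is_dir():
        return None
    # Skip symlinks so an alias to a real session is not counted twice.
    sessions = [
        p for p in root.glob("session_*") if p.is_dir() and not p.is_symlink()
    ]
    if not sessions:
        return None
    newest = max(sessions, key=lambda p: p.stat().st_mtime)
    return newest / "logs"


def tail(path, lines=20):
    """Return the last `lines` lines of a log file as a list of strings."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return text.splitlines()[-lines:]
```

A typical use is to call latest_ray_log_dir(), list the files in the returned directory, and pass the one of interest to tail.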

Contact

If you have any problems, please contact Jian Xie.

Citation

If our paper or related resources prove valuable to your research, we kindly ask for a citation.

@article{xie2025arm2,
  title={ARM2: Adaptive Reasoning Model with Vision Understanding and Executable Code},
  author={Jian Xie and Zhendong Chu and Aoxiao Zhong and Kai Zhang and Mingzhe Han and Xing Fan and Jialie Shen and Qingsong Wen},
  journal={arXiv preprint arXiv:2510.08163},
  year={2025}
}
