</picture>
</p>

#### Reproducing LMM Probe Results

> [!NOTE]
> **To reproduce these results, use the `llava_next` folder contents.**

<details>
<summary>Click to expand reproduction instructions</summary>

1. **Navigate to the LLaVA-NeXT directory:**

   ```bash
   cd llava_next
   ```

2. **Set up the environment:**

   ```bash
   # Using Docker (recommended)
   docker build -t ov_encoder_llava:26.01 .
   docker run -it --gpus all --ipc host --net host --privileged \
       -v "$(pwd)":/workspace/OV-Encoder-Llava \
       -w /workspace/OV-Encoder-Llava \
       ov_encoder_llava:26.01 bash
   ```

3. **Prepare training data:**
   - Follow the [training data preparation guide](llava_next/README.md#training-data-preparation) to convert your video data to codec format
   - The training dataset should include:
     - 740K samples from LLaVA-OneVision
     - 800K samples from LLaVA-Video SFT

4. **Run Stage-2 fine-tuning:**

   ```bash
   # Configure the training script with your data paths
   bash scripts/sft_ov_encoder.sh
   ```

5. **Evaluate the model:**

   ```bash
   # For video benchmarks
   bash scripts/precompute_codec_patch/preprocess_video_benchmark.sh videomme
   TASKS="videomme" bash scripts/eval/eval_ov_encoder.sh

   # For image benchmarks
   TASKS="ai2d,chartqa,docvqa_val" bash scripts/eval/eval_ov_encoder.sh
   ```

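The exact training-record schema is defined in the data preparation guide linked in step 3; as a rough orientation only, a conversation-style record in the common LLaVA SFT layout looks something like the sketch below (field names and values here are illustrative assumptions, not this repo's authoritative format):

```shell
# Hypothetical shape of one training record (LLaVA-style conversation JSON).
# See the data preparation guide for the schema actually expected here.
cat > manifest_example.json <<'EOF'
[
  {
    "id": "sample-000001",
    "video": "videos/sample-000001.mp4",
    "conversations": [
      {"from": "human", "value": "<image>\nDescribe what happens in the video."},
      {"from": "gpt", "value": "A person assembles a bookshelf step by step."}
    ]
  }
]
EOF
# Quick validity check before pointing the training script at a real manifest.
python3 -c "import json; json.load(open('manifest_example.json')); print('manifest parses')"
```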
For detailed documentation on training data format, evaluation setup, and troubleshooting, refer to the [LLaVA-NeXT README](llava_next/README.md).

</details>

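The `TASKS` variable in step 5 takes a comma-separated list; if you prefer one benchmark per invocation (e.g. to isolate failures), the list can be split in plain bash. This is a dry-run sketch — drop the `echo` to actually launch the evaluation script:

```shell
# Split the comma-separated TASKS list and evaluate each benchmark separately.
TASKS="ai2d,chartqa,docvqa_val"
IFS=',' read -ra task_list <<< "${TASKS}"
for task in "${task_list[@]}"; do
  echo "TASKS=${task} bash scripts/eval/eval_ov_encoder.sh"
done
```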
## ⚡ Quick Start

> [!IMPORTANT]