@@ -233,15 +233,23 @@ You can set up the environment using **one of the following two methods**:

### Option 1 (Conda + Pip)

+<details>
+<summary>Click to expand setup commands</summary>
+
```bash
conda env create -f environment.yml -n ov_encoder
pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu118
pip install --extra-index-url https://pypi.nvidia.com --upgrade nvidia-dali-cuda110
pip install -r requirements.txt
```

+</details>
+
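As a quick post-install sanity check (a sketch, not part of the repo's scripts; the `2.7.0` pin comes from the install command above), note that the cu118 wheels report a local-version suffix that should be stripped before comparing against the pin:

```shell
# Sketch: cu118 wheels report versions like "2.7.0+cu118"; strip the
# local-version suffix when checking against the pinned version above.
installed="2.7.0+cu118"   # e.g. from: python -c "import torch; print(torch.__version__)"
echo "base version: ${installed%%+*}"   # → base version: 2.7.0
```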
### Option 2 (Docker)

+<details>
+<summary>Click to expand Docker commands</summary>
+
```bash
docker build -t onevision-encoder:2601 .

@@ -251,30 +259,50 @@ docker run -it --rm --gpus all --ipc host --net host --privileged \
  onevision-encoder:2601 bash
```

+</details>
+
### Install Package

Inside the container, install the package in editable mode:

+<details>
+<summary>Click to expand install command</summary>
+
```bash
pip install -e .
```

+</details>
+
### Single Node Dry Run To Test Setup

+<details>
+<summary>Click to expand dry run command</summary>
+
```bash
bash shells/ov_encoder_base_stage1_si_dry_run.sh
```

+</details>
+
### Single Node Stage-1 Single Image

+<details>
+<summary>Click to expand training command</summary>
+
```bash
bash shells/ov_encoder_base_stage1_si.sh
```

+</details>
+
### Single Node Stage-2 Video Continued Pretraining

Download the Stage-1 checkpoint from HuggingFace:

+<details>
+<summary>Click to expand download and training commands</summary>
+
```bash
git clone https://huggingface.co/lmms-lab-encoder/onevision-encoder-large-si
```
@@ -287,6 +315,8 @@ More documentation will be added soon.
bash shells/ov_encoder_large_stage2_residual_8gpus.sh
```

+</details>
+
Training configurations and hyperparameters will be documented soon. For now, please refer to `--help` for available options.

## 📊 Evaluation
@@ -297,6 +327,9 @@ Training configurations and hyperparameters will be documented soon. For now, pl

To evaluate the encoder with uniform frame sampling, first navigate to the evaluation directory:

+<details>
+<summary>Click to expand evaluation commands</summary>
+
```bash
pip install -e .
cd eval_encoder
@@ -308,6 +341,8 @@ Then run the following command:
bash shells_eval_ap/eval_ov_encoder_large_16frames.sh
```

+</details>
+
**Sampling-Specific Parameters:**

- `frames_token_num`: Number of tokens per frame (e.g., 256 tokens for standard sampling).
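As a rough illustration of how this parameter sets the visual-token budget (the 16-frame count is inferred from the script name `eval_ov_encoder_large_16frames.sh`; the multiplication is an assumption about uniform sampling, not taken from the code):

```shell
# Hypothetical budget check: with uniform sampling, total visual tokens
# scale as frames × frames_token_num (16 frames at 256 tokens each).
frames=16
frames_token_num=256
echo "total tokens: $((frames * frames_token_num))"   # → total tokens: 4096
```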
@@ -316,6 +351,9 @@ bash shells_eval_ap/eval_ov_encoder_large_16frames.sh

To evaluate the encoder with codec-style patch selection, first navigate to the evaluation directory:

+<details>
+<summary>Click to expand codec evaluation commands</summary>
+
```bash
cd eval_encoder
```
@@ -326,6 +364,8 @@ Then run the following command:
bash shells_eval_ap/eval_ov_encoder_large_2kpatches_codec.sh
```

+</details>
+
## 👥 Contributors

<!-- Add contributor list here -->