
Commit 4f4f94d

Update README.md
1 parent c5853e6 commit 4f4f94d

1 file changed

Lines changed: 6 additions & 96 deletions

@@ -322,111 +322,21 @@ bash shells_eval_ap/eval_ov_encoder_large_16frames.sh
 
 #### OV-Encoder Codec Evaluation
 
-We release the attentive probing artifacts for our codec-based model across multiple video understanding benchmarks. For each dataset, we provide the codec-derived patch indices, training logs, model checkpoints, and final evaluation results.
-
-> 📦 **Artifacts Repository**: All codec evaluation artifacts (codec indices, training logs, and checkpoints) are available on HuggingFace:
-> 🤗 [lmms-lab-encoder/onevision-encoder-codec-eval](https://huggingface.co/datasets/lmms-lab-encoder/onevision-encoder-codec-eval)
-
-**Evaluation Results Summary:**
-
-| Dataset | SSv2 | Diving48 | Perception Test | CharadesEgo | Epic-Verb | Epic-Noun | Kinetics-400 | HMDB51 |
-|---------|------|----------|-----------------|-------------|-----------|-----------|--------------|--------|
-| **Accuracy** | 58.4% | 66.0% | 59.7% | 12.1% | 62.2% | 53.9% | 84.3% | 83.5% |
-
-**Available Artifacts per Dataset:**
-
-For each dataset listed above, we provide:
-- **Codec Index** (`*.json`): Pre-computed temporally salient patch indices for efficient codec-style evaluation
-- **Training Logs** (`*.log`): Complete training logs including loss curves and intermediate metrics
-- **Checkpoints** (`*.pt`): Fine-tuned attentive probe checkpoints for each dataset
-
-To download artifacts for a specific dataset:
+To evaluate the encoder with codec-style patch selection, first navigate to the evaluation directory:
 
 ```bash
-# Install huggingface-hub if needed
-pip install huggingface-hub
-
-# Download all codec indices
-huggingface-cli download lmms-lab-encoder/onevision-encoder-codec-eval \
-  --repo-type dataset \
-  --include "codec_index/*" \
-  --local-dir ./codec_artifacts
-
-# Download logs for a specific dataset (e.g., ssv2)
-huggingface-cli download lmms-lab-encoder/onevision-encoder-codec-eval \
-  --repo-type dataset \
-  --include "logs/ssv2.log" \
-  --local-dir ./codec_artifacts
-
-# Download checkpoint for a specific dataset (e.g., ssv2)
-huggingface-cli download lmms-lab-encoder/onevision-encoder-codec-eval \
-  --repo-type dataset \
-  --include "checkpoints/ssv2.pt" \
-  --local-dir ./codec_artifacts
+cd eval_encoder
 ```
 
-Available datasets: ssv2, diving48, perception_test, charadesego, epic_verb, epic_noun, k400, hmdb51
-
-
-**Running Codec-Style Evaluation:**
-
-To evaluate the encoder with codec-style patch selection:
+Then run the following command:
 
 ```bash
-# Navigate to evaluation directory
-cd eval_encoder
-
-# Run codec evaluation with 2K patches (recommended)
-# First positional argument is the model weight path
-bash shells_eval_ap/eval_ov_encoder_large_2kpatches_codec.sh lmms-lab-encoder/onevision-encoder-large
-
-# Or with 4K patches for higher quality
-bash shells_eval_ap/eval_ov_encoder_large_4kpatches_codec.sh lmms-lab-encoder/onevision-encoder-large
+bash shells_eval_ap/eval_ov_encoder_large_2kpatches_codec.sh
 ```
 
-**Note:**
-- The model weight path is a required positional argument. You can use `lmms-lab-encoder/onevision-encoder-large` to load directly from HuggingFace, or provide a local path to your model checkpoint.
-- The evaluation scripts are configured for 8 GPUs by default. Adjust `CUDA_VISIBLE_DEVICES` in the shell script if you have a different GPU configuration.
-
 **Codec-Specific Parameters:**
-- `K_keep`: Number of patches to keep (e.g., 2048 for 2K patches, 4096 for 4K patches)
-- `num_frames`: Total number of frames in the video sequence (typically 64 for codec evaluation)
-- `frames_token_num`: Number of tokens per frame (e.g., 256 tokens)
-- `cache_dir` (optional): Directory for cached codec patches. Use this to specify where codec-selected patches are stored/loaded when you want to persist or reuse them
-
-**Using Pre-computed Codec Indices:**
-
-To reproduce our exact results using the pre-computed codec indices:
-
-```bash
-# Download the codec indices from HuggingFace
-huggingface-cli download lmms-lab-encoder/onevision-encoder-codec-eval \
-  --repo-type dataset \
-  --include "codec_index/*" \
-  --local-dir ./codec_artifacts
-
-# Run evaluation with the cache directory by running the Python script directly
-# Example for SSv2 dataset with 2K patches (requires 8 GPUs)
-cd eval_encoder
-torchrun --nproc_per_node 8 --master_port 15555 \
-  attentive_probe_codec.py \
-  --model_family ov_encoder_codec \
-  --model_name ov_encoder_large \
-  --model_weight lmms-lab-encoder/onevision-encoder-large \
-  --dataset ssv2 \
-  --num_frames 64 \
-  --frames_token_num 256 \
-  --embedding_size 1024 \
-  --K_keep 2048 \
-  --batch_size 4 \
-  --default_lr_list 0.0001 \
-  --default_epoch 10 \
-  --default_weight_decay 0 \
-  --cache_dir ../codec_artifacts/codec_index \
-  --save_report ./results/ssv2
-
-# Note: Adjust --nproc_per_node based on your available GPUs
-```
+- `K_keep`: Number of patches to keep.
+- `cache_dir` (optional): Directory for cached codec patches. Use this to specify where codec-selected patches are stored/loaded when you want to persist or reuse them.
 
 #### Shared Parameters