Commit a06c949

Update README.md
1 parent e08ee8d commit a06c949

1 file changed

Lines changed: 51 additions & 6 deletions

README.md

@@ -265,16 +265,61 @@ torchrun --nproc_per_node=8 --master_port=29512 attentive_prob_codec.py \
  --model_name hf_llava_vit_large_ln \
  --embedding_size 1024 \
  --default_epoch 30 \
- --data_root /data_3/data_attentive_probe/ \
- --cache_dir /data_3/data_attentive_probe/diving48_hevc/cache_residuals/ \
+ --data_root /path/to/your/data_attentive_probe/ \
+ --cache_dir /path/to/your/cache_residuals/ \
  --K_keep 2048 \
  --mv_compensate median
  ```

- **Parameter Notes:**
- - `K_keep`: Number of patches to keep. For example, 256 patches per frame × 8 frames = 2048 total patches.
- - `model_weight`: Path to your pre-trained model weights. Set this to your own model path.
- - `cache_dir`: Directory for cached codec patches. Set this to your own cache directory path.
+ **Codec-Specific Parameters:**
+ - `cache_dir`: Directory for cached codec patches. This is where the codec-selected patches will be stored/loaded.
+ - `K_keep`: Number of patches to keep. For example, 256 patches per frame × 8 frames = 2048 total patches. Adjust based on your frame count and desired compression ratio.
+ - `mv_compensate`: Motion vector compensation method (e.g., `median`).
+
+ #### Sampling Evaluation
+
+ To evaluate the encoder with uniform frame sampling, first navigate to the evaluation directory:
+
+ ```bash
+ cd eval_encoder
+ ```
+
+ Then run the following command:
+
+ ```bash
+ torchrun --nproc_per_node=8 --master_port=29507 attentive_probe.py \
+ --eval_freq 1 \
+ --default_lr_list 0.0001 \
+ --batch_size 32 \
+ --default_weight_decay 0 \
+ --dali_py_num_workers 8 \
+ --model_family llava_vit_sampling \
+ --dataset diving48 \
+ --num_frames 8 \
+ --model_weight lmms-lab/onevision-encoder-large \
+ --model_name hf_llava_vit_large_ln \
+ --embedding_size 1024 \
+ --frames_token_num 256
+ ```
+
+ **Sampling-Specific Parameters:**
+ - `frames_token_num`: Number of tokens per frame (e.g., 256 tokens for standard sampling).
+
+ #### Shared Parameters
+
+ The following parameters are common to both evaluation methods:
+
+ - `dataset`: Dataset to evaluate on (e.g., `diving48`, `ssv2`, `kinetics400`). Prepare the dataset according to the Attentive Probe format.
+ - `num_frames`: Total number of frames in the video sequence (e.g., 8 for sampling, 64 for codec).
+ - `model_weight`: Path to the pre-trained model. Use `lmms-lab/onevision-encoder-large` to load directly from HuggingFace, or provide a local path.
+ - `model_name`: Model architecture name (e.g., `hf_llava_vit_large_ln`).
+ - `embedding_size`: Size of the embedding dimension (e.g., 1024).
+ - `batch_size`: Training batch size (varies by evaluation type).
+ - `default_lr_list`: Learning rate for the probe training.
+ - `default_weight_decay`: Weight decay for optimization.
+ - `eval_freq`: Evaluation frequency during training.
+ - `dali_py_num_workers`: Number of DALI data loading workers.
+ - `data_root`: Root directory containing your prepared dataset (codec evaluation only).


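
The `K_keep` arithmetic in the parameter notes of this diff can be checked with a short sketch. The helper names `patch_budget` and `keep_ratio` are hypothetical and are not part of the repository. The 64-frame figure is the codec example quoted for `num_frames`, and 256 patches per frame for the codec run is an assumption carried over from the README's own `K_keep` example.

```python
def patch_budget(num_frames: int, patches_per_frame: int) -> int:
    """Total patches available before any codec-based selection."""
    return num_frames * patches_per_frame


def keep_ratio(k_keep: int, num_frames: int, patches_per_frame: int) -> float:
    """Fraction of the available patches retained when keeping k_keep of them."""
    return k_keep / patch_budget(num_frames, patches_per_frame)


# README example: 256 patches per frame x 8 frames.
sampling_budget = patch_budget(num_frames=8, patches_per_frame=256)

# Assumed codec setting: 64 frames at 256 patches per frame, with K_keep = 2048.
codec_ratio = keep_ratio(k_keep=2048, num_frames=64, patches_per_frame=256)

print(sampling_budget)  # 2048
print(codec_ratio)      # 0.125
```

Under these assumptions, `--K_keep 2048` on a 64-frame input would keep one eighth of the patches, which matches the token count of 8 uniformly sampled frames at 256 tokens each.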
