Commit 0953409

Update
1 parent 7a700de commit 0953409

2 files changed

Lines changed: 20 additions & 28 deletions

README.md

Lines changed: 7 additions & 22 deletions
@@ -194,7 +194,7 @@ with torch.no_grad():
 
 </details>
 
-### Loading from Source Code
+### Loading from Source Code
 
 <details>
 <summary>Click to expand installation and usage code</summary>
@@ -274,17 +274,6 @@ pip install -e .
 
 </details>
 
-### Single Node Dry Run To Test Setup
-
-<details>
-<summary>Click to expand dry run command</summary>
-
-```bash
-bash shells/ov_encoder_base_stage1_si_dry_run.sh
-```
-
-</details>
-
 ### Single Node Stage-1 Single Image
 
 <details>
@@ -329,6 +318,8 @@ To evaluate the OneVision Encoder as a vision backbone for LLaVA-NeXT multimodal
 
 Navigate to the llava_next directory and follow the setup instructions:
 
+For more details, refer to the [LLaVA-NeXT documentation](llava_next/README.md).
+
 <details>
 <summary>Click to expand LLaVA-NeXT evaluation setup</summary>
 
@@ -345,7 +336,9 @@ docker run -it --gpus all --ipc host --net host --privileged \
 
 </details>
 
-#### Running Evaluation
+#### LLaVA-NeXT-Video Evaluation
+
+
 
 For image benchmarks (ChartQA, DocVQA, AI2D, OCRBench, etc.):
 
@@ -374,7 +367,7 @@ TASKS="videomme" bash scripts/eval/eval_ov_encoder.sh
 
 </details>
 
-For more details, refer to the [LLaVA-NeXT documentation](llava_next/README.md).
+
 
 ### Attentive Probe Evaluation
 
@@ -398,9 +391,6 @@ bash shells_eval_ap/eval_ov_encoder_large_16frames.sh
 
 </details>
 
-**Sampling-Specific Parameters:**
-
-- `frames_token_num`: Number of tokens per frame (e.g., 256 tokens for standard sampling).
 
 #### OV-Encoder Codec Evaluation
 
@@ -427,11 +417,6 @@ bash shells_eval_ap/eval_ov_encoder_large_2kpatches_codec.sh
 
 ---
 
-## 📄 License
-
-This project is released under the Apache 2.0 License.
-
-
 
 ## 🔗 Related Projects
 

eval_encoder/attentive_probe.py

Lines changed: 13 additions & 6 deletions
@@ -56,7 +56,7 @@ def parse_args() -> argparse.Namespace:
     parser.add_argument("--smoothing", type=float, default=0.1)
     parser.add_argument("--print_freq", type=int, default=10)
     parser.add_argument("--eval_freq", type=int, default=1)
-    parser.add_argument("--frames_token_num", type=int, default=196)
+    parser.add_argument("--frames_token_num", type=int, default=256)
 
     # Dataloader
     parser.add_argument("--dali_num_threads", type=int, default=2)
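The default bump from 196 to 256 matches the README bullet this commit removes ("256 tokens for standard sampling"). Assuming a ViT-style square patch grid (the diff itself states no patch or image sizes, so the values below are illustrative), the two defaults correspond to 224-px and 256-px frames with 16-px patches:

```python
# Tokens per frame for a square patch grid: (image_size // patch_size) ** 2.
# The image/patch sizes below are assumptions for illustration, not from the commit.
def tokens_per_frame(image_size: int, patch_size: int) -> int:
    side = image_size // patch_size  # patches along one edge
    return side * side

print(tokens_per_frame(224, 16))  # 196, the old default
print(tokens_per_frame(256, 16))  # 256, the new default
```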
@@ -454,11 +454,18 @@ def evaluate(
 
 def get_model(args: argparse.Namespace) -> nn.Module:
     if args.model_name == "ov_encoder_large":
-        model = AutoModel.from_pretrained(
-            "lmms-lab-encoder/onevision-encoder-large", trust_remote_code=True, attn_implementation="flash_attention_2"
-        )
-        model = torch.compile(model)
-        return model
+        if os.path.isdir(args.model_weight):
+            from onevision_encoder.modeling_onevision_encoder import OneVisionEncoderModel
+            model = OneVisionEncoderModel.from_pretrained(
+                args.model_weight, trust_remote_code=True, attn_implementation="flash_attention_2"
+            )
+            return model
+        else:
+            model = AutoModel.from_pretrained(
+                "lmms-lab-encoder/onevision-encoder-large", trust_remote_code=True, attn_implementation="flash_attention_2"
+            )
+            model = torch.compile(model)
+            return model
 
     model = create_model(args.model_name, pretrained=False)
     if args.model_family in ["chunk_wise_sampling"]:
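The reworked `get_model` branch first checks whether `args.model_weight` points at a local checkpoint directory, loading `OneVisionEncoderModel` directly if so, and only falls back to the Hub id otherwise. A minimal sketch of that dispatch pattern (the helper name `resolve_model_source` is hypothetical, not from the repo):

```python
import os

def resolve_model_source(model_weight: str, hub_id: str) -> str:
    """Prefer a local checkpoint directory; otherwise fall back to the hub id.

    Mirrors the branching this commit adds to get_model(): a local
    fine-tuned checkpoint directory wins over the pretrained
    "lmms-lab-encoder/onevision-encoder-large" hub checkpoint.
    """
    if os.path.isdir(model_weight):
        return model_weight  # local weights: load the model class directly
    return hub_id            # hub weights: AutoModel.from_pretrained path

# A path that does not exist falls through to the hub id:
print(resolve_model_source("/no/such/dir", "lmms-lab-encoder/onevision-encoder-large"))
```

Note that in the diff only the hub branch wraps the model in `torch.compile`; whether that asymmetry is intentional is not stated in the commit.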