Commit cdc7456
llava-next-video (with codec)
feat: support OneVision, Qwen3-VL, and SigLip2 in LLaVA-Next
- Add OneVision, SigLip2 NaFlex, and Qwen3-VL vision encoders.
- Support Qwen3 LLM backbone and ViT weight extraction.
- Implement codec-based patch selection with stage 1/2 training scripts.
- Integrate lmms-eval framework and offline codec-patch precomputing.
- Support compressed video/image processing and multi-task evaluation.
- Update Docker setup (multi-node SSH) and Quick Start documentation.
Co-authored-by: YunyaoYan <YunyaoYan@users.noreply.github.com>
1 parent 7a8ed00 commit cdc7456
2,316 files changed
Lines changed: 237781 additions & 223 deletions
File tree
- llava_next
- Compressed_Video_Reader
- ffmpeg
- ffmpeg_patch
- src/cv_reader
- tool
- examples/training_data_demo
- output
- stage1
- stage2
- sample_XwB-dkS5BVM__ytb_XwB-dkS5BVM__58574aa9
- sample_YVQwAEKZpaU__ytb_YVQwAEKZpaU__22bd1705
- videos
- llava
- model
- language_model
- multimodal_encoder
- multimodal_projector
- multimodal_resampler
- train
- lmms-eval
- .github
- workflows
- cicd
- docs
- i18n
- examples
- chat_templates
- mcp_server
- models
- lmms_eval
- api
- baselines
- caching
- entrypoints
- filters
- llm_judge
- launcher
- providers
- loggers
- mcp
- models
- chat
- model_utils
- qwen
- thyme
- simple
- tasks
- FALCONBench
- VisualPuzzles
- _task_utils
- activitynetqa
- ai2d
- reasoning
- aime
- reasoning
- air_bench
- alpaca_audio
- amber_g
- arc
- av_odyssey
- av_speakerbench
- babyvision_gen
- babyvision
- blink
- camerabench_vqa
- capability
- captionqa
- charades_sta
- chartqa
- charxiv
- reasoning
- cinepile
- clotho_aqa
- cmmmu
- coco_cap_chair
- coco_cap
- common_voice_15
- conbench
- covost2
- csbench
- cuva
- cv_bench
- cvrr
- detailcaps
- docvqa
- dtcbench
- dynamath/reasoning
- egoplan
- egoschema
- egothink
- embspatial
- emma
- erqa
- ferret
- fleurs
- flickr30k
- funqa
- gedit_bench
- viescore
- gigaspeech
- whisper_normalizer
- gpqa
- cot_n_shot
- cot_zeroshot
- generative
- n_shot
- openai
- reasoning
- zeroshot
- gqa_ru
- gqa
- groundingme
- gsm8k
- hallusion_bench
- hellaswag
- hrbench
- iconqa
- ifeval
- ii_bench
- illusionvqa
- imgedit
- infovqa
- internal_eval
- jmmmu_pro
- jmmmu
- k12
- kris_bench
- lemonade
- librispeech
- whisper_normalizer
- live_bench
- livexiv_tqa
- livexiv_vqa
- llava-bench-coco
- llava-in-the-wild
- llava_interleave_bench
- llava_wilder
- logicvista/reasoning
- longtimescope
- longvideobench
- no_visual
- random_choice
- longvt
- no_visual
- lsdbench
- lvbench
- no_visual
- random_choice
- mantis
- mathverse
- reasoning
- mathvision
- reasoning
- mathvista
- reasoning
- medqa
- megabench
- breakdown
- metrics
- aggregation
- parsing
- common
- scoring
- common
- mia_bench
- mindcube
- mirb
- mix_evals
- audio2text
- image2text
- video2text
- mlvu
- mmau
- mmbench
- en_reasoning
- mme_cot
- mme_realworld
- mme_sci_image
- mme_sci
- mme
- mmlu_pro
- mmlu
- continuation
- default
- flan_cot_fewshot
- flan_cot_zeroshot
- flan_n_shot
- generative
- loglikelihood
- generative
- mmmu_pro
- reasoning
- mmmu
- reasoning
- mmrefine
- mmsearch
- prompts
- retrieve_content
- tokenization
- score
- utils
- mmsi_bench
- mmstar
- mmt
- mmupd
- mmvetv2
- mmvet
- mmvu
- mmworld
- moviechat
- muchomusic
- muirbench
- multidocvqa
- multilingual-llava-bench-in-the-wild
- multimodal_rewardbench
- mvbench
- naturalbench
- nextqa
- nocaps
- ocrbench_v2
- reasoning
- spotting_eval
- ocrbench
- ok_vqa
- olympiadbench_mimo
- reasoning
- olympiadbench
- omni_bench
- omnispatial
- open_asr
- openai_math
- openhermes
- openslr_librispeech
- ovobench
- ovobench_data
- backward
- forward
- realtime
- score_utils
- people_speech
- perceptiontest
- test
- val
- phyx
- reasoning
- plm_videobench
- fgqa
- rcap
- rdcap
- rtloc
- sgqa
- pope
- qbench
- realworldqa
- reasoning
- refcoco+
- refcocog
- refcoco
- refspatial
- salbench
- scibench
- scienceqa
- scivideobench
- screenspot
- seedbench_2_plus
- seedbench_2
- seedbench
- seephys
- sitebench
- multi_image_input
- snsbench
- sparbench
- spatialtreebench
- metrics
- mindcube_cogmap
- src
- evaluation
- cogmap
- core
- utils
- spatialviz
- stare
- step2_audio_paralinguistic
- structeditbench
- stvqa
- super_gpqa
- synthdog
- tedlium
- tempcompass
- temporalbench
- textcaps
- textvqa
- timescope
- tomato
- ueval
- vatex
- vcr_wiki
- vdc
- vibe_eval
- video-tt
- video_detail_description
- videochatgpt
- videoevalpro
- videomathqa
- videomme
- convert_mcq_oe
- gt_none_option
- no_visual
- number_option
- random_choice
- revert_oe_mcq
- video_only_abcd
- videommmu
- gt_none_option
- no_visual
- number_option
- random_choice
- viewspatial
- vinoground
- visualwebbench
- vitatecs
- vizwiz_vqa
- vl_rewardbench
- vlms_are_biased
- vlmsareblind
- vmcbench
- vocalsound
- voicebench
- instruction_following_eval
- vqav2
- vsibench
- multi_image_input
- vstar_bench
- wavcaps
- websrc
- wemath
- reasoning
- wenet_speech
- where2place
- wild_vision_bench
- worldqa
- worldsense
- xlrs
- youcook2
- tui
- web
- dist
- src
- miscs
- test/eval
- qwen2_5_vl
- tools
- scripts
- eval
- precompute_codec_patch