Commit cdc7456

yiyexy and YunyaoYan authored
llava-next-video (with codec)
feat: support OneVision, Qwen3-VL, and SigLip2 in LLaVA-Next

- Add OneVision, SigLip2 NaFlex, and Qwen3-VL vision encoders.
- Support Qwen3 LLM backbone and ViT weight extraction.
- Implement codec-based patch selection with stage 1/2 training scripts.
- Integrate lmms-eval framework and offline codec-patch precomputing.
- Support compressed video/image processing and multi-task evaluation.
- Update Docker setup (multi-node SSH) and Quick Start documentation.

Co-authored-by: YunyaoYan <YunyaoYan@users.noreply.github.com>
1 parent 7a8ed00 commit cdc7456

2,316 files changed

Lines changed: 237781 additions & 223 deletions

.gitignore

Lines changed: 10 additions & 0 deletions
@@ -107,6 +107,7 @@ secret/
 *.log
 log/
 logs/
+eval_log/
 *.pid
 *.pid.lock
 *.seed
@@ -369,6 +370,7 @@ autogen/
 .openapi/
 openapi_generated/
 swagger_generated/
+.huggingface_cache/
 
 ########################################
 # Distributed / cluster training logs
@@ -506,3 +508,11 @@ ckpts
 .gitginore
 
 _codeql*
+
+# ===========================================
+# Allow example training data demo files
+# ===========================================
+!llava_next/examples/training_data_demo/output/
+!llava_next/examples/training_data_demo/output/**
+!llava_next/examples/training_data_demo/videos/
+!llava_next/examples/training_data_demo/videos/*.mp4
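
Note on the `!` rules in the last hunk: a pattern starting with `!` re-includes paths that an earlier pattern ignored, and the last matching pattern decides. The sketch below illustrates that ordering rule only. It is a simplified model built on `fnmatch`, not Git's real matcher: it does not handle directory-only patterns or `**`, and it ignores Git's rule that a file cannot be re-included while its parent directory is still excluded (presumably the reason the commit negates both the directories and their contents). The `*.mp4` rule and both file names are assumptions made for illustration; the diff does not show which earlier rule excludes the demo files.

```python
from fnmatch import fnmatchcase


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Simplified last-match-wins check; NOT full gitignore semantics."""
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if fnmatchcase(path, glob):
            # A later matching pattern overrides any earlier decision.
            ignored = not negated
    return ignored


PATTERNS = [
    "*.mp4",  # assumption: some earlier rule ignores video files
    "!llava_next/examples/training_data_demo/videos/*.mp4",  # from this commit
]

# Both file names are hypothetical.
demo = "llava_next/examples/training_data_demo/videos/sample.mp4"
other = "data/clip.mp4"

print("demo ignored:", is_ignored(demo, PATTERNS))
print("other ignored:", is_ignored(other, PATTERNS))
```

To check the real behaviour in a checkout, `git check-ignore -v <path>` reports which pattern, if any, matches a given path.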

llava_next/.dockerignore

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+# Exclude model checkpoints (very large)
+checkpoints/
+*.pt
+*.pth
+*.bin
+*.safetensors
+*.ckpt
+
+# Exclude Git related
+.git/
+.gitignore
+
+# Exclude Python cache
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+*.egg-info/
+*.egg
+dist/
+build/
+eggs/
+.eggs/
+
+# Exclude editor and IDE files
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+.DS_Store
+
+# Exclude logs and temporary files
+*.log
+logs/
+wandb/
+runs/
+outputs/
+temp/
+tmp/
+
+# Exclude test data (if large)
+Compressed_Video_Reader/test_data/
+
+# Exclude build artifacts (FFmpeg will be recompiled inside the image)
+Compressed_Video_Reader/ffmpeg/ffmpeg_source/
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+# Compressed Video Reader
+
+The Compressed Video Reader is designed to read motion vectors and residuals from H.264/H.265 encoded videos.
+
+## Installation
+
+To install the reader, you can run the installation script located in the project root:
+
+```shell
+bash install.sh
+```
+
+The script will perform the following tasks:
+
+1. Download the source code of FFmpeg
+2. Apply patches to the source code
+3. Configure and compile the FFmpeg package
+4. Build and install the reader
+
+To test if the reader has been successfully installed, run the following command:
+
+```bash
+# Test if the reader is installed successfully.
+cv_reader -h || echo "Installation failed!"
+```
+
+## Python API
+
+```python
+import cv_reader
+video_frames = cv_reader.read_video(video_path=path_to_video, with_residual=True)
+```
+
+## CLI Interface
+
+You can use the following command to extract motion vectors and residuals from a compressed video:
+
+```text
+$ cv_reader -h
+usage: Compressed Video Reader [-h] video output
+
+positional arguments:
+  video       Path to h.264/h.265 video file
+  output      Path to save extracted motion vectors and residuals
+
+optional arguments:
+  -h, --help  show this help message and exit
+```
+
+To run the extraction process on the example video, execute the following command:
+
+```bash
+python debug_vis_mvres.py --video ../test_videos/h264_sample.mp4 --num_frames 16 --out_dir ./h264_debug
+python debug_vis_mvres.py --video ../test_videos/h265_sample.mp4 --num_frames 16 --out_dir ./h265_debug
+```
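
Note on the README above: it documents two entry points, the Python call `cv_reader.read_video(video_path=..., with_residual=True)` and the CLI form `cv_reader video output`. The sketch below wraps both so that a missing installation fails with a clear message rather than a bare `ImportError` or "command not found". The helper names `read_frames`, `cli_command`, and `run_cli` are hypothetical and not part of the commit. The structure of the value returned by `read_video` is not described in the README, so the sketch passes it through untouched.

```python
import shutil
import subprocess


def read_frames(video_path, with_residual=True):
    """Call cv_reader.read_video with the keyword arguments shown in the README."""
    try:
        import cv_reader  # importable only after install.sh has succeeded
    except ImportError as exc:
        raise RuntimeError("cv_reader is not importable; run install.sh first") from exc
    # The return value is passed through as-is; its layout is not documented here.
    return cv_reader.read_video(video_path=str(video_path), with_residual=with_residual)


def cli_command(video, output):
    """Build the argument list for the documented CLI form: cv_reader video output."""
    return ["cv_reader", str(video), str(output)]


def run_cli(video, output):
    """Run the CLI, mirroring the README's `cv_reader -h || echo ...` install check."""
    if shutil.which("cv_reader") is None:
        raise RuntimeError("cv_reader not found on PATH; installation may have failed")
    subprocess.run(cli_command(video, output), check=True)
```

Passing the command as a list rather than a shell string avoids quoting problems when a video path contains spaces.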
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+FROM pytorch/pytorch:2.7.0-cuda11.8-cudnn9-runtime
+
+# Avoid interactive prompts during installation
+ENV DEBIAN_FRONTEND=noninteractive
+
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends \
+    build-essential \
+    pkg-config \
+    wget \
+    # These are common dependencies used by ffmpeg/install_ffmpeg.sh
+    libass-dev \
+    libfreetype6-dev \
+    libsdl2-dev \
+    libtool \
+    libva-dev \
+    libvdpau-dev \
+    libvorbis-dev \
+    libxcb1-dev \
+    libxcb-shm0-dev \
+    libxcb-xfixes0-dev \
+    texinfo \
+    zlib1g-dev \
+    nasm \
+    yasm \
+    libx264-dev \
+    libx265-dev \
+    libnuma-dev \
+    libvpx-dev \
+    libmp3lame-dev \
+    libopus-dev \
+    libgl1 \
+    libglib2.0-0 \
+    libsm6 \
+    libxext6 \
+    libxrender1 \
+    vim \
+    && apt-get clean && rm -rf /var/lib/apt/lists/*
+
+WORKDIR /workspace/
+COPY . .
+
+# cv_reader CLI imports cv2, install headless version suitable for containers
+RUN pip install --no-cache-dir opencv-python-headless
+
+# Execute install.sh to install ffmpeg / cv_reader etc
+RUN bash install.sh
+
+# Default working directory
+WORKDIR /workspace
+
+# Start bash by default for debugging
+CMD ["bash"]
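
Note on the Dockerfile above: it copies the build context into `/workspace/`, runs `install.sh`, and starts `bash` by default, so a typical workflow is to build the image from the directory that contains `install.sh` and then open an interactive shell. The sketch below assembles the corresponding `docker build` and `docker run` commands. The image tag `cv-reader:dev` and the helper names are hypothetical, the Dockerfile path is left as a parameter because the commit page does not show its file name, and `--gpus all` assumes the NVIDIA Container Toolkit is set up on the host.

```python
import shlex


def docker_build_command(context_dir, tag, dockerfile=None):
    """Build `docker build` arguments; `dockerfile` is optional (docker -f flag)."""
    cmd = ["docker", "build", "-t", tag]
    if dockerfile is not None:
        cmd += ["-f", dockerfile]
    cmd.append(context_dir)  # the build context must contain install.sh
    return cmd


def docker_run_command(tag, gpus=True):
    """Build `docker run` arguments for an interactive, throwaway container."""
    cmd = ["docker", "run", "--rm", "-it"]
    if gpus:
        cmd += ["--gpus", "all"]  # assumption: GPU access is wanted and available
    cmd.append(tag)  # the image's default CMD opens bash
    return cmd


# Hypothetical tag, shown only for illustration.
print(shlex.join(docker_build_command(".", "cv-reader:dev")))
print(shlex.join(docker_run_command("cv-reader:dev")))
```

Inside the container, the README's check `cv_reader -h || echo "Installation failed!"` can be used to confirm that `install.sh` completed successfully.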
