Commit 1caf68d

feat: update Docker setup and dependencies for OneVision Encoder
- Rename Docker image to `onevision-encoder:2601`
- Add `--rm` flag to docker run command for cleaner cleanup
- Refactor Dockerfile to improve layer caching and reduce size:
  - Install system deps and static ffmpeg binary
  - Use requirements.txt for Python dependencies
  - Set environment variables for better container behavior
- Add `torchmetrics` to requirements.txt
- Minor formatting fixes in README.md
1 parent 6212cdf commit 1caf68d

3 files changed

Lines changed: 36 additions & 21 deletions

README.md

Lines changed: 5 additions & 5 deletions
@@ -35,7 +35,7 @@

 ## 🔍 Introduction

-Video understanding models face a fundamental trade-off: incorporating more frames enables richer temporal reasoning but increases computational cost quadratically.
+Video understanding models face a fundamental trade-off: incorporating more frames enables richer temporal reasoning but increases computational cost quadratically.
 Conventional approaches mitigate this by sparsely sampling frames, however, this strategy discards fine-grained motion dynamics and treats all spatial regions uniformly, resulting in wasted computation on static content.

 We introduce OneVision Encoder, a vision transformer that resolves this trade-off by drawing inspiration from HEVC (High-Efficiency Video Coding). Rather than densely processing all patches from a few frames, OneVision Encoder sparsely selects informative patches from many frames. This codec-inspired patch selection mechanism identifies temporally salient regions (e.g., motion, object interactions, and semantic changes) and allocates computation exclusively to these informative areas.
@@ -64,7 +64,7 @@ Coupled with global contrastive learning over a 2M-scale concept memory bank, On

 ### Video Processing Pipeline

-The visualization below illustrates four different video processing pipelines.
+The visualization below illustrates four different video processing pipelines.
 (1) Original Video: a continuous 64-frame sequence that preserves the complete temporal context.
 (2) Uniform Frame Sampling: a conventional strategy that selects 4–8 evenly spaced frames; while simple and efficient, it is inherently lossy and fails to capture fine-grained inter-frame motion.
 (3) Temporal Saliency Detection: a global analysis of all 64 frames to identify regions rich in temporal information, including motion patterns, appearance variations, and semantic events.
@@ -208,14 +208,14 @@ More documentation will be added soon.


 ```bash
-docker build -t ov_encoder:25.12 .
+docker build -t onevision-encoder:2601 .
 ```

 ```bash
-docker run -it --gpus all --ipc host --net host --privileged \
+docker run -it --rm --gpus all --ipc host --net host --privileged \
     -v "$(pwd)":/workspace/OneVision-Encoder \
     -w /workspace/OneVision-Encoder \
-    ov_encoder:25.12 bash
+    onevision-encoder:2601 bash
 ```

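
For readers who script the launch step, the updated `docker run` invocation can also be assembled programmatically. The following is a minimal sketch and not part of this commit: the helper name `docker_run_argv` and its parameters are hypothetical, and the flags simply mirror the README command in the diff above.

```python
import os
import shlex


def docker_run_argv(image="onevision-encoder:2601", host_dir=None,
                    workdir="/workspace/OneVision-Encoder"):
    """Build the argv list for the README's `docker run` command.

    Hypothetical helper for illustration only; it uses just the flags
    that appear in the README diff.
    """
    host_dir = host_dir or os.getcwd()
    return [
        "docker", "run", "-it", "--rm",
        "--gpus", "all", "--ipc", "host", "--net", "host", "--privileged",
        "-v", f"{host_dir}:{workdir}",  # bind-mount the checkout into the container
        "-w", workdir,                  # start in the mounted directory
        image, "bash",
    ]


if __name__ == "__main__":
    # Print the command instead of executing it, so the sketch runs without Docker.
    print(shlex.join(docker_run_argv()))
```

To actually start the container, the returned list could be passed to `subprocess.run(argv, check=True)`. With `--rm`, the container is removed when the shell exits, so anything written outside the bind mount is lost.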

dockerfile

Lines changed: 29 additions & 15 deletions
@@ -1,21 +1,35 @@
 FROM pytorch/pytorch:2.7.0-cuda11.8-cudnn9-runtime

-# Set up environment variables (optional, but often recommended)
-ENV DEBIAN_FRONTEND=noninteractive
+# Set up environment variables
+ENV DEBIAN_FRONTEND=noninteractive \
+    PYTHONUNBUFFERED=1 \
+    PIP_NO_CACHE_DIR=1

-# Install Python packages
-RUN pip install --no-cache-dir --extra-index-url https://pypi.nvidia.com --upgrade nvidia-dali-cuda110 \
-    && pip install --no-cache-dir decord==0.6.0 \
-    && pip install --no-cache-dir timm \
-    && pip install --no-cache-dir transformers==4.53.1 \
-    && pip install --no-cache-dir tensorboard \
-    && pip install --no-cache-dir easydict
+# Install system dependencies and ffmpeg in one layer
+RUN set -eux; \
+    apt-get update && apt-get install -y --no-install-recommends \
+        curl \
+        ca-certificates \
+        xz-utils \
+    && rm -rf /var/lib/apt/lists/* \
+    && cd /tmp \
+    && curl -L https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz -o ffmpeg.tar.xz \
+    && tar -xf ffmpeg.tar.xz \
+    && cd ffmpeg-*-static \
+    && install -m 0755 ffmpeg /usr/local/bin/ffmpeg \
+    && install -m 0755 ffprobe /usr/local/bin/ffprobe \
+    && cd / \
+    && rm -rf /tmp/ffmpeg* \
+    && ffprobe -version

-# (Optional) Set working directory
-WORKDIR /workspace
+# Copy requirements file first (for better caching)
+COPY requirements.txt /tmp/requirements.txt

-# (Optional) Copy your code into the container
-# COPY . /workspace
+# Install Python packages in optimized order
+RUN pip install --no-cache-dir --upgrade pip setuptools wheel && \
+    pip install --no-cache-dir --extra-index-url https://pypi.nvidia.com nvidia-dali-cuda110 && \
+    pip install --no-cache-dir -r /tmp/requirements.txt && \
+    rm /tmp/requirements.txt

-# (Optional) Set entrypoint or CMD
-# CMD ["python", "your_script.py"]
+# Set default command
+CMD ["/bin/bash"]
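
The new Dockerfile ends its ffmpeg layer with `ffprobe -version` as a build-time check. A comparable check can be run inside a started container. The sketch below is illustrative only and not part of this commit: the function names `missing_tools` and `mismatched_env` are hypothetical, and the expected values are copied from the `ENV` instruction and `install` lines in the diff above.

```python
import os
import shutil

# Binaries the Dockerfile installs into /usr/local/bin.
EXPECTED_TOOLS = ("ffmpeg", "ffprobe")

# Variables set by the ENV instruction in the new Dockerfile.
EXPECTED_ENV = {
    "DEBIAN_FRONTEND": "noninteractive",
    "PYTHONUNBUFFERED": "1",
    "PIP_NO_CACHE_DIR": "1",
}


def missing_tools(tools=EXPECTED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def mismatched_env(expected=EXPECTED_ENV, environ=None):
    """Return {name: actual_value} for variables that differ from `expected`."""
    environ = os.environ if environ is None else environ
    return {
        name: environ.get(name)
        for name, value in expected.items()
        if environ.get(name) != value
    }


if __name__ == "__main__":
    print("missing tools:", missing_tools())
    print("mismatched env:", mismatched_env())
```

Both functions accept injectable lookups (`which`, `environ`), so the logic can be exercised outside the container as well. Inside a correctly built image, both results should be empty.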

requirements.txt

Lines changed: 2 additions & 1 deletion
@@ -3,4 +3,5 @@ easydict
 huggingface-hub>=0.23.2,<1.0
 tensorboard
 timm
-transformers==4.53.1
+transformers==4.53.1
+torchmetrics
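
Because the image now installs from `requirements.txt`, unpinned entries such as the newly added `torchmetrics` may resolve to different versions on each rebuild. The sketch below lists entries without an exact `==` pin. It is a simplified, hypothetical check (the name `unpinned` is an assumption, and this is not a full requirement-specifier parser), and it covers only the requirement lines visible in this diff, since the first lines of the file are not shown.

```python
import re

# Requirement lines visible in this diff, including `easydict` from the
# hunk header. Earlier lines of the file are not shown and are omitted.
VISIBLE_REQUIREMENTS = [
    "easydict",
    "huggingface-hub>=0.23.2,<1.0",
    "tensorboard",
    "timm",
    "transformers==4.53.1",
    "torchmetrics",
]

# Leading package name: letters, digits, dots, underscores, hyphens.
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def unpinned(lines):
    """Return package names whose requirement line has no exact `==` pin."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = _NAME.match(line)
        if match and "==" not in line:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    print(unpinned(VISIBLE_REQUIREMENTS))
```

Range constraints such as `>=0.23.2,<1.0` are reported as unpinned here because they still allow the resolved version to change between builds.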
