Commit 78d9505

updated
2 parents 931ff44 + 77a48bb commit 78d9505

1 file changed

Lines changed: 56 additions & 16 deletions

README.md

@@ -33,7 +33,7 @@ Combined with global contrastive learning using a 2M concept bank, OneVision Enc
 ### Method Overview
 
 <p align="center">
-<img src="https://github.com/anxiangsir/asset/blob/main/OneVision/method.jpg" alt="OneVision Encoder Method Overview" width="800" style="max-width: 100%;">
+<img src="https://raw.githubusercontent.com/anxiangsir/asset/main/OneVision/method.jpg" alt="OneVision Encoder Method Overview" width="800" style="max-width: 100%;">
 </p>
 
 ### Cluster Discrimination Visualization
@@ -52,12 +52,12 @@ The visualization below demonstrates our complete video processing pipeline. The
 <table>
 <tr>
 <td align="center">
-<img src="https://github.com/anxiangsir/asset/blob/main/OneVision/case4.gif" alt="Case 4 Demonstration" width="800"><br>
+<img src="https://raw.githubusercontent.com/anxiangsir/asset/main/OneVision/case4.gif" alt="Case 4 Demonstration" width="800"><br>
 </td>
 </tr>
 <tr>
 <td align="center">
-<img src="https://github.com/anxiangsir/asset/blob/main/OneVision/case5.gif" alt="Case 4 Demonstration" width="800"><br>
+<img src="https://raw.githubusercontent.com/anxiangsir/asset/main/OneVision/case5.gif" alt="Case 5 Demonstration" width="800"><br>
 </td>
 </tr>
 </table>
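
The image links in the two hunks above change from `github.com/.../blob/...` page URLs, which return GitHub's HTML file viewer rather than the image bytes, to `raw.githubusercontent.com` URLs, which serve the file itself. A minimal sketch of that rewrite follows, assuming the usual `https://github.com/<owner>/<repo>/blob/<ref>/<path>` layout. The helper name `blob_to_raw` is hypothetical and is not part of this repository.

```python
from urllib.parse import urlsplit, urlunsplit


def blob_to_raw(url: str) -> str:
    """Rewrite a GitHub 'blob' page URL to its raw.githubusercontent.com form.

    Hypothetical helper for illustration. It assumes the ref (branch or tag)
    is a single path segment such as 'main'; refs that contain '/' cannot be
    told apart from the file path by URL inspection alone.
    """
    parts = urlsplit(url)
    segments = parts.path.lstrip("/").split("/")
    # Expected shape: <owner>/<repo>/blob/<ref>/<path...>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        return url  # leave anything else untouched
    owner, repo, _blob, ref, *file_path = segments
    raw_path = "/" + "/".join([owner, repo, ref, *file_path])
    return urlunsplit(
        (parts.scheme, "raw.githubusercontent.com", raw_path, parts.query, parts.fragment)
    )


old = "https://github.com/anxiangsir/asset/blob/main/OneVision/method.jpg"
print(blob_to_raw(old))
```

URLs that do not match the expected shape are returned unchanged, so the helper can be applied to every `src` attribute in a file without special-casing.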
@@ -91,16 +91,19 @@ Training on a mixed dataset of 740K samples from LLaVA-OneVision and 800K sample
 - Docker with NVIDIA GPU support
 - CUDA-compatible GPU(s)
 
-### Mount NFS
+### Mount Data Storage (Optional)
 
-```bash
-mkdir -p /video_vit
-mount -t nfs4 -o minorversion=1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport cfs-iyHiNUmePn.lb-0a25b0a7.cfs.bj.baidubce.com:/ /video_vit
+If using shared storage for datasets, mount your NFS/CFS volumes:
 
-mkdir -p /vlm
-mount -t nfs4 -o minorversion=1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport cfs-xvbkSb1zPT.lb-563926be.cfs.bj.baidubce.com:/ /vlm
+```bash
+mkdir -p /video_vit /vlm
+mount -t nfs4 -o minorversion=1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport <your-nfs-server>:/ /video_vit
+mount -t nfs4 -o minorversion=1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport <your-nfs-server>:/ /vlm
 ```
 
+> [!NOTE]
+> Replace `<your-nfs-server>` with your actual storage endpoint. Internal users should refer to the internal documentation for specific mount configurations.
+
 ### Docker Build
 
 #### Option 1: Build from Dockerfile
@@ -122,8 +125,8 @@ docker tag $(docker images -q | head -n 1) llava_vit:25.11.22
 
 ```bash
 docker run -it --gpus all --ipc host --net host --privileged \
--v "$(pwd)":/workspace/OneVision Encoder \
--w /workspace/OneVision Encoder \
+-v "$(pwd)":/workspace/OneVision-Encoder \
+-w /workspace/OneVision-Encoder \
 llava_vit:25.11.22 bash
 ```
 
@@ -135,12 +138,24 @@ docker run -it --gpus all --ipc host --net host --privileged \
 ```bash
 docker run -it --gpus all --ipc host --net host --privileged --cap-add IPC_LOCK \
 --ulimit memlock=-1 --ulimit stack=67108864 --rm \
--v "$(pwd)":/workspace/OneVision Encoder -v /train_tmp:/train_tmp \
+-v "$(pwd)":/workspace/OneVision-Encoder \
+-v /train_tmp:/train_tmp \
 -v /vlm:/vlm -v /video_vit:/video_vit -v /rice_ocr:/rice_ocr \
 -v /data_0:/data_0 -v /data_1:/data_1 -v /data_2:/data_2 -v /data_3:/data_3 \
--w /workspace/OneVision Encoder/ \
--e NCCL_TIMEOUT=1800 -e CUDA_DEVICE_MAX_CONNECTIONS=1 -e NCCL_SOCKET_IFNAME=eth0 -e NCCL_IB_GID_INDEX=3 -e NCCL_IB_DISABLE=0 -e NCCL_IB_HCA="mlx5_2,mlx5_3,mlx5_4,mlx5_5,mlx5_6,mlx5_7,mlx5_8,mlx5_1" -e NCCL_NET_GDR_LEVEL=2 -e NCCL_IB_QPS_PER_CONNECTION=4 -e NCCL_IB_TC=160 -e NCCL_IB_TIMEOUT=22 -e NCCL_CROSS_NIC=1 -e NCCL_MIN_NCHANNELS=8 -e NCCL_MAX_NCHANNELS=16 \
--e http_proxy=http://172.16.5.77:8889 -e https_proxy=http://172.16.5.77:8889 \
+-w /workspace/OneVision-Encoder \
+-e NCCL_TIMEOUT=1800 \
+-e CUDA_DEVICE_MAX_CONNECTIONS=1 \
+-e NCCL_SOCKET_IFNAME=eth0 \
+-e NCCL_IB_GID_INDEX=3 \
+-e NCCL_IB_DISABLE=0 \
+-e NCCL_IB_HCA="mlx5_2,mlx5_3,mlx5_4,mlx5_5,mlx5_6,mlx5_7,mlx5_8,mlx5_1" \
+-e NCCL_NET_GDR_LEVEL=2 \
+-e NCCL_IB_QPS_PER_CONNECTION=4 \
+-e NCCL_IB_TC=160 \
+-e NCCL_IB_TIMEOUT=22 \
+-e NCCL_CROSS_NIC=1 \
+-e NCCL_MIN_NCHANNELS=8 \
+-e NCCL_MAX_NCHANNELS=16 \
 llava_vit:25.11.22 bash -c "service ssh restart; bash"
 ```
 
@@ -194,7 +209,32 @@ torchrun --nproc_per_node 8 --master_port 15555 \
 
 ---
 
+## 📦 Packing ViT Model
+
+To package a trained ViT model for distribution or deployment:
+
+```bash
+python -m tools.pack_model \
+--checkpoint ./output/baseline/checkpoint.pt \
+--output ./output/packed_model
+```
+
+The packed model can be loaded directly with HuggingFace Transformers:
+
+```python
+from onevision_encoder import OneVisionEncoderModel
+
+model = OneVisionEncoderModel.from_pretrained("./output/packed_model")
+```
+
+---
+
+## 👥 Contributors
+
+<!-- Add contributor list here -->
+
+---
 
 ## 📄 License
 
-This project is open source.
+This project is released under the Apache 2.0 License.
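
The two `docker run` hunks in this diff replace the unquoted path `/workspace/OneVision Encoder` with `/workspace/OneVision-Encoder`. The sketch below uses Python's `shlex.split`, which follows POSIX shell word-splitting rules, to show why the space matters: the old form splits the path into two arguments, leaving a stray `Encoder` token that Docker would most likely read as the image name. The commands are shortened for illustration, and a literal `/src` stands in for `"$(pwd)"`, since `shlex` does not perform command substitution.

```python
import shlex

# Shortened versions of the commands in the diff; "/src" stands in for "$(pwd)".
OLD = ('docker run -v "/src":/workspace/OneVision Encoder '
       '-w /workspace/OneVision Encoder llava_vit:25.11.22 bash')
NEW = ('docker run -v "/src":/workspace/OneVision-Encoder '
       '-w /workspace/OneVision-Encoder llava_vit:25.11.22 bash')

old_args = shlex.split(OLD)
new_args = shlex.split(NEW)


def value_after(args, flag):
    """Return the single argument that follows `flag` in an argument list."""
    return args[args.index(flag) + 1]


# With the space, -v and -w each receive a truncated path.
print(value_after(old_args, "-v"))
print(value_after(old_args, "-w"))

# With the hyphen, each flag receives the whole path.
print(value_after(new_args, "-v"))
print(value_after(new_args, "-w"))
```

Quoting the whole path would also keep it as one argument. The commit instead renames the directory to a hyphenated form, which avoids the need for quoting anywhere the path is reused.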
