Commit d6db0a5

Copilot and anxiangsir committed
Add visible_indices example output and Codec Input TODO section
Co-authored-by: anxiangsir <31175974+anxiangsir@users.noreply.github.com>
1 parent 20a6040 commit d6db0a5

1 file changed: 10 additions & 0 deletions

README.md
@@ -203,11 +203,21 @@ video = torch.randn(1, 3, num_frames, 224, 224).to("cuda")
 # Build visible_indices for temporal sampling
 frame_pos = torch.linspace(0, target_frames - 1, num_frames).long().cuda()
 visible_indices = (frame_pos.unsqueeze(-1) * frame_tokens + torch.arange(frame_tokens).cuda()).reshape(1, -1)
+# visible_indices example (with 256 tokens per frame):
+# Frame 0 (pos=0): indices [0, 1, 2, ..., 255]
+# Frame 1 (pos=4): indices [1024, 1025, 1026, ..., 1279]
+# Frame 2 (pos=8): indices [2048, 2049, 2050, ..., 2303]
+# ...
+# Frame 15 (pos=63): indices [16128, 16129, ..., 16383]
 
 with torch.no_grad():
     outputs = model(video, visible_indices=visible_indices)
 ```
 
+### Codec Input
+
+> **TODO:** Add codec-style input documentation for temporal saliency-based patch selection.
+
 ---
 
 ## 🚀 Training
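
The example values in the added comment can be reproduced with a short check. This is a minimal sketch, not code from the repository. It assumes `num_frames = 16`, `target_frames = 64`, and `frame_tokens = 256`, values inferred from the comment (last frame at pos=63, last index 16383) rather than stated in the hunk. The helper name `build_visible_indices` is hypothetical, and it uses plain Python integer arithmetic in place of `torch.linspace(...).long()`.

```python
def build_visible_indices(num_frames, target_frames, frame_tokens):
    """Return one list of token indices per sampled frame.

    Plain-Python stand-in for the README expression (an assumption, not the
    repository's code): frame positions are spread evenly over
    [0, target_frames - 1] and truncated to integers.
    """
    if num_frames < 2:
        # With a single frame there is nothing to space out; use position 0.
        positions = [0] * num_frames
    else:
        span = target_frames - 1
        gaps = num_frames - 1
        # Floor division on integers gives the truncated evenly spaced positions.
        positions = [(i * span) // gaps for i in range(num_frames)]
    # Each frame at position `pos` owns the contiguous block of
    # `frame_tokens` indices starting at pos * frame_tokens.
    return [
        [pos * frame_tokens + t for t in range(frame_tokens)]
        for pos in positions
    ]


indices = build_visible_indices(num_frames=16, target_frames=64, frame_tokens=256)
flat = [i for frame in indices for i in frame]  # analogous to reshape(1, -1)
print(indices[1][0], indices[1][-1])    # 1024 1279
print(indices[15][0], indices[15][-1])  # 16128 16383
print(len(flat))                        # 4096
```

Under these assumptions the printed ranges match the comment lines for Frame 1 and Frame 15, and the flattened list holds 16 × 256 = 4096 indices. The integer division is a deliberate simplification: the README version goes through floating point before `.long()`, which this sketch does not model.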
