You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Add codec-style input documentation for temporal saliency-based patch selection.
220
+
For codec-style temporal saliency-based patch selection, use `patch_positions` to specify the 3D coordinates (temporal, height, width) of selected patches:
221
+
222
+
```python
223
+
import torch
224
+
import math
225
+
226
+
# Assume we have selected patches from multiple frames
227
+
num_frames =16# Number of sampled frames
228
+
frame_tokens =256# Tokens per frame (e.g., 16x16 patches)
229
+
target_frames =64# Target temporal dimension for RoPE
0 commit comments