Commit 8fb1c21

Condense and reposition codec patch selection documentation (#81)
1 parent c453abf commit 8fb1c21

1 file changed

Lines changed: 17 additions & 0 deletions

README.md

@@ -33,6 +33,7 @@
 - [Quick Start](#-quick-start)
 - [Training](#-training)
 - [Evaluation](#-evaluation)
+- [Codec Style Patch Selection](#-codec-style-patch-selection)
 - [Contributors](#-contributors)
 - [License](#-license)
 - [Documentation](#-documentation)
@@ -460,6 +461,22 @@ bash shells_eval_ap/eval_ov_encoder_large_2kpatches_codec.sh
 
 </details>
 
+---
+
+## 🎬 Codec Style Patch Selection
+
+The codec-inspired patch selection mechanism identifies and processes only the most informative patches from video frames, inspired by HEVC video coding.
+
+**Implementation in [`llava_next`](llava_next):**
+
+- **Pipeline**: [`Compressed_Video_Reader/tool/`](llava_next/Compressed_Video_Reader/tool/) - Stage 1 extracts codec info (MV/Residual energy), Stage 2 packs patches with position coordinates
+- **Training**: [`llava/train/train.py`](llava_next/llava/train/train.py) - Loads `positions_thw.npy` patch positions
+- **Model**: [`llava/model/llava_arch.py`](llava_next/llava/model/llava_arch.py) - Passes positions to vision encoder
+
+For detailed usage, see the [LLaVA-Next README](llava_next/README.md).
+
+---
+
 ## 👥 Contributors
 
 <!-- Add contributor list here -->
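
The added README section describes selecting the most informative patches and storing their positions, but does not show what that looks like. The following is a minimal sketch of the idea, not the repository's implementation. It assumes a per-patch energy score has already been computed from motion vectors and residuals and arranged as a `(T, H, W)` array, and that positions are stored as an `(N, 3)` integer array in `(t, h, w)` order. The function name `select_patch_positions`, the score layout, and the top-k rule are all hypothetical.

```python
import numpy as np


def select_patch_positions(energy: np.ndarray, k: int) -> np.ndarray:
    """Return a (k, 3) int64 array of (t, h, w) indices for the k
    highest-energy patches, listed in raster (t, h, w) order.

    Hypothetical helper: `energy` is assumed to be a (T, H, W) array of
    per-patch scores; the real pipeline may score and pack differently.
    """
    if energy.ndim != 3:
        raise ValueError("energy must have shape (T, H, W)")
    k = min(k, energy.size)
    if k <= 0:
        return np.empty((0, 3), dtype=np.int64)

    flat = energy.ravel()
    # argpartition places the k largest scores in the last k slots
    # without paying for a full sort.
    top = np.argpartition(flat, flat.size - k)[flat.size - k:]
    # Sorting the flat indices restores raster order over (t, h, w).
    top.sort()
    # Convert flat indices back to (t, h, w) coordinates.
    return np.stack(np.unravel_index(top, energy.shape), axis=1).astype(np.int64)


# Toy example: 2 frames, each a 2x2 patch grid, three non-zero scores.
energy = np.zeros((2, 2, 2))
energy[0, 1, 0] = 5.0
energy[1, 0, 1] = 9.0
energy[1, 1, 1] = 7.0

positions = select_patch_positions(energy, k=2)
print(positions.tolist())  # [[1, 0, 1], [1, 1, 1]]
```

Under the same assumptions, an array like `positions` could be written with `np.save("positions_thw.npy", positions)`; the actual file layout expected by `train.py` is documented in the LLaVA-Next README, not here.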
