Skip to content

Support ragged grids in attentional_pool#136

Open
amazloumi wants to merge 5 commits into
mainfrom
feat/ragged-attentional-pool
Open

Support ragged grids in attentional_pool#136
amazloumi wants to merge 5 commits into
mainfrom
feat/ragged-attentional-pool

Conversation

@amazloumi

@amazloumi amazloumi commented Jun 30, 2026

Copy link
Copy Markdown
Member

Summary

  • attentional_pool now pools ragged patch grids (grid not divisible by the pool window) instead of rejecting them: each partial edge window pools only its real patches via a masked attention — pad the grid to ceil(grid/w)·w, build a per-window key-padding mask, use the masked window-mean as the query, and mask padded patches out of the SDPA K/V. Mirrors avgpool's ragged handling.
  • Enables 3×3 video pooling on a 14×14 SigLIP grid (14 % 3 != 0 → 5×5 = 25 tokens), which previously had to fall back to avgpool or a divisible window.
  • Divisible grids are bit-exact with before — no padding/mask is built, so SDPA takes the same attn_mask=None path.
  • AttentionalPoolAdapter.output_num_tokensceil(grid/w)**2 (drops require_divisible); DIVISIBLE_ONLY_POOL_TYPES is emptied (kept as a seam for a future divisible-only connector), so config/adapter.py's output_num_tokens and the max_seq_len checks accept ragged.
  • No new state-dict keys (same projection weights; window count is sized by the existing output_num_tokens). The "v1 requires divisible / ragged is a follow-up" caveats are removed from the docstrings + how-to.
  • Touches: kempnerforge/model/adapter.py, tests/unit/test_adapter.py, docs/how-to/train-on-video.md, CHANGELOG.md.

Testing

  • uv run ruff check kempnerforge/ tests/ passes
  • uv run ruff format --check kempnerforge/ tests/ scripts/ passes
  • uv run pyright kempnerforge/ passes (0 errors, 0 warnings, 0 informations)
  • uv run pytest tests/unit/ -v --timeout=60 — ran tests/unit/test_adapter.py: 91 passed (new: ragged token count 14×14 @ 3 → 25, masked edge-window correctness — a 1-real-patch window equals attention over that single patch, output_num_tokens matches forward for ragged, config accepts ragged; flipped the old ragged-rejection tests). Full unit suite runs in CI.
  • If distributed code changed: uv run torchrun --nproc_per_node=4 -m pytest tests/distributed/ -v — n/a: no distributed//parallelism code changed; the adapter runs unchanged under FSDP.
  • If training loop / parallelism / optimizers changed: uv run pytest tests/e2e/ --e2e -v — n/a: pure adapter-forward math + config gate.

Closes #135
refs: KEM-546

@codecov

codecov Bot commented Jun 30, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

Files with missing lines Coverage Δ
kempnerforge/model/adapter.py 97.45% <100.00%> (+0.17%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the vision connector’s attentional_pool adapter to support ragged patch grids (where the grid side is not divisible by the pooling window) by padding to ceil(grid/window) * window and masking padded patches so partial edge windows only pool over real patches. This brings attentional_pool to parity with avgpool, unblocks Molmo2-style 3×3 video pooling on SigLIP’s 14×14 grid, and ensures config-time token-count estimation matches runtime behavior.

Changes:

  • Implement ragged-grid handling in AttentionalPoolAdapter.forward via bottom/right padding + per-window boolean attention mask + masked-mean query.
  • Update token-count prediction to ceil(grid/window)² for attentional_pool (and remove the divisibility-only enforcement seam for current pooling adapters).
  • Add/adjust unit tests and update documentation + changelog to reflect ragged support.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
kempnerforge/model/adapter.py Adds ragged-grid padding/masking to attentional_pool and aligns output_num_tokens with ceil(grid/window)².
tests/unit/test_adapter.py Updates expectations and adds coverage for ragged token counts and masked edge-window correctness.
docs/how-to/train-on-video.md Documents that both pooling connectors support ragged windows and gives the 14×14 @ window=3 example (25 tokens).
CHANGELOG.md Records the new ragged attentional_pool capability and its practical impact on 3×3 video pooling.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support ragged grids in attentional_pool

3 participants