Mask padded video frames from attention by amazloumi · Pull Request #132 · KempnerInstitute/KempnerForge

amazloumi · 2026-06-29T22:36:07Z

Summary

Consume the per-frame frame_mask so real tokens never attend to padded-frame visual tokens, across all four archs. One generic per-token validity mask, ModalityContext.key_padding_mask (B, S), threads through the model — consumers are modality-agnostic.
Shared Attention ANDs it with the causal (and doc) mask — covers Joint-Decoder + MoMa; MoTAttention builds an explicit causal-AND-valid mask; Cross-Attention masks the padded image K/V via its existingimage_mask.
MoMaFFN excludes padded positions from expert-choice routing — otherwise padded tokens consume expert capacity and perturb which real tokens get processed (caught by a per-arch invariance test).
NaN guard unmasks fully-masked query rows (an all-padded / undecodable clip) so softmax stays finite.
vlm.py: _visual_token_mask expands frame_mask (B,F) → (B, F·P′); the four strategies place it. scripts/train.py threads batch["frame_mask"].
Pure mask — no new state-dict keys (checkpoints stay compatible both ways); image (F=1) and text paths are unchanged (no-op when nothing is padded). Foundation for variable-length / mixed image+video batches.
Docs + CHANGELOG updated; "frame-mask-aware attention" removed from the video Deferred list.
Follow-up (out of scope): MoT with an MoE FFN still routes padded tokens through the shared MoEMLP — a "generic token-validity in MoE" change (also fixes padded text). MoT-dense (default) and MoMa are fully masked here.

Testing

uv run ruff check kempnerforge/ tests/ passes
uv run ruff format --check kempnerforge/ tests/ scripts/ passes
uv run pyright kempnerforge/ passes (0 errors)
uv run pytest tests/unit/ -v --timeout=60 passes — affected files + the doc_ids/packing regression: 234 passed (new: test_vlm.py::TestFramePaddingMask — per-arch masking invariance, image no-op, undecodable-clip NaN guard, mask expansion; test_moma.py FFN routing exclusion; test_modality_context.py invariant)
If distributed code changed: uv run torchrun --nproc_per_node=4 -m pytest tests/distributed/ -v ← parallel.py unchanged, but the model forward (attention/mot/moma) changed; worth running on GPUs

[x] e2e: 24 passed, 6 failed — all 6 failures are pre-existing on main (PP/checkpoint/HF/sigterm infra, confirmed via baseline run) and unrelated to this PR; tracked in #133. The basic FSDP/TP e2e tests, which exercise the changed model forward, pass.

Closes #131

codecov · 2026-06-29T22:39:51Z

Codecov Report

❌ Patch coverage is 87.87879% with 8 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
kempnerforge/model/attention.py	55.55%	7 Missing and 1 partial ⚠️

Files with missing lines	Coverage Δ
kempnerforge/model/cross_attention.py	`100.00% <100.00%> (ø)`
kempnerforge/model/modality.py	`100.00% <100.00%> (ø)`
kempnerforge/model/moma.py	`98.07% <100.00%> (+0.09%)`	⬆️
kempnerforge/model/mot.py	`94.21% <100.00%> (+0.35%)`	⬆️
kempnerforge/model/transformer.py	`94.62% <100.00%> (+0.02%)`	⬆️
kempnerforge/model/vlm.py	`99.18% <100.00%> (+0.11%)`	⬆️
kempnerforge/model/attention.py	`87.87% <55.55%> (-3.14%)`	⬇️

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot

Pull request overview

This PR wires a per-frame frame_mask through the VLM stack so padded video frames are masked out of attention (and, for MoMa, excluded from expert-choice routing), ensuring real-token outputs are invariant to padded-frame content across all four VLM architectures.

Changes:

Add a generic per-token validity mask (ModalityContext.key_padding_mask) and thread it through Transformer blocks (and MoT/MoMa paths).
Consume the mask in attention implementations (shared self-attention, MoT self-attention, cross-attention image K/V masking) with a NaN guard for fully-masked rows.
Pass frame_mask from the training loop, add unit tests for masking invariance/NaN-guard behavior, and update docs/CHANGELOG.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
`kempnerforge/model/vlm.py`	Expands `frame_mask` to per-visual-token masks and places masks into `ModalityContext` per-arch.
`kempnerforge/model/modality.py`	Adds `ModalityContext.key_padding_mask` field and invariant checks.
`kempnerforge/model/transformer.py`	Threads `key_padding_mask` through blocks and MoT/MoMa branches.
`kempnerforge/model/attention.py`	ANDs `key_padding_mask` with causal/doc masks + NaN guard in shared self-attention.
`kempnerforge/model/mot.py`	Builds explicit causal-and-valid attention mask when `key_padding_mask` is provided + NaN guard.
`kempnerforge/model/cross_attention.py`	Uses `image_mask` with a NaN guard to keep all-masked rows finite.
`kempnerforge/model/moma.py`	Excludes padded positions from expert-choice routing via `key_padding_mask`.
`scripts/train.py`	Threads `batch["frame_mask"]` (when present) into the model forward.
`tests/unit/test_vlm.py`	Adds masking invariance, mask expansion, F=1 no-op, and all-padded NaN-guard tests across archs.
`tests/unit/test_moma.py`	Adds unit test ensuring padded tokens are excluded from MoMa routing and don’t perturb real outputs.
`tests/unit/test_modality_context.py`	Adds invariant tests for `key_padding_mask`.
`docs/how-to/train-on-video.md`	Updates training-on-video docs to reflect frame-mask-aware behavior and trade-offs.
`CHANGELOG.md`	Documents the new frame padding masking behavior and related changes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Mask padded video frames from attention

0445ce3

amazloumi requested review from Naeemkh, camilobrownpinilla and mmshad June 30, 2026 00:33

FA2 trade-off docs, visual-token divisibility guard, kv_cache assert

9d350c5

amazloumi mentioned this pull request Jun 30, 2026

Recover FlashAttention-2 on video training (the frame-mask key-padding disables the fast path) #134

Open

amazloumi requested a review from Copilot June 30, 2026 03:18

Copilot started reviewing on behalf of amazloumi June 30, 2026 03:18 View session

Copilot AI reviewed Jun 30, 2026

View reviewed changes

Comment thread kempnerforge/model/attention.py Outdated

Comment thread docs/how-to/train-on-video.md Outdated

Comment thread CHANGELOG.md Outdated

Scope FA2-disabled video note to image-prefix arches

2670fd9

Copilot started work on behalf of amazloumi June 30, 2026 04:41 View session

Copilot finished work on behalf of amazloumi June 30, 2026 04:42

Copilot started work on behalf of amazloumi June 30, 2026 04:44 View session

Copilot finished work on behalf of amazloumi June 30, 2026 04:45

Copilot started work on behalf of amazloumi June 30, 2026 04:46 View session

Copilot finished work on behalf of amazloumi June 30, 2026 04:46

Naeemkh approved these changes Jun 30, 2026

View reviewed changes

amazloumi merged commit 32accd1 into main Jun 30, 2026
9 checks passed

amazloumi deleted the feat/mask-padded-frames branch June 30, 2026 16:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Mask padded video frames from attention#132

Mask padded video frames from attention#132
amazloumi merged 3 commits into
mainfrom
feat/mask-padded-frames

amazloumi commented Jun 29, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Jun 29, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

amazloumi commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Testing

Uh oh!

codecov Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

amazloumi commented Jun 29, 2026 •

edited

Loading

codecov Bot commented Jun 29, 2026 •

edited

Loading