
hooks/pyramid_attention_broadcast: remove redundant iteration==0 guard and fix stale cache VRAM leak#13497

Open
GitGlimpse895 wants to merge 1 commit into huggingface:main from GitGlimpse895:fix/pab-cache-logic

Conversation

@GitGlimpse895

What does this PR do?

Fixes two bugs in PyramidAttentionBroadcastHook.new_forward:

1. Redundant iteration == 0 condition
After every reset_state(), iteration is reset to 0 and cache to None. So at iteration 0, self.state.cache is None is always True, which makes the self.state.iteration == 0 check permanently dead code and creates the misleading impression of two independent invariants.

2. Stale cache leaking GPU VRAM
When outside the active timestep range, the hook was unconditionally writing self.state.cache = output, holding a full hidden-state activation tensor on GPU until the next generation's reset_state() call. For video transformers with dozens of PAB-hooked layers, this accumulates hundreds of MBs of unreleased VRAM. The fix sets self.state.cache = None immediately when outside the range.
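For context, here is a minimal, self-contained sketch of the control flow in question with both fixes applied. This is illustrative only: the state/function names mirror the hook, but the skip/compute logic is simplified (a plain callable stands in for the attention forward pass, and the range/interval parameters are hypothetical arguments rather than the hook's actual configuration):

```python
class PABState:
    """Simplified stand-in for the hook's per-layer state."""

    def __init__(self):
        self.iteration = 0
        self.cache = None  # last computed output, reused on skipped steps

    def reset(self):
        # Mirrors reset_state(): both fields are cleared together,
        # which is why an extra `iteration == 0` check adds nothing.
        self.iteration = 0
        self.cache = None


def new_forward(state, compute, timestep, active_range, skip_interval):
    """Illustrative version of the fixed logic.

    - `state.cache is None` alone decides whether iteration 0 must compute.
    - Outside `active_range`, the cache is dropped instead of overwritten,
      so the large activation tensor is released immediately.
    """
    in_range = active_range[0] <= timestep <= active_range[1]
    should_compute = (
        state.cache is None
        or not in_range
        or state.iteration % skip_interval == 0
    )
    if should_compute:
        output = compute()
        # Only keep a cache while broadcasting can actually reuse it.
        state.cache = output if in_range else None  # fix: no stale cache
    else:
        output = state.cache  # broadcast: reuse the previous output
    state.iteration += 1
    return output
```

The key line is the conditional assignment: on out-of-range timesteps the old code would have written `state.cache = output`, pinning the tensor until the next `reset()`; the fix clears it in the same step.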

This is a re-submission of #13467 (accidentally self-closed). Apologies for the noise, @sayakpaul!

Checklist

  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@sayakpaul @DN6

@github-actions github-actions Bot added hooks size/S PR with diff < 50 LOC labels Apr 17, 2026
@GitGlimpse895
Author

Hi @sayakpaul and @DN6 — would love to get your eyes on this when you get a chance! This is a re-submission of #13467 (sorry for the noise from the accidental close). The change is small (1 addition, 2 deletions) and addresses the redundant iteration == 0 guard and the stale cache VRAM leak in PyramidAttentionBroadcastHook. Happy to answer any questions or make adjustments based on your feedback. Thanks so much for your time!

