PR #3948: Scrub Paged Attention Serving Kernels and Obsolete Allocators
Imported from GitHub PR #3948
# Description
This PR removes the obsolete, TPU-specific, and vulnerable custom JAX Pallas Paged Attention serving kernels and their configuration from MaxText. During security reviews (such as `b/510375529`), the physical page manager allocator was flagged for potential cross-tenant HBM memory leaks. Following team alignment, it was confirmed that no production multimodal or Reinforcement Learning (RL) serving pipelines use `attention="paged"`.
To ensure model architecture definitions (Gemma, Llama, Mistral, Qwen, DeepSeek) and attention layer files do not crash on missing imports during this transition, this PR introduces lightweight, zero-overhead transitional compatibility shims in place of the deleted allocator and operator.
### Details & Implementation:
* **Paged Attention Deletion**: Purged custom Pallas kernels (`paged_attention_kernel_v2.py`) and page manager tests (`page_manager_test.py`). Scrubbed `pagedattn_` variables from `base.yml`, `types.py`, and config validators inside `pyconfig_deprecated.py`. Scrubbed page allocators and layout bindings inside `maxengine.py`.
* **Transitional Compatibility Shims**: Created dummy shims for `src/maxtext/inference/page_manager.py` (`PageState`, `PageManager`) and `src/maxtext/inference/paged_attention.py` (`PagedAttentionOp`) to allow all model layers to compile successfully with zero code mutations.
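For readers unfamiliar with the shim pattern, the following is a minimal sketch of what such placeholders could look like. It is not the code added in this PR: only the class names `PageState`, `PageManager`, and `PagedAttentionOp` come from the description above. The constructor signatures, the `get_initial_page_state` method, and the choice to raise `NotImplementedError` on use are assumptions made purely for illustration.

```python
"""Hypothetical sketch of transitional shims; NOT the actual code in this PR."""

from dataclasses import dataclass
from typing import Any


@dataclass
class PageState:
  """Empty placeholder so existing `import PageState` statements still resolve."""


class PageManager:
  """Placeholder that accepts any constructor arguments and allocates nothing."""

  def __init__(self, *args: Any, **kwargs: Any) -> None:
    # Arguments are intentionally ignored: there is no page pool to set up.
    del args, kwargs

  def get_initial_page_state(self) -> PageState:
    # Assumed method name, used here only to show a harmless no-op return.
    return PageState()


class PagedAttentionOp:
  """Placeholder operator: constructible, but fails loudly if actually invoked."""

  def __init__(self, *args: Any, **kwargs: Any) -> None:
    del args, kwargs

  def __call__(self, *args: Any, **kwargs: Any) -> Any:
    # Assumption: surfacing a clear error is preferable to silently
    # returning garbage if a caller still selects paged attention.
    raise NotImplementedError("Paged attention has been removed.")
```

Under these assumptions, model files that merely import or construct the classes keep working, while any code path that still tries to run paged attention gets an explicit error instead of an `ImportError` at module load time.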
# Tests
CI integration tests
# Checklist
Before submitting this PR, please make sure (put X in square brackets):
- [X] I have performed a self-review of my code. For an optional AI review, add the `gemini-review` label.
- [X] I have necessary comments in my code, particularly in hard-to-understand areas.
- [X] I have run end-to-end tests and provided workload links above if applicable.
- [X] I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in [our documentation](https://maxtext.readthedocs.io/en/latest/development.html#adding-new-documentation-files).
Copybara import of the project:
--
4c9c3b2 by Jetski <jetski@google.com>:
Fix MaxEngine chunked prefill JAX dynamic shape trace alignment, serving batch capacity mismatch, and purge obsolete paged attention serving kernels
Merging this change closes #3948
COPYBARA_INTEGRATE_REVIEW=#3948 from AI-Hypercomputer:fix-oom-aot-clean 4c9c3b2
PiperOrigin-RevId: 922761322