[LED][Longformer] Replace for-loop with unfold in _chunk ONNX-export path#46169
Open
guinik wants to merge 2 commits into
Open
[LED][Longformer] Replace for-loop with unfold in _chunk ONNX-export path#46169guinik wants to merge 2 commits into
guinik wants to merge 2 commits into
Conversation
Contributor
|
[For maintainers] Suggested jobs to run (before merge) run-slow: led, longformer |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do?
Removes a now-obsolete workaround in
LEDEncoderSelfAttention._chunkand the mirrored
LongformerSelfAttention._chunk. Both files carrythe same blocking TODO:
The ONNX-export branch used a Python
forloop becausetorch.unfoldwas not supported by the ONNX exporter at the time. PyTorch's ONNX
exporters now handle
aten::unfold, so the workaround can go.Fixes the stale TODO in both
modeling_led.pyandmodeling_longformer.py— same change applied to both files.Compatibility note (please read first)
This is technically a behavior change for one narrow export path, so
flagging it up front:
The legacy TorchScript ONNX exporter's
unfoldsymbolic requires theunfolded dimension to be statically known at export time. After this
change, anyone exporting LED/Longformer under the legacy exporter
with
dynamic_axeson the sequence dimension will get a clear error:"Unfold, input size not accessible".Affected users have two paths forward:
without changes).
handles symbolic shapes natively).
The legacy TorchScript exporter is deprecated as of PyTorch 2.9, so
(2) is the documented migration path.
If you'd prefer to preserve a fallback for users still on the legacy
exporter + dynamic shapes, happy to add one, but I don't think a
PyTorch ≥ 2.9 version check is the right shape for it. Two reasons:
dynamo=Truethe default fortorch.onnx.export, but the legacy exporter is still callable andin active use. Users on 2.9+ pass
dynamo=Falsefor real reasons(avoiding
onnxscriptas a transitive dependency, olderoptimumpins, custom export wrappers that still drive the legacy path). A
version-only check would silently break them.
which exporter is currently running.
torch.onnx.is_in_onnx_export()tells you an export is in flight,not which one, so a runtime capability check isn't really
available either.
The cleanest alternative I can see is an explicit opt-in config flag,
mirroring the existing
config.onnx_exportflag this function alreadyreads:
Call sites in
_sliding_chunks_query_key_matmulwould thread the flagthrough alongside the existing
onnx_exportone:Why this shape:
unfoldpath, so modern users get thememory win and cleaner graph with no action required.
escape hatch (
config.onnx_export_legacy_chunk = True) instead of ahard break.
onnx_exportflag, so it'sconsistent with the surrounding code rather than new API surface in
an unrelated spot.
the flag and the loop can both be dropped without touching the fast
path.
That said, I don't have strong feelings here, the legacy exporter is
deprecated as of 2.9 and the documented migration is the dynamo
exporter, so a clean break is also defensible. Happy to push the flag
as a follow-up commit on this PR if you want it, or leave the current
diff as-is. Let me know which you prefer.
Why make the change
torch.empty(...)tensoron every call. For
led-base-16384production shapes(
bh=24, seq=16384, w=512) this is ~195 MB per call. Theunfoldreplacement is a view that shares storage with the input, allocating
nothing, matching the zero-copy semantics of the non-ONNX
as_stridedpath that already exists in the same function.intent directly.
single view op rather than N scatter writes into a freshly allocated
tensor (N = 31 at xl shapes), giving downstream runtimes a cleaner
graph to optimize.
Verification
Verified locally with a parametrized test covering 24 combinations:
(bh=8, seq=512, w=64),(24, 2048, 256),(24, 4096, 512),(24, 16384, 512)exporter (default since PyTorch 2.9)
Each case verifies (a) the exported model passes
onnx.checker.check_model, (b) runs under onnxruntime's CPU EP, and(c) produces output numerically identical (
atol=1e-6) to the existing_chunk(..., onnx_export=True)reference computed on the same input.The legacy exporter is tested with static shapes per its documented
unfoldsymbolic limitation; the dynamo exporter is tested with itsnative symbolic-shape handling.
Testing code
Result: 24/24 PASSED on Windows 11 / PyTorch (current) / onnx /
onnxruntime.
I didn't add this to the repo because it would pull
onnxandonnxruntimeinto the test deps. Happy to add it gated behindrequire_onnx/require_onnxruntimedecorators (matching theexisting pattern for optional deps in
transformers) if you'd preferit in-tree — one-line ask, just say the word.
Code Agent Policy
Before submitting
Who can review?
Small ONNX-export cleanup, applied identically in LED and Longformer.
Per the tagging guide: