Commit 8744183
Make sparse_type_int_to_dtype import resilient to stale frozen torch.package depots (#5942)
Summary:
Pull Request resolved: #5942
Unblocks the ai_infra/model_processing Conveyor pipeline, whose "Contbuild
Tracking (lego_flow)" node has been red since ~R1495 (last green R1494,
2026-06-13), blocking all push/promotion because the Test Success gate never
passes.
Root cause: a torch.package version-skew inside the frozen package that GMPP
re-exports during offline/recurring publish. The GMPP publish path resolves
interned fbgemm_gpu modules through an OrderedImporter that places the frozen
module_factory_depot ahead of the live sys_importer
(aiplatform/modelstore/publish/utils/torch_package.py). For the affected models,
fbgemm_gpu/split_embedding_configs.py EXISTS in the depot archive and is captured
STALE (predates D79869613, so it lacks sparse_type_int_to_dtype), while
fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py is ABSENT from
the archive and falls through to sys_importer, captured FRESH from trunk where
D107684316 ("Migrate fbgemm internals off the ops_common shim to leaf imports")
made it do `from fbgemm_gpu.split_embedding_configs import sparse_type_int_to_dtype`.
The fresh consumer + stale provider are co-packaged, so module load fails with:
    ImportError: cannot import name 'sparse_type_int_to_dtype'
    from '<torch_package_1>.fbgemm_gpu.split_embedding_configs'
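The skew described above can be modeled with a minimal sketch of first-match-wins resolution. This is a simplified, hypothetical model, not the torch.package implementation: the two dicts stand in for the frozen depot archive and the live source tree, and `resolve` is an illustrative helper.

```python
from typing import Dict, List, Optional

PROVIDER = "fbgemm_gpu.split_embedding_configs"
CONSUMER = "fbgemm_gpu.split_table_batched_embeddings_ops_training_common"

# Hypothetical frozen depot: holds a stale provider and does not contain the consumer.
depot: Dict[str, str] = {PROVIDER: "stale"}
# Hypothetical live tree: both modules are fresh.
live: Dict[str, str] = {PROVIDER: "fresh", CONSUMER: "fresh"}


def resolve(name: str, importers: List[Dict[str, str]]) -> Optional[str]:
    """Return the version held by the first importer that contains the module."""
    for importer in importers:
        if name in importer:
            return importer[name]
    return None


# The depot is consulted before the live tree, mirroring the ordering in the summary.
ordered = [depot, live]
captured = {name: resolve(name, ordered) for name in (PROVIDER, CONSUMER)}
print(captured)
```

Under these assumptions the provider resolves from the depot and the consumer falls through to the live tree, so the two modules come from different points in history.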
This forward-fix (no revert; keeps the leaf-import direction of D107684316) makes
the single import site defensive: the leaf import remains the primary path on
trunk, and a local TorchScript-compatible fallback (equivalent to the trunk
definition / SparseType.from_int(ty).as_dtype()) is used only when this module is
loaded alongside a stale split_embedding_configs.py inside a frozen package.
sparse_type_int_to_dtype has exactly one consumer in the codebase
(split_table_batched_embeddings_ops_training_common.py:204) and no external
direct importers, so wrapping this one import site covers 100% of the symbol's
usage. On trunk / freshly-frozen packages behavior is byte-identical (the try
path always succeeds); the except path only executes inside the stale-depot
population (2865/3473 checked-in depots).
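The defensive import pattern described above might look roughly like the sketch below. It is not the committed code: `stale_split_embedding_configs` is a hypothetical stand-in module created in place so the example runs without fbgemm_gpu, dtype names are plain strings instead of torch dtypes, and the two-entry table holds placeholder values rather than the real SparseType mapping.

```python
import sys
import types

# Hypothetical stand-in for a provider module that predates the symbol.
stale = types.ModuleType("stale_split_embedding_configs")
sys.modules[stale.__name__] = stale

try:
    # Primary path: succeeds whenever the provider defines the symbol.
    from stale_split_embedding_configs import sparse_type_int_to_dtype
except ImportError:
    # Fallback path: taken only when the provider lacks the symbol.
    # Placeholder table for illustration, not the real SparseType mapping.
    _INT_TO_DTYPE = {0: "float32", 1: "float16"}

    def sparse_type_int_to_dtype(ty: int) -> str:
        """Local fallback that maps an integer code to a dtype name."""
        if ty not in _INT_TO_DTYPE:
            raise ValueError(f"Unsupported sparse type: {ty}")
        return _INT_TO_DTYPE[ty]


print(sparse_type_int_to_dtype(0))
```

Because the stand-in module does not define the symbol, the `from ... import` raises ImportError and the local definition is used; with an up-to-date provider the except branch would never run.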
Follow-up (separate, non-urgent): the architectural root cause is the
depot-ahead-of-sys_importer ordering in torch_package.py / model_packager.py;
making co-located fbgemm modules always capture from a single consistent source
would prevent future symbol skews. Tracked separately, owned by model_processing
packaging.
Reviewed By: sophielin508
Differential Revision: D109168966
fbshipit-source-id: cf46eeadd198aecf234e8e6ed366b4aa8e9ebc85
1 parent 2e1aba6 commit 8744183
1 file changed
Lines changed: 18 additions & 1 deletion
@@ -15,7 +15,24 @@ (original line 18 replaced by new lines 18-35; diff content not preserved)