Commit 8744183

q10 authored and meta-codesync[bot] committed
Make sparse_type_int_to_dtype import resilient to stale frozen torch.package depots (#5942)
Summary:
Pull Request resolved: #5942

Unblocks the ai_infra/model_processing Conveyor pipeline, whose "Contbuild Tracking (lego_flow)" node has been red since ~R1495 (last green R1494, 2026-06-13), blocking all push/promotion because the Test Success gate never passes.

Root cause: a torch.package version skew inside the frozen package that GMPP re-exports during offline/recurring publish. The GMPP publish path resolves interned fbgemm_gpu modules through an OrderedImporter that places the frozen module_factory_depot ahead of the live sys_importer (aiplatform/modelstore/publish/utils/torch_package.py). For the affected models, fbgemm_gpu/split_embedding_configs.py EXISTS in the depot archive and is captured STALE (it predates D79869613, so it lacks sparse_type_int_to_dtype), while fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py is ABSENT from the archive and falls through to sys_importer, where it is captured FRESH from trunk. On trunk, D107684316 ("Migrate fbgemm internals off the ops_common shim to leaf imports") made it do `from fbgemm_gpu.split_embedding_configs import sparse_type_int_to_dtype`.

The fresh consumer and the stale provider are co-packaged, so module load fails with:

ImportError: cannot import name 'sparse_type_int_to_dtype' from '<torch_package_1>.fbgemm_gpu.split_embedding_configs'

This forward-fix (no revert; keeps the leaf-import direction of D107684316) makes the single import site defensive: the leaf import remains the primary path on trunk, and a local TorchScript-compatible fallback (equivalent to the trunk definition, SparseType.from_int(ty).as_dtype()) is used only when this module is loaded alongside a stale split_embedding_configs.py inside a frozen package.

sparse_type_int_to_dtype has exactly one consumer in the codebase (split_table_batched_embeddings_ops_training_common.py:204) and no external direct importers, so wrapping this one import site covers 100% of the symbol's usage.

On trunk and in freshly frozen packages, behavior is byte-identical (the try path always succeeds); the except path only executes inside the stale-depot population (2865/3473 checked-in depots).

Follow-up (separate, non-urgent): the architectural root cause is the depot-ahead-of-sys_importer ordering in torch_package.py / model_packager.py; making co-located fbgemm modules always capture from a single consistent source would prevent future symbol skews. Tracked separately, owned by model_processing packaging.

Reviewed By: sophielin508

Differential Revision: D109168966

fbshipit-source-id: cf46eeadd198aecf234e8e6ed366b4aa8e9ebc85
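The ordering problem described in the root cause can be illustrated with a toy model. The sketch below does not use torch.package; the `resolve` helper and the dictionaries standing in for importers are hypothetical, and only the module and symbol names come from the commit message. It shows how "first importer that has the module wins" can pair a stale provider with a fresh consumer.

```python
from typing import Dict, List, Optional, Set, Tuple

# Toy stand-in for an importer: module name -> names that module exports.
# This is an illustration only, not the torch.package API.
Importer = Dict[str, Set[str]]

CONFIGS = "fbgemm_gpu.split_embedding_configs"
COMMON = "fbgemm_gpu.split_table_batched_embeddings_ops_training_common"


def resolve(
    module: str, importers: List[Tuple[str, Importer]]
) -> Optional[Tuple[str, Set[str]]]:
    """Return (importer label, exports) from the first importer holding `module`."""
    for label, importer in importers:
        if module in importer:
            return label, importer[module]
    return None


# Frozen depot: has a stale configs module, and no copy of the consumer module.
frozen_depot: Importer = {CONFIGS: {"SparseType"}}

# Live environment (trunk): fresh copies of both modules.
sys_importer: Importer = {
    CONFIGS: {"SparseType", "sparse_type_int_to_dtype"},
    COMMON: {"(consumer module contents)"},
}

# Depot is consulted first, as in the ordering the commit message describes.
ordered = [("depot", frozen_depot), ("sys", sys_importer)]

provider_source, provider_exports = resolve(CONFIGS, ordered)
consumer_source, _ = resolve(COMMON, ordered)

print(provider_source, consumer_source)
print("sparse_type_int_to_dtype" in provider_exports)
```

In this model the provider comes from the depot and the consumer from the live environment, which is the mixed pairing that produces the ImportError above.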
1 parent 2e1aba6 commit 8744183

1 file changed

Lines changed: 18 additions & 1 deletion

fbgemm_gpu/fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py

@@ -15,7 +15,24 @@
 
 # @manual=//deeplearning/fbgemm/fbgemm_gpu/codegen:split_embedding_codegen_lookup_invokers
 import fbgemm_gpu.split_embedding_codegen_lookup_invokers as invokers
-from fbgemm_gpu.split_embedding_configs import sparse_type_int_to_dtype
+
+try:
+    from fbgemm_gpu.split_embedding_configs import sparse_type_int_to_dtype
+except ImportError:
+    # Forward-compat shim for frozen torch.package depots whose stale, co-packaged
+    # copy of split_embedding_configs.py predates D79869613 and therefore does not
+    # export sparse_type_int_to_dtype. The leaf import above is the primary path on
+    # trunk (post-D107684316); this fallback only triggers when this module is
+    # captured fresh alongside a stale split_embedding_configs.py inside a frozen
+    # package, keeping module load from failing with an ImportError. It simply
+    # delegates to SparseType (which the stale module does export), so it stays
+    # correct without re-encoding the SparseType -> dtype mapping.
+    from fbgemm_gpu.split_embedding_configs import SparseType
+
+    def sparse_type_int_to_dtype(ty: int) -> torch.dtype:
+        return SparseType.from_int(ty).as_dtype()
+
+
 from fbgemm_gpu.tbe.config.embedding_config import PoolingMode
 
 
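The fallback path in the diff can be exercised without fbgemm_gpu or torch by registering a fake "stale" module. The following is a minimal sketch under stated assumptions: the module name `stale_configs`, the two-member `SparseType` enum, and the string dtypes are all hypothetical stand-ins, and the int-to-type table is a toy mapping rather than the real fbgemm one. Only the try/except shape mirrors the change.

```python
import enum
import sys
import types


class SparseType(enum.Enum):
    # Toy stand-in for fbgemm's SparseType; members and mapping are illustrative.
    FP32 = "fp32"
    FP16 = "fp16"

    @staticmethod
    def from_int(ty: int) -> "SparseType":
        return {0: SparseType.FP32, 1: SparseType.FP16}[ty]

    def as_dtype(self) -> str:
        # Strings stand in for torch.dtype so the sketch has no torch dependency.
        return {"fp32": "float32", "fp16": "float16"}[self.value]


# Simulate the stale provider: it exports SparseType but NOT the helper function.
stale = types.ModuleType("stale_configs")
stale.SparseType = SparseType
sys.modules["stale_configs"] = stale

try:
    # Primary path: succeeds only when the provider exports the helper.
    from stale_configs import sparse_type_int_to_dtype

    used_fallback = False
except ImportError:
    # Fallback path: delegate to SparseType, which the stale provider does export.
    from stale_configs import SparseType as _SparseType

    def sparse_type_int_to_dtype(ty: int) -> str:
        return _SparseType.from_int(ty).as_dtype()

    used_fallback = True

print(used_fallback, sparse_type_int_to_dtype(1))
```

Because the fallback delegates to `SparseType` rather than copying a lookup table, it cannot drift from whatever mapping the co-packaged provider defines, which matches the rationale given in the code comment of the diff.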