Make sparse_type_int_to_dtype import resilient to stale frozen torch.package depots (#5942)

q10 · meta-codesync[bot] · commit 8744183e4bd7 · 2026-06-22T10:56:45.000-07:00
Summary:
Pull Request resolved: #5942

Unblocks the ai_infra/model_processing Conveyor pipeline, whose "Contbuild Tracking (lego_flow)" node has been red since ~R1495 (last green R1494, 2026-06-13), blocking all push/promotion because the Test Success gate never passes.

Root cause: a torch.package version-skew inside the frozen package that GMPP re-exports during offline/recurring publish. The GMPP publish path resolves interned fbgemm_gpu modules through an OrderedImporter that places the frozen module_factory_depot ahead of the live sys_importer (aiplatform/modelstore/publish/utils/torch_package.py).

For the affected models, fbgemm_gpu/split_embedding_configs.py EXISTS in the depot archive and is captured STALE (predates D79869613, so it lacks sparse_type_int_to_dtype), while fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py is ABSENT from the archive and falls through to sys_importer, captured FRESH from trunk where D107684316 ("Migrate fbgemm internals off the ops_common shim to leaf imports") made it do `from fbgemm_gpu.split_embedding_configs import sparse_type_int_to_dtype`.

The fresh consumer + stale provider are co-packaged, so module load fails with:

    ImportError: cannot import name 'sparse_type_int_to_dtype' from '<torch_package_1>.fbgemm_gpu.split_embedding_configs'

This forward-fix (no revert; keeps the leaf-import direction of D107684316) makes the single import site defensive: the leaf import remains the primary path on trunk, and a local TorchScript-compatible fallback (equivalent to the trunk definition / SparseType.from_int(ty).as_dtype()) is used only when this module is loaded alongside a stale split_embedding_configs.py inside a frozen package.

sparse_type_int_to_dtype has exactly one consumer in the codebase (split_table_batched_embeddings_ops_training_common.py:204) and no external direct importers, so wrapping this one import site covers 100% of the symbol's usage.

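The fresh-consumer / stale-provider failure and the defensive import can be reproduced in miniature without torch or fbgemm_gpu. This is a minimal sketch, not the change itself: the module name `demo_stale_configs`, the two-member `SparseType`, its integer codes, and the string dtype names are hypothetical stand-ins chosen for illustration.

```python
import enum
import sys
import types


class SparseType(enum.Enum):
    # Illustrative stand-in; the integer codes here are assumptions.
    FP32 = 0
    FP16 = 1

    @classmethod
    def from_int(cls, ty):
        return cls(ty)

    def as_dtype(self):
        # Strings stand in for torch.dtype so the sketch needs no torch.
        return {"FP32": "float32", "FP16": "float16"}[self.name]


# Hypothetical stale provider: exports SparseType but not
# sparse_type_int_to_dtype, like a depot copy that predates the symbol.
stale = types.ModuleType("demo_stale_configs")
stale.SparseType = SparseType
sys.modules["demo_stale_configs"] = stale

# Fresh consumer: try the leaf import first, fall back on ImportError.
try:
    from demo_stale_configs import sparse_type_int_to_dtype
    used_fallback = False
except ImportError:
    from demo_stale_configs import SparseType as _SparseType

    def sparse_type_int_to_dtype(ty):
        # Delegate to what the stale module does export.
        return _SparseType.from_int(ty).as_dtype()

    used_fallback = True

print(used_fallback, sparse_type_int_to_dtype(1))
```

With a provider that does export `sparse_type_int_to_dtype`, the `try` branch succeeds and the fallback is never defined, which is why the fix is expected to be a no-op on trunk.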
On trunk / freshly-frozen packages behavior is byte-identical (the try path always succeeds); the except path only executes inside the stale-depot population (2865/3473 checked-in depots).

Follow-up (separate, non-urgent): the architectural root cause is the depot-ahead-of-sys_importer ordering in torch_package.py / model_packager.py; making co-located fbgemm modules always capture from a single consistent source would prevent future symbol skews. Tracked separately, owned by model_processing packaging.

Reviewed By: sophielin508

Differential Revision: D109168966

fbshipit-source-id: cf46eeadd198aecf234e8e6ed366b4aa8e9ebc85
diff --git a/fbgemm_gpu/fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py b/fbgemm_gpu/fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py
@@ -15,7 +15,24 @@
 
 # @manual=//deeplearning/fbgemm/fbgemm_gpu/codegen:split_embedding_codegen_lookup_invokers
 import fbgemm_gpu.split_embedding_codegen_lookup_invokers as invokers
-from fbgemm_gpu.split_embedding_configs import sparse_type_int_to_dtype
+
+try:
+    from fbgemm_gpu.split_embedding_configs import sparse_type_int_to_dtype
+except ImportError:
+    # Forward-compat shim for frozen torch.package depots whose stale, co-packaged
+    # copy of split_embedding_configs.py predates D79869613 and therefore does not
+    # export sparse_type_int_to_dtype. The leaf import above is the primary path on
+    # trunk (post-D107684316); this fallback only triggers when this module is
+    # captured fresh alongside a stale split_embedding_configs.py inside a frozen
+    # package, keeping module load from failing with an ImportError. It simply
+    # delegates to SparseType (which the stale module does export), so it stays
+    # correct without re-encoding the SparseType -> dtype mapping.
+    from fbgemm_gpu.split_embedding_configs import SparseType
+
+    def sparse_type_int_to_dtype(ty: int) -> torch.dtype:
+        return SparseType.from_int(ty).as_dtype()
+
+
 from fbgemm_gpu.tbe.config.embedding_config import PoolingMode