Summary: Zero initialization is non-standard for PyTorch models, and with ET in
particular it is misleading because ET greedily deduplicates weights. That means
if you zero-initialize a transformer model, the resulting .pte file is much
smaller than you would expect if you didn't know about the deduplication.
Differential Revision: D91518961
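The size effect described above follows from content-based deduplication: every zero-initialized weight tensor has identical bytes, so a greedy deduplicator stores the payload once, while randomly initialized tensors are (almost surely) all distinct. ET's actual serializer is not shown here; the following is a toy sketch of the idea, with a hypothetical `dedup_size` helper that dedupes raw weight blobs by content hash.

```python
import hashlib

def dedup_size(weight_blobs):
    # Toy model of content-based weight deduplication:
    # blobs with identical bytes are stored only once.
    unique = {hashlib.sha256(b).hexdigest(): b for b in weight_blobs}
    return sum(len(b) for b in unique.values())

# Two 16-byte "weight tensors" that were zero-initialized: identical bytes.
zero_a = bytes(16)
zero_b = bytes(16)

# Two distinct "randomly initialized" blobs of the same size.
rand_a = bytes([1, 2, 3, 4] * 4)
rand_b = bytes([5, 6, 7, 8] * 4)

print(dedup_size([zero_a, zero_b]))  # 16: the duplicates collapse to one copy
print(dedup_size([rand_a, rand_b]))  # 32: both copies are stored
```

With zero initialization the serialized size stays at one tensor's worth no matter how many layers share the pattern, which is exactly why the .pte looked surprisingly small.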
+            f"The provided checkpoint is missing the following weights that are expected by the model: {missing_weights}. Please fix the fqn's in your checkpoint to match."
+        )
+    if unexpected:
+        if self.verbose:
+            print(f"Unexpected keys: {unexpected}")
     else:
-        print("Checkpoint not provided, defaulting weights to zeros.")
+        print("Checkpoint not provided, using default initialization.")
+        # Because we loaded onto meta device, it is annoying to now load onto cpu
+        # with the standard random initialization.
         self.model_.to_empty(device="cpu")
-        # Need to provide concrete values for meta-initialized tensors for quantization.