Describe the bug
I am training a Keras model with an entry_point script, and I use the following code in that script to store the model artifacts:
import argparse
import os

import tensorflow as tf

# Parse the model directory passed by SageMaker (falls back to SM_MODEL_DIR)
parser = argparse.ArgumentParser()
parser.add_argument('--model_dir', type=str, default=os.environ['SM_MODEL_DIR'])
args, _ = parser.parse_known_args()
model_dir = args.model_dir

# ... model definition and training ...

tf.keras.models.save_model(
    model,
    os.path.join(model_dir, 'model/1'),
    overwrite=True,
    include_optimizer=True
)
Ideally, model_dir should be /opt/ml/model, and SageMaker should automatically upload the contents of this folder to S3 as s3://<default_bucket>/<training_name>/output/model.tar.gz.
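Incidentally, a minimal diagnostic that could be added to the entry_point (assuming the argparse setup above) to confirm which path the script actually receives, since the TensorFlow estimator may also pass its own --model_dir argument:

# Diagnostic only: compare the container's model dir with the parsed argument.
print('SM_MODEL_DIR       :', os.environ.get('SM_MODEL_DIR'))  # expected: /opt/ml/model
print('parsed --model_dir :', args.model_dir)                  # may be an S3 URI if the estimator passed one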
When I run estimator.fit({'training': training_input_path}), the training succeeds, but the CloudWatch logs show the following:
2020-09-16 02:49:12,458 sagemaker_tensorflow_container.training WARNING No model artifact is saved under the path /opt/ml/model. Your training job will not save any model files to S3.
Even so, SageMaker does store my model artifacts; the only difference is that instead of being stored as s3://<default_bucket>/<training_name>/output/model.tar.gz, they are stored unzipped as s3://<default_bucket>/<training_name>/model/model/1/saved_model.pb along with the variables and assets folders. Because of this, the estimator.deploy() call fails, since it cannot find the artifacts under the output/ directory in S3.
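For reference, the artifact URI that deploy() resolves can be printed from the estimator after training; in this case it points at a location that was never populated:

# estimator.model_data is the S3 URI deploy() will look for
# (the .../output/model.tar.gz path, which is missing here).
print(tf_estimator.model_data)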
To reproduce
Estimator code:
from sagemaker.tensorflow import TensorFlow

tf_estimator = TensorFlow(entry_point='autoencoder-model.py',
                          role=role,
                          instance_count=1,
                          instance_type='ml.m5.large',
                          framework_version="2.3.0",
                          py_version="py37",
                          debugger_hook_config=False,
                          hyperparameters={'epochs': 20},
                          source_dir='/home/ec2-user/SageMaker/model',
                          subnets=['subnet-1', 'subnet-2'],
                          security_group_ids=['sg-1', 'sg-1'])

tf_estimator.fit({'training': training_input_path})
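And the deploy call that then fails (instance count and type here are illustrative):

# Fails because no model.tar.gz exists under .../output/ in S3.
predictor = tf_estimator.deploy(initial_instance_count=1,
                                instance_type='ml.m5.large')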
System information
A description of your system. Please provide:
- SageMaker Python SDK version: 2.6
- Framework name (e.g. PyTorch) or algorithm (e.g. KMeans): TensorFlow (Keras)
- Framework version: 2.3.0
- Python version: 3.7
- CPU or GPU: CPU
- Custom Docker image (Y/N): N