25 changes: 21 additions & 4 deletions s3_templates/mlops-github-actions/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -56,7 +56,7 @@ key=sagemaker value=true

In the above example, `aEXAMPLE-8aad-4d5d-8878-dfcab0bc441f` is the unique ID for this connection. We use this ID when we create our SageMaker project later in this example.

### 2. GitHub Personal Access Token
### 2. GitHub Personal Access Token (PAT)
Create a GitHub personal access token with **Contents** and **Actions** permissions, following the instructions in [Managing your personal access tokens](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens).

> Note: You can create either a classic or a fine-grained access token. Either way, make sure the token has access to Contents and Actions (workflows, runs, and artifacts) for that repository.
Expand All @@ -80,6 +80,18 @@ Create a GitHub personal access token with access to **Contents** and **Actions*
* ✅workflow(Update GitHub Action workflows) - Required
* Click "Generate token"

> ⚠️ **Note:** If your repo is part of an Organization, your PAT must be granted access to that Organization.

* Ensure your token has access to the organization with the required repository permissions (Actions, Contents, Metadata, Workflows).
![](./images/org-pta-1.png)

* Go to Organization Settings → Third-party Access → Personal access tokens and approve the token request.
![](./images/org-pta-2.png)

* Confirm your token appears as active before proceeding.
![](./images/org-pta-3.png)


**Then store it in AWS Secrets Manager.**

```bash
Expand Down Expand Up @@ -297,7 +309,7 @@ Add these secrets to your GitHub repository as follows:
To create a manual approval step in our deployment pipelines, we use a [GitHub environment](https://docs.github.com/en/actions/how-tos/deploy/configure-and-manage-deployments/manage-environments). Complete the following steps:
1. Go to your repository **Settings** > **Environments**
2. Create environment named `production`
3. Add required reviewers for deployment approval. These are the people who can approve model deployment to production environment
3. Add required reviewers for deployment approval (on GitHub's free plan, this option is only available for public repositories). These are the people who can approve model deployment to the production environment.
![](./images/reviewer.png)

## Template Deployment
Expand Down Expand Up @@ -395,7 +407,8 @@ After creating the project:
```yaml
env:
  AWS_REGION: <REGION> # Your AWS region
  SAGEMAKER_PROJECT_NAME: your-project-name # Your project name
  SAGEMAKER_PROJECT_NAME: <your-project-name> # Your project name
  MLFLOW_TRACKING_APP_ARN: "" # Optional: ARN of SageMaker managed MLflow Tracking Server
```

2. **Test the Pipeline:**
Expand All @@ -408,6 +421,10 @@ After creating the project:
The template will then create two automated ModelOps workflows—one for model building and one for model deployment—that work together to provide CI/CD for your ML models.

![](./images/sagemaker_pipeline.png)

Pipeline experiments are automatically tracked in the SageMaker Managed MLflow App. You can view the experiment, individual step runs, metrics, datasets, and registered models.
![](./images/MLflow-1.png)
![](./images/MLflow-2.png)

## Clean up
After deployment, you will incur costs for the deployed resources. If you don’t intend to continue using the setup, delete the ModelOps project resources to avoid unnecessary charges.
Expand Down Expand Up @@ -446,4 +463,4 @@ In addition to deleting a project, which will remove and deprovision the SageMak

## License

This template is licensed under the MIT-0 License. See the LICENSE file for details.
This template is licensed under the MIT-0 License. See the LICENSE file for details.
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,8 @@
"Effect": "Allow",
"Action": [
"s3:CreateBucket",
"s3:PutObject"
"s3:PutObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::sagemaker-*"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -67,7 +67,8 @@ def lambda_handler(event, context):

    # Getting repository and trigger the deploy GitHub workflow
    try:
        repo = g.get_user().get_repo(github_repo_name)
        print("new lambda")
        repo = g.get_repo(github_repo_name)
        workflow = repo.get_workflow(github_workflow_name)
        branch = repo.get_branch("main")
        res = workflow.create_dispatch(branch)
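
The switch from `g.get_user().get_repo(...)` to `g.get_repo(...)` changes what `github_repo_name` has to contain. In PyGithub, `get_user().get_repo(name)` looks up a repository by its short name under the authenticated user, while `Github.get_repo(full_name_or_id)` takes the full `owner/repo` name (or a numeric ID), which is what allows organization-owned repositories to be reached. Below is a minimal sketch of normalizing the configured value. The helper `to_full_name` and the fallback owner are hypothetical and not part of this PR.

```python
def to_full_name(repo_name: str, default_owner: str) -> str:
    """Return 'owner/repo', prefixing default_owner when repo_name has no owner part.

    Hypothetical helper for illustration; not part of the Lambda in this PR.
    """
    name = repo_name.strip().strip("/")
    if not name:
        raise ValueError("repository name must not be empty")
    if "/" in name:
        owner, _, repo = name.partition("/")
        # Reject values with more than one slash, such as 'a/b/c'.
        if not owner or not repo or "/" in repo:
            raise ValueError(f"expected 'owner/repo', got {repo_name!r}")
        return f"{owner}/{repo}"
    return f"{default_owner}/{name}"


# Example values are made up.
org_repo = to_full_name("my-org/abalone-mlops", "octocat")
user_repo = to_full_name("abalone-mlops", "octocat")
```

The normalized value would then be what is passed to `g.get_repo(...)`.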
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,9 @@ on:
    paths:
      - pipelines/**
env:
  AWS_REGION: us-west-2
  SAGEMAKER_PROJECT_NAME: custom-build-deploy
  AWS_REGION: <REGION> # AWS Region
  SAGEMAKER_PROJECT_NAME: <YOUR_PROJECT_NAME> # Your SageMaker AI project name
  MLFLOW_TRACKING_SERVER_ARN: "" # Optional: ARN of SageMaker managed MLflow Tracking Server

jobs:
  Build:
Expand Down Expand Up @@ -49,4 +50,4 @@ jobs:
run-pipeline --module-name pipelines.abalone.pipeline \
--role-arn ${SAGEMAKER_PIPELINE_ROLE_ARN} \
--tags "[{\"Key\":\"sagemaker:project-name\", \"Value\":\"${SAGEMAKER_PROJECT_NAME}\"}, {\"Key\":\"sagemaker:project-id\", \"Value\":\"${SAGEMAKER_PROJECT_ID}\"}]" \
--kwargs "{\"region\":\"${AWS_REGION}\",\"sagemaker_project_arn\":\"${SAGEMAKER_PROJECT_ARN}\",\"role\":\"${SAGEMAKER_PIPELINE_ROLE_ARN}\",\"default_bucket\":\"${ARTIFACT_BUCKET}\",\"pipeline_name\":\"${SAGEMAKER_PROJECT_NAME_ID}\",\"model_package_group_name\":\"${SAGEMAKER_PROJECT_NAME_ID}\",\"base_job_prefix\":\"${SAGEMAKER_PROJECT_NAME_ID}\"}"
--kwargs "{\"region\":\"${AWS_REGION}\",\"sagemaker_project_arn\":\"${SAGEMAKER_PROJECT_ARN}\",\"role\":\"${SAGEMAKER_PIPELINE_ROLE_ARN}\",\"default_bucket\":\"${ARTIFACT_BUCKET}\",\"pipeline_name\":\"${SAGEMAKER_PROJECT_NAME_ID}\",\"model_package_group_name\":\"${SAGEMAKER_PROJECT_NAME_ID}\",\"base_job_prefix\":\"${SAGEMAKER_PROJECT_NAME_ID}\",\"mlflow_tracking_arn\":\"${{ env.MLFLOW_TRACKING_SERVER_ARN }}\"}"
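
The hand-escaped `--kwargs` string is easy to break when a key is added, as this change does with `mlflow_tracking_arn`. Assuming `run-pipeline` parses `--kwargs` as JSON (which the quoting above suggests), a sketch of building the same payload with `json.dumps` could look like the following. The function name `build_pipeline_kwargs` and the sample values are hypothetical; the keys and environment variable names are taken from the command above.

```python
import json


def build_pipeline_kwargs(env):
    """Build the JSON string for --kwargs from a mapping of environment variables."""
    name_id = env["SAGEMAKER_PROJECT_NAME_ID"]
    kwargs = {
        "region": env["AWS_REGION"],
        "sagemaker_project_arn": env["SAGEMAKER_PROJECT_ARN"],
        "role": env["SAGEMAKER_PIPELINE_ROLE_ARN"],
        "default_bucket": env["ARTIFACT_BUCKET"],
        "pipeline_name": name_id,
        "model_package_group_name": name_id,
        "base_job_prefix": name_id,
        # Optional: an empty string means MLflow tracking stays disabled.
        "mlflow_tracking_arn": env.get("MLFLOW_TRACKING_SERVER_ARN", ""),
    }
    return json.dumps(kwargs, separators=(",", ":"))


# Made-up sample values for illustration only.
SAMPLE_ENV = {
    "AWS_REGION": "us-west-2",
    "SAGEMAKER_PROJECT_ARN": "arn:aws:sagemaker:us-west-2:111122223333:project/example",
    "SAGEMAKER_PIPELINE_ROLE_ARN": "arn:aws:iam::111122223333:role/example-role",
    "ARTIFACT_BUCKET": "sagemaker-example-bucket",
    "SAGEMAKER_PROJECT_NAME_ID": "example-p-abc123",
}
payload = build_pipeline_kwargs(SAMPLE_ENV)
```

In a workflow step, `os.environ` could be passed in place of `SAMPLE_ENV`.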
Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,8 @@ name: DeploySageMakerModel
on: workflow_dispatch

env:
  AWS_REGION: us-west-2
  SAGEMAKER_PROJECT_NAME: custom-build-deploy
  AWS_REGION: <REGION> # AWS Region
  SAGEMAKER_PROJECT_NAME: <YOUR_PROJECT_NAME> # Your SageMaker AI project name
  EXPORT_TEMPLATE_STAGING_CONFIG: "staging-config-export.json"
  EXPORT_TEMPLATE_PROD_CONFIG: "prod-config-export.json"
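
Because the workflows now ship with `<REGION>` and `<YOUR_PROJECT_NAME>` placeholders instead of working defaults, a run will likely fail if they are left in place. One way to catch this early is a small pre-flight check. The sketch below is an assumption about how that could be done, not something this PR includes; `find_placeholders` is a hypothetical name.

```python
import re

# Matches tokens such as <REGION> or <your-project-name>.
PLACEHOLDER = re.compile(r"<[A-Za-z][A-Za-z0-9_-]*>")


def find_placeholders(workflow_text):
    """Return (line_number, placeholder) pairs for unreplaced template values."""
    hits = []
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        # Naive comment stripping: assumes '#' does not appear inside a quoted value.
        value = line.split("#", 1)[0]
        for match in PLACEHOLDER.finditer(value):
            hits.append((lineno, match.group(0)))
    return hits


SAMPLE = """env:
  AWS_REGION: <REGION> # AWS Region
  SAGEMAKER_PROJECT_NAME: <YOUR_PROJECT_NAME> # Your SageMaker AI project name
  EXPORT_TEMPLATE_STAGING_CONFIG: "staging-config-export.json"
"""
unreplaced = find_placeholders(SAMPLE)
```

GitHub expressions such as `${{ env.AWS_REGION }}` are not matched, since they do not use the `<NAME>` form.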

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,11 @@

if __name__ == "__main__":
    logger.debug("Starting evaluation.")

    # MLflow child run for evaluation
    from mlflow_helper import setup_mlflow, end_mlflow
    mlflow_enabled = setup_mlflow("EvaluateAbaloneModel")
Collaborator

In keeping with MLflow best practices, you need to set the experiment, and for parent-child run chaining, pass the parent run when starting the child run.
See a similar example here, which can be adapted for the new version.

    mlflow.set_tracking_uri(tracking_server_arn)
    mlflow.set_experiment(experiment_name)
    with mlflow.start_run(run_name=run_name) as run:
        run_id = run.info.run_id
        with mlflow.start_run(run_name="DataPreprocessing", nested=True):

Contributor Author

@Christian-kam Christian-kam Feb 26, 2026

Ack. The approach you are referring to works best with the `@step` decorator way of creating a pipeline, since data flowing between steps can be returned directly as values (not S3 properties). I have rewritten the example to use the `@step` decorator approach and submitted a new PR: "feat: MLflow Apps steps for experiment tracking".
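
The parent/child chaining discussed in this thread can be sketched within a single process as follows. The function takes the tracking object as a parameter so it can be exercised without a tracking server. Passing the real `mlflow` module should work, since `set_experiment`, `start_run(run_name=..., nested=True)`, and `log_metrics` are module-level MLflow functions. `FakeTracker` and `track_pipeline` are hypothetical names used only for this illustration, and the sketch does not cover steps that run in separate containers.

```python
from contextlib import contextmanager


class FakeTracker:
    """Stand-in that mimics the three MLflow calls used below (offline use only)."""

    def __init__(self):
        self.events = []
        self._stack = []

    def set_experiment(self, name):
        self.events.append(("experiment", name))

    @contextmanager
    def start_run(self, run_name=None, nested=False):
        parent = self._stack[-1] if self._stack else None
        self.events.append(("start", run_name, parent, nested))
        self._stack.append(run_name)
        try:
            yield run_name
        finally:
            # Mirrors MLflow ending the run when the with-block exits.
            self._stack.pop()
            self.events.append(("end", run_name))

    def log_metrics(self, metrics):
        self.events.append(("metrics", self._stack[-1], dict(metrics)))


def track_pipeline(tracker, experiment_name, pipeline_run_name, step_metrics):
    """Open one parent run and one nested child run per step."""
    tracker.set_experiment(experiment_name)
    with tracker.start_run(run_name=pipeline_run_name):
        for step_name, metrics in step_metrics.items():
            with tracker.start_run(run_name=step_name, nested=True):
                tracker.log_metrics(metrics)


tracker = FakeTracker()
track_pipeline(tracker, "abalone", "pipeline-run", {"EvaluateAbaloneModel": {"mse": 1.0}})
```

Each child run is closed before the parent, because the `with` blocks unwind from the inside out.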


    model_path = "/opt/ml/processing/model/model.tar.gz"
    with tarfile.open(model_path) as tar:
        tar.extractall(path=".")
Expand Down Expand Up @@ -50,10 +55,20 @@
        },
    }

    # Log evaluation metrics to MLflow
    if mlflow_enabled:
        try:
            import mlflow
            mlflow.log_metrics({"mse": mse, "mse_std": std})
        except Exception as e:
            logger.warning("Failed to log evaluation metrics to MLflow: %s", e)

    output_dir = "/opt/ml/processing/evaluation"
    pathlib.Path(output_dir).mkdir(parents=True, exist_ok=True)

    logger.info("Writing out evaluation report with mse: %f", mse)
    evaluation_path = f"{output_dir}/evaluation.json"
    with open(evaluation_path, "w") as f:
        f.write(json.dumps(report_dict))

    end_mlflow(mlflow_enabled)
Original file line number Diff line number Diff line change
@@ -0,0 +1,77 @@
"""Shared MLflow helper for pipeline steps.

Provides setup/teardown for MLflow runs. Each pipeline step creates its own
run under a shared experiment (the pipeline name). When MLFLOW_TRACKING_ARN
is not set, all functions are no-ops.
"""
import logging
import os

logger = logging.getLogger(__name__)


def _install_mlflow():
    """Install MLflow dependencies at runtime if not already available."""
    try:
        import mlflow  # noqa: F401
        return True
    except ImportError:
        pass
    try:
        import subprocess
        import sys
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", "mlflow", "sagemaker-mlflow==0.2.0", "-q"]
Collaborator

Why do we need to install the MLflow packages when they are already present in requirements.txt?

Contributor Author

Because I was using `TrainingStep` from SDK v3, and there is a bug: `sagemaker.core.workflow.utilities.get_training_code_hash` expects `dependencies` to be a list, but `SourceCode.requirements` is passed as a `str`. See the PR on the SDK v3.

        )
        return True
    except Exception as e:
        logger.warning("Failed to install MLflow: %s", e)
        return False


def setup_mlflow(step_name):
    """Set up MLflow tracking for a pipeline step.

    Args:
        step_name: Name for this run (e.g. "PreprocessAbaloneData")

    Returns:
        True if MLflow tracking is active, False otherwise.
    """
    tracking_arn = os.environ.get("MLFLOW_TRACKING_ARN", "")
    if not tracking_arn:
        logger.info("MLFLOW_TRACKING_ARN not set. MLflow tracking disabled.")
        return False

    if not _install_mlflow():
        return False

    try:
        import mlflow

        mlflow.set_tracking_uri(tracking_arn)

        experiment_name = os.environ.get("MLFLOW_EXPERIMENT_NAME", "Default")
        mlflow.set_experiment(experiment_name)
        mlflow.start_run(run_name=step_name)

        logger.info("MLflow run started: %s (experiment=%s)", step_name, experiment_name)
        return True
    except Exception as e:
        logger.warning("Failed to set up MLflow: %s. Continuing without tracking.", e)
        return False


def end_mlflow(mlflow_enabled):
    """End the current MLflow run.

    Args:
        mlflow_enabled: Return value from setup_mlflow().
    """
    if not mlflow_enabled:
        return
    try:
        import mlflow
        mlflow.end_run()
Collaborator

If you start the MLflow run in a Python `with` block, the run is terminated automatically when the block exits.
That's the standard way to use MLflow, and this code design deviates from MLflow best practices.

Contributor Author

Ack

    except Exception as e:
        logger.warning("Failed to end MLflow run: %s", e)
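
The review feedback on explicit `end_run()` calls could also be addressed without changing the helper's no-op behavior, by wrapping the setup/teardown pair in a context manager so that teardown always runs, even if the step raises. This is only a sketch under that assumption. `tracked_step` is a hypothetical name, and the two `fake_*` functions stand in for `setup_mlflow` and `end_mlflow` so the example runs without MLflow installed.

```python
from contextlib import contextmanager


@contextmanager
def tracked_step(step_name, setup, teardown):
    """Call setup(step_name) on entry and teardown(enabled) on exit, even on errors."""
    enabled = setup(step_name)
    try:
        yield enabled
    finally:
        teardown(enabled)


# Stand-ins for setup_mlflow / end_mlflow that only record the calls made.
calls = []


def fake_setup(step_name):
    calls.append(("setup", step_name))
    return True


def fake_teardown(enabled):
    calls.append(("teardown", enabled))


with tracked_step("EvaluateAbaloneModel", fake_setup, fake_teardown) as enabled:
    calls.append(("body", enabled))
```

In a step script, the body of the `with` block would hold the evaluation logic, and `enabled` would play the role of `mlflow_enabled`.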