diff --git a/.DS_Store b/.DS_Store
index 4fdf4b6..d6753f9 100644
Binary files a/.DS_Store and b/.DS_Store differ
diff --git a/.github/.DS_Store b/.github/.DS_Store
new file mode 100644
index 0000000..2ff9b28
Binary files /dev/null and b/.github/.DS_Store differ
diff --git a/mlops/.DS_Store b/mlops/.DS_Store
index 5f852bc..85b7cb0 100644
Binary files a/mlops/.DS_Store and b/mlops/.DS_Store differ
diff --git a/mlops/PROJECT_STATUS.md b/mlops/PROJECT_STATUS.md
new file mode 100644
index 0000000..68af1a9
--- /dev/null
+++ b/mlops/PROJECT_STATUS.md
@@ -0,0 +1,232 @@
+# PROJECT_STATUS.md
+
+## 1. Project snapshot
+
+- **Project name:** Digital Twin Resilience Model
+- **Project goal:** Develop a digital twin of a major streaming platform to simulate system failure, impact radius, and response time.
+- **Current objective:** Build toward simulation of the entitlements service using a graph-oriented approach on AWS.
+- **Current phase:** Repo and pipeline mechanics clarified. GitHub Actions and Terraform deploy the SageMaker pipeline definition and related infrastructure; pipeline execution is started separately and currently runs a stub synthetic-data workflow.
+
+- **Current status:**
+
+### Deployment
+The GitHub Actions workflows `terraform-plan.yml` and `terraform-apply.yml`:
+- generate the SageMaker pipeline definition
+- provision or update the SageMaker Pipeline resource
+- validate Terraform / infra changes
+
+`digital_twin_resilience/pipeline.py`
+- defines the SageMaker pipeline
+- generates `pipeline_definition.json`
+
+### Execution
+`start_pipeline.py`
+- starts a specific SageMaker pipeline execution
+- allows parameter overrides
+- triggers the registered pipeline in AWS
+
+The generated pipeline definition shows three steps:
+- `processor.py`
+- `train.py`
+- `evaluate.py`
+
+`processor.py`
+- generates synthetic data
+- populates train / validation / test outputs in S3
+
+`train.py`
+- builds a trivial baseline model from synthetic data
+
+`evaluate.py`
+- computes an evaluation output / trivial metric from the model
+
+### Verification
+`check_pipeline_execution.py`
+- asks SageMaker for overall pipeline execution status
+- lists step-level statuses and related job metadata
+
+### Important distinction
+- Deploying the pipeline is separate from executing it.
+- Current GitHub Actions deploy and update the pipeline definition and infrastructure.
+- Pipeline execution is started deliberately via `start_pipeline.py`.
+
+- **Immediate next step:** Define the minimum set of starter docs and begin filling them in, starting with continuity and framing docs.
+- **Biggest current blockers / gaps:**
+ - Input data contract is not yet defined
+ - Service graph schema is not yet defined
+ - Prediction target is not yet defined
+ - Definition of "good" model output is not yet defined
+ - It is not yet decided whether the first baseline should be graph ML or something simpler
+
+---
+
+## 2. Working understanding of the repo
+
+This section is not a replacement for `repo_skeleton.yml`. It is a quick orientation note describing how the repo is currently understood.
+
+### Repo orientation
+
+- `.github/workflows/`
+ - GitHub Actions workflows for Terraform plan/apply and deployment-oriented automation
+ - current understanding: deploys pipeline definition and infra, but does not execute the pipeline or run Python tests
+
+- `terraform/`
+ - infrastructure code for AWS resources and SageMaker pipeline registration
+ - `envs/dev/` contains environment-specific wiring
+ - `modules/` contains reusable pieces such as S3, IAM, and SageMaker pipeline setup
+
+- `mlops/pipelines/digital_twin_resilience/`
+ - core pipeline orchestration area
+ - `pipeline.py` defines the SageMaker pipeline and generates `pipeline_definition.json`
+ - `start_pipeline.py` starts a pipeline execution
+ - `check_pipeline_execution.py` checks execution status
+ - `steps/processing/`, `steps/training/`, and `steps/evaluation/` contain the step logic executed by SageMaker
+
+- `data/synthetic/`
+ - synthetic data support for the current stub workflow
+
+- `tests/`
+ - test area exists, but CI usage has not yet been confirmed in this document
+
+- `README.md`
+ - high-level explanation of repo purpose and structure
+
+### Current understanding
+- Deployment and execution are separate concerns
+- GitHub Actions currently appear focused on deployment and Terraform validation
+- Pipeline execution is started deliberately, not automatically from Terraform apply
+- The current pipeline appears to be a stub synthetic processing/training/evaluation flow
+
+### Key files for current understanding
+
+The following files are currently the most relevant for understanding pipeline definition, execution, and verification:
+
+- `pipeline.py`
+- `start_pipeline.py`
+- `check_pipeline_execution.py`
+- `steps/processing/processor.py`
+- `steps/training/train.py`
+- `steps/evaluation/evaluate.py`
+
+Additional files such as `parse_request.py`, `request_schema.py`, and `create_pipeline.py` are likely important next, but have not yet been examined in detail in this document.
+
+---
+
+## 3. Current working decisions
+
+- Deployment and execution are separate concerns.
+- GitHub Actions currently handle pipeline-definition generation and Terraform plan/apply.
+- Current GitHub Actions do not appear to start pipeline execution or run Python tests.
+- `pipeline.py` generates the SageMaker pipeline definition and writes `pipeline_definition.json`.
+- `start_pipeline.py` deliberately starts a SageMaker pipeline execution.
+- `check_pipeline_execution.py` checks overall execution status and step-level status through SageMaker APIs.
+- The current registered pipeline executes three step scripts: `processor.py`, `train.py`, and `evaluate.py`.
+- Early work should focus on framing, contracts, scope, and evaluation before sophisticated model choices.
+
+---
+
+## 4. Open questions
+
+### Core problem / model questions
+- What exact decision is the system supposed to support first?
+- What is the narrow REV1 scope?
+- What is the first prediction target?
+- What would count as a useful model output?
+- What is the simplest credible baseline for REV1: graph-based, heuristic, tabular, or other?
+
+### Data / entity questions
+- What are the core entities?
+- What node and edge types belong in the first service graph?
+- What data sources are expected to be available?
+- What minimum fields are required to support the first end-to-end run?
+- What synthetic substitutes are acceptable early on?
+
+### Evaluation questions
+- How will success be measured for REV1?
+- What does "decision-useful" mean in practice?
+- What outputs should `evaluate.py` emit?
+- What evidence would justify continuing to the next phase?
+
+### Repo / process questions
+- Which starter doc should be written next?
+- What should be treated as current truth vs placeholder?
+- What is the first code file that should be tightened?
+
+---
+
+## 5. Recommended starter docs from this session
+
+These were identified as the most useful starter docs.
+
+### A. Problem framing doc
+Should answer:
+- What problem are we solving?
+- Who is the decision-maker?
+- What is REV1 trying to prove?
+- What is explicitly out of scope?
+
+### B. Feasibility questions / hypotheses doc
+Should answer:
+- What are the major unknowns?
+- What do we believe right now?
+- What evidence would support or weaken each hypothesis?
+
+### C. REV1 scope and success criteria doc
+Should answer:
+- What are we building now?
+- What are we not building?
+- What must be demonstrated?
+- What would count as failure or a stop condition?
+
+### D. Data and entity contract doc
+Should answer:
+- What are the main entities?
+- How do they relate?
+- What data do we expect?
+- What quality risks exist?
+
+### E. Repo/runbook doc
+Should answer:
+- How is the repo organized?
+- How does the flow run?
+- What is implemented vs placeholder?
+- How should someone orient themselves quickly?
+
+### Note
+This `PROJECT_STATUS.md` is not a replacement for those docs. It is the continuity layer that points to them and tracks what is missing.
+
+---
+
+## 6. Guidance agreed in this session
+
+### What not to do
+- Do not begin by locking in sophisticated model architecture
+- Do not let the repo skeleton create false confidence
+- Do not use a polished solution architecture doc as the first anchor
+- Do not hide unresolved questions under implementation detail
+
+### What to do first
+- Clarify the project/problem framing
+- Make the major unknowns explicit
+- Define REV1 scope and success criteria
+- Build continuity documentation that preserves momentum
+- Use this file to keep current status, decisions, open questions, and next actions visible
+
+---
+
+## 7. Next actions
+
+- [ ] Create a first draft of the problem framing doc
+- [ ] Create a first draft of the feasibility questions / hypotheses doc
+- [ ] Create a first draft of the REV1 scope and success criteria doc
+- [ ] Identify the most important data/entity questions for the first pass
+- [ ] Decide which current repo file should be examined first for concrete changes
+
+---
+
+## 8. Change log
+
+### Session-created initial version
+- Created the first session-only continuity draft of `PROJECT_STATUS.md`
+- Purpose: establish a resumable project memory file and expose missing information clearly
+- Constraint: uses only information discussed in this session
\ No newline at end of file
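The execution and verification flow described in `PROJECT_STATUS.md` above can be sketched against the SageMaker API as follows. This is a minimal sketch under stated assumptions, not the repo's actual `start_pipeline.py` or `check_pipeline_execution.py`, whose contents this diff does not show. The function names and the injected `client` argument are hypothetical; the pipeline name is inferred from the S3 output prefix in `pipeline_definition.json`; the three client methods are the standard boto3 SageMaker calls `start_pipeline_execution`, `describe_pipeline_execution`, and `list_pipeline_execution_steps`.

```python
from typing import Any, Dict, Optional

# Assumption: name inferred from the S3 output prefix in pipeline_definition.json.
PIPELINE_NAME = "digital-twin-resilience-dev-pipeline"


def start_execution(
    client: Any,
    overrides: Optional[Dict[str, str]] = None,
    pipeline_name: str = PIPELINE_NAME,
) -> str:
    """Start one execution of an already-registered pipeline and return its ARN."""
    request: Dict[str, Any] = {"PipelineName": pipeline_name}
    if overrides:
        # Each override replaces the default of a declared pipeline parameter,
        # for example "InputDataUri" or "ProcessingInstanceType".
        request["PipelineParameters"] = [
            {"Name": name, "Value": value} for name, value in overrides.items()
        ]
    return client.start_pipeline_execution(**request)["PipelineExecutionArn"]


def summarize_execution(client: Any, execution_arn: str) -> Dict[str, Any]:
    """Return the overall status plus a per-step status map for one execution."""
    overall = client.describe_pipeline_execution(PipelineExecutionArn=execution_arn)
    # Only the first page of steps is read; enough for a three-step pipeline.
    steps = client.list_pipeline_execution_steps(PipelineExecutionArn=execution_arn)
    return {
        "status": overall["PipelineExecutionStatus"],
        "steps": {
            step["StepName"]: step["StepStatus"]
            for step in steps["PipelineExecutionSteps"]
        },
    }
```

With AWS credentials configured, `client` would typically be `boto3.client("sagemaker")`. Passing the client in, rather than creating it inside the functions, keeps deployment, execution, and verification separable and lets the logic be exercised with a stub.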
diff --git a/mlops/README.md b/mlops/README.md
index a275ed3..4199a81 100644
--- a/mlops/README.md
+++ b/mlops/README.md
@@ -1,46 +1,50 @@
# SageMaker Pipeline Feasibility PoC
+
## Description of directory tree elements
- .github/workflows/
+**.github/workflows/**
+
This is CI/CD only. It is not ML logic. GitHub Actions can authenticate to AWS via OIDC instead of long-lived secrets, which is the cleaner enterprise pattern.
-
-- terraform-plan.yml: runs fmt/validate/plan on PRs
-- terraform-apply.yml: applies approved infra changes to dev, maybe later prod
-
-infra/terraform/
+- **terraform-plan.yml**: runs fmt/validate/plan on PRs
+- **terraform-apply.yml**: applies approved infra changes to dev, and possibly to prod later
+
+**infra/terraform/**
+
This is infrastructure only.
-
-- envs/dev/: environment-specific wiring
-- modules/s3/: buckets for raw, processed, model artifacts, evaluation outputs
-- modules/iam/: execution roles and policies
-- modules/sagemaker_pipeline/: Terraform resource for the SageMaker Pipeline
-
-Terraform has an aws_sagemaker_pipeline resource, so using Terraform for the pipeline object itself is a legitimate pattern, not a workaround.
-
-pipelines/digital_twin_resilience/
+
+- **envs/dev/**: environment-specific wiring
+- **modules/s3/**: buckets for raw, processed, model artifacts, evaluation outputs
+- **modules/iam/**: execution roles and policies
+- **modules/sagemaker_pipeline/**: Terraform resource for the SageMaker Pipeline
+
+ Terraform has an `aws_sagemaker_pipeline` resource, so using Terraform for the pipeline object itself is a legitimate pattern, not a workaround.
+
+
+
+**pipelines/digital_twin_resilience/**
+
This is the ML workflow definition.
-
-- pipeline.py: defines the SageMaker Pipeline DAG
-- config.py: pipeline parameters and defaults
-- steps/processing/processor.py: builds datasets or synthetic inputs
-- steps/training/train.py: trains a trivial baseline model first
-- steps/evaluation/evaluate.py: computes metrics and emits a JSON report
-- utils/: shared helpers
-
-SageMaker Pipelines is a DAG of interconnected steps, and AWS explicitly supports Processing and Training steps in the pipeline definition.
-
-data/synthetic/
+
+- **pipeline.py**: defines the SageMaker Pipeline DAG
+- **config.py**: pipeline parameters and defaults
+- **steps/processing/processor.py**: builds datasets or synthetic inputs
+- **steps/training/train.py**: trains a trivial baseline model first
+- **steps/evaluation/evaluate.py**: computes metrics and emits a JSON report
+- **utils/**: shared helpers
+
+ A SageMaker pipeline is a DAG of interconnected steps, and AWS explicitly supports Processing and Training steps in the pipeline definition.
+
+**data/synthetic/**
This is discovery-sprint fuel.
-
-- generate fake telemetry
-- define a graph-ish structure if needed
-- keep it tiny and boring
-
-
-tests/
-
-- test_pipeline_compile.py: proves the pipeline definition compiles
-- test_smoke_synthetic.py: one tiny end-to-end synthetic run
-
\ No newline at end of file
+
+- generate fake telemetry
+- define a graph-ish structure if needed
+- keep it tiny and boring
+
+**tests/**
+
+- **test_pipeline_compile.py**: proves the pipeline definition compiles
+- **test_smoke_synthetic.py**: one tiny end-to-end synthetic run
+
diff --git a/mlops/pipelines/.DS_Store b/mlops/pipelines/.DS_Store
index f303c20..4dec090 100644
Binary files a/mlops/pipelines/.DS_Store and b/mlops/pipelines/.DS_Store differ
diff --git a/mlops/pipelines/digital_twin_resilience/pipeline.py b/mlops/pipelines/digital_twin_resilience/pipeline.py
index 4e7621e..e52ef17 100644
--- a/mlops/pipelines/digital_twin_resilience/pipeline.py
+++ b/mlops/pipelines/digital_twin_resilience/pipeline.py
@@ -1,3 +1,4 @@
+import json
import os
from pathlib import Path
@@ -299,6 +300,8 @@ def get_pipeline(
definition = pipeline.definition()
out_path = Path(__file__).resolve().parent / "pipeline_definition.json"
- out_path.write_text(definition)
+ with out_path.open("w", encoding="utf-8") as f:
+     json.dump(json.loads(definition), f, indent=2, sort_keys=False)
+     f.write("\n")
print(f"Wrote pipeline definition to {out_path}")
\ No newline at end of file
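The `pipeline.py` change above replaces a raw `write_text` with a parse-and-re-serialize step, so the committed `pipeline_definition.json` becomes multi-line and later changes diff line by line instead of as one long line. A minimal standalone sketch of the same idea follows. `write_pretty_definition` is a hypothetical helper name, and the sample string is a stand-in rather than real pipeline output.

```python
import json
import tempfile
from pathlib import Path


def write_pretty_definition(definition: str, out_path: Path) -> None:
    """Re-serialize a compact JSON string with 2-space indentation and a trailing newline."""
    with out_path.open("w", encoding="utf-8") as f:
        # sort_keys=False keeps the key order of the input string.
        json.dump(json.loads(definition), f, indent=2, sort_keys=False)
        f.write("\n")


if __name__ == "__main__":
    # Stand-in for the string returned by pipeline.definition().
    compact = '{"Version": "2020-12-01", "Metadata": {}, "Steps": []}'
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "pipeline_definition.json"
        write_pretty_definition(compact, path)
        print(path.read_text(encoding="utf-8"))
```

Round-tripping through `json.loads` also fails fast if the definition string is not valid JSON, which a plain `write_text` would not catch.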
diff --git a/mlops/pipelines/digital_twin_resilience/pipeline_definition.json b/mlops/pipelines/digital_twin_resilience/pipeline_definition.json
index 26f4888..f034c57 100644
--- a/mlops/pipelines/digital_twin_resilience/pipeline_definition.json
+++ b/mlops/pipelines/digital_twin_resilience/pipeline_definition.json
@@ -1 +1,368 @@
-{"Version": "2020-12-01", "Metadata": {}, "Parameters": [{"Name": "InputDataUri", "Type": "String", "DefaultValue": "s3://dougdaly-mlops-poc-input-dev/synthetic/raw/"}, {"Name": "RequestConfigUri", "Type": "String", "DefaultValue": "s3://dougdaly-mlops-poc-input-dev/requests/request.json"}, {"Name": "ProcessingInstanceType", "Type": "String", "DefaultValue": "ml.t3.medium"}, {"Name": "TrainingInstanceType", "Type": "String", "DefaultValue": "ml.t3.medium"}, {"Name": "EvaluationInstanceType", "Type": "String", "DefaultValue": "ml.t3.medium"}], "PipelineExperimentConfig": {"ExperimentName": {"Get": "Execution.PipelineName"}, "TrialName": {"Get": "Execution.PipelineExecutionId"}}, "Steps": [{"Name": "ProcessSyntheticTelemetry", "Type": "Processing", "Arguments": {"ProcessingResources": {"ClusterConfig": {"InstanceType": {"Get": "Parameters.ProcessingInstanceType"}, "InstanceCount": 1, "VolumeSizeInGB": 30}}, "AppSpecification": {"ImageUri": "246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3", "ContainerEntrypoint": ["python3", "/opt/ml/processing/input/code/processor.py"]}, "RoleArn": "arn:aws:iam::159535637196:role/SageMakerExecutionRole-mlops", "ProcessingInputs": [{"InputName": "input-1", "AppManaged": false, "S3Input": {"S3Uri": {"Get": "Parameters.InputDataUri"}, "LocalPath": "/opt/ml/processing/input", "S3DataType": "S3Prefix", "S3InputMode": "File", "S3DataDistributionType": "FullyReplicated", "S3CompressionType": "None"}}, {"InputName": "input-2", "AppManaged": false, "S3Input": {"S3Uri": {"Get": "Parameters.RequestConfigUri"}, "LocalPath": "/opt/ml/processing/config", "S3DataType": "S3Prefix", "S3InputMode": "File", "S3DataDistributionType": "FullyReplicated", "S3CompressionType": "None"}}, {"InputName": "code", "AppManaged": false, "S3Input": {"S3Uri": "s3://dougdaly-mlops-poc-output-dev/sagemaker-scikit-learn-2026-03-26-15-41-58-624/input/code/processor.py", "LocalPath": "/opt/ml/processing/input/code", "S3DataType": 
"S3Prefix", "S3InputMode": "File", "S3DataDistributionType": "FullyReplicated", "S3CompressionType": "None"}}], "ProcessingOutputConfig": {"Outputs": [{"OutputName": "train", "AppManaged": false, "S3Output": {"S3Uri": {"Std:Join": {"On": "/", "Values": ["s3:/", "dougdaly-mlops-poc-output-dev", "digital-twin-resilience-dev-pipeline", {"Get": "Execution.PipelineExecutionId"}, "ProcessSyntheticTelemetry", "output", "train"]}}, "LocalPath": "/opt/ml/processing/output/train", "S3UploadMode": "EndOfJob"}}, {"OutputName": "validation", "AppManaged": false, "S3Output": {"S3Uri": {"Std:Join": {"On": "/", "Values": ["s3:/", "dougdaly-mlops-poc-output-dev", "digital-twin-resilience-dev-pipeline", {"Get": "Execution.PipelineExecutionId"}, "ProcessSyntheticTelemetry", "output", "validation"]}}, "LocalPath": "/opt/ml/processing/output/validation", "S3UploadMode": "EndOfJob"}}, {"OutputName": "test", "AppManaged": false, "S3Output": {"S3Uri": {"Std:Join": {"On": "/", "Values": ["s3:/", "dougdaly-mlops-poc-output-dev", "digital-twin-resilience-dev-pipeline", {"Get": "Execution.PipelineExecutionId"}, "ProcessSyntheticTelemetry", "output", "test"]}}, "LocalPath": "/opt/ml/processing/output/test", "S3UploadMode": "EndOfJob"}}]}}}, {"Name": "TrainBaselineModel", "Type": "Processing", "Arguments": {"ProcessingResources": {"ClusterConfig": {"InstanceType": {"Get": "Parameters.TrainingInstanceType"}, "InstanceCount": 1, "VolumeSizeInGB": 30}}, "AppSpecification": {"ImageUri": "246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3", "ContainerEntrypoint": ["python3", "/opt/ml/processing/input/code/train.py"]}, "RoleArn": "arn:aws:iam::159535637196:role/SageMakerExecutionRole-mlops", "ProcessingInputs": [{"InputName": "input-1", "AppManaged": false, "S3Input": {"S3Uri": {"Get": "Steps.ProcessSyntheticTelemetry.ProcessingOutputConfig.Outputs['train'].S3Output.S3Uri"}, "LocalPath": "/opt/ml/processing/train", "S3DataType": "S3Prefix", "S3InputMode": "File", 
"S3DataDistributionType": "FullyReplicated", "S3CompressionType": "None"}}, {"InputName": "input-2", "AppManaged": false, "S3Input": {"S3Uri": {"Get": "Steps.ProcessSyntheticTelemetry.ProcessingOutputConfig.Outputs['validation'].S3Output.S3Uri"}, "LocalPath": "/opt/ml/processing/validation", "S3DataType": "S3Prefix", "S3InputMode": "File", "S3DataDistributionType": "FullyReplicated", "S3CompressionType": "None"}}, {"InputName": "code", "AppManaged": false, "S3Input": {"S3Uri": "s3://dougdaly-mlops-poc-output-dev/sagemaker-scikit-learn-2026-03-26-15-41-58-843/input/code/train.py", "LocalPath": "/opt/ml/processing/input/code", "S3DataType": "S3Prefix", "S3InputMode": "File", "S3DataDistributionType": "FullyReplicated", "S3CompressionType": "None"}}], "ProcessingOutputConfig": {"Outputs": [{"OutputName": "model", "AppManaged": false, "S3Output": {"S3Uri": {"Std:Join": {"On": "/", "Values": ["s3:/", "dougdaly-mlops-poc-output-dev", "digital-twin-resilience-dev-pipeline", {"Get": "Execution.PipelineExecutionId"}, "TrainBaselineModel", "output", "model"]}}, "LocalPath": "/opt/ml/processing/model", "S3UploadMode": "EndOfJob"}}]}}}, {"Name": "EvaluateModel", "Type": "Processing", "Arguments": {"ProcessingResources": {"ClusterConfig": {"InstanceType": {"Get": "Parameters.EvaluationInstanceType"}, "InstanceCount": 1, "VolumeSizeInGB": 30}}, "AppSpecification": {"ImageUri": "246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3", "ContainerEntrypoint": ["python3", "/opt/ml/processing/input/code/evaluate.py"]}, "RoleArn": "arn:aws:iam::159535637196:role/SageMakerExecutionRole-mlops", "ProcessingInputs": [{"InputName": "input-1", "AppManaged": false, "S3Input": {"S3Uri": {"Get": "Steps.TrainBaselineModel.ProcessingOutputConfig.Outputs['model'].S3Output.S3Uri"}, "LocalPath": "/opt/ml/processing/model", "S3DataType": "S3Prefix", "S3InputMode": "File", "S3DataDistributionType": "FullyReplicated", "S3CompressionType": "None"}}, {"InputName": "input-2", 
"AppManaged": false, "S3Input": {"S3Uri": {"Get": "Steps.ProcessSyntheticTelemetry.ProcessingOutputConfig.Outputs['test'].S3Output.S3Uri"}, "LocalPath": "/opt/ml/processing/test", "S3DataType": "S3Prefix", "S3InputMode": "File", "S3DataDistributionType": "FullyReplicated", "S3CompressionType": "None"}}, {"InputName": "code", "AppManaged": false, "S3Input": {"S3Uri": "s3://dougdaly-mlops-poc-output-dev/sagemaker-scikit-learn-2026-03-26-15-41-58-899/input/code/evaluate.py", "LocalPath": "/opt/ml/processing/input/code", "S3DataType": "S3Prefix", "S3InputMode": "File", "S3DataDistributionType": "FullyReplicated", "S3CompressionType": "None"}}], "ProcessingOutputConfig": {"Outputs": [{"OutputName": "evaluation", "AppManaged": false, "S3Output": {"S3Uri": {"Std:Join": {"On": "/", "Values": ["s3:/", "dougdaly-mlops-poc-output-dev", "digital-twin-resilience-dev-pipeline", {"Get": "Execution.PipelineExecutionId"}, "EvaluateModel", "output", "evaluation"]}}, "LocalPath": "/opt/ml/processing/evaluation", "S3UploadMode": "EndOfJob"}}]}}}]}
\ No newline at end of file
+{
+ "Version": "2020-12-01",
+ "Metadata": {},
+ "Parameters": [
+ {
+ "Name": "InputDataUri",
+ "Type": "String",
+ "DefaultValue": "s3://dougdaly-mlops-poc-input-dev/synthetic/raw/"
+ },
+ {
+ "Name": "RequestConfigUri",
+ "Type": "String",
+ "DefaultValue": "s3://dougdaly-mlops-poc-input-dev/requests/request.json"
+ },
+ {
+ "Name": "ProcessingInstanceType",
+ "Type": "String",
+ "DefaultValue": "ml.t3.medium"
+ },
+ {
+ "Name": "TrainingInstanceType",
+ "Type": "String",
+ "DefaultValue": "ml.t3.medium"
+ },
+ {
+ "Name": "EvaluationInstanceType",
+ "Type": "String",
+ "DefaultValue": "ml.t3.medium"
+ }
+ ],
+ "PipelineExperimentConfig": {
+ "ExperimentName": {
+ "Get": "Execution.PipelineName"
+ },
+ "TrialName": {
+ "Get": "Execution.PipelineExecutionId"
+ }
+ },
+ "Steps": [
+ {
+ "Name": "ProcessSyntheticTelemetry",
+ "Type": "Processing",
+ "Arguments": {
+ "ProcessingResources": {
+ "ClusterConfig": {
+ "InstanceType": {
+ "Get": "Parameters.ProcessingInstanceType"
+ },
+ "InstanceCount": 1,
+ "VolumeSizeInGB": 30
+ }
+ },
+ "AppSpecification": {
+ "ImageUri": "246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3",
+ "ContainerEntrypoint": [
+ "python3",
+ "/opt/ml/processing/input/code/processor.py"
+ ]
+ },
+ "RoleArn": "arn:aws:iam::159535637196:role/SageMakerExecutionRole-mlops",
+ "ProcessingInputs": [
+ {
+ "InputName": "input-1",
+ "AppManaged": false,
+ "S3Input": {
+ "S3Uri": {
+ "Get": "Parameters.InputDataUri"
+ },
+ "LocalPath": "/opt/ml/processing/input",
+ "S3DataType": "S3Prefix",
+ "S3InputMode": "File",
+ "S3DataDistributionType": "FullyReplicated",
+ "S3CompressionType": "None"
+ }
+ },
+ {
+ "InputName": "input-2",
+ "AppManaged": false,
+ "S3Input": {
+ "S3Uri": {
+ "Get": "Parameters.RequestConfigUri"
+ },
+ "LocalPath": "/opt/ml/processing/config",
+ "S3DataType": "S3Prefix",
+ "S3InputMode": "File",
+ "S3DataDistributionType": "FullyReplicated",
+ "S3CompressionType": "None"
+ }
+ },
+ {
+ "InputName": "code",
+ "AppManaged": false,
+ "S3Input": {
+ "S3Uri": "s3://dougdaly-mlops-poc-output-dev/sagemaker-scikit-learn-2026-03-31-18-55-49-990/input/code/processor.py",
+ "LocalPath": "/opt/ml/processing/input/code",
+ "S3DataType": "S3Prefix",
+ "S3InputMode": "File",
+ "S3DataDistributionType": "FullyReplicated",
+ "S3CompressionType": "None"
+ }
+ }
+ ],
+ "ProcessingOutputConfig": {
+ "Outputs": [
+ {
+ "OutputName": "train",
+ "AppManaged": false,
+ "S3Output": {
+ "S3Uri": {
+ "Std:Join": {
+ "On": "/",
+ "Values": [
+ "s3:/",
+ "dougdaly-mlops-poc-output-dev",
+ "digital-twin-resilience-dev-pipeline",
+ {
+ "Get": "Execution.PipelineExecutionId"
+ },
+ "ProcessSyntheticTelemetry",
+ "output",
+ "train"
+ ]
+ }
+ },
+ "LocalPath": "/opt/ml/processing/output/train",
+ "S3UploadMode": "EndOfJob"
+ }
+ },
+ {
+ "OutputName": "validation",
+ "AppManaged": false,
+ "S3Output": {
+ "S3Uri": {
+ "Std:Join": {
+ "On": "/",
+ "Values": [
+ "s3:/",
+ "dougdaly-mlops-poc-output-dev",
+ "digital-twin-resilience-dev-pipeline",
+ {
+ "Get": "Execution.PipelineExecutionId"
+ },
+ "ProcessSyntheticTelemetry",
+ "output",
+ "validation"
+ ]
+ }
+ },
+ "LocalPath": "/opt/ml/processing/output/validation",
+ "S3UploadMode": "EndOfJob"
+ }
+ },
+ {
+ "OutputName": "test",
+ "AppManaged": false,
+ "S3Output": {
+ "S3Uri": {
+ "Std:Join": {
+ "On": "/",
+ "Values": [
+ "s3:/",
+ "dougdaly-mlops-poc-output-dev",
+ "digital-twin-resilience-dev-pipeline",
+ {
+ "Get": "Execution.PipelineExecutionId"
+ },
+ "ProcessSyntheticTelemetry",
+ "output",
+ "test"
+ ]
+ }
+ },
+ "LocalPath": "/opt/ml/processing/output/test",
+ "S3UploadMode": "EndOfJob"
+ }
+ }
+ ]
+ }
+ }
+ },
+ {
+ "Name": "TrainBaselineModel",
+ "Type": "Processing",
+ "Arguments": {
+ "ProcessingResources": {
+ "ClusterConfig": {
+ "InstanceType": {
+ "Get": "Parameters.TrainingInstanceType"
+ },
+ "InstanceCount": 1,
+ "VolumeSizeInGB": 30
+ }
+ },
+ "AppSpecification": {
+ "ImageUri": "246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3",
+ "ContainerEntrypoint": [
+ "python3",
+ "/opt/ml/processing/input/code/train.py"
+ ]
+ },
+ "RoleArn": "arn:aws:iam::159535637196:role/SageMakerExecutionRole-mlops",
+ "ProcessingInputs": [
+ {
+ "InputName": "input-1",
+ "AppManaged": false,
+ "S3Input": {
+ "S3Uri": {
+ "Get": "Steps.ProcessSyntheticTelemetry.ProcessingOutputConfig.Outputs['train'].S3Output.S3Uri"
+ },
+ "LocalPath": "/opt/ml/processing/train",
+ "S3DataType": "S3Prefix",
+ "S3InputMode": "File",
+ "S3DataDistributionType": "FullyReplicated",
+ "S3CompressionType": "None"
+ }
+ },
+ {
+ "InputName": "input-2",
+ "AppManaged": false,
+ "S3Input": {
+ "S3Uri": {
+ "Get": "Steps.ProcessSyntheticTelemetry.ProcessingOutputConfig.Outputs['validation'].S3Output.S3Uri"
+ },
+ "LocalPath": "/opt/ml/processing/validation",
+ "S3DataType": "S3Prefix",
+ "S3InputMode": "File",
+ "S3DataDistributionType": "FullyReplicated",
+ "S3CompressionType": "None"
+ }
+ },
+ {
+ "InputName": "code",
+ "AppManaged": false,
+ "S3Input": {
+ "S3Uri": "s3://dougdaly-mlops-poc-output-dev/sagemaker-scikit-learn-2026-03-31-18-55-50-253/input/code/train.py",
+ "LocalPath": "/opt/ml/processing/input/code",
+ "S3DataType": "S3Prefix",
+ "S3InputMode": "File",
+ "S3DataDistributionType": "FullyReplicated",
+ "S3CompressionType": "None"
+ }
+ }
+ ],
+ "ProcessingOutputConfig": {
+ "Outputs": [
+ {
+ "OutputName": "model",
+ "AppManaged": false,
+ "S3Output": {
+ "S3Uri": {
+ "Std:Join": {
+ "On": "/",
+ "Values": [
+ "s3:/",
+ "dougdaly-mlops-poc-output-dev",
+ "digital-twin-resilience-dev-pipeline",
+ {
+ "Get": "Execution.PipelineExecutionId"
+ },
+ "TrainBaselineModel",
+ "output",
+ "model"
+ ]
+ }
+ },
+ "LocalPath": "/opt/ml/processing/model",
+ "S3UploadMode": "EndOfJob"
+ }
+ }
+ ]
+ }
+ }
+ },
+ {
+ "Name": "EvaluateModel",
+ "Type": "Processing",
+ "Arguments": {
+ "ProcessingResources": {
+ "ClusterConfig": {
+ "InstanceType": {
+ "Get": "Parameters.EvaluationInstanceType"
+ },
+ "InstanceCount": 1,
+ "VolumeSizeInGB": 30
+ }
+ },
+ "AppSpecification": {
+ "ImageUri": "246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3",
+ "ContainerEntrypoint": [
+ "python3",
+ "/opt/ml/processing/input/code/evaluate.py"
+ ]
+ },
+ "RoleArn": "arn:aws:iam::159535637196:role/SageMakerExecutionRole-mlops",
+ "ProcessingInputs": [
+ {
+ "InputName": "input-1",
+ "AppManaged": false,
+ "S3Input": {
+ "S3Uri": {
+ "Get": "Steps.TrainBaselineModel.ProcessingOutputConfig.Outputs['model'].S3Output.S3Uri"
+ },
+ "LocalPath": "/opt/ml/processing/model",
+ "S3DataType": "S3Prefix",
+ "S3InputMode": "File",
+ "S3DataDistributionType": "FullyReplicated",
+ "S3CompressionType": "None"
+ }
+ },
+ {
+ "InputName": "input-2",
+ "AppManaged": false,
+ "S3Input": {
+ "S3Uri": {
+ "Get": "Steps.ProcessSyntheticTelemetry.ProcessingOutputConfig.Outputs['test'].S3Output.S3Uri"
+ },
+ "LocalPath": "/opt/ml/processing/test",
+ "S3DataType": "S3Prefix",
+ "S3InputMode": "File",
+ "S3DataDistributionType": "FullyReplicated",
+ "S3CompressionType": "None"
+ }
+ },
+ {
+ "InputName": "code",
+ "AppManaged": false,
+ "S3Input": {
+ "S3Uri": "s3://dougdaly-mlops-poc-output-dev/sagemaker-scikit-learn-2026-03-31-18-55-50-315/input/code/evaluate.py",
+ "LocalPath": "/opt/ml/processing/input/code",
+ "S3DataType": "S3Prefix",
+ "S3InputMode": "File",
+ "S3DataDistributionType": "FullyReplicated",
+ "S3CompressionType": "None"
+ }
+ }
+ ],
+ "ProcessingOutputConfig": {
+ "Outputs": [
+ {
+ "OutputName": "evaluation",
+ "AppManaged": false,
+ "S3Output": {
+ "S3Uri": {
+ "Std:Join": {
+ "On": "/",
+ "Values": [
+ "s3:/",
+ "dougdaly-mlops-poc-output-dev",
+ "digital-twin-resilience-dev-pipeline",
+ {
+ "Get": "Execution.PipelineExecutionId"
+ },
+ "EvaluateModel",
+ "output",
+ "evaluation"
+ ]
+ }
+ },
+ "LocalPath": "/opt/ml/processing/evaluation",
+ "S3UploadMode": "EndOfJob"
+ }
+ }
+ ]
+ }
+ }
+ }
+ ]
+}
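The step ordering in the definition above is implied by `Get` references of the form `Steps.<StepName>...` inside each step's inputs, not by an explicit dependency list. The sketch below recovers those edges from a parsed definition. `step_dependencies` is a hypothetical helper, not a file in the repo, and it assumes only the `{"Get": "Steps.<name>..."}` reference shape visible in this file.

```python
from typing import Any, Dict, Iterator, List


def _iter_get_refs(node: Any) -> Iterator[str]:
    """Yield every string stored under a "Get" key anywhere in a nested JSON value."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Get" and isinstance(value, str):
                yield value
            else:
                yield from _iter_get_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from _iter_get_refs(item)


def step_dependencies(definition: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each step name to the sorted names of the steps whose outputs it reads."""
    deps: Dict[str, List[str]] = {}
    for step in definition.get("Steps", []):
        upstream = {
            # "Steps.<StepName>.<property path>" -> "<StepName>"
            ref.split(".")[1]
            for ref in _iter_get_refs(step.get("Arguments", {}))
            if ref.startswith("Steps.")
        }
        deps[step["Name"]] = sorted(upstream)
    return deps
```

Applied to the definition in this diff (loaded with `json.loads`), this would map `ProcessSyntheticTelemetry` to no upstream steps, `TrainBaselineModel` to `ProcessSyntheticTelemetry`, and `EvaluateModel` to both `ProcessSyntheticTelemetry` and `TrainBaselineModel`, since evaluation reads the `test` output of the first step and the `model` output of the second.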
diff --git a/mlops/pipelines/digital_twin_resilience/steps/.DS_Store b/mlops/pipelines/digital_twin_resilience/steps/.DS_Store
index dec24de..f32f232 100644
Binary files a/mlops/pipelines/digital_twin_resilience/steps/.DS_Store and b/mlops/pipelines/digital_twin_resilience/steps/.DS_Store differ
diff --git a/mlops/repo_skeleton.yml b/mlops/repo_skeleton.yml
index 374bc28..41e2a30 100644
--- a/mlops/repo_skeleton.yml
+++ b/mlops/repo_skeleton.yml
@@ -3,77 +3,65 @@ repo/
workflows/
terraform-plan.yml
terraform-apply.yml
-
- docs/
- discovery-one-pager.md
- architecture-notes.md
-
- infra/
- terraform/
- envs/
- dev/
- main.tf
- variables.tf
- outputs.tf
- backend.tf
- terraform.tfvars
- modules/
- s3/
- main.tf
- variables.tf
- outputs.tf
- iam/
- main.tf
- variables.tf
- outputs.tf
- sagemaker_pipeline/
- main.tf
- variables.tf
- outputs.tf
-
- pipelines/
- digital_twin_resilience/
- pipeline.py
- config.py
- requirements.txt
- steps/
- processing/
- processor.py
- requirements.txt
- training/
- train.py
- requirements.txt
- evaluation/
- evaluate.py
- requirements.txt
- utils/
- io_utils.py
- metrics.py
- schemas.py
-
- containers/
- processing/
- Dockerfile
- requirements.txt
- training/
- Dockerfile
- requirements.txt
- evaluation/
- Dockerfile
- requirements.txt
-
+ terraform/
+ envs/
+ dev/
+ backend.tf
+ main.tf
+ outputs.tf
+ terraform.tfstate
+ terraform.tfvars
+ variables.tf
+ README.md
+ modules/
+ s3/
+ main.tf
+ variables.tf
+ outputs.tf
+ iam/
+ main.tf
+ variables.tf
+ outputs.tf
+ sagemaker_pipeline/
+ main.tf
+ variables.tf
+ outputs.tf
+ mlops/
+ data/
+ docs/
+ discovery-one-pager.md
+ README.md
+ pipelines/
+ digital_twin_resilience/
+ check_pipeline_execution.py
+ config.py
+ create_pipeline.py
+ parse_request.py
+ pipeline_definition.json
+ pipeline.py
+ request_schema.py
+ request.json
+ requirements.txt
+ run_request_flow.py
+ show_pipeline_outputs.py
+ show_processing_logs.py
+ start_pipeline.py
+ steps/
+ processing/
+ processor.py
+ training/
+ train.py
+ evaluation/
+ evaluate.py
+ utils/
+ io_utils.py
+ metrics.py
+ schemas.py
data/
synthetic/
generate_synthetic_data.py
sample_input.csv
-
tests/
- unit/
- test_config.py
- test_metrics.py
- test_pipeline_compile.py
- integration/
- test_smoke_synthetic.py
-
- Makefile
+ bedrock_test.py
+ test_steps.py
README.md
\ No newline at end of file
diff --git a/mlops_local_test/.DS_Store b/mlops_local_test/.DS_Store
new file mode 100644
index 0000000..82eb920
Binary files /dev/null and b/mlops_local_test/.DS_Store differ
diff --git a/some_input.csv b/some_input.csv
deleted file mode 100644
index e69de29..0000000
diff --git a/terraform/.DS_Store b/terraform/.DS_Store
index 5b1fb09..9801d46 100644
Binary files a/terraform/.DS_Store and b/terraform/.DS_Store differ
diff --git a/terraform/envs/.DS_Store b/terraform/envs/.DS_Store
index cc02eff..82185f4 100644
Binary files a/terraform/envs/.DS_Store and b/terraform/envs/.DS_Store differ
diff --git a/terraform/modules/.DS_Store b/terraform/modules/.DS_Store
index 0800edd..868f0a5 100644
Binary files a/terraform/modules/.DS_Store and b/terraform/modules/.DS_Store differ