**Key insight:** Containers are used ONLY for training, because the ML dependencies (265MB) exceed Lambda's 250MB (unzipped) deployment package limit.

**Critical architectural principle:** The training container is **fully autonomous** and stateless. It:
- Never calls the backend API
- Receives ALL context via environment variables only
- Writes results directly to DynamoDB and S3
- Can be tested locally with `docker-compose --profile training run training`

## Critical: Environment Variable Cascade

The training container is **autonomous**. It receives ALL context via environment variables and never calls the API:

## Development Commands

### Local Development (Docker Compose)

```powershell
# 1. Configure backend environment
cp backend/.env.example backend/.env
# Edit backend/.env with values from: terraform output

# 2. Start API (connects to dev AWS services)
docker-compose up

# 3. Frontend (separate terminal)
cd frontend
cp .env.local.example .env.local
# Edit .env.local and set NEXT_PUBLIC_API_URL (e.g. http://localhost:8000)
pnpm install
pnpm dev
```
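The `.env` step above is manual. A minimal sketch of a helper that turns `terraform output -json` into `.env`-style lines, assuming (hypothetically) that each Terraform output name maps to an env var of the same name upper-cased; the authoritative keys are whatever `backend/.env.example` lists, and both function names are illustrative:

```python
"""Hypothetical helper: print .env-style lines from `terraform output -json`.

Assumption (not from the project): each Terraform output maps to an env var
with the same name upper-cased, e.g. ecr_repository_url -> ECR_REPOSITORY_URL.
The authoritative keys are the ones in backend/.env.example.
"""
import json
import subprocess


def outputs_to_env_lines(outputs: dict) -> list[str]:
    """Convert parsed `terraform output -json` data into sorted KEY=value lines."""
    lines = []
    for name, meta in sorted(outputs.items()):
        value = meta.get("value")
        # Skip lists, maps and nulls: a flat .env file only holds scalars.
        if not isinstance(value, (str, int, float, bool)):
            continue
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{name.upper()}={value}")
    return lines


def read_terraform_outputs(workdir: str = "infrastructure/terraform") -> dict:
    """Run `terraform output -json` in the Terraform directory and parse it."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=workdir, check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

From the repo root, `print("\n".join(outputs_to_env_lines(read_terraform_outputs())))` prints candidate lines to review and copy into `backend/.env`. Note that `terraform output -json` prints outputs marked sensitive in plain text.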

### Backend Development

```powershell
# Run API locally without Docker
cd backend
python -m venv venv
.\venv\Scripts\Activate.ps1   # Linux/Mac: source venv/bin/activate
pip install -r requirements.txt
uvicorn api.main:app --reload   # Swagger UI at http://localhost:8000/docs
```

### Training Container Testing

```powershell
# Test training locally with an uploaded dataset (PowerShell has no VAR=value prefix, so pass -e flags)
docker-compose --profile training run -e DATASET_ID=xxx -e TARGET_COLUMN=price training

# Or use the helper script (requires the dataset to be uploaded to dev)
python scripts/run-training-local.py --dataset-id xxx --target-column price --time-budget 120
```
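For scripted runs, building the command as an argument list sidesteps shell differences entirely. The bash `VAR=value cmd` prefix form does not work in PowerShell. A minimal sketch using `docker-compose run`'s `-e KEY=value` flag. It is not the implementation of `scripts/run-training-local.py`, and the function names are hypothetical:

```python
import subprocess


def compose_training_command(env: dict[str, str]) -> list[str]:
    """Build the `docker-compose --profile training run ... training` argv.

    Each variable becomes a `-e KEY=value` flag. Because the result is an
    argv list, no shell parses it: the same call works from PowerShell,
    bash, or CI without quoting differences.
    """
    cmd = ["docker-compose", "--profile", "training", "run"]
    for key, value in env.items():
        cmd += ["-e", f"{key}={value}"]
    cmd.append("training")  # compose service name used by the training profile
    return cmd


def run_training(env: dict[str, str]) -> None:
    """Run the training service once; raises CalledProcessError on failure."""
    subprocess.run(compose_training_command(env), check=True)
```

`run_training({"DATASET_ID": "xxx", "TARGET_COLUMN": "price"})` mirrors the Compose command above. Here `xxx` still stands in for a real dataset ID.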

### Deployment

```powershell
# Full infrastructure
cd infrastructure/terraform
terraform apply

# Deploy Lambda API only (fast iteration)
terraform apply -target=aws_lambda_function.api

# Build and deploy training container (still in infrastructure/terraform, so paths are relative to it)
$EcrUrl = terraform output -raw ecr_repository_url
$Region = terraform output -raw aws_region
aws ecr get-login-password --region $Region | docker login --username AWS --password-stdin $($EcrUrl.Split('/')[0])
docker build -t automl-training:latest ../../backend/training
# Braces are required: "$EcrUrl:latest" is parsed as a scope/drive-qualified variable, not $EcrUrl + ":latest"
docker tag automl-training:latest "${EcrUrl}:latest"
docker push "${EcrUrl}:latest"

# Verify image in ECR
aws ecr describe-images --repository-name automl-lite-dev-training --image-ids imageTag=latest --region $Region
```
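The `Split('/')[0]` in the login step relies on the shape of an ECR repository URL, `<account>.dkr.ecr.<region>.amazonaws.com/<repository>`. Everything before the first `/` is the registry host that `docker login` needs. Everything after it is the repository name that `describe-images` takes. A small Python sketch of the same split (function names are hypothetical, for illustration):

```python
from typing import NamedTuple


class EcrRef(NamedTuple):
    registry: str    # host passed to `docker login`
    repository: str  # name passed to `aws ecr describe-images --repository-name`
    region: str      # parsed from the registry host


def parse_ecr_url(url: str) -> EcrRef:
    """Split an ECR repository URL into registry host, repository, and region."""
    registry, sep, repository = url.partition("/")
    if not sep or not repository:
        raise ValueError(f"not an ECR repository URL: {url!r}")
    # Host shape: <account>.dkr.ecr.<region>.amazonaws.com
    parts = registry.split(".")
    if len(parts) < 6 or parts[1:3] != ["dkr", "ecr"]:
        raise ValueError(f"unexpected ECR registry host: {registry!r}")
    return EcrRef(registry, repository, parts[3])


def image_ref(url: str, tag: str = "latest") -> str:
    """Full image reference for `docker tag` / `docker push`."""
    return f"{url}:{tag}"
```

For example, `parse_ecr_url("123456789012.dkr.ecr.us-east-1.amazonaws.com/automl-lite-dev-training")` (made-up account ID) yields registry `123456789012.dkr.ecr.us-east-1.amazonaws.com`, repository `automl-lite-dev-training`, and region `us-east-1`. Splitting only at the first `/` keeps namespaced repository names such as `team/model` intact.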

### Utilities

```powershell
# Generate architecture diagrams (requires: pip install diagrams + Graphviz)
python scripts/generate_architecture_diagram.py

# Make predictions with trained model
docker build -f scripts/Dockerfile.predict -t automl-predict .
docker run --rm -v "${PWD}:/data" automl-predict /data/model.pkl --info
```

## Common Pitfalls
| `scripts/predict.py` | Make predictions with trained models (Docker) |
| `scripts/generate_architecture_diagram.py` | Generate AWS architecture diagrams |

## Testing & Validation

### API Testing

```powershell
# Base URL: local API, or the deployed API URL
$API_URL = "http://localhost:8000"

# Health check (Invoke-RestMethod, since curl is an alias for Invoke-WebRequest in Windows PowerShell)
Invoke-RestMethod "$API_URL/health"

# Test upload endpoint
Invoke-RestMethod -Method Post -Uri "$API_URL/upload" -ContentType "application/json" -Body '{"filename": "test.csv"}'

# View API docs (Swagger UI)
# http://localhost:8000/docs (local) or $API_URL/docs (deployed)
```
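The same two calls can be made from Python with only the standard library, which behaves identically on every OS. A minimal sketch: the `/health` and `/upload` paths and the `{"filename": ...}` payload come from the commands above. The base URL is a placeholder, `send` assumes the endpoints answer with JSON, and the function names are hypothetical:

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # placeholder: local API or the deployed URL


def health_request(base_url: str = API_URL) -> urllib.request.Request:
    """GET /health."""
    return urllib.request.Request(f"{base_url}/health", method="GET")


def upload_request(filename: str, base_url: str = API_URL) -> urllib.request.Request:
    """POST /upload with a JSON body like {"filename": "test.csv"}."""
    body = json.dumps({"filename": filename}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/upload",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

`send(health_request())` performs the actual call against a running API. Building the `Request` separately keeps the URL, method, and body easy to inspect before anything goes over the network.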

### Container Testing

```powershell
# Build training container locally
docker build -t automl-training:latest backend/training

# Test with environment variables (the trailing backtick is PowerShell's line continuation)
docker run --rm `
  -e DATASET_ID=xxx `
  -e TARGET_COLUMN=price `
  -e JOB_ID=test-123 `
  -e TIME_BUDGET=60 `
  -e S3_BUCKET_DATASETS=automl-lite-dev-datasets-XXX `
  -e DYNAMODB_JOBS_TABLE=automl-lite-dev-training-jobs `
  -e REGION=us-east-1 `
  -v "$HOME/.aws:/root/.aws:ro" `
  automl-training:latest
```
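These variables are the container's entire input, matching the **Critical architectural principle** above: everything arrives via the environment, and results go straight to DynamoDB and S3. Below is a minimal sketch of that contract on the container side. It assumes the variable names from the `docker run` example and treats `TIME_BUDGET` as seconds (an assumption). The DynamoDB key (`job_id`), the attribute names (`status`, `model_s3_key`), and the function names are hypothetical, not the real table schema:

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = (
    "DATASET_ID", "TARGET_COLUMN", "JOB_ID", "TIME_BUDGET",
    "S3_BUCKET_DATASETS", "DYNAMODB_JOBS_TABLE", "REGION",
)


@dataclass(frozen=True)
class TrainingConfig:
    dataset_id: str
    target_column: str
    job_id: str
    time_budget: int  # assumed to be seconds
    datasets_bucket: str
    jobs_table: str
    region: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "TrainingConfig":
        """Fail fast if any variable is missing: there is no API to ask instead."""
        missing = [name for name in REQUIRED if not env.get(name)]
        if missing:
            raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
        return cls(
            dataset_id=env["DATASET_ID"],
            target_column=env["TARGET_COLUMN"],
            job_id=env["JOB_ID"],
            time_budget=int(env["TIME_BUDGET"]),
            datasets_bucket=env["S3_BUCKET_DATASETS"],
            jobs_table=env["DYNAMODB_JOBS_TABLE"],
            region=env["REGION"],
        )


def completed_update(cfg: TrainingConfig, model_s3_key: str) -> dict:
    """Keyword arguments for boto3 `Table.update_item` marking the job done.

    Key and attribute names are hypothetical placeholders.
    """
    return {
        "Key": {"job_id": cfg.job_id},
        "UpdateExpression": "SET #s = :s, model_s3_key = :k",
        "ExpressionAttributeNames": {"#s": "status"},  # STATUS is a DynamoDB reserved word
        "ExpressionAttributeValues": {":s": "completed", ":k": model_s3_key},
    }
```

Inside the container, this would be used as `boto3.resource("dynamodb", region_name=cfg.region).Table(cfg.jobs_table).update_item(**completed_update(cfg, key))`. Keeping the update as a plain dict lets it be checked locally without AWS credentials.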

### Frontend Testing

```powershell
cd frontend
pnpm dev     # Development server at http://localhost:3000
pnpm build   # Test production build
pnpm lint    # ESLint check
```

### Terraform Validation

```powershell
cd infrastructure/terraform
terraform fmt        # Format files
terraform validate   # Syntax check
terraform plan       # Preview changes
```
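For scripted checks, `terraform plan -detailed-exitcode` is more informative than a plain `plan`. It exits 0 when there are no changes, 1 on error, and 2 when changes are pending. Below is a small Python sketch that runs the validation steps in order and interprets that exit code. It assumes `terraform init` has already been run, and the function names are hypothetical. `fmt -check` stands in for plain `fmt`, so the script reports unformatted files (non-zero exit) instead of rewriting them:

```python
import subprocess

PLAN_EXIT_MEANING = {0: "no changes", 1: "error", 2: "changes pending"}


def interpret_plan_exit(code: int) -> str:
    """Map `terraform plan -detailed-exitcode` return codes to a description."""
    try:
        return PLAN_EXIT_MEANING[code]
    except KeyError:
        raise ValueError(f"unexpected terraform exit code: {code}") from None


def validate_and_plan(workdir: str = "infrastructure/terraform") -> str:
    """Run fmt check, validate, then plan; stop at the first hard failure."""
    for args in (["fmt", "-check"], ["validate"]):
        subprocess.run(["terraform", *args], cwd=workdir, check=True)
    plan = subprocess.run(["terraform", "plan", "-detailed-exitcode"], cwd=workdir)
    result = interpret_plan_exit(plan.returncode)
    if result == "error":
        raise RuntimeError("terraform plan failed")
    return result
```

`check=True` is deliberately not used for `plan`, because exit code 2 means success with pending changes rather than failure.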

## CI/CD Workflows (`.github/workflows/`)

| Workflow | Trigger | Purpose |
|----------|---------|---------|
| `deploy-lambda-api.yml` | Push to main/dev | Deploy FastAPI to Lambda |
| `deploy-training-container.yml` | Push to main/dev | Build & push training image to ECR |
| `deploy-infrastructure.yml` | Manual | Terraform apply |
| `deploy-frontend.yml` | Push to main/dev (via Amplify) | Auto-deploy Next.js frontend |
| `ci-terraform.yml` | PR | Terraform validate & plan |
| `destroy-environment.yml` | Manual | Destroy all infrastructure (requires confirmation) |

**Branch Strategy:**
- `dev` → Deploy to dev environment (`automl-lite-dev-*`)
- `main` → Deploy to prod environment (`automl-lite-prod-*`)
- Feature branches → CI validation only (no deployment)
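The branch strategy is a pure function of the branch name, which makes it easy to check from a script. Below is a minimal sketch assuming the `automl-lite-<env>` prefixes listed above. It illustrates the mapping and is not how the workflows implement it; the function names are hypothetical. It accepts both a bare branch name and the `refs/heads/<branch>` form that GitHub Actions exposes in `GITHUB_REF` for pushes:

```python
from typing import Optional

PROJECT = "automl-lite"
DEPLOY_BRANCHES = {"dev": "dev", "main": "prod"}  # branch -> environment


def target_environment(branch: str) -> Optional[str]:
    """Environment a push to `branch` deploys to, or None for CI-only branches."""
    name = branch.removeprefix("refs/heads/")
    return DEPLOY_BRANCHES.get(name)


def resource_prefix(branch: str) -> Optional[str]:
    """Resource name prefix such as 'automl-lite-dev', or None if nothing deploys."""
    env = target_environment(branch)
    return f"{PROJECT}-{env}" if env else None
```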

## Key Docs
151265
0 commit comments