@@ -51,10 +51,14 @@ Required env vars in training container: `DATASET_ID`, `TARGET_COLUMN`, `JOB_ID`
 
 ### Training Container (Python)
 
-- **Preprocessing** (`preprocessor.py`): Auto-detects ID columns, uses `feature-engine` for constant/duplicate detection
-- **Problem type**: `<20 unique values OR <5% unique ratio` = classification
-- **Model training** (`model_trainer.py`): FLAML with `['lgbm', 'rf', 'extra_tree']` - xgboost excluded (bugs)
-- **Multiclass**: Explicitly set `metric='accuracy'`
+Located in `backend/training/`, runs as Docker container in AWS Batch:
+
+- **Entry point** (`train.py`): Orchestrates 7-step pipeline (download → EDA → preprocess → train → reports → save → update status)
+- **Preprocessing** (`preprocessor.py`): Auto-detects ID columns using regex patterns, uses `feature-engine` for constant/duplicate detection
+- **Problem type detection**: `<20 unique values OR <5% unique ratio` = classification
+- **Model training** (`model_trainer.py`): FLAML with `['lgbm', 'rf', 'extra_tree']` - xgboost excluded due to `best_iteration` bugs
+- **Multiclass**: Explicitly set `metric='accuracy'` (FLAML's auto-detection unreliable)
+- **Reports**: Generates both EDA (`sweetviz`) and training reports with feature importance charts
 
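The problem type rule in the list above (`<20 unique values OR <5% unique ratio` = classification) can be sketched in plain Python. This is a minimal illustration, not the code in `preprocessor.py`: the function name `detect_problem_type`, its parameters, and the handling of missing values are all assumptions.

```python
from typing import Sequence


def detect_problem_type(
    values: Sequence[object],
    max_unique: int = 20,            # threshold taken from the rule above
    max_unique_ratio: float = 0.05,  # 5% unique ratio, also from the rule above
) -> str:
    """Return 'classification' or 'regression' for a target column.

    Hypothetical helper: None is treated as a missing value and ignored.
    """
    non_null = [v for v in values if v is not None]
    if not non_null:
        raise ValueError("target column has no non-null values")

    n_unique = len(set(non_null))
    unique_ratio = n_unique / len(non_null)

    # Either condition alone is enough to call the target categorical.
    if n_unique < max_unique or unique_ratio < max_unique_ratio:
        return "classification"
    return "regression"


labels = ["cat", "dog", "bird"] * 100      # 3 distinct values
prices = [i * 1.5 for i in range(1000)]    # 1000 distinct values
print(detect_problem_type(labels))
print(detect_problem_type(prices))
```

Because the two conditions are joined with OR, a numeric target with few distinct values (for example, ratings from 1 to 5) is classified as a classification target under this rule.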
 ### Frontend (TypeScript)
 
@@ -100,6 +104,8 @@ python scripts/generate_architecture_diagram.py
 | Job stuck RUNNING | Missing DynamoDB perms | Add `dynamodb:UpdateItem` to Batch task role in `iam.tf` |
 | New train.py param ignored | Not in containerOverrides | Add to `batch_service.py` environment list |
 | Frontend CORS errors | Wrong API URL | Get from `terraform output api_gateway_url` |
+| Low model accuracy | ID columns in training | Check `preprocessor.py` ID detection patterns |
+| DynamoDB Decimal errors | Floats in metrics dict | Convert to `Decimal(str(v))` before saving |
 
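The `Decimal(str(v))` fix in the last table row can be applied to a whole metrics dict before it is written to DynamoDB. The sketch below is one way to do that recursively; the helper name `to_dynamodb_safe` and the sample metrics are hypothetical, and the project's own code may handle this differently.

```python
from decimal import Decimal
from typing import Any


def to_dynamodb_safe(value: Any) -> Any:
    """Recursively replace floats with Decimal so the item can be saved.

    Hypothetical helper. Going through str() keeps the short decimal
    representation (0.95) instead of the full binary expansion of the float.
    """
    if isinstance(value, float):
        return Decimal(str(value))
    if isinstance(value, dict):
        return {key: to_dynamodb_safe(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_dynamodb_safe(item) for item in value]
    return value  # str, int, bool, None, Decimal pass through unchanged


metrics = {"accuracy": 0.95, "fold_scores": [0.9, 0.1], "n_classes": 3}
print(to_dynamodb_safe(metrics))
```

Note that this sketch does not filter `NaN` or infinite values; those would need separate handling before saving.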
 ## File Reference by Task
 
@@ -112,16 +118,17 @@ python scripts/generate_architecture_diagram.py
 ## Schema Sync Pattern
 
 Backend Pydantic and Frontend TypeScript schemas must match. When adding fields:
-1. `backend/api/models/schemas.py` - Add to Pydantic model
-2. `frontend/lib/api.ts` - Add to TypeScript interface
-3. Example: `JobResponse` (backend) ↔ `JobDetails` (frontend)
+1. `backend/api/models/schemas.py` - Add to Pydantic model (e.g., `JobResponse`)
+2. `frontend/lib/api.ts` - Add to TypeScript interface (e.g., `JobDetails`)
+3. Key pairs: `JobResponse` ↔ `JobDetails`, `DatasetMetadata` ↔ `DatasetMetadata`, `TrainResponse` ↔ `TrainResponse`
 
 ## Debugging
 
 - Lambda logs: `/aws/lambda/automl-lite-{env}-api`
 - Batch logs: `/aws/batch/automl-lite-{env}-training`
 - Local API: `http://localhost:8000/docs` (Swagger UI)
 - Env var mismatch: Compare `batch_service.py` containerOverrides with `train.py` os.getenv()
+- Training issues: Check `dropped_columns` in preprocessing_info for filtered features
 
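The env var mismatch check above amounts to comparing the names passed in `containerOverrides` against the names the container reads. A minimal sketch, assuming the standard AWS Batch override shape (`environment` as a list of `name`/`value` pairs) and using the three variable names visible in this section; the function name `missing_env_vars` and the sample values are hypothetical.

```python
# Names the training container is documented to require; the real list in
# train.py may be longer.
REQUIRED_ENV_VARS = ("DATASET_ID", "TARGET_COLUMN", "JOB_ID")


def missing_env_vars(container_overrides: dict, required=REQUIRED_ENV_VARS) -> list[str]:
    """Return required names that are absent from the overrides' environment list."""
    passed = {entry["name"] for entry in container_overrides.get("environment", [])}
    return [name for name in required if name not in passed]


# Example overrides with TARGET_COLUMN accidentally left out.
overrides = {
    "environment": [
        {"name": "DATASET_ID", "value": "ds-123"},
        {"name": "JOB_ID", "value": "job-456"},
    ]
}
print(missing_env_vars(overrides))  # ['TARGET_COLUMN']
```

Running a check like this before submitting a job surfaces the "New train.py param ignored" symptom early, instead of after the container starts with a missing value.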
 ## Utility Scripts
 
@@ -131,9 +138,19 @@ Backend Pydantic and Frontend TypeScript schemas must match. When adding fields:
 | `scripts/predict.py` | Make predictions with trained models (Docker) |
 | `scripts/generate_architecture_diagram.py` | Generate AWS architecture diagrams |
 
+## CI/CD Workflows (`.github/workflows/`)
+
+| Workflow | Trigger | Purpose |
+|----------|---------|---------|
+| `deploy-lambda-api.yml` | Push to main/dev | Deploy FastAPI to Lambda |
+| `deploy-training-container.yml` | Push to main/dev | Build & push training image to ECR |
+| `deploy-infrastructure.yml` | Manual | Terraform apply |
+| `ci-terraform.yml` | PR | Terraform validate & plan |
+
 ## Key Docs
 
-- `docs/LESSONS_LEARNED.md` - Critical debugging insights
+- `docs/LESSONS_LEARNED.md` - Critical debugging insights (read this first for troubleshooting)
 - `docs/QUICKSTART.md` - Deployment guide
 - `.github/SETUP_CICD.md` - CI/CD with GitHub Actions
 - `infrastructure/terraform/ARCHITECTURE_DECISIONS.md` - Why Lambda + Batch split
+- `.github/git-commit-messages-instructions.md` - Commit message conventions