Commit 4cd7b31

docs: update copilot instructions with training container details
Expanded the training container section to include entry point, preprocessing logic, problem type detection, model training specifics, and report generation. Added troubleshooting tips for low model accuracy and DynamoDB Decimal errors. Updated schema sync pattern for clarity.
1 parent 2e7e921 commit 4cd7b31

1 file changed

Lines changed: 25 additions & 8 deletions

.github/copilot-instructions.md

@@ -51,10 +51,14 @@ Required env vars in training container: `DATASET_ID`, `TARGET_COLUMN`, `JOB_ID`
 
 ### Training Container (Python)
 
-- **Preprocessing** (`preprocessor.py`): Auto-detects ID columns, uses `feature-engine` for constant/duplicate detection
-- **Problem type**: `<20 unique values OR <5% unique ratio` = classification
-- **Model training** (`model_trainer.py`): FLAML with `['lgbm', 'rf', 'extra_tree']` - xgboost excluded (bugs)
-- **Multiclass**: Explicitly set `metric='accuracy'`
+Located in `backend/training/`, runs as Docker container in AWS Batch:
+
+- **Entry point** (`train.py`): Orchestrates 7-step pipeline (download → EDA → preprocess → train → reports → save → update status)
+- **Preprocessing** (`preprocessor.py`): Auto-detects ID columns using regex patterns, uses `feature-engine` for constant/duplicate detection
+- **Problem type detection**: `<20 unique values OR <5% unique ratio` = classification
+- **Model training** (`model_trainer.py`): FLAML with `['lgbm', 'rf', 'extra_tree']` - xgboost excluded due to `best_iteration` bugs
+- **Multiclass**: Explicitly set `metric='accuracy'` (FLAML's auto-detection unreliable)
+- **Reports**: Generates both EDA (`sweetviz`) and training reports with feature importance charts
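
Note on the problem type rule in the list above: the following is a minimal sketch of how the `<20 unique values OR <5% unique ratio` heuristic could be applied to a target column. The function name, the parameter names, and the decision to ignore missing values are assumptions made for illustration; the actual logic in the training container may differ.

```python
from typing import Hashable, Iterable, Optional


def detect_problem_type(
    target: Iterable[Optional[Hashable]],
    max_classes: int = 20,           # "<20 unique values" threshold from the docs
    max_unique_ratio: float = 0.05,  # "<5% unique ratio" threshold from the docs
) -> str:
    """Return "classification" or "regression" for a target column."""
    # Assumption: missing values (None) do not count as a class.
    values = [v for v in target if v is not None]
    if not values:
        raise ValueError("target column has no non-null values")

    n_unique = len(set(values))
    unique_ratio = n_unique / len(values)

    # Either condition alone is enough to treat the target as categorical.
    if n_unique < max_classes or unique_ratio < max_unique_ratio:
        return "classification"
    return "regression"
```

Because the two conditions are joined with OR, a numeric target with few rows (for example, 15 distinct prices in a 15-row sample) would be classified as classification under this sketch, which is worth keeping in mind when testing with small datasets.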
 
 ### Frontend (TypeScript)
 
@@ -100,6 +104,8 @@ python scripts/generate_architecture_diagram.py
 | Job stuck RUNNING | Missing DynamoDB perms | Add `dynamodb:UpdateItem` to Batch task role in `iam.tf` |
 | New train.py param ignored | Not in containerOverrides | Add to `batch_service.py` environment list |
 | Frontend CORS errors | Wrong API URL | Get from `terraform output api_gateway_url` |
+| Low model accuracy | ID columns in training | Check `preprocessor.py` ID detection patterns |
+| DynamoDB Decimal errors | Floats in metrics dict | Convert to `Decimal(str(v))` before saving |
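
Note on the `Decimal(str(v))` fix in the last table row: a minimal sketch of applying that conversion to a whole metrics dict before it is saved. The helper name `to_dynamodb_safe` and the assumption that the dict may contain nested dicts and lists are illustrative, not taken from the repository.

```python
from decimal import Decimal
from typing import Any


def to_dynamodb_safe(value: Any) -> Any:
    """Recursively replace floats with Decimal so the item can be stored."""
    if isinstance(value, float):
        # str() first keeps the short repr ("0.93") instead of the
        # full binary expansion that Decimal(0.93) would produce.
        return Decimal(str(value))
    if isinstance(value, dict):
        return {key: to_dynamodb_safe(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_dynamodb_safe(item) for item in value]
    # ints, strings, bools, None, and existing Decimals pass through unchanged.
    return value


metrics = {"accuracy": 0.93, "n_classes": 3, "per_class_f1": [0.9, 0.95]}
safe_metrics = to_dynamodb_safe(metrics)
```

This sketch does not handle `float("nan")` or `float("inf")`; if a metric can take those values, it would need to be dropped or replaced before saving.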
 
 ## File Reference by Task
 
@@ -112,16 +118,17 @@ python scripts/generate_architecture_diagram.py
 ## Schema Sync Pattern
 
 Backend Pydantic and Frontend TypeScript schemas must match. When adding fields:
-1. `backend/api/models/schemas.py` - Add to Pydantic model
-2. `frontend/lib/api.ts` - Add to TypeScript interface
-3. Example: `JobResponse` (backend) ↔ `JobDetails` (frontend)
+1. `backend/api/models/schemas.py` - Add to Pydantic model (e.g., `JobResponse`)
+2. `frontend/lib/api.ts` - Add to TypeScript interface (e.g., `JobDetails`)
+3. Key pairs: `JobResponse` ↔ `JobDetails`, `DatasetMetadata` ↔ `DatasetMetadata`, `TrainResponse` ↔ `TrainResponse`
 
 ## Debugging
 
 - Lambda logs: `/aws/lambda/automl-lite-{env}-api`
 - Batch logs: `/aws/batch/automl-lite-{env}-training`
 - Local API: `http://localhost:8000/docs` (Swagger UI)
 - Env var mismatch: Compare `batch_service.py` containerOverrides with `train.py` os.getenv()
+- Training issues: Check `dropped_columns` in preprocessing_info for filtered features
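
Note on the `dropped_columns` tip above: a rough sketch of how name-based ID detection might feed a `dropped_columns` record, to show what to look for when a useful feature disappears or an ID column slips through. The regex patterns, the function name, and the shape of the returned dict are all hypothetical; the real patterns live in `preprocessor.py` and may differ.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical patterns for illustration only.
ID_NAME_PATTERNS = [
    re.compile(r"(^|_)id$", re.IGNORECASE),      # "id", "customer_id"
    re.compile(r"^(uuid|guid)$", re.IGNORECASE),  # "uuid", "GUID"
]


def split_id_columns(
    columns: List[str], target: str
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Separate likely ID columns from features, never dropping the target."""
    kept: List[str] = []
    dropped: List[str] = []
    for name in columns:
        is_id = any(pattern.search(name) for pattern in ID_NAME_PATTERNS)
        if is_id and name != target:
            dropped.append(name)
        else:
            kept.append(name)
    # Assumed shape: a dict with a "dropped_columns" key, as in the tip above.
    return kept, {"dropped_columns": dropped}


kept, info = split_id_columns(
    ["customer_id", "age", "UUID", "paid", "label"], target="label"
)
```

In this sketch `paid` is kept because the pattern requires `id` to be the whole name or to follow an underscore; a looser pattern such as `id$` would drop it, which is the kind of false positive the `dropped_columns` check helps catch.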
 
 ## Utility Scripts
 
@@ -131,9 +138,19 @@ Backend Pydantic and Frontend TypeScript schemas must match. When adding fields:
 | `scripts/predict.py` | Make predictions with trained models (Docker) |
 | `scripts/generate_architecture_diagram.py` | Generate AWS architecture diagrams |
 
+## CI/CD Workflows (`.github/workflows/`)
+
+| Workflow | Trigger | Purpose |
+|----------|---------|---------|
+| `deploy-lambda-api.yml` | Push to main/dev | Deploy FastAPI to Lambda |
+| `deploy-training-container.yml` | Push to main/dev | Build & push training image to ECR |
+| `deploy-infrastructure.yml` | Manual | Terraform apply |
+| `ci-terraform.yml` | PR | Terraform validate & plan |
+
 ## Key Docs
 
-- `docs/LESSONS_LEARNED.md` - Critical debugging insights
+- `docs/LESSONS_LEARNED.md` - Critical debugging insights (read this first for troubleshooting)
 - `docs/QUICKSTART.md` - Deployment guide
 - `.github/SETUP_CICD.md` - CI/CD with GitHub Actions
 - `infrastructure/terraform/ARCHITECTURE_DECISIONS.md` - Why Lambda + Batch split
+- `.github/git-commit-messages-instructions.md` - Commit message conventions
