SageMaker Test (note: this project uses the older SageMaker Python SDK v1)
This project builds a machine learning pipeline on AWS SageMaker that classifies mobile phones into price ranges. The workflow includes:
- Data Preparation: Load and split data using scikit-learn
- Data Storage: Upload training/testing data to Amazon S3
- Model Training: Train a RandomForest classifier using SageMaker's SKLearn container
- Model Deployment: Deploy the trained model as a real-time inference endpoint
- Predictions: Make predictions using the deployed endpoint
- Cleanup: Delete the endpoint and related resources to avoid unnecessary charges
Dataset:
- File: mob_price_classification_train.csv
- Task: Multi-class classification (predicting price_range)
- Features: 20 mobile phone specifications
- Train/test split: 85% training, 15% testing
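The notebook performs this split with scikit-learn's `train_test_split`. As a dependency-free illustration of the same 85/15 idea, the sketch below shuffles the rows with a fixed seed and writes the two CSV files used by this project. `split_rows` and `write_split` are hypothetical helper names, the file is assumed to have a header row and to fit in memory, and the selected rows (and possibly the test count, by one row) will differ from scikit-learn's split.

```python
import csv
import random


def split_rows(rows, test_fraction=0.15, seed=0):
    """Shuffle a copy of rows with a fixed seed; return (train_rows, test_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # deterministic for a given seed
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def write_split(src="mob_price_classification_train.csv",
                train_path="train-V-1.csv",
                test_path="test-V-1.csv"):
    """Read src, split it 85/15, and write both parts with the original header."""
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    train_rows, test_rows = split_rows(rows)
    for path, part in ((train_path, train_rows), (test_path, test_rows)):
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(part)
    return len(train_rows), len(test_rows)
```

Fixing the seed keeps the split reproducible between notebook runs, so the files uploaded to S3 stay the same.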
Prerequisites:
- AWS account with SageMaker access
- IAM role with SageMaker permissions
- Python 3.x with the dependencies listed in requirements.txt
- S3 bucket for storing the training data
WARNING (IAM and networking): this setup is for testing only. Use least-privilege IAM roles and scoped policies; do not grant broad or full access in production. Likewise, avoid deploying production resources into default or public subnets; prefer private subnets with a NAT gateway or VPC endpoints, and properly scoped security groups.
Project files:
- research.ipynb - Main notebook containing the complete SageMaker workflow
- script.py - Training script executed by SageMaker
- train-V-1.csv - Training dataset
- test-V-1.csv - Testing dataset
Run the cells in research.ipynb to:
- Load and explore the dataset
- Split data into train/test sets
- Upload data to S3
- Train a RandomForest model on SageMaker
- Deploy the model to an endpoint
- Make predictions on test data
- Clean up resources
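The notebook steps above map onto a handful of SageMaker Python SDK calls. The sketch below assumes SDK v1, where the estimator takes `train_instance_count` and `train_instance_type` (SDK v2 renamed these to `instance_count` and `instance_type`). The bucket, S3 prefix, instance types, and `framework_version` are placeholders rather than values from this project, `build_estimator_kwargs` and `run_pipeline` are hypothetical helper names, and the CSVs are assumed to hold the feature columns plus `price_range`. AWS is contacted only when `run_pipeline` is called, and doing so creates billable resources.

```python
def build_estimator_kwargs(role, framework_version="0.20.0"):
    """Keyword arguments for sagemaker.sklearn.estimator.SKLearn (SDK v1 names).

    framework_version and the instance type are placeholders; choose values
    supported by your SDK version and region.
    """
    return {
        "entry_point": "script.py",
        "role": role,
        "train_instance_count": 1,             # SDK v2: instance_count
        "train_instance_type": "ml.m5.large",  # SDK v2: instance_type
        "framework_version": framework_version,
        "py_version": "py3",
        # Passed to script.py as --n_estimators 100 --random_state 0
        "hyperparameters": {"n_estimators": 100, "random_state": 0},
    }


def run_pipeline(role, bucket, prefix="mobile-price-sklearn"):
    """Upload data, train, deploy, predict on a few test rows, then delete the endpoint."""
    import pandas as pd
    import sagemaker
    from sagemaker.sklearn.estimator import SKLearn

    sess = sagemaker.Session()
    train_s3 = sess.upload_data(path="train-V-1.csv", bucket=bucket, key_prefix=prefix)
    test_s3 = sess.upload_data(path="test-V-1.csv", bucket=bucket, key_prefix=prefix)

    estimator = SKLearn(**build_estimator_kwargs(role))
    # Channel names become SM_CHANNEL_TRAIN / SM_CHANNEL_TEST inside the container.
    estimator.fit({"train": train_s3, "test": test_s3}, wait=True)

    predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
    try:
        test_df = pd.read_csv("test-V-1.csv")
        features = test_df.drop(columns=["price_range"])
        print(predictor.predict(features.values[:5]))
    finally:
        # A running endpoint keeps incurring charges, so always tear it down.
        predictor.delete_endpoint()
```

Keeping the estimator arguments in a plain dictionary makes the v1/v2 naming difference easy to spot and change in one place.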
The RandomForest classifier is trained with:
- n_estimators: 100
- random_state: 0
The model is evaluated with the accuracy score and a classification report on the held-out test data (15% of the dataset).
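A training script along these lines would produce that model and evaluation. This is a minimal sketch of what script.py might contain, not the project's actual file: it assumes the `train`/`test` channel names, the file names `train-V-1.csv`/`test-V-1.csv`, and CSVs holding the feature columns plus `price_range` with no index column. `SM_MODEL_DIR` and `SM_CHANNEL_<NAME>` are the environment variables SageMaker script mode sets in the container, and the SKLearn serving container loads the model through `model_fn`.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters (sent as CLI flags) and SageMaker paths (from SM_* env vars)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--n_estimators", type=int, default=100)
    parser.add_argument("--random_state", type=int, default=0)
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "model"))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "."))
    parser.add_argument("--test", default=os.environ.get("SM_CHANNEL_TEST", "."))
    parser.add_argument("--train-file", default="train-V-1.csv")
    parser.add_argument("--test-file", default="test-V-1.csv")
    return parser.parse_args(argv)


def model_fn(model_dir):
    """Called by the SKLearn serving container to load the trained model."""
    import joblib
    return joblib.load(os.path.join(model_dir, "model.joblib"))


def train(args):
    """Fit a RandomForest, print accuracy and a classification report, save the model."""
    # Heavy imports stay local so argument parsing works without them.
    import joblib
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score, classification_report

    train_df = pd.read_csv(os.path.join(args.train, args.train_file))
    test_df = pd.read_csv(os.path.join(args.test, args.test_file))
    target = "price_range"
    features = [c for c in train_df.columns if c != target]

    model = RandomForestClassifier(n_estimators=args.n_estimators,
                                   random_state=args.random_state)
    model.fit(train_df[features], train_df[target])

    preds = model.predict(test_df[features])
    print("Accuracy:", accuracy_score(test_df[target], preds))
    print(classification_report(test_df[target], preds))

    os.makedirs(args.model_dir, exist_ok=True)
    joblib.dump(model, os.path.join(args.model_dir, "model.joblib"))


# In script.py, end with the entry point below. The guard matters because the
# serving container imports this module only to find model_fn.
#     if __name__ == "__main__":
#         train(parse_args())
```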
Cleanup (do this once you are finished, to avoid ongoing charges):
- Delete the SageMaker endpoint
  - AWS Console → SageMaker → Endpoints → Select endpoint → Delete
  - This is the most important step: a running endpoint keeps incurring hourly charges until it is deleted.
- Delete the endpoint configuration
  - AWS Console → SageMaker → Endpoint Configurations → Select config → Delete
- Delete the model
  - AWS Console → SageMaker → Models → Select model → Delete
- Clean the S3 bucket
  - AWS Console → S3 → Select bucket → Delete all training/test data and artifacts
  - Delete the bucket if it was created specifically for this project
- Remove the IAM role (if it was created specifically for this project)
  - AWS Console → IAM → Roles → Select role → Delete
- Delete CloudWatch logs (optional)
  - AWS Console → CloudWatch → Logs → Delete the SageMaker log groups (their names start with /aws/sagemaker/)
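The first three console steps can also be scripted. The sketch below assumes a boto3 SageMaker client (`boto3.client("sagemaker")`) is passed in, which also lets it be exercised with a stand-in object; `delete_sagemaker_endpoint_stack` is a hypothetical helper name. It looks up the endpoint's configuration and the models that configuration references, then deletes the endpoint, the configuration, and the models in that order. S3, IAM, and CloudWatch cleanup are left to the console steps above.

```python
def delete_sagemaker_endpoint_stack(sm_client, endpoint_name):
    """Delete an endpoint, its endpoint configuration, and the models it served.

    sm_client is expected to behave like boto3.client("sagemaker").
    Returns the deleted resource names in deletion order.
    """
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [v["ModelName"] for v in config["ProductionVariants"]]

    # The endpoint goes first: it is the resource that keeps billing.
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for name in model_names:
        sm_client.delete_model(ModelName=name)
    return [endpoint_name, config_name, *model_names]
```

With real credentials this would be called as `delete_sagemaker_endpoint_stack(boto3.client("sagemaker"), "<endpoint-name>")`, using the endpoint name shown in the SageMaker console.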
Credits: MLOps Udemy course by Krish Naik