The project follows a standard Machine Learning pipeline architecture:
Data Source -> Preprocessing -> Model Training -> Model Deployment -> User Interface
The model utilizes the following key features:
- Categorical: Brand, Model, Fuel Type, Transmission, Owner Type.
- Numerical: Year of Manufacture, Kilometers Driven, Engine CC, Power (bhp), Mileage (kmpl).
- Target Variable: Price (INR/USD).
- Outlier detection for 'Price' and 'Kilometers Driven' using the IQR method.
- Imputation of missing values for 'Engine' and 'Power' using the median.
- Age calculation: Age = Current Year - Year of Manufacture.
- Log transformation of the target variable (Price) to handle skewness.
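The preprocessing steps above can be sketched with the standard library alone. This is an illustrative sketch, not the project's actual code: the column names (`price`, `km_driven`, `engine_cc`, `power_bhp`, `year`) are hypothetical, the 1.5 multiplier is the conventional IQR fence, dropping flagged rows (rather than capping them) is an assumption, and `log1p` is assumed as the specific log transform.

```python
import math
import statistics

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences of the IQR rule for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def preprocess(rows, current_year):
    """Clean a list of row dicts; assumes price, km_driven and year are never missing."""
    # 1. Outlier detection on price and km_driven (assumption: flagged rows are dropped).
    bounds = {c: iqr_bounds([r[c] for r in rows]) for c in ("price", "km_driven")}
    kept = [
        r for r in rows
        if all(bounds[c][0] <= r[c] <= bounds[c][1] for c in bounds)
    ]

    # 2. Medians for imputation, computed on the rows that survived step 1.
    medians = {
        c: statistics.median(r[c] for r in kept if r[c] is not None)
        for c in ("engine_cc", "power_bhp")
    }

    cleaned = []
    for r in kept:
        row = dict(r)  # copy so the caller's data is not mutated
        for col, median in medians.items():
            if row[col] is None:
                row[col] = median
        # 3. Derived feature: vehicle age in years.
        row["age"] = current_year - row["year"]
        # 4. Log-transform the skewed target; log1p is safe for a price of 0.
        row["log_price"] = math.log1p(row["price"])
        cleaned.append(row)
    return cleaned
```

In a real pipeline the same logic would more likely be written with pandas, and the medians learned here would need to be stored so that inference uses identical values.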
- Algorithm: Random Forest Regressor.
- Reasoning: Captures non-linear relationships that simple Linear Regression cannot, works well with encoded categorical features, and averages many decision trees, which reduces the overfitting risk of any single tree.
- Evaluation Metrics: Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE).
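Because the target is log-transformed, the two metrics are easiest to interpret when computed on the original price scale. The sketch below assumes the model was trained on `log1p(price)`, so predictions are mapped back with `expm1` before scoring; the function names are illustrative.

```python
import math

def to_price(log_predictions):
    """Invert the assumed log1p transform to get prices in currency units."""
    return [math.expm1(v) for v in log_predictions]

def mae(y_true, y_pred):
    """Mean Absolute Error: average size of the error, in currency units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error: like MAE, but penalises large misses more."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))
```

An RMSE much larger than the MAE suggests that a few listings are predicted badly, which is worth checking against the outlier handling.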
- app.py: The main entry point using Flask/Streamlit to serve the model.
- model.py: Contains the logic for training and saving the model as a .pkl file.
- processor.py: A dedicated script to ensure that real-time user inputs are transformed exactly like the training data.
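One way to guarantee that inference inputs are transformed exactly like the training data is to learn the transformation state once, pickle it next to the model, and reuse it when serving. The sketch below is a hypothetical outline of what such a processor could look like; the function names, column names, and the one-hot encoding choice are assumptions, not the project's actual code.

```python
import pickle
import statistics

NUMERICAL = ("engine_cc", "power_bhp")
CATEGORICAL = ("fuel_type", "transmission")

def fit_processor(rows):
    """Learn imputation medians and category vocabularies from training rows."""
    return {
        "medians": {
            c: statistics.median(r[c] for r in rows if r[c] is not None)
            for c in NUMERICAL
        },
        "categories": {c: sorted({r[c] for r in rows}) for c in CATEGORICAL},
    }

def transform(state, row):
    """Turn one row dict into a flat feature list using the fitted state."""
    features = []
    for c in NUMERICAL:
        value = row.get(c)
        features.append(state["medians"][c] if value is None else value)
    for c in CATEGORICAL:
        # One-hot encode; a category unseen during training becomes all zeros.
        features.extend(1.0 if row.get(c) == cat else 0.0
                        for cat in state["categories"][c])
    return features

def save_state(state, path):
    """Persist the fitted state, e.g. alongside the model's .pkl file."""
    with open(path, "wb") as fh:
        pickle.dump(state, fh)

def load_state(path):
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

Keeping the state as a plain dictionary means it can be unpickled without importing any custom class, which avoids a common source of load errors when the serving code is laid out differently from the training code.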
- User enters car specifications.
- The frontend sends a POST request to the backend.
- The backend scales/encodes inputs and passes them to the loaded model.
- The predicted price is returned and displayed with a confidence range.
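The request flow above can be sketched independently of the web framework. The text does not say how the confidence range is derived, so this sketch assumes one plausible approach for a Random Forest: take each tree's prediction and report the 10th to 90th percentile spread. The `predict_per_tree` callable, the field names, and the response keys are all hypothetical, and predictions are assumed to be on a `log1p(price)` scale.

```python
import json
import math
import statistics

REQUIRED = ("brand", "model", "fuel_type", "transmission", "owner_type",
            "year", "km_driven", "engine_cc", "power_bhp", "mileage_kmpl")

def price_range(tree_log_preds):
    """Summarise per-tree log-price predictions as a point estimate and a range."""
    deciles = statistics.quantiles(tree_log_preds, n=10, method="inclusive")
    return {
        # The forest's prediction is the mean of its trees (here in log space).
        "predicted_price": round(math.expm1(statistics.fmean(tree_log_preds)), 2),
        "low": round(math.expm1(deciles[0]), 2),    # 10th percentile
        "high": round(math.expm1(deciles[-1]), 2),  # 90th percentile
    }

def handle_predict(body, predict_per_tree):
    """Process a POST body (JSON string) and return (status_code, json_response)."""
    try:
        specs = json.loads(body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "body must be valid JSON"})
    if not isinstance(specs, dict):
        return 400, json.dumps({"error": "body must be a JSON object"})
    missing = [field for field in REQUIRED if field not in specs]
    if missing:
        return 400, json.dumps({"error": "missing fields", "fields": missing})
    # predict_per_tree is expected to encode the inputs and query every tree.
    return 200, json.dumps(price_range(predict_per_tree(specs)))
```

In a Flask or Streamlit app, `handle_predict` would be called from the route or callback that receives the form data, with `predict_per_tree` wrapping the loaded model and the input processor.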