Design Document: Car Price Prediction AI

1. System Architecture

The project follows a standard Machine Learning pipeline architecture: Data Source -> Preprocessing -> Model Training -> Model Deployment -> User Interface

2. Data Design

The model utilizes the following key features:

  • Categorical: Brand, Model, Fuel Type, Transmission, Owner Type.
  • Numerical: Year of Manufacture, Kilometers Driven, Engine CC, Power (bhp), Mileage (kmpl).
  • Target Variable: Price (INR/USD).

3. Machine Learning Pipeline

A. Data Cleaning

  • Outlier detection for 'Price' and 'Kilometers Driven' using the IQR method.
  • Imputation of missing values for 'Engine' and 'Power' using the median.
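
The two cleaning steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the row layout, the column names (`km`, `engine`), and the conventional fence multiplier `k=1.5` are assumptions.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outliers(rows, column, k=1.5):
    """Keep only the rows whose value in `column` lies inside the IQR fences."""
    lower, upper = iqr_bounds([r[column] for r in rows], k)
    return [r for r in rows if lower <= r[column] <= upper]

def impute_median(rows, column):
    """Replace None in `column` with the median of the values that are present."""
    present = [r[column] for r in rows if r[column] is not None]
    median = statistics.median(present)
    return [{**r, column: median if r[column] is None else r[column]} for r in rows]

# Hypothetical sample rows; the last one has an implausible odometer reading.
sample = [
    {"km": 20000, "engine": 1197},
    {"km": 30000, "engine": None},
    {"km": 40000, "engine": 1498},
    {"km": 50000, "engine": 1197},
    {"km": 60000, "engine": 1498},
    {"km": 900000, "engine": 1197},
]
cleaned = impute_median(drop_outliers(sample, "km"), "engine")
```

With pandas, the same idea is usually written with `Series.quantile` and `fillna`. Either way, the fences and medians should be computed on the training split only and then reused at inference.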

B. Feature Engineering

  • Age calculation: Current Year - Year of Manufacture.
  • Log transformation of the target variable (Price) to handle skewness.
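
A minimal sketch of both engineered features follows. The function names are hypothetical, and `log1p`/`expm1` are an assumed variant of the log transform, chosen because they stay well-defined at a price of zero.

```python
import math
from datetime import date

def car_age(year_of_manufacture, current_year=None):
    """Age in years; pass current_year explicitly for reproducible results."""
    if current_year is None:
        current_year = date.today().year
    return current_year - year_of_manufacture

def to_log_price(price):
    """Compress the right-skewed price distribution before training."""
    return math.log1p(price)

def from_log_price(log_price):
    """Invert to_log_price so predictions are reported in currency units."""
    return math.expm1(log_price)

age = car_age(2015, current_year=2024)
round_trip = from_log_price(to_log_price(550000.0))
```

Fixing `current_year` at training time and storing it with the model keeps the age feature consistent between training and serving.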

C. Model Selection

  • Algorithm: Random Forest Regressor.
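  • Reasoning: Captures non-linear relationships and feature interactions that simple Linear Regression cannot model, and averaging many trees reduces the overfitting risk of a single decision tree. Categorical features still need to be encoded before training in most implementations.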
  • Evaluation Metrics: Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE).
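
For reference, both metrics can be computed as in the sketch below; the sample values are made up for illustration. If the model is trained on log-transformed prices, it is probably more useful to invert the transform first so that both errors are expressed in currency units.

```python
import math

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted prices."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared difference; penalizes large misses more."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

# Hypothetical actual and predicted prices, in the original currency scale.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
```

RMSE is never smaller than MAE; a large gap between the two suggests a few listings with very large errors.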

4. Component Design

  • app.py: The main entry point using Flask/Streamlit to serve the model.
  • model.py: Contains the logic for training and saving the model as a .pkl file.
  • processor.py: A dedicated script to ensure that real-time user inputs are transformed exactly like the training data.
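
One way processor.py could guarantee identical transformations is to learn the category vocabularies once from the training data and reuse the fitted object at inference. The sketch below is an assumption about how this might look: the class name, the field names, and the one-hot encoding are hypothetical, not taken from the project.

```python
class InputProcessor:
    """Hypothetical processor: fit on training rows, reuse for live inputs."""

    CATEGORICAL = ("brand", "fuel_type", "transmission")
    NUMERICAL = ("age", "km_driven", "engine_cc")

    def fit(self, rows):
        # Freeze the category order so feature columns never shift later.
        self.vocab_ = {c: sorted({r[c] for r in rows}) for c in self.CATEGORICAL}
        return self

    def transform(self, row):
        # Numerical features first, then one-hot blocks in a fixed order.
        features = [float(row[c]) for c in self.NUMERICAL]
        for c in self.CATEGORICAL:
            # An unseen category yields an all-zero block instead of an error.
            features.extend(1.0 if row[c] == v else 0.0 for v in self.vocab_[c])
        return features

training_rows = [
    {"brand": "Maruti", "fuel_type": "Petrol", "transmission": "Manual",
     "age": 5, "km_driven": 40000, "engine_cc": 1197},
    {"brand": "Hyundai", "fuel_type": "Diesel", "transmission": "Automatic",
     "age": 3, "km_driven": 25000, "engine_cc": 1498},
]
processor = InputProcessor().fit(training_rows)
vector = processor.transform(training_rows[0])
```

Saving the fitted processor together with the model in the same .pkl file would keep the two from drifting apart between releases.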

5. UI/UX Flow

  1. User enters car specifications.
  2. The frontend sends a POST request to the backend.
  3. The backend scales/encodes inputs and passes them to the loaded model.
  4. The predicted price is returned and displayed with a confidence range.
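
The document does not say how the confidence range in step 4 is derived. One possible approach, sketched here purely as an assumption, is to take percentiles of the individual tree predictions of the Random Forest. The function name and the 10th/90th percentile choice are hypothetical.

```python
import statistics

def prediction_with_range(tree_predictions, lower_pct=10, upper_pct=90):
    """Summarize per-tree predictions as a point estimate plus a spread."""
    # quantiles(n=100) returns 99 cut points; index i-1 is the i-th percentile.
    cuts = statistics.quantiles(tree_predictions, n=100, method="inclusive")
    return {
        "predicted_price": statistics.fmean(tree_predictions),
        "low": cuts[lower_pct - 1],
        "high": cuts[upper_pct - 1],
    }

# Hypothetical predictions from 11 trees for a single car.
per_tree = [400.0, 420.0, 440.0, 460.0, 480.0, 500.0,
            520.0, 540.0, 560.0, 580.0, 600.0]
result = prediction_with_range(per_tree)
```

With scikit-learn, the per-tree values could be collected by calling `predict` on each tree in the fitted forest's `estimators_` list. The resulting band reflects disagreement between trees, not a calibrated statistical confidence interval.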