
Design Document: Car Price Prediction AI

1. System Architecture

The project follows a standard Machine Learning pipeline architecture: Data Source -> Preprocessing -> Model Training -> Model Deployment -> User Interface

2. Data Design

The model utilizes the following key features:

Categorical: Brand, Model, Fuel Type, Transmission, Owner Type.
Numerical: Year of Manufacture, Kilometers Driven, Engine CC, Power (bhp), Mileage (kmpl).
Target Variable: Price (INR or USD; one currency used consistently across the whole dataset).
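Pinning the schema down once in code keeps column names consistent across model.py, processor.py and app.py. The sketch below is a hypothetical Python definition; the snake_case field names are assumptions, not the dataset's actual headers.

```python
from dataclasses import dataclass, asdict
from typing import Optional

# Column groups from the Data Design section (names are hypothetical).
CATEGORICAL_FEATURES = ["brand", "model", "fuel_type", "transmission", "owner_type"]
NUMERICAL_FEATURES = ["year", "kilometers_driven", "engine_cc", "power_bhp", "mileage_kmpl"]
TARGET = "price"


@dataclass
class CarRecord:
    """One car listing; `price` stays None for a request that still needs a prediction."""
    brand: str
    model: str
    fuel_type: str
    transmission: str
    owner_type: str
    year: int
    kilometers_driven: float
    engine_cc: Optional[float]   # may be missing before median imputation
    power_bhp: Optional[float]   # may be missing before median imputation
    mileage_kmpl: float
    price: Optional[float] = None
```

Because the field order matches the column lists, `asdict(record)` yields a dict whose keys follow the categorical, numerical and target order.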

3. Machine Learning Pipeline

A. Data Cleaning

Outlier detection for 'Price' and 'Kilometers Driven' using the interquartile range (IQR) method.
Imputation of missing values for 'Engine' and 'Power' using the median.
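A minimal pandas sketch of both cleaning steps follows. It assumes the common Tukey rule (values outside Q1 − 1.5·IQR and Q3 + 1.5·IQR count as outliers) and that flagged rows are dropped rather than capped; the multiplier `k`, the function names and the column names are illustrative. The medians are computed once and reused, so new data is filled with the training medians.

```python
from typing import List

import pandas as pd


def remove_iqr_outliers(df: pd.DataFrame, columns: List[str], k: float = 1.5) -> pd.DataFrame:
    """Keep only rows whose values in `columns` lie within [Q1 - k*IQR, Q3 + k*IQR]."""
    keep = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        # between() is inclusive at both ends; NaN compares as False, so those rows are dropped too
        keep &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[keep]


def impute_median(df: pd.DataFrame, columns: List[str], medians: "pd.Series | None" = None):
    """Fill NaNs in `columns` with medians.

    Call with medians=None on the training data to compute them, then pass the
    returned medians when cleaning validation data or user inputs.
    """
    if medians is None:
        medians = df[columns].median()
    return df.fillna(medians), medians
```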

B. Feature Engineering

Age calculation: Current Year - Year of Manufacture.
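Log transformation of the target variable (Price) to reduce its right skew; predictions are converted back to prices with the exponential (the inverse transform) before evaluation and display.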
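The two feature-engineering steps can be sketched as below. The reference year is passed in explicitly (for example, the current year at training time) so the function is deterministic; that parameter and the column names `year` and `age` are assumptions. The natural log is used here; `np.log1p`/`np.expm1` would be an equally valid pairing.

```python
import numpy as np
import pandas as pd


def add_age(df: pd.DataFrame, reference_year: int) -> pd.DataFrame:
    """Return a copy of df with age = reference_year - year of manufacture."""
    out = df.copy()
    out["age"] = reference_year - out["year"]
    return out


def log_target(price):
    """Compress the right-skewed price distribution (prices must be positive)."""
    return np.log(price)


def inverse_log_target(log_price):
    """Map model outputs from log scale back to prices."""
    return np.exp(log_price)
```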

C. Model Selection

Algorithm: Random Forest Regressor.
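Reasoning: Captures non-linear relationships and feature interactions that simple Linear Regression cannot, works well with encoded categorical features without needing feature scaling, and averages many bootstrapped trees, which reduces the overfitting of a single decision tree.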
Evaluation Metrics: Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), computed on the original price scale after reversing the log transform.
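One way to wire the model choice and metrics together in scikit-learn is sketched below, under stated assumptions: median imputation and one-hot encoding live inside the same Pipeline as the forest, `age` replaces `year` as a feature, the split is 80/20, and `n_estimators=100` is an arbitrary starting value. Column names are hypothetical. Predictions are exponentiated before MAE and RMSE are computed, so both are reported in price units.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

CATEGORICAL = ["brand", "model", "fuel_type", "transmission", "owner_type"]
NUMERICAL = ["age", "kilometers_driven", "engine_cc", "power_bhp", "mileage_kmpl"]


def build_pipeline(seed: int = 0) -> Pipeline:
    preprocess = ColumnTransformer([
        # Unseen categories at prediction time become all-zero rows instead of errors
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        # Medians are learned from the training split only; no scaling, trees don't need it
        ("num", SimpleImputer(strategy="median"), NUMERICAL),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("forest", RandomForestRegressor(n_estimators=100, random_state=seed)),
    ])


def train_and_evaluate(df: pd.DataFrame, seed: int = 0):
    X = df[CATEGORICAL + NUMERICAL]
    y_log = np.log(df["price"])  # train on log(price)
    X_train, X_test, y_train, y_test = train_test_split(X, y_log, test_size=0.2, random_state=seed)
    pipe = build_pipeline(seed).fit(X_train, y_train)
    # Undo the log transform before scoring so MAE/RMSE are in price units
    pred = np.exp(pipe.predict(X_test))
    actual = np.exp(y_test)
    mae = mean_absolute_error(actual, pred)
    rmse = np.sqrt(mean_squared_error(actual, pred))
    return pipe, mae, rmse
```

Keeping the imputer and encoder inside the Pipeline means the fitted preprocessing is saved together with the forest, which is what lets inference reproduce the training transformations exactly.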

4. Component Design

app.py: The main entry point using Flask/Streamlit to serve the model.
model.py: Contains the logic for training and saving the model as a .pkl file.
processor.py: A dedicated script to ensure that real-time user inputs are transformed exactly like the training data.
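A minimal sketch of how model.py and processor.py might split the work, assuming the whole fitted scikit-learn Pipeline (imputer, encoder and forest) is stored in one .pkl file. `joblib` is installed alongside scikit-learn; the function names, field names and `reference_year` parameter are hypothetical. Unpickling can execute arbitrary code, so only load .pkl files you produced yourself.

```python
import joblib
import pandas as pd

RAW_FIELDS = ["brand", "model", "fuel_type", "transmission", "owner_type",
              "year", "kilometers_driven", "engine_cc", "power_bhp", "mileage_kmpl"]
NUMERIC_COLUMNS = ["age", "kilometers_driven", "engine_cc", "power_bhp", "mileage_kmpl"]
FEATURE_COLUMNS = ["brand", "model", "fuel_type", "transmission", "owner_type"] + NUMERIC_COLUMNS


# model.py: persist and restore the fitted pipeline as a single file
def save_model(pipeline, path: str = "model.pkl") -> None:
    joblib.dump(pipeline, path)


def load_model(path: str = "model.pkl"):
    return joblib.load(path)  # only load trusted files


# processor.py: turn a raw request payload into a one-row frame shaped like training data
def prepare_input(payload: dict, reference_year: int) -> pd.DataFrame:
    missing = [f for f in RAW_FIELDS if f not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    row = {f: payload[f] for f in RAW_FIELDS}
    row["age"] = reference_year - int(row.pop("year"))  # same derivation as in training
    df = pd.DataFrame([row], columns=FEATURE_COLUMNS)
    # JSON nulls become NaN so the pipeline's median imputer can fill them
    for col in NUMERIC_COLUMNS:
        df[col] = pd.to_numeric(df[col])
    return df
```

The remaining transformations (imputation, encoding) are deliberately not repeated here: they are applied by the loaded pipeline itself, so there is a single source of truth for them.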

5. UI/UX Flow

User enters car specifications.
The frontend sends a POST request to the backend.
The backend preprocesses the inputs through processor.py (the same encoding, and scaling if any, used during training) and passes them to the loaded model.
The predicted price is returned and displayed with an approximate price range.
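A Random Forest does not output a calibrated confidence interval. One simple heuristic, sketched below, is to report the spread of the individual trees' predictions (here the 10th to 90th percentile, an arbitrary choice); quantile regression forests or conformal prediction are more principled alternatives. The sketch assumes the forest was trained on log(price) and receives already-preprocessed numeric features; `predict_with_range` is a hypothetical name.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def predict_with_range(forest: RandomForestRegressor, features,
                       lower_pct: float = 10.0, upper_pct: float = 90.0):
    """Return (point, low, high) price estimates for each input row.

    `features` is the preprocessed numeric matrix (NumPy array or SciPy sparse).
    The range is the spread across trees, not a statistical confidence interval.
    """
    # Shape (n_trees, n_rows), still on the log-price scale
    per_tree = np.stack([tree.predict(features) for tree in forest.estimators_])
    point = np.exp(per_tree.mean(axis=0))  # equivalent to np.exp(forest.predict(features))
    low = np.exp(np.percentile(per_tree, lower_pct, axis=0))
    high = np.exp(np.percentile(per_tree, upper_pct, axis=0))
    return point, low, high
```

If the forest sits inside a scikit-learn Pipeline `pipe`, the inputs would first go through the preceding steps, e.g. `predict_with_range(pipe[-1], pipe[:-1].transform(X))`.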