Description
The current machine learning models in this project (specifically the Random Forest implementation) show a large gap between Accuracy (~90%) and Recall (~50%). This gap points to a class imbalance issue: the model struggles to identify the minority class.
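To show how ~90% accuracy can coexist with ~50% recall, here is a toy calculation with made-up confusion-matrix counts. These are not the project's data. They assume a 1,000-row test set in which 10% of rows belong to the minority class:

```python
# Hypothetical confusion-matrix counts (illustrative only, not project data):
# 1,000 test rows, of which 100 (10%) are minority-class positives.
tp = 50   # minority rows correctly flagged
fn = 50   # minority rows missed
fp = 20   # majority rows wrongly flagged
tn = 880  # majority rows correctly left alone

total = tp + tn + fp + fn
accuracy = (tp + tn) / total
recall = tp / (tp + fn)
precision = tp / (tp + fp)

# A model that always predicts the majority class never finds a positive,
# yet it still scores 90% accuracy on this split.
baseline_accuracy = (tn + fp) / total

print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f}")
print(f"always-majority accuracy={baseline_accuracy:.2f}")
```

Because the majority class dominates the accuracy denominator, half the minority rows can be missed while accuracy still looks strong. Recall exposes the misses directly.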
Proposed Improvement
I have implemented a pipeline that uses:
- SMOTE (Synthetic Minority Over-sampling Technique) to balance the training set.
- StandardScaler to standardize features to zero mean and unit variance.
Results
These changes give a better balance between precision and recall:
- Recall: Improved from 0.50 to 0.63 (+26% relative gain)
- F1-Score: Improved from 0.59 to 0.64
- Precision: 0.65
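F1 is the harmonic mean of precision and recall, so the reported figures can be cross-checked against each other. The sketch below assumes that the baseline and new numbers refer to the same class and were measured on the same held-out split. The helper names are hypothetical.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_from_f1(f1_score: float, recall: float) -> float:
    """Invert F1 = 2PR / (P + R) to solve for precision P."""
    return f1_score * recall / (2 * recall - f1_score)


# Figures reported in the Results list.
new_f1 = f1(0.65, 0.63)                        # should round to the reported 0.64
recall_gain = (0.63 - 0.50) / 0.50             # relative gain in recall
old_precision = precision_from_f1(0.59, 0.50)  # baseline precision implied by old F1 and recall

print(f"F1 check: {new_f1:.2f}")
print(f"relative recall gain: {recall_gain:.0%}")
print(f"implied baseline precision: {old_precision:.2f}")
```

The new precision and recall reproduce the reported F1 of 0.64, and the recall change is a 26% relative gain. Under the same-split assumption, the baseline F1 and recall imply a baseline precision of about 0.72. That is the usual trade-off when oversampling shifts the model toward predicting the minority class more often.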
Checklist
I have the code ready and would like to be assigned to this issue to submit a Pull Request!