This repository contains the implementation and analysis of a machine learning project aimed at classifying brain strokes into ischemic and hemorrhagic types. Using Random Forest, K-Nearest Neighbors (KNN), and Decision Tree algorithms, I achieve high classification accuracy, offering a potential tool to aid the rapid diagnosis and treatment of strokes.
I used a publicly available dataset from Kaggle, which includes demographic information, medical history, lifestyle factors, and socioeconomic variables. Key features include:
Age, Gender, Hypertension, Heart disease, Blood sugar levels, Body Mass Index (BMI), Smoking status, Work type, and Residence type.
The target variable is the presence or absence of a stroke.
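As a rough illustration of how the features and target listed above could be prepared for modelling, the sketch below one-hot encodes the categorical columns and separates the target. The column names (`avg_glucose_level`, `residence_type`, and so on), the helper `split_features_target`, and the four hand-made rows are assumptions for illustration only; they are not taken from the dataset or from this project's code.

```python
import pandas as pd

# Tiny hand-made sample; the values and column names are illustrative assumptions.
sample = pd.DataFrame({
    "age": [67, 45, 80, 33],
    "gender": ["Male", "Female", "Female", "Male"],
    "hypertension": [0, 0, 1, 0],
    "heart_disease": [1, 0, 0, 0],
    "avg_glucose_level": [228.7, 95.1, 171.2, 88.0],
    "bmi": [36.6, 24.3, 29.9, 22.1],
    "smoking_status": ["formerly smoked", "never smoked", "smokes", "never smoked"],
    "work_type": ["Private", "Self-employed", "Private", "Govt_job"],
    "residence_type": ["Urban", "Rural", "Rural", "Urban"],
    "stroke": [1, 0, 1, 0],
})

# Columns holding text categories rather than numbers.
CATEGORICAL = ["gender", "smoking_status", "work_type", "residence_type"]


def split_features_target(df, target="stroke"):
    """One-hot encode the categorical columns and separate features from the target."""
    features = pd.get_dummies(df.drop(columns=[target]), columns=CATEGORICAL)
    return features, df[target]


X, y = split_features_target(sample)
print(X.columns.tolist())
print(y.tolist())
```

One-hot encoding is only one option here; tree-based models can also work with ordinal codes, while KNN is sensitive to feature scaling, so a real pipeline would likely scale the numeric columns as well.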
Age Distribution: Stroke incidence increases with age.
Smoking Status: Higher stroke prevalence among smokers.
Heart Disease: Strong correlation between heart disease and stroke incidence.
Correlation Analysis: Explored relationships between features to understand stroke risk factors.
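The kind of exploratory checks summarized above can be sketched with a group-by and a correlation matrix. The eight rows below are made up purely to show the mechanics, and the column names are assumptions; the output says nothing about the real dataset.

```python
import pandas as pd

# Made-up rows for illustration only; not drawn from the Kaggle dataset.
toy = pd.DataFrame({
    "age": [25, 38, 52, 61, 70, 79, 44, 83],
    "heart_disease": [0, 0, 0, 1, 1, 1, 0, 1],
    "smoking_status": [
        "never smoked", "never smoked", "smokes", "smokes",
        "formerly smoked", "smokes", "never smoked", "formerly smoked",
    ],
    "stroke": [0, 0, 0, 1, 0, 1, 0, 1],
})

# Share of stroke cases within each smoking category.
stroke_rate_by_smoking = toy.groupby("smoking_status")["stroke"].mean()

# Pairwise Pearson correlations between numeric features and the target.
correlations = toy[["age", "heart_disease", "stroke"]].corr()

print(stroke_rate_by_smoking)
print(correlations.round(2))
```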
I evaluated three machine learning models:
Accuracy: 95.3%, Precision: 0.478, Recall: 0.498, F1 Score: 0.488
Accuracy: 95.6%, Precision: 0.478, Recall: 0.499, F1 Score: 0.489
Accuracy: 95.5%, Precision: 0.955, Recall: 0.478, F1 Score: 0.489
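A minimal sketch of how three such models could be trained and scored with scikit-learn follows. It uses a synthetic, imbalanced dataset from `make_classification` as a stand-in, default-like hyperparameters, and macro-averaged precision, recall, and F1. All of these are assumptions, since the project's actual settings and averaging method are not stated here, so the numbers it prints will not match the results above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with roughly 5% positives (assumption, not the real data).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hyperparameters are illustrative, not the project's settings.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Macro averaging weights both classes equally, regardless of their size.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="macro", zero_division=0
    )
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```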
All models achieved high overall accuracy but struggled with class imbalance, which led to low precision and recall for stroke classification. Future work will focus on addressing this imbalance and incorporating additional clinical data to improve model performance.
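To make the effect of class imbalance concrete, the sketch below scores a classifier that always predicts "no stroke" on a hypothetical test set of 1,000 samples with 45 stroke cases. Both the 4.5% positive rate and the use of macro-averaged metrics are assumptions chosen for illustration. Under those assumptions, accuracy is 95.5% while macro precision lands near 0.48, macro recall at 0.5, and macro F1 near 0.49.

```python
def macro_metrics(y_true, y_pred, labels=(0, 1)):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Treat undefined ratios (no predictions or no true cases) as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    n = len(labels)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical test set: 45 stroke cases out of 1,000 (assumed rate).
y_true = [1] * 45 + [0] * 955
# A degenerate model that never predicts a stroke.
y_pred = [0] * 1000

accuracy, precision, recall, f1 = macro_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

This is why accuracy alone is a weak indicator on imbalanced data: a model can score above 95% without identifying a single stroke case. Per-class recall or a confusion matrix gives a clearer picture.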



