Commit 6b13454

Merge pull request #34 from shsarv/shsarv4-patch-2
Update README.md
2 parents 22fa790 + 81672bc commit 6b13454

1 file changed: 184 additions, 93 deletions

@@ -1,114 +1,205 @@
-# Diabetes Prediction
-
-This repository contains an end-to-end machine learning project aimed at predicting the likelihood of diabetes based on user-provided health data. The project demonstrates the full machine learning pipeline from data gathering to model deployment using a Flask web application hosted on Heroku.
-
-## Project Overview
-
-The goal of this project is to create a seamless process for predicting diabetes by building a machine learning model that analyzes various health parameters. The web application takes user input, processes the data through the model, and provides the prediction result on a new page.
-
-## Project Objectives
+<div align="center">

-The project follows these key steps:
+# 🩺 Diabetes Prediction — End to End

-1. **Data Gathering**: Collected relevant medical data from various sources, including public datasets.
-2. **Descriptive Analysis**: Explored the dataset to understand the underlying patterns and trends.
-3. **Data Visualizations**: Created insightful visualizations to represent key relationships in the data.
-4. **Data Preprocessing**: Cleaned and transformed the data for use in the machine learning model.
-5. **Data Modelling**: Trained a machine learning model using scikit-learn to predict diabetes.
-6. **Model Evaluation**: Assessed the model's performance using various metrics to ensure accuracy.
-7. **Model Deployment**: Deployed the model as a web application using Flask, hosted on Heroku.
+[![Python](https://img.shields.io/badge/Python-3.7+-3776AB?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/)
+[![Flask](https://img.shields.io/badge/Flask-Web%20App-000000?style=for-the-badge&logo=flask&logoColor=white)](https://flask.palletsprojects.com/)
+[![scikit-learn](https://img.shields.io/badge/scikit--learn-ML%20Model-F7931E?style=for-the-badge&logo=scikit-learn&logoColor=white)](https://scikit-learn.org/)
+[![License](https://img.shields.io/badge/License-MIT-1abc9c?style=for-the-badge)](../LICENSE.md)

-## Technical Aspects
+> A full **end-to-end machine learning web application** that predicts the likelihood of diabetes in a patient based on key health diagnostics — from model training to a live Flask deployment.

-### Machine Learning Model
-- **Library**: scikit-learn
-- **Algorithms Used**: Logistic Regression, Decision Trees, Random Forests (or any chosen algorithms based on your project)
-- **Input Features**: The following fields are taken from the user:
-- Number of Pregnancies
-- Insulin Level
-- Age
-- Body Mass Index (BMI)
-- Blood Pressure
-- Glucose Level
-- Skin Thickness
-- Diabetes Pedigree Function
-- **Output**: The model predicts whether the person is likely to have diabetes (Yes/No).
+[🔙 Back to Main Repository](https://github.com/shsarv/Machine-Learning-Projects)
+
+</div>
+
+---
+
+## 📌 Table of Contents
+
+- [About the Project](#-about-the-project)
+- [Dataset](#-dataset)
+- [Features Used](#-features-used)
+- [Model & Performance](#-model--performance)
+- [Project Structure](#-project-structure)
+- [Getting Started](#-getting-started)
+- [App Screenshots](#-app-screenshots)
+- [Tech Stack](#-tech-stack)
+
+---
+
+## 🧠 About the Project
+
+Diabetes is one of the most prevalent chronic diseases worldwide, and early detection significantly improves patient outcomes. This project builds a **binary classification model** to predict whether a patient is likely to have diabetes based on diagnostic measurements, and wraps it in an interactive **Flask web application** so anyone can get a prediction by entering their health values.

-### Web Application
-- **Framework**: Flask
-- **Deployment**: Hosted on Heroku for easy access.
-- **Functionality**:
-- The user provides health-related data via a form.
-- After submitting the form, the model processes the data and presents the prediction on a new page.
-
-## How to Use
+**What this project covers:**
+- Exploratory data analysis (EDA) and data preprocessing
+- Feature engineering and handling class imbalance
+- Training and comparing multiple ML classifiers
+- Serializing the best model with `pickle`
+- Building and deploying a Flask web app with a clean UI
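The added README says the best model is serialized with `pickle`. Below is a minimal sketch of that save/load round trip, not the project's code: `ThresholdModel`, `save_model`, and `load_model` are hypothetical names, and the stand-in model replaces the real scikit-learn estimator so the example runs with the standard library alone. The `Model/diabetes_model.pkl` path follows the project structure listed later in the README.

```python
import pickle
import tempfile
from pathlib import Path


class ThresholdModel:
    """Hypothetical stand-in for the trained classifier (not the project's model)."""

    def __init__(self, feature_index, threshold):
        self.feature_index = feature_index
        self.threshold = threshold

    def predict(self, rows):
        # 1 = diabetic, 0 = non-diabetic, matching the dataset's target encoding
        return [int(row[self.feature_index] > self.threshold) for row in rows]


def save_model(model, path):
    """Pickle `model` to `path`, creating parent folders such as Model/ if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    """Load a pickled model. Only unpickle trusted files: loading can run arbitrary code."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model_path = Path(tmp) / "Model" / "diabetes_model.pkl"
        # Hypothetical rule: flag Glucose (column 1) above 140 as diabetic
        save_model(ThresholdModel(feature_index=1, threshold=140), model_path)
        restored = load_model(model_path)
        print(restored.predict([[2, 150, 70, 20, 80, 30.0, 0.5, 40]]))  # [1]
```

Pickle stores a reference to the class rather than its code, so the process that loads the file must be able to import the same class; for a scikit-learn model that means loading it with a compatible scikit-learn version.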

-### Prerequisites
-- Python 3.x
-- Flask
-- scikit-learn
-- Pandas
-- Heroku CLI (for deployment)
+---

-### Installation
+## 📊 Dataset

-1. Clone this repository:
-```bash
-git clone https://github.com/shsarv/Machine-learning-projects.git
-cd diabetes-prediction-[End-2-END]/Diabetes-prediction-deployed
-```
+| Property | Details |
+|----------|---------|
+| **Name** | Pima Indians Diabetes Dataset |
+| **Source** | [Kaggle](https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database) / UCI ML Repository |
+| **Samples** | 768 patients |
+| **Features** | 8 numeric diagnostic features |
+| **Target** | Binary — `1` (Diabetic) / `0` (Non-Diabetic) |
+| **Class Balance** | ~65% Non-Diabetic · ~35% Diabetic |
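The **Class Balance** row can be checked directly from the target column: the public Pima file has 500 non-diabetic and 268 diabetic rows, which gives the ~65% / ~35% split above. The README also mentions handling class imbalance without saying how. One common option is inverse-frequency class weights, computed with the formula scikit-learn documents for `class_weight="balanced"`; the sketch assumes that option, and `class_balance` / `balanced_class_weights` are hypothetical helper names.

```python
from collections import Counter


def class_balance(labels):
    """Fraction of rows in each class, keyed by label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: counts[label] / total for label in sorted(counts)}


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * rows_in_class).

    Assumption: the project's imbalance handling may differ from this.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * counts[label]) for label in sorted(counts)}


if __name__ == "__main__":
    # Outcome column of the Pima dataset: 500 zeros (non-diabetic), 268 ones (diabetic)
    y = [0] * 500 + [1] * 268
    print({k: round(v, 3) for k, v in class_balance(y).items()})           # {0: 0.651, 1: 0.349}
    print({k: round(v, 3) for k, v in balanced_class_weights(y).items()})  # {0: 0.768, 1: 1.433}
```

With scikit-learn, such a mapping could be passed as `class_weight` to classifiers that accept it, such as `RandomForestClassifier`; whether this project does so is not stated.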

-2. Install the required dependencies:
-```bash
-pip install -r requirements.txt
-```
+---

-3. Run the Flask app:
-```bash
-python app.py
-```
+## 🔬 Features Used

-4. Open your browser and go to `http://localhost:5000` to access the web app.
+| Feature | Description |
+|---------|-------------|
+| `Pregnancies` | Number of times pregnant |
+| `Glucose` | Plasma glucose concentration (2-hour oral glucose tolerance test) |
+| `BloodPressure` | Diastolic blood pressure (mm Hg) |
+| `SkinThickness` | Triceps skin fold thickness (mm) |
+| `Insulin` | 2-hour serum insulin (µU/ml) |
+| `BMI` | Body mass index (weight in kg / height in m²) |
+| `DiabetesPedigreeFunction` | Likelihood of diabetes based on family history |
+| `Age` | Age in years |
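The model expects these eight features in the order shown in the table, which is also the column order of the Pima CSV. A minimal sketch of turning submitted form values into one ordered row, assuming the HTML form uses the column names as input `name`s (the actual field names in `templates/index.html` are not shown in the README); `FEATURE_ORDER` and `form_to_row` are hypothetical.

```python
import math

# Column order of the Pima dataset (without the Outcome target)
FEATURE_ORDER = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]


def form_to_row(form):
    """Convert a mapping of string form fields into one ordered list of floats.

    Assumption: form input names equal the column names above.
    Works with a plain dict or Flask's request.form, since both provide .get().
    Raises ValueError naming the offending field.
    """
    row = []
    for name in FEATURE_ORDER:
        raw = str(form.get(name, "")).strip()
        try:
            value = float(raw)
        except ValueError:
            raise ValueError(f"{name}: expected a number, got {raw!r}") from None
        if not math.isfinite(value) or value < 0:
            raise ValueError(f"{name}: expected a finite, non-negative number, got {raw!r}")
        row.append(value)
    return row


if __name__ == "__main__":
    submitted = {
        "Pregnancies": "6", "Glucose": "148", "BloodPressure": "72",
        "SkinThickness": "35", "Insulin": "0", "BMI": "33.6",
        "DiabetesPedigreeFunction": "0.627", "Age": "50",
    }
    print(form_to_row(submitted))
    # [6.0, 148.0, 72.0, 35.0, 0.0, 33.6, 0.627, 50.0]
```

A scikit-learn estimator expects 2-D input (one row per sample), so the list would be wrapped as `[row]` before calling `predict`.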

-### Deployment on Heroku
+---

-To deploy the app on Heroku, follow these steps:
+## 🤖 Model & Performance

-1. Login to Heroku:
-```bash
-heroku login
-```
+Multiple classifiers were trained and evaluated. The best-performing model was selected for deployment.

-2. Create a new Heroku app:
-```bash
-heroku create your-app-name
-```
+| Model | Accuracy | Precision | Recall | F1-Score |
+|-------|:--------:|:---------:|:------:|:--------:|
+| Logistic Regression | ~77% | ~74% | ~67% | ~70% |
+| K-Nearest Neighbors | ~74% | ~70% | ~63% | ~66% |
+| Support Vector Machine | ~78% | ~75% | ~68% | ~71% |
+| Decision Tree | ~73% | ~68% | ~65% | ~66% |
+| **Random Forest**| **~81%** | **~78%** | **~72%** | **~75%** |
+| Gradient Boosting | ~80% | ~76% | ~71% | ~73% |

-3. Push your code to Heroku:
-```bash
-git push heroku main
-```
+> **Random Forest** selected as the final model based on highest overall accuracy and F1-score.
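The table's columns follow the standard definitions with diabetic (`1`) as the positive class: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. The sketch below computes them from predictions with the standard library, then picks a model by F1 with accuracy as a tie-breaker, which is one way to encode the selection rule stated above. The README does not say how ties would be broken, and the helper names are hypothetical. In scikit-learn the same metrics are available as `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` in `sklearn.metrics`.

```python
# Hypothetical helpers; the project's own evaluation code is not shown in the README.


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def pick_best(results):
    """Name of the model with the highest F1; the tie-break on accuracy is an assumption."""
    return max(results, key=lambda name: (results[name]["f1"], results[name]["accuracy"]))


if __name__ == "__main__":
    print(classification_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
    # Approximate accuracy / F1 values copied from the table above
    results = {
        "Logistic Regression": {"accuracy": 0.77, "f1": 0.70},
        "K-Nearest Neighbors": {"accuracy": 0.74, "f1": 0.66},
        "Support Vector Machine": {"accuracy": 0.78, "f1": 0.71},
        "Decision Tree": {"accuracy": 0.73, "f1": 0.66},
        "Random Forest": {"accuracy": 0.81, "f1": 0.75},
        "Gradient Boosting": {"accuracy": 0.80, "f1": 0.73},
    }
    print(pick_best(results))  # Random Forest
```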
-4. Open the app in your browser:
-```bash
-heroku open
-```
+**Preprocessing steps:**
+- Replaced biologically implausible zero values (e.g., `Glucose = 0`) with feature medians
+- Scaled features using `StandardScaler`
+- Split data: 80% train / 20% test with stratification
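The three preprocessing steps can be sketched with the standard library to make each one concrete. This is an illustration, not the project's notebook code: the README names scikit-learn's `StandardScaler`, and in scikit-learn a stratified split is usually done with `train_test_split(X, y, test_size=0.2, stratify=y)`, whereas the helpers here are hypothetical re-implementations. The list of columns where `0` counts as missing is an assumption, since the README only names `Glucose` as an example.

```python
import random
from statistics import mean, median, pstdev

# Assumption: columns where 0 is biologically implausible and treated as missing.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def impute_zeros_with_median(rows, columns=ZERO_AS_MISSING):
    """Return new rows where 0 in `columns` is replaced by that column's non-zero median."""
    medians = {col: median(r[col] for r in rows if r[col] != 0) for col in columns}
    return [
        {key: medians[key] if key in medians and value == 0 else value
         for key, value in row.items()}
        for row in rows
    ]


def standardize(values):
    """z = (x - mean) / std with the population std, as StandardScaler does per column."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) with each class split in the same proportion."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    rows = [{"Glucose": 0, "BMI": 30.0}, {"Glucose": 100, "BMI": 0},
            {"Glucose": 120, "BMI": 20.0}, {"Glucose": 140, "BMI": 25.0}]
    imputed = impute_zeros_with_median(rows, columns=["Glucose", "BMI"])
    print(imputed[0], imputed[1])  # {'Glucose': 120, 'BMI': 30.0} {'Glucose': 100, 'BMI': 25.0}
    train_idx, test_idx = stratified_split([0] * 500 + [1] * 268)
    print(len(train_idx), len(test_idx))  # 614 154
```

Computing the medians and the scaling statistics on the training rows only, then reusing them for the test rows and for user input in the app, keeps test-set information out of the model.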

-## Future Enhancements
+---

-- Add more advanced machine learning models for improved prediction accuracy.
-- Implement user authentication for a more personalized experience.
-- Improve UI/UX for better usability.
-- Integrate more health-related data for broader insights.
-
-## Contributing
-
-Feel free to contribute by submitting issues or pull requests. For major changes, please open an issue first to discuss what you'd like to change.
-
-## Acknowledgments
-
-- [Scikit-learn Documentation](https://scikit-learn.org/stable/documentation.html)
+## 📁 Project Structure
+
+```
+Diabetes Prediction [END 2 END]/
+
+├── 📂 Dataset/
+│ └── diabetes.csv # Pima Indians Diabetes dataset
+
+├── 📂 Model/
+│ └── diabetes_model.pkl # Serialized trained model (pickle)
+
+├── 📂 notebooks/
+│ └── diabetes_prediction.ipynb # EDA, training, and evaluation notebook
+
+├── 📂 static/
+│ └── css/
+│ └── style.css # App styling
+
+├── 📂 templates/
+│ ├── index.html # Home / input form
+│ └── result.html # Prediction result page
+
+├── app.py # Flask application entry point
+├── requirements.txt # Python dependencies
+└── README.md # You are here
+```
+
+---
+
+## 🚀 Getting Started
+
+### 1. Clone the repository
+
+```bash
+git clone https://github.com/shsarv/Machine-Learning-Projects.git
+cd "Machine-Learning-Projects/Diabetes Prediction [END 2 END]"
+```
+
+### 2. Create a virtual environment (recommended)
+
+```bash
+python -m venv venv
+source venv/bin/activate # Linux / macOS
+venv\Scripts\activate # Windows
+```
+
+### 3. Install dependencies
+
+```bash
+pip install -r requirements.txt
+```
+
+### 4. Run the Flask app
+
+```bash
+python app.py
+```
+
+Open your browser and navigate to → **http://127.0.0.1:5000**
+
+### 5. (Optional) Re-train the model
+
+Open the Jupyter notebook to explore the data and retrain from scratch:
+
+```bash
+jupyter notebook notebooks/diabetes_prediction.ipynb
+```
+
+---
+
+## 📸 App Screenshots
+
+> The web app presents a clean form where users input their health metrics and receive an instant prediction.
+
+| Input Form | Prediction Result |
+|:----------:|:-----------------:|
+| User enters 8 health parameters | App displays **Diabetic** or **Not Diabetic** with confidence |
+
+![](https://github.com/shsarv/Machine-Learning-Projects/blob/main/Diabetes%20Prediction%20%5BEND%202%20END%5D/Diabetes-prediction%20deployed/Resource/live1.gif)
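The result table says the app shows **Diabetic** or **Not Diabetic** "with confidence" but does not define that figure. A minimal sketch, assuming the confidence is the predicted probability of the chosen class (for a scikit-learn classifier, column 1 of `predict_proba` gives the probability of the diabetic class) and a 0.5 decision threshold; `format_prediction` is a hypothetical helper.

```python
def format_prediction(prob_diabetic, threshold=0.5):
    """Turn P(diabetic) into the label and confidence text shown on the result page."""
    if not 0.0 <= prob_diabetic <= 1.0:
        raise ValueError(f"probability must be between 0 and 1, got {prob_diabetic}")
    is_diabetic = prob_diabetic > threshold
    # Assumption: confidence = probability of the class that was predicted
    confidence = prob_diabetic if is_diabetic else 1.0 - prob_diabetic
    label = "Diabetic" if is_diabetic else "Not Diabetic"
    return f"{label} ({confidence:.0%} confidence)"


if __name__ == "__main__":
    print(format_prediction(0.82))  # Diabetic (82% confidence)
    print(format_prediction(0.30))  # Not Diabetic (70% confidence)
```

Using a strict `>` means a probability of exactly 0.5 is reported as Not Diabetic, which matches picking the first class when both class probabilities are equal.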
+
+---
+
+## 🛠️ Tech Stack
+
+| Layer | Technology |
+|-------|-----------|
+| Language | Python 3.7+ |
+| ML Library | scikit-learn |
+| Data Processing | Pandas, NumPy |
+| Visualization | Matplotlib, Seaborn |
+| Web Framework | Flask |
+| Frontend | HTML5, CSS3, Bootstrap |
+| Model Serialization | Pickle |
+| Notebook | Jupyter |
+
+---
+
+## 📚 References
+
+- [Pima Indians Diabetes Dataset — Kaggle](https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database)
+- [scikit-learn Documentation](https://scikit-learn.org/stable/)
 - [Flask Documentation](https://flask.palletsprojects.com/)
-- [Heroku Documentation](https://devcenter.heroku.com/)

----
+---
+
+<div align="center">
+
+Part of the [Machine Learning Projects](https://github.com/shsarv/Machine-Learning-Projects) collection by [Sarvesh Kumar Sharma](https://github.com/shsarv)
+
+⭐ Star the main repo if this helped you!
+
+</div>
