This project is an end-to-end machine learning solution for predicting student placements. It covers data collection, data cleaning, preprocessing, model training, hyperparameter tuning, evaluation, and deployment with Streamlit. The repository is organized as follows:
├── data
│   ├── raw_data
│   │   └── raw.csv
│   ├── cleaned_data
│   └── preprocess_data
├── models
│   ├── best_model
│   ├── model
│   └── scaler
├── notebooks
│   ├── data_collection.ipynb
│   ├── data_cleaning.ipynb
│   ├── preprocessing.ipynb
│   ├── model_train.ipynb
│   └── hyperparameter_tuning.ipynb
├── src
│   ├── data_collection.py
│   ├── data_cleaning.py
│   ├── preprocessing.py
│   ├── model_train.py
│   ├── hyperparameter_tuning.py
│   ├── app.py
│   ├── main.py
│   ├── streamlit.py
│   ├── test.py
│   └── requirements.txt
├── placementdata.csv
└── .gitignore
- Data collection: the raw dataset (`raw.csv`) is gathered and stored in the `data/raw_data/` directory.
- The `data_collection.ipynb` notebook and the `data_collection.py` script fetch and save the data.
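A minimal sketch of the collection step, assuming the source is the `placementdata.csv` file at the repository root and that "fetching" means reading it and writing an unchanged snapshot to `data/raw_data/raw.csv`. The function name `collect_raw_data` is hypothetical and not taken from `data_collection.py`.

```python
from __future__ import annotations

from pathlib import Path

import pandas as pd


def collect_raw_data(source_csv: str | Path, raw_dir: str | Path = "data/raw_data") -> Path:
    """Read the source CSV and save an unchanged copy as raw.csv (hypothetical helper)."""
    raw_dir = Path(raw_dir)
    raw_dir.mkdir(parents=True, exist_ok=True)  # create data/raw_data/ if it is missing
    df = pd.read_csv(source_csv)
    out_path = raw_dir / "raw.csv"
    df.to_csv(out_path, index=False)  # no cleaning here: the raw snapshot stays untouched
    return out_path
```

Run from the repository root, `collect_raw_data("placementdata.csv")` would create `data/raw_data/raw.csv`.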
- Data cleaning: missing values, duplicate records, and irrelevant features are handled.
- The cleaned data is stored in the `data/cleaned_data/` directory.
- Implemented in `data_cleaning.ipynb` and `data_cleaning.py`.
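The cleaning steps above could look roughly like this with pandas. This is a sketch, not the code in `data_cleaning.py`: the imputation strategy (median for numeric gaps, most frequent value for categorical gaps) and the `drop_cols` argument for irrelevant features are assumptions.

```python
from __future__ import annotations

import pandas as pd


def clean_data(df: pd.DataFrame, drop_cols: list[str] | None = None) -> pd.DataFrame:
    """Return a cleaned copy: irrelevant columns dropped, duplicates removed, gaps filled."""
    # Irrelevant features (e.g. an ID column); the caller supplies the names.
    df = df.drop(columns=drop_cols or [], errors="ignore")
    # Duplicate records: keep the first occurrence of each identical row.
    df = df.drop_duplicates().reset_index(drop=True)
    num_cols = df.select_dtypes(include="number").columns
    # Missing numeric values: fill with the column median.
    if len(num_cols) > 0:
        df[num_cols] = df[num_cols].fillna(df[num_cols].median())
    # Missing categorical values: fill with the most frequent value, when one exists.
    for col in df.columns.difference(num_cols):
        mode = df[col].mode()
        if not mode.empty:
            df[col] = df[col].fillna(mode.iloc[0])
    return df
```

Duplicates are dropped before imputing so that repeated rows do not skew the medians and modes used to fill the gaps.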
- Preprocessing: feature engineering, encoding of categorical variables, and feature scaling.
- Preprocessed data is saved in `data/preprocess_data/`.
- Implemented in `preprocessing.ipynb` and `preprocessing.py`.
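One way to implement the encoding and scaling, assuming one-hot encoding with `pandas.get_dummies` and scikit-learn's `StandardScaler` (the README does not name the encoder or scaler). The target column name `PlacementStatus` and the 80/20 stratified split are hypothetical. The scaler is fit on the training split only, so test-set statistics do not leak into training.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def preprocess(df: pd.DataFrame, target: str = "PlacementStatus",
               test_size: float = 0.2, seed: int = 42):
    """One-hot encode categoricals, split, and standardize features."""
    y = df[target]
    # Encode categorical columns; drop_first avoids a redundant dummy per feature.
    X = pd.get_dummies(df.drop(columns=[target]), drop_first=True, dtype=float)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    scaler = StandardScaler()
    X_train_s = scaler.fit_transform(X_train)  # learn mean/std from the training split
    X_test_s = scaler.transform(X_test)        # reuse them on the test split
    return X_train_s, X_test_s, y_train, y_test, scaler
```

The returned scaler is what would be persisted to `models/scaler/`, for example with `joblib.dump(scaler, "models/scaler/scaler.joblib")` (file name hypothetical).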
- Model training: several classification models were trained, with the following accuracies:
  - Logistic Regression: 79.8%
  - Decision Tree Classifier: 71.25%
  - Random Forest Classifier: 79.1%
  - AdaBoost Classifier: 81.05% (best model)
  - Gradient Boosting Classifier: 80.3%
- Implemented in `model_train.ipynb` and `model_train.py`.
- Trained models are saved in the `models/model/` directory.
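A sketch of how the five models listed above might be trained and compared on the same split. The hyperparameters are scikit-learn defaults plus fixed seeds, not the settings behind the reported numbers, and `make_models` and `train_and_compare` are hypothetical names.

```python
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier


def make_models(seed: int = 42) -> dict:
    """Fresh, unfitted instances of the five classifiers compared in this project."""
    return {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree Classifier": DecisionTreeClassifier(random_state=seed),
        "Random Forest Classifier": RandomForestClassifier(random_state=seed),
        "AdaBoost Classifier": AdaBoostClassifier(random_state=seed),
        "Gradient Boosting Classifier": GradientBoostingClassifier(random_state=seed),
    }


def train_and_compare(X_train, X_test, y_train, y_test, models=None) -> dict:
    """Fit every model and return {name: test accuracy}, sorted best first."""
    models = models if models is not None else make_models()
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))
```

Called on the preprocessed arrays, `train_and_compare(X_train, X_test, y_train, y_test)` returns a dict ordered from highest to lowest test accuracy; building fresh instances per call keeps repeated runs independent.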
- Hyperparameter tuning: the models' hyperparameters are tuned to improve accuracy.
- The best-performing model (AdaBoost) is saved in `models/best_model/`.
- Implemented in `hyperparameter_tuning.ipynb` and `hyperparameter_tuning.py`.
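A minimal tuning sketch using scikit-learn's `GridSearchCV` on AdaBoost, with cross-validated accuracy as the selection metric. The README does not say which search method or grid was used, so `PARAM_GRID`, the `.joblib` file name, and the helper names are assumptions.

```python
from pathlib import Path

import joblib
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical search space: the README does not list the grid actually used.
PARAM_GRID = {
    "n_estimators": [50, 100, 200],
    "learning_rate": [0.01, 0.1, 0.5, 1.0],
}


def tune_adaboost(X_train, y_train, param_grid=None, cv: int = 5, seed: int = 42):
    """Grid-search AdaBoost; return (best estimator, best params, best CV accuracy)."""
    search = GridSearchCV(
        AdaBoostClassifier(random_state=seed),
        param_grid if param_grid is not None else PARAM_GRID,
        scoring="accuracy",
        cv=cv,
    )
    # With the default refit=True, the best configuration is retrained on all of X_train.
    search.fit(X_train, y_train)
    return search.best_estimator_, search.best_params_, search.best_score_


def save_best_model(model, out_dir="models/best_model", filename="adaboost.joblib") -> Path:
    """Persist the tuned model with joblib, creating the directory if needed."""
    path = Path(out_dir) / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, str(path))
    return path
```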
- Deployment: the best model is used to make predictions.
- A Streamlit web application provides the user interface.
- Implemented in `app.py` and `streamlit.py`.
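A rough sketch of the prediction path behind the Streamlit app. The artifact paths, the `.joblib` format, and the feature names in `FEATURES` are hypothetical placeholders; the feature order must match whatever the scaler was fit on, and categorical inputs would need the same encoding used in preprocessing. The UI uses only basic Streamlit calls (`st.title`, `st.number_input`, `st.button`, `st.success`).

```python
import joblib
import numpy as np

# Hypothetical artifact paths and feature names; adjust to the real files and columns.
MODEL_PATH = "models/best_model/adaboost.joblib"
SCALER_PATH = "models/scaler/scaler.joblib"
FEATURES = ["CGPA", "Internships", "Projects", "AptitudeTestScore"]


def predict_placement(model, scaler, values):
    """Scale one student's feature vector and return the predicted label."""
    row = np.asarray(values, dtype=float).reshape(1, -1)  # shape (1, n_features)
    return model.predict(scaler.transform(row))[0]


def main():
    # Imported here so predict_placement stays usable without Streamlit installed.
    import streamlit as st

    st.title("Student Placement Predictor")
    model = joblib.load(MODEL_PATH)
    scaler = joblib.load(SCALER_PATH)
    values = [st.number_input(name, value=0.0) for name in FEATURES]
    if st.button("Predict"):
        label = predict_placement(model, scaler, values)
        st.success(f"Predicted placement status: {label}")
```

Streamlit reruns the script top to bottom on every interaction, so using this as `app.py` would mean calling `main()` at module level; moving the two `joblib.load` calls into a function decorated with `@st.cache_resource` avoids reloading the artifacts on each rerun.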
- Evaluation: the final model is evaluated on test data.
- Results and metrics are logged.
- Implemented in `test.py`.
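The evaluation step might look like the following, assuming scikit-learn metrics and Python's standard `logging` module for the logged results. The logger name and the choice of metrics are assumptions, not taken from `test.py`.

```python
import logging

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

logger = logging.getLogger("placement.test")  # hypothetical logger name


def evaluate(model, X_test, y_test) -> dict:
    """Score the model on held-out data, log the metrics, and return them."""
    y_pred = model.predict(X_test)
    results = {
        "accuracy": accuracy_score(y_test, y_pred),
        # Rows are true classes, columns are predicted classes.
        "confusion_matrix": confusion_matrix(y_test, y_pred).tolist(),
        "report": classification_report(y_test, y_pred),
    }
    logger.info("Test accuracy: %.4f", results["accuracy"])
    logger.info("Confusion matrix: %s", results["confusion_matrix"])
    logger.info("Classification report:\n%s", results["report"])
    return results
```

Configuring logging first, for example `logging.basicConfig(level=logging.INFO, filename="test_results.log")` (file name hypothetical), writes these lines to a file instead of discarding them.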
From the `src/` directory (where `requirements.txt` and `app.py` live), install the dependencies with:
pip install -r requirements.txt
Then run the Streamlit app with:
streamlit run app.py
Possible future improvements:
- Improve model accuracy with additional feature engineering.
- Integrate a database for real-time student placement tracking.
- Expand deployment options with Flask or Django.
Prakshi Karkera