A data-driven Predictive Maintenance (PdM) framework for modeling the non-linear, discontinuous degradation of industrial shell-and-tube heat exchangers. This project transcends classical statistical baselines by engineering physics-informed temporal features to accurately predict Fouling Resistance (
Fouling in heat exchangers acts as an insulating blanket, introducing an additional thermal resistance (
Because cleaning events (Cleaning-In-Place / CIP) periodically reset the fouling resistance to zero, the degradation follows a non-stationary "sawtooth" pattern. This project utilizes over 700,000 hourly operational observations to model these kinetics, moving from reactive maintenance to an AI-driven predictive schedule.
-
Engineered Temporal Coordinates: Designed the
hours_since_cleanfeature to solve the phase-shift errors inherent in classical time-series models (ARIMA/SARIMA). -
Champion Model (XGBoost): Achieved an
$R^2$ of 0.9995 and MAE of 0.0041 by mastering the sharp discontinuities of CIP resets. -
Deep Learning (TFT): Implemented Google's Temporal Fusion Transformer for probabilistic multi-horizon forecasting and variable selection interpretability (
$R^2 = 0.968$ ).
You can interact with the deployed model directly via our Digital Twin dashboard:
Launch the Fouling Prediction Dashboard
-
Upload Telemetry: Upload a CSV containing real-time operational sensors (e.g.,
flow_actual,T_in_actual_K,is_cleaning). -
Automated ETL: The app automatically standardizes the physical inputs (Z-score scaling) and calculates the necessary temporal features (
hours_since_clean, lags). -
Real-Time Prediction: View the instantaneous predicted Fouling Resistance (
$R_f$ ) plotted against your system's critical failure threshold. - What-If Analysis: Use the interactive sliders to adjust mass flow rates or inlet temperatures to simulate how altering operational parameters extends or reduces the Time-To-Failure (TTF).
The codebase follows a modular, MLOps-friendly architecture:
Fouling-Prediction/
│
├── app/ # Streamlit web application source code
├── artifacts/ # Serialized models (.ubj/.joblib) and fitted scalers (.pkl)
├── config/ # Centralized hyperparameter and path configurations
├── data/ # Data manifests (Raw data downloaded from Kaggle)
├── etl/ # Polars-based Extract, Transform, Load pipeline modules
├── images/ # EDA plots and architectural diagrams
├── model/ # Training scripts (xgboost, catboost, tft, lstm)
├── notebooks/ # Comprehensive Jupyter notebooks (Research & EDA)
├── requirements.txt # Python dependencies
└── README.md## ⚙️ Prerequisites & Installation
Clone the repository and install the required dependencies. We recommend using a virtual environment.
```bash
git clone https://github.com/DataWorshipper/Fouling-Prediction.git
cd Fouling-Prediction
python -m venv venv
source venv/bin/activate # On Windows use: venv\Scripts\activate
pip install -r requirements.txt
The ETL pipeline uses Polars for ultra-fast, multi-threaded data processing. It handles schema standardization, Z-score scaling (saving the scaler to /artifacts), and temporal feature engineering.
# Process raw data and generate the modeling dataset
python etl/preprocess.py --input data/raw_data.csv --output data/processed_data.csvYou can train specific architectures using the scripts in the /model directory. Hyperparameter optimization is handled automatically via Optuna.
# Train the XGBoost champion model
python model/train_xgboost.py --config config/model_config.yaml
# Train the Temporal Fusion Transformer (requires GPU for reasonable training times)
python model/train_tft.py --epochs 30 --batch_size 64Trained models are automatically serialized and saved to the /artifacts directory.
To generate predictions on new unseen data using the saved artifacts:
python model/inference.py --input data/new_sensor_readings.csv --output data/predictions.csvIf you wish to test or modify the Streamlit dashboard locally:
streamlit run app/main.pyThis will spin up a local web server (usually at http://localhost:8501).