Prediction of Aqueous Solubility of Drug Molecules

Overview

Predicting the aqueous solubility of small organic molecules is a critical step in early-stage drug discovery, because solubility influences a compound's absorption, distribution, metabolism, and excretion (ADME) properties. This project explores and compares two computational approaches for classifying molecular solubility:

  1. Descriptor-based Quantitative Structure-Property Relationship (QSPR) Modeling: Utilizes molecular descriptors generated by RDKit with a LightGBM classifier.
  2. Graph Convolutional Neural Networks (GCNs): Employs graph representations of molecules with PyTorch Geometric to learn predictive features.
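To make the second approach more concrete, here is a minimal, dependency-free sketch of a single graph-convolution step on one molecule. It is not the code in model.py (which uses PyTorch Geometric); it only illustrates the commonly used propagation rule ReLU(D^-1/2 (A + I) D^-1/2 X W), followed by a mean readout over atoms. The function names, the two-element one-hot atom features, and the identity weight matrix are all assumptions chosen for illustration.

```python
import math

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    adj    : n x n adjacency matrix (1 where two atoms share a bond)
    feats  : n x f_in node (atom) feature matrix
    weight : f_in x f_out weight matrix (learned in a real model)
    """
    n = len(adj)
    n_in, n_out = len(weight), len(weight[0])
    # Add self-loops so each atom keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalisation.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbour features, then apply the linear map and ReLU.
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n))
            for f in range(n_in)] for i in range(n)]
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(n_in)))
             for o in range(n_out)] for i in range(n)]

def mean_pool(node_feats):
    """Average node embeddings into one graph-level vector."""
    n = len(node_feats)
    return [sum(col) / n for col in zip(*node_feats)]

# Illustrative input: ethanol (SMILES "CCO"), atoms indexed C0-C1-O2.
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
# Hypothetical one-hot atom features: [is_carbon, is_oxygen].
atom_features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
# Identity weights stand in for parameters a trained model would learn.
identity_weight = [[1.0, 0.0], [0.0, 1.0]]

hidden = gcn_layer(adjacency, atom_features, identity_weight)
graph_embedding = mean_pool(hidden)
```

In a trained model, several such layers would be stacked and the pooled vector passed to a classifier head. The descriptor-based approach skips this learned representation and feeds precomputed RDKit descriptors directly to LightGBM.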

Data Source

Model Highlights

Example Molecules

Example molecules from the dataset.

Hyperparameter Tuning

Visualization from GCN hyperparameter tuning.

Dependencies

  • PyTorch
  • PyTorch-Geometric
  • RDKit
  • DeepChem
  • Scikit-Learn
  • LightGBM
  • Pandas
  • NumPy
  • Seaborn
  • Matplotlib
  • tqdm
  • Wandb (Weights & Biases; needed only for hyperparameters search.ipynb)

Repository Structure

  • analytics.ipynb: Data exploration, feature analysis, and visualization.
  • descriptor-based-model.ipynb: Implementation and evaluation of the QSPR model using molecular descriptors and LightGBM.
  • hyperparameters search.ipynb: Hyperparameter optimization for the GCN model using Weights & Biases.
  • model training and evaluation.ipynb: Training, evaluation, and analysis of the GCN model.
  • custom_dataset.py: Defines the custom PyTorch Geometric dataset for loading molecular graphs.
  • model.py: Contains the GCN model architecture definition.
  • trainer.py: Implements the training and validation loop for the GCN model.
  • utils.py: Utility functions used across notebooks (e.g., for evaluation metrics, plotting).
  • data/: Directory containing the dataset files.
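As an illustration of the kind of evaluation helper that utils.py is described as providing, the sketch below computes common classification metrics from predicted and true labels. It is a hypothetical helper, not the repository's actual code, and it assumes a binary labelling (1 = soluble, 0 = insoluble), which this README does not specify.

```python
def binary_classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up labels purely for demonstration; not results from this project.
example_true = [1, 0, 1, 1, 0, 0]
example_pred = [1, 0, 0, 1, 1, 0]
metrics = binary_classification_metrics(example_true, example_pred)
```

Reporting precision and recall alongside accuracy is useful when the solubility classes are imbalanced, since accuracy alone can look high for a model that mostly predicts the majority class.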

Read a Summary

For a more readable write-up of this project, see: Prediction of Aqueous Solubility