 
 # MXtalTools: Toolbox for machine learning on molecular crystals
 
+A Python library built on PyTorch and PyTorch Geometric for machine learning tasks on molecules and molecular crystals.
+
+**Features:**
+- Crystal building -- fast, differentiable molecular crystal construction from asymmetric unit parameters
+- Crystal density prediction -- predict packing coefficients from molecular structure
+- Molecule autoencoder -- equivariant molecular encodings via pre-trained Mo3ENet
+- Crystal scoring -- evaluate crystal structures against CSD statistics
+- Crystal structure search -- optimize crystal packing with ML potentials
+- Dataset utilities -- build molecular/crystal datasets from CSD, .cif, and .xyz files
+
 ## Documentation
-See our detailed documentation including installation and deployment instructions at our [readthedocs](https://mxtaltools.readthedocs.io/en/master/) page.
 
-## Installation for Users
+See our detailed docs at [readthedocs](https://mxtaltools.readthedocs.io/).
 
+## Quick Start
 
-1. Install PyTorch, Pytorch Geometric (including torch-scatter, torch-sparse, torch-cluster), based on your system and CUDA version:
-[PyTorch installation guide](https://pytorch.org/get-started/locally/)
-[PyG installation guide](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html)
+```python
+from mxtaltools.dataset_utils.data_classes import MolData
+from mxtaltools.dataset_utils.utils import collate_data_list
+from mxtaltools.common.training_utils import load_molecule_scalar_regressor
 
-2. Install this package:
+# Create molecule from SMILES
+mol = MolData.from_smiles("c1ccccc1", protonate=True, minimize=True, partial_charges=True)
+batch = collate_data_list([mol])
+
+# Predict crystal packing coefficient
+model = load_molecule_scalar_regressor("checkpoints/cp_regressor.pt")
+prediction = model(batch.clone())
+```
+
+## Installation for Users
+
+1. Install PyTorch and PyTorch Geometric (including torch-scatter, torch-sparse, torch-cluster) for your CUDA version:
+ - [PyTorch installation guide](https://pytorch.org/get-started/locally/)
+ - [PyG installation guide](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html)
+
+2. Install MXtalTools:
 
 ```bash
 pip install mxtaltools
 ```
 
-
 ## Installation for Developers
 
-1. Download the code from this repository via
+1. Clone the repository:
 
 ```bash
- git clone git@github.com:InfluenceFunctional/MXtalTools.git MXtalTools
+ git clone git@github.com:InfluenceFunctional/MXtalTools.git
+ cd MXtalTools
 ```
-2. Create a python environment of your choice. We recommend using pip+virtualenv.
-3. Install PyTorch, Pytorch Geometric (including torch-scatter, torch-sparse, torch-cluster), based on your system and CUDA version:
-[PyTorch installation guide](https://pytorch.org/get-started/locally/)
-[PyG installation guide](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html)
-4. Install remaining requirements with
+
+2. Create a Python environment (pip+virtualenv recommended).
+
+3. Install PyTorch and PyG as described above.
+
+4. Install remaining dependencies:
 
 ```bash
 poetry install
 ```
-5. If you plan to train any models, login to your weights and biases ("wandb") account, which is necessary for run monitoring and reporting with
-
- ```bash
+
+5. For model training, log in to Weights & Biases:
+
+ ```bash
 wandb login
 ```
-6. In configs/users create a .yaml file for yourself and edit the paths and wandb details to correspond to your preferences.
-When running the code, append the following to your command line prompt.
-
- ```
- --user YOUR_USERNAME
- ```
-
-7. If you plan to construct crystal datasets from .cif files, you'll need to install the CSD python api, with a valid license from CCDC.
-
- [CSD Python API]([PyTorch installation guide](https://pytorch.org/get-started/locally/))
-
-<!--
-## 2. Datasets
-1. This software generates training datasets of molecular crystal structures from collections of .cif files.
- .cifs are collated and processed primarily with the CSD Python API and RDKit.
- Collation includes filtering of structures which are somehow invalid.
- Invalid conditions include: symmetry features disagree, no atoms in the crystal, RDKit rejects the structure outright.
- The Cambridge Structural Database (CSD) can be processed by first dumping it to .cif files, or directly with minor modifications.
- Customized functions are available for processing CSD Blind Test submissions TODO clean & test.
-
-2. In the most common case, processing the CSD, to generate a dataset, run the following scripts,
- `dump_csd.py` -> `cif_processor.py` -> `manager.py`,
- with the appropriate paths set in each script.
- `cif_processor.py` takes on the order of dozens of hours to process the full CSD (>1M crystals).
- `manager.py` also may take a few minutes to process a large dataset, as this is where we do pose analysis,
- duplicates search, and some indexing tasks.
- We recommend running several instances in parallel to reduce this time.
- As they process datasets chunkwise in random order, this parallelism is fairly efficient.
- Note that the speed here depends strongly on disk read-write speed.
-
-
-### Key components
-1. `crystal_modeller` - class which contains everything else and does all the work
-2. `logger` - handles training statistics and reporting to weights and biases
-3. `crystal_builder` - generates supercells / molecule clusters given molecule & symmetry information for training and reporting
-4. `molecule_graph_model` - wrapper for GraphNeuralNetwork which parses i/o according to the various needs of different types of models
-5. configs
- 1. users - path and wandb login info for separate users
- 2. dataset - specifies information for dataset construction and featurization
- 3. main / dev / experiments - define all other parameters of a given run including losses, hyperparameters, convergence, etc.
-6. `dataset_management` - tools for dataset generator, curation, and modelling
-7. `standalone` - tools for true standalone deployment of crystal models, e.g., stability score & density prediction
--->
+
+6. Create a user config in `configs/users/YOUR_USERNAME.yaml` with your paths and W&B settings. Pass `--user YOUR_USERNAME` on the command line when running the code.
+
+7. (Optional) For crystal dataset construction from `.cif` files, install the [CSD Python API](https://www.ccdc.cam.ac.uk/) with a valid CCDC license.
 
 ## Reference
-If you use this code in any future publications, please cite our work using
-```@article{kilgour2023geometric,
+
+If you use this code in a publication, please cite:
+
+```bibtex
+@article{kilgour2023geometric,
 title={Geometric deep learning for molecular crystal structure prediction},
 author={Kilgour, Michael and Rogal, Jutta and Tuckerman, Mark},
- journal={Journal of chemical theory and computation},
+ journal={Journal of Chemical Theory and Computation},
 volume={19},
 number={14},
 pages={4743--4756},
 year={2023},
 publisher={American Chemical Society}
 }
 ```
-
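Both installation paths in this change require PyTorch and PyTorch Geometric, plus the torch-scatter, torch-sparse and torch-cluster extensions, to be installed by hand for the local CUDA setup before `pip install mxtaltools` or `poetry install`. A minimal sketch for checking that those packages import cleanly in the active environment is shown below. The helper `check_installation` is hypothetical and not part of MXtalTools, and the module names are assumed to be the usual import names of those packages (`torch_geometric`, `torch_scatter`, `torch_sparse`, `torch_cluster`).

```python
import importlib

# Assumed import names for PyTorch, PyTorch Geometric and the three PyG
# extension packages named in the installation steps.
REQUIRED_MODULES = (
    "torch",
    "torch_geometric",
    "torch_scatter",
    "torch_sparse",
    "torch_cluster",
)


def check_installation(modules=REQUIRED_MODULES):
    """Hypothetical helper (not part of MXtalTools).

    Returns a dict mapping each module name to its ``__version__`` string,
    "unknown" if the module imports but defines no ``__version__``,
    or None if the import fails.
    """
    status = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except (ImportError, OSError):
            # ImportError: the package is not installed.
            # OSError: a compiled extension was found but its shared library
            # could not be loaded, which can happen when it was built for a
            # different PyTorch/CUDA combination.
            status[name] = None
            continue
        status[name] = getattr(module, "__version__", "unknown")
    return status
```

Any `None` entry in the returned dict points to a package that still needs installing, or reinstalling for the matching PyTorch/CUDA build, following the PyTorch and PyG installation guides linked above.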