Commit ffbc900

docs: full refresh of documentation and packaging

- Sphinx infrastructure: rewrite conf.py with custom torch mock system that handles issubclass() for nn.Module subclasses, enable autosummary_generate=True, add _templates/autosummary/base.rst fix
- Remove 57 pre-committed autosummary stubs and duplicate docs/docs/ directory; both are now gitignored (auto-generated at build time)
- API reference: reorganised into labelled sections (Common Utilities, Constants, Crystal Building, etc.) in modules.rst; 40 module pages now fully documented with function signatures and source links
- Content: rewrite installation.rst (two-step torch/PyG then pip), about.rst (feature list + citation), examples.rst (fixed math); remove stale warnings from dataset_creation and model_training
- pyproject.toml: add numba, scikit-image, networkx, einops, msgpack, POT, seaborn; drop umap-learn and hdbscan; add fairchem-core as optional [uma] extra; keep torch/PyG commented with install note
- .readthedocs.yaml: install package (pip install .) before Sphinx so pure-Python deps are available; trim docs/requirements.txt to Sphinx-only
- README: refresh with quick-start snippet, clean install instructions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

1 parent 4328da3 commit ffbc900
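
The commit message mentions a custom torch mock that handles `issubclass()` for `nn.Module` subclasses. The actual `conf.py` is not shown on this page, so the following is only a minimal sketch of the general idea, under the assumption that the mock exposes `nn.Module` as a real Python class rather than a generic mock object. The names `FakeModule` and `install_torch_stub` are hypothetical and do not come from the repository.

```python
import sys
import types


class FakeModule:
    """Hypothetical stand-in for torch.nn.Module.

    Because it is a real class, `class Net(nn.Module)` and
    `issubclass(Net, nn.Module)` both work without torch installed.
    """

    def __init__(self, *args, **kwargs):
        pass


def install_torch_stub():
    """Register minimal fake `torch` and `torch.nn` modules in sys.modules."""
    torch_stub = types.ModuleType("torch")
    nn_stub = types.ModuleType("torch.nn")
    nn_stub.Module = FakeModule
    torch_stub.nn = nn_stub
    sys.modules["torch"] = torch_stub
    sys.modules["torch.nn"] = nn_stub
    return torch_stub


install_torch_stub()

import torch.nn as nn  # resolves to the stub registered above


class Net(nn.Module):
    """Example subclass, as documented code might define."""


print(issubclass(Net, nn.Module))
```

A generic mock object in place of `nn.Module` is usually not a usable class for subclass checks, which is presumably why the commit calls out `issubclass()` specifically.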

97 files changed

Lines changed: 549 additions & 4449 deletions

.gitignore

Lines changed: 6 additions & 0 deletions
@@ -1,5 +1,11 @@
 .idea/
 __pycache__/
+
+# Sphinx build output (generated at build time, not committed)
+docs/_build/
+
+# Auto-generated API stubs (rebuilt by sphinx-build via autosummary_generate=True)
+docs/source/_autosummary/
 /parallelTest.py
 /old/processDatasetChunks.py
 /req.out
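
The ignore rule above refers to `autosummary_generate=True`. As a rough illustration of what that setting looks like in a Sphinx configuration, here is a small fragment that assumes the standard `sphinx.ext.autodoc` and `sphinx.ext.autosummary` extensions. The project's real `conf.py` is not shown on this page and may differ.

```python
# Illustrative Sphinx conf.py fragment (an assumption, not the project's file).
extensions = [
    "sphinx.ext.autodoc",      # pull documentation from docstrings
    "sphinx.ext.autosummary",  # build summary tables and stub pages
]

# Have sphinx-build generate the autosummary stub pages at build time,
# so they do not need to be committed to the repository.
autosummary_generate = True

# Directory holding template overrides, e.g. _templates/autosummary/base.rst
templates_path = ["_templates"]
```

With generation enabled, the stub directory is build output, which is consistent with ignoring it in `.gitignore`.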

.readthedocs.yaml

Lines changed: 9 additions & 7 deletions
@@ -1,20 +1,22 @@
-# Read the Docs configuration file for MkDocs projects
+# Read the Docs configuration file
 # See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
 
-# Required
 version: 2
 
-# Set the version of Python and other tools you might need
 build:
   os: ubuntu-22.04
   tools:
-    python: "3.9"
+    python: "3.11"
 
 python:
   install:
-    # - method: pip
-    #   path: .
+    # Install the package (without torch/PyG — those are mocked in conf.py).
+    # This brings in all pure-Python deps (numpy, scipy, rdkit, etc.) so
+    # Sphinx can import the parts of mxtaltools that don't touch torch.
+    - method: pip
+      path: .
+    # Sphinx and theme
     - requirements: docs/requirements.txt
 
 sphinx:
-  configuration: docs/conf.py
+  configuration: docs/conf.py

README.md

Lines changed: 55 additions & 64 deletions
@@ -1,99 +1,90 @@
 ![image](https://github.com/InfluenceFunctional/MXtalTools/assets/30198118/ecc49717-b9b4-4901-9b59-8e4c8b919813)
 # MXtalTools: Toolbox for machine learning on molecular crystals
 
+A Python library built on PyTorch and PyTorch Geometric for machine learning tasks on molecules and molecular crystals.
+
+**Features:**
+- Crystal building -- fast, differentiable molecular crystal construction from asymmetric unit parameters
+- Crystal density prediction -- predict packing coefficients from molecular structure
+- Molecule autoencoder -- equivariant molecular encodings via pre-trained Mo3ENet
+- Crystal scoring -- evaluate crystal structures against CSD statistics
+- Crystal structure search -- optimize crystal packing with ML potentials
+- Dataset utilities -- build molecular/crystal datasets from CSD, .cif, and .xyz files
+
 ## Documentation
-See our detailed documentation including installation and deployment instructions at our [readthedocs](https://mxtaltools.readthedocs.io/en/master/) page.
 
-## Installation for Users
+See our detailed docs at [readthedocs](https://mxtaltools.readthedocs.io/).
 
+## Quick Start
 
-1. Install PyTorch, Pytorch Geometric (including torch-scatter, torch-sparse, torch-cluster), based on your system and CUDA version:
-[PyTorch installation guide](https://pytorch.org/get-started/locally/)
-[PyG installation guide](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html)
+```python
+from mxtaltools.dataset_utils.data_classes import MolData
+from mxtaltools.dataset_utils.utils import collate_data_list
+from mxtaltools.common.training_utils import load_molecule_scalar_regressor
 
-2. Install this package:
+# Create molecule from SMILES
+mol = MolData.from_smiles("c1ccccc1", protonate=True, minimize=True, partial_charges=True)
+batch = collate_data_list([mol])
+
+# Predict crystal packing coefficient
+model = load_molecule_scalar_regressor("checkpoints/cp_regressor.pt")
+prediction = model(batch.clone())
+```
+
+## Installation for Users
+
+1. Install PyTorch and PyTorch Geometric (including torch-scatter, torch-sparse, torch-cluster) for your CUDA version:
+- [PyTorch installation guide](https://pytorch.org/get-started/locally/)
+- [PyG installation guide](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html)
+
+2. Install MXtalTools:
 
 ```bash
 pip install mxtaltools
 ```
 
-
 ## Installation for Developers
 
-1. Download the code from this repository via
+1. Clone the repository:
 
 ```bash
-git clone git@github.com:InfluenceFunctional/MXtalTools.git MXtalTools
+git clone git@github.com:InfluenceFunctional/MXtalTools.git
+cd MXtalTools
 ```
-2. Create a python environment of your choice. We recommend using pip+virtualenv.
-3. Install PyTorch, Pytorch Geometric (including torch-scatter, torch-sparse, torch-cluster), based on your system and CUDA version:
-[PyTorch installation guide](https://pytorch.org/get-started/locally/)
-[PyG installation guide](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html)
-4. Install remaining requirements with
+
+2. Create a Python environment (pip+virtualenv recommended).
+
+3. Install PyTorch and PyG as described above.
+
+4. Install remaining dependencies:
 
 ```bash
 poetry install
 ```
-5. If you plan to train any models, login to your weights and biases ("wandb") account, which is necessary for run monitoring and reporting with
-
-```bash
+
+5. For model training, login to Weights & Biases:
+
+```bash
 wandb login
 ```
-6. In configs/users create a .yaml file for yourself and edit the paths and wandb details to correspond to your preferences.
-When running the code, append the following to your command line prompt.
-
-```
---user YOUR_USERNAME
-```
-
-7. If you plan to construct crystal datasets from .cif files, you'll need to install the CSD python api, with a valid license from CCDC.
-
-[CSD Python API]([PyTorch installation guide](https://pytorch.org/get-started/locally/))
-
-<!--
-## 2. Datasets
-1. This software generates training datasets of molecular crystal structures from collections of .cif files.
-.cifs are collated and processed primarily with the CSD Python API and RDKit.
-Collation includes filtering of structures which are somehow invalid.
-Invalid conditions include: symmetry features disagree, no atoms in the crystal, RDKit rejects the structure outright.
-The Cambridge Structural Database (CSD) can be processed by first dumping it to .cif files, or directly with minor modifications.
-Customized functions are available for processing CSD Blind Test submissions TODO clean & test.
-
-2. In the most common case, processing the CSD, to generate a dataset, run the following scripts,
-`dump_csd.py` -> `cif_processor.py` -> `manager.py`,
-with the appropriate paths set in each script.
-`cif_processor.py` takes on the order of dozens of hours to process the full CSD (>1M crystals).
-`manager.py` also may take a few minutes to process a large dataset, as this is where we do pose analysis,
-duplicates search, and some indexing tasks.
-We recommend running several instances in parallel to reduce this time.
-As they process datasets chunkwise in random order, this parallelism is fairly efficient.
-Note that the speed here depends strongly on disk read-write speed.
-
-
-### Key components
-1. `crystal_modeller` - class which contains everything else and does all the work
-2. `logger` - handles training statistics and reporting to weights and biases
-3. `crystal_builder` - generates supercells / molecule clusters given molecule & symmetry information for training and reporting
-4. `molecule_graph_model` - wrapper for GraphNeuralNetwork which parses i/o according to the various needs of different types of models
-5. configs
-1. users - path and wandb login info for separate users
-2. dataset - specifies information for dataset construction and featurization
-3. main / dev / experiments - define all other parameters of a given run including losses, hyperparameters, convergence, etc.
-6. `dataset_management` - tools for dataset generator, curation, and modelling
-7. `standalone` - tools for true standalone deployment of crystal models, e.g., stability score & density prediction
--->
+
+6. Create a user config in `configs/users/YOUR_USERNAME.yaml` with your paths and W&B settings. Pass `--user YOUR_USERNAME` when running.
+
+7. (Optional) For crystal dataset construction from `.cif` files, install the [CSD Python API](https://www.ccdc.cam.ac.uk/) with a valid CCDC license.
 
 ## Reference
-If you use this code in any future publications, please cite our work using
-```@article{kilgour2023geometric,
+
+If you use this code in a publication, please cite:
+
+```bibtex
+@article{kilgour2023geometric,
 title={Geometric deep learning for molecular crystal structure prediction},
 author={Kilgour, Michael and Rogal, Jutta and Tuckerman, Mark},
-journal={Journal of chemical theory and computation},
+journal={Journal of Chemical Theory and Computation},
 volume={19},
 number={14},
 pages={4743--4756},
 year={2023},
 publisher={American Chemical Society}
 }
 ```
-

docs/_build/html/.buildinfo

Lines changed: 0 additions & 4 deletions
This file was deleted.
