Closed
3 changes: 1 addition & 2 deletions .github/workflows/lint_and_test.yaml
@@ -55,10 +55,9 @@ jobs:

test_digs:
name: pytest (jojo)
runs-on: [jojo]
runs-on: [self-hosted]
timeout-minutes: 30
needs: lint
if: github.event_name == 'workflow_dispatch'
steps:
- uses: actions/checkout@v4
- name: Run tests
3 changes: 2 additions & 1 deletion .github/workflows/release_and_docs.yaml
@@ -4,6 +4,7 @@ on:
push:
branches:
- production
- doc_release

jobs:
release_and_docs:
@@ -156,4 +157,4 @@ jobs:
keep_files: true # Keep existing versions
force_orphan: false # Don't force orphan, preserve history
user_name: 'github-actions[bot]'
user_email: 'github-actions[bot]@users.noreply.github.com'
user_email: 'github-actions[bot]@users.noreply.github.com'
23 changes: 14 additions & 9 deletions README.md
@@ -4,18 +4,22 @@
[![Documentation Status](https://img.shields.io/badge/docs-latest-brightgreen.svg)](https://baker-laboratory.github.io/atomworks-dev/latest/index.html)
[![License: BSD 3-Clause](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)

<img src="docs/_static/atomworks_logo_color.svg" width="450" alt="atomworks logo">
<div align="center">
<img src="docs/_static/atomworks_logo_color.svg" width="450" alt="atomworks logo">
</div>

**atomworks** is an open-source platform that maximizes research velocity for biomolecular modeling tasks. Much like how [Torchdata](https://docs.pytorch.org/data/beta/index.html) enables rapid prototyping within the vision and language domains, AtomWorks aims to accelerate development and experimentation within biomolecular modeling.
**atomworks** is an open-source platform that maximizes research velocity for biomolecular modeling tasks. Much like how [Torchvision](https://docs.pytorch.org/vision/stable/index.html) enables rapid prototyping within the vision domain, and [Torchaudio](https://docs.pytorch.org/audio/main/) within the audio domain, AtomWorks aims to accelerate development and experimentation within biomolecular modeling.

> **⚠️ Notice:** We are currently finalizing some cleanup work within our repositories. Please expect the APIs (e.g., function and class names, inputs and outputs) to stabilize within the next two weeks. Thank you for your patience!
> **⚠️ Notice:** We are currently finalizing some cleanup work within our repositories. Please expect the APIs (e.g., function and class names, inputs and outputs) to stabilize within the next week. Thank you for your patience!

If you're looking for the models themselves (e.g., RF3, MPNN) that integrate with AtomWorks rather than the underlying framework, check out [ModelForge](https://github.com/RosettaCommons/modelforge).

> **💡 Note:** Not sure where to start? We've made some [examples in the AtomWorks documentation](https://baker-laboratory.github.io/atomworks-dev/latest/auto_examples/index.html) that work through several helpful scenarios; a full tutorial is under construction!

AtomWorks is composed of two symbiotic libraries:

- **atomworks.io:** A universal Python toolkit for parsing, cleaning, manipulating, and converting biological data (structures, sequences, small molecules). Built on the [biotite](https://www.biotite-python.org/) API, it seamlessly loads and exports between standard formats like mmCIF, PDB, FASTA, SMILES, MOL, and more.
- **atomworks.ml:** Advanced dataset featurization and sampling for deep learning workflows that uses `atomworks.io` as its structural backbone. We provide a comprehensive, pre-built and well-tested set of `Transforms` for common tasks that can be easily composed into full deep-learning pipelines; users may also create their own `Transforms` for custom operations.
- `atomworks.io`: A universal Python toolkit for parsing, cleaning, manipulating, and converting biological data (structures, sequences, small molecules). Built on the [biotite](https://www.biotite-python.org/) API, it seamlessly loads and exports between standard formats like mmCIF, PDB, FASTA, SMILES, MOL, and more. Broadly useful for anyone who works with structural data for biomolecules.
- `atomworks.ml`: Advanced dataset featurization and sampling for deep learning workflows that uses `atomworks.io` as its structural backbone. We provide a comprehensive, pre-built and well-tested set of `Transforms` for common tasks that can be easily composed into full deep-learning pipelines; users may also create their own `Transforms` for custom operations.

For more detail on the motivation for and applications of AtomWorks, please see the [preprint](https://doi.org/10.1101/2025.08.14.670328).

@@ -25,15 +29,15 @@ AtomWorks is built atop [biotite](https://www.biotite-python.org/): We are grate

## atomworks.io

> *A general-purpose Python toolkit for cleaning up, standardizing, and working with biomolecular files - based on biotite*
> *A general-purpose Python toolkit for cleaning, standardizing, and manipulating biomolecular structure files, built atop [biotite](https://www.biotite-python.org/)*

**atomworks.io** lets you:

- Parse, convert, and clean any common biological file (structure or sequence). For example, identifying and removing leaving groups, correcting bond order after nucleophilic addition, fixing charges, parsing covalent geometries, and appropriate treatment of structures with multiple occupancies and ligands at symmetry centers
- Transform all data to a consistent `AtomArray` representation for further analysis or machine learning applications, regardless of initial source
- Model missing atoms (those implied by the sequence but not represented in the coordinates) and initialize entity- and instance-level annotations (see the [glossary]() for more detail on our composable naming conventions)

We have found `atomworks.io` to be useful to a general bioinformatics and protein design audience; in many cases, `atomworks.io` can replace bespoke scripts and manual curation, enabling researchers to spend more time testing hypotheses and less time juggling dozens of tools and dependencies.
We have found `atomworks.io` to be generally useful to a broad bioinformatics and protein design audience; in many cases, `atomworks.io` can replace bespoke scripts and manual curation, enabling researchers to spend more time testing hypotheses and less time juggling dozens of tools and dependencies.

---

@@ -45,11 +49,12 @@ We have found `atomworks.io` to be useful to a general bioinformatics and protei

- A library of pre-built, well-tested `Transforms` that can be slotted into novel pipelines
- An extensible framework, integrated with `atomworks.io`, to write `Transforms` for arbitrary use cases
- Scripts to pre-process the PDB or other databases into dataframes appropriate for network training
- Efficient sampling and batching utilities for training machine learning models
- Pre-built datasets and samplers suitable for most model training scenarios

Within the AtomWorks paradigm, the output of each `Transform` is not an opaque dictionary with model-specific tensors but instead an updated version of our atom-level structural representation (Biotite's `AtomArray`). Operations within – and between – pipelines thus maintain a common vocabulary of inputs and outputs.
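
As a rough illustration of that idea, the sketch below chains two transforms that each take an atom-level structure and return an updated one, so every step shares the same input and output type. It is a toy under stated assumptions, not the `atomworks.ml` API: `ToyAtoms`, `remove_hydrogens`, `center_coordinates`, and `compose` are hypothetical names invented for this example, and `ToyAtoms` is only a stand-in for Biotite's `AtomArray`.

```python
from dataclasses import dataclass, replace
from typing import Callable, Sequence


@dataclass(frozen=True)
class ToyAtoms:
    """Hypothetical stand-in for an atom-level structure (NOT Biotite's AtomArray)."""

    elements: tuple[str, ...]
    coords: tuple[tuple[float, float, float], ...]


# Every transform has the same shape: structure in, updated structure out.
ToyTransform = Callable[[ToyAtoms], ToyAtoms]


def remove_hydrogens(atoms: ToyAtoms) -> ToyAtoms:
    """Return a copy of the structure without hydrogen atoms."""
    keep = [i for i, element in enumerate(atoms.elements) if element != "H"]
    return replace(
        atoms,
        elements=tuple(atoms.elements[i] for i in keep),
        coords=tuple(atoms.coords[i] for i in keep),
    )


def center_coordinates(atoms: ToyAtoms) -> ToyAtoms:
    """Return a copy of the structure translated so its centroid is the origin."""
    n_atoms = len(atoms.coords)
    if n_atoms == 0:
        return atoms
    centroid = tuple(sum(c[axis] for c in atoms.coords) / n_atoms for axis in range(3))
    shifted = tuple(
        tuple(c[axis] - centroid[axis] for axis in range(3)) for c in atoms.coords
    )
    return replace(atoms, coords=shifted)


def compose(transforms: Sequence[ToyTransform]) -> ToyTransform:
    """Chain transforms left to right into a single pipeline (hypothetical helper)."""

    def pipeline(atoms: ToyAtoms) -> ToyAtoms:
        for transform in transforms:
            atoms = transform(atoms)
        return atoms

    return pipeline


atoms = ToyAtoms(
    elements=("C", "H", "O"),
    coords=((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)),
)
pipeline = compose([remove_hydrogens, center_coordinates])
result = pipeline(atoms)
```

Because each step returns the same structure type it received, the two transforms could be reordered, reused in another pipeline, or inspected between steps without any model-specific glue.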

We have found that `atomworks.ml` **dramatically** reduces the overhead of starting, and completing, many ML projects; research topics that once took months now achieve signal within weeks if not days, accelerating the pace of innovation.

---

## Installation
Binary file added docs/_static/examples/dataset_exploration_01.png
Binary file not shown.
1 change: 1 addition & 0 deletions docs/api_reference.rst
@@ -5,5 +5,6 @@ API
:maxdepth: 2
:caption: API Modules

core
io
ml
12 changes: 12 additions & 0 deletions docs/core.rst
@@ -0,0 +1,12 @@
Core Modules
============

The core modules provide fundamental utilities, constants, and enumerations used throughout the atomworks library.

.. toctree::
:maxdepth: 2

core/common
core/constants
core/enums

8 changes: 8 additions & 0 deletions docs/core/common.rst
@@ -0,0 +1,8 @@
Common Utilities
================

.. automodule:: atomworks.common
:members:
:undoc-members:
:show-inheritance:

8 changes: 8 additions & 0 deletions docs/core/constants.rst
@@ -0,0 +1,8 @@
Constants
=========

.. automodule:: atomworks.constants
:members:
:undoc-members:
:show-inheritance:

8 changes: 8 additions & 0 deletions docs/core/enums.rst
@@ -0,0 +1,8 @@
Enumerations
============

.. automodule:: atomworks.enums
:members:
:undoc-members:
:show-inheritance:

2 changes: 1 addition & 1 deletion docs/examples/annotate_and_save_structures.py
@@ -220,6 +220,6 @@ def fix_boolean_annotation(atom_array: struc.AtomArray, annotation_name: str) ->

########################################################################
# Related Examples
# ----------
# ---------------
#
# - :doc:`pocket_conditioning_transform` - Create custom transforms for ligand pocket identification and ML feature generation