`docs/source/Inference.md`: 61 additions, 2 deletions

@@ -24,12 +24,12 @@ Supported:
- Template-based prediction
  - using ColabFold template alignments
  - using pre-computed template alignments
  - using direct CIF template files (no alignments required)
- Non-canonical residues

Coming soon:

- Covalently modified residues and other cross-chain covalent bonds

### 1.2 DNA

@@ -301,6 +301,61 @@ model_update:

---

(inference-cif-direct-templates)=
#### 🧬 CIF Direct Template Mode

OpenFold3 supports providing template structures directly as CIF files without requiring pre-computed template alignments. In this mode, the system automatically:
1. Parses each provided CIF file
2. Extracts all chains and their sequences
3. Aligns each chain to your query sequence
4. Selects the best matching chain based on sequence identity × coverage score
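
The four steps above can be illustrated with a small, self-contained Python sketch. It is not OpenFold3's implementation: this guide does not say which aligner is used or exactly how identity and coverage are computed, so the sketch assumes a plain Needleman-Wunsch global alignment with a made-up scoring scheme, defines identity as identical pairs over aligned (non-gap) pairs, and defines coverage as aligned query residues over query length. The function names and the `min_score` argument are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical scoring scheme for this sketch only; OpenFold3's aligner and
# its parameters are not documented here.
MATCH, MISMATCH, GAP = 2, -1, -2

AlignedPair = Tuple[Optional[int], Optional[int]]  # (query index, template index); None = gap


def _pair_score(a: str, b: str) -> int:
    return MATCH if a == b else MISMATCH


def global_align(query: str, template: str) -> List[AlignedPair]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(query), len(template)
    # dp[i][j] = best score for aligning query[:i] with template[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * GAP
    for j in range(1, m + 1):
        dp[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + _pair_score(query[i - 1], template[j - 1]),
                dp[i - 1][j] + GAP,  # query residue aligned to a gap
                dp[i][j - 1] + GAP,  # template residue aligned to a gap
            )
    # Trace back from the bottom-right cell to recover one optimal alignment
    pairs: List[AlignedPair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + _pair_score(query[i - 1], template[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + GAP:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


def identity_x_coverage(query: str, template: str) -> float:
    """Assumed definitions: identity over aligned pairs, coverage over query length."""
    aligned = [(q, t) for q, t in global_align(query, template) if q is not None and t is not None]
    if not query or not aligned:
        return 0.0
    identical = sum(query[q] == template[t] for q, t in aligned)
    identity = identical / len(aligned)
    coverage = len(aligned) / len(query)
    return identity * coverage


def select_best_chain(query: str, chains: Dict[str, str], min_score: float) -> Optional[Tuple[str, float]]:
    """Return (chain_id, score) for the best chain scoring >= min_score, or None."""
    best: Optional[Tuple[str, float]] = None
    for chain_id, sequence in chains.items():
        score = identity_x_coverage(query, sequence)
        if score >= min_score and (best is None or score > best[1]):
            best = (chain_id, score)
    return best
```

Under these assumed definitions, a template chain identical to the first half of a 10-residue query scores 1.0 × 0.5 = 0.5, and a chain with no identical aligned residues scores 0 and is rejected by any positive threshold.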

This is particularly useful for stateless inference environments or when you have specific template structures but no alignment files.

**Usage:**

In your query JSON, specify `template_cif_paths` instead of `template_alignment_file_path`:

```json
{
  "queries": {
    "my_query": {
      "chains": [
        {
          "molecule_type": "protein",
          "chain_ids": ["A"],
          "sequence": "MKLLVVDDAGQKFT...",
          "template_cif_paths": [
            "path/to/template1.cif",
            "path/to/template2.cif",
            "path/to/template3.cif"
          ],
          "template_cif_chain_ids": ["A", null, "B"]
        }
      ]
    }
  }
}
```
Optionally, use `template_cif_chain_ids` to specify which chain to use from each CIF file. Use `null` to let the system automatically select the best-matching chain.
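
To make the `null` semantics concrete, the Python sketch below loads the example query and pairs each CIF path with its requested chain ID, treating `null` (Python `None`) as automatic selection. `template_chain_plan` is a hypothetical helper, not part of OpenFold3; it assumes the two lists must have the same length and that omitting `template_cif_chain_ids` means automatic selection for every file.

```python
import json
from typing import List, Optional, Tuple

# The example query from this section; the "..." in the sequence is kept verbatim.
QUERY_JSON = """
{
  "queries": {
    "my_query": {
      "chains": [
        {
          "molecule_type": "protein",
          "chain_ids": ["A"],
          "sequence": "MKLLVVDDAGQKFT...",
          "template_cif_paths": [
            "path/to/template1.cif",
            "path/to/template2.cif",
            "path/to/template3.cif"
          ],
          "template_cif_chain_ids": ["A", null, "B"]
        }
      ]
    }
  }
}
"""


def template_chain_plan(chain: dict) -> List[Tuple[str, Optional[str]]]:
    """Pair each CIF path with its requested chain ID; None means auto-select."""
    paths = chain.get("template_cif_paths") or []
    chain_ids = chain.get("template_cif_chain_ids")
    if chain_ids is None:
        # Assumption: omitting the field means automatic selection for every file
        chain_ids = [None] * len(paths)
    if len(chain_ids) != len(paths):
        raise ValueError("template_cif_chain_ids must have the same length as template_cif_paths")
    return list(zip(paths, chain_ids))


chain = json.loads(QUERY_JSON)["queries"]["my_query"]["chains"][0]
for path, chain_id in template_chain_plan(chain):
    # json.loads turns JSON null into Python None
    print(f"{path}: {chain_id if chain_id is not None else 'auto-select'}")
```

For the example above this prints chain `A` for the first file, `auto-select` for the second, and chain `B` for the third.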

**Configuration:**

You can adjust the minimum score threshold for chain selection in your `runner.yml`:
- For multi-chain CIF files, only the best matching chain per file is used as a template
- The `template_cif_paths` field cannot be used together with `template_alignment_file_path`
- This mode is currently supported for protein chains only
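
A pre-flight check of these restrictions, run before submitting a query, could look like the Python sketch below. `check_cif_template_fields` is a hypothetical helper, not an OpenFold3 API, and OpenFold3's own validation may report these cases differently; flagging `template_cif_chain_ids` without `template_cif_paths` is an extra assumption of this sketch.

```python
from typing import List


def check_cif_template_fields(chain: dict) -> List[str]:
    """Return a list of problems with one chain entry (an empty list means OK)."""
    problems: List[str] = []
    cif_paths = chain.get("template_cif_paths")
    chain_ids = chain.get("template_cif_chain_ids")
    if cif_paths is None:
        # Assumption: chain IDs are meaningless without CIF paths
        if chain_ids is not None:
            problems.append("template_cif_chain_ids requires template_cif_paths")
        return problems
    if chain.get("template_alignment_file_path") is not None:
        problems.append("template_cif_paths cannot be combined with template_alignment_file_path")
    if chain.get("molecule_type") != "protein":
        problems.append("CIF direct templates are only supported for protein chains")
    if chain_ids is not None and len(chain_ids) != len(cif_paths):
        problems.append("template_cif_chain_ids must have the same length as template_cif_paths")
    return problems
```

Returning a list rather than raising on the first problem lets you report every issue in a large query set at once.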

---

### 3.4 Customized ColabFold MSA Server Settings Using `runner.yml`

All settings for the ColabFold server and outputs can be set under [`msa_computation_settings`](https://github.com/aqlaboratory/openfold-3/blob/main/openfold3/core/data/tools/colabfold_msa_server.py#L904)

@@ -478,9 +533,13 @@ This file representing the full input query in a validated internal format defin

- `template_alignment_file_path`: Path to the preprocessed template cache entry `.npz` file used for template featurization. By default, template cache entries are automatically created in a short preprocessing step using the raw template alignment files provided under this same field and the template structures identified in the alignment.

- `template_cif_paths`: List of paths to CIF template files when using {ref}`CIF direct template mode <inference-cif-direct-templates>`. This field is mutually exclusive with `template_alignment_file_path`.

- `template_cif_chain_ids`: List of chain IDs to use from each corresponding CIF file in `template_cif_paths`. Use `null` for entries where automatic chain selection is desired. If provided, it must have the same length as `template_cif_paths`.

- `template_entry_chain_ids`: List of template chains, identified by their entry (typically PDB) IDs and chain IDs, used for featurization. By default, up to the first 4 of these chains are used.

Note: Refer to the {doc}`Template How-To Documentation <template_how_to>` for how to specify these fields if you want to use precomputed template alignments instead of ColabFold alignments for template inputs, or see {ref}`CIF Direct Template Mode <inference-cif-direct-templates>` for using template structures directly without alignments.

Note: If MSA and template files are persisted between runs, the same `inference_query_set.json` file can be used to resubmit the query without needing to rerun the template and MSA pipelines. To do so:

@@ -119,6 +121,19 @@ All chains must define a unique ```chain_ids``` field and appropriate sequence o
- Use this field only when running inference with **precomputed alignments**. See the {doc}`Running with Templates Documentation <template_how_to>` for details.
- If using the ColabFold MSA server, this field is automatically populated and will **override any user-provided path**.

`docs/source/template_how_to.md`: 94 additions, 10 deletions

@@ -1,6 +1,13 @@
# Running OpenFold3 Inference with Templates

This document contains instructions on how to use template information for OF3 predictions. OpenFold3 supports two template modes:

1. **Alignment-based templates** (traditional): Requires template alignments and template structures
2. **CIF direct templates** (simplified): Requires only template CIF files, no alignments needed

For alignment-based templates, we assume you have already generated all of your template alignments or intend to fetch them from ColabFold on the fly. If you do not have any precomputed template alignments and do not want to use ColabFold, refer to our {doc}`MSA Generation Guide <precomputed_msa_generation_how_to>` before consulting this document.

If you need further clarification on how some of the template components of our inference pipeline work, refer to {doc}`this explanatory document <template_explanation>`.

The template pipeline currently supports monomeric templates and has been tested for protein chains only.

@@ -12,10 +19,18 @@ The main steps detailed in this guide are:
(1-template-files)=
## 1. Template Files

OpenFold3 supports two modes for providing template information:

### Alignment-Based Mode (Traditional)
Requires query-to-template **alignments** and template **structures**. Sections 1.1 and 1.2 below describe the required file formats.

### CIF Direct Mode (Simplified)
Requires only template **CIF files**. The system automatically aligns template chains to your query sequence and selects the best matching chain. See {ref}`Section 2.3 <23-cif-direct-templates>` for usage details.

---

(11-template-aligment-file-format)=
### 1.1. Template Alignment File Format (Alignment-Based Mode)

Template alignments can be provided in `sto`, `a3m`, or `m8` format. Template alignments from the ColabFold server are in `m8` format.
Note that since `m8` files do not provide actual alignments, we only use them to identify which structure files to get templates from, retrieve sequences from these structure files and always realign them to the query sequence using Kalign. More on this in the [template processing explanatory document](template_explanation.md).
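
As an illustration of how an `m8` file can serve purely as a list of template identifiers, the hypothetical helper below reads the second (target) column of each tab-separated line, which is where the standard 12-column BLAST/MMseqs2 tabular layout puts the hit ID, and splits it into an entry ID and a chain ID. It assumes target IDs of the form `<entry>_<chain>` (for example `6wgj_B`) and is not the parser OpenFold3 uses; the subsequent Kalign realignment is not shown.

```python
from typing import List, Optional, Set, Tuple


def template_hits_from_m8(m8_text: str, max_hits: Optional[int] = None) -> List[Tuple[str, str]]:
    """Return unique (entry_id, chain_id) pairs in the order they appear."""
    seen: Set[Tuple[str, str]] = set()
    hits: List[Tuple[str, str]] = []
    for line in m8_text.splitlines():
        columns = line.split("\t")
        if len(columns) < 2:
            continue  # skip blank or malformed lines
        target = columns[1]  # column 2 = target (template) identifier
        # Assumption: IDs look like "<entry>_<chain>"; split on the last underscore
        entry_id, sep, chain_id = target.rpartition("_")
        if not sep:
            continue
        hit = (entry_id, chain_id)
        if hit in seen:
            continue  # the same template chain can be hit more than once
        seen.add(hit)
        hits.append(hit)
        if max_hits is not None and len(hits) >= max_hits:
            break
    return hits
```

Only the identifiers are kept; the alignment columns (identity, coordinates, e-value) are ignored, mirroring the fact that the sequences are re-extracted from the structure files and realigned.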

### 1.2. Template Structure File Format (Alignment-Based Mode)

For alignment-based templates, template structures can currently only be provided in `cif` format. An upcoming release will add support for parsing templates from `pdb` files.

**Note:** For {ref}`CIF direct mode <23-cif-direct-templates>`, template CIF files are specified directly in the query JSON without separate structure directories.

For alignment-based templates, the data pipeline needs to know which template alignment to use for which chain. This information is provided by specifying the {ref}`paths to the alignments <31-protein-chains>` in each chain's `template_alignment_file_path` field in the inference query JSON file.

Note that when fetching alignments from the ColabFold server, the `template_alignment_file_path` fields are populated automatically.

@@ -118,9 +135,9 @@ Note that when fetching alignments from the Colabfold server, `template_alignmen
</code></pre>
</details>

### 2.2. Using Specific Templates (Alignment-Based Mode)

By default, for alignment-based templates, the template pipeline automatically populates the `template_entry_chain_ids` field with [n templates](https://github.com/aqlaboratory/openfold-3/blob/main/openfold3/core/data/pipelines/preprocessing/template.py#L1535) from the alignment, which is then further subset to the [top k templates](https://github.com/aqlaboratory/openfold-3/blob/main/openfold3/projects/of3_all_atom/config/dataset_config_components.py#L116) during featurization for inference.

In an **upcoming release**, we will add support for specifying *specific templates* for the data pipeline to use for featurization. This will be possible through the `template_entry_chain_ids` field:

@@ -156,10 +173,77 @@ entry3_A MK----DDARGQGKFT
//
```

(23-cif-direct-templates)=
### 2.3. CIF Direct Templates (No Alignments Required)

OpenFold3 supports providing template structures directly as CIF files without requiring pre-computed template alignments. This is particularly useful for:
- Quick predictions when you have specific template structures
- Simplified workflows without external alignment tools

#### How It Works

In CIF direct mode, the system automatically:
1. Parses each provided CIF file to extract all chains and their sequences
2. Aligns each chain's sequence to your query sequence
3. Scores each chain by `sequence_identity × coverage`
4. Selects the best matching chain as the template (if score ≥ minimum threshold)
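
A worked example of the score, assuming identity is the fraction of identical residues among aligned positions and coverage is the fraction of query residues that are aligned (this guide does not define either term precisely). For a 100-residue query and a template chain aligned over 90 query residues, 72 of which are identical:

```{math}
\text{identity} = \frac{72}{90} = 0.80, \qquad \text{coverage} = \frac{90}{100} = 0.90, \qquad \text{score} = 0.80 \times 0.90 = 0.72
```

Under the same assumptions, a chain that is 100% identical but aligns to only 40 query residues scores 1.00 × 0.40 = 0.40, so the product favors templates that cover more of the query over short, near-perfect matches.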

For multi-chain CIF files, only the best matching chain per file is used.

#### Usage Example

Specify `template_cif_paths` instead of `template_alignment_file_path` in your query JSON:

```json
{
  "queries": {
    "my_protein": {
      "chains": [
        {
          "molecule_type": "protein",
          "chain_ids": ["A", "B"],
          "sequence": "XRMKQLEDKVEELLSKNYHLENEVARLKKLVGER",
          "template_cif_paths": [
            "templates/1dgc.cif",
            "templates/1ysa.cif",
            "templates/1zta.cif"
          ]
        }
      ]
    }
  }
}
```
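
If you keep many template files in one directory, a query like the one above can also be generated programmatically. The Python sketch below is one way to do it under stated assumptions: `find_cif_files`, `build_cif_template_query`, and `to_query_json` are hypothetical helpers, not OpenFold3 APIs, and they emit only the fields shown in this example.

```python
import json
from pathlib import Path
from typing import Dict, List, Sequence


def find_cif_files(template_dir: str) -> List[str]:
    """Sorted paths of all *.cif files directly inside template_dir."""
    return sorted(str(path) for path in Path(template_dir).glob("*.cif"))


def build_cif_template_query(query_name: str, chain_ids: Sequence[str], sequence: str,
                             cif_paths: Sequence[str]) -> Dict:
    """Build a single-chain protein query that uses CIF direct templates."""
    if not cif_paths:
        raise ValueError("at least one template CIF path is required")
    chain = {
        "molecule_type": "protein",
        "chain_ids": list(chain_ids),  # e.g. ["A", "B"] for a homodimer
        "sequence": sequence,
        "template_cif_paths": list(cif_paths),  # no template_alignment_file_path here
    }
    return {"queries": {query_name: {"chains": [chain]}}}


def to_query_json(query: Dict) -> str:
    return json.dumps(query, indent=2)
```

For example, `build_cif_template_query("my_protein", ["A", "B"], "XRMKQLEDKVEELLSKNYHLENEVARLKKLVGER", find_cif_files("templates"))` yields the query shown above when `templates/` contains exactly those three CIF files; the paths are stored exactly as `find_cif_files` returns them.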

**Example query files:**
- [Homomer with direct CIF templates](https://github.com/aqlaboratory/openfold-3/blob/main/examples/example_inference_inputs/query_homomer_with_direct_cif_templates.json)
- [Multimer with direct CIF templates](https://github.com/aqlaboratory/openfold-3/blob/main/examples/example_inference_inputs/query_multimer_with_direct_cif_templates.json)

#### Configuration

Adjust the minimum score threshold for chain selection in your `runner.yml`:
Only chains with a score (sequence identity × coverage) above this threshold will be considered as valid templates.

#### Important Notes

- The `template_cif_paths` field is **mutually exclusive** with `template_alignment_file_path`: you must use one or the other, not both
- Template structures must be in CIF format
- Currently supported for protein chains only
- For multi-chain CIF files, the system automatically selects the best matching chain per file

(3-optimizations-for-high-throughput-workflows)=
## 3. Optimizations for High-Throughput Workflows

**Note:** The optimizations described in this section apply to **alignment-based templates**. If you're using {ref}`CIF direct templates <23-cif-direct-templates>`, the workflow is already simplified and these preprocessing steps are not necessary.

For high-throughput use cases with alignment-based templates, where a large number of structures are to be predicted, template processing can take a significant amount of time even with the built-in {doc}`deduplication utility <template_explanation>` we provide for template alignment and structure processing. To avoid spending GPU compute on data transformations, we provide separate template preprocessing scripts that generate the inputs from which template featurization can run efficiently in a subsequent job, without becoming a bottleneck for the model forward pass.