Cifs direct template structure #37
Hi @thanhnamitit , this looks really great! Thank you especially for providing such detailed examples complete with cifs, example queries, and documentation and initial results. It is greatly appreciated and will be super helpful for future users. I will review this in greater depth over the next few days, but a few quick comments:
Thank you!
5de99e9 to
253e5c8
Hi @jnwei, thank you so much for reviewing the MR and for your kind words! I've addressed all your comments:
1. MSA-Free Testing with CIF Direct Templates
I created a comprehensive E2E test script (
Bug Found & Fixed: During testing, I discovered an issue when running multimers in MSA-free mode. In
    if not msa_arrays_to_pair_i:
        continue
All 8 tests now pass successfully. You can take a look at the attached log file!
2. HuggingFace Examples PR
I've submitted the examples to HuggingFace: https://huggingface.co/OpenFold/OpenFold3/discussions/12#694272355826ce8d56b0aaac
3. Documentation Updates
Updated the following documentation with the new
Please review when you have a chance. Let me know if you need any changes!
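The guard quoted in the bug fix above (`if not msa_arrays_to_pair_i: continue`) can be illustrated in isolation. The surrounding loop is not shown in this thread, so the following is only a minimal sketch under assumptions: `collect_pairable` and the dictionary layout are hypothetical stand-ins, not the code in `msa.py`. It shows why skipping empty per-chain entries matters when no MSAs are available for a multimer.

```python
from typing import Dict, List

# Hypothetical stand-in for the per-chain data that gets paired across chains.
# Assumption: in MSA-free mode a chain contributes an empty mapping.
ChainMsaArrays = Dict[str, List[int]]


def collect_pairable(per_chain: List[ChainMsaArrays]) -> List[ChainMsaArrays]:
    """Return only the chains that actually have something to pair."""
    pairable = []
    for msa_arrays_to_pair_i in per_chain:
        # The guard from the comment: an empty entry has nothing to pair,
        # so skip it instead of indexing into it further down.
        if not msa_arrays_to_pair_i:
            continue
        pairable.append(msa_arrays_to_pair_i)
    return pairable
```

With every chain empty, as in a fully MSA-free multimer run, the function returns an empty list and downstream pairing code has nothing to process.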
Thank you so much @thanhnamitit! Overall the changes look great. Thank you also for adding documentation and examples to the HuggingFace repository. We'll review this further in the new year, and we should be able to add this in soon after.
Now that we have released v0.4.0, I can revisit this PR and handle the merge with some of the recent template updates. I apologize for the delay on this PR.
@thanhnamitit what's your vision for this PR? It's fallen a bit behind (judging by the merge conflicts), and I'd be leaning towards closing it. However, this is a feature that a few folks in the community have asked for.
Resolved 4 conflicts:
- data_module.py: merged import sets (random + multiprocessing/platform/sys), dropped undefined RequirementCache usage
- preprocessing/template.py: kept both cif_direct_min_score (local) and min_f_resolved (upstream); kept both _parse_templates_from_cif_files (local CIF-direct) and preprocess_templates (upstream) methods
- sample_processing/template.py: kept local CIF-direct args (atom_array slice + cif_assembly_cache)
- msa.py: used upstream if-block form to match post-conflict indentation

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
jnwei
left a comment
Thank you so much @thanhnamitit for revisiting this PR and helping to merge the recent changes. I apologize again for the delay here that necessitated the merge.
One request for this PR: Is it possible to add some small unit tests for the CifDirectParser and _parse_templates_from_cif_files? The end-to-end tests you have provided are very nice and a good indication that the CIF parsing is working as intended here. However, it would be useful to have some unit tests to run in our CI to ensure we do not have unintended breakages in the future.
Some suggestions / tips about organizing the unit tests:
- For the CifDirectParser, it could be useful to follow a similar framework to the tests in test_template_parsers.py. In particular, the method _compare_template_data may be helpful.
- We have some sample cif files in our test_data directory; perhaps they could be reused for these tests.
I will retest cif parsing on my systems next week.
Thank you again for this excellent contribution to OpenFold3. From recent discussions with OpenFold users, I know many are looking forward to this feature.
    from openfold3.core.data.io.sequence.template import CifDirectParser
    from openfold3.core.data.io.structure.cif import _load_ciffile
    from openfold3.core.data.primitives.structure.metadata import (
        get_asym_id_to_canonical_seq_dict,
    )
nit: Please move import statements to the top of the module
Summary
Add CIF direct template mode to OpenFold3, allowing users to provide template structures as CIF files without pre-computed alignments. The system automatically aligns template chains to query sequences and selects the best match based on sequence identity × coverage.
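The selection rule described above (best chain by sequence identity × coverage) can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the function names are hypothetical, the alignment is a plain Needleman-Wunsch with unit scores, identity is taken as identical columns over aligned columns, and coverage as aligned columns over query length. The definitions actually used in `CifDirectParser` may differ.

```python
from typing import Dict, Tuple


def align_counts(query: str, target: str, match: int = 1,
                 mismatch: int = -1, gap: int = -1) -> Tuple[int, int]:
    """Globally align two sequences; return (identical, aligned) column counts."""
    n, m = len(query), len(target)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back through one optimal path, counting residue-residue columns.
    identical = aligned = 0
    i, j = n, m
    while i > 0 and j > 0:
        sub = match if query[i - 1] == target[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            aligned += 1
            identical += query[i - 1] == target[j - 1]
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1  # query residue aligned to a gap
        else:
            j -= 1  # target residue aligned to a gap
    return identical, aligned


def template_score(query: str, target: str) -> float:
    """Assumed score: sequence identity multiplied by query coverage."""
    if not query or not target:
        return 0.0
    identical, aligned = align_counts(query, target)
    if aligned == 0:
        return 0.0
    identity = identical / aligned
    coverage = aligned / len(query)
    return identity * coverage


def select_best_chain(query: str, chain_seqs: Dict[str, str]) -> Tuple[str, float]:
    """Pick the template chain ID with the highest identity x coverage."""
    scored = {cid: template_score(query, seq) for cid, seq in chain_seqs.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]
```

For example, `select_best_chain("ACDEFG", {"A": "ACDEFG", "B": "ACD"})` prefers chain `A`: chain `B` matches perfectly where it aligns but covers only half of the query, so its product is lower.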
Changes
Core Implementation
Template Processing
- Add template_cif_paths field to Chain model with validation to ensure mutual exclusivity with template_alignment_file_path
- Add CifDirectParser class (openfold3/core/data/io/sequence/template.py) to parse CIF files directly
- Update TemplatePreprocessorInputInference (openfold3/core/data/pipelines/preprocessing/template.py) to support both alignment-based and CIF-direct modes
- Add _parse_templates_from_cif_files() method for CIF-direct processing
Documentation
User Guides (docs/source/Inference.md, docs/source/template_how_to.md)
Example Files
Query JSONs
- query_homomer_with_direct_cif_templates.json - Homomer example
- query_multimer_with_direct_cif_templates.json - Multimer example
Template CIFs (15 files total)
- 1dgc.cif, 1ysa.cif, 1zta.cif, 4dmd.cif, 4dme.cif
- 6l06.cif, 6l07.cif, 7cnw.cif, 7cnx.cif, 7cnz.cif (2 chain groups)
Related Issues
N/A
Testing
I've created a script to test the CIF direct template feature across three template modes: no templates, ColabFold MSA server templates, and CIF direct templates (user-provided). The script runs 6 end-to-end inference tests to compare prediction quality across these modes for both homomer and multimer queries.
Test Script
Test Output
Summary
Test Configuration:
- No templates (--use_templates false)
- ColabFold MSA server templates (--use_templates true with automatic template discovery)
Results:
Key Findings:
Technical Validation:
template_preprocessor_settings.create_logs: true
Other Notes
N/A