The `full_config.yml` file, located at `examples/reference_full_config/full_config.yml`, is a reference configuration that demonstrates every available configuration option for OpenFold3 inference and training experiments.
The configuration file is organized into several main sections. Each section corresponds to a specific Pydantic model class defined in the OpenFold3 codebase. When you provide a runner.yml file, it overrides the default settings defined in validator.py.
- Selective Configuration: Only specify the settings you want to override in your runner YAML file. All unspecified options will use their default values.
- Command-line Priority: Command-line arguments take precedence and will override any values specified in the YAML file.
- Reference Implementation: The full configuration file serves as a reference; create your own simplified runner YAML based on your specific needs. See `examples/example_runner_yamls/` for common usage examples.
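The override order described above can be sketched with plain dictionaries. This is a minimal illustration, not OpenFold3's loader (which validates the result with Pydantic): `deep_merge` is a hypothetical helper, the values are made up, and it assumes nested sections are overridden key by key, as the selective-configuration rule implies.

```python
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `base` with values from `override` layered on top.

    Nested dicts are merged key by key, so an override only needs to
    list the settings it changes.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical values, for illustration only.
defaults = {
    "experiment_settings": {"mode": "predict", "output_dir": "./", "seeds": [42]},
    "pl_trainer_args": {"devices": 1, "precision": "32-true"},
}
runner_yaml = {"pl_trainer_args": {"devices": 4}}  # what a runner YAML might contain
cli_args = {"experiment_settings": {"output_dir": "./results"}}  # from the command line

# Lowest to highest priority: defaults, runner YAML, command line.
config = deep_merge(deep_merge(defaults, runner_yaml), cli_args)
print(config["pl_trainer_args"])  # {'devices': 4, 'precision': '32-true'}
```

Unspecified keys (here `precision` and `seeds`) keep their defaults, and the command-line value wins over both the YAML and the defaults.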
The `experiment_settings` section defines overall experiment parameters, including execution mode and seed configuration.
Pydantic Model: InferenceExperimentSettings
All Options:
- `mode` (`ValidModeType`): Experiment mode, either `predict` or `train` (default: `predict`)
- `output_dir` (`Path`): Directory where outputs will be written (default: `./`)
- `log_dir` (`Path | None`): Directory for logs (default: `null`)
- `seeds` (`int | list[int]`): Starting seed or list of random seeds for inference (default: `[42]`)
- `num_seeds` (`int | None`): Number of seeds to generate if only a starting seed is provided (default: `null`)
- `use_msa_server` (`bool`): Whether to use the ColabFold MSA server (default: `false`)
- `use_templates` (`bool`): Whether to use template structures (default: `false`)
- `skip_existing` (`bool`): Skip results that already exist (default: `false`)
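The interaction between `seeds` and `num_seeds` can be sketched as follows. This assumes that, when `seeds` is a single integer and `num_seeds` is set, the additional seeds are drawn from a random generator seeded with that starting value; `expand_seeds` is a hypothetical helper, and OpenFold3's actual seed derivation may differ.

```python
from __future__ import annotations

import random


def expand_seeds(seeds: int | list[int], num_seeds: int | None = None) -> list[int]:
    """Turn the `seeds` / `num_seeds` pair into an explicit list of seeds.

    Assumption: a starting seed plus `num_seeds` yields seeds drawn from a
    generator seeded with the starting value, so the list is reproducible.
    """
    if isinstance(seeds, list):
        return list(seeds)  # an explicit list is used as-is
    if num_seeds is None:
        return [seeds]  # a single seed with no count means one run
    rng = random.Random(seeds)
    return [rng.randrange(2**32) for _ in range(num_seeds)]


print(expand_seeds([42, 100, 200]))  # [42, 100, 200]
print(expand_seeds(42))  # [42]
print(len(expand_seeds(42, num_seeds=5)))  # 5
```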
Example:
```yaml
experiment_settings:
  mode: predict
  output_dir: ./results
  seeds: [42, 100, 200]
  use_msa_server: true
```

The `pl_trainer_args` section configures the PyTorch Lightning trainer for distributed training and multi-GPU inference.
Pydantic Model: PlTrainerArgs
All Options:
- `max_epochs` (`int`): Maximum number of training epochs (default: `1000`)
- `accelerator` (`str`): Device type, `gpu` or `cpu` (default: `gpu`)
- `precision` (`int | str`): Numerical precision, e.g. `32-true` or `16-mixed` (default: `32-true`)
- `num_nodes` (`int`): Number of compute nodes (default: `1`)
- `devices` (`int`): Number of GPUs per node (default: `1`)
- `profiler` (`str | None`): Profiler to use (default: `null`)
- `log_every_n_steps` (`int`): Logging frequency in steps (default: `1`)
- `enable_checkpointing` (`bool`): Enable checkpointing (default: `true`)
- `enable_model_summary` (`bool`): Enable model summary (default: `false`)
- `deepspeed_config_path` (`Path | None`): Path to a DeepSpeed configuration file (default: `null`)
- `distributed_timeout` (`timedelta | None`): Timeout for distributed operations (default: `PT30M`, an ISO 8601 duration of 30 minutes)
- `mpi_plugin` (`bool`): Use the MPI plugin (default: `false`)
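Two of these values are easy to misread: `distributed_timeout` is written as an ISO 8601 duration, and for one-process-per-GPU strategies such as DDP the total number of processes is `num_nodes` times `devices`. The sketch below parses the day/time subset of ISO 8601 durations with the standard library only; `parse_iso_duration` and `world_size` are hypothetical helpers, and in OpenFold3 the parsing is presumably left to Pydantic's `timedelta` validation.

```python
import re
from datetime import timedelta

# Day/time subset of ISO 8601 durations, e.g. "PT30M", "PT1H30M", "P1DT2H".
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def parse_iso_duration(text: str) -> timedelta:
    """Convert an ISO 8601 day/time duration string to a timedelta."""
    match = _DURATION.match(text)
    parts = {} if match is None else {
        unit: float(value) for unit, value in match.groupdict().items() if value is not None
    }
    if not parts:  # rejects non-matching strings and empty forms like "P" or "PT"
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**parts)


def world_size(num_nodes: int, devices: int) -> int:
    """Processes launched for one-process-per-GPU strategies such as DDP."""
    return num_nodes * devices


print(parse_iso_duration("PT30M"))  # 0:30:00
print(world_size(num_nodes=2, devices=4))  # 8
```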
Example:
```yaml
pl_trainer_args:
  devices: 4
  num_nodes: 1
  precision: 16-mixed
```

The `model_update` section specifies model presets and custom architecture modifications.
Pydantic Model: ModelUpdate
All Options:
- `presets` (`list[str]`): List of model presets to apply (default: `[]`)
  - `predict`: Inference configuration (required for inference)
  - `low_mem`: Low memory mode for large structures
  - `pae_enabled`: Enable the Predicted Aligned Error (PAE) head
- `custom` (`dict`): Custom model configuration overrides (default: `{}`)
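A sketch of how `presets` and `custom` might combine, assuming presets are applied in list order and `custom` is applied last so that it wins any conflict. The preset bodies below are placeholders with made-up keys, not OpenFold3's real presets, and `merge` and `apply_model_update` are hypothetical helpers.

```python
from copy import deepcopy
from typing import Any

# Placeholder preset bodies with made-up keys; NOT OpenFold3's real presets.
HYPOTHETICAL_PRESETS: dict[str, dict[str, Any]] = {
    "predict": {"settings": {"mode": "predict"}},
    "low_mem": {"settings": {"low_memory": True}},
    "pae_enabled": {"heads": {"pae": True}},
}


def merge(base: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge `update` into a copy of `base`; `update` wins conflicts."""
    out = deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = deepcopy(value)
    return out


def apply_model_update(
    base: dict[str, Any], presets: list[str], custom: dict[str, Any]
) -> dict[str, Any]:
    """Apply presets in list order, then `custom` last (assumed ordering)."""
    config = base
    for name in presets:
        config = merge(config, HYPOTHETICAL_PRESETS[name])
    return merge(config, custom)


base = {"settings": {"mode": "train", "low_memory": False}, "heads": {"pae": False}}
model_config = apply_model_update(
    base,
    presets=["predict", "pae_enabled", "low_mem"],
    custom={"settings": {"low_memory": False}},  # custom overrides the low_mem preset
)
print(model_config)
```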
Example:
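```yaml
model_update:
  presets:
    - predict
    - pae_enabled
    - low_mem
  custom: {}
```

The checkpoint and parameter cache locations are fields of the top-level config rather than of a nested section.
Pydantic Model: Fields on InferenceExperimentConfig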
All Options:
- `inference_ckpt_path` (`Path | None`): Path to the model checkpoint file (`.pt` file)
  - Default: `$HOME/.openfold3/of3_ft3_v1.pt`
  - Parameters will be downloaded if not present
- `cache_path` (`Path | None`): Directory for storing cached model parameters
  - Default: `$HOME/.openfold3/`
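The default-path behavior can be sketched as follows. `resolve_checkpoint` is hypothetical, and it assumes that the default checkpoint lives inside `cache_path` and that only the default checkpoint is downloaded automatically; an explicit `inference_ckpt_path` that does not exist is treated as an error here.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

DEFAULT_CKPT_NAME = "of3_ft3_v1.pt"  # file name from the documented default


def resolve_checkpoint(
    inference_ckpt_path: Path | None, cache_path: Path | None
) -> tuple[Path, bool]:
    """Return (checkpoint path, whether it should be downloaded first).

    Assumptions: the default checkpoint sits inside `cache_path`, and only
    that default file is fetched automatically.
    """
    if inference_ckpt_path is not None:
        ckpt = Path(inference_ckpt_path)
        if not ckpt.exists():
            raise FileNotFoundError(ckpt)
        return ckpt, False
    cache_dir = Path(cache_path) if cache_path is not None else Path.home() / ".openfold3"
    ckpt = cache_dir / DEFAULT_CKPT_NAME
    return ckpt, not ckpt.exists()


# Demo with a throwaway cache directory instead of $HOME/.openfold3/.
with tempfile.TemporaryDirectory() as tmp:
    path, needs_download = resolve_checkpoint(None, Path(tmp))
    path.touch()  # pretend the download happened
    _, still_needed = resolve_checkpoint(None, Path(tmp))
    explicit, _ = resolve_checkpoint(path, None)
```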
The `data_module_args` section configures data loading and processing.
Pydantic Model: DataModuleArgs
All Options:
- `batch_size` (`int`): Batch size (default: `1`)
- `data_seed` (`int | None`): Random seed for data processing (default: `42`)
- `num_workers` (`int`): Number of data loading workers (default: `10`)
- `num_workers_validation` (`int`): Number of data loading workers for validation (default: `4`)
- `epoch_len` (`int`): Length of a training epoch (default: `4`)
Example:
```yaml
data_module_args:
  batch_size: 1
  num_workers: 8
```

The `dataset_config_kwargs` section configures MSA/template feature generation and optional custom CCD input for inference.
Pydantic Model: InferenceDatasetConfigKwargs
All Options:
- `ccd_file_path` (`FilePath | None`): Path to a custom Chemical Component Dictionary file for inference (`.cif` or `.bcif`). If `null`, Biotite's bundled CCD is used (default: `null`)
- `msa` (`MSASettings`): MSA processing settings (see below)
- `template` (`TemplateSettings`): Template processing settings (see below)
For inference, set `dataset_config_kwargs.ccd_file_path` to provide a custom CCD file. This gives finer control over the atom names of custom ligands and allows more readable query JSONs that use user-defined ligand keys.
- Supported formats: `.bcif` and `.cif`. `.cif` input is converted to a temporary BinaryCIF file before being passed to Biotite; this on-the-fly conversion may add over a minute of startup time. A `.bcif` file can also be generated from a `.cif` file beforehand using `preprocess_ccd_biotite.py`.
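The format rule above amounts to a suffix check. `ccd_needs_conversion` is a hypothetical helper that only reports whether the slower on-the-fly BinaryCIF conversion would happen; it does not perform the conversion, and the exact-match suffix comparison is an assumption.

```python
from __future__ import annotations

from pathlib import Path


def ccd_needs_conversion(ccd_file_path: str | Path | None) -> bool:
    """Report whether a CCD file would be converted to BinaryCIF at startup."""
    if ccd_file_path is None:
        return False  # Biotite's bundled CCD is used
    suffix = Path(ccd_file_path).suffix
    if suffix == ".bcif":
        return False  # already BinaryCIF: no conversion cost
    if suffix == ".cif":
        return True  # converted on the fly; may add over a minute of startup
    raise ValueError(f"unsupported CCD format {suffix!r}; expected .cif or .bcif")


print(ccd_needs_conversion("/path/to/custom/components.cif"))  # True
```

Pre-converting once with `preprocess_ccd_biotite.py` moves that cost out of every inference run.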
Example:
```yaml
dataset_config_kwargs:
  ccd_file_path: /path/to/custom/components.cif
```

The `msa` subsection of `dataset_config_kwargs` controls how MSAs are parsed and processed into features.
Pydantic Model: MSASettings
All Options:
- `max_rows_paired` (`int`): Maximum rows for paired MSAs (default: `8191`)
- `max_rows` (`int`): Maximum total MSA rows (default: `16384`)
- `subsample_with_bands` (`bool`): Use MMseqs2-style subsampling (default: `false`; not currently supported)
- `min_chains_paired_partial` (`int`): Minimum chains for partial pairing (default: `2`)
- `pairing_mask_keys` (`list[str]`): Masks to apply during pairing (default: `["shared_by_two", "less_than_600"]`)
- `moltypes` (`list[MoleculeType]`): Molecule types to process (default: `[0, 1]` for protein and RNA)
- `max_seq_counts` (`dict`): Maximum sequences per MSA file (default includes `uniref90_hits: 10000`, `uniprot_hits: 50000`, etc.)
- `msas_to_pair` (`list[str]`): MSA files to use for online pairing (default: `["uniprot_hits", "uniprot"]`)
- `aln_order` (`list`): Order in which to vertically concatenate MSA files (default includes `uniref90_hits`, `bfd_uniclust_hits`, etc.)
- `paired_msa_order` (`list`): Order in which to vertically concatenate pre-paired MSAs (default: `["colabfold_paired"]`)
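To show how the row limits, `max_seq_counts`, and `aln_order` fit together, here is a simplified sketch over lists of sequence strings. It assumes paired rows are placed first and capped at `max_rows_paired`, that each unpaired file is truncated to its `max_seq_counts` entry before being stacked in `aln_order`, and that the combined result is capped at `max_rows`. Query-row handling, the pairing step itself, and `moltypes` are ignored, and `assemble_msa` is hypothetical.

```python
def assemble_msa(
    paired_rows: list[str],
    unpaired_by_file: dict[str, list[str]],
    aln_order: list[str],
    max_seq_counts: dict[str, int],
    max_rows_paired: int = 8191,
    max_rows: int = 16384,
) -> list[str]:
    """Combine paired and unpaired MSA rows under the documented row limits."""
    # Paired rows first, capped at max_rows_paired (assumed placement).
    rows = paired_rows[:max_rows_paired]
    # Unpaired files are truncated per max_seq_counts, then stacked in aln_order.
    for name in aln_order:
        hits = unpaired_by_file.get(name, [])
        rows.extend(hits[: max_seq_counts.get(name, len(hits))])
    # Finally, the whole MSA is capped at max_rows.
    return rows[:max_rows]


msa = assemble_msa(
    paired_rows=["p1", "p2", "p3"],
    unpaired_by_file={"uniref90_hits": ["u1", "u2", "u3"], "bfd_uniclust_hits": ["b1", "b2"]},
    aln_order=["uniref90_hits", "bfd_uniclust_hits"],
    max_seq_counts={"uniref90_hits": 2},
    max_rows_paired=2,
    max_rows=5,
)
print(msa)  # ['p1', 'p2', 'u1', 'u2', 'b1']
```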
Example:
```yaml
dataset_config_kwargs:
  msa:
    max_rows: 16384
    max_rows_paired: 8191
    moltypes: [0, 1]  # protein and RNA
```

The `template` subsection of `dataset_config_kwargs` controls template structure processing.
Pydantic Model: TemplateSettings
All Options:
- `n_templates` (`int`): Number of templates to use (default: `4`)
- `take_top_k` (`bool`): Use the top K templates by quality (default: `false`)
- `distogram` (`TemplateDistogramSettings`): Distogram binning settings
  - `min_bin` (`float`): Minimum distance bin (default: `3.25`)
  - `max_bin` (`float`): Maximum distance bin (default: `50.75`)
  - `n_bins` (`int`): Number of bins (default: `39`)
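With the defaults, 39 evenly spaced bin edges from 3.25 to 50.75 are (50.75 - 3.25) / 38 = 1.25 apart. The sketch below follows the AlphaFold2/OpenFold-style template distogram, where the lower bin edges are evenly spaced, the last bin is open-ended, and distances below `min_bin` fall in no bin; whether OpenFold3 bins exactly this way is an assumption, and both helpers are hypothetical.

```python
from __future__ import annotations

import bisect


def distogram_lower_edges(
    min_bin: float = 3.25, max_bin: float = 50.75, n_bins: int = 39
) -> list[float]:
    """Evenly spaced lower bin edges from min_bin to max_bin inclusive."""
    step = (max_bin - min_bin) / (n_bins - 1)
    return [min_bin + i * step for i in range(n_bins)]


def distance_to_bin(distance: float, edges: list[float]) -> int | None:
    """Index i with edges[i] <= distance < edges[i + 1]; the last bin is open-ended.

    Distances below the first edge fall into no bin and return None.
    """
    if distance < edges[0]:
        return None
    return bisect.bisect_right(edges, distance) - 1


edges = distogram_lower_edges()
print(edges[:3], edges[-1])  # [3.25, 4.5, 5.75] 50.75
print(distance_to_bin(4.6, edges), distance_to_bin(80.0, edges))  # 1 38
```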
Example:
```yaml
dataset_config_kwargs:
  template:
    n_templates: 4
    take_top_k: true
```

The `output_writer_settings` section configures the format of output files.
Pydantic Model: OutputWritingSettings
All Options:
- `structure_format` (`Literal["pdb", "cif"]`): Output structure format (default: `cif`)
- `full_confidence_output_format` (`Literal["json", "npz"]`): Confidence output format (default: `json`)
- `write_features` (`bool`): Write intermediate features (default: `false`)
- `write_latent_outputs` (`bool`): Write model intermediate outputs (default: `false`)
Example:
```yaml
output_writer_settings:
  structure_format: pdb
  full_confidence_output_format: json
```

The `msa_computation_settings` section configures the ColabFold MSA server integration.
Pydantic Model: MsaComputationSettings
All Options:
- `msa_file_format` (`Literal["npz", "a3m"]`): Format for saved MSAs (default: `npz`)
- `server_user_agent` (`str`): User agent string (default: `openfold`)
- `server_url` (`Url`): ColabFold server URL (default: `https://api.colabfold.com`)
- `save_mappings` (`bool`): Save sequence ID mappings (default: `true`)
- `msa_output_directory` (`Path`): Directory for MSA outputs (default: a temporary directory)
- `cleanup_msa_dir` (`bool`): Delete MSAs after processing (default: `true`)
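The interplay of `msa_output_directory` and `cleanup_msa_dir` matters if you want to keep MSAs: with the default `cleanup_msa_dir: true`, saved MSAs are deleted after processing. The sketch models that lifecycle with the standard library only; `run_with_msa_dir` and `fake_query` are hypothetical, and passing `None` stands in for the default temporary directory.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path
from typing import Callable


def run_with_msa_dir(
    msa_output_directory: Path | None,
    cleanup_msa_dir: bool,
    work: Callable[[Path], None],
) -> Path:
    """Create the MSA directory, run `work` in it, then optionally delete it.

    `None` stands in for the default temporary directory.
    """
    if msa_output_directory is None:
        msa_dir = Path(tempfile.mkdtemp(prefix="msas_"))
    else:
        msa_dir = Path(msa_output_directory)
        msa_dir.mkdir(parents=True, exist_ok=True)
    try:
        work(msa_dir)
    finally:
        if cleanup_msa_dir:
            shutil.rmtree(msa_dir, ignore_errors=True)
    return msa_dir


def fake_query(msa_dir: Path) -> None:
    """Hypothetical stand-in for an MSA server query that writes one file."""
    (msa_dir / "query.a3m").write_text(">query\nMKV\n")


kept = run_with_msa_dir(None, cleanup_msa_dir=False, work=fake_query)
kept_file_exists = (kept / "query.a3m").exists()  # MSAs survive for reuse
shutil.rmtree(kept)  # tidy up the demo directory

removed = run_with_msa_dir(None, cleanup_msa_dir=True, work=fake_query)
removed_dir_exists = removed.exists()  # deleted after processing
```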
Example:
```yaml
msa_computation_settings:
  msa_file_format: npz
  cleanup_msa_dir: false
  msa_output_directory: /path/to/msas
```

The `template_preprocessor_settings` section configures template structure preprocessing and filtering.
Pydantic Model: TemplatePreprocessorSettings
All Options:
- `mode` (`Literal["train", "predict"]`): Processing mode (default: `predict`)
- `moltypes` (`list[MoleculeType]`): Molecule types to process (default: `[0]` for protein)
- `max_sequences_parse` (`int`): Maximum number of sequences to parse (default: `200`)
- `max_seq_id` (`float | None`): Maximum sequence identity threshold (default: `null`)
- `min_align` (`float | None`): Minimum alignment coverage (default: `null`)
- `min_len` (`int | None`): Minimum number of aligned residues (default: `null`)
- `max_release_date` (`datetime | None`): Maximum template release date (default: `null`)
- `min_release_date_diff` (`int | None`): Minimum number of days between query and template release (default: `null`)
- `max_templates` (`int`): Maximum templates per chain (default: `20`)
- `fetch_missing_structures` (`bool`): Fetch missing structures from the PDB (default: `true`)
- `create_precache` (`bool`): Cache template structure data (default: `false`)
- `preparse_structures` (`bool`): Preparse structures into `.npz` files (default: `false`)
- `create_logs` (`bool`): Create preprocessing logs (default: `false`)
- `n_processes` (`int`): Number of preprocessing processes (default: `1`)
- `chunksize` (`int`): Tasks per worker in multiprocessing (default: `1`)
- `structure_directory` (`Path | None`): Directory for template structures (default: `null`)
- `structure_file_format` (`str`): File format of structures, `cif` or `pdb` (default: `cif`)
- `output_directory` (`Path | None`): Output directory for templates (default: `null`)
- `precache_directory` (`Path | None`): Directory for the template precache (default: `null`)
- `structure_array_directory` (`Path | None`): Directory for preparsed structures (default: `null`)
- `cache_directory` (`Path | None`): Directory for the template cache (default: `null`)
- `log_directory` (`Path | None`): Directory for logs (default: `null`)
- `ccd_file_path` (`Path | None`): Path to a Chemical Component Dictionary file. Primarily useful for standalone template preprocessing workflows; for inference, prefer `dataset_config_kwargs.ccd_file_path` (default: `null`)
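The filtering options can be read as successive per-hit checks. This sketch assumes the most natural reading of each option (for example, that `min_release_date_diff` requires a template to be released at least that many days before the query) and uses a hypothetical `TemplateHit` record with `date` values instead of the documented `datetime`; OpenFold3's preprocessor may apply these filters differently or in another order.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class TemplateHit:
    """Hypothetical template hit record; field names are illustrative."""

    pdb_id: str
    seq_id: float  # sequence identity to the query (0-1)
    coverage: float  # fraction of the query covered by the alignment (0-1)
    aligned_residues: int
    release_date: date


def filter_templates(
    hits: list[TemplateHit],
    query_release_date: date | None = None,
    max_seq_id: float | None = None,
    min_align: float | None = None,
    min_len: int | None = None,
    max_release_date: date | None = None,
    min_release_date_diff: int | None = None,
    max_templates: int = 20,
) -> list[TemplateHit]:
    """Keep hits that pass every enabled filter, up to max_templates."""
    kept = []
    for hit in hits:
        if max_seq_id is not None and hit.seq_id > max_seq_id:
            continue  # too similar to the query
        if min_align is not None and hit.coverage < min_align:
            continue  # alignment covers too little of the query
        if min_len is not None and hit.aligned_residues < min_len:
            continue  # too few aligned residues
        if max_release_date is not None and hit.release_date > max_release_date:
            continue  # released after the cutoff
        if (
            min_release_date_diff is not None
            and query_release_date is not None
            and (query_release_date - hit.release_date).days < min_release_date_diff
        ):
            continue  # not released early enough before the query
        kept.append(hit)
    return kept[:max_templates]


hits = [
    TemplateHit("1abc", 0.99, 0.90, 120, date(2020, 1, 1)),
    TemplateHit("2def", 0.60, 0.80, 100, date(2019, 6, 1)),
    TemplateHit("3ghi", 0.50, 0.20, 30, date(2019, 1, 1)),
    TemplateHit("4jkl", 0.70, 0.90, 150, date(2022, 1, 1)),
]
selected = filter_templates(
    hits,
    query_release_date=date(2021, 1, 1),
    max_seq_id=0.95,
    min_align=0.5,
    min_len=50,
    max_release_date=date(2021, 6, 1),
    min_release_date_diff=60,
)
print([hit.pdb_id for hit in selected])  # ['2def']
```

With every filter left at `null`, only the `max_templates` cap applies.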
Example:
```yaml
template_preprocessor_settings:
  mode: predict
  max_templates: 20
  fetch_missing_structures: true
```

For the complete list of default values, see the Pydantic model classes in:
- `openfold3/entry_points/validator.py`: Main configuration classes
- `openfold3/projects/of3_all_atom/config/dataset_config_components.py`: MSA and template settings
- `openfold3/core/data/tools/colabfold_msa_server.py`: MSA server settings
- `openfold3/core/data/pipelines/preprocessing/template.py`: Template preprocessing settings