Benchmark for Biomolecular Structure Prediction Models

📄 From Dataset Curation to Unified Evaluation: Revisiting Structure Prediction Benchmarks with PXMeter

‼️This document describes the dataset provided in the initial PXMeter release (the PXM-Legacy dataset). The evaluation results in this version were generated using the PXMeter code at commit 9830335c68d0c918f7776ad58946cfa5e7690e16.

This repository provides evaluation code for assessing models on our curated evaluation sets:

Dataset	Description	Metrics
RecentPDB	Evaluates RecentPDB low homology subset. Antibody-antigen and monomer subsets reported separately.	DockQ success rates (>0.23), LDDT
AF3-AB	Analyses antibody-antigen subset of AlphaFold3.	DockQ success rates (>0.23), LDDT
dsDNA-Protein	Focuses on intra-DNA chains and DNA-Protein interfaces, aggregating LDDT metrics.	LDDT
RNA-Protein	Evaluates intra-RNA chains and RNA-protein interfaces with LDDT aggregation.	LDDT
PoseBusters	Assesses pocket-aligned RMSD of small molecules (PoseBusters V2).	RMSD success rates (< 2 Å)
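
To make the thresholds in the table concrete, here is a minimal sketch of how a success rate could be computed from per-sample scores. The function name and the example numbers are illustrative assumptions, not part of PXMeter; only the thresholds (DockQ > 0.23, RMSD < 2 Å) come from the table above.

```python
from typing import Iterable

DOCKQ_THRESHOLD = 0.23  # success if DockQ > 0.23 (from the table above)
RMSD_THRESHOLD = 2.0    # success if pocket-aligned RMSD < 2 Å (from the table above)


def success_rate(values: Iterable[float], threshold: float, higher_is_better: bool) -> float:
    """Return the fraction of values that pass the threshold (strict comparison)."""
    values = list(values)
    if not values:
        raise ValueError("no values to aggregate")
    if higher_is_better:
        passed = sum(v > threshold for v in values)
    else:
        passed = sum(v < threshold for v in values)
    return passed / len(values)


# Made-up scores, for illustration only (not benchmark results).
dockq_scores = [0.05, 0.23, 0.48, 0.81]
ligand_rmsds = [0.9, 1.7, 2.0, 3.4]

dockq_sr = success_rate(dockq_scores, DOCKQ_THRESHOLD, higher_is_better=True)
rmsd_sr = success_rate(ligand_rmsds, RMSD_THRESHOLD, higher_is_better=False)
print(f"DockQ success rate: {dockq_sr:.2f}, RMSD success rate: {rmsd_sr:.2f}")
```

Because both comparisons are strict, a DockQ of exactly 0.23 or an RMSD of exactly 2.0 Å does not count as a success in this sketch.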

💡 Usage

0. Download the dataset

The benchmark data is licensed under CC0.

Before running the code for the first time, download the necessary dataset files:

wget https://pxmeter.tos-cn-beijing.volces.com/evaluation_supported_data.tar.gz
tar xzvf evaluation_supported_data.tar.gz -C [your_path]

To download the full dataset used in our article, including model input files, inference output structures, evaluation output JSON files, and summaries:

wget https://pxmeter.tos-cn-beijing.volces.com/evaluation_full_data.tar.gz
tar xzvf evaluation_full_data.tar.gz -C [your_path]

You can place the extracted evaluation folder in the directory from which you run the benchmark code, or specify its location via an environment variable:

export PXM_EVAL_DATA_ROOT_PATH="your/path/to/evaluation"
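
The lookup order described above could be sketched as follows. This is an assumption about the behaviour, not the actual PXMeter implementation: the helper name `resolve_eval_root` is hypothetical, and only the variable name `PXM_EVAL_DATA_ROOT_PATH` and the folder name `evaluation` come from this document.

```python
import os
from pathlib import Path

ENV_VAR = "PXM_EVAL_DATA_ROOT_PATH"  # variable name taken from this README


def resolve_eval_root(env=None, cwd=None) -> Path:
    """Prefer the environment variable; otherwise fall back to ./evaluation.

    Hypothetical helper: the real code may resolve the path differently.
    """
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    override = env.get(ENV_VAR)
    return Path(override) if override else cwd / "evaluation"


print(resolve_eval_root())
```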

1. Evaluate inference results

Run the evaluation script as follows:

python benchmark/run_eval.py -i [infer_dir] -o [output_dir] -d [dataset] -c [chunk_str] -m [model] -n [num_cpu]

infer_dir: Directory containing inference results (CIF files and confidence files).
output_dir: Directory where evaluation results will be saved.
dataset: Dataset to evaluate (options: "RecentPDB", "PoseBusters", "dsDNA-Protein", "RNA-Protein", "AF3-AB").
chunk_str: Chunk string for distributed evaluation (e.g., '1of5'), used when running evaluations across multiple machines. The default is None.
model: Name of the model that produced the inference results. Must be one of "protenix", "boltz", "chai", or "af2m", as defined in "benchmark.run_eval.run_batch_eval", which calls the corresponding Evaluator.
num_cpu: Number of CPU cores to use. Default is 1.
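
As an illustration of the chunk_str idea, the sketch below splits a list of entries across machines. It assumes the format is '<index>of<total>' with a 1-based index, and that entries are assigned by strided slicing over a sorted list; both are assumptions for illustration, and run_eval.py may partition the work differently.

```python
def parse_chunk_str(chunk_str: str) -> tuple[int, int]:
    """Parse a string such as '1of5' into (index, total), with a 1-based index."""
    index_text, total_text = chunk_str.split("of")
    index, total = int(index_text), int(total_text)
    if not 1 <= index <= total:
        raise ValueError(f"invalid chunk string: {chunk_str!r}")
    return index, total


def select_chunk(items, chunk_str=None):
    """Return the items belonging to one chunk; all items if chunk_str is None."""
    ordered = sorted(items)  # sort so every machine sees the same order
    if chunk_str is None:
        return ordered
    index, total = parse_chunk_str(chunk_str)
    return ordered[index - 1::total]


# Hypothetical entry names, for illustration only.
entries = ["e1", "e2", "e3", "e4", "e5", "e6", "e7"]
for k in (1, 2, 3):
    print(f"{k}of3:", select_chunk(entries, f"{k}of3"))
```

With this scheme, every entry lands in exactly one chunk, so running all chunks '1ofN' through 'NofN' covers the whole dataset once.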

2. Create JSON describing evaluation results paths

Example JSON structure:

{
  "name": {
    "model": "model name (protenix, af2m, chai, boltz)",
    "seeds": [101, 102, "..."],
    "dataset_path": {
      "RecentPDB": "path/to/eval_results/RecentPDB",
      "PoseBusters": "path/to/eval_results/PoseBusters",
      "AF3-AB": "path/to/eval_results/AF3_AB"
    }
  }
}

Allowed keys for dataset_path are: "RecentPDB", "PoseBusters", "dsDNA-Protein", "RNA-Protein", and "AF3-AB". Each entry may include only a subset of these keys.
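
If you prefer to generate this JSON programmatically, a minimal sketch is shown below. The entry name "my_protenix_run", the paths, and the `validate_config` helper are hypothetical; the allowed model names and dataset keys are the ones listed in this document.

```python
import json

ALLOWED_DATASETS = {"RecentPDB", "PoseBusters", "dsDNA-Protein", "RNA-Protein", "AF3-AB"}
SUPPORTED_MODELS = {"protenix", "boltz", "chai", "af2m"}


def validate_config(config: dict) -> None:
    """Raise ValueError if an entry uses an unknown model or dataset key."""
    for name, entry in config.items():
        if entry["model"] not in SUPPORTED_MODELS:
            raise ValueError(f"{name}: unsupported model {entry['model']!r}")
        unknown = set(entry["dataset_path"]) - ALLOWED_DATASETS
        if unknown:
            raise ValueError(f"{name}: unknown dataset keys {sorted(unknown)}")


config = {
    "my_protenix_run": {  # hypothetical entry name
        "model": "protenix",
        "seeds": [101, 102],
        "dataset_path": {  # a subset of the allowed keys is fine
            "RecentPDB": "path/to/eval_results/RecentPDB",
            "PoseBusters": "path/to/eval_results/PoseBusters",
        },
    }
}

validate_config(config)
config_json = json.dumps(config, indent=2)
print(config_json)  # write this text to a file and pass it via -i in step 3
```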

3. Aggregate and display evaluation results

Run the aggregation script:

python benchmark/show_intersection_results.py -i [input_json] -o [output_path] -d [dataset_names] -n [num_cpu] -c [subset_csv] --overwrite_agg

Parameters:

input_json: Path to JSON file with evaluation results (created in step 2).
output_path (optional): Directory for final aggregated CSV output. Defaults to "./pxm_results".
dataset_names (optional): Comma-separated dataset names to compare using intersections of "chain" and "interface". Defaults to all names in JSON.
num_cpu (optional): Number of CPU cores for aggregation. Defaults to all available cores.
overwrite_agg (optional): Overwrite existing aggregated CSV files. Defaults to False.
subset_csv (optional): CSV file with columns ["type", "entry_id", "chain_id_1", "chain_id_2"]. Used to subset results. "type" can be "chain" or "interface".
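
A subset_csv with the four columns above could be produced as in the following sketch. The entry ID and chain IDs are placeholders, and leaving chain_id_2 empty for "chain" rows is an assumption; check the script for the exact convention it expects.

```python
import csv
import io

COLUMNS = ["type", "entry_id", "chain_id_1", "chain_id_2"]  # from the parameter list above
ALLOWED_TYPES = {"chain", "interface"}

# Placeholder rows, for illustration only.
rows = [
    {"type": "chain", "entry_id": "7xyz", "chain_id_1": "A", "chain_id_2": ""},
    {"type": "interface", "entry_id": "7xyz", "chain_id_1": "A", "chain_id_2": "B"},
]


def build_subset_csv(rows) -> str:
    """Serialize rows to CSV text after checking the 'type' column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        if row["type"] not in ALLOWED_TYPES:
            raise ValueError(f"unexpected type: {row['type']!r}")
        writer.writerow(row)
    return buffer.getvalue()


csv_text = build_subset_csv(rows)
print(csv_text)  # save to a file and pass it via -c
```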

Output structure

The script outputs summary CSV files in the directories specified by dataset_path in the JSON and creates a consolidated CSV with aggregated metrics in output_path:

pxm_results
├── DockQ_details.csv
├── DockQ_results.csv
├── LDDT_details.csv
├── LDDT_results.csv
├── RMSD_details.csv
├── RMSD_results.csv
├── Summary_table.csv
└── Summary_table.txt

Summary_table.csv and Summary_table.txt provide a concise overview of key metrics such as DockQ success rate, PoseBusters success rate, and LDDT (with protein-protein interfaces indicated as prot_prot).
*_results.csv files provide a full view of aggregated evaluation metrics.
*_details.csv files allow exploration of sample-specific metrics selected by each ranker.
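
A quick way to confirm that aggregation finished is to check for the files listed above. The helper below is a hypothetical convenience, not part of PXMeter, and it assumes all eight files are expected; which files are actually written may depend on the datasets you evaluated.

```python
from pathlib import Path

# File names taken from the output tree above.
EXPECTED_FILES = [
    "DockQ_details.csv",
    "DockQ_results.csv",
    "LDDT_details.csv",
    "LDDT_results.csv",
    "RMSD_details.csv",
    "RMSD_results.csv",
    "Summary_table.csv",
    "Summary_table.txt",
]


def missing_outputs(output_path="./pxm_results"):
    """Return the expected file names that are not present in output_path."""
    root = Path(output_path)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


print("Missing files:", missing_outputs())
```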

🔄 Reproduction of RecentPDB Low Homology Dataset Construction

This section outlines steps to construct the RecentPDB Low Homology dataset. File paths required for scripts are specified in benchmark.configs.data_config. It’s recommended to specify new output paths in scripts to prevent overwriting default files.

1. Filter to RecentPDB

Run this command to filter RecentPDB entries:

python benchmark/scripts/filter_for_recentpdb.py -c [mmcif_dir] -o [chain_interface_csv] -m [meta_csv] -n [num_cpu]

mmcif_dir: Directory with mmCIF files downloaded from RCSB PDB.
chain_interface_csv: Output CSV file recording filtered chain and interface information for RecentPDB dataset.
meta_csv: Output CSV file containing meta information of filtered entries.
num_cpu: Number of CPU cores for processing. Recommended CPU-to-memory (GB) ratio is 1:4.
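
The 1:4 CPU-to-memory recommendation can be turned into a value for -n with a small calculation. The helper name is hypothetical, and the sketch assumes you already know how much memory (in GB) is available to the job.

```python
import os


def recommended_num_cpu(memory_gb: float, available_cores=None, gb_per_core: int = 4) -> int:
    """Pick a core count that keeps about gb_per_core GB of memory per core."""
    cores = available_cores if available_cores is not None else (os.cpu_count() or 1)
    by_memory = int(memory_gb // gb_per_core)
    return max(1, min(cores, by_memory))


# Example: a machine with 64 GB of memory and 32 cores.
print(recommended_num_cpu(64, available_cores=32))
```

In this example memory is the limiting factor, so the suggested value is lower than the number of cores.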

2. Filter to Low Homology Subset

Run this command to filter to the Low Homology subset:

python benchmark/scripts/filter_to_lowh.py -c [chain_interface_csv] -t [test_to_train_json] -o [output_protein_lowh_csv] -n [output_nuc_lowh_csv]

chain_interface_csv: Input CSV with filtered chain and interface info from RecentPDB.
test_to_train_json: A JSON file containing a dictionary where keys are entities from the test set in the format '{PDB ID}_{Entity ID}', and values are lists of similar entities (high homology) from the train set in the same format.
output_protein_lowh_csv: Output CSV with filtered chain and interface info for RecentPDB Low Homology Protein subset.
output_nuc_lowh_csv: Output CSV with filtered chain and interface info for RecentPDB Low Homology Nucleic Acids subset.
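
For reference, the test_to_train_json format described above could look like the sketch below. The PDB and entity IDs are placeholders, the key pattern assumes classic four-character PDB IDs with numeric entity IDs, and the final helper is a deliberately simplified illustration; the actual low-homology criteria applied by filter_to_lowh.py are not reproduced here.

```python
import json
import re

# Assumed key shape: four-character PDB ID, underscore, numeric entity ID.
ENTITY_KEY = re.compile(r"^[0-9A-Za-z]{4}_\d+$")

# Placeholder IDs, for illustration only.
test_to_train = {
    "8abc_1": ["1xyz_1", "2def_2"],  # test entity with similar train entities
    "8abc_2": [],                    # test entity with no similar train entities
}


def check_mapping(mapping: dict) -> None:
    """Raise ValueError if any key or value is not in '{PDB ID}_{Entity ID}' form."""
    for key, similar in mapping.items():
        for entity in [key, *similar]:
            if not ENTITY_KEY.match(entity):
                raise ValueError(f"malformed entity ID: {entity!r}")


def entities_without_train_homologs(mapping: dict) -> list:
    """Simplified illustration: test entities whose similarity list is empty."""
    return sorted(key for key, similar in mapping.items() if not similar)


check_mapping(test_to_train)
mapping_json = json.dumps(test_to_train, indent=2)
print(mapping_json)
print(entities_without_train_homologs(test_to_train))
```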
