Evaluation & Analysis Guide

Metric definitions, result CSV structure, motif scaffolding, and Python analysis for the Proteina-Complexa pipeline.

Documentation Map

Running a design? See Inference Guide

Tuning YAML configs? See Configuration Guide

Parameter sweeps? See Sweep System

Search metadata? See Search Metadata

Pipeline Overview
Protein Types & Result Types
Evaluation Step (evaluate)
Motif Scaffolding Deep Dive
Analysis Step (analyze)
Result CSV Reference
Reading Results in Python
Success Criteria
Metric Interpretation Cheat Sheet
Environment Variables

Pipeline Overview

The evaluation workflow has two stages:

evaluate  →  analyze
  │              │
  │  Per-sample  │  Aggregate, filter,
  │  metrics     │  success rates, diversity
  │              │
  ▼              ▼
  CSV per job    Summary CSVs + organized output

Evaluate computes per-sample metrics (RMSD, designability, folding confidence, etc.) and writes one CSV per parallel job.

Analyze loads all per-job CSVs, merges them, computes pass rates, diversity, and saves organized output.

# Run both stages (standalone)
complexa evaluate configs/evaluate.yaml
complexa analyze  configs/evaluate.yaml

# Or as part of a full design pipeline (evaluate + analyze run automatically)
complexa design configs/search_binder_local_pipeline.yaml          # Protein binder
complexa design configs/search_ligand_binder_local_pipeline.yaml   # Ligand binder
complexa design configs/search_ame_local_pipeline.yaml             # AME motif scaffolding

Protein Types & Result Types

Every evaluation is configured by two keys: protein_type (set in evaluation config, controls which metrics are computed) and result_type (set in analysis config, controls default success thresholds).

`protein_type`	`result_type`	What is evaluated	Pipeline config	Default thresholds
`binder`	`protein_binder`	Binder + protein target	`search_binder_local_pipeline.yaml`	i_pAE*31 <= 7.0, pLDDT >= 0.9, scRMSD_ca < 1.5
`binder`	`ligand_binder`	Binder + ligand target	`search_ligand_binder_local_pipeline.yaml`	min_ipAE*31 < 2.0, scRMSD_ca < 2.0, ligand scRMSD < 5.0
`monomer`	`monomer`	Single-chain protein	—	designability scRMSD < 2.0
`monomer_motif`	`monomer_motif`	Single-chain + motif region	—	motif RMSD + codesignability
`motif_binder`	`motif_protein_binder`	Binder + protein target + motif	standalone `evaluate_motif_binder.yaml`	Joint binder + motif
`motif_binder`	`motif_ligand_binder`	Binder + ligand target + motif	`search_ame_local_pipeline.yaml`	Joint binder + motif + clash

Each base binder type has a motif counterpart that adds motif RMSD, motif sequence recovery, and (for ligand targets) ligand clash detection on top of the standard binder metrics.

Pipeline configs live under configs/pipeline/. The top-level pipeline YAML (e.g. search_binder_local_pipeline.yaml) composes stage configs via Hydra defaults. The motif protein binder evaluation can also be run standalone against outputs from any protein binder pipeline.

Evaluation Step

All evaluation uses protein_type: binder for binder designs. The difference between protein and ligand binders is in the folding method, inverse folding model, and which columns appear in the output.

Protein Binder Evaluation

Evaluates protein-protein binder designs. Uses AlphaFold2 (ColabDesign) for refolding and SolubleMPNN for sequence redesign.

protein_type: binder

metric:
  compute_binder_metrics: true
  binder_folding_method: colabdesign   # AF2 multimer
  sequence_types: [self, mpnn, mpnn_fixed]  # pipeline default: [self]
  num_redesign_seqs: 8
  inverse_folding_model: soluble_mpnn

Note: The pipeline default (binder_evaluate.yaml) uses sequence_types: [self] for speed. Add mpnn and/or mpnn_fixed for ProteinMPNN redesign evaluation.

Workflow per sample:

Inverse folding -- ProteinMPNN redesigns the binder sequence (mpnn, mpnn_fixed) or keeps the original (self)
Structure prediction -- Refold the complex with AF2 (ColabDesign)
Metrics -- Compute binding confidence (i_pAE, i_pTM, pLDDT) and structural consistency (scRMSD)

Key metrics:

Metric	Column pattern	What it measures	Good value
Interface PAE	`{seq}_complex_i_pAE`	Binding confidence	< 10 (< 5 excellent)
Interface pTM	`{seq}_complex_i_pTM`	Interface quality	> 0.5
pLDDT	`{seq}_complex_pLDDT`	Structure confidence	> 70
Binder scRMSD	`{seq}_binder_scRMSD`	Design preserved after refold	< 2.0 A

Ligand Binder Evaluation

Evaluates small-molecule binder designs. Uses RF3 for refolding and LigandMPNN for sequence redesign. Produces additional ligand-specific RMSD columns.

protein_type: binder

metric:
  compute_binder_metrics: true
  binder_folding_method: rf3_latest    # RoseTTAFold3
  sequence_types: [self, mpnn]
  num_redesign_seqs: 2
  inverse_folding_model: ligand_mpnn

Workflow per sample:

Inverse folding -- LigandMPNN redesigns the binder sequence (ligand-aware)
Structure prediction -- Refold the complex with RF3
Metrics -- Compute RF3 confidence metrics (min_ipAE, pLDDT, ipSAE) and both binder and ligand scRMSD

Key metrics (in addition to protein binder metrics above):

Metric	Column pattern	What it measures	Good value
min_ipAE	`{seq}_complex_min_ipAE`	Minimum interface PAE (primary for ligand)	< 2.0 (after * 31)
Binder scRMSD (CA)	`{seq}_binder_scRMSD_ca`	Binder backbone preserved (CA only)	< 2.0 A
Ligand scRMSD	`{seq}_ligand_scRMSD`	Ligand position preserved after refold	< 5.0 A
Ligand scRMSD (aligned)	`{seq}_ligand_scRMSD_aligned_allatom`	Ligand position after aligning binder	< 5.0 A
ipSAE	`{seq}_complex_min_ipSAE`	Interface pSAE (higher is better)	> 0.5

Key differences from protein binder evaluation:

binder_folding_method: rf3_latest instead of colabdesign
inverse_folding_model: ligand_mpnn instead of soluble_mpnn
Produces ligand RMSD columns (ligand_scRMSD, ligand_scRMSD_aligned_allatom, ligand_scRMSD_aligned_bb)
RF3 provides additional metrics: min_ipAE, min_ipSAE, avg_ipSAE, max_ipSAE, ranking_score, has_clash
Default ranking uses min_ipAE (minimize) instead of i_pAE

Monomer Evaluation

Evaluates single-chain structural quality via fold-and-compare.

metric:
  compute_monomer_metrics: true
  monomer_folding_models: [esmfold]
  designability_modes: [ca]
  codesignability_modes: [ca, all_atom]
  compute_ss: true

Workflow per sample:

Designability — ProteinMPNN redesigns the sequence → fold with ESMFold → compute scRMSD vs generated structure
Codesignability — Fold the original PDB sequence → compute scRMSD vs generated structure
Secondary structure — Compute alpha/beta/coil fractions via biotite
Sequence recovery — Compare ProteinMPNN output to original sequence

Key metrics:

Metric	Column pattern	What it measures	Good value
Designability	`_res_scRMSD_{mode}_{model}`	Best scRMSD over MPNN sequences	< 2.0 A
Single-seq designability	`_res_scRMSD_single_{mode}_{model}`	scRMSD of first MPNN sequence	< 2.0 A
Codesignability	`_res_co_scRMSD_{mode}_{model}`	scRMSD using original sequence	< 2.0 A
Best MPNN sequence	`_res_mpnn_best_sequence`	Sequence with lowest scRMSD	—
Secondary structure	`_res_ss_alpha`, `_res_ss_beta`, `_res_ss_coil`	Structure composition	—
pLDDT	`_res_pLDDT`	Refolded confidence	> 0.7

Note: {mode} is ca, bb3o, or all_atom. {model} is esmfold or colabfold. The number of MPNN redesign sequences is controlled by designability_num_seq (default 8).

Motif Evaluation (Monomer Motif)

Evaluates motif scaffolding by measuring how well the generated scaffold preserves the target motif. This is for monomer motif scaffolding (no binder/target complex).

protein_type: monomer_motif

dataset:
  motif_task_name: 1QJG_AA_TIP
  unindexed: true

metric:
  compute_motif_metrics: true
  compute_monomer_metrics: true   # also run monomer metrics on the full structure
  motif_rmsd_modes: [ca, all_atom]
  designability_modes: [ca]
  codesignability_modes: [ca, all_atom]

Workflow per sample:

Motif alignment — Align the ground-truth motif into the generated structure (indexed or unindexed)
Direct motif RMSD — Compare generated structure at motif positions vs ground truth
Motif sequence recovery — Compare generated sequence at motif positions vs ground truth
Designability — ProteinMPNN redesign → fold → compute full-structure + motif-region scRMSD
Codesignability — Fold original sequence → compute full-structure + motif-region scRMSD
Secondary structure — Compute alpha/beta/coil fractions

Key metrics (motif-specific):

Metric	Column pattern	What it measures	Good value
Motif RMSD	`_res_motif_rmsd_{mode}`	Generated vs ground-truth motif	< 1.0 A (CA), < 2.0 A (all-atom)
Motif sequence recovery	`_res_motif_seq_rec`	Fraction of motif residues matching	1.0 (perfect)
Motif-region designability	`_res_des_motif_scRMSD_{mode}_{model}`	Motif region preserved after MPNN+refold	< 1.0 A
Motif-region codesignability	`_res_co_motif_scRMSD_{mode}_{model}`	Motif region preserved after refold	< 1.0 A

Motif Binder Evaluation

Evaluates designs that combine binder refolding with motif preservation. The generated protein must both bind a target and preserve a functional motif. There are two variants depending on whether the target is a protein or a ligand:

Variant	`result_type`	Folding	Inverse Folding	Extra motif criteria
Motif Protein Binder	`motif_protein_binder`	ColabDesign / RF3	ProteinMPNN / SolubleMPNN	motif RMSD, seq recovery
Motif Ligand Binder (AME)	`motif_ligand_binder`	RF3	LigandMPNN	motif RMSD, seq recovery, ligand clashes

Motif Protein Binder

Adds motif constraints on top of standard protein binder evaluation. Use when the designed protein must preserve specific structural motifs while binding a protein target.

protein_type: motif_binder

metric:
  compute_motif_binder_metrics: true
  binder_folding_method: colabdesign   # or rf3_latest
  sequence_types: [self, mpnn_fixed]
  inverse_folding_model: soluble_mpnn  # protein-only

Standalone evaluation:

python -m proteinfoundation.evaluate --config-name evaluate_motif_binder \
    dataset.task_name=MY_TASK \
    metric.binder_folding_method=colabdesign \
    metric.inverse_folding_model=soluble_mpnn

Motif Ligand Binder (AME)

Adds motif constraints on top of ligand binder evaluation, with ligand clash detection. This is the evaluation mode used by the AME pipeline.

protein_type: motif_binder

metric:
  compute_motif_binder_metrics: true
  binder_folding_method: rf3_latest
  sequence_types: [self, mpnn_fixed]
  inverse_folding_model: ligand_mpnn

  # Pre/post-refolding interface metrics
  compute_pre_refolding_metrics: true
  pre_refolding:
    bioinformatics: true
    tmol: true

  compute_refolded_structure_metrics: true
  refolded:
    bioinformatics: true
    tmol: true

  # Optional monomer metrics on the binder chain
  compute_monomer_metrics: false
  monomer_folding_models: [esmfold]

Workflow per sample (both variants):

Binder refolding — Same as standard binder evaluation: inverse folding redesigns the binder sequence, then refold and compute binding confidence metrics (i_pAE, i_pTM, pLDDT, scRMSD)
Motif overlay — Compute motif RMSD and motif sequence recovery on the predicted structure, measuring how well the functional motif is preserved after refolding
Ligand clash detection (ligand variant only) — Check for steric clashes between the refolded structure and the ligand
Interface metrics — Pre- and post-refolding bioinformatics, force field, and hydrogen bond analysis
Optional monomer metrics — If enabled, run designability/codesignability on the binder chain alone

Key metrics (combines binder + motif):

Metric	What it measures	Good value
`{seq}_complex_i_pAE`	Binding confidence	< 10 (protein), varies (ligand)
`{seq}_complex_pLDDT`	Structure confidence	> 0.8
`{seq}_binder_scRMSD`	Design preserved after refold	< 2.0 A
`{seq}_motif_rmsd_pred`	Motif RMSD in predicted structure	< 2.0 A (protein), < 1.5 A (ligand)
`{seq}_correct_motif_sequence`	Motif residues match ground truth	>= 1.0 (perfect)
`{seq}_has_ligand_clashes`	Steric clashes with ligand (ligand only)	< 0.5 (no clashes)

Joint per-redesign success: A sample is "successful" when at least one redesign passes ALL binder criteria AND ALL motif criteria simultaneously. This is evaluated per redesign index, ensuring the same predicted structure satisfies both sets of requirements.

Comparison across all evaluation types:

Aspect	Binder Eval	Monomer Motif Eval	Motif Protein Binder	Motif Ligand Binder
`protein_type`	`binder`	`monomer_motif`	`motif_binder`	`motif_binder`
`result_type`	`protein_binder` / `ligand_binder`	`monomer_motif`	`motif_protein_binder`	`motif_ligand_binder`
Folding model	AF2 / RF3	ESMFold	AF2 / RF3	RF3 (ligand-aware)
Inverse folding	ProteinMPNN / SolubleMPNN	ProteinMPNN	ProteinMPNN / SolubleMPNN	LigandMPNN
Binder metrics	Yes	No	Yes	Yes
Motif RMSD	No	Yes (direct + refolded)	Yes (on predicted)	Yes (on predicted)
Ligand context	No / Yes	No	No	Yes
Ligand clashes	No	No	No	Yes

Ranking Criteria (Best Redesign Selection)

For binder and motif binder evaluation, multiple redesigned sequences are evaluated per sample. The "best" redesign is selected using a composite score from ranking criteria.

Default ranking:

Protein binder: minimize i_pAE (scale 1.0)
Ligand binder: minimize min_ipAE (scale 1.0)

Custom ranking via metric.ranking_criteria:

metric:
  ranking_criteria:
    i_pAE:
      scale: 1.0
      direction: minimize
    pLDDT:
      scale: 0.5
      direction: maximize

The composite score is sum(metric_value * scale * direction_sign) where direction_sign is +1 for minimize, -1 for maximize. The redesign with the lowest composite score is selected as best.

For motif binder evaluation, use metric.motif_ranking_criteria with the same format.

Optional ESM Metrics

ESM pseudo-perplexity can be computed alongside binder evaluation to assess sequence plausibility:

metric:
  compute_esm_metrics: true
  esm_model: facebook/esm2_t33_650M_UR50D   # default

Columns produced (per sequence type):

Column	Description
`{seq}_esm_pseudo_perplexity`	ESM pseudo-perplexity (lower = more natural)
`{seq}_esm_log_likelihood`	ESM log-likelihood

Uses ESM_DIR or CACHE_DIR environment variable for model cache.

Job Parallelization

Evaluation can be split across parallel jobs for large datasets:

job_id: 0          # This job's index (0-based)
eval_njobs: 8      # Total number of parallel jobs

Each job evaluates a shard of the input PDBs. Output CSVs are named {eval_type}_results_{config_name}_{job_id}.csv (e.g. binder_results_evaluate_0.csv). The analyze step automatically discovers and merges all per-job CSVs.

Evaluating External PDB Files

To evaluate PDB files not generated by the pipeline (e.g. from external models), use the evaluate_from_pdb_dir configs:

# Protein/ligand binder evaluation on external PDBs
complexa evaluate configs/evaluate_from_pdb_dir.yaml \
    dataset.sample_storage_path=/path/to/pdbs \
    dataset.task_name=MY_TARGET

# AME motif binder evaluation on external PDBs
complexa evaluate configs/evaluate_ame_from_pdb_dir.yaml \
    dataset.sample_storage_path=/path/to/pdbs \
    dataset.task_name=MY_TASK

Key config differences from pipeline evaluation:

Setting	Pipeline (`binder_evaluate.yaml`)	External PDB (`evaluate_from_pdb_dir.yaml`)
`input_mode`	`generated`	`pdb_dir`
`sample_storage_path`	Set by pipeline	User provides path to PDB directory
`ignore_generated_pdb_suffix`	—	`_binder.pdb` (strips suffix to find target)

Motif Scaffolding Deep Dive

Tip Atoms vs All-Atom

The atom_selection_mode in the motif task definition controls which atoms are used when defining the motif region and computing motif RMSD:

Mode	Atoms included	When to use
`all_atom`	All heavy atoms in each motif residue (N, CA, C, O, CB, side chain)	When the full atomic detail of the motif matters (e.g. active sites with precise side-chain geometry)
`tip_atoms`	Only the terminal/tip atoms of each side chain	When the backbone is flexible but side-chain endpoint placement matters (e.g. functional groups that must reach specific coordinates)
`ca`	Only C-alpha atoms	Backbone-only motif matching
`bb3o`	N, CA, C, O backbone atoms	Backbone with oxygen orientation

Where tip atoms are defined

The per-residue tip atom definitions live in src/proteinfoundation/utils/constants.py as SIDECHAIN_TIP_ATOMS. This dictionary maps each amino acid to its functional/terminal atoms:

SIDECHAIN_TIP_ATOMS = {
    "ALA": ["CA", "CB"],
    "ARG": ["CD", "CZ", "NE", "NH1", "NH2"],
    "ASP": ["CB", "CG", "OD1", "OD2"],
    "ASN": ["CB", "CG", "ND2", "OD1"],
    "CYS": ["CA", "CB", "SG"],
    "GLU": ["CG", "CD", "OE1", "OE2"],
    "GLN": ["CG", "CD", "NE2", "OE1"],
    "GLY": [],                              # no side chain
    "HIS": ["CB", "CG", "CD2", "CE1", "ND1", "NE2"],
    "LYS": ["CE", "NZ"],
    "PHE": ["CB", "CG", "CD1", "CD2", "CE1", "CE2", "CZ"],
    "TRP": ["CB", "CG", "CD1", "CD2", "CE2", "CE3", "CZ2", "CZ3", "CH2", "NE1"],
    "TYR": ["CB", "CG", "CD1", "CD2", "CE1", "CE2", "CZ", "OH"],
    # ... etc for all 20 amino acids
}

At evaluation time, src/proteinfoundation/utils/motif_utils.py::_select_motif_atoms() reads this dictionary to build the atom mask for each motif residue:

# For each residue in the motif, only these atoms are included in the mask
tip_atom_names = SIDECHAIN_TIP_ATOMS.get(residue_name, [])
selected = [atom_order[name] for name in tip_atom_names
            if name in atom_order and atom_order[name] in available_atoms]

This means:

Tip atoms mode focuses on where the side chains end up (functional groups, charged tips, ring systems) rather than backbone placement.
GLY has no tip atoms (empty list), so glycine motif residues contribute nothing to tip-atom RMSD.
The atom indices map into the 37-slot UNIFIED_ATOM37_ENCODING also defined in constants.py, which provides the canonical atom ordering for all residue types.

How `atom_selection_mode` affects metrics

When atom_selection_mode is not all_atom (e.g. tip_atoms), the motif mask contains only the selected atoms. This means CA-based RMSD modes would have an empty atom intersection. To keep columns consistent, the evaluation auto-fills CA-based motif metrics with 0.0:

atom_selection_mode = "all_atom":
  _res_motif_rmsd_ca        → computed normally
  _res_motif_rmsd_all_atom  → computed normally

atom_selection_mode = "tip_atoms":
  _res_motif_rmsd_ca        → auto-filled with 0.0 (no CA atoms in mask)
  _res_motif_rmsd_all_atom  → computed on tip atoms only

This auto-fill ensures that success criteria columns always exist and CA-based threshold checks pass automatically for non-CA motifs.

Configuration

The atom_selection_mode is set per task in the motif task dictionary (e.g. design_tasks/motif_dict.yaml):

1QJG_AA_TIP:
  contig: "15/A45-65/20"
  pdb_path: /path/to/1QJG.pdb
  atom_selection_mode: tip_atoms
  motif_only: true

1QJG_AA_NATIVE:
  contig: "15/A45-65/20"
  pdb_path: /path/to/1QJG.pdb
  atom_selection_mode: all_atom
  motif_only: true

The evaluation config references the task by name:

dataset:
  motif_task_name: 1QJG_AA_TIP   # Uses tip_atoms mode

Motif Alignment Modes: Indexed vs Unindexed

Controls how the ground-truth motif is aligned into the generated structure.

dataset:
  motif_task_name: 1QJG_AA_TIP
  unindexed: true    # or false (default)

Indexed Mode (`unindexed: false`)

The motif occupies known positions in the generated structure, determined by the contig string.

Contig: "15/A45-65/20"
         ^^            ^^
         scaffold       scaffold
              ^^^^^^^^^
              motif (residues 45-65 of chain A)

Generated structure:  [---scaffold---][===MOTIF===][---scaffold---]
                      positions 1-15   16-36        37-56

Motif residues are placed at positions 16-36 → direct comparison.

When to use: When the generative model explicitly places motif residues at contig-specified positions (the standard case for most generation pipelines).

Variable scaffolds: If different samples have different scaffold lengths, provide a per-sample contig CSV. You can set it explicitly or rely on auto-discovery.

Option 1 — Explicit path:

metric:
  motif_info_csv: /path/to/motif_info.csv

Option 2 — Auto-discovery: If motif_info_csv is not set, the evaluator looks for a file named {motif_task_name}_{job_id}_motif_info.csv in:

The evaluation output directory (where copy_motif_csvs copies CSVs)
sample_storage_path
The parent directory of sample_storage_path

Indexed mode requires this CSV to be present; if none is found, evaluation raises FileNotFoundError with instructions.

CSV format: The CSV must have a contig column. For reliable per-sample matching, include a filename column (extensionless PDB stem). If the column is present, matching falls back to sample_num (parsed from filenames like job_0_id_10_motif_TASK) or to row order (same order as PDBs).

Unindexed Mode (`unindexed: true`)

The motif residues are not at known positions. The evaluation uses greedy coordinate matching to find them.

Generated structure:  [------full structure------]
Ground-truth motif:   [==MOTIF==]

Algorithm:
  For each motif residue:
    1. Compute RMSD to every generated residue (using overlapping atoms)
    2. Optionally filter by amino acid type match
    3. Select the closest unmatched residue

When to use:

When samples are refolded structures (not direct model outputs) — refolding may reorder or renumber residues
When the generation method does not preserve contig indexing
When evaluating external model outputs where motif positions are unknown

Important: The motif coordinates must be centered (near the origin) for unindexed matching to work correctly, as the algorithm searches for the best coordinate overlap.

Contig Strings

A contig string encodes the structure layout as alternating scaffold and motif segments, separated by /:

"15/A45-65/20/A20-30/10"
 ↑   ↑       ↑   ↑      ↑
 scaffold  scaffold  scaffold
     motif       motif
     (chain A,   (chain A,
      res 45-65)  res 20-30)

Parsing rules:

Part	Format	Meaning
Integer	`15`	Scaffold segment of length 15
Chain + range	`A45-65`	Motif from chain A, residues 45 to 65 (inclusive)
Chain + single	`A33`	Single motif residue from chain A, position 33

Example dissection:

Contig: "8/A9-16/17/A34-41/8"

Segment 1: scaffold, length 8    → positions 1-8
Segment 2: motif A9-16, length 8 → positions 9-16
Segment 3: scaffold, length 17   → positions 17-33
Segment 4: motif A34-41, length 8 → positions 34-41
Segment 5: scaffold, length 8    → positions 42-49

Total length: 49 residues, 16 motif residues in 2 segments

Analysis Step

The analysis step loads evaluation CSVs, computes aggregate metrics, and organizes output.

complexa analyze configs/evaluate.yaml
# or
python -m proteinfoundation.analyze --config-name evaluate

What it does

Load results — Reads all {result_type}_results_*.csv files from the output directory
Merge monomer results — For monomer_motif and binder result types, merges monomer metrics into the primary results (handles column conflicts with _monomer suffix)
Compute pass rates — Filters by thresholds and computes success rates, grouped by experimental parameters
Compute diversity — FoldSeek (structural) and MMseqs2 (sequence) diversity on full and filtered subsets
Secondary structure — Aggregate SS fractions across groups
Organize output — Moves result files into categorized subdirectories

Analysis Modes

Controlled by aggregation.analysis_modes:

Mode	What it computes	Valid for
`binder`	Success rates, interface metrics, binder diversity	`protein_binder`, `ligand_binder`
`monomer`	Designability/codesignability pass rates, monomer diversity	all result types
`motif`	Motif RMSD pass rates, motif success criteria, motif diversity	`monomer_motif`
`motif_binder`	Joint binder + motif success filtering, per-task pass rates, diversity	`motif_protein_binder`, `motif_ligand_binder`

Defaults (from code, configs may override):

protein_binder / ligand_binder: [binder, monomer]
monomer: [monomer]
monomer_motif: [motif, monomer]
motif_protein_binder / motif_ligand_binder: [motif_binder]

The pipeline configs may add additional modes. For example, ame_analyze.yaml uses [motif_binder, monomer] to also compute monomer designability metrics.

Monomer Merging for Combined Result Types

When result_type is monomer_motif or a binder type, both motif/binder results and monomer results are evaluated separately by evaluate. The analyze step automatically merges them:

motif_results_*.csv  +  monomer_results_*.csv  →  merged DataFrame

Column conflict resolution: If a column exists in both CSVs (e.g. _res_scRMSD_ca_esmfold), the monomer version is renamed with a _monomer suffix:

motif CSV column:   _res_scRMSD_ca_esmfold           (kept as-is)
monomer CSV column: _res_scRMSD_ca_esmfold           → _res_scRMSD_ca_esmfold_monomer

Columns unique to the monomer CSV (e.g. _res_scRMSD_single_ca_esmfold) are added directly.

The analysis functions transparently handle this via resolve_monomer_column(), which looks for the canonical column name first, then falls back to the _monomer variant.

Output Directory Structure

After analyze, results are organized into subdirectories:

evaluation_results/{run_name}/
├── motif_metrics/                    # Motif pass rates, success rates
│   └── motif_results_*_combined.csv
├── monomer_metrics/                  # Designability, codesignability
├── filter_results/                   # Filtered subsets (success, RMSD thresholds)
├── diversity/                        # FoldSeek + MMseqs cluster results
│   ├── foldseek_clusters_*/
│   └── mmseqs_clusters_*/
├── amino_acid_distribution/          # Residue type distributions
├── secondary_structure/              # SS fractions
└── RAW_*.csv                         # Combined raw results (all columns)

Diversity Computation

FoldSeek (structural diversity) — Clusters samples by TM-score similarity:

Mode	Scope	Available for
`complex`	Full complex structure	binder
`binder`	Binder chain only	binder, motif_binder
`interface`	Interface residues only	binder
`monomer`	Monomer structure	monomer, motif

Alignment type 1 (structure only) and 2 (structure + sequence) are both computed.

MMseqs2 (sequence diversity) — Clusters samples by sequence identity:

Parameter	Default	Description
`min_seq_id`	0.1	Minimum sequence identity for clustering
`coverage`	0.7	Minimum alignment coverage

Both diversity methods are run on all samples and on successful samples (filtered subset per sequence type).

Analysis Output Files

File	Description
`res_nsamples.csv`	Sample counts per run
`res_designability.csv`	Designability pass rates (2 Å threshold)
`res_codesignability_{mode}.csv`	Codesignability pass rates per mode
`res_filter_binder_pass_{suffix}.csv`	Protein binder success rates
`res_filter_ligand_pass_{suffix}.csv`	Ligand binder success rates
`res_filter_motif_binder_pass_{suffix}.csv`	Motif binder joint success rates
`res_motif_binder_per_task_pass_rates.csv`	Per-task motif binder pass rates
`res_ss_biot_{suffix}.csv`	Secondary structure fractions
`res_type_prop_{suffix}.csv`	Residue type proportions
`res_aa_distribution_{suffix}.csv`	AA distribution
`res_div_foldseek_{mode}_{suffix}.csv`	FoldSeek diversity
`res_div_mmseqs_{suffix}.csv`	MMseqs diversity
`success_criteria_*.json`	Saved thresholds used for filtering

Result CSV Reference

Identifier Columns (all result types)

Column	Description
`id_gen`	Sample identifier (unique per generated structure)
`pdb_path`	Path to the generated PDB file
`L`	Protein length (residue count)
`run_name`	Experiment name from config
`ckpt_path`	Checkpoint path used for generation

Protein Binder Result Columns

Written to binder_results_*.csv by binder evaluation with binder_folding_method: colabdesign.

Per sequence type (self, mpnn, mpnn_fixed):

Column	Description
`{seq}_complex_i_pAE`	Best interface PAE (lower = better binding)
`{seq}_complex_i_pTM`	Best interface pTM (higher = better)
`{seq}_complex_pLDDT`	Best overall pLDDT (higher = more confident)
`{seq}_complex_pTM`	Best overall pTM score
`{seq}_binder_scRMSD`	Best binder backbone RMSD after refolding
`{seq}_binder_scRMSD_ca`	Binder CA-only RMSD after refolding
`{seq}_binder_scRMSD_bb3`	Binder BB3 (N, CA, C) RMSD after refolding
`{seq}_binder_scRMSD_bb3o`	Binder BB3O (N, CA, C, O) RMSD after refolding
`{seq}_binder_scRMSD_allatom`	Binder all-atom RMSD after refolding
`{seq}_complex_scRMSD`	Best full-complex RMSD after refolding
`{seq}_complex_scRMSD_ca`	Complex CA-only RMSD after refolding
`{seq}_binder_pLDDT`	Binder-only pLDDT
`{seq}_complex_pdb_path`	Path to best refolded structure
`{seq}_sequence`	Best sequence selected by ranking
`{seq}_complex_i_pAE_all`	All i_pAE values (list, one per MPNN sequence)
`{seq}_binder_scRMSD_all`	All scRMSD values (list)
`{seq}_aa_counts`	Amino acid composition (dict)
`{seq}_aa_interface_counts`	Interface residue AA composition (dict)

Optional force field metrics (if compute_pre_refolding_metrics or compute_refolded_structure_metrics enabled):

Column	Description
`generated_n_interface_hbonds_tmol`	H-bond count (generated structure)
`generated_total_interface_hbond_energy_tmol`	H-bond energy (kcal/mol)
`refolded_{seq}_n_interface_hbonds_tmol`	H-bond count (refolded structure)
`generated_binder_interface_sc`	Shape complementarity
`generated_binder_interface_dSASA`	Buried surface area (A^2)

Ligand Binder Result Columns

Written to binder_results_*.csv by binder evaluation with binder_folding_method: rf3_latest. Includes all protein binder columns above, plus:

Additional RF3 confidence metrics (per sequence type):

Column	Description
`{seq}_complex_min_ipAE`	Minimum interface PAE (primary ranking metric for ligand binders)
`{seq}_complex_min_ipSAE`	Minimum interface pSAE (higher = better)
`{seq}_complex_avg_ipSAE`	Average interface pSAE
`{seq}_complex_max_ipSAE`	Maximum interface pSAE
`{seq}_complex_ranking_score`	RF3 composite ranking score
`{seq}_complex_has_clash`	1.0 if clash detected, 0.0 otherwise

Ligand-specific RMSD metrics (per sequence type):

Column	Description
`{seq}_binder_scRMSD_ca`	Binder CA-only RMSD after refolding
`{seq}_binder_scRMSD_allatom`	Binder all-atom RMSD after refolding
`{seq}_ligand_scRMSD`	Direct ligand RMSD (generated vs refolded)
`{seq}_ligand_scRMSD_aligned_bb`	Ligand RMSD after aligning binder backbone
`{seq}_ligand_scRMSD_aligned_allatom`	Ligand RMSD after aligning binder all-atoms

Monomer Result Columns

Written to monomer_results_*.csv by monomer evaluation.

Column	Description
`_res_scRMSD_{mode}_{model}`	Best designability scRMSD (min over N MPNN sequences)
`_res_scRMSD_{mode}_{model}_all`	All designability scRMSD values (list)
`_res_scRMSD_single_{mode}_{model}`	scRMSD of first MPNN sequence only
`_res_co_scRMSD_{mode}_{model}`	Best codesignability scRMSD
`_res_co_scRMSD_{mode}_{model}_all`	All codesignability scRMSD values (list)
`_res_mpnn_sequences`	All MPNN-generated sequences (list)
`_res_mpnn_best_sequence`	Best MPNN sequence (by RMSD)
`_res_co_sequence_recovery`	Sequence recovery (MPNN vs original)
`_res_ss_alpha`	Alpha helix fraction
`_res_ss_beta`	Beta sheet fraction
`_res_ss_coil`	Coil fraction

Mode/model substitutions:

{mode}: ca (C-alpha only), all_atom (all heavy atoms), bb3o (N, CA, C, O)
{model}: esmfold, colabfold

Example column names:

_res_scRMSD_ca_esmfold — Best CA designability with ESMFold
_res_co_scRMSD_all_atom_esmfold — Best all-atom codesignability with ESMFold
_res_scRMSD_single_ca_esmfold — Single-sequence CA designability

Motif Result Columns

Written to motif_results_*.csv by motif evaluation. Includes all monomer columns above plus:

Column	Description
`contig_string`	Contig used for this sample's motif alignment
`_res_motif_rmsd_ca`	Direct motif RMSD, CA atoms
`_res_motif_rmsd_all_atom`	Direct motif RMSD, all atoms
`_res_motif_seq_rec`	Motif sequence recovery (0.0 - 1.0)
`_res_scRMSD_{mode}_{model}`	Best full-structure designability scRMSD (at motif argmin)
`_res_scRMSD_{mode}_{model}_all`	All full-structure designability scRMSD values
`_res_des_motif_scRMSD_{mode}_{model}`	Best motif-region designability scRMSD
`_res_des_motif_scRMSD_{mode}_{model}_all`	All motif-region designability scRMSD values
`_res_co_scRMSD_{mode}_{model}`	Best full-structure codesignability scRMSD (at motif argmin)
`_res_co_scRMSD_{mode}_{model}_all`	All full-structure codesignability scRMSD values
`_res_co_motif_scRMSD_{mode}_{model}`	Best motif-region codesignability scRMSD
`_res_co_motif_scRMSD_{mode}_{model}_all`	All motif-region codesignability scRMSD values
`_res_mpnn_sequences`	MPNN-generated sequences
`_res_mpnn_best_sequence`	Best MPNN sequence (by motif-region RMSD)

Important: For motif evaluation, "best" means the sequence whose motif-region scRMSD is lowest (argmin of motif scRMSD), not the full-structure minimum. The full-structure scRMSD reported is the value at that same index.

Motif Binder Result Columns

Written to motif_binder_results_*.csv by motif binder evaluation. Used by both motif_protein_binder and motif_ligand_binder result types. Combines binder and motif columns.

Metadata columns:

Column	Description
`task_name`	Motif task name
`result_type`	`motif_protein_binder` or `motif_ligand_binder`

Binder columns (same as binder results, per sequence type):

Column	Description
`{seq}_complex_i_pAE`	Best interface PAE (lower = better binding)
`{seq}_complex_i_pTM`	Best interface pTM
`{seq}_complex_pLDDT`	Best overall pLDDT
`{seq}_binder_scRMSD`	Best binder backbone RMSD after refolding
`{seq}_binder_scRMSD_ca`	CA-only binder scRMSD
`{seq}_binder_scRMSD_bb3`	BB3 (N, CA, C) binder scRMSD
`{seq}_complex_pdb_path`	Path to best refolded structure

Generated-structure motif columns (computed on the generated structure before refolding):

Column	Description
`motif_rmsd_gen`	Motif RMSD in the generated (pre-refolding) structure
`motif_seq_rec_gen`	Motif sequence recovery in generated structure
`correct_motif_sequence_gen`	Whether motif sequence is fully recovered in generated structure
`has_ligand_clashes_gen`	Ligand clash flag in generated structure (ligand binder only)

Predicted-structure motif columns (per sequence type, with _all list variants for per-redesign evaluation):

Column	Description
`{seq}_motif_rmsd_pred`	Best motif RMSD in the predicted/refolded structure
`{seq}_motif_rmsd_pred_all`	Motif RMSD per redesign (list)
`{seq}_motif_seq_rec`	Motif sequence recovery fraction
`{seq}_correct_motif_sequence`	Whether motif sequence is fully recovered (best)
`{seq}_correct_motif_sequence_all`	Motif sequence recovery per redesign (list)
`{seq}_has_ligand_clashes`	Ligand clash flag (ligand binder only, best)
`{seq}_has_ligand_clashes_all`	Ligand clash flag per redesign (list, ligand only)

Interface metric columns (if pre/post-refolding metrics enabled):

Column	Description
`generated_n_interface_hbonds_tmol`	H-bond count (generated structure)
`generated_total_interface_hbond_energy_tmol`	H-bond energy (kcal/mol)
`refolded_{seq}_n_interface_hbonds_tmol`	H-bond count (refolded structure)
`generated_binder_interface_sc`	Shape complementarity
`generated_binder_interface_dSASA`	Buried surface area (A^2)

Reading Results in Python

Loading and Basic Inspection

import pandas as pd

# Load motif results
df = pd.read_csv("evaluation_results/my_run/motif_results_my_config_combined.csv")

# Check available columns
print(df.columns.tolist())

# Basic stats
print(f"Samples: {len(df)}")
print(f"Mean motif RMSD (CA): {df['_res_motif_rmsd_ca'].mean():.2f}")
print(f"Mean motif seq rec:   {df['_res_motif_seq_rec'].mean():.2f}")

Working with List Columns

Some columns store lists of values (one per MPNN sequence). After loading from CSV, these are strings that need parsing:

import ast

# Parse list columns
df["_res_scRMSD_ca_esmfold_all"] = df["_res_scRMSD_ca_esmfold_all"].apply(ast.literal_eval)

# Get the number of sequences evaluated per sample
df["n_seqs"] = df["_res_scRMSD_ca_esmfold_all"].apply(len)

Filtering Successful Motif Scaffolds

# Direct motif RMSD filter
good_motif = df[
    (df["_res_motif_rmsd_all_atom"] < 2.0) &
    (df["_res_motif_seq_rec"] >= 1.0)
]
print(f"Motif RMSD pass rate: {len(good_motif) / len(df) * 100:.1f}%")

# Full success criteria (motif + codesignability)
success = df[
    (df["_res_motif_seq_rec"] >= 1.0) &
    (df["_res_motif_rmsd_ca"] < 1.0) &
    (df["_res_motif_rmsd_all_atom"] < 2.0) &
    (df["_res_co_scRMSD_all_atom_esmfold"] < 2.0)
]
print(f"Motif success rate: {len(success) / len(df) * 100:.1f}%")

Filtering Successful Protein Binders

# Standard protein binder success
success = df[
    (df["mpnn_complex_i_pAE"] * 31 <= 7.0) &
    (df["mpnn_complex_pLDDT"] >= 0.9) &
    (df["mpnn_binder_scRMSD"] < 1.5)
]
print(f"Protein binder success rate: {len(success) / len(df) * 100:.1f}%")

Filtering Successful Ligand Binders

# Default ligand binder success thresholds
success = df[
    (df["mpnn_complex_min_ipAE"] * 31 < 2.0) &
    (df["mpnn_binder_scRMSD_ca"] < 2.0) &
    (df["mpnn_ligand_scRMSD_aligned_allatom"] < 5.0)
]
print(f"Ligand binder success rate: {len(success) / len(df) * 100:.1f}%")

Filtering Successful Motif Protein Binders

# Load motif binder results
df = pd.read_csv("evaluation_results/my_run/motif_binder_results_combined.csv")

# Joint binder + motif success (protein target)
success = df[
    (df["mpnn_fixed_complex_i_pAE"] * 31 <= 7.0) &
    (df["mpnn_fixed_complex_pLDDT"] >= 0.8) &
    (df["mpnn_fixed_binder_scRMSD_ca"] < 2.0) &
    (df["mpnn_fixed_motif_rmsd_pred"] < 2.0) &
    (df["mpnn_fixed_correct_motif_sequence"] >= 1.0)
]
print(f"Motif protein binder success rate: {len(success) / len(df) * 100:.1f}%")

Filtering Successful Motif Ligand Binders (AME)

# Load motif binder results
df = pd.read_csv("evaluation_results/my_ame_run/motif_binder_results_combined.csv")

# Joint binder + motif success (ligand target — includes clash check)
success = df[
    (df["mpnn_fixed_binder_scRMSD_bb3"] <= 2.0) &
    (df["mpnn_fixed_motif_rmsd_pred"] <= 1.5) &
    (df["mpnn_fixed_correct_motif_sequence"] >= 1.0) &
    (df["mpnn_fixed_has_ligand_clashes"] < 0.5)
]
print(f"Motif ligand binder success rate: {len(success) / len(df) * 100:.1f}%")

Comparing Across Experiments

# Load multiple experiments
dfs = []
for run in ["run_A", "run_B", "run_C"]:
    d = pd.read_csv(f"evaluation_results/{run}/motif_results_*_combined.csv")
    d["experiment"] = run
    dfs.append(d)

df_all = pd.concat(dfs, ignore_index=True)

# Compare by experiment
summary = df_all.groupby("experiment").agg(
    motif_rmsd_mean=("_res_motif_rmsd_all_atom", "mean"),
    motif_rmsd_median=("_res_motif_rmsd_all_atom", "median"),
    seq_rec_mean=("_res_motif_seq_rec", "mean"),
    des_rmsd_mean=("_res_scRMSD_ca_esmfold", "mean"),
)
print(summary)

Success Criteria

Result Type	Key Thresholds (defaults)
`protein_binder`	i_pAE*31 <= 7.0, pLDDT >= 0.9, scRMSD_ca < 1.5 A
`ligand_binder`	min_ipAE*31 < 2.0, scRMSD_ca < 2.0 A, ligand scRMSD < 5.0 A
`monomer`	designability scRMSD < 2.0 A
`monomer_motif`	motif RMSD + codesignability (see below)
`motif_protein_binder`	binder (i_pAE, pLDDT, scRMSD) + motif RMSD < 2.0 + seq recovery >= 1.0
`motif_ligand_binder`	binder scRMSD_bb3 <= 2.0 + motif RMSD <= 1.5 + seq recovery + no clashes

All thresholds are customizable via aggregation.success_thresholds in the analysis config. Details for each type below.

Protein Binder Success

Default thresholds (result_type: protein_binder), based on AlphaProteo criteria:

Metric	Column	Threshold	Direction
i_pAE	`{seq}_complex_i_pAE`	* 31 <= 7.0	Lower is better
pLDDT	`{seq}_complex_pLDDT`	>= 0.9	Higher is better
scRMSD_ca	`{seq}_binder_scRMSD_ca`	< 1.5 A	Lower is better

A sample passes if all three thresholds are met for at least one redesigned sequence.

Ligand Binder Success

Default thresholds (result_type: ligand_binder):

Metric	Column	Threshold	Direction
min_ipAE	`{seq}_complex_min_ipAE`	* 31 < 2.0	Lower is better
scRMSD_ca	`{seq}_binder_scRMSD_ca`	< 2.0 A	Lower is better
scRMSD_aligned_allatom	`{seq}_ligand_scRMSD_aligned_allatom`	< 5.0 A	Lower is better

A sample passes if all three thresholds are met for at least one redesigned sequence.

Customizing Binder Thresholds

Both protein and ligand binder thresholds can be overridden in the analysis config via success_thresholds. Each entry has a metric name, threshold, comparison operator, optional scale factor, and column prefix:

# In binder_analyze.yaml or ligand_binder_analyze.yaml
aggregation:
  success_thresholds:
    i_pAE:
      threshold: 10.0       # less strict than default 7.0
      op: "<="
      scale: 31.0
      column_prefix: complex
    pLDDT:
      threshold: 0.8        # less strict than default 0.9
      op: ">="
      scale: 1.0
      column_prefix: complex
    scRMSD_ca:
      threshold: 2.0        # less strict than default 1.5
      op: "<"
      scale: 1.0
      column_prefix: binder

If success_thresholds is null (the default), the built-in defaults for the result_type are used (DEFAULT_PROTEIN_BINDER_THRESHOLDS or DEFAULT_LIGAND_BINDER_THRESHOLDS).

Monomer Success

Metric	Default threshold	Direction
`_res_scRMSD_ca_{model}`	< 2.0 A	Lower is better
`_res_co_scRMSD_ca_{model}`	< 2.0 A	Lower is better
`_res_co_scRMSD_all_atom_{model}`	< 2.0 A	Lower is better

Motif Success (Built-in Presets)

Two presets are evaluated automatically during analyze:

motif_success — Direct motif quality:

_res_motif_seq_rec >= 1.0 (perfect sequence recovery)
_res_motif_rmsd_ca < 1.0 A (CA motif RMSD)
_res_motif_rmsd_all_atom < 2.0 A (all-atom motif RMSD)
_res_co_scRMSD_all_atom_{model} < 2.0 A (full-structure codesignability)

refolded_motif_success — Refolded motif quality:

_res_motif_seq_rec >= 1.0
_res_co_motif_scRMSD_ca_{model} < 1.0 A (motif-region codesign CA scRMSD)
_res_co_motif_scRMSD_all_atom_{model} < 2.0 A (motif-region codesign all-atom scRMSD)
_res_co_scRMSD_all_atom_{model} < 2.0 A (full-structure codesignability)

Custom success criteria can be added in the config:

aggregation:
  motif_success_criteria:
    - column: "_res_motif_rmsd_ca"
      threshold: 0.5
      op: "<"
    - column: "_res_motif_seq_rec"
      threshold: 1.0
      op: ">="

Tip atoms note: For tasks with atom_selection_mode: tip_atoms, CA-based motif metrics are auto-filled with 0.0, so CA threshold checks in success criteria pass automatically. The all-atom thresholds evaluate the actual tip atom RMSD.

Motif Protein Binder Success

A sample is "successful" when at least one redesign passes ALL binder AND ALL motif criteria jointly:

Binder criteria:

Metric	Default threshold	Direction
`{seq}_complex_i_pAE` (scaled by 31)	<= 7.0	Lower is better
`{seq}_complex_pLDDT`	>= 0.8	Higher is better
`{seq}_binder_scRMSD_ca`	< 2.0 A	Lower is better

Motif criteria (evaluated on the same redesign):

Metric	Default threshold	Direction
`{seq}_motif_rmsd_pred`	< 2.0 A	Lower is better
`{seq}_correct_motif_sequence`	>= 1.0 (perfect recovery)	Higher is better

Motif Ligand Binder Success (AME Pipeline)

Same joint per-redesign evaluation, but with ligand-specific criteria including clash detection:

Binder criteria:

Metric	Default threshold	Direction
`{seq}_binder_scRMSD_bb3`	<= 2.0 A	Lower is better

Motif criteria (evaluated on the same redesign):

Metric	Default threshold	Direction
`{seq}_motif_rmsd_pred`	<= 1.5 A	Lower is better
`{seq}_correct_motif_sequence`	>= 1.0 (perfect recovery)	Higher is better
`{seq}_has_ligand_clashes`	< 0.5 (no clashes)	Lower is better

Customizing Motif Binder Thresholds

Both binder and motif criteria can be overridden in the analysis config. The motif_binder_success_thresholds has two sub-keys: binder (same format as standard binder thresholds) and motif (list of column/threshold/op dicts):

# In analyze_motif_binder.yaml or ame_analyze.yaml
aggregation:
  motif_binder_success_thresholds:
    binder:
      i_pAE:
        threshold: 8.0
        op: "<="
        scale: 31.0
        column_prefix: complex
      scRMSD:
        threshold: 2.0
        op: "<"
        scale: 1.0
        column_prefix: binder
    motif:
      - column: "{seq_type}_motif_rmsd_pred_all"
        threshold: 1.5
        op: "<"
      - column: "{seq_type}_correct_motif_sequence_all"
        threshold: 1.0
        op: ">="

The {seq_type} placeholder in motif column names is automatically resolved to the active sequence type (e.g. self, mpnn_fixed) at analysis time.

Metric Interpretation Cheat Sheet

RMSD Metrics (Lower is Better)

Value	Interpretation
< 1.0 A	Excellent — near-identical structure
1.0 - 2.0 A	Good — minor deviations, design likely preserved
2.0 - 4.0 A	Moderate — noticeable structural change
> 4.0 A	Poor — structure not preserved

Binding Confidence (i_pAE, Lower is Better)

Value	Interpretation
< 5	Excellent binding confidence
5 - 8	Good, likely successful
8 - 15	Moderate, needs validation
> 15	Poor prediction

Structure Confidence (pLDDT, Higher is Better)

Value	Interpretation
> 90	Very high confidence
70 - 90	Confident
50 - 70	Low confidence
< 50	Very low / disordered

Folding Models

Binder refolding (binder_folding_method):

Value	Model	Best for
`colabdesign`	AlphaFold2 Multimer (ColabDesign)	Protein-protein binder refolding
`rf3_latest`	RoseTTAFold3	Protein-ligand, high accuracy
`protenix_base_default_v0.5.0`	Protenix (base)	Alternative structure prediction
`protenix_mini_default_v0.5.0`	Protenix (mini)	Faster Protenix variant

ColabDesign does not support ligand targets. Use RF3 or Protenix for ligand binder evaluation.

Monomer refolding (monomer_folding_models):

Value	Model	Best for
`esmfold`	ESMFold	Fast monomer designability screening
`colabfold`	ColabFold (AF2 monomer)	Higher accuracy monomer refolding

Inverse folding (inverse_folding_model):

Value	Model	Best for
`protein_mpnn`	ProteinMPNN	Standard protein redesign
`soluble_mpnn`	SolubleMPNN	Protein binder redesign (solubility-aware)
`ligand_mpnn`	LigandMPNN	Ligand-aware binder redesign

RF3 requires environment variables (see Environment Variables). Protenix uses its own checkpoint management.

Environment Variables

These variables are used by the evaluation and generation pipelines when external tools are involved.

RF3 (RoseTTAFold3)

Required when using RF3 for refolding (e.g. binder_folding_method: rf3_latest or RF3 reward during generation):

Variable	Description
`RF3_CKPT_PATH`	Path to the RF3 checkpoint file (e.g. `rf3_latest.pt`).
`RF3_EXEC_PATH`	Path to the RF3 executable.

If either is unset, RF3 reward/refolding will fail at initialization with a clear error.

RF3 output directory: Predictions are written to an output directory passed at call time. In config you can set search.rf3_dump_dir (e.g. under the generation config) to control where RF3 writes results; if unset, the reward runner uses a default of ./rf3_outputs (relative to the process working directory). The output directory is not created at reward initialization—it is created only when a prediction is actually run (e.g. by the binder evaluation pipeline or by the code that calls reset_dump_dir with a concrete path). This avoids leaving an empty rf3_outputs directory when RF3 is never used.

Interface Metrics

Optional pre- and post-refolding interface metrics can be enabled in config:

metric:
  pre_refolding:
    bioinformatics: true   # SC, SASA, hydrophobicity
    tmol: true             # H-bonds, electrostatics (requires TMOL)
  refolded:
    bioinformatics: true
    tmol: true

ESM Model Cache

Variable	Description
`ESM_DIR`	Directory for ESM model cache
`CACHE_DIR`	Fallback cache directory if `ESM_DIR` is unset

Ligand Evaluation

Variable	Description	Default
`LIGAND_CLASH_THRESHOLD`	Distance threshold (Å) for ligand clash detection	`1.5`

Novelty Computation

Novelty is computed using FoldSeek against reference databases. Enable via config:

Config key	Database
`compute_novelty_pdb`	PDB
`compute_novelty_afdb`	AlphaFold DB
`compute_novelty_afdb_rep_v4`	AlphaFold DB representative v4
`compute_novelty_afdb_rep_v4_geniefilters_maxlen512`	AlphaFold DB rep v4, filtered (max 512 residues)

Configuration Guide

For complete evaluation and analysis configuration examples -- including binder, monomer, combined, external PDB, custom ranking criteria, custom success thresholds, and training configs -- see the Configuration Guide.

Minimal Config Quick Reference

Protein binder (protein_type: binder, result_type: protein_binder):

protein_type: binder
metric:
  compute_binder_metrics: true
  binder_folding_method: colabdesign
  sequence_types: [mpnn]
  inverse_folding_model: soluble_mpnn

Ligand binder (protein_type: binder, result_type: ligand_binder):

protein_type: binder
metric:
  compute_binder_metrics: true
  binder_folding_method: rf3_latest
  sequence_types: [mpnn]
  inverse_folding_model: ligand_mpnn

Monomer (protein_type: monomer, result_type: monomer):

protein_type: monomer
metric:
  compute_monomer_metrics: true
  monomer_folding_models: [esmfold]

Motif (protein_type: monomer_motif, result_type: monomer_motif):

protein_type: monomer_motif
dataset:
  motif_task_name: 1QJG_AA_TIP
  unindexed: true
metric:
  compute_motif_metrics: true
  compute_monomer_metrics: true
  monomer_folding_models: [esmfold]

Common Evaluation Options

Config key	Default	Description
`input_mode`	`generated`	`generated` (pipeline output) or `pdb_dir` (external PDBs)
`dryrun`	`false`	Show what would be evaluated without running
`show_progress`	`true`	Display progress bar
`file_limit`	`null`	Limit number of PDBs to evaluate (useful for testing)
`job_id`	`0`	Parallel job index (0-based)
`eval_njobs`	`1`	Total parallel jobs for splitting
`metric.num_redesign_seqs`	`8`	Number of MPNN redesign sequences per sample
`metric.interface_cutoff`	`8.0`	Distance cutoff (Å) for defining interface residues
`metric.compute_esm_metrics`	`false`	Compute ESM pseudo-perplexity
`metric.compute_pre_refolding_metrics`	`false`	Compute interface metrics on generated structure
`metric.compute_refolded_structure_metrics`	`false`	Compute interface metrics on refolded structure
`metric.keep_folding_outputs`	`true`	Retain intermediate folding output files
`metric.designability_num_seq`	`8`	Number of MPNN sequences for monomer designability

FilesExpand file tree

EVALUATION_METRICS.md

Latest commit

History

EVALUATION_METRICS.md

File metadata and controls

Evaluation & Analysis Guide

Table of Contents

Pipeline Overview

Protein Types & Result Types

Evaluation Step

Protein Binder Evaluation

Ligand Binder Evaluation

Monomer Evaluation

Motif Evaluation (Monomer Motif)

Motif Binder Evaluation

Motif Protein Binder

Motif Ligand Binder (AME)

Ranking Criteria (Best Redesign Selection)

Optional ESM Metrics

Job Parallelization

Evaluating External PDB Files

Motif Scaffolding Deep Dive

Tip Atoms vs All-Atom

Where tip atoms are defined

How atom_selection_mode affects metrics

Configuration

Motif Alignment Modes: Indexed vs Unindexed

Indexed Mode (unindexed: false)

Unindexed Mode (unindexed: true)

Contig Strings

Analysis Step

What it does

Analysis Modes

Monomer Merging for Combined Result Types

Output Directory Structure

Diversity Computation

Analysis Output Files

Result CSV Reference

Identifier Columns (all result types)

Protein Binder Result Columns

Ligand Binder Result Columns

Monomer Result Columns

Motif Result Columns

Motif Binder Result Columns

Reading Results in Python

Loading and Basic Inspection

Working with List Columns

Filtering Successful Motif Scaffolds

Filtering Successful Protein Binders

Filtering Successful Ligand Binders

Filtering Successful Motif Protein Binders

Filtering Successful Motif Ligand Binders (AME)

Comparing Across Experiments

Success Criteria

Protein Binder Success

Ligand Binder Success

Customizing Binder Thresholds

Monomer Success

Motif Success (Built-in Presets)

Motif Protein Binder Success

Motif Ligand Binder Success (AME Pipeline)

Customizing Motif Binder Thresholds

Metric Interpretation Cheat Sheet

RMSD Metrics (Lower is Better)

Binding Confidence (i_pAE, Lower is Better)

Structure Confidence (pLDDT, Higher is Better)

Folding Models

Environment Variables

RF3 (RoseTTAFold3)

Interface Metrics

ESM Model Cache

Ligand Evaluation

Novelty Computation

Configuration Guide

Minimal Config Quick Reference

Common Evaluation Options

How `atom_selection_mode` affects metrics

Indexed Mode (`unindexed: false`)

Unindexed Mode (`unindexed: true`)