
Commit d16217b

isPANN and claude committed
Add MaximumContactMapOverlap model (#1043)
Classical protein-structure contact-map alignment: given two ordered contact graphs G_1=(V_1,E_1) and G_2=(V_2,E_2), find an order-preserving partial injective map f: V_1 → V_2 ∪ {unmatched} that maximizes the number of contacts {i,k} ∈ E_1 such that both i and k are matched and {f(i), f(k)} ∈ E_2. Aliases: CMO, MaxCMO. The problem is NP-hard, with a substantial literature on exact algorithms and integer programming (Andonov, Malod-Dognin & Yanev 2011; Xie & Sahinidis 2007).

- src/models/graph/maximum_contact_map_overlap.rs: MaximumContactMapOverlap { num_vertices_1, contacts_1, num_vertices_2, contacts_2 }. The validating constructor normalizes each pair to sorted form (u<v) and rejects self-loops, duplicates, and out-of-range endpoints. dims = vec![num_vertices_2 + 1; num_vertices_1] (value 0 encodes unmatched; value j+1 maps to vertex j of G_2). Max<i64> objective; non-injective or non-order-preserving matched values → Max(None). ProblemSchemaEntry + ProblemSizeFieldEntry; inherent getters num_vertices_1/_2 and num_contacts_1/_2. declare_variants! default with complexity (num_vertices_2+1)^num_vertices_1. Canonical example via inventory: G_1 with 4 vertices and contacts {(0,2),(1,3)}, G_2 with 5 vertices and contacts {(0,2),(0,3),(1,4)}; the optimal config [1,2,4,5] preserves both contacts → Max(Some(2)).
- src/unit_tests/models/graph/maximum_contact_map_overlap.rs: 17 tests covering creation, evaluation at the optimum, all-unmatched, single-match, non-injective Max(None), non-order-preserving Max(None), suboptimal feasible (config [1,2,3,4] preserves 1 of 2 contacts), brute-force solver returning Max(2), wrong-length and out-of-range guards, serialization, alias resolution for CMO/MaxCMO, and three panic guards (self-loop, duplicate contact, endpoint out of range).
- problemreductions-cli/: schema-driven create wires --num-vertices-1 / --num-vertices-2 / --contacts-1 / --contacts-2 (Vec<(usize,usize)> parser) via the existing CreateArgs + flag_map + tests fixture.
- docs/paper: problem-def block with the alignment table and the two preserved-contact bullets; display name; Crossref-verified BibTeX for both the Andonov-Malod-Dognin-Yanev 2011 and Xie-Sahinidis 2007 JCB papers (with No{\"e}l encoded per the repo umlaut convention).

References: doi:10.1089/cmb.2009.0196 (Andonov, Malod-Dognin & Yanev 2011, JCB); doi:10.1089/cmb.2007.R007 (Xie & Sahinidis 2007, JCB).

The direct `MaximumContactMapOverlap -> ILP` rule (#1044) is out of scope for this PR and will follow separately.

Closes #1043

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
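The encoding and feasibility rule described above can be illustrated with a small standalone sketch. It is not the code in src/models/graph/maximum_contact_map_overlap.rs (that file is not shown on this page); the helper names `normalize_contacts` and `evaluate` are hypothetical, and the sketch returns `Err`/`None` where the real constructor panics and the real objective reports Max(None). Injectivity and order preservation on matched vertices are checked as one condition: the nonzero entries of the config must be strictly increasing.

```rust
use std::collections::HashSet;

/// Sort each contact to (min, max) and reject self-loops, duplicates, and
/// endpoints outside 0..n. Hypothetical helper: returns Err instead of panicking.
fn normalize_contacts(n: usize, contacts: &[(usize, usize)]) -> Result<Vec<(usize, usize)>, String> {
    let mut out: Vec<(usize, usize)> = Vec::with_capacity(contacts.len());
    for &(a, b) in contacts {
        if a >= n || b >= n {
            return Err(format!("contact ({a}, {b}) is out of range for {n} vertices"));
        }
        if a == b {
            return Err(format!("self-loop at vertex {a}"));
        }
        let pair = (a.min(b), a.max(b));
        if out.contains(&pair) {
            return Err(format!("duplicate contact {pair:?}"));
        }
        out.push(pair);
    }
    Ok(out)
}

/// Score an encoded config: config[i] == 0 leaves vertex i of G_1 unmatched,
/// config[i] == j + 1 sets f(i) = j. Returns None (infeasible) on a wrong
/// length, an out-of-range value, or matched values that are not strictly
/// increasing, which covers both non-injective and non-order-preserving maps.
/// Assumes both contact lists were normalized with normalize_contacts.
fn evaluate(
    n1: usize,
    n2: usize,
    contacts_1: &[(usize, usize)],
    contacts_2: &[(usize, usize)],
    config: &[usize],
) -> Option<i64> {
    if config.len() != n1 || config.iter().any(|&v| v > n2) {
        return None;
    }
    // Strictly increasing nonzero values <=> injective and order-preserving.
    let mut last = 0;
    for &v in config {
        if v != 0 {
            if v <= last {
                return None;
            }
            last = v;
        }
    }
    let e2: HashSet<(usize, usize)> = contacts_2.iter().copied().collect();
    // Since u < v and f is order-preserving, (f(u), f(v)) is already sorted.
    let preserved = contacts_1
        .iter()
        .filter(|&&(u, v)| config[u] != 0 && config[v] != 0 && e2.contains(&(config[u] - 1, config[v] - 1)))
        .count();
    Some(preserved as i64)
}
```

On the canonical example, this sketch scores [1,2,4,5] as 2 and [1,2,3,4] as 1, in line with the unit-test cases listed above.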
1 parent f77ea5c commit d16217b

8 files changed

Lines changed: 610 additions & 0 deletions


docs/paper/reductions.typ

Lines changed: 106 additions & 0 deletions
@@ -206,6 +206,7 @@
   "MaximumClique": [Maximum Clique],
   "MaximumCoKPlex": [Maximum Co-$k$-Plex],
   "MaximumCommonEdgeSubgraph": [Maximum Common Edge Subgraph],
+  "MaximumContactMapOverlap": [Maximum Contact Map Overlap],
   "MaximumEdgeWeightedKClique": [Maximum Edge-Weighted $k$-Clique],
   "HighlyConnectedDeletion": [Highly Connected Deletion],
   "EulerianPath": [Eulerian Path],
@@ -943,6 +944,111 @@ In all graph problems below, $G = (V, E)$ denotes an undirected graph with $|V|
   ]
 }
 
+#{
+  // Hand-authored canonical example mirroring the in-repo example_db fixture
+  // (the canonical example for MaximumContactMapOverlap ships via the model
+  // file's canonical_model_example_specs rather than docs/paper/data/examples.json).
+  let n1 = 4
+  let n2 = 5
+  let contacts1 = ((0, 2), (1, 3))
+  let contacts2 = ((0, 3), (1, 4), (0, 2))
+  // Encoded config: value 0 = unmatched, value j + 1 = matched to vertex j of G_2.
+  // One optimum, [1, 2, 4, 5], aligns 0->0, 1->1, 2->3, 3->4.
+  let config = (1, 2, 4, 5)
+  // (i, v) pairs of vertex index and encoded value: v = 0 means unmatched, otherwise f(i) = v - 1.
+  let alignment = config.enumerate()
+  let preserved = 2
+  let fmt-pair(p) = $\{#p.at(0), #p.at(1)\}$
+  let fmt-edges(es) = es.map(fmt-pair).join(", ")
+  [
+    #problem-def("MaximumContactMapOverlap")[
+      Given two finite ordered contact maps $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ with $V_r = {0, 1, dots, n_r - 1}$ ordered by index and $E_r subset.eq binom(V_r, 2)$ a simple undirected contact set, find an order-preserving partial injective alignment $f: V_1 -> V_2 union {bot}$ maximizing the number of preserved contacts
+      $ |{{i, k} in E_1 : i, k "matched and" {f(i), f(k)} in E_2}|. $
+      Feasibility requires injectivity on matched vertices and the order-preserving condition: if $i < k$ in $V_1$ and both are matched, then $f(i) < f(k)$ in $V_2$.
+    ][
+      The Maximum Contact Map Overlap problem (CMO) is a standard combinatorial formulation of flexible protein-structure comparison: each protein is represented by an ordered residue contact graph, and alignment quality is measured by the number of superimposed contacts. Xie and Sahinidis introduced a reduction-based exact algorithm that solves CMO via a sequence of smaller maximum-weight independent-set subproblems on a derived interaction graph @XieSahinidis2007CMO. Andonov, Malod-Dognin, and Yanev later strengthened the integer-programming bound and the branch-and-bound search, producing one of the fastest known exact CMO solvers @AndonovMalodDogninYanev2011CMO. The order-preserving constraint distinguishes CMO from the general maximum common edge-subgraph problem and reflects the sequence of residues along each protein backbone. The registered exact baseline enumerates every assignment $V_1 -> V_2 union {bot}$ in $O^*((|V_2| + 1)^(|V_1|))$ time and keeps only the order-preserving injective maps#footnote[No algorithm improving on full enumeration is registered for the unrestricted variant. The specialized exact algorithms of @XieSahinidis2007CMO and @AndonovMalodDogninYanev2011CMO are far faster in practice but do not improve on the registered worst-case complexity bound.].
+
+      *Example.* Let $V_1 = {0, 1, 2, 3\}$ with $E_1 = {#fmt-edges(contacts1)}$ and $V_2 = {0, 1, 2, 3, 4\}$ with $E_2 = {#fmt-edges(contacts2)}$. The alignment $f$ given by
+
+      #align(center)[#table(
+        columns: (auto, auto, auto, auto, auto),
+        align: center,
+        stroke: 0.4pt,
+        [$i$], [$0$], [$1$], [$2$], [$3$],
+        [$f(i)$], [$0$], [$1$], [$3$], [$4$],
+      )]
+
+      is order-preserving ($0 < 1 < 3 < 4$) and injective. Both contacts of $G_1$ are preserved:
+      - $\{0, 2\}$ maps to $\{f(0), f(2)\} = \{0, 3\} in E_2$,
+      - $\{1, 3\}$ maps to $\{f(1), f(3)\} = \{1, 4\} in E_2$.
+      Hence the alignment achieves the maximum possible objective $#preserved = |E_1|$.
+
+      #pred-commands(
+        "pred create --example MaximumContactMapOverlap -o cmo.json",
+        "pred solve cmo.json --solver brute-force",
+        "pred evaluate cmo.json --config " + config.map(str).join(","),
+      )
+
+      #figure({
+        let dx = 4.0
+        let pos1 = range(n1).map(i => (i * 1.0, 0.0))
+        let pos2 = range(n2).map(j => (dx + j * 1.0, 0.0))
+        canvas(length: 1cm, {
+          import draw: *
+          // Backbone of G_1 (visualizes residue order).
+          for i in range(n1 - 1) {
+            line(pos1.at(i), pos1.at(i + 1), stroke: (paint: luma(180), thickness: 0.4pt))
+          }
+          // Backbone of G_2.
+          for j in range(n2 - 1) {
+            line(pos2.at(j), pos2.at(j + 1), stroke: (paint: luma(180), thickness: 0.4pt))
+          }
+          // Contacts of G_1 as arcs above the backbone.
+          for (u, v) in contacts1 {
+            let p = pos1.at(u)
+            let q = pos1.at(v)
+            let mid = ((p.at(0) + q.at(0)) / 2, (p.at(1) + q.at(1)) / 2 + 0.45 * (v - u))
+            hobby(p, mid, q, stroke: 0.7pt + luma(90))
+          }
+          // Contacts of G_2 above its backbone.
+          for (u, v) in contacts2 {
+            let p = pos2.at(u)
+            let q = pos2.at(v)
+            let mid = ((p.at(0) + q.at(0)) / 2, (p.at(1) + q.at(1)) / 2 + 0.45 * (v - u))
+            hobby(p, mid, q, stroke: 0.7pt + luma(90))
+          }
+          // Vertices of G_1: highlight matched ones.
+          for (i, pos) in pos1.enumerate() {
+            let matched = config.at(i) != 0
+            g-node(pos, name: "u" + str(i),
+              fill: if matched { graph-colors.at(0) } else { white },
+              label: if matched { text(fill: white)[$#i$] } else { [$#i$] })
+          }
+          // Vertices of G_2.
+          for (j, pos) in pos2.enumerate() {
+            g-node(pos, name: "v" + str(j), fill: white, label: [$#j$])
+          }
+          // Mapping arrows (drawn below the backbones).
+          for (i, v) in alignment {
+            if v != 0 {
+              let j = v - 1
+              let p = pos1.at(i)
+              let q = pos2.at(j)
+              let mid = ((p.at(0) + q.at(0)) / 2, (p.at(1) + q.at(1)) / 2 - 1.0)
+              hobby(p, mid, q,
+                stroke: (paint: graph-colors.at(0), thickness: 0.6pt, dash: "dashed"))
+            }
+          }
+          content((pos1.at(0).at(0) - 0.7, 0.0), text(9pt, weight: "bold")[$G_1$])
+          content((pos2.at(0).at(0) - 0.7, 0.0), text(9pt, weight: "bold")[$G_2$])
+        })
+      },
+      caption: [Maximum Contact Map Overlap instance from the issue. Top: ordered contact maps $G_1$ (left, $|V_1| = #n1$, $|E_1| = #contacts1.len()$) and $G_2$ (right, $|V_2| = #n2$, $|E_2| = #contacts2.len()$); contacts are drawn as arcs above the backbone. Bottom: dashed curves show the order-preserving partial injective alignment $f$; both contacts of $G_1$ are preserved.],
+      ) <fig:cmo-issue>
+    ]
+  ]
+}
+
 #{
   let x = load-model-example("MaximumEdgeWeightedKClique")
   let nv = graph-num-vertices(x.instance)
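The paper text above describes the registered baseline as enumerating every assignment V_1 → V_2 ∪ {⊥} and keeping only order-preserving injective maps. A minimal Rust sketch of that enumerate-and-filter loop follows, assuming contacts are already normalized to (u, v) with u < v; `brute_force` and `score` are hypothetical names, not the crate's solver API. On the 4-vertex/5-vertex example it visits (5 + 1)^4 = 1296 configurations. The optimum is not unique: [1, 2, 3, 5] (f: 0→0, 1→1, 2→2, 3→4) also preserves both contacts, and because this loop counts upward lexicographically it reports that config before [1, 2, 4, 5].

```rust
use std::collections::HashSet;

/// Feasibility check plus objective for one encoded config (0 = unmatched,
/// j + 1 = matched to vertex j of G_2). Contacts are assumed normalized (u < v).
fn score(config: &[usize], contacts_1: &[(usize, usize)], e2: &HashSet<(usize, usize)>) -> Option<i64> {
    // Nonzero entries must be strictly increasing (injective + order-preserving).
    let mut last = 0;
    for &v in config {
        if v != 0 {
            if v <= last {
                return None;
            }
            last = v;
        }
    }
    let preserved = contacts_1
        .iter()
        .filter(|&&(u, v)| config[u] != 0 && config[v] != 0 && e2.contains(&(config[u] - 1, config[v] - 1)))
        .count();
    Some(preserved as i64)
}

/// Enumerate all (n2 + 1)^n1 configs with an odometer, keep feasible ones, and
/// return (best value, first config reaching it, number of configs visited).
fn brute_force(
    n1: usize,
    n2: usize,
    contacts_1: &[(usize, usize)],
    contacts_2: &[(usize, usize)],
) -> (i64, Vec<usize>, u64) {
    let e2: HashSet<(usize, usize)> = contacts_2.iter().copied().collect();
    let mut config = vec![0usize; n1];
    // The all-unmatched config is always feasible with value 0.
    let mut best_value = 0i64;
    let mut best_config = config.clone();
    let mut visited = 0u64;
    loop {
        visited += 1;
        if let Some(value) = score(&config, contacts_1, &e2) {
            if value > best_value {
                best_value = value;
                best_config = config.clone();
            }
        }
        // Increment the last digit; on overflow reset it to 0 and carry left.
        let mut pos = n1;
        loop {
            if pos == 0 {
                // Every digit wrapped around: all configs have been visited.
                return (best_value, best_config, visited);
            }
            pos -= 1;
            if config[pos] < n2 {
                config[pos] += 1;
                break;
            }
            config[pos] = 0;
        }
    }
}
```

Filtering inside the loop keeps the sketch faithful to the stated O*((|V_2| + 1)^|V_1|) bound; a search that only extends order-preserving prefixes would visit far fewer configs but would no longer illustrate the registered baseline.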

docs/paper/references.bib

Lines changed: 22 additions & 0 deletions
@@ -1931,6 +1931,28 @@ @article{Soule2021RNA
   doi = {10.1371/journal.pcbi.1008990}
 }
 
+@article{AndonovMalodDogninYanev2011CMO,
+  author = {Rumen Andonov and No{\"e}l Malod-Dognin and Nicola Yanev},
+  title = {Maximum Contact Map Overlap Revisited},
+  journal = {Journal of Computational Biology},
+  volume = {18},
+  number = {1},
+  pages = {27--41},
+  year = {2011},
+  doi = {10.1089/cmb.2009.0196}
+}
+
+@article{XieSahinidis2007CMO,
+  author = {Wei Xie and Nikolaos V. Sahinidis},
+  title = {A Reduction-Based Exact Algorithm for the Contact Map Overlap Problem},
+  journal = {Journal of Computational Biology},
+  volume = {14},
+  number = {5},
+  pages = {637--654},
+  year = {2007},
+  doi = {10.1089/cmb.2007.R007}
+}
+
 @article{GouveiaMartins2015MEWC,
   author = {Luis Gouveia and Pedro Martins},
   title = {Solving the maximum edge-weight clique problem in sparse graphs with compact formulations},

problemreductions-cli/src/cli.rs

Lines changed: 16 additions & 0 deletions
@@ -588,6 +588,18 @@ pub struct CreateArgs {
     /// Target labelled digraph G2 for MaximumCommonEdgeSubgraph. Same format as --graph-1 (e.g., "4:0-0-1,1-1-2,0-2-2,2-0-3,1-3-3,0-1-3")
     #[arg(long = "graph-2")]
     pub graph_2: Option<String>,
+    /// Number of ordered vertices in the first contact map G_1 for MaximumContactMapOverlap
+    #[arg(long = "num-vertices-1")]
+    pub num_vertices_1: Option<usize>,
+    /// Number of ordered vertices in the second contact map G_2 for MaximumContactMapOverlap
+    #[arg(long = "num-vertices-2")]
+    pub num_vertices_2: Option<usize>,
+    /// Contacts of G_1 for MaximumContactMapOverlap as comma-separated unordered pairs (e.g., "0-2,1-3")
+    #[arg(long = "contacts-1")]
+    pub contacts_1: Option<String>,
+    /// Contacts of G_2 for MaximumContactMapOverlap as comma-separated unordered pairs (e.g., "0-3,1-4,0-2")
+    #[arg(long = "contacts-2")]
+    pub contacts_2: Option<String>,
     /// Bin capacity for BinPacking
     #[arg(long)]
     pub capacity: Option<String>,
@@ -1031,6 +1043,10 @@ impl CreateArgs {
         insert!("allowed-pairs", self.allowed_pairs.as_deref());
         insert!("graph-1", self.graph_1.as_deref());
         insert!("graph-2", self.graph_2.as_deref());
+        insert!("num-vertices-1", self.num_vertices_1);
+        insert!("num-vertices-2", self.num_vertices_2);
+        insert!("contacts-1", self.contacts_1.as_deref());
+        insert!("contacts-2", self.contacts_2.as_deref());
         insert!("capacity", self.capacity.as_deref());
         insert!("sequence", self.sequence.as_deref());
         insert!("subsets", self.sets.as_deref());
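The --contacts-1 / --contacts-2 help text specifies comma-separated `u-v` pairs such as "0-2,1-3". The CLI's actual Vec<(usize,usize)> parser is not part of this diff, so the following is only a hypothetical `parse_contacts` sketch of that format. It splits and parses and nothing more; sorting each pair and rejecting self-loops, duplicates, and out-of-range endpoints is left to the model's validating constructor, as the commit message describes.

```rust
/// Hypothetical parser for the "u-v,u-v,..." contact format, e.g. "0-2,1-3".
/// Whitespace around pairs and endpoints is ignored; an empty string yields no contacts.
fn parse_contacts(s: &str) -> Result<Vec<(usize, usize)>, String> {
    let s = s.trim();
    if s.is_empty() {
        return Ok(Vec::new());
    }
    s.split(',')
        .map(|tok| -> Result<(usize, usize), String> {
            let tok = tok.trim();
            // Each token must contain a '-' separating the two endpoints.
            let (a, b) = tok
                .split_once('-')
                .ok_or_else(|| format!("expected a pair like '0-2', got '{tok}'"))?;
            let u = a.trim().parse::<usize>().map_err(|e| format!("invalid endpoint '{a}': {e}"))?;
            let v = b.trim().parse::<usize>().map_err(|e| format!("invalid endpoint '{b}': {e}"))?;
            Ok((u, v))
        })
        // Stops at the first malformed token and returns its error.
        .collect()
}
```

Keeping the parser free of graph semantics means the same error messages for self-loops or duplicates come from one place (the constructor), whether the instance is built from the CLI or from JSON.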

problemreductions-cli/src/commands/create.rs

Lines changed: 4 additions & 0 deletions
@@ -169,6 +169,10 @@ fn all_data_flags_empty(args: &CreateArgs) -> bool {
         && args.allowed_pairs.is_none()
         && args.graph_1.is_none()
         && args.graph_2.is_none()
+        && args.num_vertices_1.is_none()
+        && args.num_vertices_2.is_none()
+        && args.contacts_1.is_none()
+        && args.contacts_2.is_none()
         && args.dependencies.is_none()
         && args.num_attributes.is_none()
         && args.source_string.is_none()

problemreductions-cli/src/commands/create/tests.rs

Lines changed: 4 additions & 0 deletions
@@ -1635,6 +1635,10 @@ fn empty_args() -> CreateArgs {
         allowed_pairs: None,
         graph_1: None,
         graph_2: None,
+        num_vertices_1: None,
+        num_vertices_2: None,
+        contacts_1: None,
+        contacts_2: None,
         capacity: None,
         sequence: None,
         sets: None,
