
Commit 3689ff6

zazabap, claude, and GiggleLiu authored
Fix #110: Add LongestCommonSubsequence model and LCS to ILP reduction (#598)
* Add plan for #110: LongestCommonSubsequence to ILP
* Implement #110: LongestCommonSubsequence model and LCS to ILP reduction

  Add the LongestCommonSubsequence problem model (also addresses #108) and implement the match-pair ILP formulation from Blum et al. (2021) for reducing LCS to Integer Linear Programming.

  Model: binary selection over shortest string positions, maximizing common subsequence length across all input strings.

  Reduction: for 2-string case, creates binary variables for character match pairs with assignment and no-crossing constraints. Objective maximizes matched pairs.

  Includes CLI registration (create, dispatch, alias), unit tests with closed-loop verification, example program, and paper entries.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: remove plan file after implementation
* fix: add missing Problem and Solver trait imports in reduction tests

  The reduction test file needs Problem trait for evaluate() calls and Solver trait for BruteForce::find_best() calls.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Fix Copilot review comments: assert exactly 2 strings, fix complexity, fix module order
  - Assert strings.len() == 2 (not >= 2) in LCS-to-ILP reduction to prevent silently ignoring extra strings
  - Fix complexity from 2^total_length to 2^min_string_length to match actual brute-force search space (dims() uses shortest string)
  - Fix alphabetical ordering of longestcommonsubsequence_ilp in rules/mod.rs

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Add empty string edge case test for LCS model

  Inspired by PR #170 which tested this edge case.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: GiggleLiu <cacate0129@gmail.com>
1 parent d309549 commit 3689ff6

16 files changed

Lines changed: 921 additions & 4 deletions


docs/paper/reductions.typ

Lines changed: 25 additions & 0 deletions
@@ -53,6 +53,7 @@
   "BicliqueCover": [Biclique Cover],
   "BinPacking": [Bin Packing],
   "ClosestVectorProblem": [Closest Vector Problem],
+  "LongestCommonSubsequence": [Longest Common Subsequence],
   "SubsetSum": [Subset Sum],
   "MinimumFeedbackVertexSet": [Minimum Feedback Vertex Set],
 )
@@ -980,6 +981,14 @@ Biclique Cover is equivalent to factoring the biadjacency matrix $M$ of the bipa
   *Example.* Let $n = 4$ items with weights $(2, 3, 4, 5)$, values $(3, 4, 5, 7)$, and capacity $C = 7$. Selecting $S = {1, 2}$ (items with weights 3 and 4) gives total weight $3 + 4 = 7 lt.eq C$ and total value $4 + 5 = 9$. Selecting $S = {0, 3}$ (weights 2 and 5) gives weight $2 + 5 = 7 lt.eq C$ and value $3 + 7 = 10$, which is optimal.
 ]

+#problem-def("LongestCommonSubsequence")[
+  Given $k$ strings $s_1, dots, s_k$ over a finite alphabet $Sigma$, find a longest string $w$ that is a subsequence of every $s_i$. A string $w$ is a _subsequence_ of $s$ if $w$ can be obtained by deleting zero or more characters from $s$ without changing the order of the remaining characters.
+][
+  The LCS problem is polynomial-time solvable for $k = 2$ strings via dynamic programming in $O(n_1 n_2)$ time (Wagner & Fischer, 1974), and more generally for any fixed $k$, but it is NP-hard when the number of strings $k$ is part of the input @maier1978. It is a foundational problem in bioinformatics (sequence alignment), version control (diff algorithms), and data compression. The problem is listed as SR10 in Garey & Johnson @garey1979.
+
+  *Example.* Let $s_1 = $ `ABAC` and $s_2 = $ `BACA` over $Sigma = {A, B, C}$. The longest common subsequence has length 3, e.g., `BAC`: positions 1, 2, 3 of $s_1$ match positions 0, 1, 2 of $s_2$.
+]
+
 #problem-def("SubsetSum")[
   Given a finite set $A = {a_0, dots, a_(n-1)}$ with sizes $s(a_i) in ZZ^+$ and a target $B in ZZ^+$, determine whether there exists a subset $A' subset.eq A$ such that $sum_(a in A') s(a) = B$.
 ][
@@ -1702,6 +1711,22 @@ The following reductions to Integer Linear Programming are straightforward formu
   _Solution extraction._ For each position $k$, find vertex $v$ with $x_(v,k) = 1$ to recover the tour permutation; then select edges between consecutive positions.
 ]

+#reduction-rule("LongestCommonSubsequence", "ILP")[
+  The match-pair ILP formulation @blum2021 encodes subsequence alignment as a binary optimization. For two strings $s_1$ (length $n_1$) and $s_2$ (length $n_2$), each position pair $(j_1, j_2)$ where $s_1[j_1] = s_2[j_2]$ yields a binary variable. Constraints enforce one-to-one matching and order preservation (no crossings). The objective maximizes the number of matched pairs.
+][
+  _Construction._ Given strings $s_1$ and $s_2$:
+
+  _Variables:_ Let $M = {(j_1, j_2) : s_1[j_1] = s_2[j_2]}$ be the set of match pairs. Introduce a binary variable $m_(j_1, j_2) in {0, 1}$ for each $(j_1, j_2) in M$. Interpretation: $m_(j_1, j_2) = 1$ iff position $j_1$ of $s_1$ is matched to position $j_2$ of $s_2$.
+
+  _Constraints:_ (1) Each position in $s_1$ matched at most once: $sum_(j_2 : (j_1, j_2) in M) m_(j_1, j_2) lt.eq 1$ for all $j_1$. (2) Each position in $s_2$ matched at most once: $sum_(j_1 : (j_1, j_2) in M) m_(j_1, j_2) lt.eq 1$ for all $j_2$. (3) No crossings: for $(j_1, j_2), (j'_1, j'_2) in M$ with $j_1 < j'_1$ and $j_2 > j'_2$: $m_(j_1, j_2) + m_(j'_1, j'_2) lt.eq 1$.
+
+  _Objective:_ Maximize $sum_((j_1, j_2) in M) m_(j_1, j_2)$.
+
+  _Correctness._ ($arrow.r.double$) A common subsequence of length $ell$ defines $ell$ matched pairs that are order-preserving (no crossings) and one-to-one, yielding a feasible ILP solution with objective $ell$. ($arrow.l.double$) An ILP solution with objective $ell$ defines $ell$ matched pairs; constraints (1)--(2) ensure one-to-one matching, and constraint (3) ensures order preservation, so the matched characters form a common subsequence of length $ell$.
+
+  _Solution extraction._ Collect pairs $(j_1, j_2)$ with $m_(j_1, j_2) = 1$, sort by $j_1$, and read the characters.
+]
+
 == Unit Disk Mapping

 #reduction-rule("MaximumIndependentSet", "KingsSubgraph")[
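
As a rough illustration of the match-pair construction in the LongestCommonSubsequence to ILP rule above, the following self-contained Rust sketch enumerates the match pairs, checks constraints (1) to (3) pairwise, and brute-forces the optimum for a tiny instance. The names `match_pairs`, `is_feasible`, and `brute_force_optimum` are hypothetical and are not part of the `problemreductions` crate, which builds an ILP instance rather than enumerating subsets.

```rust
/// Enumerate the match pairs M = {(j1, j2) : s1[j1] == s2[j2]}.
/// Hypothetical helper, not the crate's API.
fn match_pairs(s1: &[u8], s2: &[u8]) -> Vec<(usize, usize)> {
    let mut pairs = Vec::new();
    for (j1, &a) in s1.iter().enumerate() {
        for (j2, &b) in s2.iter().enumerate() {
            if a == b {
                pairs.push((j1, j2));
            }
        }
    }
    pairs
}

/// Check constraints (1)-(3) for a 0/1 assignment encoded as a bitmask over `pairs`.
fn is_feasible(pairs: &[(usize, usize)], mask: u32) -> bool {
    let chosen: Vec<(usize, usize)> = pairs
        .iter()
        .enumerate()
        .filter(|&(i, _)| mask & (1u32 << i) != 0)
        .map(|(_, &p)| p)
        .collect();
    for (a, &(i1, i2)) in chosen.iter().enumerate() {
        for &(k1, k2) in &chosen[a + 1..] {
            // (1) and (2): two selected pairs may not share a position.
            let shares = i1 == k1 || i2 == k2;
            // (3): two selected pairs may not cross.
            let crosses = (i1 < k1 && i2 > k2) || (i1 > k1 && i2 < k2);
            if shares || crosses {
                return false;
            }
        }
    }
    true
}

/// Maximize the number of selected pairs by exhaustive search.
/// Only meant for tiny instances (fewer than 32 match pairs).
fn brute_force_optimum(s1: &[u8], s2: &[u8]) -> u32 {
    let pairs = match_pairs(s1, s2);
    assert!(pairs.len() < 32, "sketch only supports tiny instances");
    (0u32..(1u32 << pairs.len()))
        .filter(|&mask| is_feasible(&pairs, mask))
        .map(|mask| mask.count_ones())
        .max()
        .unwrap_or(0)
}
```

Under these assumptions, `match_pairs(b"ABAC", b"BACA")` yields 6 pairs and `brute_force_optimum(b"ABAC", b"BACA")` returns 3, matching the length of `BAC` in the paper example.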

docs/paper/references.bib

Lines changed: 21 additions & 0 deletions
@@ -407,6 +407,27 @@ @article{ibarra1975
   doi = {10.1145/321906.321909}
 }

+@article{maier1978,
+  author = {David Maier},
+  title = {The Complexity of Some Problems on Subsequences and Supersequences},
+  journal = {Journal of the ACM},
+  volume = {25},
+  number = {2},
+  pages = {322--336},
+  year = {1978},
+  doi = {10.1145/322063.322075}
+}
+
+@article{blum2021,
+  author = {Christian Blum and Maria J. Blesa and Borja Calvo},
+  title = {{ILP}-based reduced variable neighborhood search for the longest common subsequence problem},
+  journal = {Computers \& Operations Research},
+  volume = {125},
+  pages = {105089},
+  year = {2021},
+  doi = {10.1016/j.cor.2020.105089}
+}
+
 @book{sipser2012,
   author = {Michael Sipser},
   title = {Introduction to the Theory of Computation},
Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
+// # LongestCommonSubsequence to ILP Reduction
+//
+// ## Mathematical Formulation
+// Uses the match-pair formulation (Blum et al., 2021).
+// For each position pair (j1, j2) where s1[j1] == s2[j2], a binary variable m_{j1,j2}.
+// Constraints:
+// (1) Each s1 position matched at most once
+// (2) Each s2 position matched at most once
+// (3) Order preservation: no crossings among matched pairs
+// Objective: maximize total matched pairs.
+//
+// ## This Example
+// - Instance: s1 = "ABAC", s2 = "BACA"
+// - 6 match pairs, LCS = "BAC" (length 3)
+//
+// ## Output
+// Exports `docs/paper/examples/longestcommonsubsequence_to_ilp.json`.
+
+use problemreductions::export::*;
+use problemreductions::models::algebraic::ILP;
+use problemreductions::prelude::*;
+use problemreductions::solvers::ILPSolver;
+
+pub fn run() {
+    // 1. Create LCS instance: s1 = "ABAC", s2 = "BACA"
+    let problem = LongestCommonSubsequence::new(vec![
+        vec![b'A', b'B', b'A', b'C'],
+        vec![b'B', b'A', b'C', b'A'],
+    ]);
+
+    // 2. Reduce to ILP
+    let reduction = ReduceTo::<ILP<bool>>::reduce_to(&problem);
+    let ilp = reduction.target_problem();
+
+    // 3. Print transformation
+    println!("\n=== Problem Transformation ===");
+    println!(
+        "Source: LCS with {} strings, total length {}",
+        problem.num_strings(),
+        problem.total_length()
+    );
+    println!(
+        "Target: ILP with {} variables, {} constraints",
+        ilp.num_vars,
+        ilp.constraints.len()
+    );
+
+    // 4. Solve ILP
+    let solver = ILPSolver::new();
+    let ilp_solution = solver
+        .solve(ilp)
+        .expect("ILP should be feasible for ABAC/BACA");
+    println!("\n=== Solution ===");
+    println!("ILP solution: {:?}", &ilp_solution);
+
+    // 5. Extract LCS solution
+    let extracted = reduction.extract_solution(&ilp_solution);
+    println!("Source LCS config: {:?}", extracted);
+
+    // 6. Verify
+    let metric = problem.evaluate(&extracted);
+    assert!(metric.is_valid());
+    let lcs_length = metric.unwrap();
+    println!("LCS length: {}", lcs_length);
+    assert_eq!(lcs_length, 3);
+    println!("\nReduction verified successfully");
+
+    // 7. Collect solutions and export JSON
+    let solutions = vec![SolutionPair {
+        source_config: extracted,
+        target_config: ilp_solution,
+    }];
+
+    let source_variant = variant_to_map(LongestCommonSubsequence::variant());
+    let target_variant = variant_to_map(ILP::<bool>::variant());
+    let overhead =
+        lookup_overhead("LongestCommonSubsequence", &source_variant, "ILP", &target_variant)
+            .expect("LCS -> ILP overhead not found");
+
+    let data = ReductionData {
+        source: ProblemSide {
+            problem: LongestCommonSubsequence::NAME.to_string(),
+            variant: source_variant,
+            instance: serde_json::json!({
+                "strings": [
+                    [65, 66, 65, 67],
+                    [66, 65, 67, 65],
+                ],
+            }),
+        },
+        target: ProblemSide {
+            problem: ILP::<bool>::NAME.to_string(),
+            variant: target_variant,
+            instance: serde_json::json!({
+                "num_vars": ilp.num_vars,
+                "num_constraints": ilp.constraints.len(),
+            }),
+        },
+        overhead: overhead_to_json(&overhead),
+    };
+
+    let results = ResultData { solutions };
+    let name = "longestcommonsubsequence_to_ilp";
+    write_example(name, &data, &results);
+}
+
+fn main() {
+    run()
+}
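
The example above asserts an LCS length of 3 for ABAC/BACA. One way to cross-check that value without the ILP solver is the textbook two-string dynamic program that the paper entry attributes to Wagner & Fischer. The sketch below is a minimal, self-contained version; `lcs_length` is a hypothetical name and is not taken from the `problemreductions` crate.

```rust
/// Length of a longest common subsequence of two byte strings,
/// computed with the standard O(n1 * n2) table.
/// Hypothetical helper for cross-checking, not the crate's API.
fn lcs_length(s1: &[u8], s2: &[u8]) -> usize {
    // dp[i][j] = LCS length of the prefixes s1[..i] and s2[..j].
    let mut dp = vec![vec![0usize; s2.len() + 1]; s1.len() + 1];
    for i in 1..=s1.len() {
        for j in 1..=s2.len() {
            dp[i][j] = if s1[i - 1] == s2[j - 1] {
                // Matching characters extend the best alignment of the shorter prefixes.
                dp[i - 1][j - 1] + 1
            } else {
                // Otherwise drop the last character of one prefix or the other.
                dp[i - 1][j].max(dp[i][j - 1])
            };
        }
    }
    dp[s1.len()][s2.len()]
}
```

Calling `lcs_length(b"ABAC", b"BACA")` returns 3, which agrees with the `assert_eq!(lcs_length, 3)` check in the example program.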

problemreductions-cli/src/cli.rs

Lines changed: 4 additions & 0 deletions
@@ -217,6 +217,7 @@ Flags by problem type:
   BicliqueCover --left, --right, --biedges, --k
   BMF --matrix (0/1), --rank
   CVP --basis, --target-vec [--bounds]
+  LCS --strings
   FVS --arcs [--weights] [--num-vertices]
   ILP, CircuitSAT (via reduction only)
@@ -329,6 +330,9 @@ pub struct CreateArgs {
     /// Variable bounds for CVP as "lower,upper" (e.g., "-10,10") [default: -10,10]
     #[arg(long, allow_hyphen_values = true)]
     pub bounds: Option<String>,
+    /// Input strings for LCS (semicolon-separated, e.g., "ABAC;BACA")
+    #[arg(long)]
+    pub strings: Option<String>,
     /// Directed arcs for directed graph problems (e.g., 0>1,1>2,2>0)
     #[arg(long)]
     pub arcs: Option<String>,

problemreductions-cli/src/commands/create.rs

Lines changed: 20 additions & 1 deletion
@@ -6,7 +6,7 @@ use crate::util;
 use anyhow::{bail, Context, Result};
 use problemreductions::models::algebraic::{ClosestVectorProblem, BMF};
 use problemreductions::models::graph::GraphPartitioning;
-use problemreductions::models::misc::{BinPacking, PaintShop};
+use problemreductions::models::misc::{BinPacking, LongestCommonSubsequence, PaintShop};
 use problemreductions::prelude::*;
 use problemreductions::registry::collect_schemas;
 use problemreductions::topology::{
@@ -47,6 +47,7 @@ fn all_data_flags_empty(args: &CreateArgs) -> bool {
         && args.basis.is_none()
         && args.target_vec.is_none()
         && args.bounds.is_none()
+        && args.strings.is_none()
         && args.arcs.is_none()
 }

@@ -424,6 +425,24 @@ pub fn create(args: &CreateArgs, out: &OutputConfig) -> Result<()> {
             (ser(BMF::new(matrix, rank))?, resolved_variant.clone())
         }

+        // LongestCommonSubsequence
+        "LongestCommonSubsequence" => {
+            let strings_str = args.strings.as_deref().ok_or_else(|| {
+                anyhow::anyhow!(
+                    "LCS requires --strings\n\n\
+                     Usage: pred create LCS --strings \"ABAC;BACA\""
+                )
+            })?;
+            let strings: Vec<Vec<u8>> = strings_str
+                .split(';')
+                .map(|s| s.trim().as_bytes().to_vec())
+                .collect();
+            (
+                ser(LongestCommonSubsequence::new(strings))?,
+                resolved_variant.clone(),
+            )
+        }
+
         // ClosestVectorProblem
         "ClosestVectorProblem" => {
             let basis_str = args.basis.as_deref().ok_or_else(|| {
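
To make the `--strings` parsing in the hunk above easier to reason about, here is the same split/trim/bytes pipeline pulled out into a standalone function. This is only a sketch: the name `parse_strings` is hypothetical and does not appear in the commit, which performs these steps inline inside `create`.

```rust
/// Split a semicolon-separated argument into byte strings,
/// trimming surrounding whitespace from each segment.
/// Hypothetical helper mirroring the inline logic shown in the diff.
fn parse_strings(arg: &str) -> Vec<Vec<u8>> {
    arg.split(';')
        .map(|s| s.trim().as_bytes().to_vec())
        .collect()
}
```

With this pipeline, `parse_strings("ABAC;BACA")` produces the two byte vectors for `ABAC` and `BACA`. Note that `split` never drops segments, so an empty argument yields a single empty string and `"A;;B"` yields an empty string in the middle; whether such inputs should be rejected earlier is a design choice the diff does not address.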

problemreductions-cli/src/dispatch.rs

Lines changed: 3 additions & 1 deletion
@@ -1,6 +1,6 @@
 use anyhow::{bail, Context, Result};
 use problemreductions::models::algebraic::{ClosestVectorProblem, ILP};
-use problemreductions::models::misc::{BinPacking, Knapsack, SubsetSum};
+use problemreductions::models::misc::{BinPacking, Knapsack, LongestCommonSubsequence, SubsetSum};
 use problemreductions::prelude::*;
 use problemreductions::rules::{MinimizeSteps, ReductionGraph};
 use problemreductions::solvers::{BruteForce, ILPSolver, Solver};
@@ -246,6 +246,7 @@ pub fn load_problem(
             _ => deser_opt::<ClosestVectorProblem<i32>>(data),
         },
         "Knapsack" => deser_opt::<Knapsack>(data),
+        "LongestCommonSubsequence" => deser_opt::<LongestCommonSubsequence>(data),
         "MinimumFeedbackVertexSet" => deser_opt::<MinimumFeedbackVertexSet<i32>>(data),
         "SubsetSum" => deser_sat::<SubsetSum>(data),
         _ => bail!("{}", crate::problem_name::unknown_problem_error(&canonical)),
@@ -309,6 +310,7 @@ pub fn serialize_any_problem(
             _ => try_ser::<ClosestVectorProblem<i32>>(any),
         },
         "Knapsack" => try_ser::<Knapsack>(any),
+        "LongestCommonSubsequence" => try_ser::<LongestCommonSubsequence>(any),
         "MinimumFeedbackVertexSet" => try_ser::<MinimumFeedbackVertexSet<i32>>(any),
         "SubsetSum" => try_ser::<SubsetSum>(any),
         _ => bail!("{}", crate::problem_name::unknown_problem_error(&canonical)),

problemreductions-cli/src/problem_name.rs

Lines changed: 2 additions & 0 deletions
@@ -20,6 +20,7 @@ pub const ALIASES: &[(&str, &str)] = &[
     ("KSAT", "KSatisfiability"),
     ("TSP", "TravelingSalesman"),
     ("CVP", "ClosestVectorProblem"),
+    ("LCS", "LongestCommonSubsequence"),
     ("MaxMatching", "MaximumMatching"),
     ("FVS", "MinimumFeedbackVertexSet"),
 ];
@@ -54,6 +55,7 @@ pub fn resolve_alias(input: &str) -> String {
         "binpacking" => "BinPacking".to_string(),
         "cvp" | "closestvectorproblem" => "ClosestVectorProblem".to_string(),
         "knapsack" => "Knapsack".to_string(),
+        "lcs" | "longestcommonsubsequence" => "LongestCommonSubsequence".to_string(),
         "fvs" | "minimumfeedbackvertexset" => "MinimumFeedbackVertexSet".to_string(),
         "subsetsum" => "SubsetSum".to_string(),
         _ => input.to_string(), // pass-through for exact names
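
The hunk above shows only the match arms of `resolve_alias`, not the expression being matched. The sketch below assumes the match is performed on a lowercased copy of the input, which is consistent with the all-lowercase arm patterns but is not confirmed by the diff. The function name `resolve_alias_sketch` is hypothetical and only two of the arms are reproduced.

```rust
/// Map a user-supplied problem name or alias to its canonical name.
/// Assumption: matching is case-insensitive via `to_lowercase`;
/// the diff does not show the actual scrutinee.
fn resolve_alias_sketch(input: &str) -> String {
    match input.to_lowercase().as_str() {
        "cvp" | "closestvectorproblem" => "ClosestVectorProblem".to_string(),
        "lcs" | "longestcommonsubsequence" => "LongestCommonSubsequence".to_string(),
        // Unknown names pass through unchanged so exact names keep working.
        _ => input.to_string(),
    }
}
```

Under that assumption, both `resolve_alias_sketch("lcs")` and `resolve_alias_sketch("LCS")` return `LongestCommonSubsequence`, while an unrecognized name such as `Foo` is returned as-is.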

src/lib.rs

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ pub mod prelude {
         KColoring, MaxCut, MaximalIS, MaximumClique, MaximumIndependentSet, MaximumMatching,
         MinimumDominatingSet, MinimumFeedbackVertexSet, MinimumVertexCover, TravelingSalesman,
     };
-    pub use crate::models::misc::{BinPacking, Factoring, Knapsack, PaintShop, SubsetSum};
+    pub use crate::models::misc::{BinPacking, Factoring, Knapsack, LongestCommonSubsequence, PaintShop, SubsetSum};
     pub use crate::models::set::{MaximumSetPacking, MinimumSetCovering};

     // Core traits
