108 | 108 | "BoundedComponentSpanningForest": [Bounded Component Spanning Forest], |
109 | 109 | "BinPacking": [Bin Packing], |
110 | 110 | "BoyceCoddNormalFormViolation": [Boyce-Codd Normal Form Violation], |
| 111 | + "ConsistencyOfDatabaseFrequencyTables": [Consistency of Database Frequency Tables], |
111 | 112 | "ClosestVectorProblem": [Closest Vector Problem], |
112 | 113 | "ConsecutiveSets": [Consecutive Sets], |
113 | 114 | "MinimumMultiwayCut": [Minimum Multiway Cut], |
@@ -3201,6 +3202,77 @@ A classical NP-complete problem from Garey and Johnson @garey1979[Ch.~3, p.~76], |
3201 | 3202 | A relation satisfies _Boyce-Codd Normal Form_ (BCNF) if every non-trivial functional dependency $X arrow.r Y$ has $X$ as a superkey --- that is, $X^+$ = $A'$. This classical NP-complete problem from database theory asks whether the given attribute subset $A'$ violates BCNF. The NP-completeness was established by Beeri and Bernstein (1979) via reduction from Hitting Set. It appears as problem SR29 in Garey and Johnson's compendium (category A4: Storage and Retrieval). |
3202 | 3203 | ] |
3203 | 3204 |
|
| 3205 | +#{ |
| 3206 | + let x = load-model-example("ConsistencyOfDatabaseFrequencyTables") |
| 3207 | + let num_objects = x.instance.num_objects |
| 3208 | + let num_attrs = x.instance.attribute_domains.len() |
| 3209 | + let domains = x.instance.attribute_domains |
| 3210 | + let table01 = x.instance.frequency_tables.at(0).counts |
| 3211 | + let table12 = x.instance.frequency_tables.at(1).counts |
| 3212 | + let config = x.optimal_config |
| 3213 | + let value = (object, attr) => config.at(object * num_attrs + attr) |
| 3214 | + [ |
| 3215 | + #problem-def("ConsistencyOfDatabaseFrequencyTables")[ |
| 3216 | + Given a finite set $V$ of objects, a finite set $A$ of attributes, a domain $D_a$ for each $a in A$, a collection of pairwise frequency tables $f_(a,b): D_a times D_b -> ZZ^(>=0)$, each with entries summing to $|V|$, and a set $K subset.eq V times A times union_(a in A) D_a$ of known triples $(v, a, x)$ with $x in D_a$, determine whether there exist functions $g_a: V -> D_a$ such that $g_a(v) = x$ for every $(v, a, x) in K$ and, for every published table $f_(a,b)$ and every cell $(x, y) in D_a times D_b$, exactly $f_(a,b)(x, y)$ objects $v$ satisfy $(g_a(v), g_b(v)) = (x, y)$. |
| 3217 | + ][ |
| 3218 | + Consistency of Database Frequency Tables is Garey and Johnson's storage-and-retrieval problem SR35 @garey1979. It asks whether released pairwise marginals can come from some hidden microdata table while respecting already known individual attribute values, making it a natural decision problem in statistical disclosure control. The direct witness space implemented in this crate assigns one categorical variable to each object-attribute pair, so exhaustive search runs in $O^*((product_(a in A) |D_a|)^(|V|))$. #footnote[This is the exact search bound induced by the implementation's configuration space; no faster general exact worst-case algorithm is claimed here.] |
| 3219 | + |
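The exhaustive-search bound above can be illustrated with a minimal Python sketch. It enumerates every flat configuration (one value per object-attribute pair, stored object-major) and returns the first consistent one. The function name `brute_force`, the dictionary layout for tables, and the tiny instance are assumptions made for illustration; they are not the crate's API.

```python
from itertools import product

def brute_force(num_objects, domains, tables, known):
    """Return a consistent flat config (object-major), or None.

    domains: list of domain sizes, one per attribute.
    tables:  {(a, b): counts} with counts[x][y] the published frequency.
    known:   iterable of (object, attribute, value) triples.
    """
    num_attrs = len(domains)
    # One categorical variable per object-attribute pair.
    ranges = [range(domains[a])
              for _ in range(num_objects) for a in range(num_attrs)]
    for config in product(*ranges):
        # Reject configurations that contradict a known triple.
        if any(config[v * num_attrs + a] != x for v, a, x in known):
            continue
        ok = True
        for (a, b), counts in tables.items():
            # Recount every cell of the table from the configuration.
            seen = [[0] * domains[b] for _ in range(domains[a])]
            for v in range(num_objects):
                seen[config[v * num_attrs + a]][config[v * num_attrs + b]] += 1
            if seen != counts:
                ok = False
                break
        if ok:
            return config
    return None

# Hypothetical toy instance: 3 objects, 2 binary attributes, one table.
toy_tables = {(0, 1): [[1, 1], [0, 1]]}
witness = brute_force(3, [2, 2], toy_tables, known=[(0, 0, 1)])
print(witness)
```

The toy instance has only 4 to the power 3, that is 64, configurations; the same loop on the six-object example would already visit about three million, which is why the bound is stated as a baseline rather than a practical algorithm.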
| 3220 | + *Example.* Let $|V| = #num_objects$ with attributes $a_0, a_1, a_2$ having domain sizes $#domains.at(0)$, $#domains.at(1)$, and $#domains.at(2)$ respectively. Publish the pairwise tables |
| 3221 | + |
| 3222 | + #align(center, table( |
| 3223 | + columns: 4, |
| 3224 | + align: center, |
| 3225 | + table.header([$f_(a_0, a_1)$], [$0$], [$1$], [$2$]), |
| 3226 | + [$0$], [#table01.at(0).at(0)], [#table01.at(0).at(1)], [#table01.at(0).at(2)], |
| 3227 | + [$1$], [#table01.at(1).at(0)], [#table01.at(1).at(1)], [#table01.at(1).at(2)], |
| 3228 | + )) |
| 3229 | + |
| 3230 | + and |
| 3231 | + |
| 3232 | + #align(center, table( |
| 3233 | + columns: 3, |
| 3234 | + align: center, |
| 3235 | + table.header([$f_(a_1, a_2)$], [$0$], [$1$]), |
| 3236 | + [$0$], [#table12.at(0).at(0)], [#table12.at(0).at(1)], |
| 3237 | + [$1$], [#table12.at(1).at(0)], [#table12.at(1).at(1)], |
| 3238 | + [$2$], [#table12.at(2).at(0)], [#table12.at(2).at(1)], |
| 3239 | + )) |
| 3240 | + |
| 3241 | + together with the known values $K = {(v_0, a_0, 0), (v_3, a_0, 1), (v_1, a_2, 1)}$. One consistent completion is: |
| 3242 | + |
| 3243 | + #align(center, table( |
| 3244 | + columns: 4, |
| 3245 | + align: center, |
| 3246 | + table.header([object], [$a_0$], [$a_1$], [$a_2$]), |
| 3247 | + [$v_0$], [#value(0, 0)], [#value(0, 1)], [#value(0, 2)], |
| 3248 | + [$v_1$], [#value(1, 0)], [#value(1, 1)], [#value(1, 2)], |
| 3249 | + [$v_2$], [#value(2, 0)], [#value(2, 1)], [#value(2, 2)], |
| 3250 | + [$v_3$], [#value(3, 0)], [#value(3, 1)], [#value(3, 2)], |
| 3251 | + [$v_4$], [#value(4, 0)], [#value(4, 1)], [#value(4, 2)], |
| 3252 | + [$v_5$], [#value(5, 0)], [#value(5, 1)], [#value(5, 2)], |
| 3253 | + )) |
| 3254 | + |
| 3255 | + This witness satisfies every published count: in $f_(a_0, a_1)$ each of the six cells appears exactly once, while in $f_(a_1, a_2)$ the five occupied cells have multiplicities $1, 1, 2, 1, 1$ exactly as listed above. It also respects all three known triples, so the answer is YES. |
| 3256 | + ] |
| 3257 | + ] |
| 3258 | +} |
| 3259 | + |
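As a cross-check of the example, the following Python sketch verifies a completion against published tables and known triples. The concrete counts and configuration are hypothetical values chosen to match the description above (six objects, domain sizes 2, 3, 2, every cell of the first table equal to 1, multiplicities 1, 1, 2, 1, 1 in the second table); the canonical fixture returned by `load-model-example` is not shown here and may differ. The helper name `is_consistent` is likewise an assumption.

```python
from collections import Counter

def is_consistent(config, domains, tables, known):
    """Check a flat object-major config against tables and known triples."""
    num_attrs = len(domains)
    num_objects = len(config) // num_attrs
    value = lambda v, a: config[v * num_attrs + a]
    # Known triples (v, a, x) must be reproduced exactly.
    if any(value(v, a) != x for v, a, x in known):
        return False
    for (a, b), counts in tables.items():
        # Count how many objects realize each cell (x, y) of the table.
        realized = Counter((value(v, a), value(v, b))
                           for v in range(num_objects))
        for x in range(domains[a]):
            for y in range(domains[b]):
                if realized[(x, y)] != counts[x][y]:
                    return False
    return True

# Hypothetical instance consistent with the prose of the example.
domains = [2, 3, 2]
tables = {
    (0, 1): [[1, 1, 1], [1, 1, 1]],
    (1, 2): [[1, 1], [0, 2], [1, 1]],
}
known = [(0, 0, 0), (3, 0, 1), (1, 2, 1)]
config = [0, 0, 0,  0, 1, 1,  0, 2, 0,  1, 0, 1,  1, 1, 1,  1, 2, 1]
print(is_consistent(config, domains, tables, known))
```

Verification is linear in the number of objects per table, which is what places the problem in NP; only finding a witness is hard.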
| 3260 | +#reduction-rule("ConsistencyOfDatabaseFrequencyTables", "ILP")[ |
| 3261 | + Each object-attribute pair is encoded by a one-hot binary vector over its domain, and each pairwise frequency count becomes a linear equality over McCormick auxiliary variables that linearize the product of two one-hot indicators. Known values are fixed by pinning the corresponding indicator to 1. The resulting ILP is a pure feasibility problem (trivial objective). |
| 3262 | +][ |
| 3263 | + _Construction._ Let $V$ be the set of objects, $A$ the set of attributes with domains $D_a$, $cal(T)$ the set of published frequency tables, and $K$ the set of known triples $(v, a, x)$. |
| 3264 | + |
| 3265 | + _Variables:_ (1) Binary one-hot indicators $y_(v,a,x) in {0, 1}$ for each object $v in V$, attribute $a in A$, and value $x in D_a$: $y_(v,a,x) = 1$ iff object $v$ takes value $x$ for attribute $a$. (2) Binary auxiliary variables $z_(t,v,x,x') in {0, 1}$ for each table $t in cal(T)$ (with attribute pair $(a, b)$), object $v in V$, and cell $(x, x') in D_a times D_b$: $z_(t,v,x,x') = 1$ iff object $v$ realizes cell $(x, x')$ in table $t$. |
| 3266 | + |
| 3267 | + _Constraints:_ (1) One-hot: $sum_(x in D_a) y_(v,a,x) = 1$ for all $v in V$, $a in A$. (2) Known values: $y_(v,a,x) = 1$ for each $(v, a, x) in K$. (3) McCormick linearization for $z_(t,v,x,x') = y_(v,a,x) dot y_(v,b,x')$: $z_(t,v,x,x') lt.eq y_(v,a,x)$, $z_(t,v,x,x') lt.eq y_(v,b,x')$, $z_(t,v,x,x') gt.eq y_(v,a,x) + y_(v,b,x') - 1$, for each table $t in cal(T)$ with attribute pair $(a, b)$, object $v in V$, and cell $(x, x') in D_a times D_b$. (4) Frequency counts: $sum_(v in V) z_(t,v,x,x') = f_t (x, x')$ for each table $t in cal(T)$ and cell $(x, x')$. |
| 3268 | + |
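The McCormick constraints in (3) can be sanity-checked exhaustively: for binary inputs, the three inequalities admit exactly one binary value of the auxiliary variable, namely the product of the two indicators. A minimal Python sketch, where the function name `mccormick_feasible` is an assumption for illustration:

```python
def mccormick_feasible(z, y1, y2):
    """True iff (z, y1, y2) satisfies the three McCormick inequalities."""
    return z <= y1 and z <= y2 and z >= y1 + y2 - 1

# For each binary pair (y1, y2), collect the binary z values that are feasible.
feasible_z = {
    (y1, y2): [z for z in (0, 1) if mccormick_feasible(z, y1, y2)]
    for y1 in (0, 1)
    for y2 in (0, 1)
}

# The only feasible z is the product y1 * y2 in every case.
for (y1, y2), zs in feasible_z.items():
    assert zs == [y1 * y2]
```

This is why the auxiliary variables need no separate integrality argument beyond being binary: once the one-hot indicators are fixed, each auxiliary variable is determined.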
| 3269 | + _Objective:_ Minimize $0$ (feasibility problem). |
| 3270 | + |
| 3271 | + _Correctness._ ($arrow.r.double$) Given a consistent assignment $g$, set $y_(v,a,x) = 1$ iff $g_a(v) = x$ and $z_(t,v,x,x') = y_(v,a,x) dot y_(v,b,x')$. The one-hot, known-value, and McCormick constraints hold by construction, and $sum_(v in V) z_(t,v,x,x')$ counts the objects realizing cell $(x, x')$, which equals $f_t (x, x')$. ($arrow.l.double$) Any feasible binary solution assigns exactly one value per object-attribute pair (one-hot) and respects known values. For binary variables the McCormick constraints force $z_(t,v,x,x') = y_(v,a,x) dot y_(v,b,x')$, so each frequency equality states that exactly $f_t (x, x')$ objects realize cell $(x, x')$, which certifies consistency. |
| 3272 | + |
| 3273 | + _Solution extraction._ For each object $v$ and attribute $a$, find $x$ with $y_(v,a,x) = 1$; assign value $x$ to $(v, a)$. |
| 3274 | +] |
| 3275 | + |
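The forward direction of the reduction and the solution-extraction step can be sketched in plain Python without an ILP solver: encode an assignment as one-hot and product variables, check constraints (1) to (4), and read the assignment back. The function names `encode`, `feasible`, and `extract`, the dictionary keys, and the six-object instance are assumptions for illustration; the instance is a hypothetical one, not the crate's fixture.

```python
from itertools import product

def encode(config, domains, tables):
    """Map a flat object-major config to one-hot y and product z variables."""
    n_attrs = len(domains)
    n_objs = len(config) // n_attrs
    y = {(v, a, x): int(config[v * n_attrs + a] == x)
         for v in range(n_objs) for a in range(n_attrs)
         for x in range(domains[a])}
    z = {((a, b), v, x, xp): y[v, a, x] * y[v, b, xp]
         for (a, b) in tables for v in range(n_objs)
         for x, xp in product(range(domains[a]), range(domains[b]))}
    return y, z

def feasible(y, z, n_objs, domains, tables, known):
    """Check constraints (1)-(4) of the construction on a 0/1 assignment."""
    one_hot = all(sum(y[v, a, x] for x in range(d)) == 1
                  for v in range(n_objs) for a, d in enumerate(domains))
    pinned = all(y[v, a, x] == 1 for v, a, x in known)
    mccormick = all(zv <= y[v, a, x] and zv <= y[v, b, xp]
                    and zv >= y[v, a, x] + y[v, b, xp] - 1
                    for ((a, b), v, x, xp), zv in z.items())
    counts = all(sum(z[(a, b), v, x, xp] for v in range(n_objs)) == f[x][xp]
                 for (a, b), f in tables.items()
                 for x, xp in product(range(domains[a]), range(domains[b])))
    return one_hot and pinned and mccormick and counts

def extract(y, n_objs, domains):
    """Solution extraction: read off the value x with y[v, a, x] = 1."""
    return [next(x for x in range(d) if y[v, a, x] == 1)
            for v in range(n_objs) for a, d in enumerate(domains)]

# Hypothetical instance: 6 objects, domain sizes 2, 3, 2, two tables.
domains = [2, 3, 2]
tables = {(0, 1): [[1, 1, 1], [1, 1, 1]], (1, 2): [[1, 1], [0, 2], [1, 1]]}
known = [(0, 0, 0), (3, 0, 1), (1, 2, 1)]
config = [0, 0, 0,  0, 1, 1,  0, 2, 0,  1, 0, 1,  1, 1, 1,  1, 2, 1]

y, z = encode(config, domains, tables)
print(len(y), len(z))
print(feasible(y, z, 6, domains, tables, known))
print(extract(y, 6, domains) == config)
```

On this instance the encoding uses 42 one-hot indicators (6 objects times 2 + 3 + 2 values) and 72 auxiliary variables (6 objects times 6 cells for each of the two tables), matching the variable families described in the construction.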
3204 | 3276 | #problem-def("SumOfSquaresPartition")[ |
3205 | 3277 | Given a finite set $A = {a_0, dots, a_(n-1)}$ with sizes $s(a_i) in ZZ^+$, a positive integer $K lt.eq |A|$ (number of groups), and a positive integer $J$ (bound), determine whether $A$ can be partitioned into $K$ disjoint sets $A_1, dots, A_K$ such that $sum_(i=1)^K (sum_(a in A_i) s(a))^2 lt.eq J$. |
3206 | 3278 | ][ |