Refactor explanation of dominance rules and examples

ppdewolf · web-flow · commit 15f4d44b73ea · 2026-03-19T14:14:06.000+01:00
diff --git a/chapters/04/_04-magnitude-tabular-data-02-Disclosure-Control-Concepts-for-Magnitude-Tabular-Data.qmd b/chapters/04/_04-magnitude-tabular-data-02-Disclosure-Control-Concepts-for-Magnitude-Tabular-Data.qmd
@@ -133,7 +133,7 @@ When we replace a $(2,k)$‑dominance rule, by a $p\%$‑rule, the natural choic
 
 Thus, a $(2,80)$‑dominance rule would be replaced by a $p\%$‑rule with $p = 25$, a $(2,95)$‑dominance rule by a $p\%$‑rule with $p = 5.26$ .
 
-If we also derive $p$ from this formula, when replacing a $(1,k)$‑dominance rule, we will obtain a much larger number of sensitive cells. In addition to the cells which are unsafe according to the $(1,k)$-dominance rule which will then also be unsafe according to the $p\%$‑rule, there will be cells which were safe according to the $(1,k)$‑dominance rule, but are not safe according to the $p\%$‑rule, because the rule correctly considers the insider knowledge of a large second largest contributor. We could then put up with this increase in the number of sensitive cells. Alternatively, we could consider the number of sensitive cells that we used to assign (with the $(1,k)$-dominance rule) as a kind of a maximum-prize we are prepared to 'pay' for data protection. In that case we will reduce the parameter $p$. The effect will be that some of the cells we used to consider as sensitive according to the $(1,k)$-dominance rule will now not be sensitive. But this would be justified because those cells are less sensitive as the cells which are unsafe according to the $p\%$-rule, but are not according to the former $(1,k)$-dominance rule, as illustrated above by Example 1.
+If we also derive $p$ from this formula, when replacing a $(1,k)$‑dominance rule, we will obtain a much larger number of sensitive cells. In addition to the cells which are unsafe according to the $(1,k)$-dominance rule which will then also be unsafe according to the $p\%$‑rule, there will be cells which were safe according to the $(1,k)$‑dominance rule, but are not safe according to the $p\%$‑rule, because the rule correctly considers the insider knowledge of a large second largest contributor. We could then put up with this increase in the number of sensitive cells. Alternatively, we could consider the number of sensitive cells that we used to assign (with the $(1,k)$-dominance rule) as a kind of a maximum-price we are prepared to 'pay' for data protection. In that case we will reduce the parameter $p$. The effect will be that some of the cells we used to consider as sensitive according to the $(1,k)$-dominance rule will now not be sensitive. But this would be justified because those cells are less sensitive than the cells which are unsafe according to the $p\%$-rule, but are not according to the former $(1,k)$-dominance rule, as illustrated above by Example 1.
 
 :::{.callout-note appearance="simple"}
 **Example 2**
@@ -178,7 +178,7 @@ $$
 
 \
 ***The $(p,q)$-rule***\
-A well known extension of the $p\%$-rule is the so called prior‑posterior $(p,q)$‑rule. With the extended rule, one can formally account for general knowledge about individual contributions assumed to be around *prior* to the publication, in particular that the second largest contributor can estimate the smaller contributions $X_{R} = \sum_{ i > 2 } x_{1}$ to within $q\%$. An aggregate is then considered unsafe when the second largest respondent could estimate the largest contribution $x_{1}$ to within $p$ percent of $x_{1}$ , by subtracting her own contribution and this estimate ${\hat{X}}_{R}$ from the cell total, *i.e.* when $|\left( X - x_{2} \right) - x_{1} - {\hat{X}}_{R}| < \frac{p}{100} x_{1}$. Because $\left( X - x_{2} \right) - x_{1} = X_{R}$, the left hand side is assumed to be less than $\frac{q}{100} X_{R}$. So the aggregate is considered to be sensitive, if $X_{R} < \frac{p}{q} x_{1}$. Evidently, it is actually the ratio $\frac{p}{q}$ which determines which cells are considered safe, or unsafe. Therefore, any $(p,q)$‑rule with $q < 100$ can also be expressed as $( p^*, q^*)$‑rule, with $q^* = 100$ and
+A well known extension of the $p\%$-rule is the so called prior‑posterior $(p,q)$‑rule. With the extended rule, one can formally account for general knowledge about individual contributions assumed to be around *prior* to the publication, in particular that the second largest contributor can estimate the smaller contributions $X_{R} = \sum_{ i > 2 } x_{i}$ to within $q\%$. An aggregate is then considered unsafe when the second largest respondent could estimate the largest contribution $x_{1}$ to within $p$ percent of $x_{1}$ , by subtracting her own contribution and this estimate ${\hat{X}}_{R}$ from the cell total, *i.e.* when $|\left( X - x_{2} \right) - x_{1} - {\hat{X}}_{R}| < \frac{p}{100} x_{1}$. Because $\left( X - x_{2} \right) - x_{1} = X_{R}$, the left hand side is assumed to be less than $\frac{q}{100} X_{R}$. So the aggregate is considered to be sensitive, if $X_{R} < \frac{p}{q} x_{1}$. Evidently, it is actually the ratio $\frac{p}{q}$ which determines which cells are considered safe, or unsafe. Therefore, any $(p,q)$‑rule with $q < 100$ can also be expressed as $( p^*, q^*)$‑rule, with $q^* = 100$ and
 $$
 p^* := 100  \frac{p}{q}
 $${#eq-p-star}
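
Review note (not part of the patch): the condition in the corrected paragraph, $X_{R} < \frac{p}{q} x_{1}$ with $X_{R} = \sum_{i > 2} x_{i}$, and the rescaling $p^* = 100 \frac{p}{q}$ can be checked with a minimal Python sketch. The function names `p_star` and `is_sensitive_pq` and the sample contributions are hypothetical and chosen only for illustration; they are not taken from the handbook or from any disclosure-control package.

```python
def p_star(p, q):
    """Rescale a (p, q)-rule to the equivalent (p*, 100)-rule: p* = 100 * p / q."""
    # Assumption: q is a percentage in (0, 100], as in the text above.
    if not 0 < q <= 100:
        raise ValueError("q is assumed to lie in (0, 100]")
    return 100 * p / q


def is_sensitive_pq(contributions, p, q=100):
    """Return True if the cell is sensitive under the (p, q)-rule.

    The cell is sensitive when X_R < (p / q) * x_1, where x_1 is the largest
    contribution and X_R is the sum of all contributions except the two largest.
    With q = 100 this reduces to the p%-rule.
    """
    if not contributions:
        raise ValueError("a cell needs at least one contribution")
    x = sorted(contributions, reverse=True)
    x1 = x[0]
    x_r = sum(x[2:])  # contributions of everyone but the two largest
    # Cross-multiplied form of X_R < (p / q) * x_1, which avoids a division.
    return x_r * q < p * x1


# Hypothetical cell: x_1 = 100, x_2 = 50, X_R = 10 + 5 = 15.
cell = [100, 50, 10, 5]
print(is_sensitive_pq(cell, p=10))        # p%-rule with p = 10
print(is_sensitive_pq(cell, p=10, q=50))  # (10, 50)-rule
print(p_star(10, 50))                     # equivalent p* for q* = 100
```

Because only the ratio $\frac{p}{q}$ enters the comparison, the $(10, 50)$‑rule and the $(20, 100)$‑rule classify this hypothetical cell identically, which is the point made in the paragraph above.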