Merge pull request #740 from PlanExeOrg/napkin-math/phase2-extract-decomposition-rules

neoneye · web-flow · commit d73ff9b5ac2b · 2026-05-21T03:05:15.000+02:00
napkin-math: Phase 2 extract — source-arithmetic preservation, threshold-pairing parity, source_text truncation
diff --git a/experiments/napkin_math/.claude/skills/extract-parameters-from-digest/system-prompt.txt b/experiments/napkin_math/.claude/skills/extract-parameters-from-digest/system-prompt.txt
@@ -55,6 +55,7 @@ Hard limits:
 - Do not include values merely because they appear in the digest.
 - Prefer missing-but-needed modelling values over minor explicit values.
 - For source_text from a compressed section, use the bullet text with the inline `[…]` tag removed. For source_text from a raw section, use a short verbatim quote or paraphrase. Never include the `[…]` tag in source_text.
+- The 20-word cap is hard. If the source phrase exceeds 20 words, truncate to the subject-plus-threshold portion that names the variable and its bound, drop the consequence clause, and end with an ellipsis if mid-sentence. Do not paste an entire if/then conditional when only the antecedent's threshold portion is load-bearing for the key_value. Count words before emitting; if at or above the cap, shorten further.
 
 Goal:
 
@@ -377,6 +378,22 @@ Neutral patterns:
 - combined_coverage_ratio = available_capacity / total_required_capacity
 - total_required_capacity = requirement_a + requirement_b
 
+Source-arithmetic preservation rule:
+
+When the source explicitly relates a dependent quantity to its named components through an arithmetic operation — sum, product, ratio, fraction of a base, scaled magnitude, weighted average — the extractor MUST preserve that relationship as a recommended_first_calculation. Do not declare the dependent quantity as a flat bounded variable when the source supplies both its components and the arithmetic between them. Sampling a derived quantity independently of the components it is derived from double-counts uncertainty, lets the simulation produce physically incoherent combinations (a low component with a high "total"), and obscures the cause of any sensitivity finding.
+
+Three common patterns to recognise:
+
+1. Aggregate sum. Apply this pattern ONLY when the source states or clearly implies that a total is computed from named constituents — the source says "the total is A + B + C", "broken down as named line items summing to X", or otherwise aggregates the named components into the stated total. In those cases, declare the total as `aggregate = sum_of_components`, with each component in `key_values` or `missing_values_to_estimate`, and the aggregate as a recommended_first_calculation. Do NOT apply this pattern to independent caps, ceilings, committed budgets, targets, or funding envelopes simply because allocations or line items appear nearby; those remain primitive thresholds in `key_values` and are tested via the threshold-pairing rule above (spend-vs-cap margin, not sum-vs-cap identity).
+
+2. Burn rate × duration. The source names a per-period or per-unit rate (a value per unit of time, area, count, person, capacity) AND a separable duration or count the rate applies over. Declare the dependent total as `total = rate * duration`. The rate and the duration each go in `key_values` or `missing_values_to_estimate`; the total is the recommended_first_calculation. Do not bound the dependent total directly when the source supplies the two operands.
+
+3. Explicit decomposition block. The source explicitly does the arithmetic itself — a base quantity, one or more share fractions, possibly a perturbation magnitude, and the named resulting dependent value. Preserve every named operand and the operation between them as a calculation. Do not collapse the source's explicit math into a single flat variable; doing so loses the source's stated sensitivity to each operand.
+
+Discipline shared by all three patterns: every quantity for which the source itself supplies a formula is a derived quantity, not a primitive. Primitives go in `key_values` or `missing_values_to_estimate`; derived quantities go in `recommended_first_calculations` when the relation is critical to a viability claim, or in `derived_questions` when the relation is useful but secondary. The choice between those two containers is a placement decision; it does not relax the rule that the source's stated arithmetic must be preserved as a calculation rather than collapsed into a flat variable.
+
+When hard limits make this preservation feel tight, do not collapse the decomposition. If the arithmetic relation is critical to a viability claim, keep it in `recommended_first_calculations`; if it is useful but secondary, `derived_questions` is the right home; if neither fits under cap pressure, drop a less load-bearing key_value to make room. Same posture as the threshold-pairing rule above.
+
 Critical-output completeness rule:
 
 Before returning JSON, identify the plan's 1-3 most important viability claims. For each claim, ensure there is either:
diff --git a/experiments/napkin_math/.claude/skills/extract-parameters-from-full/system-prompt.txt b/experiments/napkin_math/.claude/skills/extract-parameters-from-full/system-prompt.txt
@@ -15,6 +15,7 @@ Hard limits:
 - Return at most 5 recommended_first_calculations.
 - Each comment must be at most 25 words.
 - Each source_text must be at most 20 words.
+- The 20-word cap is hard. If the source phrase exceeds 20 words, truncate to the subject-plus-threshold portion that names the variable and its bound, drop the consequence clause, and end with an ellipsis if mid-sentence. Do not paste an entire if/then conditional when only the antecedent's threshold portion is load-bearing for the key_value. Count words before emitting; if at or above the cap, shorten further.
 - Never output suggested_low, suggested_base, or suggested_high inside key_values.
 - Do not include values merely because they are present in the report.
 - Prefer missing-but-needed modelling values over minor explicit values.
@@ -311,6 +312,20 @@ Use generic plan-native ids only. Good neutral patterns include:
 
 If the plan provides bounded inputs such as capacity, hours, rate, utilization, demand, staffing level, or overhead, do not leave those inputs dead-ended when they are needed to evaluate a stated coverage or capacity claim.
 
+Threshold pairing rule:
+
+When you extract a key_value that names a numeric threshold — a floor, cap, ceiling, minimum, maximum, target volume, target share, target deadline, or any other "must be at least X" / "must not exceed X" boundary the plan states — you MUST also emit a paired margin calculation comparing the realised quantity against the threshold. The pairing has three parts:
+
+1. The threshold goes in key_values (you have already done this step).
+2. The realised quantity goes in missing_values_to_estimate if the source does not name it. The realised quantity is the variable the threshold tests against, not the threshold itself.
+3. The margin calculation goes in recommended_first_calculations or derived_questions, with a formula like `realised - threshold` (so positive = pass) when the threshold is a floor, or `threshold - realised` when the threshold is a cap. Name the output with the `_margin` or `_surplus` suffix.
+
+Without the paired margin calc, the threshold is a dead-end variable: it is extracted, generate-bounds samples it, but the simulation never tests whether it is cleared. The threshold then contributes to bounds noise without contributing to any gate verdict.
+
+When hard limits make the pairing feel tight, do NOT drop the pairing. Drop a less load-bearing key_value, or move a less-critical calculation to derived_questions, to make room. A threshold without its pairing fails the "no dead-end variables" rule and the "critical-output completeness rule" simultaneously.
+
+Apply this rule even when the threshold is one of several stated by the plan. Every extracted threshold gets a pairing or the threshold is dropped from key_values. There is no third option.
+
 Combined viability gate preservation:
 
 When a plan frames a strategy as surviving, absorbing, balancing, covering, offsetting, or withstanding multiple pressures, shocks, gaps, constraints, or risks, preserve the highest-level combined viability test.
@@ -326,6 +341,22 @@ Neutral patterns:
 - combined_coverage_ratio = available_capacity / total_required_capacity
 - total_required_capacity = requirement_a + requirement_b
 
+Source-arithmetic preservation rule:
+
+When the source explicitly relates a dependent quantity to its named components through an arithmetic operation — sum, product, ratio, fraction of a base, scaled magnitude, weighted average — the extractor MUST preserve that relationship as a recommended_first_calculation. Do not declare the dependent quantity as a flat bounded variable when the source supplies both its components and the arithmetic between them. Sampling a derived quantity independently of the components it is derived from double-counts uncertainty, lets the simulation produce physically incoherent combinations (a low component with a high "total"), and obscures the cause of any sensitivity finding.
+
+Three common patterns to recognise:
+
+1. Aggregate sum. Apply this pattern ONLY when the source states or clearly implies that a total is computed from named constituents — the source says "the total is A + B + C", "broken down as named line items summing to X", or otherwise aggregates the named components into the stated total. In those cases, declare the total as `aggregate = sum_of_components`, with each component in `key_values` or `missing_values_to_estimate`, and the aggregate as a recommended_first_calculation. Do NOT apply this pattern to independent caps, ceilings, committed budgets, targets, or funding envelopes simply because allocations or line items appear nearby; those remain primitive thresholds in `key_values` and are tested via the threshold-pairing rule above (spend-vs-cap margin, not sum-vs-cap identity).
+
+2. Burn rate × duration. The source names a per-period or per-unit rate (a value per unit of time, area, count, person, capacity) AND a separable duration or count the rate applies over. Declare the dependent total as `total = rate * duration`. The rate and the duration each go in `key_values` or `missing_values_to_estimate`; the total is the recommended_first_calculation. Do not bound the dependent total directly when the source supplies the two operands.
+
+3. Explicit decomposition block. The source explicitly does the arithmetic itself — a base quantity, one or more share fractions, possibly a perturbation magnitude, and the named resulting dependent value. Preserve every named operand and the operation between them as a calculation. Do not collapse the source's explicit math into a single flat variable; doing so loses the source's stated sensitivity to each operand.
+
+Discipline shared by all three patterns: every quantity for which the source itself supplies a formula is a derived quantity, not a primitive. Primitives go in `key_values` or `missing_values_to_estimate`; derived quantities go in `recommended_first_calculations` when the relation is critical to a viability claim, or in `derived_questions` when the relation is useful but secondary. The choice between those two containers is a placement decision; it does not relax the rule that the source's stated arithmetic must be preserved as a calculation rather than collapsed into a flat variable.
+
+When hard limits make this preservation feel tight, do not collapse the decomposition. If the arithmetic relation is critical to a viability claim, keep it in `recommended_first_calculations`; if it is useful but secondary, `derived_questions` is the right home; if neither fits under cap pressure, drop a less load-bearing key_value to make room. Same posture as the threshold-pairing rule above.
+
 Critical-output completeness rule:
 
 Before returning JSON, identify the plan's 1-3 most important viability claims. For each claim, ensure there is either:
diff --git a/experiments/napkin_math/docs/20260520_plan.md b/experiments/napkin_math/docs/20260520_plan.md