experiments/napkin_math/docs/20260520_plan.md
Lines changed: 36 additions & 26 deletions
@@ -76,13 +76,11 @@ separated below.
 
 - **PR #737** (merged) — Phase 1 compress prompts + initial extract threshold-pairing rule + `OPTIMIZE_INSTRUCTIONS` discipline banner. Substantive content described below under "PR #737 detail" for continuity.
 - **PR #739** (merged) — Proposal 141 ("Source-Preservation Audit for the Napkin Math Pipeline") landed as design only. No code or prompt change. Implementation deferred.
-
-### Open for merge
-
-- **PR #740** — Phase 2 extract-prompt rules. Three commits land in `extract-parameters-from-digest` and `extract-parameters-from-full`:
+- **PR #740** (merged) — Phase 2 extract-prompt rules. Four commits landed in `extract-parameters-from-digest` and `extract-parameters-from-full`:
 - `19f927b7` — Tightened aggregate-sum wording so independent caps/envelopes are NOT collapsed into derived sums; reconciled discipline-shared paragraph with the cap-pressure paragraph.
 - `8f94c8cd` — 20-word `source_text` cap reinforced with explicit truncation discipline (drop the consequence clause, end with ellipsis if mid-sentence).
+- `f9d90ebb` — Updated this plan-status section for PR #740's narrow scope and verification limits.
 All edits applied symmetrically to both extract skills. No corpus literals introduced.
 
 ### PR #737 detail (already on main)
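The truncation discipline in commit `8f94c8cd` lives in prompt text, not code, but it reads like a small procedure. The sketch below is one hedged interpretation of it, not the wording PR #740 ships: the function name `truncate_source_text`, the set of words treated as opening a consequence clause, and the three-dot ellipsis are all assumptions. When a quote is over the 20-word cap, it first cuts at the earliest consequence clause, then hard-caps whatever remains at 20 words, and appends an ellipsis if the result no longer ends a sentence.

```python
import re

# ASSUMPTION: markers that open a consequence clause. The real prompt
# wording is not shown in this diff; this pattern is illustrative only.
_CONSEQUENCE_CLAUSE = re.compile(
    r",\s+(?:so|therefore|which means|meaning)\s", re.IGNORECASE
)
SOURCE_TEXT_WORD_CAP = 20  # the 20-word source_text cap from commit 8f94c8cd


def truncate_source_text(text: str, max_words: int = SOURCE_TEXT_WORD_CAP) -> str:
    """Hypothetical sketch of the source_text truncation discipline."""
    text = text.strip()
    if len(text.split()) <= max_words:
        return text  # already within the cap: leave the quote untouched

    # Step 1: drop the consequence clause, cutting at the earliest marker.
    match = _CONSEQUENCE_CLAUSE.search(text)
    if match:
        text = text[: match.start()].rstrip()

    # Step 2: if what remains is still over the cap, hard-cap it by words.
    words = text.split()
    if len(words) > max_words:
        text = " ".join(words[:max_words])

    # Step 3: a quote that no longer ends a sentence gets an ellipsis.
    if not text.endswith((".", "!", "?")):
        text = text.rstrip(",;:") + "..."
    return text
```

Cutting at the clause boundary before hard-capping keeps the retained text a whole condition rather than an arbitrary 20-word prefix.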
@@ -225,7 +223,7 @@ too:
 | Phase | Skill / module | Status |
 |---|---|---|
 | 1 | `compress_report_section.py` | **DONE on main via PR #737** (R2.3 numeric_values, R2.3 missing_data, R2.5 gates_and_thresholds, OPTIMIZE_INSTRUCTIONS banner) |
-| 2 | `extract-parameters-from-{full,digest}` | **PARTIAL** — threshold-pairing on `from-digest` shipped in PR #737. Source-arithmetic preservation (Patterns 1/2/3 for R1.1, R2.3, R2.4), threshold-pairing parity into `from-full`, aggregate-sum tightening, and source_text truncation discipline are in PR #740, open for merge. After #740 merges, Phase 2's prompt-side work is complete for the original directives. |
+| 2 | `extract-parameters-from-{full,digest}` | **DONE for prompt-side directives on main via PR #740** — threshold-pairing on `from-digest` shipped in PR #737; source-arithmetic preservation (Patterns 1/2/3 for R1.1, R2.3, R2.4), threshold-pairing parity into `from-full`, aggregate-sum tightening, and source_text truncation discipline shipped in PR #740. Behavioural validation on a different LLM remains a follow-up, not additional prompt-scope work. |
 | 3 | `validate-parameters` | not started for the no-dead-end / threshold-pair extensions in the plan. Note: `validate_parameters.py` itself exists and was used to validate v51. |
 | 4 | `generate-bounds` | not started |
 | 5 | `verify-bounds-citations` (new) | not started |
@@ -237,30 +235,42 @@ too:
 
 ### Next likely move
 
-After PR #740 merges, the next-most-load-bearing follow-ups, in
-preferred order:
-
-1. **Implement proposal 141** (`dropped_signals` schema in extract
+After PR #740, the next work should be ordered by what improves
+napkin_math output quality most directly, not by what is easiest to
+measure. Preferred order:
+
+1. **Compress-LLM variance handling.** Deterministic retry/merge or
+lower-temperature reruns for high-impact compress buckets should
+come next. The clearest driver is the paperclip OPC UA / latency
+tripwires that v49 surfaced and v50/v51 drop at the compress
+layer. This is upstream of extraction: if the digest does not
+carry the tripwire, no extract prompt can recover it. Proposal
+141 would classify this loss, but variance handling is the piece
+that can restore the missing source signal.
+2. **Implement proposal 141** (`dropped_signals` schema in extract
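Item 1 names two options for variance handling without fixing a mechanism. Below is a minimal sketch of the first, a deterministic merge across repeated compress runs, under loudly stated assumptions: `merge_compress_runs`, the `run_once` callable, and the idea that a compress bucket's output reduces to a list of signal strings are all hypothetical and not part of `compress_report_section.py` as shown here. Each run casts one vote per distinct signal; with the default `min_votes=1` the merge keeps the union, so a tripwire one run drops survives if any other run kept it. A lower-temperature rerun would be a different `run_once`, which this sketch does not model.

```python
from collections import Counter
from typing import Callable, Iterable


def merge_compress_runs(
    run_once: Callable[[int], list[str]],
    seeds: Iterable[int],
    min_votes: int = 1,
) -> list[tuple[str, int]]:
    """Hypothetical deterministic merge of repeated compress runs.

    run_once(seed) stands in for one compress call and returns the signal
    strings that run kept. Returns (signal, votes) pairs.
    """
    votes: Counter[str] = Counter()
    for seed in seeds:
        # Normalise whitespace and de-duplicate within a run, so one run
        # can vote at most once for a given signal.
        distinct = {" ".join(signal.split()) for signal in run_once(seed)}
        votes.update(distinct)

    kept = [(signal, n) for signal, n in votes.items() if n >= min_votes]
    # Sort by descending vote count, then alphabetically, so the merged
    # output does not depend on the order the runs were made in.
    return sorted(kept, key=lambda item: (-item[1], item[0]))
```

Raising `min_votes` trades recall for stability. For the dropped-tripwire problem in item 1, recall is the point, which is why this sketch defaults to the union.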