| 1 | +# Draft reply to darval — issue #14 (O1 instrumentation ask) |
| 2 | + |
| 3 | +**Instructions**: review before posting. Target comment at |
| 4 | +https://github.com/cdeust/Cortex/issues/14 |
| 5 | + |
| 6 | +--- |
| 7 | + |
| 8 | +Quick update on **O1** from your 3.13.2 report — the one where |
| 9 | +`cohort_correction` moved bimodality 0.07% in 336 s while recall |
| 10 | +quality improved dramatically. I ran the diagnosis locally before |
| 11 | +shipping a fix, and the answer is not obvious from in-process |
| 12 | +simulation alone. I'd like to read your next `consolidate` output |
| 13 | +with the new instrumentation I just merged to `main`. |
| 14 | + |
| 15 | +## What I found so far |
| 16 | + |
| 17 | +I simulated several distribution shapes that match your reported |
| 18 | +stats (`mean=0.6487, std=0.3162, bimodality=0.8433, cohort_size=33604`): |
| 19 | + |
| 20 | +| Simulated distribution | Δ bimodality per cycle | |
| 21 | +|---|---| |
| 22 | +| Wide two-peak (σ=0.08 around each mode) | −0.022 | |
| 23 | +| Narrow hot peak at 0.98 + wide cold tail | −0.046 | |
| 24 | +| Three-mode (0.98/0.5/0.2) | −0.014 | |
| 25 | +| Saturated hot peak + uniform cold tail | −0.024 | |
| 26 | + |
| 27 | +Every reasonable reconstruction of your numbers shows the correction |
| 28 | +should move bimodality by **1.4–4.6 percentage points per cycle** at |
| 29 | +the default `correction_strength=0.3`. You saw **0.07 pp** — at least |
| 30 | +20× less than expected. |
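
The shortfall can be checked straight from the table. This is a small sketch that uses only the simulated deltas above and the observed 0.07 pp; the variable names are illustrative and not part of Cortex.

```python
# Simulated change in bimodality per cycle, copied from the table above
# (absolute units on the 0-1 bimodality scale).
simulated_deltas = {
    "wide two-peak": -0.022,
    "narrow hot peak + wide cold tail": -0.046,
    "three-mode": -0.014,
    "saturated hot peak + uniform cold tail": -0.024,
}
observed_pp = 0.07  # movement reported for 3.13.2, in percentage points

# Convert each simulated delta to percentage points and divide by the
# observed movement to get the shortfall factor for that shape.
shortfall = {
    name: abs(delta) * 100 / observed_pp
    for name, delta in simulated_deltas.items()
}

for name, ratio in shortfall.items():
    print(f"{name}: expected about {ratio:.0f}x the observed movement")
```

The smallest factor comes from the three-mode shape, which is where the "at least 20×" figure comes from.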
| 31 | + |
| 32 | +Two live hypotheses: |
| 33 | + |
| 34 | +1. **The correction IS moving per-row heat** (which is what actually |
| 35 | + drives WRRF ranking), but the bimodality metric is a poor proxy for |
| 36 | + that: it measures global distribution shape, not per-row moves. |
| 37 | + This would explain why recall improved dramatically while the |
| 38 | + metric barely moved. |
| 39 | + |
| 40 | +2. **Something is suppressing per-row writes** — e.g. protected/stale |
| 41 | + filter, pool-connection race, a silent fallback path I'm missing. |
| 42 | + This would be a real bug, and the recall improvement would have come |
| 43 | + from somewhere else entirely (reranker, query dispatch, new heat-weight |
| 44 | + mix). |
| 45 | + |
| 46 | +Without per-row movement data from your production distribution I |
| 47 | +can't choose between them. |
| 48 | + |
| 49 | +## What I shipped to `main` (not tagged yet) |
| 50 | + |
| 51 | +Commit [`ae6f280`](https://github.com/cdeust/Cortex/commit/ae6f280) |
| 52 | +adds three new fields to the homeostatic cycle output: |
| 53 | + |
| 54 | +```jsonc |
| 55 | +"homeostatic": { |
| 56 | + "scaling_kind": "cohort_correction", |
| 57 | + "cohort_size": 33604, |
| 58 | + "bimodality_before": 0.8433, |
| 59 | + "bimodality_after": 0.8427, |
| 60 | + |
| 61 | + "cohort_mean_heat_delta": 0.1234, // NEW |
| 62 | + "cohort_max_heat_delta": 0.1650, // NEW |
| 63 | + "cohort_rows_written": 33600 // NEW |
| 64 | +} |
| 65 | +``` |
| 66 | + |
| 67 | +These let us see per-row movement directly without inferring it from |
| 68 | +a shape metric. |
| 69 | + |
| 70 | +**Expected values for hypothesis (1)**: your cohort members have |
| 71 | +heat ≈ 0.93 pre-correction. With default `strength=0.3` and |
| 72 | +`target=0.4`, each drops by `0.3 × (0.93 − 0.4) = 0.159`. So: |
| 73 | + |
| 74 | + * `cohort_mean_heat_delta` ≈ **0.13–0.17** (depending on the hot-peak |
| 75 | + shape) |
| 76 | + * `cohort_max_heat_delta` ≈ **0.18** (for memories near heat=1.0) |
| 77 | + * `cohort_rows_written` ≈ `cohort_size` (every cohort member's delta |
| 78 | + exceeds 0.001, so every one writes) |
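
For reference, a minimal sketch of the arithmetic behind these expectations. It assumes the correction is a linear pull toward the target, `strength × (heat − target)`, as in the worked example above. The function name is hypothetical and the 0.85 input is only an illustrative lower-end heat, not a value from your store.

```python
def expected_heat_delta(heat: float, strength: float = 0.3, target: float = 0.4) -> float:
    """Expected per-row heat drop for one correction cycle.

    Assumes a linear pull toward the target: strength * (heat - target).
    The defaults are the values quoted in this reply.
    """
    return strength * (heat - target)


# 0.93 is the cohort heat quoted above, 1.0 is the saturated case,
# 0.85 is an illustrative (assumed) lower-end member of the hot peak.
for heat in (0.85, 0.93, 1.0):
    print(f"heat={heat:.2f} -> expected delta={expected_heat_delta(heat):.3f}")
```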
| 79 | + |
| 80 | +**If your output matches these expected values**, `cohort_correction` is |
| 81 | +doing its job on ranking, and the fix is to add a more retrieval-relevant |
| 82 | +health metric, not to change the correction behaviour. |
| 83 | + |
| 84 | +**If `cohort_mean_heat_delta` is close to 0 or `cohort_rows_written` |
| 85 | +is much less than `cohort_size`**, there's a real bug and I'll fix |
| 86 | +the write path. |
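
The decision rule above could be sketched as a small triage helper. The field names come from the `homeostatic` block; the function name and both thresholds are assumptions chosen for illustration, since "close to 0" and "much less than" are not pinned to exact numbers here.

```python
def triage_homeostatic(
    block: dict,
    min_mean_delta: float = 0.01,   # assumed cut-off for "close to 0"
    min_write_ratio: float = 0.9,   # assumed cut-off for "much less than cohort_size"
) -> str:
    """Point at hypothesis 1 or 2 from one homeostatic block (illustrative only)."""
    cohort_size = block["cohort_size"]
    if cohort_size == 0:
        return "empty cohort: nothing to judge"

    write_ratio = block["cohort_rows_written"] / cohort_size
    mean_delta = abs(block["cohort_mean_heat_delta"])

    if mean_delta < min_mean_delta or write_ratio < min_write_ratio:
        return "hypothesis 2: per-row writes look suppressed"
    return "hypothesis 1: rows moved, bimodality is a poor proxy"


# Example using the placeholder values from the JSON sample above.
sample = {
    "cohort_size": 33604,
    "cohort_mean_heat_delta": 0.1234,
    "cohort_max_heat_delta": 0.1650,
    "cohort_rows_written": 33600,
}
print(triage_homeostatic(sample))
```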
| 87 | + |
| 88 | +## What I'm NOT shipping yet |
| 89 | + |
| 90 | +v3.13.3 is on deck but held until I have the numbers. The bundle |
| 91 | +includes: |
| 92 | + |
| 93 | +- Pipeline → wiki/memory/KG integration (auto-wire on SessionStart, |
| 94 | + incremental detect_changes on file edits, graph-TTL background |
| 95 | + re-analyze). |
| 96 | +- Doc grooming: wiki templates per kind (ADR/specs/guides/…), |
| 97 | + naming-convention regex, deterministic auditor, and a |
| 98 | + `cortex-wiki-groomer` sub-agent that rewrites pages to template |
| 99 | + without deleting content. |
| 100 | +- Plain-language `/wiki/README.md` generator (readable by non-tech |
| 101 | + stakeholders; tech detail stays in `.generated/INDEX.md` and the |
| 102 | + templated pages). |
| 103 | +- O2 (`schema_acceleration.ratio_defined` + `reason_for_undefined`). |
| 104 | +- O3 (`forgetting_curve.fit_quality` ∈ `poor/weak/good/insufficient/ |
| 105 | + degenerate`). |
| 106 | + |
| 107 | +All are orthogonal to O1 and test-green (2500+ passing), so the tag |
| 108 | +is just waiting on the O1 write-path decision. |
| 109 | + |
| 110 | +## Ask |
| 111 | + |
| 112 | +When you next run `consolidate` on your 66 k store (whenever you'd |
| 113 | +normally do so — no rush), please share the `homeostatic` block from |
| 114 | +the output. The three new fields will tell me whether the fix is |
| 115 | +observability (hypothesis 1) or the write path (hypothesis 2). |
| 116 | + |
| 117 | +Cheers. |