Commit 97010a9

cdeust and claude committed
docs(darval): github issue #14 O1 instrumentation ask
Posts the diagnosis-so-far and asks darval for the heat_delta numbers from his next consolidate run so we can choose between:

(1) cohort_correction works on ranking, bimodality is a laggy metric
(2) write path is suppressed, recall improvement came from elsewhere

Holds v3.13.3 tag until the decision is made.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent ef9a988 commit 97010a9

1 file changed

Lines changed: 117 additions & 0 deletions

@@ -0,0 +1,117 @@
# Draft reply to darval: issue #14 (O1 instrumentation ask)

**Instructions**: review before posting. Target comment at
https://github.com/cdeust/Cortex/issues/14

---

Quick update on **O1** from your 3.13.2 report, the one where
`cohort_correction` moved bimodality 0.07% in 336 s while recall
quality improved dramatically. I ran the diagnosis locally before
shipping a fix, and the answer is not obvious from in-process
simulation alone. I'd like to read your next `consolidate` output
with the new instrumentation I just merged to `main`.

## What I found so far

I simulated several distribution shapes that match your reported
stats (`mean=0.6487, std=0.3162, bimodality=0.8433, cohort_size=33604`):

| Simulated distribution | Δ bimodality per cycle |
|---|---|
| Wide two-peak (σ=0.08 around each mode) | −0.022 |
| Narrow hot peak at 0.98 + wide cold tail | −0.046 |
| Three-mode (0.98/0.5/0.2) | −0.014 |
| Saturated hot peak + uniform cold tail | −0.024 |

Every reasonable reconstruction of your numbers shows the correction
should move bimodality by **1.4–4.6 percentage points per cycle** at
the default `correction_strength=0.3`. You saw **0.07%** (0.8433 →
0.8427, about 0.06 pp), which is at least 20× less than expected.

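The size of that gap can be checked with a few lines of arithmetic. The sketch below takes the per-cycle deltas from the table and, as an assumption, treats the `bimodality_before` / `bimodality_after` values quoted in the `homeostatic` block of this reply (0.8433 and 0.8427) as the observed run. The variable names are illustrative only.

```python
# Simulated per-cycle bimodality changes from the table (absolute, 0-1 scale).
simulated_deltas = {
    "wide two-peak": -0.022,
    "narrow hot peak + wide cold tail": -0.046,
    "three-mode": -0.014,
    "saturated hot peak + uniform cold tail": -0.024,
}

# Assumption: these are the observed before/after values from the 3.13.2 run.
bimodality_before = 0.8433
bimodality_after = 0.8427

# Absolute change in percentage points, and relative change in percent.
observed_pp = (bimodality_before - bimodality_after) * 100
observed_rel_pct = observed_pp / bimodality_before

# Expected absolute change per cycle, in percentage points.
expected_pp = [abs(d) * 100 for d in simulated_deltas.values()]
shortfall_min = min(expected_pp) / observed_pp
shortfall_max = max(expected_pp) / observed_pp

print(f"observed: {observed_pp:.2f} pp absolute, {observed_rel_pct:.2f}% relative")
print(f"expected: {min(expected_pp):.1f} to {max(expected_pp):.1f} pp")
print(f"shortfall: {shortfall_min:.0f}x to {shortfall_max:.0f}x")
```

Under these assumptions the shortfall is roughly 23× to 77×, consistent with the "at least 20×" figure.
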
Two live hypotheses:

1. **The correction IS moving per-row heat** (which is what actually
   drives WRRF ranking), but the bimodality metric is a poor index of
   that: it measures global distribution shape, not per-row moves.
   This would explain why recall improved dramatically while the
   metric barely moved.

2. **Something is suppressing per-row writes**, e.g. a protected/stale
   filter, a pool-connection race, or a silent fallback path I'm missing.
   This would be a real bug, and it would mean the recall improvement
   came from somewhere else entirely (reranker, query dispatch, new
   heat-weight mix).

Without per-row movement data from your production distribution I
can't choose between them.

## What I shipped to `main` (not tagged yet)

Commit [`ae6f280`](https://github.com/cdeust/Cortex/commit/ae6f280)
adds three new fields to the homeostatic cycle output:

```json
"homeostatic": {
  "scaling_kind": "cohort_correction",
  "cohort_size": 33604,
  "bimodality_before": 0.8433,
  "bimodality_after": 0.8427,

  "cohort_mean_heat_delta": 0.1234,   // NEW
  "cohort_max_heat_delta": 0.1650,    // NEW
  "cohort_rows_written": 33600        // NEW
}
```

These let us see per-row movement directly without inferring it from
a shape metric.

**Expected values for hypothesis (1)**: your cohort members have
heat ≈ 0.93 pre-correction. With default `strength=0.3` and
`target=0.4`, each drops by `0.3 × (0.93 − 0.4) = 0.159`. So:

* `cohort_mean_heat_delta` ≈ **0.13–0.17** (depending on the hot-peak
  shape)
* `cohort_max_heat_delta` ≈ **0.18** (for memories near heat=1.0)
* `cohort_rows_written` ≈ `cohort_size` (every cohort member has a
  delta > 0.001, so every one writes)

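The expected values above follow from a simple per-row rule, which can be sketched as below. This assumes the correction has the form `strength × (heat − target)` implied by the worked arithmetic; the real `cohort_correction` code may clamp or weight it differently. `heat_delta`, `WRITE_THRESHOLD`, and the sample cohort heats are hypothetical names and values chosen for illustration.

```python
def heat_delta(heat: float, strength: float = 0.3, target: float = 0.4) -> float:
    """Per-row heat drop for one correction cycle: strength * (heat - target).

    Assumed form, inferred from the worked arithmetic in this reply.
    """
    return strength * (heat - target)


# Rows whose delta exceeds this are expected to be written (0.001 per the reply).
WRITE_THRESHOLD = 0.001

# Hypothetical cohort heats spanning the hot peak, not real production data.
cohort = [0.85, 0.90, 0.93, 0.96, 1.00]
deltas = [heat_delta(h) for h in cohort]

mean_delta = sum(deltas) / len(deltas)
max_delta = max(deltas)
rows_written = sum(1 for d in deltas if d > WRITE_THRESHOLD)

print(f"delta at heat 0.93:     {heat_delta(0.93):.3f}")
print(f"cohort_mean_heat_delta: {mean_delta:.3f}")
print(f"cohort_max_heat_delta:  {max_delta:.3f}")
print(f"cohort_rows_written:    {rows_written} of {len(cohort)}")
```

With this toy cohort the mean delta lands inside the 0.13–0.17 band, the maximum is 0.18 at heat=1.0, and every row clears the write threshold.
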
**If your numbers match these expected values**, `cohort_correction` is
doing its job on ranking. The fix is then to add a better
retrieval-relevant health metric, not to change the correction behaviour.

**If `cohort_mean_heat_delta` is close to 0 or `cohort_rows_written`
is much less than `cohort_size`**, there's a real bug and I'll fix
the write path.

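The decision rule above could be applied to a pasted `homeostatic` block roughly as follows. This is only a sketch: `diagnose`, `MIN_MEAN_DELTA`, and `MIN_WRITE_RATIO` are hypothetical, and the cut-offs are guesses at what "close to 0" and "much less than" mean, not values taken from Cortex. The sample reuses the illustrative numbers from the JSON example in this reply.

```python
import json

# Hypothetical cut-offs; the reply only says "close to 0" and "much less than".
MIN_MEAN_DELTA = 0.01
MIN_WRITE_RATIO = 0.9


def diagnose(homeostatic: dict) -> str:
    """Classify a homeostatic block as hypothesis 1 (metric) or 2 (write path)."""
    cohort_size = homeostatic["cohort_size"]
    if cohort_size == 0:
        return "empty cohort, nothing to diagnose"
    mean_delta = homeostatic["cohort_mean_heat_delta"]
    write_ratio = homeostatic["cohort_rows_written"] / cohort_size
    if mean_delta < MIN_MEAN_DELTA or write_ratio < MIN_WRITE_RATIO:
        return "write-path bug suspected (hypothesis 2)"
    return "correction is writing; the metric is the problem (hypothesis 1)"


# Illustrative values copied from the example block, not a real run.
sample = json.loads("""
{
  "scaling_kind": "cohort_correction",
  "cohort_size": 33604,
  "bimodality_before": 0.8433,
  "bimodality_after": 0.8427,
  "cohort_mean_heat_delta": 0.1234,
  "cohort_max_heat_delta": 0.1650,
  "cohort_rows_written": 33600
}
""")
print(diagnose(sample))
```

Keeping the thresholds as named constants makes it easy to tighten them once real per-row numbers are available.
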
## What I'm NOT shipping yet

v3.13.3 is on deck but held until I have the numbers. The bundle
includes:

- Pipeline → wiki/memory/KG integration (auto-wire on SessionStart,
  incremental detect_changes on file edits, graph-TTL background
  re-analyze).
- Doc grooming: wiki templates per kind (ADR/specs/guides/…),
  naming-convention regex, deterministic auditor, and a
  `cortex-wiki-groomer` sub-agent that rewrites pages to template
  without deleting content.
- Plain-language `/wiki/README.md` generator (readable by non-tech
  stakeholders; tech detail stays in `.generated/INDEX.md` and the
  templated pages).
- O2 (`schema_acceleration.ratio_defined` + `reason_for_undefined`).
- O3 (`forgetting_curve.fit_quality` ∈
  `poor/weak/good/insufficient/degenerate`).

All are orthogonal to O1 and test-green (2500+ passing), so the tag
is just waiting on the O1 write-path decision.

## Ask

When you next run `consolidate` on your 66k store (whenever you'd
normally do so, no rush), please share the `homeostatic` block from
the output. The three new fields will tell me whether the fix is
observability (hypothesis 1) or the write path (hypothesis 2).

Cheers.
