feat: v4.3.1 — planning/reuse quality + gap-closure (partial) by thedoublejay · Pull Request #42 · thedoublejay/gather-step

thedoublejay · 2026-06-02T16:27:30Z

Summary

Planning- and reuse-quality work toward v4.3.1, plus the first batch of the v4.3.1 gap-closure backlog. Opened as a draft — this is a reviewable checkpoint, not the full backlog. The branch carries the prior WS-1/WS-2/WS-3 + A13 work (already committed) and this session's additions on top.

This release is versioned 4.3.1 (app, Cargo workspace, internal crate deps, website metadata, landing stamps, and changelog all synced).

Root Cause / Context

Driven by two triple-reviewed plans:

2026-06-02-v4.3.0-implementation-plan.md (§0.5 authoritative) — recall → reuse → planning rails.
2026-06-02-v4.3.1-gap-closure-plan.md — closes recorded braingent misses (lock-degradation, display-ownership, mongo/Atlas traps, polyrepo review).

The core v4.3.0 diagnosis (search is conjunctive-by-default → reuse search returns empty) and the v4.3.1 root-causes (redb exclusive-on-open lock; semantically-orphaned gateway node) are code-grounded in those plans.

Key Decisions

Did not rush the large, schema-changing waves. WS-5/6/7 (4 NodeKinds + ~16 EdgeKinds), WS-8 DF4, embeddings (WS-12), polyrepo materialization (WS-G1) are deferred. Per §0.5.1 the R7 snapshot ratchet had to land before any schema work; it now has (this PR), so the schema wave is unblocked but remains its own multi-day effort. Quality bar over volume.
Mongo detectors ship as a pure module over a JSON-shaped mongo op (stable rule IDs + confidence). Wiring the parser's pipeline extraction into them is the heavier follow-on the plan lists separately (MQS1 key files).
plan_change contract bumped to schema v2 for the new display_ownership_checks section (and later to v3 for the B1/B3/WS-16 checklists listed in the table below); the G7 gate asserts the exact manifest, so each bump is deterministic.
REL1 exit code = 75 (EX_TEMPFAIL), distinct from generic failure (1), so a blocked read can never look like "found nothing".
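The REL1 decision can be sketched as a small error-to-exit-code mapping. This is a minimal illustration only: `CliError`, `exit_code_value`, and `degraded_reason` are hypothetical names, not the contents of `errors.rs`, and the real JSON disclosure shape is not shown in this PR.

```rust
use std::process::ExitCode;

/// EX_TEMPFAIL from sysexits.h: a temporary failure the caller may retry.
const EX_TEMPFAIL: u8 = 75;

/// Hypothetical error type for illustration; the real CLI error enum is not shown here.
#[derive(Debug)]
enum CliError {
    /// The graph store is held by another process.
    GraphLocked,
    /// Any other failure.
    Other(String),
}

/// Map an error to its numeric exit status: 75 for lock contention, 1 otherwise.
fn exit_code_value(err: &CliError) -> u8 {
    match err {
        CliError::GraphLocked => EX_TEMPFAIL,
        CliError::Other(_) => 1,
    }
}

/// Value for a `degraded` disclosure field, if the error represents a degraded read.
/// (Assumed field semantics; only the `graph_locked` value comes from the PR text.)
fn degraded_reason(err: &CliError) -> Option<&'static str> {
    match err {
        CliError::GraphLocked => Some("graph_locked"),
        CliError::Other(_) => None,
    }
}

/// Convert to the standard library's process exit code.
fn to_exit_code(err: &CliError) -> ExitCode {
    ExitCode::from(exit_code_value(err))
}
```

Keeping the numeric mapping in a pure function makes the "blocked is not empty" guarantee easy to assert in a unit test without spawning a process.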

What landed this session (all TDD, green)

ID	Change	Plan
B8/B9	`traverse` multi-path provenance + `depth_capped`/`truncated` signaling	v4.3.0 WS-4
A13	Query-time index freshness surfaced per-repo in `gather-step status`	v4.3.0 WS-4
REL1	Distinct lock-contention exit code (75) + `degraded: graph_locked` JSON disclosure	v4.3.1 WS-G2
DSO1	`display_ownership_checks` planning section in `plan_change`	v4.3.1 WS-G4
MQS1–3, AIX1	Mongo/Atlas structural detectors (index-defeat, unsafe-coercion, null-parent-path, atlas-index-drift)	v4.3.1 WS-G3
B1/B3/WS-16	`pass_two_gap_dimensions` (8) + `v1_completeness_checklist` (19, V1–V9 folded) in `plan_change` (schema v3)	v4.3.0 WS-3/16
REL4	Lock-free reads: graph snapshot fallback when the store is held (daemon-attach already existed)	v4.3.1 WS-G2
R7	Resolution snapshot ratchet: golden test fails on any resolved-edge drift; unblocks the schema wave	v4.3.0 WS-11
A6	Shared-component reuse-opportunity audit (design-system fork detection)	v4.3.0 WS-5
P4	Cross-repo cycle detection (Tarjan SCC)	v4.3.0 WS-9
F7	Mock/fixture-import-into-production detector	v4.3.0 WS-6
wiring	A6/P4/F7 surfaced as non-gating advisories in `gather-step doctor`	—
—	Version + changelog sync to 4.3.1	release

Already on the branch from prior sessions: S1/S2/S4 recall, A14 confidence filter, S3 graph-ranked reuse, E1 typed plan_change, B7, G7, A13 core.

Deferred (with reasons — not silently dropped)

WS-5/6/7 schema (FE component facet, FE-beyond, backend graph), WS-8 DF4/reach, WS-12 embeddings — large; R7 (now landed) is their prerequisite.
WS-G1 polyrepo pr-review materialization (PRM1–4) — P0 product blocker but L-effort, touches engine.rs/index_runner.rs; needs its own PR + parity coverage.
GWP1–3 gateway-aware planning — GWP1 cross-repo edge is M and depends on v4.3.0 D1; GWP2/3 follow.
A18/A20 (unresolved-call debt, dangling-target detection), REL2/REL3 — self-contained follow-ons.

Files Changed

crates/gather-step-analysis/src/query.rs — traverse provenance/caps (B8/B9)
crates/gather-step-analysis/src/mongo_query_safety.rs (new) + lib.rs — MQS1–3, AIX1
crates/gather-step-cli/src/commands/status.rs — A13 freshness column
crates/gather-step-cli/src/errors.rs, main.rs — REL1
crates/gather-step-mcp/src/tools/packs.rs, catalog.rs, server.rs — DSO1 + contract v2
Cargo.toml, Cargo.lock, website/** — 4.3.1 version + changelog sync

Test Plan

cargo test --workspace → 1557 passed, 12 ignored (51 suites).
cargo clippy --workspace --all-targets -- -D warnings → clean.
cargo fmt --all --check → clean.
Each feature added failing tests first (TDD): traverse depth-cap/provenance, A13 freshness label + payload field, REL1 lock predicate + disclosure, DSO1 section presence/absence, MQS1–3 + AIX1 against the GO4 trap shapes + clean siblings.

Follow-ups

WS-5/6/7 schema wave (now unblocked by the R7 ratchet).
WS-G1 polyrepo pr-review materialization (P0 blocker, own PR).
GWP1–3 gateway planning.
Mongo/Atlas detectors (MQS1–3, AIX1) are built but cannot be surfaced yet: the parser does not extract mongo query/pipeline ASTs or Atlas index definitions into the graph, so there is no input to feed them. That parser extraction is the follow-on (A6/P4/F7 are already wired into doctor).

🤖 Generated with Claude Code

Search used set_conjunction_by_default() for both the exact and fuzzy passes, so a multi-word capability query ("email notification delivery") returned nothing whenever a symbol covered only some terms — the root cause behind empty reuse-discovery results (WS-1 / S1). Add a third fallback pass that fires only when both conjunctive passes return empty: each whitespace-separated word becomes a Should clause and a hit must match a majority of them (ceil(n/2)) via BooleanQuery::with_minimum_required_clauses. Precision guard: the fallback is gated on the original query containing two or more words. A single identifier like createOrderUseCase is one word (even though it camelCase-splits into several tokens) and must not fuzzily match token-sharing siblings such as updateOrderUseCase. Tests (WS-0a): a partial multi-word query recovers via the OR fallback, and a single-identifier miss does not OR-match a token-sharing sibling.
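The gating and majority-floor arithmetic described above can be sketched without the search engine. This is a simplified model under stated assumptions: the helper names are hypothetical, and matching is done against a plain token list rather than through Tantivy's `BooleanQuery`.

```rust
/// Minimum number of query words a hit must match: ceil(n / 2).
fn majority_floor(word_count: usize) -> usize {
    (word_count + 1) / 2
}

/// Precision guard: the fallback only applies to queries with two or more
/// whitespace-separated words. A single identifier yields `None`.
fn fallback_words(query: &str) -> Option<Vec<&str>> {
    let words: Vec<&str> = query.split_whitespace().collect();
    if words.len() >= 2 {
        Some(words)
    } else {
        None
    }
}

/// Stand-in for the engine-side check: does a symbol's token set match at
/// least a majority of the query words (case-insensitively)?
fn matches_majority(words: &[&str], symbol_tokens: &[&str]) -> bool {
    let matched = words
        .iter()
        .filter(|&&w| symbol_tokens.iter().any(|&t| t.eq_ignore_ascii_case(w)))
        .count();
    matched >= majority_floor(words.len())
}
```

For a three-word query the floor is two, so a symbol covering "email" and "notification" but not "delivery" is recovered, while a symbol covering only "email" is not.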

Add a term-coverage boost to rerank_hits (WS-1 / S2): a hit whose symbol-name tokens cover more of the query's tokens is ranked higher, scaling up to +25% at full coverage. This keeps higher-coverage matches above incidental single-term matches once the disjunction fallback (S1) widens the candidate set. Gated to genuine multi-word queries (mirrors the S1 fallback) so a single identifier is never re-ranked by partial token overlap. Test: at equal base score, a two-term-coverage hit outranks a one-term-coverage hit regardless of input order.
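A minimal sketch of the coverage boost follows. It assumes the boost scales linearly with the covered fraction, which the commit message does not state explicitly, and `coverage_boost` is a hypothetical name rather than the body of `rerank_hits`.

```rust
/// Maximum multiplicative boost at full coverage (+25%).
const MAX_COVERAGE_BOOST: f32 = 0.25;

/// Scale a base score by how many query tokens the symbol's name tokens cover.
/// Single-token queries are returned unchanged, mirroring the multi-word gate.
fn coverage_boost(base_score: f32, query_tokens: &[&str], symbol_tokens: &[&str]) -> f32 {
    if query_tokens.len() < 2 {
        return base_score;
    }
    let covered = query_tokens
        .iter()
        .filter(|&&q| symbol_tokens.iter().any(|&s| s.eq_ignore_ascii_case(q)))
        .count();
    // Assumption: linear interpolation between no boost and the full +25%.
    let coverage = covered as f32 / query_tokens.len() as f32;
    base_score * (1.0 + MAX_COVERAGE_BOOST * coverage)
}
```

At equal base score, a hit covering both terms of a two-word query ends up above a hit covering one, independent of input order, because the adjusted score depends only on the coverage fraction.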

Add a curated concept->vocabulary synonym map (WS-1 / S4) so capability phrases bridge to the identifiers code actually uses. In the disjunction fallback each word becomes (word OR synonyms) — e.g. "login" also matches authenticate/authentication — before the majority floor is applied. The map is intentionally small and high-signal; the original word is always searched too, so an unmapped word is a no-op. Lookup uses eq_ignore_ascii_case (no allocation) per the repo's disallowed-methods. Test: "login workflow" surfaces authenticateUser via the login -> authenticate bridge, which returns empty without expansion.
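The expansion step might look like the sketch below. The table contents beyond the login example and the `expand` helper name are assumptions; only the `eq_ignore_ascii_case` lookup and the "original word is always searched" rule come from the commit message.

```rust
/// Curated concept -> vocabulary map. Only the `login` entry is taken from the
/// PR text; a real table would hold more entries.
const SYNONYMS: &[(&str, &[&str])] = &[("login", &["authenticate", "authentication"])];

/// Return the original word followed by any mapped vocabulary.
/// An unmapped word expands to itself only, so expansion is a no-op for it.
fn expand(word: &str) -> Vec<&str> {
    let mut out = vec![word];
    for (concept, vocab) in SYNONYMS {
        // Case-insensitive comparison without allocating a lowercase copy.
        if concept.eq_ignore_ascii_case(word) {
            for &v in vocab.iter() {
                out.push(v);
            }
        }
    }
    out
}
```

Each expanded list would then form one `(word OR synonyms)` clause before the majority floor is applied across clauses.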

Edge confidence was already stored on EdgeMetadata and rendered by trace, but no traversal could filter on it and the meaning of a None confidence was undefined (WS-2 / A14).

Add EdgeMetadata::passes_confidence(min): a None confidence is a definite structural edge (import-resolved call, not a heuristic guess) and passes any threshold; an explicit confidence is compared against it; a None threshold disables filtering. This fixes the "structural edges silently dropped" failure mode a naive `confidence >= min` would cause.

Thread an Option<u16> min_confidence through QueryEngine::traverse, get_edges, and get_reverse_edges so trace/impact-style traversals can grade proven vs guessed edges.

Tests: None passes any threshold, explicit-low is filtered, explicit-high passes; and traversal keeps a structural edge while dropping a low-confidence sibling under a threshold.
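The three-way rule can be written as a single match. In this sketch the `EdgeMetadata` struct is reduced to the one field that matters, and the `u16` confidence type is assumed from the `Option<u16>` threshold mentioned above.

```rust
/// Reduced stand-in for the real EdgeMetadata; only the confidence field is modelled.
#[derive(Debug, Clone, Copy, Default)]
struct EdgeMetadata {
    /// `None` means a definite structural edge, not "unknown".
    confidence: Option<u16>,
}

impl EdgeMetadata {
    /// Decide whether this edge survives a minimum-confidence filter.
    fn passes_confidence(&self, min: Option<u16>) -> bool {
        match (self.confidence, min) {
            // No threshold: filtering is disabled.
            (_, None) => true,
            // Structural edge: passes any threshold.
            (None, Some(_)) => true,
            // Heuristic edge: compare against the threshold.
            (Some(c), Some(m)) => c >= m,
        }
    }
}
```

A naive `self.confidence >= min` on the two `Option` values would order `None` below every `Some`, which is exactly how structural edges would be dropped.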

Planning-pack ranking only boosted items that had an evidence chain, so the canonical reusable symbol did not surface above local one-offs (WS-2 / S3). Add a graph-derived reuse boost applied before the existing evidence ranking:

- shared / design-system membership inferred from the file path (+40)
- sibling-consumer count from inbound graph edges (3 pts/consumer, capped at 30 consumers so a hub node cannot dominate)

reuse_evidence_boost is a pure, unit-tested helper; apply_reuse_evidence_ranking wires it to the graph at the planning call site. This is the ranking half of "is this already a reusable component?": it runs after node rehydration at the pack layer, not inside the lexical reranker.

Test: a shared + widely-consumed symbol outranks a local one-off; more consumers yield a larger boost.
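A pure helper with these constants could be sketched as follows. The signature and the path markers are assumptions, and the cap is read literally as "count at most 30 consumers" (so the consumer component tops out at 90 points); the actual helper may differ on all three.

```rust
const SHARED_MEMBERSHIP_BOOST: u32 = 40;
const POINTS_PER_CONSUMER: u32 = 3;
const MAX_COUNTED_CONSUMERS: u32 = 30;

/// Hypothetical path heuristic; the real marker list is not given in this commit.
fn is_shared_module_path(path: &str) -> bool {
    ["shared/", "design-system/"].iter().any(|m| path.contains(m))
}

/// Combine path-derived membership and a capped consumer count into one boost.
fn reuse_evidence_boost(path: &str, consumer_count: u32) -> u32 {
    let membership = if is_shared_module_path(path) {
        SHARED_MEMBERSHIP_BOOST
    } else {
        0
    };
    // Cap the counted consumers so a hub node cannot dominate the ranking.
    let consumers = consumer_count.min(MAX_COUNTED_CONSUMERS) * POINTS_PER_CONSUMER;
    membership + consumers
}
```

Because the helper takes a path and a count rather than a graph handle, the graph lookup stays at the call site and the scoring itself can be unit-tested in isolation.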

Tightens WS-2 reuse ranking after review found the boost could not change which items surface.

- HIGH: reuse ranking ran after items.truncate(limit), so a canonical reusable symbol below the base-score cut was discarded before the boost applied. Score reuse on a bounded window (limit*5) BEFORE the cut and re-sort, so it can be promoted into the pack, not merely reordered.
- consumer count now excludes structural Defines/Imports edges and counts distinct source nodes, mirroring the resolution scorer; raw inbound edge volume overstated reuse.
- deterministic search rerank: equal adjusted scores now tie-break on symbol name, stored path, then node id instead of Tantivy doc order.

Refactor: extract sort_pack_items for the shared pack-item comparator.

Tests: a shared, widely-consumed symbol with a lower base score is promoted past a truncate-to-1; equal-score rerank hits sort stably.
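The ordering fix (boost inside a bounded window, then cut) and the deterministic tie-break can be sketched together. `PackItem` here is a reduced, hypothetical shape with a precomputed `reuse_boost` field, and `select_pack_items` is an illustrative name, not the real `sort_pack_items`.

```rust
/// Reduced, hypothetical pack item; the real type carries more fields.
#[derive(Debug, Clone, PartialEq)]
struct PackItem {
    name: String,
    path: String,
    node_id: u64,
    base_score: f32,
    reuse_boost: f32,
}

/// Convenience constructor for examples and tests.
fn item(name: &str, path: &str, node_id: u64, base_score: f32, reuse_boost: f32) -> PackItem {
    PackItem {
        name: name.to_string(),
        path: path.to_string(),
        node_id,
        base_score,
        reuse_boost,
    }
}

/// Keep a window of limit*5 candidates by base score, re-sort that window by
/// boosted score with a stable tie-break, and only then truncate to `limit`.
fn select_pack_items(mut items: Vec<PackItem>, limit: usize) -> Vec<PackItem> {
    items.sort_by(|a, b| b.base_score.total_cmp(&a.base_score));
    items.truncate(limit.saturating_mul(5));
    items.sort_by(|a, b| {
        let (sa, sb) = (a.base_score + a.reuse_boost, b.base_score + b.reuse_boost);
        sb.total_cmp(&sa)
            // Ties resolve on name, then path, then node id, never on input order.
            .then_with(|| a.name.cmp(&b.name))
            .then_with(|| a.path.cmp(&b.path))
            .then_with(|| a.node_id.cmp(&b.node_id))
    });
    items.truncate(limit);
    items
}
```

With `limit = 1`, a shared symbol whose base score is lower than a local one-off still wins the single slot, because it is scored before the final cut rather than after it.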

plan_change was a bare alias for planning_pack, returning a ContextPackResponse (WS-3 / E1). Make it a distinct typed product:

- PlanChangeResponse with the nine fixed sections (reuse_candidates, sibling_clone_targets, standards_to_preserve, integration_checks, cross_repo_reachability, write_path_or_state_machine_risks, required_braingent_records, open_unknowns, verification_plan); every section is always present so the contract is stable for consumers and the G7 gate.
- build_plan_change projects the planning-pack data into those sections (reuse_candidates = shared-module members surfaced by S3 ranking; sibling_clone_targets = local related items; cross_repo_reachability = planning proofs; open_unknowns = unresolved gaps). Pure over its inputs for unit testing.
- plan_change_tool now returns Json<PlanChangeResponse> via run_plan_change.

Sections needing proactive queries (B7) and evidentiary fields (G7) are populated by later WS-3 slices; they ship as empty-but-present for now.

Test: the projection routes shared vs local items, gaps, and proofs into the correct sections and excludes the target item.

The typed plan_change sections integration_checks and write_path_or_state_machine_risks shipped empty (WS-3 / B7). Surface the change-impact evidence the planning pack already gathers into them:

- integration_checks: confirmed downstream consumers ("verify X still works") and probable downstream repos ("check X, partial evidence").
- write_path_or_state_machine_risks: cross-repo callers whose contract must be preserved, unresolved-possible impacts to confirm, and a note when downstream fan-out was capped.

Kept as a pure projection over ChangeImpactSummary so it stays unit-testable; threaded through run_plan_change from the pack data.

Test: confirmed downstream + cross-repo caller populate the two sections; standards_to_preserve / required_braingent_records remain empty for G7.

Make the plan_change contract evidentiary, not just structural (WS-3 / G7):

- PlanChangeContract carries deterministic metadata (a schema version, PLAN_CHANGE_SCHEMA_VERSION, and the fixed nine-section manifest in canonical order) plus an exclusion_ledger recording what was dropped (downstream fan-out caps, planning warnings) so a consumer never reads a capped/filtered result as exhaustive.
- validate_plan_change_contract is the gate: it fails on a stale schema version or a mangled/incomplete section manifest.

Rule provenance and requirements traceability are intentionally deferred to a follow-up workstream (they need a rule-id registry and AC ingestion that don't exist yet).

Tests: a freshly built product passes the gate deterministically; the gate fails on a bumped schema version and on a popped section; the exclusion ledger records a capped fan-out.
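A gate of this kind reduces to two equality checks. The sketch below assumes a simplified `PlanChangeContract` shape and uses a placeholder version number; the section names are the nine listed in the E1 commit, while the field names and error type are hypothetical.

```rust
/// Placeholder value for illustration; the PR later bumps the real constant.
const PLAN_CHANGE_SCHEMA_VERSION: u32 = 1;

/// The nine fixed sections in canonical order, as listed in the E1 commit.
const PLAN_CHANGE_SECTIONS: [&str; 9] = [
    "reuse_candidates",
    "sibling_clone_targets",
    "standards_to_preserve",
    "integration_checks",
    "cross_repo_reachability",
    "write_path_or_state_machine_risks",
    "required_braingent_records",
    "open_unknowns",
    "verification_plan",
];

/// Simplified contract metadata; field names are assumptions.
#[derive(Debug, Clone)]
struct PlanChangeContract {
    schema_version: u32,
    sections: Vec<&'static str>,
    exclusion_ledger: Vec<String>,
}

/// Build a contract that matches the current constants.
fn fresh_contract() -> PlanChangeContract {
    PlanChangeContract {
        schema_version: PLAN_CHANGE_SCHEMA_VERSION,
        sections: PLAN_CHANGE_SECTIONS.to_vec(),
        exclusion_ledger: Vec::new(),
    }
}

/// Fail on a stale schema version or a manifest that differs in content or order.
fn validate_plan_change_contract(contract: &PlanChangeContract) -> Result<(), String> {
    if contract.schema_version != PLAN_CHANGE_SCHEMA_VERSION {
        return Err(format!(
            "schema version {} != expected {}",
            contract.schema_version, PLAN_CHANGE_SCHEMA_VERSION
        ));
    }
    if contract.sections.as_slice() != PLAN_CHANGE_SECTIONS.as_slice() {
        return Err("section manifest does not match the canonical list".to_string());
    }
    Ok(())
}
```

Comparing the whole manifest slice, rather than checking membership, means a reordered or truncated section list fails just like a missing one.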

The direct MCP route used run_plan_change (typed nine-section product), but the batch_query dispatcher still routed both planning_pack and plan_change to planning_pack_tool, so batched callers silently got the legacy ContextPackResponse shape: a split-brain contract.

- Split the composite dispatcher arm: plan_change now calls run_plan_change; planning_pack keeps planning_pack_tool.
- Update the tool catalog: plan_change is the typed plan-change product, not an "alias for planning-oriented context".

Test: a batch_query plan_change op returns reuse_candidates and the contract section manifest (keys absent from ContextPackResponse), proving the batch and direct routes now agree.

The index records last_commit_sha per repo and head_sha() resolves the current HEAD, but nothing compared them at query time, so a trace/search could silently answer about code that no longer matches the working tree (WS-4 / A13). Storage and HEAD resolution already exist; this adds the comparison.

- IndexFreshness { Fresh | Stale { indexed_sha, head_sha } | NeverIndexed }: a query-time freshness verdict, distinct from HistorySyncOutcome (which describes an indexing run).
- classify_freshness(indexed_sha, head_sha): pure, unit-testable.
- GitHistoryIndexer::index_freshness(indexed_sha): resolves HEAD and classifies.

Both re-exported at the crate root.

Test: matching SHA => Fresh; older indexed SHA => Stale (carrying both); no recorded SHA => NeverIndexed.
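The classifier is small enough to sketch in full. The variant names follow the commit message; the parameter types (`Option<&str>` for the recorded SHA, `&str` for HEAD) and the `String` payloads are assumptions.

```rust
/// Query-time freshness verdict for one repo's index.
#[derive(Debug, Clone, PartialEq, Eq)]
enum IndexFreshness {
    Fresh,
    Stale { indexed_sha: String, head_sha: String },
    NeverIndexed,
}

/// Pure comparison of the recorded index SHA against the current HEAD.
fn classify_freshness(indexed_sha: Option<&str>, head_sha: &str) -> IndexFreshness {
    match indexed_sha {
        // Nothing was ever recorded for this repo.
        None => IndexFreshness::NeverIndexed,
        Some(sha) if sha == head_sha => IndexFreshness::Fresh,
        // Carry both SHAs so the caller can report exactly what diverged.
        Some(sha) => IndexFreshness::Stale {
            indexed_sha: sha.to_string(),
            head_sha: head_sha.to_string(),
        },
    }
}
```

Note that the comparison is equality only: any mismatch is reported as `Stale`, without checking whether the indexed commit is an ancestor of HEAD.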

Follow-up review pass on the recall + reuse + plan_change work.

Correctness:
- plan_change exclusion_ledger now records standards_to_preserve and required_braingent_records as "not yet computed", so an empty section is never read as "nothing applies"; stale "arrive with G7" comment fixed.
- Synonym table is now symmetric concept groups, so recall no longer depends on which side of a pair the user typed (login <-> authenticate).
- is_shared_module_path markers made language-agnostic (common/, /lib/, libs/, internal/, /pkg/, packages/) so non-JS layouts aren't mis-bucketed.

Cleanup / perf:
- Extracted EdgeKind::is_consumer_edge() in core; replaced the four duplicated `!matches!(.., Defines | Imports)` filters (packs x3, anchor).
- Reuse boost precomputes distinct consumer counts in one pre-pass (deduped by symbol_id) instead of per-item graph I/O in the loop, and re-sorts only the boosted window (bounded) instead of the full vec.

Docs (explicit reviewer decisions):
- limit*5 reuse window is a documented recall ceiling for high fan-out.
- exact-match boost is inert for multi-word queries (coverage ranks them).
- ceil(n/2)=1 partial-match recall for 2-word fallback is intentional.

Tests: symmetric-synonym reverse bridge; ledger records not-yet-computed sections.

Not changed: threading Canonical onto PackItem (#6-full). canonical_for_node yields cross-repo identity, not a reusable-module signal, so it is the wrong input for reuse_candidates; a true classifier is the deferred A6 work.
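The symmetric-group change can be illustrated with a lookup that treats every member of a group as an entry point. `CONCEPT_GROUPS` and `expand_symmetric` are hypothetical names, and only the login/authenticate group is taken from the PR text.

```rust
/// Symmetric concept groups: any member bridges to every other member.
const CONCEPT_GROUPS: &[&[&str]] = &[&["login", "authenticate", "authentication"]];

/// Return the typed word first, followed by the other members of its group.
/// A word that belongs to no group expands to itself only.
fn expand_symmetric(word: &str) -> Vec<&str> {
    for group in CONCEPT_GROUPS {
        if group.iter().any(|m| m.eq_ignore_ascii_case(word)) {
            let mut out = vec![word];
            for &member in group.iter() {
                if !member.eq_ignore_ascii_case(word) {
                    out.push(member);
                }
            }
            return out;
        }
    }
    vec![word]
}
```

With a one-directional map, "authenticate" would not reach "login"; with groups, both directions produce the same member set.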

cloudflare-workers-and-pages · 2026-06-02T16:27:37Z

Deploying gather-step with Cloudflare Pages

Latest commit:	`94496e6`
Status:	✅ Deploy successful!
Preview URL:	https://1b3a73df.gather-step.pages.dev
Branch Preview URL:	https://feat-search-recall-or-fallba.gather-step.pages.dev


thedoublejay added 18 commits June 2, 2026 19:49

feat: traverse provenance + depth/fan-out cap signaling (B8/B9)

24bf647

feat: surface A13 index freshness in status output

6febce8

feat: distinct exit code + json disclosure for graph lock (REL1)

dd76761

feat: display-ownership planning section in plan_change (DSO1)

25842e5

feat: mongo/atlas structural safety detectors (MQS1/MQS2/MQS3)

2cfe371

chore: prepare 4.3.1 release

a23d103

thedoublejay added 11 commits June 2, 2026 23:29

feat: atlas index/doc-field drift detector (AIX1)

c57933d

feat: pass-2 gap + v1-completeness checklists in plan_change (B1/B3/WS-16)

7f1f171

test: e2e graph-lock (REL1), fan-out cap, A13 status, plan_change sections

a1119aa

refactor: drop planning comments and product-specific references

fe30b91

feat: fall back to a read-only graph snapshot when the store is locked

6305f36

docs: note lock-free snapshot reads in 4.3.1 changelog

a24cec4

test: resolution snapshot ratchet guarding against silent edge drift

a837afa

feat: shared-component reuse-opportunity analysis (design-system audit)

63f8fb8

feat: cross-repo cycle detection via Tarjan SCC

018fda8

feat: detect mock/fixture imports leaking into production modules

bf4bc75

feat: surface cycle, mock-leakage, and reuse advisories in doctor

a3ea340

thedoublejay added 2 commits June 3, 2026 10:22

docs: note doctor code-quality advisories in 4.3.1 changelog

0c73181

fix: correct 'mis-read' typo failing the spell-check lint

94496e6

thedoublejay marked this pull request as ready for review June 3, 2026 07:26

thedoublejay merged commit b867ace into main Jun 3, 2026
12 checks passed

thedoublejay merged 31 commits into main from feat/search-recall-or-fallback
