feat: v4.3.1 — planning/reuse quality + gap-closure (partial) #42

Merged: thedoublejay merged 31 commits into main from feat/search-recall-or-fallback on Jun 3, 2026
@thedoublejay thedoublejay commented Jun 2, 2026

Summary

Planning- and reuse-quality work toward v4.3.1, plus the first batch of the v4.3.1 gap-closure backlog. Opened as a draft — this is a reviewable checkpoint, not the full backlog. The branch carries the prior WS-1/WS-2/WS-3 + A13 work (already committed) and this session's additions on top.

This release is versioned 4.3.1 (app, Cargo workspace, internal crate deps, website metadata, landing stamps, and changelog all synced).

Root Cause / Context

Driven by two triple-reviewed plans:

  • 2026-06-02-v4.3.0-implementation-plan.md (§0.5 authoritative) — recall → reuse → planning rails.
  • 2026-06-02-v4.3.1-gap-closure-plan.md — closes recorded braingent misses (lock-degradation, display-ownership, mongo/Atlas traps, polyrepo review).

The core v4.3.0 diagnosis (search is conjunctive-by-default → reuse search returns empty) and the v4.3.1 root-causes (redb exclusive-on-open lock; semantically-orphaned gateway node) are code-grounded in those plans.

Key Decisions

  • Did not rush the large, schema-changing waves. WS-5/6/7 (4 NodeKinds + ~16 EdgeKinds), WS-8 DF4, embeddings (WS-12), polyrepo materialization (WS-G1) are deferred. Per §0.5.1 the R7 snapshot ratchet had to land before any schema work; it now has (this PR), so the schema wave is unblocked but remains its own multi-day effort. Quality bar over volume.
  • Mongo detectors ship as a pure module over a JSON-shaped mongo op (stable rule IDs + confidence). Wiring the parser's pipeline extraction into them is the heavier follow-on the plan lists separately (MQS1 key files).
  • plan_change contract bumped to schema v2 for the new display_ownership_checks section; the G7 gate asserts the exact manifest, so the bump is deterministic.
  • REL1 exit code = 75 (EX_TEMPFAIL), distinct from generic failure (1), so a blocked read can never look like "found nothing".
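
The REL1 decision above can be sketched as a small mapping from outcome to exit status. This is a minimal illustration only: `CliOutcome`, `exit_status`, and `to_exit_code` are hypothetical names, not the types in `errors.rs` or `main.rs`. The values 75 (EX_TEMPFAIL) and 1 come from the PR text.

```rust
use std::process::ExitCode;

/// sysexits.h EX_TEMPFAIL: a temporary failure; the caller may retry later.
const EX_TEMPFAIL: u8 = 75;
/// Generic failure, as named in the PR text.
const GENERIC_FAILURE: u8 = 1;

/// Hypothetical outcome type (an assumption for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CliOutcome {
    Success,
    /// The graph store is held by another process (lock contention).
    GraphLocked,
    Failed,
}

/// Map an outcome to a numeric status so a blocked read is never
/// confused with success or with a generic failure.
fn exit_status(outcome: CliOutcome) -> u8 {
    match outcome {
        CliOutcome::Success => 0,
        CliOutcome::GraphLocked => EX_TEMPFAIL,
        CliOutcome::Failed => GENERIC_FAILURE,
    }
}

/// Convert to the standard library's process exit code.
fn to_exit_code(outcome: CliOutcome) -> ExitCode {
    ExitCode::from(exit_status(outcome))
}
```

A wrapper script can then branch on status 75 and retry, instead of treating the run as "found nothing".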

What landed this session (all TDD, green)

| ID | Change | Plan |
| --- | --- | --- |
| B8/B9 | traverse multi-path provenance + depth_capped/truncated signaling | v4.3.0 WS-4 |
| A13 | Query-time index freshness surfaced per-repo in gather-step status | v4.3.0 WS-4 |
| REL1 | Distinct lock-contention exit code (75) + degraded: graph_locked JSON disclosure | v4.3.1 WS-G2 |
| DSO1 | display_ownership_checks planning section in plan_change | v4.3.1 WS-G4 |
| MQS1–3, AIX1 | Mongo/Atlas structural detectors (index-defeat, unsafe-coercion, null-parent-path, atlas-index-drift) | v4.3.1 WS-G3 |
| B1/B3/WS-16 | pass_two_gap_dimensions (8) + v1_completeness_checklist (19, V1–V9 folded) in plan_change (schema v3) | v4.3.0 WS-3/16 |
| REL4 | Lock-free reads: graph snapshot fallback when the store is held (daemon-attach already existed) | v4.3.1 WS-G2 |
| R7 | Resolution snapshot ratchet: golden test fails on any resolved-edge drift; unblocks the schema wave | v4.3.0 WS-11 |
| A6 | Shared-component reuse-opportunity audit (design-system fork detection) | v4.3.0 WS-5 |
| P4 | Cross-repo cycle detection (Tarjan SCC) | v4.3.0 WS-9 |
| F7 | Mock/fixture-import-into-production detector | v4.3.0 WS-6 |
| wiring | A6/P4/F7 surfaced as non-gating advisories in gather-step doctor | |
| | Version + changelog sync to 4.3.1 | release |

Already on the branch from prior sessions: S1/S2/S4 recall, A14 confidence filter, S3 graph-ranked reuse, E1 typed plan_change, B7, G7, A13 core.

Deferred (with reasons — not silently dropped)

  • WS-5/6/7 schema (FE component facet, FE-beyond, backend graph), WS-8 DF4/reach, WS-12 embeddings — large; R7 (now landed) is their prerequisite.
  • WS-G1 polyrepo pr-review materialization (PRM1–4) — P0 product blocker but L-effort, touches engine.rs/index_runner.rs; needs its own PR + parity coverage.
  • GWP1–3 gateway-aware planning — GWP1 cross-repo edge is M and depends on v4.3.0 D1; GWP2/3 follow.
  • A18/A20 (unresolved-call debt, dangling-target detection), REL2/REL3 — self-contained follow-ons.

Files Changed

  • crates/gather-step-analysis/src/query.rs — traverse provenance/caps (B8/B9)
  • crates/gather-step-analysis/src/mongo_query_safety.rs (new) + lib.rs — MQS1–3, AIX1
  • crates/gather-step-cli/src/commands/status.rs — A13 freshness column
  • crates/gather-step-cli/src/errors.rs, main.rs — REL1
  • crates/gather-step-mcp/src/tools/packs.rs, catalog.rs, server.rs — DSO1 + contract v2
  • Cargo.toml, Cargo.lock, website/** — 4.3.1 version + changelog sync

Test Plan

  • cargo test --workspace → 1557 passed, 12 ignored (51 suites).
  • cargo clippy --workspace --all-targets -- -D warnings → clean.
  • cargo fmt --all --check → clean.
  • Each feature added failing tests first (TDD): traverse depth-cap/provenance, A13 freshness label + payload field, REL1 lock predicate + disclosure, DSO1 section presence/absence, MQS1–3 + AIX1 against the GO4 trap shapes + clean siblings.

Follow-ups

  1. WS-5/6/7 schema wave (now unblocked by the R7 ratchet).
  2. WS-G1 polyrepo pr-review materialization (P0 blocker, own PR).
  3. GWP1–3 gateway planning.
  4. Mongo/Atlas detectors (MQS1–3, AIX1) are built but cannot be surfaced yet: the parser does not extract mongo query/pipeline ASTs or Atlas index definitions into the graph, so there is no input to feed them. That parser extraction is the follow-on (A6/P4/F7 are already wired into doctor).

🤖 Generated with Claude Code

Search used set_conjunction_by_default() for both the exact and fuzzy
passes, so a multi-word capability query ("email notification delivery")
returned nothing whenever a symbol covered only some terms — the root
cause behind empty reuse-discovery results (WS-1 / S1).

Add a third fallback pass that fires only when both conjunctive passes
return empty: each whitespace-separated word becomes a Should clause and
a hit must match a majority of them (ceil(n/2)) via
BooleanQuery::with_minimum_required_clauses.

Precision guard: the fallback is gated on the original query containing
two or more words. A single identifier like createOrderUseCase is one
word (even though it camelCase-splits into several tokens) and must not
fuzzily match token-sharing siblings such as updateOrderUseCase.

Tests (WS-0a): a partial multi-word query recovers via the OR fallback,
and a single-identifier miss does not OR-match a token-sharing sibling.
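
The gating and majority floor described in this commit can be sketched without the search engine itself. This is an assumption-laden illustration: `fallback_min_clauses` and `fallback_matches` are hypothetical helpers, and the real code expresses the floor through a Tantivy BooleanQuery rather than the plain token comparison used here.

```rust
/// Hypothetical helper: how many Should clauses a hit must match.
/// Returns None when the fallback must not fire (fewer than two words).
fn fallback_min_clauses(query: &str) -> Option<usize> {
    let words = query.split_whitespace().count();
    if words < 2 {
        // Precision guard: a single identifier never takes the OR fallback.
        return None;
    }
    // ceil(n / 2) using integer arithmetic.
    Some((words + 1) / 2)
}

/// Hypothetical matcher: true when a majority of query words appear
/// among the symbol's tokens (ASCII case-insensitive).
fn fallback_matches(query: &str, symbol_tokens: &[&str]) -> bool {
    let Some(min) = fallback_min_clauses(query) else {
        return false;
    };
    let matched = query
        .split_whitespace()
        .filter(|w| symbol_tokens.iter().any(|t| t.eq_ignore_ascii_case(w)))
        .count();
    matched >= min
}
```

For the three-word query "email notification delivery" the floor is 2, so a symbol covering only "email" is still rejected.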

Add a term-coverage boost to rerank_hits (WS-1 / S2): a hit whose
symbol-name tokens cover more of the query's tokens is ranked higher,
scaling up to +25% at full coverage. This keeps higher-coverage matches
above incidental single-term matches once the disjunction fallback (S1)
widens the candidate set.

Gated to genuine multi-word queries (mirrors the S1 fallback) so a
single identifier is never re-ranked by partial token overlap.

Test: at equal base score, a two-term-coverage hit outranks a
one-term-coverage hit regardless of input order.
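
A rough sketch of the coverage boost follows. The commit only states that the boost scales up to +25% at full coverage; the linear scaling, the `coverage_boost` name, and its signature are assumptions made for illustration.

```rust
/// Maximum boost at full coverage (+25%), as stated in the commit message.
const MAX_COVERAGE_BOOST: f32 = 0.25;

/// Hypothetical helper: scale a base score by how many query tokens
/// the symbol-name tokens cover. Linear scaling is an assumption.
fn coverage_boost(base_score: f32, query_tokens: &[&str], symbol_tokens: &[&str]) -> f32 {
    // Gate: single-identifier queries are never re-ranked by token overlap.
    if query_tokens.len() < 2 {
        return base_score;
    }
    let covered = query_tokens
        .iter()
        .filter(|q| symbol_tokens.iter().any(|s| s.eq_ignore_ascii_case(q)))
        .count();
    let coverage = covered as f32 / query_tokens.len() as f32;
    base_score * (1.0 + MAX_COVERAGE_BOOST * coverage)
}
```

At equal base score, a hit covering both terms of a two-word query ends up above a hit covering one term, independent of the order in which the hits arrive.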

Add a curated concept->vocabulary synonym map (WS-1 / S4) so capability
phrases bridge to the identifiers code actually uses. In the disjunction
fallback each word becomes (word OR synonyms) — e.g. "login" also matches
authenticate/authentication — before the majority floor is applied.

The map is intentionally small and high-signal; the original word is
always searched too, so an unmapped word is a no-op. Lookup uses
eq_ignore_ascii_case (no allocation) per the repo's disallowed-methods.

Test: "login workflow" surfaces authenticateUser via the login ->
authenticate bridge, which returns empty without expansion.
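
The expansion step might look like the sketch below. Only the login -> authenticate/authentication entry comes from the commit message; the `SYNONYMS` table layout and the `expand_word` name are assumptions.

```rust
/// Hypothetical curated map: concept word -> vocabulary used in code.
/// Only the "login" entry is taken from the commit message.
const SYNONYMS: &[(&str, &[&str])] = &[("login", &["authenticate", "authentication"])];

/// Expand one query word into (word OR synonyms).
/// The original word is always kept, so an unmapped word is a no-op.
fn expand_word<'a>(word: &'a str) -> Vec<&'a str> {
    let mut terms = vec![word];
    for (concept, synonyms) in SYNONYMS {
        // eq_ignore_ascii_case compares in place, with no lowercase allocation.
        if concept.eq_ignore_ascii_case(word) {
            terms.extend_from_slice(synonyms);
        }
    }
    terms
}
```

Each expanded group still counts as one clause toward the majority floor, so synonyms widen what a word can match without lowering how many words must match.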

Edge confidence was already stored on EdgeMetadata and rendered by trace,
but no traversal could filter on it and the meaning of a None confidence
was undefined (WS-2 / A14).

Add EdgeMetadata::passes_confidence(min): a None confidence is a definite
structural edge (import-resolved call, not a heuristic guess) and passes
any threshold; an explicit confidence is compared against it; a None
threshold disables filtering. This fixes the "structural edges silently
dropped" failure mode a naive `confidence >= min` would cause.

Thread an Option<u16> min_confidence through QueryEngine::traverse,
get_edges, and get_reverse_edges so trace/impact-style traversals can
grade proven vs guessed edges.

Tests: None passes any threshold, explicit-low is filtered, explicit-high
passes; and traversal keeps a structural edge while dropping a low-
confidence sibling under a threshold.
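
The three rules for passes_confidence can be written as a single match. The struct below is a stand-in: the real EdgeMetadata has more fields, and the u16 confidence type is inferred from the Option<u16> threshold mentioned above.

```rust
/// Stand-in for the real EdgeMetadata; only the confidence field is modeled.
#[derive(Debug, Clone, Copy, Default)]
pub struct EdgeMetadata {
    /// None means a definite structural edge, not a heuristic guess.
    pub confidence: Option<u16>,
}

impl EdgeMetadata {
    /// Decide whether this edge survives a minimum-confidence filter.
    pub fn passes_confidence(&self, min: Option<u16>) -> bool {
        match (self.confidence, min) {
            // No threshold: filtering is disabled.
            (_, None) => true,
            // Structural edge: passes any threshold.
            (None, Some(_)) => true,
            // Heuristic edge: compare against the threshold.
            (Some(confidence), Some(min)) => confidence >= min,
        }
    }
}
```

A naive `confidence >= min` on the Option would treat None as smaller than any Some value and drop structural edges, which is the failure mode this method avoids.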

Planning-pack ranking only boosted items that had an evidence chain, so
the canonical reusable symbol did not surface above local one-offs
(WS-2 / S3). Add a graph-derived reuse boost applied before the existing
evidence ranking:

- shared / design-system membership inferred from the file path (+40)
- sibling-consumer count from inbound graph edges (3 pts/consumer, capped
  at 30 consumers so a hub node cannot dominate)

reuse_evidence_boost is a pure, unit-tested helper; apply_reuse_evidence_
ranking wires it to the graph at the planning call site. This is the
ranking half of "is this already a reusable component?" — it runs after
node rehydration at the pack layer, not inside the lexical reranker.

Test: a shared + widely-consumed symbol outranks a local one-off; more
consumers yield a larger boost.
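
As a rough sketch of the arithmetic, the boost reduces to the function below. The constants come from the commit message; the signature is an assumption (the real helper infers shared membership from the file path and reads consumer counts from the graph).

```rust
/// Points for shared / design-system membership.
const SHARED_MODULE_BOOST: u32 = 40;
/// Points per sibling consumer.
const POINTS_PER_CONSUMER: u32 = 3;
/// Consumers counted at most, so a hub node cannot dominate.
const MAX_COUNTED_CONSUMERS: u32 = 30;

/// Pure boost computation over already-extracted inputs (assumed signature).
fn reuse_evidence_boost(is_shared_module: bool, consumer_count: u32) -> u32 {
    let shared = if is_shared_module { SHARED_MODULE_BOOST } else { 0 };
    shared + POINTS_PER_CONSUMER * consumer_count.min(MAX_COUNTED_CONSUMERS)
}
```

Under these constants the consumer term tops out at 90 points, so the largest possible boost is 130.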

Tightens WS-2 reuse ranking after review found the boost could not change
which items surface.

- HIGH: reuse ranking ran after items.truncate(limit), so a canonical
  reusable symbol below the base-score cut was discarded before the boost
  applied. Score reuse on a bounded window (limit*5) BEFORE the cut and
  re-sort, so it can be promoted into the pack, not merely reordered.
- consumer count now excludes structural Defines/Imports edges and counts
  distinct source nodes, mirroring the resolution scorer — raw inbound
  edge volume overstated reuse.
- deterministic search rerank: equal adjusted scores now tie-break on
  symbol name, stored path, then node id instead of Tantivy doc order.

Refactor: extract sort_pack_items for the shared pack-item comparator.

Tests: a shared, widely-consumed symbol with a lower base score is
promoted past a truncate-to-1; equal-score rerank hits sort stably.
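
The ordering fix (boost inside a bounded window, then cut) can be illustrated as follows. `PackItem` here is a simplified stand-in and `rank_with_reuse` is a hypothetical name; the real tie-break also uses stored path and node id, which this sketch reduces to the name alone.

```rust
/// Simplified stand-in for a pack item (assumed fields).
#[derive(Debug, Clone, PartialEq)]
struct PackItem {
    name: String,
    base_score: u32,
    reuse_boost: u32,
}

/// Rank by base score, keep a bounded window of limit * 5, apply the
/// reuse boost inside that window, re-sort, and only then cut to limit.
fn rank_with_reuse(mut items: Vec<PackItem>, limit: usize) -> Vec<PackItem> {
    // Base ranking, with a deterministic tie-break on name.
    items.sort_by(|a, b| {
        b.base_score
            .cmp(&a.base_score)
            .then_with(|| a.name.cmp(&b.name))
    });
    // Bounded window: items below it are never considered for promotion.
    items.truncate(limit.saturating_mul(5));
    // Re-sort the window on the boosted score.
    items.sort_by(|a, b| {
        let boosted_a = a.base_score + a.reuse_boost;
        let boosted_b = b.base_score + b.reuse_boost;
        boosted_b.cmp(&boosted_a).then_with(|| a.name.cmp(&b.name))
    });
    // The final cut happens after the boost, so promotion is possible.
    items.truncate(limit);
    items
}
```

Truncating before the boost would keep only the top base-score item, which is the bug this commit describes.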

plan_change was a bare alias for planning_pack, returning a
ContextPackResponse (WS-3 / E1). Make it a distinct typed product:

- PlanChangeResponse with the nine fixed sections (reuse_candidates,
  sibling_clone_targets, standards_to_preserve, integration_checks,
  cross_repo_reachability, write_path_or_state_machine_risks,
  required_braingent_records, open_unknowns, verification_plan) — every
  section always present so the contract is stable for consumers and the
  G7 gate.
- build_plan_change projects the planning-pack data into those sections
  (reuse_candidates = shared-module members surfaced by S3 ranking;
  sibling_clone_targets = local related items; cross_repo_reachability =
  planning proofs; open_unknowns = unresolved gaps). Pure over its inputs
  for unit testing.
- plan_change_tool now returns Json<PlanChangeResponse> via run_plan_change.

Sections needing proactive queries (B7) and evidentiary fields (G7) are
populated by later WS-3 slices; they ship as empty-but-present for now.

Test: the projection routes shared vs local items, gaps, and proofs into
the correct sections and excludes the target item.
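
The shape of the typed product and its projection could be sketched like this. The nine field names are the sections listed above; the `Vec<String>` field types, the `RelatedItem` struct, and the `build_plan_change` signature are assumptions for illustration.

```rust
/// The nine fixed sections; every one is always present (possibly empty).
/// Field types are assumptions.
#[allow(dead_code)]
#[derive(Debug, Default)]
struct PlanChangeResponse {
    reuse_candidates: Vec<String>,
    sibling_clone_targets: Vec<String>,
    standards_to_preserve: Vec<String>,
    integration_checks: Vec<String>,
    cross_repo_reachability: Vec<String>,
    write_path_or_state_machine_risks: Vec<String>,
    required_braingent_records: Vec<String>,
    open_unknowns: Vec<String>,
    verification_plan: Vec<String>,
}

/// Hypothetical, simplified view of a planning-pack item.
struct RelatedItem {
    name: String,
    in_shared_module: bool,
}

/// Pure projection: route items, proofs, and gaps into their sections.
fn build_plan_change(
    target: &str,
    items: &[RelatedItem],
    proofs: &[String],
    gaps: &[String],
) -> PlanChangeResponse {
    let mut out = PlanChangeResponse::default();
    // The target itself is excluded from both item sections.
    for item in items.iter().filter(|i| i.name != target) {
        if item.in_shared_module {
            out.reuse_candidates.push(item.name.clone());
        } else {
            out.sibling_clone_targets.push(item.name.clone());
        }
    }
    out.cross_repo_reachability = proofs.to_vec();
    out.open_unknowns = gaps.to_vec();
    out
}
```

Sections that are not yet computed stay as empty vectors, which keeps the response shape stable for consumers.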

The typed plan_change sections integration_checks and
write_path_or_state_machine_risks shipped empty (WS-3 / B7). Surface the
change-impact evidence the planning pack already gathers into them:

- integration_checks: confirmed downstream consumers ("verify X still
  works") and probable downstream repos ("check X, partial evidence").
- write_path_or_state_machine_risks: cross-repo callers whose contract
  must be preserved, unresolved-possible impacts to confirm, and a note
  when downstream fan-out was capped.

Kept as a pure projection over ChangeImpactSummary so it stays unit-
testable; threaded through run_plan_change from the pack data.

Test: confirmed downstream + cross-repo caller populate the two sections;
standards_to_preserve / required_braingent_records remain empty for G7.

Make the plan_change contract evidentiary, not just structural (WS-3 / G7):

- PlanChangeContract carries deterministic metadata — a schema version
  (PLAN_CHANGE_SCHEMA_VERSION) and the fixed nine-section manifest in
  canonical order — plus an exclusion_ledger recording what was dropped
  (downstream fan-out caps, planning warnings) so a consumer never reads a
  capped/filtered result as exhaustive.
- validate_plan_change_contract is the gate: it fails on a stale schema
  version or a mangled/incomplete section manifest.

Rule provenance and requirements traceability are intentionally deferred
to a follow-up workstream (they need a rule-id registry and AC ingestion
that don't exist yet).

Tests: a freshly built product passes the gate deterministically; the
gate fails on a bumped schema version and on a popped section; the
exclusion ledger records a capped fan-out.
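
A minimal sketch of the gate, under stated assumptions: the struct layout, the error type, the `fresh_contract` helper, and the version value 1 are illustrative, while the section names and the two failure conditions come from the commit messages.

```rust
/// Assumed value; the real constant lives in the MCP crate.
const PLAN_CHANGE_SCHEMA_VERSION: u32 = 1;

/// The fixed nine-section manifest in canonical order.
const PLAN_CHANGE_SECTIONS: [&str; 9] = [
    "reuse_candidates",
    "sibling_clone_targets",
    "standards_to_preserve",
    "integration_checks",
    "cross_repo_reachability",
    "write_path_or_state_machine_risks",
    "required_braingent_records",
    "open_unknowns",
    "verification_plan",
];

/// Simplified contract metadata (assumed layout).
struct PlanChangeContract {
    schema_version: u32,
    sections: Vec<String>,
    /// What was dropped or capped, so results are not read as exhaustive.
    exclusion_ledger: Vec<String>,
}

/// Hypothetical constructor for a freshly built, valid contract.
fn fresh_contract() -> PlanChangeContract {
    PlanChangeContract {
        schema_version: PLAN_CHANGE_SCHEMA_VERSION,
        sections: PLAN_CHANGE_SECTIONS.iter().map(|s| s.to_string()).collect(),
        exclusion_ledger: Vec::new(),
    }
}

/// The gate: reject a stale schema version or a mangled/incomplete manifest.
fn validate_plan_change_contract(contract: &PlanChangeContract) -> Result<(), String> {
    if contract.schema_version != PLAN_CHANGE_SCHEMA_VERSION {
        return Err(format!("stale schema version {}", contract.schema_version));
    }
    let manifest_ok = contract
        .sections
        .iter()
        .map(String::as_str)
        .eq(PLAN_CHANGE_SECTIONS.iter().copied());
    if !manifest_ok {
        return Err("section manifest does not match the canonical order".to_string());
    }
    Ok(())
}
```

Comparing the manifest element by element means a reordered list fails the gate as well as a shortened one.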

The direct MCP route used run_plan_change (typed nine-section product),
but the batch_query dispatcher still routed both planning_pack and
plan_change to planning_pack_tool, so batched callers silently got the
legacy ContextPackResponse shape — a split-brain contract.

- Split the composite dispatcher arm: plan_change now calls
  run_plan_change; planning_pack keeps planning_pack_tool.
- Update the tool catalog: plan_change is the typed plan-change product,
  not an "alias for planning-oriented context".

Test: a batch_query plan_change op returns reuse_candidates and the
contract section manifest (keys absent from ContextPackResponse), proving
the batch and direct routes now agree.

The index records last_commit_sha per repo and head_sha() resolves the
current HEAD, but nothing compared them at query time, so a trace/search
could silently answer about code that no longer matches the working tree
(WS-4 / A13). Storage and HEAD resolution already exist; this adds the
comparison.

- IndexFreshness { Fresh | Stale { indexed_sha, head_sha } | NeverIndexed }
  — a query-time freshness verdict, distinct from HistorySyncOutcome
  (which describes an indexing run).
- classify_freshness(indexed_sha, head_sha): pure, unit-testable.
- GitHistoryIndexer::index_freshness(indexed_sha): resolves HEAD and
  classifies. Both re-exported at the crate root.

Test: matching SHA => Fresh; older indexed SHA => Stale (carrying both);
no recorded SHA => NeverIndexed.
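
The pure classifier can be sketched directly from the description. The enum variants are taken from the commit message; the exact parameter types (`Option<&str>` for the indexed SHA, `&str` for HEAD) are assumptions.

```rust
/// Query-time freshness verdict, following the variants listed above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum IndexFreshness {
    Fresh,
    Stale { indexed_sha: String, head_sha: String },
    NeverIndexed,
}

/// Compare the recorded SHA with the current HEAD (assumed parameter types).
pub fn classify_freshness(indexed_sha: Option<&str>, head_sha: &str) -> IndexFreshness {
    match indexed_sha {
        // Nothing was ever recorded for this repo.
        None => IndexFreshness::NeverIndexed,
        // The index matches the working tree's HEAD.
        Some(indexed) if indexed == head_sha => IndexFreshness::Fresh,
        // The index lags (or differs from) HEAD; carry both SHAs for display.
        Some(indexed) => IndexFreshness::Stale {
            indexed_sha: indexed.to_string(),
            head_sha: head_sha.to_string(),
        },
    }
}
```

Note that this sketch compares SHAs for equality only, so any mismatch is reported as Stale without checking ancestry.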

Follow-up review pass on the recall + reuse + plan_change work.

Correctness:
- plan_change exclusion_ledger now records standards_to_preserve and
  required_braingent_records as "not yet computed", so an empty section
  is never read as "nothing applies"; stale "arrive with G7" comment fixed.
- Synonym table is now symmetric concept groups, so recall no longer
  depends on which side of a pair the user typed (login <-> authenticate).
- is_shared_module_path markers made language-agnostic (common/, /lib/,
  libs/, internal/, /pkg/, packages/) so non-JS layouts aren't mis-bucketed.

Cleanup / perf:
- Extracted EdgeKind::is_consumer_edge() in core; replaced the four
  duplicated `!matches!(.., Defines | Imports)` filters (packs x3, anchor).
- Reuse boost precomputes distinct consumer counts in one pre-pass
  (deduped by symbol_id) instead of per-item graph I/O in the loop, and
  re-sorts only the boosted window (bounded) instead of the full vec.

Docs (explicit reviewer decisions):
- limit*5 reuse window is a documented recall ceiling for high fan-out.
- exact-match boost is inert for multi-word queries (coverage ranks them).
- ceil(n/2)=1 partial-match recall for 2-word fallback is intentional.

Tests: symmetric-synonym reverse bridge; ledger records not-yet-computed
sections.

Not changed: threading Canonical onto PackItem (#6-full) — canonical_for_
node yields cross-repo identity, not a reusable-module signal, so it is the
wrong input for reuse_candidates; a true classifier is the deferred A6 work.
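
The consumer-edge predicate and the distinct-source count from the cleanup items above could look like the sketch below. Only Defines and Imports are named in the commit; the `Calls` and `References` variants, the `Edge` struct, and `distinct_consumers` are hypothetical.

```rust
use std::collections::HashSet;

/// Reduced edge-kind set; Calls and References are hypothetical variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EdgeKind {
    Defines,
    Imports,
    Calls,
    References,
}

impl EdgeKind {
    /// Structural Defines/Imports edges do not count as consumption.
    fn is_consumer_edge(self) -> bool {
        !matches!(self, EdgeKind::Defines | EdgeKind::Imports)
    }
}

/// Hypothetical inbound edge: the source node id and the edge kind.
struct Edge {
    source: u64,
    kind: EdgeKind,
}

/// Count distinct source nodes among consumer edges, so repeated edges
/// from one caller do not overstate reuse.
fn distinct_consumers(inbound: &[Edge]) -> usize {
    inbound
        .iter()
        .filter(|edge| edge.kind.is_consumer_edge())
        .map(|edge| edge.source)
        .collect::<HashSet<u64>>()
        .len()
}
```

Centralizing the predicate on the enum is what lets the four duplicated filters share one definition.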

cloudflare-workers-and-pages Bot commented Jun 2, 2026

Deploying gather-step with Cloudflare Pages

Latest commit: 94496e6
Status: ✅  Deploy successful!
Preview URL: https://1b3a73df.gather-step.pages.dev
Branch Preview URL: https://feat-search-recall-or-fallba.gather-step.pages.dev

@thedoublejay thedoublejay marked this pull request as ready for review June 3, 2026 07:26
@thedoublejay thedoublejay merged commit b867ace into main Jun 3, 2026
12 checks passed