feat: v4.3.1 - planning/reuse quality + gap-closure (partial) #42
Merged
Search used set_conjunction_by_default() for both the exact and fuzzy
passes, so a multi-word capability query ("email notification delivery")
returned nothing whenever a symbol covered only some terms — the root
cause behind empty reuse-discovery results (WS-1 / S1).
Add a third fallback pass that fires only when both conjunctive passes
return empty: each whitespace-separated word becomes a Should clause and
a hit must match a majority of them (ceil(n/2)) via
BooleanQuery::with_minimum_required_clauses.
Precision guard: the fallback is gated on the original query containing
two or more words. A single identifier like createOrderUseCase is one
word (even though it camelCase-splits into several tokens) and must not
fuzzily match token-sharing siblings such as updateOrderUseCase.
Tests (WS-0a): a partial multi-word query recovers via the OR fallback,
and a single-identifier miss does not OR-match a token-sharing sibling.
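This is a minimal, std-only sketch of the fallback's matching semantics. The real code builds a Tantivy BooleanQuery with Should clauses and calls with_minimum_required_clauses. Here that query is approximated by plain token matching. The helpers `camel_tokens`, `fallback_applies` and `fallback_matches` are hypothetical names, and the camelCase splitter is only a rough stand-in for the index tokenizer.

```rust
/// Majority floor for the disjunction fallback: ceil(n / 2) of the query words.
fn min_required_clauses(word_count: usize) -> usize {
    word_count.div_ceil(2)
}

/// Precision guard: only queries with two or more whitespace-separated words
/// qualify, so a single camelCase identifier never enters the OR fallback.
fn fallback_applies(query: &str) -> bool {
    query.split_whitespace().count() >= 2
}

/// Hypothetical helper: split an identifier like `sendEmailNotification` into
/// lowercase tokens (a rough stand-in for the index's tokenizer).
fn camel_tokens(ident: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    for ch in ident.chars() {
        // An uppercase letter or a separator ends the current token.
        if (ch.is_uppercase() || !ch.is_alphanumeric()) && !current.is_empty() {
            tokens.push(std::mem::take(&mut current));
        }
        if ch.is_alphanumeric() {
            current.extend(ch.to_lowercase());
        }
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

/// True if `symbol` would be recovered by the fallback: each query word acts as
/// a "Should" clause, and at least `min_required_clauses` of them must match.
fn fallback_matches(query: &str, symbol: &str) -> bool {
    if !fallback_applies(query) {
        return false;
    }
    let tokens = camel_tokens(symbol);
    let words: Vec<String> = query.split_whitespace().map(str::to_lowercase).collect();
    let matched = words.iter().filter(|w| tokens.contains(*w)).count();
    matched >= min_required_clauses(words.len())
}
```

With three words the floor is 2. So "email notification delivery" recovers `sendEmailNotification`, which covers two of the three words, while `createOrderUseCase` never reaches the fallback at all.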
Add a term-coverage boost to rerank_hits (WS-1 / S2): a hit whose symbol-name tokens cover more of the query's tokens is ranked higher, scaling up to +25% at full coverage. This keeps higher-coverage matches above incidental single-term matches once the disjunction fallback (S1) widens the candidate set. Gated to genuine multi-word queries (mirrors the S1 fallback) so a single identifier is never re-ranked by partial token overlap. Test: at equal base score, a two-term-coverage hit outranks a one-term-coverage hit regardless of input order.
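The commit only says the boost scales "up to +25% at full coverage". This sketch assumes a linear ramp from 1.0 to 1.25 over the covered fraction, which is an illustrative choice rather than the confirmed formula. `coverage_multiplier`, `Hit` and `rerank` are hypothetical names.

```rust
/// Assumed linear ramp: 1.0 with no overlap, 1.25 (+25%) when every query token
/// appears among the symbol-name tokens. Single-word queries are never boosted,
/// mirroring the S1 fallback gate.
fn coverage_multiplier(query_tokens: &[&str], name_tokens: &[&str]) -> f32 {
    if query_tokens.len() < 2 {
        return 1.0;
    }
    let covered = query_tokens
        .iter()
        .filter(|q| name_tokens.iter().any(|n| n.eq_ignore_ascii_case(q)))
        .count();
    1.0 + 0.25 * covered as f32 / query_tokens.len() as f32
}

/// Hypothetical, simplified search hit.
struct Hit {
    name: &'static str,
    name_tokens: Vec<&'static str>,
    base_score: f32,
}

/// Sort hits by coverage-adjusted score, highest first.
fn rerank(query_tokens: &[&str], hits: &mut [Hit]) {
    hits.sort_by(|a, b| {
        let sa = a.base_score * coverage_multiplier(query_tokens, &a.name_tokens);
        let sb = b.base_score * coverage_multiplier(query_tokens, &b.name_tokens);
        sb.total_cmp(&sa)
    });
}
```

At equal base scores, a hit covering both query words (x1.25) sorts ahead of one covering a single word (x1.125), whatever the input order.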
Add a curated concept->vocabulary synonym map (WS-1 / S4) so capability phrases bridge to the identifiers code actually uses. In the disjunction fallback each word becomes (word OR synonyms), e.g. "login" also matches authenticate/authentication, before the majority floor is applied. The map is intentionally small and high-signal; the original word is always searched too, so an unmapped word is a no-op. Lookup uses eq_ignore_ascii_case (no allocation), as required by the repo's disallowed-methods list. Test: "login workflow" surfaces authenticateUser via the login -> authenticate bridge; without expansion the same query returns empty.
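Here is a small sketch of the expansion step, written with the symmetric concept groups adopted in the later review pass. Only the login/authenticate/authentication group comes from the PR. `SYNONYM_GROUPS` and `expand_word` are hypothetical names.

```rust
/// Hypothetical table of symmetric concept groups. Only the login group is
/// taken from the PR text; a real table would hold more curated entries.
const SYNONYM_GROUPS: &[&[&str]] = &[&["login", "authenticate", "authentication"]];

/// Expand one query word into (word OR synonyms). The original word is always
/// first, so an unmapped word expands to just itself. Matching uses
/// eq_ignore_ascii_case, which compares without allocating lowercase copies.
fn expand_word(word: &str) -> Vec<&str> {
    let mut out = vec![word];
    for group in SYNONYM_GROUPS {
        if group.iter().any(|g| g.eq_ignore_ascii_case(word)) {
            out.extend(group.iter().copied().filter(|g| !g.eq_ignore_ascii_case(word)));
        }
    }
    out
}
```

Because the groups are symmetric, "authenticate" bridges back to "login" just as "login" bridges to "authenticate".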
Edge confidence was already stored on EdgeMetadata and rendered by trace, but no traversal could filter on it and the meaning of a None confidence was undefined (WS-2 / A14). Add EdgeMetadata::passes_confidence(min): a None confidence is a definite structural edge (import-resolved call, not a heuristic guess) and passes any threshold; an explicit confidence is compared against the threshold; a None threshold disables filtering. This fixes the "structural edges silently dropped" failure mode a naive `confidence >= min` would cause. Thread an Option<u16> min_confidence through QueryEngine::traverse, get_edges, and get_reverse_edges so trace/impact-style traversals can grade proven vs guessed edges. Tests: None passes any threshold, explicit-low is filtered, explicit-high passes; and traversal keeps a structural edge while dropping a low-confidence sibling under a threshold.
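The three-way rule can be written as a single match over (confidence, threshold). This sketch trims EdgeMetadata down to the one field that matters. `filter_edges` is a hypothetical stand-in for a traversal step, and the confidence scale used in the examples is an assumption.

```rust
/// Hypothetical, trimmed-down edge metadata.
struct EdgeMetadata {
    /// `None` = definite structural edge (e.g. an import-resolved call);
    /// `Some(c)` = heuristic edge with an explicit confidence score.
    confidence: Option<u16>,
}

impl EdgeMetadata {
    /// No threshold disables filtering; a structural edge passes any threshold;
    /// an explicit confidence must be at least the threshold.
    fn passes_confidence(&self, min: Option<u16>) -> bool {
        match (self.confidence, min) {
            (_, None) | (None, _) => true,
            (Some(confidence), Some(min)) => confidence >= min,
        }
    }
}

/// Keep only the edges that pass the threshold, as a traversal step might.
fn filter_edges(edges: &[EdgeMetadata], min: Option<u16>) -> Vec<&EdgeMetadata> {
    edges.iter().filter(|e| e.passes_confidence(min)).collect()
}
```

Treating `None` as a definite edge rather than as confidence 0 is exactly what prevents the naive `confidence >= min` check from dropping structural edges.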
Planning-pack ranking only boosted items that had an evidence chain, so the canonical reusable symbol did not surface above local one-offs (WS-2 / S3). Add a graph-derived reuse boost applied before the existing evidence ranking:
- shared / design-system membership inferred from the file path (+40)
- sibling-consumer count from inbound graph edges (3 pts/consumer, capped at 30 consumers so a hub node cannot dominate)
reuse_evidence_boost is a pure, unit-tested helper; apply_reuse_evidence_ranking wires it to the graph at the planning call site. This is the ranking half of "is this already a reusable component?": it runs after node rehydration at the pack layer, not inside the lexical reranker.
Test: a shared + widely-consumed symbol outranks a local one-off; more consumers yield a larger boost.
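The boost arithmetic is simple enough to show directly. The constants come from the commit. The `u32` score type, the `Candidate` struct and `rank_by_reuse` are assumptions made for illustration.

```rust
const SHARED_MODULE_BOOST: u32 = 40;
const POINTS_PER_CONSUMER: u32 = 3;
/// Consumers beyond this count add nothing, so a hub node cannot dominate.
const MAX_COUNTED_CONSUMERS: u32 = 30;

/// Pure boost: +40 for shared-module membership plus 3 points per consumer,
/// capped at 30 consumers (so at most 90 consumer points).
fn reuse_evidence_boost(is_shared_module: bool, consumer_count: u32) -> u32 {
    let shared = if is_shared_module { SHARED_MODULE_BOOST } else { 0 };
    shared + POINTS_PER_CONSUMER * consumer_count.min(MAX_COUNTED_CONSUMERS)
}

/// Hypothetical pack candidate used only for this illustration.
struct Candidate {
    name: &'static str,
    base_score: u32,
    shared: bool,
    consumers: u32,
}

/// Sort candidates by base score plus reuse boost, highest first.
fn rank_by_reuse(candidates: &mut [Candidate]) {
    candidates.sort_by_key(|c| {
        std::cmp::Reverse(c.base_score + reuse_evidence_boost(c.shared, c.consumers))
    });
}
```

For example, a shared symbol with 12 consumers and a base score of 80 totals 80 + 40 + 36 = 156. That lifts it above a local one-off scoring 100.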
Tightens WS-2 reuse ranking after review found the boost could not change which items surface.
- HIGH: reuse ranking ran after items.truncate(limit), so a canonical reusable symbol below the base-score cut was discarded before the boost applied. Score reuse on a bounded window (limit*5) BEFORE the cut and re-sort, so it can be promoted into the pack, not merely reordered.
- Consumer count now excludes structural Defines/Imports edges and counts distinct source nodes, mirroring the resolution scorer; raw inbound edge volume overstated reuse.
- Deterministic search rerank: equal adjusted scores now tie-break on symbol name, stored path, then node id instead of Tantivy doc order.
Refactor: extract sort_pack_items for the shared pack-item comparator.
Tests: a shared, widely-consumed symbol with a lower base score is promoted past a truncate-to-1; equal-score rerank hits sort stably.
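The consumer-count fix can be sketched as filtering out structural edges and then deduplicating by source node. `EdgeKind::is_consumer_edge` is the helper the later review pass extracts. The other `EdgeKind` variants, `Edge` and `distinct_consumer_count` are hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical edge kinds; only Defines and Imports are named in the PR.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum EdgeKind {
    Calls,
    References,
    Defines,
    Imports,
}

impl EdgeKind {
    /// Structural Defines/Imports edges are not evidence of reuse.
    fn is_consumer_edge(self) -> bool {
        !matches!(self, EdgeKind::Defines | EdgeKind::Imports)
    }
}

/// Hypothetical inbound edge: the node it comes from plus its kind.
struct Edge {
    source: u64,
    kind: EdgeKind,
}

/// Count distinct consumer nodes, so three calls from one caller count once.
fn distinct_consumer_count(inbound: &[Edge]) -> usize {
    inbound
        .iter()
        .filter(|e| e.kind.is_consumer_edge())
        .map(|e| e.source)
        .collect::<HashSet<_>>()
        .len()
}
```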
plan_change was a bare alias for planning_pack, returning a ContextPackResponse (WS-3 / E1). Make it a distinct typed product:
- PlanChangeResponse with the nine fixed sections (reuse_candidates, sibling_clone_targets, standards_to_preserve, integration_checks, cross_repo_reachability, write_path_or_state_machine_risks, required_braingent_records, open_unknowns, verification_plan). Every section is always present, so the contract is stable for consumers and the G7 gate.
- build_plan_change projects the planning-pack data into those sections (reuse_candidates = shared-module members surfaced by S3 ranking; sibling_clone_targets = local related items; cross_repo_reachability = planning proofs; open_unknowns = unresolved gaps). It is pure over its inputs for unit testing.
- plan_change_tool now returns Json<PlanChangeResponse> via run_plan_change.
Sections needing proactive queries (B7) and evidentiary fields (G7) are populated by later WS-3 slices; they ship as empty-but-present for now.
Test: the projection routes shared vs local items, gaps, and proofs into the correct sections and excludes the target item.
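This is a simplified sketch of the projection. The section names come from the commit. The item shape (`PackItem` with `symbol` and `is_shared_module`) and the string-typed sections are assumptions. Deriving `Default` is one way to guarantee that every section is always present.

```rust
/// Hypothetical, simplified planning-pack item.
struct PackItem {
    symbol: String,
    is_shared_module: bool,
}

/// The nine fixed sections; every field always exists, empty if nothing applies.
#[derive(Debug, Default, PartialEq)]
#[allow(dead_code)]
struct PlanChangeResponse {
    reuse_candidates: Vec<String>,
    sibling_clone_targets: Vec<String>,
    standards_to_preserve: Vec<String>,
    integration_checks: Vec<String>,
    cross_repo_reachability: Vec<String>,
    write_path_or_state_machine_risks: Vec<String>,
    required_braingent_records: Vec<String>,
    open_unknowns: Vec<String>,
    verification_plan: Vec<String>,
}

/// Pure projection of planning-pack data into the typed sections.
fn build_plan_change(
    target: &str,
    items: &[PackItem],
    proofs: &[String],
    gaps: &[String],
) -> PlanChangeResponse {
    let mut response = PlanChangeResponse::default();
    // The target itself is never its own reuse candidate or clone target.
    for item in items.iter().filter(|item| item.symbol != target) {
        if item.is_shared_module {
            response.reuse_candidates.push(item.symbol.clone());
        } else {
            response.sibling_clone_targets.push(item.symbol.clone());
        }
    }
    response.cross_repo_reachability = proofs.to_vec();
    response.open_unknowns = gaps.to_vec();
    response
}
```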
The typed plan_change sections integration_checks and
write_path_or_state_machine_risks shipped empty (WS-3 / B7). Surface the
change-impact evidence the planning pack already gathers into them:
- integration_checks: confirmed downstream consumers ("verify X still
works") and probable downstream repos ("check X, partial evidence").
- write_path_or_state_machine_risks: cross-repo callers whose contract
must be preserved, unresolved-possible impacts to confirm, and a note
when downstream fan-out was capped.
Kept as a pure projection over ChangeImpactSummary so it stays unit-
testable; threaded through run_plan_change from the pack data.
Test: confirmed downstream + cross-repo caller populate the two sections;
standards_to_preserve / required_braingent_records remain empty for G7.
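A rough sketch of the B7 projection follows. The `ChangeImpactSummary` fields shown here are hypothetical. So is the exact message wording, apart from the two phrases quoted in the commit. The point is that both sections are filled by a pure function over the summary.

```rust
/// Hypothetical shape of the change-impact data; the real struct may differ.
#[derive(Default)]
struct ChangeImpactSummary {
    confirmed_downstream: Vec<String>,
    probable_downstream_repos: Vec<String>,
    cross_repo_callers: Vec<String>,
    unresolved_possible: Vec<String>,
    downstream_capped: bool,
}

/// Returns (integration_checks, write_path_or_state_machine_risks).
fn project_impact(s: &ChangeImpactSummary) -> (Vec<String>, Vec<String>) {
    let mut checks: Vec<String> = s
        .confirmed_downstream
        .iter()
        .map(|c| format!("verify {c} still works"))
        .collect();
    checks.extend(
        s.probable_downstream_repos
            .iter()
            .map(|r| format!("check {r}, partial evidence")),
    );

    let mut risks: Vec<String> = s
        .cross_repo_callers
        .iter()
        .map(|c| format!("preserve the contract relied on by cross-repo caller {c}"))
        .collect();
    risks.extend(
        s.unresolved_possible
            .iter()
            .map(|u| format!("confirm possible impact on {u}")),
    );
    if s.downstream_capped {
        risks.push("downstream fan-out was capped; this list is not exhaustive".to_string());
    }
    (checks, risks)
}
```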
Make the plan_change contract evidentiary, not just structural (WS-3 / G7):
- PlanChangeContract carries deterministic metadata (a schema version, PLAN_CHANGE_SCHEMA_VERSION, and the fixed nine-section manifest in canonical order) plus an exclusion_ledger recording what was dropped (downstream fan-out caps, planning warnings), so a consumer never reads a capped/filtered result as exhaustive.
- validate_plan_change_contract is the gate: it fails on a stale schema version or a mangled/incomplete section manifest.
Rule provenance and requirements traceability are intentionally deferred to a follow-up workstream (they need a rule-id registry and AC ingestion that don't exist yet).
Tests: a freshly built product passes the gate deterministically; the gate fails on a bumped schema version and on a popped section; the exclusion ledger records a capped fan-out.
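Here is a sketch of the gate, assuming the manifest is stored as an ordered list of section names. The version value below is illustrative only, since the summary later mentions v2 and v3. `build_contract` is a hypothetical helper.

```rust
/// Illustrative value only; the real constant is bumped as sections are added.
const PLAN_CHANGE_SCHEMA_VERSION: u32 = 1;

/// Canonical section order, taken from the E1 commit.
const SECTION_MANIFEST: [&str; 9] = [
    "reuse_candidates",
    "sibling_clone_targets",
    "standards_to_preserve",
    "integration_checks",
    "cross_repo_reachability",
    "write_path_or_state_machine_risks",
    "required_braingent_records",
    "open_unknowns",
    "verification_plan",
];

struct PlanChangeContract {
    schema_version: u32,
    sections: Vec<String>,
    exclusion_ledger: Vec<String>,
}

/// Hypothetical constructor: current version, full manifest, given ledger.
fn build_contract(exclusion_ledger: Vec<String>) -> PlanChangeContract {
    PlanChangeContract {
        schema_version: PLAN_CHANGE_SCHEMA_VERSION,
        sections: SECTION_MANIFEST.iter().map(|s| s.to_string()).collect(),
        exclusion_ledger,
    }
}

fn validate_plan_change_contract(contract: &PlanChangeContract) -> Result<(), String> {
    if contract.schema_version != PLAN_CHANGE_SCHEMA_VERSION {
        return Err(format!(
            "stale schema version {} (expected {})",
            contract.schema_version, PLAN_CHANGE_SCHEMA_VERSION
        ));
    }
    // Exact match: same sections in canonical order, nothing missing or extra.
    if !contract.sections.iter().map(String::as_str).eq(SECTION_MANIFEST) {
        return Err("section manifest does not match the canonical nine sections".to_string());
    }
    Ok(())
}
```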
The direct MCP route used run_plan_change (typed nine-section product), but the batch_query dispatcher still routed both planning_pack and plan_change to planning_pack_tool, so batched callers silently got the legacy ContextPackResponse shape: a split-brain contract.
- Split the composite dispatcher arm: plan_change now calls run_plan_change; planning_pack keeps planning_pack_tool.
- Update the tool catalog: plan_change is the typed plan-change product, not an "alias for planning-oriented context".
Test: a batch_query plan_change op returns reuse_candidates and the contract section manifest (keys absent from ContextPackResponse), proving the batch and direct routes now agree.
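The fix amounts to giving each op its own match arm. In this sketch `planning_pack_tool` and `run_plan_change` are zero-argument stand-ins for the real handlers, whose signatures differ. `ToolResponse` and `dispatch` are hypothetical.

```rust
/// Hypothetical marker for which response shape a handler produces.
#[derive(Debug, PartialEq)]
enum ToolResponse {
    ContextPack, // legacy ContextPackResponse shape
    PlanChange,  // typed nine-section PlanChangeResponse
}

// Stand-ins for the real handlers, reduced to their output shape.
fn planning_pack_tool() -> ToolResponse {
    ToolResponse::ContextPack
}
fn run_plan_change() -> ToolResponse {
    ToolResponse::PlanChange
}

/// One arm per op; previously both ops shared the planning_pack_tool arm.
fn dispatch(op: &str) -> Option<ToolResponse> {
    match op {
        "planning_pack" => Some(planning_pack_tool()),
        "plan_change" => Some(run_plan_change()),
        _ => None,
    }
}
```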
The index records last_commit_sha per repo and head_sha() resolves the
current HEAD, but nothing compared them at query time, so a trace/search
could silently answer about code that no longer matches the working tree
(WS-4 / A13). Storage and HEAD resolution already exist; this adds the
comparison.
- IndexFreshness { Fresh | Stale { indexed_sha, head_sha } | NeverIndexed }
— a query-time freshness verdict, distinct from HistorySyncOutcome
(which describes an indexing run).
- classify_freshness(indexed_sha, head_sha): pure, unit-testable.
- GitHistoryIndexer::index_freshness(indexed_sha): resolves HEAD and
classifies. Both re-exported at the crate root.
Test: matching SHA => Fresh; older indexed SHA => Stale (carrying both);
no recorded SHA => NeverIndexed.
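The classification is a pure function of the two SHAs. The enum mirrors the variants listed above. The parameter types (`Option<&str>` for the recorded SHA, `&str` for HEAD) are assumptions.

```rust
/// Query-time freshness verdict (variants as described in the commit).
#[derive(Debug, PartialEq, Eq)]
enum IndexFreshness {
    Fresh,
    Stale { indexed_sha: String, head_sha: String },
    NeverIndexed,
}

/// Compare the SHA recorded at index time with the current HEAD.
fn classify_freshness(indexed_sha: Option<&str>, head_sha: &str) -> IndexFreshness {
    match indexed_sha {
        None => IndexFreshness::NeverIndexed,
        Some(sha) if sha == head_sha => IndexFreshness::Fresh,
        // Carry both SHAs so callers can report exactly how far behind the index is.
        Some(sha) => IndexFreshness::Stale {
            indexed_sha: sha.to_owned(),
            head_sha: head_sha.to_owned(),
        },
    }
}
```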
Follow-up review pass on the recall + reuse + plan_change work.
Correctness:
- plan_change exclusion_ledger now records standards_to_preserve and required_braingent_records as "not yet computed", so an empty section is never read as "nothing applies"; a stale "arrive with G7" comment is fixed.
- The synonym table is now symmetric concept groups, so recall no longer depends on which side of a pair the user typed (login <-> authenticate).
- is_shared_module_path markers are now language-agnostic (common/, /lib/, libs/, internal/, /pkg/, packages/), so non-JS layouts aren't mis-bucketed.
Cleanup / perf:
- Extracted EdgeKind::is_consumer_edge() in core; it replaces the four duplicated `!matches!(.., Defines | Imports)` filters (packs x3, anchor).
- The reuse boost precomputes distinct consumer counts in one pre-pass (deduped by symbol_id) instead of per-item graph I/O in the loop, and re-sorts only the boosted (bounded) window instead of the full vec.
Docs (explicit reviewer decisions):
- The limit*5 reuse window is a documented recall ceiling for high fan-out.
- The exact-match boost is inert for multi-word queries (coverage ranks them).
- ceil(n/2)=1 partial-match recall for the 2-word fallback is intentional.
Tests: symmetric-synonym reverse bridge; ledger records not-yet-computed sections.
Not changed: threading Canonical onto PackItem (#6-full). canonical_for_node yields cross-repo identity, not a reusable-module signal, so it is the wrong input for reuse_candidates; a true classifier is the deferred A6 work.
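The marker check can be sketched as substring matching on a normalized path. The six language-agnostic markers come from the commit. "shared/" and "design-system/" are an assumption based on the original S3 description. The normalization step (forward slashes plus a leading "/" so a repo-relative "lib/..." matches "/lib/") is also illustrative, not confirmed behavior.

```rust
/// Markers from the review pass, plus two assumed ones from the S3 description.
const SHARED_MODULE_MARKERS: &[&str] = &[
    "shared/",        // assumed
    "design-system/", // assumed
    "common/",
    "/lib/",
    "libs/",
    "internal/",
    "/pkg/",
    "packages/",
];

/// Naive substring check on a normalized path. Note it would also match
/// e.g. "uncommon/", which a stricter segment-based check would avoid.
fn is_shared_module_path(path: &str) -> bool {
    let normalized = format!("/{}", path.replace('\\', "/"));
    SHARED_MODULE_MARKERS.iter().any(|m| normalized.contains(*m))
}
```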
Deploying gather-step with
Latest commit: 94496e6
Status: ✅ Deploy successful!
Preview URL: https://1b3a73df.gather-step.pages.dev
Branch Preview URL: https://feat-search-recall-or-fallba.gather-step.pages.dev
Summary
Planning- and reuse-quality work toward v4.3.1, plus the first batch of the v4.3.1 gap-closure backlog. Opened as a draft — this is a reviewable checkpoint, not the full backlog. The branch carries the prior WS-1/WS-2/WS-3 + A13 work (already committed) and this session's additions on top.
This release is versioned 4.3.1 (app, Cargo workspace, internal crate deps, website metadata, landing stamps, and changelog all synced).
Root Cause / Context
Driven by two triple-reviewed plans:
- `2026-06-02-v4.3.0-implementation-plan.md` (§0.5 authoritative): recall → reuse → planning rails.
- `2026-06-02-v4.3.1-gap-closure-plan.md`: closes recorded braingent misses (lock-degradation, display-ownership, mongo/Atlas traps, polyrepo review).

The core v4.3.0 diagnosis (search is conjunctive-by-default, so reuse search returns empty) and the v4.3.1 root causes (redb exclusive-on-open lock; semantically orphaned gateway node) are code-grounded in those plans.
Key Decisions
- DF4, embeddings (WS-12), and polyrepo materialization (WS-G1) are deferred. Per §0.5.1 the R7 snapshot ratchet had to land before any schema work; it now has (this PR), so the schema wave is unblocked but remains its own multi-day effort. Quality bar over volume.
- The `plan_change` contract is bumped to schema v2 for the new `display_ownership_checks` section; the G7 gate asserts the exact manifest, so the bump is deterministic.
- … `EX_TEMPFAIL`), distinct from generic failure (1), so a blocked read can never look like "found nothing".

What landed this session (all TDD, green)
- `traverse` multi-path provenance + `depth_capped`/`truncated` signaling
- `gather-step status` `degraded: graph_locked` JSON disclosure
- `display_ownership_checks` planning section in `plan_change`
- `pass_two_gap_dimensions` (8) + `v1_completeness_checklist` (19, V1–V9 folded) in `plan_change` (schema v3)
- `gather-step doctor`

Already on the branch from prior sessions: S1/S2/S4 recall, A14 confidence filter, S3 graph-ranked reuse, E1 typed `plan_change`, B7, G7, A13 core.

Deferred (with reasons, not silently dropped)
- `pr-review` materialization (PRM1–4): P0 product blocker but L-effort; touches `engine.rs`/`index_runner.rs`; needs its own PR + parity coverage.

Files Changed
- `crates/gather-step-analysis/src/query.rs`: traverse provenance/caps (B8/B9)
- `crates/gather-step-analysis/src/mongo_query_safety.rs` (new) + `lib.rs`: MQS1–3, AIX1
- `crates/gather-step-cli/src/commands/status.rs`: A13 freshness column
- `crates/gather-step-cli/src/errors.rs`, `main.rs`: REL1
- `crates/gather-step-mcp/src/tools/packs.rs`, `catalog.rs`, `server.rs`: DSO1 + contract v2
- `Cargo.toml`, `Cargo.lock`, `website/**`: 4.3.1 version + changelog sync

Test Plan
- `cargo test --workspace` → 1557 passed, 12 ignored (51 suites).
- `cargo clippy --workspace --all-targets -- -D warnings` → clean.
- `cargo fmt --all --check` → clean.

Follow-ups
- `pr-review` materialization (P0 blocker, own PR).
- … `doctor`).

🤖 Generated with Claude Code