feat(query): position-weighted seeding + coherence gate for LIP rerank by SimplyLiz · Pull Request #209 · SimplyLiz/CodeMCP

SimplyLiz · 2026-04-15T10:57:06Z

Summary

Two long-standing issues in the Fast-tier LIP re-ranker (internal/query/lip_ranker.go):

Lexical-bias feedback loop. The centroid was built from the top-5 lexically-ranked seeds, uniformly averaged. If lexical was wrong, the centroid inherited the bias and the semantic "correction" reinforced it.
No signal-strength check. Orthogonal / scattered seed vectors produced a near-zero centroid that still got the same 0.4 semantic weight as a strong coherent signal — so weak signals drove rerank decisions.

This PR:

Position-weighted seeding with weights 1/(rank+1), each seed L2-normalised before averaging. Top-1 still dominates, but softly — lower-ranked seeds pull the centroid proportionally less.
Coherence gate. Because each seed is unit-length, ||centroid|| lies in [0, 1] and directly measures seed agreement. Below MinCoherence (default 0.35) the rerank skips the semantic pass and returns lexical order — fall back instead of amplify noise.
RerankConfig struct (LexicalWeight, SemanticWeight, SeedCount, MinCoherence) + DefaultRerankConfig(). Defaults preserve prior 0.6/0.4/5 behaviour; per-repo tuning no longer requires a recompile.
Injected embedBatchFn so the rerank logic is unit-testable without a running LIP daemon.

No call-site changes — RerankWithLIP(ctx, results, repoRoot, query) signature unchanged.

Test plan

6 new tests in lip_ranker_test.go:
- coherent seeds promote an aligned lower-ranked candidate
- scattered seeds fall back to lexical order
- daemon unavailable → order preserved
- ≤3 results → no embed call
- config override with lexical-dominant weights preserves top-1
- buildSeedCentroid coherence in {aligned, opposed, single-seed} cases
go test ./internal/query/... — full package passes.
go build ./... clean.
gofmt clean.

Follow-ups (not in this PR)

Build an actual evaluation harness against a golden query set to empirically tune the four RerankConfig values — the current defaults are a starting point, not a proven optimum.

🤖 Generated with Claude Code

The Fast-tier LIP rerank built its query centroid from the top-5 lexical results, uniformly averaged — two problems: 1. Lexical-bias feedback loop: if the top-5 lexical hits are wrong, the centroid inherits the bias verbatim and amplifies it downstream. 2. No signal-strength check: orthogonal / scattered seeds produced a near-zero centroid that still got the same 0.4 semantic weight as a strong, coherent signal. Changes: - Position-weighted seeding (1/(rank+1)) with each seed L2-normalised before averaging — top-1 dominates softly rather than uniformly. - Centroid coherence gate: since each seed is unit-length, the resulting centroid norm lies in [0, 1] and is a direct measure of seed agreement. Below MinCoherence (0.35) we skip the semantic pass and return lexical order — better to be lexical-only than to amplify noise. - Extract RerankConfig {LexicalWeight, SemanticWeight, SeedCount, MinCoherence} with DefaultRerankConfig(). Defaults match prior behaviour; per-repo tuning is now a config change, not a code change. - Inject embedBatchFn so the rerank logic is unit-testable without a running LIP daemon. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-04-15T10:58:26Z

NFR Tests ✅ 39 unchanged

Comparing PR against main branch (dynamic baseline).

Regressions: 0 ✅

Thresholds: WARN ≥ +5% • FAIL ≥ +10%

All scenarios

	Scenario	Change	Actual (B)	Base (B)	Time
⚪	analyzeChange / large	+0.0%	193,169	193,169	654µs
⚪	analyzeChange / medium	+0.0%	38,575	38,575	151µs
⚪	analyzeChange / small	+0.0%	4,046	4,046	28µs
⚪	analyzeChange / xlarge	+0.0%	387,417	387,417	1.70872ms
⚪	analyzeImpact / large	+0.0%	17,966	17,966	123µs
⚪	analyzeImpact / small	+0.0%	1,924	1,924	20µs
⚪	batchGet / large	+0.0%	11,789	11,789	59µs
⚪	batchGet / small	+0.0%	4,733	4,733	35µs
⚪	batchSearch / large	+0.0%	90,816	90,816	259µs
⚪	batchSearch / medium	+0.0%	18,036	18,036	98µs
⚪	batchSearch / small	+0.0%	3,379	3,379	21µs
⚪	explore / large	+0.0%	94,262	94,262	1.014956ms
⚪	explore / small	+0.0%	4,253	4,253	73µs
⚪	findReferences / large	+0.0%	445,943	445,943	2.289934ms
⚪	findReferences / medium	+0.0%	44,123	44,123	291µs
⚪	findReferences / small	+0.0%	4,395	4,395	49µs
⚪	getAffectedTests / large	+0.0%	7,521	7,521	47µs
⚪	getAffectedTests / medium	+0.0%	3,110	3,110	42µs
⚪	getAffectedTests / small	+0.0%	903	903	14µs
⚪	getAffectedTests / xlarge	+0.0%	14,870	14,870	76µs
⚪	getArchitecture / large	+0.0%	6,690	6,690	76µs
⚪	getArchitecture / small	+0.0%	960	960	16µs
⚪	getCallGraph / deep	+0.0%	15,238	15,238	117µs
⚪	getCallGraph / shallow	+0.0%	887	887	37µs
⚪	getHotspots / large	+0.0%	16,748	16,748	183µs
⚪	getHotspots / small	+0.0%	886	886	16µs
⚪	listEntrypoints / large	+0.0%	23,798	23,798	550µs
⚪	listEntrypoints / small	+0.0%	4,795	4,795	217µs
⚪	prepareChange / large	+0.0%	16,194	16,194	119µs
⚪	prepareChange / small	+0.0%	2,483	2,483	27µs
⚪	searchSymbols / large	+0.0%	90,246	90,246	453µs
⚪	searchSymbols / medium	+0.0%	17,766	17,766	101µs
⚪	searchSymbols / small	+0.0%	3,588	3,588	39µs
⚪	summarizeDiff / large	+0.0%	19,939	19,939	176µs
⚪	summarizeDiff / small	+0.0%	2,133	2,133	28µs
⚪	traceUsage / large	+0.0%	7,728	7,728	87µs
⚪	traceUsage / small	+0.0%	725	725	13µs
⚪	understand / large	+0.0%	460,608	460,608	2.064011ms
⚪	understand / small	+0.0%	5,555	5,555	55µs

* = new scenario, compared against static baseline

github-actions · 2026-04-15T10:58:27Z

🟢 Change Impact Analysis

Metric	Value
Risk Level	LOW 🟢
Files Changed	2
Symbols Changed	1
Directly Affected	0
Transitively Affected	0

Blast Radius: 0 modules, 0 files, 0 unique callers

📝 Changed Symbols (1)

Symbol	File	Type	Confidence
`internal/query/lip_ranker.go`	`internal/query/lip_ranker.go`	modified	30%

Recommendations

ℹ️ coverage: 1 symbols have low mapping confidence. Index may be stale.
- Action: Run 'ckb index' to refresh the SCIP index

_{Generated by CKB}

github-actions · 2026-04-15T10:59:21Z

CKB Analysis

🎯 1 changed → 0 affected · 🔥 1 hotspot · 📊 1 complex · 💣 1 blast · 📚 197 stale

Risk factors: Touches 1 hotspot(s)

👥 Suggested: @lisa.welsch1985@gmail.com (100%), @lisa@tastehub.io (50%)

Metric	Value
Impact Analysis	1 symbols → 0 affected	🟢
Doc Coverage	6.598984771573605%	⚠️
Complexity	1 violations	⚠️
Coupling	0 gaps	✅
Blast Radius	0 modules, 0 files	✅
Index	indexed (0s)	🆕

🎯 Change Impact Analysis · 🟢 LOW · 1 changed → 0 affected

Metric	Value
Symbols Changed	1
Directly Affected	0
Transitively Affected	0
Modules in Blast Radius	0
Files in Blast Radius	0

Symbols changed in this PR:

internal/query/lip_ranker.go [modified] (30%) — internal/query/lip_ranker.go

Recommendations:

ℹ️ 1 symbols have low mapping confidence. Index may be stale.
- Action: Run 'ckb index' to refresh the SCIP index

💣 Blast radius · 0 symbols · 1 tests · 0 consumers

Tests that may break:

internal/query/lip_ranker_test.go

🔥 Hotspots · 1 volatile files

File	Churn Score
`internal/query/lip_ranker.go`	7.69

📊 Complexity · 1 violations

File	Cyclomatic	Cognitive
`internal/query/lip_ranker.go`	⚠️ 18	⚠️ 31

💡 Quick wins · 10 suggestions

🟡 Add tests → internal/audit/analyzer.go
🟡 Add tests → internal/query/engine_test.go
🟡 Add tests → internal/incremental/store_test.go
🟡 Add tests → internal/incremental/detector_test.go
🟡 Add tests → internal/identity/identity_test.go
🟢 Assign backup owner → internal/audit/analyzer.go
🟢 Assign backup owner → internal/query/engine_test.go
🟢 Assign backup owner → internal/incremental/store_test.go

📚 Stale docs · 197 broken references

docs/CONTRIBUTING.md:108 → errors.New
docs/CONTRIBUTING.md:108 → errors.SYMBOL_NOT_FOUND
docs/CONTRIBUTING.md:111 → errors.Wrap
docs/backlog/index-refresh-improvements.md:36 → watcher.Add
docs/backlog/index-refresh-improvements.md:36 → filepath.Join
docs/backlog/index-refresh-improvements.md:97 → time.Second
docs/featureplans/change-impact-analysis-checklist.md:208 → coverage.out
docs/featureplans/change-impact-analysis-checklist.md:214 → lcov.info
docs/featureplans/change-impact-analysis-checklist.md:309 → coverage.out
docs/featureplans/change-impact-analysis.md:448 → scip.SCIPIndex
docs/featureplans/change-impact-analysis.md:478 → context.Context
docs/featureplans/change-impact-analysis.md:669 → coverage.out
docs/features/a2a/overview.md:41 → index.initialized
docs/features/a2a/overview.md:42 → index.fresh
docs/features/a2a/overview.md:43 → index.commitsBehind

_{Generated by CKB · Run details}

codecov · 2026-04-15T11:02:12Z

Codecov Report

❌ Patch coverage is 80.64516% with 12 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
internal/query/lip_ranker.go	80.6%	7 Missing and 5 partials ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##            main    #209     +/-   ##
=======================================
+ Coverage   43.0%   43.1%   +0.1%     
=======================================
  Files        526     526             
  Lines      81101   81133     +32     
=======================================
+ Hits       34914   35018    +104     
+ Misses     43719   43647     -72     
  Partials    2468    2468

Flag	Coverage Δ
unit	`43.1% <80.6%> (+0.1%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

📢 Thoughts on this report? Let us know!

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

github-actions · 2026-04-15T11:02:54Z

CKB review failed to generate output.

CKB used to re-probe IndexStatus on a 60 s TTL every time the rerank path checked whether the LIP index was mixed-models. The LIP daemon has pushed IndexChanged frames to all active sessions since v1.5.0, so the polling was pure debt. New internal/lip/subscribe.go opens a long-lived connection, pings IndexStatus every 3 s to flush the daemon's broadcast channel (the session loop drains queued pushes only after writing a response), reads every frame in a loop, and routes index_changed and index_status by type tag. Reconnects with exponential backoff to 30 s on daemon drop. Engine owns one subscriber, started in NewEngine and cancelled in Close. The cached availability/mixed flags are now written by the subscriber, not the query path — lipSemanticAvailable is lock-free RLock+read, no RPC. Worst-case staleness for the rerank gate drops from 60 s to ~3 s. Tests rewritten: the fake daemon now serves multiple requests per connection and tests wait for the first health frame before asserting, plus a new TestLipSubscriber_ReusesSingleConnection verifies the hot path issues zero requests. Also: CHANGELOG entries for #208 and #209 which landed on develop without one. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(lip): push-driven health via long-lived subscribe CKB used to re-probe IndexStatus on a 60 s TTL every time the rerank path checked whether the LIP index was mixed-models. The LIP daemon has pushed IndexChanged frames to all active sessions since v1.5.0, so the polling was pure debt. New internal/lip/subscribe.go opens a long-lived connection, pings IndexStatus every 3 s to flush the daemon's broadcast channel (the session loop drains queued pushes only after writing a response), reads every frame in a loop, and routes index_changed and index_status by type tag. Reconnects with exponential backoff to 30 s on daemon drop. Engine owns one subscriber, started in NewEngine and cancelled in Close. The cached availability/mixed flags are now written by the subscriber, not the query path — lipSemanticAvailable is lock-free RLock+read, no RPC. Worst-case staleness for the rerank gate drops from 60 s to ~3 s. Tests rewritten: the fake daemon now serves multiple requests per connection and tests wait for the first health frame before asserting, plus a new TestLipSubscriber_ReusesSingleConnection verifies the hot path issues zero requests. Also: CHANGELOG entries for #208 and #209 which landed on develop without one. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: drop stray cmd/ckb-bench/version_test.go Had a package-level init() that printed cartographer version on import under the cartographer build tag — not a test, and leaked output into unrelated test runs when the tag was enabled. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

SimplyLiz merged commit 88b3ab3 into main Apr 15, 2026
12 checks passed

SimplyLiz deleted the lip-rerank-tuning branch April 15, 2026 11:00

SimplyLiz mentioned this pull request Apr 15, 2026

feat(lip): push-driven health via long-lived subscribe #210

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(query): position-weighted seeding + coherence gate for LIP rerank#209

feat(query): position-weighted seeding + coherence gate for LIP rerank#209
SimplyLiz merged 1 commit intomainfrom
lip-rerank-tuning

SimplyLiz commented Apr 15, 2026

Uh oh!

github-actions bot commented Apr 15, 2026

Uh oh!

github-actions bot commented Apr 15, 2026

Uh oh!

github-actions bot commented Apr 15, 2026

Uh oh!

Uh oh!

codecov bot commented Apr 15, 2026

Uh oh!

github-actions bot commented Apr 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

SimplyLiz commented Apr 15, 2026

Summary

Test plan

Follow-ups (not in this PR)

Uh oh!

github-actions bot commented Apr 15, 2026

NFR Tests ✅ 39 unchanged

Uh oh!

github-actions bot commented Apr 15, 2026

🟢 Change Impact Analysis

Recommendations

Uh oh!

github-actions bot commented Apr 15, 2026

CKB Analysis

Uh oh!

Uh oh!

codecov bot commented Apr 15, 2026

Codecov Report

Uh oh!

github-actions bot commented Apr 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant