Skip to content

feat(query): position-weighted seeding + coherence gate for LIP rerank#209

Merged
SimplyLiz merged 1 commit intomainfrom
lip-rerank-tuning
Apr 15, 2026
Merged

feat(query): position-weighted seeding + coherence gate for LIP rerank#209
SimplyLiz merged 1 commit intomainfrom
lip-rerank-tuning

Conversation

@SimplyLiz
Copy link
Copy Markdown
Owner

Summary

Two long-standing issues in the Fast-tier LIP re-ranker (internal/query/lip_ranker.go):

  1. Lexical-bias feedback loop. The centroid was built from the top-5 lexically-ranked seeds, uniformly averaged. If lexical was wrong, the centroid inherited the bias and the semantic "correction" reinforced it.
  2. No signal-strength check. Orthogonal / scattered seed vectors produced a near-zero centroid that still got the same 0.4 semantic weight as a strong coherent signal — so weak signals drove rerank decisions.

This PR:

  • Position-weighted seeding with weights 1/(rank+1), each seed L2-normalised before averaging. Top-1 still dominates, but softly — lower-ranked seeds pull the centroid proportionally less.
  • Coherence gate. Because each seed is unit-length, ||centroid|| lies in [0, 1] and directly measures seed agreement. Below MinCoherence (default 0.35) the rerank skips the semantic pass and returns lexical order — fall back instead of amplify noise.
  • RerankConfig struct (LexicalWeight, SemanticWeight, SeedCount, MinCoherence) + DefaultRerankConfig(). Defaults preserve prior 0.6/0.4/5 behaviour; per-repo tuning no longer requires a recompile.
  • Injected embedBatchFn so the rerank logic is unit-testable without a running LIP daemon.

No call-site changes — RerankWithLIP(ctx, results, repoRoot, query) signature unchanged.

Test plan

  • 6 new tests in lip_ranker_test.go:
    • coherent seeds promote an aligned lower-ranked candidate
    • scattered seeds fall back to lexical order
    • daemon unavailable → order preserved
    • ≤3 results → no embed call
    • config override with lexical-dominant weights preserves top-1
    • buildSeedCentroid coherence in {aligned, opposed, single-seed} cases
  • go test ./internal/query/... — full package passes.
  • go build ./... clean.
  • gofmt clean.

Follow-ups (not in this PR)

  • Build an actual evaluation harness against a golden query set to empirically tune the four RerankConfig values — the current defaults are a starting point, not a proven optimum.

🤖 Generated with Claude Code

The Fast-tier LIP rerank built its query centroid from the top-5 lexical
results, uniformly averaged — two problems:

1. Lexical-bias feedback loop: if the top-5 lexical hits are wrong, the
   centroid inherits the bias verbatim and amplifies it downstream.
2. No signal-strength check: orthogonal / scattered seeds produced a
   near-zero centroid that still got the same 0.4 semantic weight as a
   strong, coherent signal.

Changes:
- Position-weighted seeding (1/(rank+1)) with each seed L2-normalised
  before averaging — top-1 dominates softly rather than uniformly.
- Centroid coherence gate: since each seed is unit-length, the resulting
  centroid norm lies in [0, 1] and is a direct measure of seed agreement.
  Below MinCoherence (0.35) we skip the semantic pass and return lexical
  order — better to be lexical-only than to amplify noise.
- Extract RerankConfig {LexicalWeight, SemanticWeight, SeedCount,
  MinCoherence} with DefaultRerankConfig(). Defaults match prior
  behaviour; per-repo tuning is now a config change, not a code change.
- Inject embedBatchFn so the rerank logic is unit-testable without a
  running LIP daemon.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

NFR Tests ✅ 39 unchanged

Comparing PR against main branch (dynamic baseline).

Regressions: 0 ✅

Thresholds: WARN ≥ +5% • FAIL ≥ +10%

All scenarios
Scenario Change Actual (B) Base (B) Time
analyzeChange / large +0.0% 193,169 193,169 654µs
analyzeChange / medium +0.0% 38,575 38,575 151µs
analyzeChange / small +0.0% 4,046 4,046 28µs
analyzeChange / xlarge +0.0% 387,417 387,417 1.70872ms
analyzeImpact / large +0.0% 17,966 17,966 123µs
analyzeImpact / small +0.0% 1,924 1,924 20µs
batchGet / large +0.0% 11,789 11,789 59µs
batchGet / small +0.0% 4,733 4,733 35µs
batchSearch / large +0.0% 90,816 90,816 259µs
batchSearch / medium +0.0% 18,036 18,036 98µs
batchSearch / small +0.0% 3,379 3,379 21µs
explore / large +0.0% 94,262 94,262 1.014956ms
explore / small +0.0% 4,253 4,253 73µs
findReferences / large +0.0% 445,943 445,943 2.289934ms
findReferences / medium +0.0% 44,123 44,123 291µs
findReferences / small +0.0% 4,395 4,395 49µs
getAffectedTests / large +0.0% 7,521 7,521 47µs
getAffectedTests / medium +0.0% 3,110 3,110 42µs
getAffectedTests / small +0.0% 903 903 14µs
getAffectedTests / xlarge +0.0% 14,870 14,870 76µs
getArchitecture / large +0.0% 6,690 6,690 76µs
getArchitecture / small +0.0% 960 960 16µs
getCallGraph / deep +0.0% 15,238 15,238 117µs
getCallGraph / shallow +0.0% 887 887 37µs
getHotspots / large +0.0% 16,748 16,748 183µs
getHotspots / small +0.0% 886 886 16µs
listEntrypoints / large +0.0% 23,798 23,798 550µs
listEntrypoints / small +0.0% 4,795 4,795 217µs
prepareChange / large +0.0% 16,194 16,194 119µs
prepareChange / small +0.0% 2,483 2,483 27µs
searchSymbols / large +0.0% 90,246 90,246 453µs
searchSymbols / medium +0.0% 17,766 17,766 101µs
searchSymbols / small +0.0% 3,588 3,588 39µs
summarizeDiff / large +0.0% 19,939 19,939 176µs
summarizeDiff / small +0.0% 2,133 2,133 28µs
traceUsage / large +0.0% 7,728 7,728 87µs
traceUsage / small +0.0% 725 725 13µs
understand / large +0.0% 460,608 460,608 2.064011ms
understand / small +0.0% 5,555 5,555 55µs

* = new scenario, compared against static baseline

@github-actions
Copy link
Copy Markdown

🟢 Change Impact Analysis

Metric Value
Risk Level LOW 🟢
Files Changed 2
Symbols Changed 1
Directly Affected 0
Transitively Affected 0

Blast Radius: 0 modules, 0 files, 0 unique callers

📝 Changed Symbols (1)
Symbol File Type Confidence
internal/query/lip_ranker.go internal/query/lip_ranker.go modified 30%

Recommendations

  • ℹ️ coverage: 1 symbols have low mapping confidence. Index may be stale.
    • Action: Run 'ckb index' to refresh the SCIP index

Generated by CKB

@github-actions
Copy link
Copy Markdown

CKB Analysis

Risk Files +337 -55 Modules

🎯 1 changed → 0 affected · 🔥 1 hotspot · 📊 1 complex · 💣 1 blast · 📚 197 stale

Risk factors: Touches 1 hotspot(s)

👥 Suggested: @lisa.welsch1985@gmail.com (100%), @lisa@tastehub.io (50%)

Metric Value
Impact Analysis 1 symbols → 0 affected 🟢
Doc Coverage 6.598984771573605% ⚠️
Complexity 1 violations ⚠️
Coupling 0 gaps
Blast Radius 0 modules, 0 files
Index indexed (0s) 🆕
🎯 Change Impact Analysis · 🟢 LOW · 1 changed → 0 affected
Metric Value
Symbols Changed 1
Directly Affected 0
Transitively Affected 0
Modules in Blast Radius 0
Files in Blast Radius 0

Symbols changed in this PR:

Recommendations:

  • ℹ️ 1 symbols have low mapping confidence. Index may be stale.
    • Action: Run 'ckb index' to refresh the SCIP index
💣 Blast radius · 0 symbols · 1 tests · 0 consumers

Tests that may break:

  • internal/query/lip_ranker_test.go
🔥 Hotspots · 1 volatile files
File Churn Score
internal/query/lip_ranker.go 7.69
📊 Complexity · 1 violations
File Cyclomatic Cognitive
internal/query/lip_ranker.go ⚠️ 18 ⚠️ 31
💡 Quick wins · 10 suggestions
📚 Stale docs · 197 broken references

Generated by CKB · Run details

@SimplyLiz SimplyLiz merged commit 88b3ab3 into main Apr 15, 2026
12 checks passed
@SimplyLiz SimplyLiz deleted the lip-rerank-tuning branch April 15, 2026 11:00
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 15, 2026

Codecov Report

❌ Patch coverage is 80.64516% with 12 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
internal/query/lip_ranker.go 80.6% 7 Missing and 5 partials ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            main    #209     +/-   ##
=======================================
+ Coverage   43.0%   43.1%   +0.1%     
=======================================
  Files        526     526             
  Lines      81101   81133     +32     
=======================================
+ Hits       34914   35018    +104     
+ Misses     43719   43647     -72     
  Partials    2468    2468             
Flag Coverage Δ
unit 43.1% <80.6%> (+0.1%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

📢 Thoughts on this report? Let us know!

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions
Copy link
Copy Markdown

CKB review failed to generate output.

SimplyLiz added a commit that referenced this pull request Apr 15, 2026
CKB used to re-probe IndexStatus on a 60 s TTL every time the rerank
path checked whether the LIP index was mixed-models. The LIP daemon has
pushed IndexChanged frames to all active sessions since v1.5.0, so the
polling was pure debt.

New internal/lip/subscribe.go opens a long-lived connection, pings
IndexStatus every 3 s to flush the daemon's broadcast channel (the
session loop drains queued pushes only after writing a response), reads
every frame in a loop, and routes index_changed and index_status by
type tag. Reconnects with exponential backoff to 30 s on daemon drop.

Engine owns one subscriber, started in NewEngine and cancelled in
Close. The cached availability/mixed flags are now written by the
subscriber, not the query path — lipSemanticAvailable is lock-free
RLock+read, no RPC. Worst-case staleness for the rerank gate drops
from 60 s to ~3 s.

Tests rewritten: the fake daemon now serves multiple requests per
connection and tests wait for the first health frame before asserting,
plus a new TestLipSubscriber_ReusesSingleConnection verifies the hot
path issues zero requests.

Also: CHANGELOG entries for #208 and #209 which landed on develop
without one.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SimplyLiz added a commit that referenced this pull request Apr 15, 2026
* feat(lip): push-driven health via long-lived subscribe

CKB used to re-probe IndexStatus on a 60 s TTL every time the rerank
path checked whether the LIP index was mixed-models. The LIP daemon has
pushed IndexChanged frames to all active sessions since v1.5.0, so the
polling was pure debt.

New internal/lip/subscribe.go opens a long-lived connection, pings
IndexStatus every 3 s to flush the daemon's broadcast channel (the
session loop drains queued pushes only after writing a response), reads
every frame in a loop, and routes index_changed and index_status by
type tag. Reconnects with exponential backoff to 30 s on daemon drop.

Engine owns one subscriber, started in NewEngine and cancelled in
Close. The cached availability/mixed flags are now written by the
subscriber, not the query path — lipSemanticAvailable is lock-free
RLock+read, no RPC. Worst-case staleness for the rerank gate drops
from 60 s to ~3 s.

Tests rewritten: the fake daemon now serves multiple requests per
connection and tests wait for the first health frame before asserting,
plus a new TestLipSubscriber_ReusesSingleConnection verifies the hot
path issues zero requests.

Also: CHANGELOG entries for #208 and #209 which landed on develop
without one.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: drop stray cmd/ckb-bench/version_test.go

Had a package-level init() that printed cartographer version on import
under the cartographer build tag — not a test, and leaked output into
unrelated test runs when the tag was enabled.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant