
fix(knn): reapply sort after rescoring distances #4588

Open
sanikolaev wants to merge 1 commit into master from fix/knn-rescore-stable-order

Conversation

@sanikolaev
Collaborator

Related issue #4320

KNN rescore now copies the exact distances into the original knn_dist() sort key and reruns the original comparator, so explicit ORDER BY tie-breakers are respected when rescored distances are equal.

sanikolaev force-pushed the fix/knn-rescore-stable-order branch from 9536712 to 12501e9 on May 21, 2026 04:21
manticoresoftware deleted a comment from github-actions (bot) on May 21, 2026
sanikolaev force-pushed the fix/knn-rescore-stable-order branch from 12501e9 to 262fdc0 on May 21, 2026 05:12
sanikolaev changed the title from "fix(knn): preserve prior order for equal rescored distances" to "fix(knn): reapply sort after rescoring distances" on May 21, 2026
sanikolaev force-pushed the fix/knn-rescore-stable-order branch from 262fdc0 to 6b7bc79 on May 21, 2026 05:32
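To make the described fix concrete, here is a minimal shell sketch of "rescore, then rerun the comparator". It is not Manticore's implementation. The three-column candidate list, its values, and the rescore_and_resort function are hypothetical, and the query is assumed to be ordered by knn_dist() ASC, id ASC. Each row carries an id, an approximate distance from the first KNN pass, and an exact distance from rescoring. The sketch copies the exact distance into the sort-key column and then sorts again on distance, using id as the tie-breaker.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical candidates from an approximate KNN pass, already in first-pass order:
#   id  approx_dist  exact_dist
# Illustrative values only; not taken from Manticore.
candidates='3 0.10 0.25
1 0.12 0.25
2 0.15 0.20
4 0.18 0.30'

# Hypothetical helper: overwrite the sort key with the exact distance (keep id and
# exact_dist), then rerun the full comparator: distance ascending, with id ascending
# as the explicit tie-breaker (as in ORDER BY knn_dist() ASC, id ASC).
rescore_and_resort() {
  awk '{ print $1, $3 }' | LC_ALL=C sort -k2,2n -k1,1n
}

printf '%s\n' "$candidates" | rescore_and_resort
# 2 0.20
# 1 0.25
# 3 0.25
# 4 0.30
```

If the rows were left in first-pass order after rescoring, id 2 (exact distance 0.20) would sit behind ids 3 and 1, and id 3 would stay ahead of id 1 even though their rescored distances tie and the tie-breaker prefers the lower id. A stable re-sort on distance alone, which the earlier title ("preserve prior order for equal rescored distances") appears to describe, would fix the first problem but still keep id 3 ahead of id 1. Rerunning the full comparator fixes both.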
@github-actions
Contributor

clt

❌ CLT tests in test/clt-tests/buddy/test-buddy-protocol-validation test/clt-tests/buddy/test-fuzzy-search-non-min-infix-len test/clt-tests/buddy/test-log-level-buddy-sync test/clt-tests/buddy/test-manticore-version-in-telemetry test/clt-tests/buddy/test-show-version test/clt-tests/buddy/test-unserialize-error-absence-kafka-operations test/clt-tests/buddy-plugins/test-conversational-advanced test/clt-tests/buddy-plugins/test-conversational-basic test/clt-tests/buddy-plugins/test-conversational-intent test/clt-tests/buddy-plugins/test-conversational-negative test/clt-tests/buddy-plugins/test-enable-disable-buddy-plugin
✅ OK: 10
❌ Failed: 1
⏳ Duration: 195s
👉 Check Action Results for commit cfe45f3

Failed tests:


test/clt-tests/buddy-plugins/test-conversational-advanced.rec
––– input –––
export SEARCHD_FLAGS="--iostats --cpustats"
––– output –––
OK
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd $SEARCHD_FLAGS > /dev/null; if timeout 10 grep -qm1 '\[BUDDY\] started' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Buddy started!'; else echo 'Timeout or failed!'; cat /var/log/manticore/searchd.log;fi
––– output –––
OK
––– input –––
apt-get install jq -y > /dev/null; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "DROP TABLE IF EXISTS knowledge_base;"
––– output –––
OK
––– input –––
mysql -h0 -P9306 << EOF
CREATE TABLE knowledge_base (
    id BIGINT,
    title TEXT,
    summary TEXT,
    content TEXT,
    category TEXT,
    tags TEXT,
    embedding_vector FLOAT_VECTOR
    knn_type='hnsw'
    hnsw_similarity='cosine'
    model_name='sentence-transformers/all-MiniLM-L6-v2'
    from='title,content'
    api_key='\${OPENAI_API_KEY}'
) TYPE='rt';
EOF
––– output –––
OK
––– input –––
mysql -h0 -P9306 << 'EOF'
INSERT INTO knowledge_base (id, title, summary, content, category, tags)
VALUES
(1, 'Getting Started with Manticore', 'Quick start guide', 'Install Manticore Search and configure your first index', 'Tutorial', 'installation,quickstart'),
(2, 'Vector Search Explained', 'Understanding vectors', 'Vector search uses embeddings to find semantically similar documents', 'Concepts', 'vectors,knn,similarity'),
(3, 'Full-Text Search Guide', 'Text search basics', 'Use MATCH() operator for full-text search with ranking', 'Tutorial', 'fulltext,match,ranking'),
(4, 'RAG Architecture', 'RAG pattern overview', 'Retrieval-Augmented Generation combines search with LLM for intelligent responses', 'Advanced', 'rag,llm,ai'),
(5, 'Clustering Setup', 'Distributed search', 'Configure replication and sharding for high availability', 'Advanced', 'cluster,replication,sharding');
EOF
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM knowledge_base;"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE CHAT MODEL fast_assistant (model='openai:gpt-3.5-turbo', max_document_length=0, retrieval_limit=3);"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE CHAT MODEL detailed_assistant (model='openai:gpt-4o', max_document_length=0, retrieval_limit=5);"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE CHAT MODEL creative_assistant (model='openai:gpt-4o-mini', max_document_length=2000, retrieval_limit=4);"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SHOW CHAT MODELS;" | grep -c "openai"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CALL CHAT('Tell me about vector search', 'knowledge_base', 'fast_assistant')\G;" | awk '/^         response: ?/ { r=$0; in_response=1; next } in_response && /^          sources: / { print r; print; in_response=0; next } in_response { r=r " " $0; next } { print } END { if (in_response) print r }' | tee /tmp/output_1.txt
––– output –––
OK
––– input –––
grep 'sources:' /tmp/output_1.txt | grep -o '\[.*\]' | head -1 | { read -r s; if echo "$s" | jq -e 'map(.id) | contains([2,3,4])' >/dev/null 2>&1; then echo "OK"; else echo "FAIL"; echo "Found: $(echo "$s" | jq -c 'map(.id)')"; echo "$s"; fi; }
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CALL CHAT('Explain vector search', 'knowledge_base', 'detailed_assistant')\G;" | awk '/^         response: ?/ { r=$0; in_response=1; next } in_response && /^          sources: / { print r; print; in_response=0; next } in_response { r=r " " $0; next } { print } END { if (in_response) print r }' | tee /tmp/output_2.txt
––– output –––
OK
––– input –––
grep 'sources:' /tmp/output_2.txt | grep -o '\[.*\]' | head -1 | { read -r s; if echo "$s" | jq -e 'map(.id) | contains([2,3,4])' >/dev/null 2>&1; then echo "OK"; else echo "FAIL"; echo "Found: $(echo "$s" | jq -c 'map(.id)')"; echo "$s"; fi; }
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CALL CHAT('Tell me about RAG architecture', 'knowledge_base', 'creative_assistant')\G;" | awk '/^         response: ?/ { r=$0; in_response=1; next } in_response && /^          sources: / { print r; print; in_response=0; next } in_response { r=r " " $0; next } { print } END { if (in_response) print r }' | tee /tmp/output_3.txt
––– output –––
OK
––– input –––
grep 'sources:' /tmp/output_3.txt | grep -o '\[.*\]' | head -1 | { read -r s; if echo "$s" | jq -e 'map(.id) | contains([4])' >/dev/null 2>&1; then echo "OK"; else echo "FAIL"; echo "Found: $(echo "$s" | jq -c 'map(.id)')"; echo "$s"; fi; }
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CALL CHAT('What is clustering?', 'knowledge_base', 'fast_assistant', 'conv-test-1234-5678-90ab-cdef12345678')\G;" | awk '/^         response: ?/ { r=$0; in_response=1; next } in_response && /^          sources: / { print r; print; in_response=0; next } in_response { r=r " " $0; next } { print } END { if (in_response) print r }' | tee /tmp/output_4.txt
––– output –––
OK
––– input –––
grep 'sources:' /tmp/output_4.txt | grep -o '\[.*\]' | head -1 | { read -r s; if echo "$s" | jq -e 'map(.id) | contains([5])' >/dev/null 2>&1; then echo "OK"; else echo "FAIL"; echo "Found: $(echo "$s" | jq -c 'map(.id)')"; echo "$s"; fi; }
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CALL CHAT('How do I configure it?', 'knowledge_base', 'fast_assistant', 'conv-test-1234-5678-90ab-cdef12345678')\G;" | awk '/^         response: ?/ { r=$0; in_response=1; next } in_response && /^          sources: / { print r; print; in_response=0; next } in_response { r=r " " $0; next } { print } END { if (in_response) print r }' | tee /tmp/output_5.txt
––– output –––
OK
––– input –––
grep 'sources:' /tmp/output_5.txt | grep -o '\[.*\]' | head -1 | { read -r s; if echo "$s" | jq -e 'map(.id) | contains([5])' >/dev/null 2>&1; then echo "OK"; else echo "FAIL"; echo "Found: $(echo "$s" | jq -c 'map(.id)')"; echo "$s"; fi; }
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CALL CHAT('What are best practices?', 'knowledge_base', 'detailed_assistant', 'conv-test-1234-5678-90ab-cdef12345678')\G;" | awk '/^         response: ?/ { r=$0; in_response=1; next } in_response && /^          sources: / { print r; print; in_response=0; next } in_response { r=r " " $0; next } { print } END { if (in_response) print r }' | tee /tmp/output_6.txt
––– output –––
OK
––– input –––
grep 'sources:' /tmp/output_6.txt | grep -o '\[.*\]' | head -1 | { read -r s; if echo "$s" | jq -e 'map(.id) | contains([5])' >/dev/null 2>&1; then echo "OK"; else echo "FAIL"; echo "Found: $(echo "$s" | jq -c 'map(.id)')"; echo "$s"; fi; }
––– output –––
- OK
+ FAIL
+ Found: []
+ []
––– input –––
mysql -h0 -P9306 -e "CALL CHAT('Show me tutorials', 'knowledge_base', 'creative_assistant', 'conv-other-aaaa-bbbb-cccc-dddddddddddd')\G;" | awk '/^         response: ?/ { r=$0; in_response=1; next } in_response && /^          sources: / { print r; print; in_response=0; next } in_response { r=r " " $0; next } { print } END { if (in_response) print r }'
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "DROP CHAT MODEL fast_assistant;"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "DROP CHAT MODEL detailed_assistant;"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "DROP CHAT MODEL creative_assistant;"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SHOW CHAT MODELS;"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "DROP TABLE IF EXISTS knowledge_base;"
––– output –––
OK

