feat: Add reciprocal rank fusion to MultiRetriever#11220
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub. 1 Skipped Deployment
|
Coverage reportClick to see where and how coverage changed
This report was generated by python-coverage-comment-action |
||||||||||||||||||||||||||||||||||||||||||||||||
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. |
bogdankostic
left a comment
There was a problem hiding this comment.
Looking good in general, let's just distinguish the same name _reciprocal_rank_fusion for the utility function and the method of DocumentJoiner.
Also, in DocumentJoiner we allow to set weights for the different retrieval results, should we support that for MultiRetriever as well?
| @@ -196,32 +197,8 @@ def _merge(self, document_lists: list[list[Document]]) -> list[Document]: | |||
| def _reciprocal_rank_fusion(self, document_lists: list[list[Document]]) -> list[Document]: | |||
There was a problem hiding this comment.
The name of the method shadows the name of the imported utility function, maybe we can have different names for both for better readability?
I think it's fine to leave out for now and we can add it later if people ask for it. I expect that most users will be sending these docs downstream to a ranker component anyways. |
Related Issues
MultiRetrieverandTextEmbeddingRetriever#11219 (review)Proposed Changes:
Add
join_modeparam toMultiRetrieverlike inDocumentJoinerwith the default set to rank and de-duplicate via reciprocal rank fusion. Makes the component's runs more consistent and reproducible and produces a decent ranking without needing to specify any additional rankers. RRF must be baked in and can't be done downstream b/c the returned list is flattened so the info needed for RRF is lost.Created new util method to reuse in both components
How did you test it?
Added new unit tests
Notes for the reviewer
Checklist
fix:,feat:,build:,chore:,ci:,docs:,style:,refactor:,perf:,test:and added!in case the PR includes breaking changes.