awsopensearch add normalize for cosine similarity when indexing data #614
norrishuang wants to merge 2 commits into zilliztech:main from
…call rate with Cohere dataset
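```python
def need_normalize_cosine(self) -> bool:
    # Returns True for cosine datasets, signalling that training and query
    # vectors must be normalized before indexing and searching.
    if self.case_config.metric_type.upper() == "COSINE":
        log.info("cosine dataset need normalize.")
        return True
    return False
```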
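The check above only returns a flag; the normalization itself has to happen wherever the benchmark loads training and query data. A minimal sketch of that step, assuming vectors are plain Python lists of floats. `l2_normalize` and `prepare_vectors` are hypothetical names chosen for illustration, not the PR's actual implementation:

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(vec: Vector) -> Vector:
    """Return a copy of `vec` scaled to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def prepare_vectors(vectors: List[Vector], need_normalize: bool) -> List[Vector]:
    """Hypothetical helper: normalize vectors only when the client signalled
    (e.g. via need_normalize_cosine()) that cosine must be emulated with L2/IP;
    otherwise pass copies through unchanged."""
    if not need_normalize:
        return [list(v) for v in vectors]
    return [l2_normalize(v) for v in vectors]


# The same flag must be applied to both the indexed data and the queries.
train = prepare_vectors([[3.0, 4.0], [0.0, 2.0]], need_normalize=True)
queries = prepare_vectors([[1.0, 1.0]], need_normalize=True)
print(train)  # [[0.6, 0.8], [0.0, 1.0]]
```

Normalizing only one side (for example, the indexed data but not the queries) would break the cosine equivalence, which is why the flag is applied to both.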
The `need_normalize_cosine` function is intended for databases that do NOT natively support the cosine similarity metric. When it returns True for a cosine-based dataset, all training and query vectors are L2-normalized before the test runs. For unit-length vectors, cosine similarity equals the inner product (IP), and ranking by L2 (Euclidean) distance gives the same order as ranking by cosine similarity, so the database can be searched with L2 or IP and still produce accurate recall calculations. This is a workaround for when cosine similarity is not directly supported.
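The equivalence rests on a standard identity: once vectors a and b are scaled to unit length, their inner product equals cos(a, b), and ‖a − b‖² = 2 − 2·cos(a, b). Ranking by inner product (highest first) or by L2 distance (lowest first) therefore yields the same order as ranking by cosine similarity. The following minimal plain-Python sketch checks this on made-up toy vectors; the helper names are illustrative and not part of the benchmark's code:

```python
import math
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vector) -> float:
    return math.sqrt(dot(a, a))


def cosine(a: Vector, b: Vector) -> float:
    return dot(a, b) / (norm(a) * norm(b))


def normalize(a: Vector) -> Vector:
    n = norm(a)
    return [x / n for x in a]


def l2_squared(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank(query: Vector, docs: List[Vector], score: Callable[[Vector, Vector], float]) -> List[int]:
    """Return document indices sorted by ascending score (lower = better)."""
    return sorted(range(len(docs)), key=lambda i: score(query, docs[i]))


# Toy data, made up for illustration.
query = [1.0, 2.0, 0.5]
docs = [[2.0, 1.0, 0.0], [0.5, 3.0, 1.0], [-1.0, 0.5, 2.0], [4.0, 8.0, 2.1]]

# Ground truth: cosine similarity on the raw vectors (negated so lower = better).
by_cosine = rank(query, docs, lambda a, b: -cosine(a, b))

# Workaround: normalize once, then use a metric the database does support.
q_unit = normalize(query)
docs_unit = [normalize(d) for d in docs]
by_ip = rank(q_unit, docs_unit, lambda a, b: -dot(a, b))
by_l2 = rank(q_unit, docs_unit, l2_squared)

print(by_cosine, by_ip, by_l2)  # [3, 1, 0, 2] [3, 1, 0, 2] [3, 1, 0, 2]
```

Because normalization discards vector magnitudes, it is harmless for a cosine workload but would change results for a genuine IP or L2 workload, so it should only be triggered for cosine datasets.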
For databases that already support cosine similarity, we do not recommend enabling this feature. The goal is to maintain consistent benchmarking across different databases and ensure fair testing conditions.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: norrishuang. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files: Approvers can indicate their approval by writing
awsopensearch add normalize for cosine similarity, solved issue of recall rate with Cohere dataset