enhance: knowhere support metric mhjaccard and index minhash lsh for minhash vector #1207
@cqy123456 🔍 Important: PR Classification Needed! For efficient project management and a seamless review process, it's essential to classify your PR correctly. Here's how:
For any PR outside the kind/improvement category, ensure you link to the associated issue using the format: "issue: #". Thanks for your efforts and contribution to the community!
@cqy123456 what's the difference between this PR and #1203?
There were some problems in #1203, so it was rolled back in the release version.
if (!file.is_open()) {
    throw std::runtime_error("fail to open file: " + bin_file);
}
file.read(reinterpret_cast<char*>(&u32_rows), sizeof(uint32_t));
Maybe add a check that the requested number of bytes was actually read, to ensure that the file is not truncated.
    throw std::runtime_error("fail to open file: " + bin_file);
}
file.read(reinterpret_cast<char*>(&u32_rows), sizeof(uint32_t));
file.read(reinterpret_cast<char*>(&u32_dim), sizeof(uint32_t));
}
uint32_t n, d;
file.read(reinterpret_cast<char*>(&n), sizeof(uint32_t));
file.read(reinterpret_cast<char*>(&d), sizeof(uint32_t));
const char* y1_b = y1 + element_size * i;
const char* y2_b = y2 + element_size * i;
const char* y3_b = y3 + element_size * i;
dis0 += (std::memcmp(x_b, y0_b, element_size) == 0);
(std::memcmp() == 0) ? 1 : 0 please
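For illustration, the requested form makes the 0/1 increment explicit instead of relying on the implicit `bool` to integer conversion. The function below is a simplified, hypothetical single-pair version of the loop (`CountEqualBands` and its parameters are assumptions, not the PR's code):

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from this PR): counts how many of the n
// element_size-byte bands of x are byte-identical to the same band of y.
inline std::size_t
CountEqualBands(const char* x, const char* y, std::size_t element_size, std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const char* x_b = x + element_size * i;
        const char* y_b = y + element_size * i;
        // Explicit ternary: add 1 on a match, 0 otherwise.
        count += (std::memcmp(x_b, y_b, element_size) == 0) ? 1 : 0;
    }
    return count;
}
```

Both spellings give the same result in C++; the ternary only makes the intent easier to see when reading.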
template <typename DataType>
expected<DataSetPtr>
MinHashLSHNode<DataType>::Search(const DataSetPtr dataset, std::unique_ptr<Config> cfg,
    .description("hash code is in ann-rss or in mmap file.")
    .set_default(true)
    .for_deserialize();
KNOWHERE_CONFIG_DECLARE_FIELD(shared_bloom_filter)
Just to discuss: if the bloom filter is constructed in the load phase, will it occupy too much CPU?
It uses 1 CPU: no OpenMP and no thread pool.
The upper layer's load behavior is concurrent.
…minhash vector Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: cqy123456, foxspy
/kind improvement
issue: milvus-io/milvus#41746