Skip to content

enhance: knowhere support metric mhjaccard and index minhash lsh for minhash vector#1207

Merged
sre-ci-robot merged 1 commit into
zilliztech:mainfrom
cqy123456:mh-support
Jun 6, 2025
Merged

enhance: knowhere support metric mhjaccard and index minhash lsh for minhash vector#1207
sre-ci-robot merged 1 commit into
zilliztech:mainfrom
cqy123456:mh-support

Conversation

@cqy123456
Copy link
Copy Markdown
Collaborator

@cqy123456 cqy123456 commented May 29, 2025

@mergify
Copy link
Copy Markdown

mergify Bot commented May 29, 2025

@cqy123456 🔍 Important: PR Classification Needed!

For efficient project management and a seamless review process, it's essential to classify your PR correctly. Here's how:

  1. If you're fixing a bug, label it as kind/bug.
  2. For small tweaks (less than 20 lines without altering any functionality), please use kind/improvement.
  3. Significant changes that don't modify existing functionalities should be tagged as kind/enhancement.
  4. Adjusting APIs or changing functionality? Go with kind/feature.

For any PR outside the kind/improvement category, ensure you link to the associated issue using the format: “issue: #”.

Thanks for your efforts and contribution to the community!.

@alexanderguzhva
Copy link
Copy Markdown
Collaborator

@cqy123456 what's the difference between this PR and #1203 ?

@cqy123456
Copy link
Copy Markdown
Collaborator Author

There were some problems in #1203, so it was rolled back in the release version.

@mergify mergify Bot added the ci-passed label May 30, 2025
Comment thread include/knowhere/utils.h
if (!file.is_open()) {
throw std::runtime_error("fail to open file: " + bin_file);
}
file.read(reinterpret_cast<char*>(&u32_rows), sizeof(uint32_t));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe, I would add a check whether it was possible to read the requested number of bytes, so in case to ensure that the file is not truncated.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread include/knowhere/utils.h
throw std::runtime_error("fail to open file: " + bin_file);
}
file.read(reinterpret_cast<char*>(&u32_rows), sizeof(uint32_t));
file.read(reinterpret_cast<char*>(&u32_dim), sizeof(uint32_t));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread include/knowhere/utils.h
Comment thread include/knowhere/utils.h
}
uint32_t n, d;
file.read(reinterpret_cast<char*>(&n), sizeof(uint32_t));
file.read(reinterpret_cast<char*>(&d), sizeof(uint32_t));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread include/knowhere/utils.h
Comment thread src/index/minhash/minhash_index_node.cc
Comment thread src/index/minhash/minhash_lsh.h
Comment thread src/index/minhash/minhash_lsh.h
Comment thread src/index/minhash/minhash_lsh.h
Comment thread src/index/minhash/minhash_util.cc Outdated
const char* y1_b = y1 + element_size * i;
const char* y2_b = y2 + element_size * i;
const char* y3_b = y3 + element_size * i;
dis0 += (std::memcmp(x_b, y0_b, element_size) == 0);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(std::memcmp() == 0) ? 1 : 0 please

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread include/knowhere/comp/index_param.h Outdated
Comment thread src/index/minhash/minhash_index_node.cc
Comment thread src/index/minhash/minhash_index_node.cc

template <typename DataType>
expected<DataSetPtr>
MinHashLSHNode<DataType>::Search(const DataSetPtr dataset, std::unique_ptr<Config> cfg,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same. Add exception path

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread src/index/minhash/minhash_index_node.cc
Comment thread src/index/minhash/minhash_lsh_config.h Outdated
.description("hash code is in ann-rss or in mmap file.")
.set_default(true)
.for_deserialize();
KNOWHERE_CONFIG_DECLARE_FIELD(shared_bloom_filter)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to discuss, if the bloom filter is constructed in the load phase, it will occupy too much CPU?

Copy link
Copy Markdown
Collaborator Author

@cqy123456 cqy123456 Jun 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 cpu, no omp and thread pool

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The upper layer load behavior is concurrent

…minhash vector

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
Copy link
Copy Markdown
Collaborator

@foxspy foxspy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@sre-ci-robot
Copy link
Copy Markdown
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cqy123456, foxspy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@foxspy
Copy link
Copy Markdown
Collaborator

foxspy commented Jun 6, 2025

/kind improvement

@sre-ci-robot sre-ci-robot merged commit cb19847 into zilliztech:main Jun 6, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants