fix: [2.6] guard facade APIs from exceptions by foxspy · Pull Request #1685 · zilliztech/knowhere

foxspy · 2026-06-18T09:42:04Z

issue: #1674

Summary

Add guarded facade calls so C++ exceptions are converted to error-code based results.
Mark public facade APIs as noexcept and preserve existing returned status values.
Add status-category helpers and unit coverage for error-code behavior.

Signed-off-by: xianliang.li <xianliang.li@zilliz.com> (cherry picked from commit 5d2a2fb)

sre-ci-robot · 2026-06-18T09:42:10Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: foxspy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [foxspy]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

liliu-z · 2026-06-22T11:15:20Z

+            return std::invoke(std::forward<Func>(func), std::forward<Args>(args)...);
+        }
+    } catch (const std::bad_alloc& e) {
+        return detail::GuardedCallFailure<Result>(Status::malloc_error,


The failure-construction path here can itself allocate, so under memory pressure that allocation throws and the exception escapes the guard, reaching terminate — a hole in the exact no-throw guarantee this PR exists to provide. On OOM, a guarded call that should have returned a clean error status instead crashes the whole process. Make the failure path allocation-free / noexcept so it cannot throw.

liliu-z · 2026-06-22T11:15:21Z

+        // when index is mutable, it could happen that data count larger than bitset size, see
+        // https://github.com/zilliztech/knowhere/issues/70
+        // so something must be wrong at caller side when passed bitset size larger than data count
+        if (bitset_.size() > (size_t)this->Count()) {


Wrapping Count() in the guard collapses an internal failure to 0, which is then surfaced as invalid_args/input_error (same pattern in AnnIterator and RangeSearch). This inverts the error category — an inner/server-side failure is misreported as a user-input error, so downstream Milvus makes the wrong retry-vs-reject decision instead of seeing the real error. Propagate the underlying error and category rather than degrading the result to 0.

liliu-z · 2026-06-22T11:15:22Z

-        auto cfg = this->node->CreateConfig();
-        RETURN_IF_ERROR(LoadConfig(cfg.get(), json, knowhere::TRAIN, "Build"));
+Index<T>::BuildAsync(const DataSetPtr dataset, const Json& json, const std::chrono::seconds timeout) noexcept {
+    return GuardedCall([&]() -> std::shared_ptr<Interrupt> {


BuildAsync now returns GuardedCall([&]() -> std::shared_ptr<Interrupt> {...}) (Cardinal at index.cc:43, non-Cardinal at index.cc:70). Because the return type is a bare shared_ptr with no error channel, a throw from make_shared<Interrupt> (OOM) or pool->push (e.g. a shut-down pool) takes the default-constructible branch of GuardedFailure and yields a null pointer; pre-PR these propagated as exceptions. The usual caller pattern BuildAsync(...)->Get() then dereferences null and crashes, converting a catchable exception into a segfault. Return an error-carrying type, or confirm the Milvus caller null-checks the returned Interrupt and document that it can be null.

liliu-z · 2026-06-22T11:15:24Z

+        case knowhere::Status::type_conflict_in_json:
+        case knowhere::Status::invalid_metric_type:
+        case knowhere::Status::empty_index:
+        case knowhere::Status::not_implemented:


index_not_trained/index_already_trained map to input_error (:87-89) and timeout maps to inner_error (:106), and that category directly drives the retry-vs-reject decision. These are state/deadline conditions rather than obvious user-input, so confirm the mapping matches intended Milvus semantics — otherwise a not-trained or timeout outcome could be retried or rejected incorrectly. This is a verify-before-merge sign-off, not a confirmed bug.

liliu-z · 2026-06-22T11:15:25Z


 template <typename T>
-class expected {
+class KNOWHERE_NODISCARD expected {


Marking expected<T> [[nodiscard]] (via KNOWHERE_NODISCARD) is a real compile-surface change, and the in-repo test_diskann.cc fix shows discard sites exist. But knowhere's own build will not break: host C++ -Werror is commented out (compile_flags.cmake:19) so a discard is only a warning, and the GPU build suppresses host warnings via -Xcompiler=-w / --diag-suppress (compile_flags.cmake:31-33), so the cuVS -Werror=all-warnings path is unaffected. The only real risk is downstream Milvus if it compiles with -Werror — sweep it for discarded expected<T> returns before merge.

liliu-z · 2026-06-22T11:15:26Z

-    LOG_KNOWHERE_WARNING_ << "unhandled create config for indexType: " << indexType;
-    return std::make_unique<BaseConfig>();
+IndexStaticFaced<DataType>::CreateConfig(const IndexType& indexType, const IndexVersion& version) noexcept {
+    return GuardedCall([&]() -> std::unique_ptr<BaseConfig> {


IndexStaticFaced::CreateConfig is now noexcept and wraps its body in GuardedCall, so on any exception it returns a null unique_ptr<BaseConfig> (the default-constructible branch). In-repo callers are safe — they route the result through LoadStaticConfig, which this PR teaches to null-check — but the public static facade now hands nullptr to external (Milvus) callers that previously received an exception; any CreateConfig(...)->... dereference becomes a null-deref crash. This is the same null-sentinel hazard flagged for BuildAsync; return an error-carrying type or guarantee/document a non-null default.

fix: guard facade APIs from exceptions

157deba

Signed-off-by: xianliang.li <xianliang.li@zilliz.com> (cherry picked from commit 5d2a2fb)

foxspy added the kind/bug This PR is a bug fix label Jun 18, 2026

sre-ci-robot requested review from chasingegg and cqy123456 June 18, 2026 09:42

sre-ci-robot added approved size/XXL labels Jun 18, 2026

mergify Bot added the dco-passed label Jun 18, 2026

liliu-z reviewed Jun 22, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: [2.6] guard facade APIs from exceptions#1685

fix: [2.6] guard facade APIs from exceptions#1685
foxspy wants to merge 1 commit into
zilliztech:2.6from
foxspy:backport-pr-1675-to-2.6

foxspy commented Jun 18, 2026

Uh oh!

sre-ci-robot commented Jun 18, 2026

Uh oh!

liliu-z Jun 22, 2026

Uh oh!

liliu-z Jun 22, 2026

Uh oh!

liliu-z Jun 22, 2026

Uh oh!

liliu-z Jun 22, 2026

Uh oh!

liliu-z Jun 22, 2026

Uh oh!

liliu-z Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

foxspy commented Jun 18, 2026

Summary

Uh oh!

sre-ci-robot commented Jun 18, 2026

Uh oh!

liliu-z Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

liliu-z Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

liliu-z Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

liliu-z Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

liliu-z Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

liliu-z Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants