Commit 8903b31

feat(faiss): add INDEX_FAISS — vanilla faiss adapter IndexNode
Adds a new IndexNode registered as INDEX_FAISS = "FAISS" that acts as a thin adapter over upstream (vanilla) faiss's index_factory DSL. Users select the concrete faiss index via faiss_index_name; all other knobs are forwarded verbatim to faiss's own ParameterSpace (build) and per-family SearchParametersXxx (search), so any param faiss accepts works without per-index Knowhere wrapper code.

Framework change:
- BaseConfig::CaptureRawJson(const Json&) virtual hook (default no-op) called from LoadConfig between FormatAndCheck and Config::Load. Lets FaissConfig keep a snapshot of the JSON before Config::Load drops keys it doesn't recognize.

Adapter (src/index/faiss/):
- FaissConfig: only faiss_index_name is typed-declared; other JSON keys land in raw_params for forwarding.
- faiss_dispatch: validates each raw key against faiss-owned whitelists (supported_build_param_names + quantizer_* prefix; supported_search_params per index family), coerces JSON values (accepts stringified numbers/booleans to match Knowhere's native FormatAndCheck leniency), and delegates family-specific field setters to the upstream-bound helper.
- FaissIndexNode<DataType> template instantiated for fp32 and bin1. Implements Build, Search (+ BitsetView), Serialize / Deserialize (+ IO_FLAG_MMAP), capability-probed RangeSearch / GetVectorByIds / HasRawData, and a coarse Size() estimate. AnnIterator and CalcDistByIDs inherit not_implemented from the base class. fp16 / bf16 / int8 / sparse are not registered.

Upstream-bound helper (thirdparty/faiss/faiss/cppcontrib/knowhere/):
- SearchParamsDispatch.{h,cpp}: faiss-types-only API with a make_search_params factory (recurses into PreTransform / Refine), a try_set_search_param setter (walks nested wrappers, dispatches to IVF / HNSW / PQ / SVS Vamana), and whitelist queries (supported_search_params, is_supported_build_param). MIT-licensed; no nlohmann::json or Knowhere symbols leaked in. Candidate for a future upstream faiss PR.

Behavior contract:
- Unknown faiss knobs surface as invalid_args with the offending key in the error message (not silently dropped).
- Stringified numeric / boolean values are coerced, matching Knowhere's native Config::FormatAndCheck leniency.
- Concurrent searches use per-request SearchParametersXxx, so no shared index state is mutated.
- SVS Vamana support is compiled in when FAISS_ENABLE_SVS is defined (e.g. x86 image builds); otherwise the SVS code paths are omitted cleanly.

Tests (tests/ut/test_faiss_vanilla.cc, tag [faiss_vanilla]):
22 Catch2 cases / 136 assertions covering: config capture, factory creation, Flat / IVF / HNSW build+search, BitsetView filtering, serialize / deserialize roundtrip, range search capability probe, GetVectorByIds capability probe, binary BIVF path, OPQ+IVF+PQ (PreTransform recursion), IVF+Refine wrapper, standalone PQ, stringified-value coercion, error surfacing (invalid factory, typo, unknown family knob), Size() estimate, concurrent search isolation. FAISS_ENABLE_SVS builds pick up one additional SVS Vamana test.

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
1 parent d46b58c commit 8903b31

12 files changed

Lines changed: 1558 additions & 0 deletions

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -79,3 +79,7 @@ graph_info.json
 
 # Claude Code plans (local only)
 docs/plans/
+docs/superpowers/
+
+# Test artifacts (Catch2 tests write serialized indexes into CWD)
+*.index

include/knowhere/comp/index_param.h

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ namespace IndexEnum {
 constexpr const char* INVALID = "";
 
 constexpr const char* INDEX_FAISS_BIN_IDMAP = "BIN_FLAT";
+constexpr const char* INDEX_FAISS = "FAISS";
 constexpr const char* INDEX_FAISS_BIN_IVFFLAT = "BIN_IVF_FLAT";
 
 constexpr const char* INDEX_FAISS_IDMAP = "FLAT";

include/knowhere/config.h

Lines changed: 9 additions & 0 deletions
@@ -655,6 +655,15 @@ class BaseConfig : public Config {
     CFG_INT lemur_seed; // random seed for LEMUR
     CFG_INT lemur_num_layers; // number of layers in feature_extractor
     CFG_BOOL emb_list_rerank; // whether to perform MaxSim reranking after ANN search
+
+    /// Optional hook: runs after FormatAndCheck and before Config::Load consumes typed
+    /// fields. Used by FaissConfig to capture the raw JSON verbatim for pass-through to
+    /// faiss's ParameterSpace. Default is a no-op; do NOT override unless you need raw
+    /// JSON (most configs should rely on KNOWHERE_CONFIG_DECLARE_FIELD).
+    virtual void
+    CaptureRawJson(const Json& /*json*/) {
+    }
+
     KNOHWERE_DECLARE_CONFIG(BaseConfig) {
         KNOWHERE_CONFIG_DECLARE_FIELD(dim).allow_empty_without_default().description("vector dim").for_train();
         KNOWHERE_CONFIG_DECLARE_FIELD(metric_type)

include/knowhere/index/index_table.h

Lines changed: 3 additions & 0 deletions
@@ -28,6 +28,9 @@ static std::set<std::pair<std::string, VecType>> legal_knowhere_index = {
     {IndexEnum::INDEX_FAISS_IDMAP, VecType::VECTOR_BFLOAT16},
     // {IndexEnum::INDEX_FAISS_IDMAP, VecType::VECTOR_INT8},
 
+    {IndexEnum::INDEX_FAISS, VecType::VECTOR_FLOAT},
+    {IndexEnum::INDEX_FAISS, VecType::VECTOR_BINARY},
+
     {IndexEnum::INDEX_FAISS_IVFFLAT, VecType::VECTOR_FLOAT},
     {IndexEnum::INDEX_FAISS_IVFFLAT, VecType::VECTOR_FLOAT16},
     {IndexEnum::INDEX_FAISS_IVFFLAT, VecType::VECTOR_BFLOAT16},

src/index/faiss/faiss.cc

Lines changed: 428 additions & 0 deletions
Large diffs are not rendered by default.

src/index/faiss/faiss_config.h

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+// Copyright (C) 2019-2026 Zilliz. All rights reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License"); you may not use this
+// file except in compliance with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software distributed under
+// the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
+// ANY KIND, either express or implied. See the License for the specific language
+// governing permissions and limitations under the License.
+
+#pragma once
+
+#include "knowhere/config.h"
+
+namespace knowhere {
+
+class FaissConfig : public BaseConfig {
+ public:
+    // Required. faiss DSL understood by faiss::index_factory (fp32) or
+    // faiss::index_binary_factory (bin1). Examples: "Flat", "IVF1024,PQ16x8",
+    // "HNSW32,Flat", "BIVF256,Hamming".
+    CFG_STRING faiss_index_name;
+
+    // Captured subset of the incoming JSON: only keys that this config's __DICT__
+    // does NOT declare (i.e. not owned by Knowhere's native config layer). Those are
+    // the keys the vanilla faiss adapter forwards to faiss::ParameterSpace
+    // (build) and per-family SearchParametersXxx (search). Declared keys (k,
+    // metric_type, trace_id, faiss_index_name, ...) are consumed by Config::Load
+    // into typed fields and therefore filtered out of raw_params at capture time.
+    Json raw_params;
+
+    KNOHWERE_DECLARE_CONFIG(FaissConfig) {
+        KNOWHERE_CONFIG_DECLARE_FIELD(faiss_index_name)
+            .description("faiss factory string, e.g. \"IVF1024,PQ16x8\"")
+            .allow_empty_without_default()
+            .for_train()
+            .for_deserialize()
+            .for_deserialize_from_file();
+    }
+
+    void
+    CaptureRawJson(const Json& json) override {
+        raw_params = Json::object();
+        for (auto it = json.begin(); it != json.end(); ++it) {
+            // Skip any key already declared as a typed field on BaseConfig or
+            // FaissConfig; those are Knowhere's own and will be consumed by
+            // Config::Load. Everything else is a faiss-bound knob we forward.
+            if (__DICT__.count(it.key()) == 0) {
+                raw_params[it.key()] = it.value();
+            }
+        }
+    }
+};
+
+}  // namespace knowhere
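The capture step in FaissConfig::CaptureRawJson can be illustrated without Knowhere or nlohmann::json. The following is a minimal sketch, assuming a flat string map in place of Json and a plain std::set in place of __DICT__; ToyJson and capture_undeclared are hypothetical names used only for illustration and are not part of the commit.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for knowhere::Json: a flat key -> raw-value map.
using ToyJson = std::map<std::string, std::string>;

// Keep only the keys that the typed config layer does NOT declare.
// `declared` plays the role of __DICT__ in FaissConfig (an assumption of
// this sketch; the real __DICT__ maps names to field descriptors).
inline ToyJson
capture_undeclared(const ToyJson& json, const std::set<std::string>& declared) {
    ToyJson raw_params;
    for (const auto& kv : json) {
        if (declared.count(kv.first) == 0) {
            raw_params[kv.first] = kv.second;  // faiss-bound knob, forward it
        }
    }
    return raw_params;
}
```

With declared keys {"k", "metric_type", "faiss_index_name"} and an input that also carries "nprobe", only "nprobe" would end up in the returned map, which is the behavior the raw_params comment describes.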

src/index/faiss/faiss_dispatch.cc

Lines changed: 171 additions & 0 deletions
@@ -0,0 +1,171 @@
+// Copyright (C) 2019-2026 Zilliz. All rights reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License"); you may not use this
+// file except in compliance with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software distributed under
+// the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
+// ANY KIND, either express or implied. See the License for the specific language
+// governing permissions and limitations under the License.
+
+#include "index/faiss/faiss_dispatch.h"
+
+#include <faiss/AutoTune.h>
+#include <faiss/Index.h>
+#include <faiss/IndexBinary.h>
+#include <faiss/cppcontrib/knowhere/SearchParamsDispatch.h>
+#include <faiss/impl/FaissException.h>
+#include <faiss/impl/IDSelector.h>
+
+namespace knowhere::faiss_vanilla {
+
+namespace {
+
+// Coerce a json value into a double for faiss consumption. Accepts:
+//   - numbers: e.g. 16, 16.0 -> 16.0
+//   - booleans: true / false -> 1.0 / 0.0
+//   - stringified numbers: "16" -> 16.0
+//   - stringified booleans: "true" -> 1.0
+// Rejects arrays, objects, null, and unparseable strings. Matches the spirit of
+// knowhere::Config::FormatAndCheck's string-to-typed coercion for declared fields,
+// so forwarded keys behave consistently with native Knowhere keys.
+Status
+coerce_to_double(const Json& v, const std::string& key, double* out, std::string* err_msg) {
+    if (v.is_number()) {
+        *out = v.get<double>();
+        return Status::success;
+    }
+    if (v.is_boolean()) {
+        *out = v.get<bool>() ? 1.0 : 0.0;
+        return Status::success;
+    }
+    if (v.is_string()) {
+        const std::string s = v.get<std::string>();
+        if (s == "true") {
+            *out = 1.0;
+            return Status::success;
+        }
+        if (s == "false") {
+            *out = 0.0;
+            return Status::success;
+        }
+        try {
+            size_t pos = 0;
+            double parsed = std::stod(s, &pos);
+            if (pos == s.size()) {
+                *out = parsed;
+                return Status::success;
+            }
+        } catch (const std::invalid_argument&) {
+        } catch (const std::out_of_range&) {
+        }
+    }
+    if (err_msg) {
+        *err_msg = "faiss vanilla: param '" + key + "' expects a number or boolean; got " + v.dump();
+    }
+    return Status::invalid_args;
+}
+
+// Apply every key in raw_params to the faiss index. raw_params has already been
+// filtered by FaissConfig::CaptureRawJson to exclude keys owned by Knowhere's own
+// config layer (fields declared via KNOWHERE_CONFIG_DECLARE_FIELD). We pre-validate
+// the remaining keys against the faiss-owned whitelist (supported_build_param_names
+// + "quantizer_*" prefix handling) before calling ParameterSpace. A key that fails
+// the whitelist (typo, non-faiss param) is rejected with a clear error; a key that
+// passes the whitelist but is incompatible with the concrete index type (e.g.
+// nprobe on an HNSW) is still caught by ParameterSpace's exception and surfaced
+// as invalid_args.
+template <typename IndexT>
+Status
+apply_impl(IndexT* index, const Json& raw_params, std::string* err_msg) {
+    ::faiss::ParameterSpace ps;
+    for (auto it = raw_params.begin(); it != raw_params.end(); ++it) {
+        const std::string& key = it.key();
+        if (!::faiss::cppcontrib::knowhere::is_supported_build_param(key)) {
+            if (err_msg) {
+                *err_msg = "faiss vanilla: build param '" + key + "' is not recognized";
+            }
+            return Status::invalid_args;
+        }
+        double val = 0.0;
+        auto cst = coerce_to_double(it.value(), key, &val, err_msg);
+        if (cst != Status::success) {
+            return cst;
+        }
+        try {
+            ps.set_index_parameter(index, key, val);
+        } catch (const ::faiss::FaissException& e) {
+            if (err_msg) {
+                *err_msg = std::string("faiss rejected param '") + key + "': " + e.what();
+            }
+            return Status::invalid_args;
+        }
+    }
+    return Status::success;
+}
+
+// Shared logic for search-param builders. `index` can be faiss::Index* or IndexBinary*.
+// raw_params has already been filtered by FaissConfig::CaptureRawJson to contain only
+// keys NOT declared by Knowhere's typed config. Uses the faiss-owned whitelist
+// (supported_search_params) to validate remaining keys, and delegates both the
+// SearchParameters-family selection and the per-name field set to the upstream
+// helper. Knowhere layer only adds: (1) sel attach, (2) JSON->double conversion,
+// (3) clear error wording.
+template <typename IndexT>
+Status
+build_search_params_impl(const IndexT* index, const Json& raw_params, ::faiss::IDSelector* sel,
+                         std::unique_ptr<::faiss::SearchParameters>* out, std::string* err_msg) {
+    auto params = ::faiss::cppcontrib::knowhere::make_search_params(index);
+    params->sel = sel;
+
+    const auto supported = ::faiss::cppcontrib::knowhere::supported_search_params(index);
+    for (auto it = raw_params.begin(); it != raw_params.end(); ++it) {
+        const std::string& key = it.key();
+        if (!supported.count(key)) {
+            if (err_msg) {
+                *err_msg = "faiss vanilla: search param '" + key + "' not supported for this index family";
+            }
+            return Status::invalid_args;
+        }
+        double val = 0.0;
+        auto cst = coerce_to_double(it.value(), key, &val, err_msg);
+        if (cst != Status::success) {
+            return cst;
+        }
+        // Whitelist already guarantees try_set_search_param returns true; treat a
+        // false here as an invariant breach rather than user error.
+        (void)::faiss::cppcontrib::knowhere::try_set_search_param(params.get(), key, val);
+    }
+    *out = std::move(params);
+    return Status::success;
+}
+
+}  // namespace
+
+Status
+apply_build_params(::faiss::Index* index, const Json& raw_params, std::string* err_msg) {
+    return apply_impl(index, raw_params, err_msg);
+}
+
+Status
+apply_build_params(::faiss::IndexBinary* index, const Json& raw_params, std::string* err_msg) {
+    return apply_impl(index, raw_params, err_msg);
+}
+
+Status
+build_search_params(const ::faiss::Index* index, const Json& raw_params, ::faiss::IDSelector* sel,
+                    std::unique_ptr<::faiss::SearchParameters>* out, std::string* err_msg) {
+    return build_search_params_impl(index, raw_params, sel, out, err_msg);
+}
+
+Status
+build_search_params(const ::faiss::IndexBinary* index, const Json& raw_params, ::faiss::IDSelector* sel,
+                    std::unique_ptr<::faiss::SearchParameters>* out, std::string* err_msg) {
+    // IndexBinaryIVF requires SearchParametersIVF; binary side also does not honor
+    // IDSelector, so attaching sel here is typically a no-op at search time.
+    return build_search_params_impl(index, raw_params, sel, out, err_msg);
+}
+
+}  // namespace knowhere::faiss_vanilla
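The string branch of coerce_to_double can be exercised in isolation. The sketch below reproduces only that branch using the standard library; parse_scalar_string is a hypothetical name, and the bool-plus-out-parameter signature is an assumption chosen to keep the example free of Knowhere's Status and Json types.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Parse a stringified boolean or number into *out. Returns false on failure.
// Sketch only: the commit's coerce_to_double takes a Json value and reports
// a knowhere::Status plus an error message instead.
inline bool
parse_scalar_string(const std::string& s, double* out) {
    if (s == "true") {
        *out = 1.0;
        return true;
    }
    if (s == "false") {
        *out = 0.0;
        return true;
    }
    try {
        std::size_t pos = 0;
        const double parsed = std::stod(s, &pos);
        // Require the whole string to be consumed so that "16abc" is rejected.
        if (pos == s.size()) {
            *out = parsed;
            return true;
        }
    } catch (const std::invalid_argument&) {
        // No numeric prefix at all, e.g. "abc" or the empty string.
    } catch (const std::out_of_range&) {
        // Magnitude does not fit in a double, e.g. "1e999".
    }
    return false;
}
```

One design consequence worth knowing: std::stod skips leading whitespace and also accepts forms such as "inf", "nan", and hexadecimal floats, so a caller that wants to reject those would need an extra check on top of the full-consumption test.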

src/index/faiss/faiss_dispatch.h

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+// Copyright (C) 2019-2026 Zilliz. All rights reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License"); you may not use this
+// file except in compliance with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software distributed under
+// the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
+// ANY KIND, either express or implied. See the License for the specific language
+// governing permissions and limitations under the License.
+
+#pragma once
+
+#include <memory>
+#include <string>
+
+#include "knowhere/config.h"
+
+namespace faiss {
+struct Index;
+struct IndexBinary;
+struct IDSelector;
+struct SearchParameters;
+}  // namespace faiss
+
+namespace knowhere::faiss_vanilla {
+
+// Forwards keys from raw_params to faiss::ParameterSpace::set_index_parameter
+// on the given index. Converts faiss exceptions into Status::invalid_args with the
+// faiss message in *err_msg.
+Status
+apply_build_params(::faiss::Index* index, const Json& raw_params, std::string* err_msg);
+
+Status
+apply_build_params(::faiss::IndexBinary* index, const Json& raw_params, std::string* err_msg);
+
+// Build a per-request SearchParameters* appropriate for the concrete faiss index
+// family. The family dispatch itself lives in faiss::cppcontrib::knowhere (upstream-
+// bound helper); this wrapper adds: (1) sel assignment, (2) framework-key filtering,
+// (3) JSON value extraction + unknown-key error surfacing.
+Status
+build_search_params(const ::faiss::Index* index, const Json& raw_params, ::faiss::IDSelector* sel,
+                    std::unique_ptr<::faiss::SearchParameters>* out, std::string* err_msg);
+
+Status
+build_search_params(const ::faiss::IndexBinary* index, const Json& raw_params, ::faiss::IDSelector* sel,
+                    std::unique_ptr<::faiss::SearchParameters>* out, std::string* err_msg);
+
+}  // namespace knowhere::faiss_vanilla
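The build-side whitelist check (supported_build_param_names plus the quantizer_* prefix) lives in SearchParamsDispatch, whose diff is not shown on this page. As a rough sketch of how such a check could work, assuming a made-up whitelist and assuming that repeated "quantizer_" prefixes are stripped before lookup, neither of which is confirmed by the commit:

```cpp
#include <set>
#include <string>

// Hypothetical whitelist for illustration; the real list is owned by the
// upstream-bound helper (supported_build_param_names) and may differ.
inline const std::set<std::string>&
toy_build_param_names() {
    static const std::set<std::string> names = {"nprobe", "efSearch", "ht", "max_codes"};
    return names;
}

// Accept a key if it is whitelisted, or if stripping one or more leading
// "quantizer_" prefixes yields a whitelisted key (an assumption of this sketch).
inline bool
toy_is_supported_build_param(std::string key) {
    static const std::string prefix = "quantizer_";
    while (key.compare(0, prefix.size(), prefix) == 0) {
        key.erase(0, prefix.size());
    }
    return toy_build_param_names().count(key) > 0;
}
```

Under these assumptions a typo such as "nprob" is rejected before ParameterSpace is ever called, while "quantizer_efSearch" passes the whitelist and is left for faiss itself to accept or reject against the concrete index type.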

src/index/index.cc

Lines changed: 1 addition & 0 deletions
@@ -33,6 +33,7 @@ LoadConfig(BaseConfig* cfg, const Json& json, knowhere::PARAM_TYPE param_type, c
     auto res = Config::FormatAndCheck(*cfg, json_, msg);
     LOG_KNOWHERE_DEBUG_ << method << " config dump: " << json_.dump();
     RETURN_IF_ERROR(res);
+    cfg->CaptureRawJson(json_);
     return Config::Load(*cfg, json_, param_type, msg);
 }
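The one-line change above matters because of ordering: the hook runs while the full JSON is still available, before the typed load discards keys it does not declare. A minimal sketch of that ordering follows, with toy types standing in for BaseConfig, Config::Load, and LoadConfig; all Toy* names are hypothetical, and validation is omitted.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for knowhere::Json.
using ToyJson = std::map<std::string, std::string>;

struct ToyBaseConfig {
    std::string k;  // the only typed field in this sketch
    virtual ~ToyBaseConfig() = default;
    // Default hook does nothing, like BaseConfig::CaptureRawJson.
    virtual void
    CaptureRawJson(const ToyJson& /*json*/) {
    }
};

struct ToyRawConfig : ToyBaseConfig {
    ToyJson snapshot;
    void
    CaptureRawJson(const ToyJson& json) override {
        snapshot = json;  // keep everything; filtering is a separate concern
    }
};

// Stand-in for Config::Load: copies the declared field and ignores the rest.
inline void
ToyLoad(ToyBaseConfig& cfg, const ToyJson& json) {
    auto it = json.find("k");
    if (it != json.end()) {
        cfg.k = it->second;
    }
}

// Stand-in for LoadConfig: (validation omitted) -> hook -> typed load.
inline void
ToyLoadConfig(ToyBaseConfig* cfg, const ToyJson& json) {
    cfg->CaptureRawJson(json);
    ToyLoad(*cfg, json);
}
```

Configs that do not override the hook behave exactly as before, which is why the default is a no-op rather than a pure virtual.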
