feat: add MCI hybrid workflow support #2087
- add the MCI index, HGraph hybrid overlay, and external KNNG import path
- add eval/export tooling plus benchmark configs for filtered MCI comparisons
- document MCI in English and Chinese and add a runnable hybrid example

Signed-off-by: zhuangye.yxw <2510035537@qq.com>
Assisted-by: GitHub Copilot:GPT-5.4
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Require kind label
Wonderful, this rule succeeded.
🟢 Require version label
Wonderful, this rule succeeded.
Code Review
This pull request introduces the MCI index, a dense-vector index utilizing maximal-clique candidate structures and an optional HGraph hybrid overlay for filtered searches. The implementation includes the core algorithm, parameter handling, comprehensive documentation, and benchmark configurations. Feedback identifies a duplicated key in a benchmark YAML file and suggests using YAML anchors to manage configuration duplication. Additionally, the reviewer recommends defining variables in documentation code snippets for clarity, replacing std::getenv and std::cerr with more robust configuration and logging mechanisms, allowing a limited_size of zero in range searches, and refactoring duplicated result-set creation logic into a helper function.
Pull request overview
Adds the new MCI dense-vector index type (with optional hybrid routing to an external HGraph for broad filtered searches), plus evaluation tooling/docs/examples to support benchmarking and adoption across the repo.
Changes:
- Introduces the MCI index implementation, parameter parsing, factory registration, and unit tests.
- Updates the eval pipeline for filtered searches that may return fewer than topk, and adds a KNNG export tool.
- Adds C++ example and English/Chinese documentation + benchmark YAML presets for MCI and MCI/HGraph hybrid workflows.
Reviewed changes
Copilot reviewed 47 out of 47 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tools/eval/monitor/recall_monitor.cpp | Updates recall computation to handle filtered results returning fewer than topk. |
| tools/eval/export_knng.cpp | Adds a CLI tool to export a fixed-width KNNG binary by running HGraph searches over base vectors. |
| tools/eval/CMakeLists.txt | Builds and links the new export_knng executable and ensures OpenMP flags are applied. |
| tools/eval/case/search_eval_case.h | Adds cached valid-id lists to support Filter::GetValidIds() in eval runs. |
| tools/eval/case/search_eval_case.cpp | Enables parallel filtered KNN eval and supplies result count + valid-id hints to monitors/filters. |
| src/inner_string_params.h | Adds internal string constant for mci index type. |
| src/factory/index_creators.cpp | Wires MCI into the index factory registration. |
| src/factory/factory_test.cpp | Adds factory coverage to create an MCI index from full parameters. |
| src/constants.cpp | Exposes INDEX_MCI constant for API/config usage. |
| src/algorithm/mci.h | Declares the MCI index class and its hybrid-search hooks. |
| src/algorithm/mci.cpp | Implements MCI build/search/serialize/deserialize, clique enumeration, and hybrid routing to HGraph. |
| src/algorithm/mci_test.cpp | Adds unit tests for MCI build, filtered search, RabitQ one-bit mode, KNNG import, and hybrid overlay. |
| src/algorithm/mci_parameter.h | Defines MCI build/search parameter schemas (including hybrid overlay settings). |
| src/algorithm/mci_parameter.cpp | Implements MCI parameter parsing, validation, JSON conversion, and search param parsing. |
| src/algorithm/mci_parameter_test.cpp | Adds parameter round-trip and compatibility tests for MCI (including hybrid settings). |
| include/vsag/index.h | Extends public IndexType enum with MCI. |
| include/vsag/constants.h | Declares INDEX_MCI in the public constants header. |
| examples/cpp/CMakeLists.txt | Builds the new runnable example for MCI hybrid filtered search. |
| examples/cpp/322_feature_mci_hybrid_filter.cpp | Demonstrates exporting KNNG from HGraph, building MCI, and running hybrid filtered queries routed to HGraph. |
| docs/docs/zh/src/SUMMARY.md | Adds MCI page to Chinese docs navigation. |
| docs/docs/zh/src/resources/index_parameters.md | Documents MCI build/search parameters in Chinese. |
| docs/docs/zh/src/indexes/README.md | Adds MCI to Chinese index overview table. |
| docs/docs/zh/src/indexes/mci.md | Adds Chinese MCI documentation page (incl. hybrid overlay and KNNG format). |
| docs/docs/zh/src/guide/create_index.md | Lists mci in Chinese create-index guide. |
| docs/docs/en/src/SUMMARY.md | Adds MCI page to English docs navigation. |
| docs/docs/en/src/resources/index_parameters.md | Documents MCI build/search parameters in English. |
| docs/docs/en/src/indexes/README.md | Adds MCI to English index overview table. |
| docs/docs/en/src/indexes/mci.md | Adds English MCI documentation page (incl. hybrid overlay and KNNG format). |
| docs/docs/en/src/guide/create_index.md | Lists mci in English create-index guide. |
| benchs/datasets/wufufilter_5m_mci_self_build.yml | Adds benchmark preset for self-built MCI on WUFUFILTER 5M. |
| benchs/datasets/wufufilter_5m_mci_self_build_sq8.yml | Adds benchmark preset variant using SQ8 base quantization. |
| benchs/datasets/wufufilter_5m_mci_self_build_search_sweep.yml | Adds WUFUFILTER 5M MCI search sweep presets. |
| benchs/datasets/wufufilter_5m_mci_hgraph_knng_search_sweep.yml | Adds sweep presets for MCI built from an HGraph-derived KNNG. |
| benchs/datasets/wufufilter_5m_mci_hgraph_knng_build.yml | Adds build+search preset for MCI using an HGraph-derived KNNG. |
| benchs/datasets/wufufilter_5m_mci_hgraph_hybrid_search_sweep.yml | Adds sweep presets for hybrid MCI/HGraph routing experiments. |
| benchs/datasets/wufufilter_5m_hgraph_mci_hybrid_compare_search.yml | Adds side-by-side comparison presets (HGraph vs MCI vs hybrid thresholds). |
| benchs/datasets/wufufilter_5m_hgraph_build.yml | Adds baseline HGraph build preset for WUFUFILTER 5M. |
| benchs/datasets/wufufilter_5m_filtered_hgraph_search_sweep.yml | Adds filtered-search sweep presets for HGraph on WUFUFILTER 5M. |
| benchs/datasets/gist1m_sq8_uniform_pure_baseline.yml | Adds baseline gist1m SQ8-uniform HGraph benchmark presets. |
| benchs/datasets/gist1m_sq8_pure_baseline.yml | Adds baseline gist1m SQ8 HGraph benchmark presets. |
| benchs/datasets/gist1m_base_quantization_sq8_vs_rabitq_build_search.yml | Adds gist1m build+search comparison presets (SQ8 vs RabitQ). |
| benchs/datasets/codefilter_3m_mci_self_build.yml | Adds codefilter 3M MCI benchmark preset(s). |
| benchs/datasets/codefilter_3m_mci_self_build_search_sweep.yml | Adds codefilter 3M MCI search sweep presets. |
| benchs/datasets/codefilter_3m_ivf_build.yml | Adds codefilter 3M IVF build preset for comparison. |
| benchs/datasets/codefilter_3m_hgraph_mci_hybrid_compare_search.yml | Adds codefilter 3M comparison presets across HGraph/MCI/hybrid. |
| benchs/datasets/codefilter_3m_hgraph_build.yml | Adds baseline HGraph build preset for codefilter 3M. |
| benchs/datasets/codefilter_3m_filtered_search_sweep.yml | Adds filtered-search sweep reference presets for codefilter 3M. |
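
The recall change listed for tools/eval/monitor/recall_monitor.cpp can be illustrated with a small sketch. This is not the code from the PR: `filtered_recall` is a hypothetical helper, and the choice to divide by `min(topk, gt_count)` and to treat an empty ground truth as recall 1.0 are assumptions made here to show one way of tolerating result lists shorter than topk.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Hypothetical helper, not the implementation in recall_monitor.cpp.
// result:       ids returned by the search (result_count may be < topk)
// ground_truth: exact neighbors under the same filter (gt_count may be < topk)
// Returns hits / min(topk, gt_count).
inline double
filtered_recall(const int64_t* result,
                std::size_t result_count,
                const int64_t* ground_truth,
                std::size_t gt_count,
                std::size_t topk) {
    const std::size_t expected = std::min(topk, gt_count);
    if (expected == 0) {
        return 1.0;  // assumption: nothing to find means nothing was missed
    }
    std::unordered_set<int64_t> truth(ground_truth, ground_truth + expected);
    std::size_t hits = 0;
    // Only read as many ids as the search actually returned.
    const std::size_t returned = std::min(result_count, topk);
    for (std::size_t i = 0; i < returned; ++i) {
        hits += truth.erase(result[i]);  // erase so a duplicate id counts once
    }
    return static_cast<double>(hits) / static_cast<double>(expected);
}
```

Passing the returned count explicitly keeps the loop from reading past the end of a short result buffer, which is the failure mode a fixed topk-sized loop would have on a restrictive filter.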
PR review fixes:
- docs (en/zh) mci.md: add missing variable definitions (n, ids, data, q)
to the example snippets so they are self-contained and compilable.
- src/algorithm/mci.cpp: replace std::getenv("VSAG_MCI_BUILD_STATS") and
std::cerr instrumentation with logger::debug so the standard logger
level controls verbosity; relax the limited_size != 0 check; extract
the duplicated heap-to-dataset construction into
MCI::build_dataset_from_heap helper shared by KNN and range search.
- src/algorithm/mci.{h,cpp,_parameter.cpp}: drop the redundant
hgraph_serialized_size field from Serialize/Deserialize; hybrid loading
now relies solely on hgraph_index_path. MCIParameter::ToJson now
serializes hgraph_index_path so JSON round-trips are complete.
- tools/eval/monitor/recall_monitor.cpp: align the result-tuple cast with
SearchEvalCase's actual std::tuple<const int64_t*, ...> type, removing
an undefined-behavior reinterpret_cast.
Tests:
- tests/test_mci.cpp: new functional test suite covering build/search
across metrics and base quantization types, filter search, reorder
(rabitq one-bit), concurrent KNN, the three serialization paths
(binary set / reader set / file), the hybrid overlay path with a
decoupled hgraph index loaded via hgraph_index_path, and a
RandomAllocator robustness case.
Signed-off-by: zhuangye.yxw <2510035537@qq.com>
Assisted-by: CodeFuse:claude-sonnet-4.5
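
For readers unfamiliar with the heap-to-dataset refactor mentioned above, here is a minimal sketch of the idea. It does not reproduce MCI::build_dataset_from_heap: `ResultSet`, `Candidate`, `MaxHeap`, and `build_result_from_heap` are hypothetical names, and a plain struct stands in for the library's dataset type.

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for the library's result dataset.
struct ResultSet {
    std::vector<int64_t> ids;
    std::vector<float> distances;
};

using Candidate = std::pair<float, int64_t>;     // (distance, id)
using MaxHeap = std::priority_queue<Candidate>;  // farthest candidate on top

// Drains a max-heap into arrays sorted by ascending distance.
// A single helper like this can serve both KNN and range search,
// so the two paths no longer carry their own copies of the loop.
inline ResultSet
build_result_from_heap(MaxHeap heap) {
    ResultSet out;
    out.ids.resize(heap.size());
    out.distances.resize(heap.size());
    // The heap yields the farthest element first, so fill from the back.
    for (std::size_t i = heap.size(); i > 0; --i) {
        out.distances[i - 1] = heap.top().first;
        out.ids[i - 1] = heap.top().second;
        heap.pop();
    }
    return out;
}
```

An empty heap produces an empty result rather than an error, which matches the spirit of accepting searches that legitimately return nothing.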
71bfab0 to 73f1641
Resolve conflict in tools/eval/CMakeLists.txt by keeping both blocks
(PR-side `export_knng` executable and main-side `eval_dataset_test`).
Update MCI includes to reflect hgraph relocation on main:
- src/algorithm/mci.cpp: `hgraph.h` -> `algorithm/hgraph/hgraph.h`
- src/algorithm/mci_parameter.h: `hgraph_parameter.h`
-> `algorithm/hgraph/hgraph_parameter.h`
Verified: full build clean (vsag, eval_performance, eval_dataset_test,
functests, unittests). `build/tests/functests "[mci]"` passes 21173
assertions across 7 test cases.
Signed-off-by: zhuangye.yxw <2510035537@qq.com>
Assisted-by: CodeFuse:claude-sonnet-4.5
clang-format-15 noted layout drift in four files introduced by the MCI feature; this commit applies the formatter output verbatim so the "check_format" CI job stops failing.
- examples/cpp/322_feature_mci_hybrid_filter.cpp
- src/algorithm/mci.cpp
- src/algorithm/mci_test.cpp (also: append trailing newline)
- tools/eval/export_knng.cpp
Signed-off-by: zhuangye.yxw <2510035537@qq.com>
Assisted-by: CodeFuse:claude-sonnet-4.5
Change Type
Linked Issue
N/A (kind/improvement)
What Changed
tools/eval/export_knng.cpp, benchmark YAMLs, and evaluator fixes for filtered searches that return fewer than topk ids.
examples/cpp/322_feature_mci_hybrid_filter.cpp.
Test Evidence
make fmt
make lint
make test
make cov, run tests, and collect coverage
Test details:
Compatibility Impact
mci index type and related constants.
hgraph_index_path; eval recall handling now tolerates filtered searches that return fewer than topk results.
Performance and Concurrency Impact
Documentation Impact
README.md
DEVELOPMENT.md
CONTRIBUTING.md
docs/docs/en/src/indexes/mci.md, docs/docs/zh/src/indexes/mci.md, related nav/guide/parameter pages
Risk and Rollback
0cd95c72.
Checklist
kind/bug and kind/feature; see "Linked Issue" above)
[skip ci] prefix)