feat: add MCI hybrid workflow support #2087

Open
LightWant wants to merge 4 commits into antgroup:main from LightWant:feat/filter-MCI

@LightWant
Collaborator

Change Type

  • Bug fix
  • New feature
  • Improvement/Refactor
  • Documentation
  • CI/Build/Infra

Linked Issue

N/A (kind/improvement)

What Changed

  • Added the MCI index implementation, parameter parsing, factory wiring, and unit tests.
  • Added decoupled HGraph hybrid routing so MCI can load an external HGraph index for broad filtered searches.
  • Added tools/eval/export_knng.cpp, benchmark YAMLs, and evaluator fixes for filtered searches that return fewer than topk ids.
  • Added the runnable C++ example examples/cpp/322_feature_mci_hybrid_filter.cpp.
  • Added English and Chinese documentation pages for MCI and linked them from the docs navigation and parameter guide.
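
The hybrid routing described in the list above can be pictured as a threshold test on the fraction of ids that pass the filter. The sketch below is illustrative only: `Route`, `valid_ratio`, `choose_route`, `route_name`, and the `hybrid_threshold` parameter are hypothetical names, not the PR's API, and the real decision logic in src/algorithm/mci.cpp may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical route labels; the PR's example prints the chosen route as "hgraph".
enum class Route { kMci, kHGraph };

// Fraction of the dataset that passes the filter (the "valid ratio").
inline double valid_ratio(int64_t valid_count, int64_t total_count) {
    assert(total_count > 0 && valid_count >= 0 && valid_count <= total_count);
    return static_cast<double>(valid_count) / static_cast<double>(total_count);
}

// ASSUMED policy: broad filters (ratio at or above the threshold) are sent to
// the external HGraph index, while selective filters stay on MCI.
inline Route choose_route(double ratio, double hybrid_threshold) {
    return ratio >= hybrid_threshold ? Route::kHGraph : Route::kMci;
}

// Human-readable name for logging the decision.
inline std::string route_name(Route r) {
    return r == Route::kHGraph ? "hgraph" : "mci";
}
```

With an assumed threshold of 0.3, a filter that keeps half of the ids (`valid_ratio(50, 100)`) would be routed to HGraph, while one that keeps a tenth would stay on MCI.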

Test Evidence

  • make fmt
  • make lint
  • make test
  • make cov, run tests, and collect coverage
  • Other (describe below)

Test details:

cmake --build /root/vsag/build --target unittests 322_feature_mci_hybrid_filter -j2
/root/vsag/build/examples/cpp/322_feature_mci_hybrid_filter
  - Hybrid route: "hgraph"
  - Hybrid valid ratio: 0.5
  - Filtered results: 96, 97, 98, 99, 100

Compatibility Impact

  • API/ABI compatibility: adds the public mci index type and related constants.
  • Behavior changes: MCI hybrid loads HGraph externally via hgraph_index_path; eval recall handling now tolerates filtered searches that return fewer than topk results.
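
To make the recall change concrete, here is a minimal sketch of a per-query recall computation that tolerates a result list shorter than topk. The function name `recall_at_k`, its signature, and the choice of denominator are assumptions made for illustration; the actual logic lives in tools/eval/monitor/recall_monitor.cpp and may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Recall for one query when the search may return fewer than topk ids.
// ASSUMPTION: the denominator is the number of ground-truth ids considered
// (at most topk), so a short result list lowers recall instead of causing a
// read past the end of the result buffer.
inline double recall_at_k(const int64_t* result_ids, int64_t result_count,
                          const std::vector<int64_t>& ground_truth, int64_t topk) {
    assert(topk > 0 && result_count >= 0);
    const int64_t gt_count =
        std::min<int64_t>(topk, static_cast<int64_t>(ground_truth.size()));
    if (gt_count == 0) {
        return 1.0;  // nothing to find: treated as perfect recall by convention
    }
    std::unordered_set<int64_t> gt(ground_truth.begin(),
                                   ground_truth.begin() + gt_count);
    // Only iterate over ids that were actually returned.
    const int64_t usable = std::min(result_count, topk);
    int64_t hits = 0;
    for (int64_t i = 0; i < usable; ++i) {
        // erase() returns 1 on a match and guards against counting duplicates.
        hits += static_cast<int64_t>(gt.erase(result_ids[i]));
    }
    return static_cast<double>(hits) / static_cast<double>(gt_count);
}
```

The key point is that the loop bound comes from the reported result count rather than from topk, so a filtered search that returns only two of four expected ids yields a recall of 0.5 instead of undefined behavior.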

Performance and Concurrency Impact

  • Performance impact: improved flexibility for filtered search routing; no broad regression claim made in this PR.
  • Concurrency/thread-safety impact: none intended.

Documentation Impact

  • No docs update needed
  • Updated docs:
    • README.md
    • DEVELOPMENT.md
    • CONTRIBUTING.md
    • Other: docs/docs/en/src/indexes/mci.md, docs/docs/zh/src/indexes/mci.md, related nav/guide/parameter pages

Risk and Rollback

  • Risk level: medium
  • Rollback plan: revert commit 0cd95c72.

Checklist

  • I have linked the relevant issue (required for kind/bug and kind/feature; see "Linked Issue" above)
  • I have added/updated tests for new behavior or bug fixes
  • I have considered API compatibility impact
  • I have updated docs if behavior/workflow changed
  • My commit messages follow project conventions (Conventional Commits, optional [skip ci] prefix)

- add the MCI index, HGraph hybrid overlay, and external KNNG import path
- add eval/export tooling plus benchmark configs for filtered MCI comparisons
- document MCI in English and Chinese and add a runnable hybrid example

Signed-off-by: zhuangye.yxw <2510035537@qq.com>
Assisted-by: GitHub Copilot:GPT-5.4
Copilot AI review requested due to automatic review settings May 20, 2026 07:16
@LightWant LightWant added the kind/improvement and version/0.18 labels May 20, 2026
@mergify
Contributor

mergify Bot commented May 20, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Require kind label

Wonderful, this rule succeeded.
  • label~=^kind/

🟢 Require version label

Wonderful, this rule succeeded.
  • label~=^version/

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces the MCI index, a dense-vector index utilizing maximal-clique candidate structures and an optional HGraph hybrid overlay for filtered searches. The implementation includes the core algorithm, parameter handling, comprehensive documentation, and benchmark configurations. Feedback identifies a duplicated key in a benchmark YAML file and suggests using YAML anchors to manage configuration duplication. Additionally, the reviewer recommends defining variables in documentation code snippets for clarity, replacing std::getenv and std::cerr with more robust configuration and logging mechanisms, allowing a limited_size of zero in range searches, and refactoring duplicated result-set creation logic into a helper function.

Comment thread benchs/datasets/wufufilter_5m_mci_self_build_search_sweep.yml Outdated
Comment thread benchs/datasets/codefilter_3m_hgraph_mci_hybrid_compare_search.yml Outdated
Comment thread docs/docs/en/src/indexes/mci.md
Comment thread docs/docs/en/src/indexes/mci.md Outdated
Comment thread docs/docs/zh/src/indexes/mci.md
Comment thread docs/docs/zh/src/indexes/mci.md Outdated
Comment thread src/algorithm/mci.cpp Outdated
Comment thread src/algorithm/mci.cpp Outdated
Comment thread src/algorithm/mci.cpp Outdated
Comment thread src/algorithm/mci.cpp Outdated
Contributor

Copilot AI left a comment

Pull request overview

Adds the new MCI dense-vector index type (with optional hybrid routing to an external HGraph for broad filtered searches), plus evaluation tooling/docs/examples to support benchmarking and adoption across the repo.

Changes:

  • Introduces the MCI index implementation, parameter parsing, factory registration, and unit tests.
  • Updates the eval pipeline for filtered searches that may return fewer than topk, and adds a KNNG export tool.
  • Adds C++ example and English/Chinese documentation + benchmark YAML presets for MCI and MCI/HGraph hybrid workflows.
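
As a rough illustration of what a fixed-width KNNG binary could look like, the sketch below writes and reads a neighbor table through standard streams. The layout (n rows of exactly k int64 ids, row-major, native byte order, no header) and the names `write_knng` and `read_knng_row` are assumptions for this example only; the format actually produced by tools/eval/export_knng.cpp is documented in the PR's mci.md pages and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>  // handy for in-memory round trips with std::stringstream
#include <vector>

// ASSUMED layout (not taken from the PR): n rows, each exactly k int64 ids,
// row-major, native byte order, no header.
inline void write_knng(std::ostream& out, const std::vector<int64_t>& neighbors,
                       int64_t n, int64_t k) {
    assert(n >= 0 && k > 0 && static_cast<int64_t>(neighbors.size()) == n * k);
    out.write(reinterpret_cast<const char*>(neighbors.data()),
              static_cast<std::streamsize>(neighbors.size() * sizeof(int64_t)));
}

// Because every row has the same width, row i starts at byte i * k * 8, so a
// reader can seek straight to one row without scanning the file.
inline std::vector<int64_t> read_knng_row(std::istream& in, int64_t row, int64_t k) {
    assert(row >= 0 && k > 0);
    std::vector<int64_t> ids(static_cast<std::size_t>(k));
    in.seekg(static_cast<std::streamoff>(row * k * static_cast<int64_t>(sizeof(int64_t))));
    in.read(reinterpret_cast<char*>(ids.data()),
            static_cast<std::streamsize>(ids.size() * sizeof(int64_t)));
    if (!in) {
        ids.clear();  // short read or failed seek: report an empty row
    }
    return ids;
}
```

The fixed row width is what makes random access cheap: the offset of any node's neighbor list is a single multiplication, which suits an importer that consumes the graph node by node.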

Reviewed changes

Copilot reviewed 47 out of 47 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tools/eval/monitor/recall_monitor.cpp | Updates recall computation to handle filtered results returning fewer than topk. |
| tools/eval/export_knng.cpp | Adds a CLI tool to export a fixed-width KNNG binary by running HGraph searches over base vectors. |
| tools/eval/CMakeLists.txt | Builds and links the new export_knng executable and ensures OpenMP flags are applied. |
| tools/eval/case/search_eval_case.h | Adds cached valid-id lists to support Filter::GetValidIds() in eval runs. |
| tools/eval/case/search_eval_case.cpp | Enables parallel filtered KNN eval and supplies result count + valid-id hints to monitors/filters. |
| src/inner_string_params.h | Adds internal string constant for mci index type. |
| src/factory/index_creators.cpp | Wires MCI into the index factory registration. |
| src/factory/factory_test.cpp | Adds factory coverage to create an MCI index from full parameters. |
| src/constants.cpp | Exposes INDEX_MCI constant for API/config usage. |
| src/algorithm/mci.h | Declares the MCI index class and its hybrid-search hooks. |
| src/algorithm/mci.cpp | Implements MCI build/search/serialize/deserialize, clique enumeration, and hybrid routing to HGraph. |
| src/algorithm/mci_test.cpp | Adds unit tests for MCI build, filtered search, RabitQ one-bit mode, KNNG import, and hybrid overlay. |
| src/algorithm/mci_parameter.h | Defines MCI build/search parameter schemas (including hybrid overlay settings). |
| src/algorithm/mci_parameter.cpp | Implements MCI parameter parsing, validation, JSON conversion, and search param parsing. |
| src/algorithm/mci_parameter_test.cpp | Adds parameter round-trip and compatibility tests for MCI (including hybrid settings). |
| include/vsag/index.h | Extends public IndexType enum with MCI. |
| include/vsag/constants.h | Declares INDEX_MCI in the public constants header. |
| examples/cpp/CMakeLists.txt | Builds the new runnable example for MCI hybrid filtered search. |
| examples/cpp/322_feature_mci_hybrid_filter.cpp | Demonstrates exporting KNNG from HGraph, building MCI, and running hybrid filtered queries routed to HGraph. |
| docs/docs/zh/src/SUMMARY.md | Adds MCI page to Chinese docs navigation. |
| docs/docs/zh/src/resources/index_parameters.md | Documents MCI build/search parameters in Chinese. |
| docs/docs/zh/src/indexes/README.md | Adds MCI to Chinese index overview table. |
| docs/docs/zh/src/indexes/mci.md | Adds Chinese MCI documentation page (incl. hybrid overlay and KNNG format). |
| docs/docs/zh/src/guide/create_index.md | Lists mci in Chinese create-index guide. |
| docs/docs/en/src/SUMMARY.md | Adds MCI page to English docs navigation. |
| docs/docs/en/src/resources/index_parameters.md | Documents MCI build/search parameters in English. |
| docs/docs/en/src/indexes/README.md | Adds MCI to English index overview table. |
| docs/docs/en/src/indexes/mci.md | Adds English MCI documentation page (incl. hybrid overlay and KNNG format). |
| docs/docs/en/src/guide/create_index.md | Lists mci in English create-index guide. |
| benchs/datasets/wufufilter_5m_mci_self_build.yml | Adds benchmark preset for self-built MCI on WUFUFILTER 5M. |
| benchs/datasets/wufufilter_5m_mci_self_build_sq8.yml | Adds benchmark preset variant using SQ8 base quantization. |
| benchs/datasets/wufufilter_5m_mci_self_build_search_sweep.yml | Adds WUFUFILTER 5M MCI search sweep presets. |
| benchs/datasets/wufufilter_5m_mci_hgraph_knng_search_sweep.yml | Adds sweep presets for MCI built from an HGraph-derived KNNG. |
| benchs/datasets/wufufilter_5m_mci_hgraph_knng_build.yml | Adds build+search preset for MCI using an HGraph-derived KNNG. |
| benchs/datasets/wufufilter_5m_mci_hgraph_hybrid_search_sweep.yml | Adds sweep presets for hybrid MCI/HGraph routing experiments. |
| benchs/datasets/wufufilter_5m_hgraph_mci_hybrid_compare_search.yml | Adds side-by-side comparison presets (HGraph vs MCI vs hybrid thresholds). |
| benchs/datasets/wufufilter_5m_hgraph_build.yml | Adds baseline HGraph build preset for WUFUFILTER 5M. |
| benchs/datasets/wufufilter_5m_filtered_hgraph_search_sweep.yml | Adds filtered-search sweep presets for HGraph on WUFUFILTER 5M. |
| benchs/datasets/gist1m_sq8_uniform_pure_baseline.yml | Adds baseline gist1m SQ8-uniform HGraph benchmark presets. |
| benchs/datasets/gist1m_sq8_pure_baseline.yml | Adds baseline gist1m SQ8 HGraph benchmark presets. |
| benchs/datasets/gist1m_base_quantization_sq8_vs_rabitq_build_search.yml | Adds gist1m build+search comparison presets (SQ8 vs RabitQ). |
| benchs/datasets/codefilter_3m_mci_self_build.yml | Adds codefilter 3M MCI benchmark preset(s). |
| benchs/datasets/codefilter_3m_mci_self_build_search_sweep.yml | Adds codefilter 3M MCI search sweep presets. |
| benchs/datasets/codefilter_3m_ivf_build.yml | Adds codefilter 3M IVF build preset for comparison. |
| benchs/datasets/codefilter_3m_hgraph_mci_hybrid_compare_search.yml | Adds codefilter 3M comparison presets across HGraph/MCI/hybrid. |
| benchs/datasets/codefilter_3m_hgraph_build.yml | Adds baseline HGraph build preset for codefilter 3M. |
| benchs/datasets/codefilter_3m_filtered_search_sweep.yml | Adds filtered-search sweep reference presets for codefilter 3M. |

Comment thread tools/eval/monitor/recall_monitor.cpp
Comment thread src/algorithm/mci.cpp
Comment thread src/algorithm/mci_parameter.cpp
Comment thread benchs/datasets/wufufilter_5m_mci_self_build_search_sweep.yml Outdated
PR review fixes:
- docs (en/zh) mci.md: add missing variable definitions (n, ids, data, q)
  to the example snippets so they are self-contained and compilable.
- src/algorithm/mci.cpp: replace std::getenv("VSAG_MCI_BUILD_STATS") and
  std::cerr instrumentation with logger::debug so the standard logger
  level controls verbosity; relax the limited_size != 0 check; extract
  the duplicated heap-to-dataset construction into
  MCI::build_dataset_from_heap helper shared by KNN and range search.
- src/algorithm/mci{.h,.cpp,_parameter.cpp}: drop the redundant
  hgraph_serialized_size field from Serialize/Deserialize; hybrid loading
  now relies solely on hgraph_index_path. MCIParameter::ToJson now
  serializes hgraph_index_path so JSON round-trips are complete.
- tools/eval/monitor/recall_monitor.cpp: align the result-tuple cast with
  SearchEvalCase's actual std::tuple<const int64_t*, ...> type, removing
  an undefined-behavior reinterpret_cast.
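
The heap-to-dataset extraction mentioned above follows a common pattern: drain a max-heap of candidates into arrays ordered by ascending distance. The sketch below shows that pattern in isolation. `ResultArrays`, `MaxHeap`, and `build_result_from_heap` are hypothetical stand-ins and do not reproduce the signature of `MCI::build_dataset_from_heap` or the vsag Dataset type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for a search result set: parallel id/distance arrays.
struct ResultArrays {
    std::vector<int64_t> ids;
    std::vector<float> distances;
};

// (distance, id) pairs; std::priority_queue keeps the LARGEST distance on top.
using MaxHeap = std::priority_queue<std::pair<float, int64_t>>;

// Drain the heap into arrays sorted by ascending distance. Taking the heap by
// value lets both KNN and range search hand over their heap without cleanup.
inline ResultArrays build_result_from_heap(MaxHeap heap) {
    ResultArrays out;
    out.ids.resize(heap.size());
    out.distances.resize(heap.size());
    // The top is the farthest candidate, so fill from the back to the front.
    for (std::size_t i = heap.size(); i > 0; --i) {
        out.distances[i - 1] = heap.top().first;
        out.ids[i - 1] = heap.top().second;
        heap.pop();
    }
    assert(heap.empty());
    return out;
}
```

Keeping this in one helper means the ordering convention (nearest first) is defined in a single place, which is the main benefit of removing the duplicated construction code.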

Tests:
- tests/test_mci.cpp: new functional test suite covering build/search
  across metrics and base quantization types, filter search, reorder
  (rabitq one-bit), concurrent KNN, the three serialization paths
  (binary set / reader set / file), the hybrid overlay path with a
  decoupled hgraph index loaded via hgraph_index_path, and a
  RandomAllocator robustness case.

Signed-off-by: zhuangye.yxw <2510035537@qq.com>
Assisted-by: CodeFuse:claude-sonnet-4.5
Copilot AI review requested due to automatic review settings May 30, 2026 06:41
@LightWant LightWant force-pushed the feat/filter-MCI branch 2 times, most recently from 71bfab0 to 73f1641 May 30, 2026 07:53
LightWant added 2 commits May 31, 2026 07:17
Resolve conflict in tools/eval/CMakeLists.txt by keeping both blocks
(PR-side `export_knng` executable and main-side `eval_dataset_test`).

Update MCI includes to reflect hgraph relocation on main:
- src/algorithm/mci.cpp: `hgraph.h` -> `algorithm/hgraph/hgraph.h`
- src/algorithm/mci_parameter.h: `hgraph_parameter.h`
    -> `algorithm/hgraph/hgraph_parameter.h`

Verified: full build clean (vsag, eval_performance, eval_dataset_test,
functests, unittests). `build/tests/functests "[mci]"` passes 21173
assertions across 7 test cases.

Signed-off-by: zhuangye.yxw <2510035537@qq.com>
Assisted-by: CodeFuse:claude-sonnet-4.5
clang-format-15 noted layout drift in four files introduced by the MCI
feature; this commit applies the formatter output verbatim so the
"check_format" CI job stops failing.

- examples/cpp/322_feature_mci_hybrid_filter.cpp
- src/algorithm/mci.cpp
- src/algorithm/mci_test.cpp  (also: append trailing newline)
- tools/eval/export_knng.cpp

Signed-off-by: zhuangye.yxw <2510035537@qq.com>
Assisted-by: CodeFuse:claude-sonnet-4.5